I have many machines (20+) connected in a network. each machine accesses a central database, queries it, processes the information queried, and then writes the results to files on its local hard drive.
Following the processing, I'd like to be able to 'grab' all these files (from all the remote machines) back to the main machine for storage.
I thought of three possible ways to do so:
(1) rsync to each remote machine from the main machine, and 'ask' for the files
(2) rsync from every remote machine to the main machine, and 'send' the files
(3) create a NFS share on each remote machine, to which the main machine can access and read the files (no 'rsync' is needed in such a case)
Is one of the ways better than others? are there better ways I am not aware of?
All machines use Ubuntu 10.04LTS. Thanks in advance for any suggestions.
You could create one NFS share on the master machine and have each remote machine mount that. Seems like less work.
Performance-wise, it's practically the same. You are still sending files over a (relatively) slow network connection.
Now, I'd say which approach you take depends on where you want to handle errors or irregularities. If you want the responsibility to lie on your processing computers, use rsync back to the main one; or the other way round if you want the main one to work on assembling the data and assuring everything is in order.
As for the shared space approach, I would create a share on the main machine, and have the others write to it. They can start as soon as the processing finishes, ensure the file is transferred correctly, and then verify checksums or whatever.
I would prefer option (2) since you know when the processing is finished on the client machine. You could use the same SSH key on all client machines or collect the different keys in the authorized_keys file on the main machine. It's also more reliable if the main machine is unavailable for some reason, you can still sync the results later while in the NFS setup the clients are blocked.