Scrap the SCP. How to copy data fast using pigz and nc

Have you ever heard that the speed of a system is determined by its slowest component? I am made painfully aware of that every time I do data migrations. That is, it doesn't matter if you have 64 Gb of memory on either end if the majority of the time is spent waiting for data to trickle across a slow 1 Gb network link.

Watching data trickle for hours while the rest of the system does nothing is a pretty frustrating experience. But limitations breed creativity, so lately I've been experimenting with several different copy techniques to see if transfer speed can be improved, perhaps by using some of the idle capacity to speed things up.

Here is a short summary of my experiments transferring a 10 Gb ORACLE data file across the WAN, which I summarized as a speed and effect comparison table.
You can judge for yourself, with all the usual caveats: your results will depend on your system configuration and will probably vary, yada yada yada.

Method | Transfer Time | Network Capacity Used | CPU Used | Effective Rate
scp | 4 min | 50 Mb per second | 5 | 50 Mb per second
bbcp | 2 min | 20 Mb per second | 2 | 100 Mb per second
ncp gzip | 2 min 1 second | 100 Mb per second | | 100 Mb per second
ncp pigz | | 3. Mb per second | | 5. Mb per second
ncp pigz, parallel degree limited | 1 min | 1. Mb per second | | 2. Mb per second

And here is the longer explanation, if you are really interested.

Copying data using SCP

Traditionally, many people use scp to copy files between systems, something like: scpu. The problem is that scp is NOT very fast: regular speeds achieved by scp are in the range of 2. Mbs per second. To put it in perspective, it takes from 4 to 1. Gb file between systems. Multiply it by, say, 8 files and now you are wasting 0. Which raises the question: how can we do better?

Why SCP is slow by default

The first observation with scp is that, even at the top of the range, the transfer speeds are NOT approaching the true NIC capacity, which for a 1 Gb NIC is slightly more than 1. Mbs per second. So, we should do much better if we are able to fill the pipe completely.

Filling the pipe: Remote copy with BBCP

Filling the pipe is precisely what the bbcp command does: it opens multiple network streams and transfers a file in parallel, using most of the network capacity in the process. In my tests, bbcp consistently outperformed scp, reaching speeds of 1. Mbs per second and cutting transfer time by a factor of 2.

There are, however, two problems with bbcp. First of all, its default syntax is pretty scary. In my example, it looked like this: bbcp P1. Tssh x a I l U H bbcp u. But more importantly, using that much network for a copy is dangerous, as it does not leave much bandwidth for anything else on the host, i.e. ORACLE connections by apps.
Plus, it may affect other unrelated hosts if you happen to have multiple machines using the same network path and a slightly oversubscribed network. In other words, bbcp should be used only if you do not care whether the database on the box is accessible and you do not share the host, rack, or routers with anybody else.

To be fair, you can use bbcp options to limit how much bandwidth it uses. But if you do that, the copy speed essentially reverts back to that of scp, as it directly correlates with how much data you are pushing over the wire. Bottom line: bbcp is not good if your system is actually in use. Is there another alternative?

The magic bullet: Compression

Yes. Apparently, ORACLE data files are pretty compressible. We can gzip them on the source, transfer 5 1. The problem, however, is that instead of running a simple scp command, we need to run 3 commands on 2 separate systems: gzip on the source, the transfer from source to target, and unzip on the target, which is a bit too complex if we just want to copy a bunch of files.

Network streaming

Fortunately, this technique can be simplified and generalized by using network streaming tools. Here is an example of copying the same file using gzip and netcat. We still need to run 2 commands, but they are pretty simple:

SOURCE: tar cf u.
TARGET: nc <source host> 8. C

nc here is a network streamer that sends data over the wire on the sending end port 8. I ran many such copies and every single time md. Moreover, when something breaks, such as when a certain DBA hits Ctrl-C on either end, the event is very visible: you will recognize that an error has occurred and that you need to re-transfer. In most of my tests this combination of commands was even faster than bbcp, giving me an additional 1. Mbps over the wire even scp puts 6.

And, finally, parallelism

But we are still not done. Transfer speeds can be improved further if we are willing to use a bit of CPU on the source host.
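Before moving on to the parallel variant, the gzip and netcat pipeline from the network streaming section can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact commands from my tests: the helper names pack and unpack, the port 8888, the host name and the directories are all hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: pack/unpack are hypothetical helper names, and the host,
# port (8888) and directories in the usage comments are placeholders.

# pack DIR: write a gzip-compressed tar stream of DIR's contents to stdout.
pack() {
    tar -cf - -C "$1" . | gzip
}

# unpack DIR: read a gzip-compressed tar stream on stdin, extract into DIR.
unpack() {
    mkdir -p "$1" && gzip -dc | tar -xf - -C "$1"
}

# Source host (listens and serves the stream):
#   pack /path/to/datafiles | nc -l 8888
# Target host (connects to the source and extracts):
#   nc source-host 8888 | unpack /path/to/restore
```

The listen syntax depends on the netcat build: OpenBSD-style nc accepts nc -l 8888, while traditional netcat expects nc -l -p 8888. Comparing checksums of the files on both ends after the copy is a cheap way to confirm the transfer was complete.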
As you might know, gzip is a sequential, single-threaded application, but we also have a parallel zip, named rather expressively pigz:

SOURCE: tar cf u.
TARGET: nc <source host> 8. C

Pigz is essentially gzip, but it can use multiple parallel threads to compress the data (decompression gains much less from extra threads). If we replace gzip with pigz, we can achieve fantastic speeds and cut our transfer time again by a factor of 2 1.

A few notes and observations

Bbcp compression

If you can fill the network pipe completely i. Can we combine compression and multistream transfer for even faster speeds? As it happens, the bbcp command has a "compress me" option for input streams, so it seems a natural candidate here. However, as hard as I tried, I couldn't make it work properly. In all of my tests, when bbcp compression was turned on, there was a definite improvement in network utilization, but the transfer itself was dead slow, much slower than that of the original scp. If anybody knows how to use bbcp compression efficiently, I'll appreciate the learning experience. Still, a rather straightforward workaround is to use tar/pigz/nc and just run several copies in parallel.

Monitoring transfer progress with nc

A pigz/nc transfer might be significantly faster, but it might not be the easiest to monitor. While scp has a nice progress bar, pigz/nc just gives you a blank screen for the entire duration of the transfer. Fortunately, this is very easy to correct if you drop a pipe viewer tool into the pigz/nc pipe:

END print s pigz nc l.
GB 0 0. 0 1. 58. MBs 7 ETA 0 0.

Using one command for copy, instead of two

While the source/transfer commands are not too complex to master, there are still two commands that you need to run. To make things easier, it makes sense to script them together to remove another advantage of scp: ncpu.

Finally, there is one advantage that scp still holds: its transfer is secure, while pigz/nc transfers data in clear text.
So, if you are using unsecured networks, this option is probably not for you.

Cheers,
Maxym Kharchenko.
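For reference, here is a minimal sketch of how the pigz variant, a limited parallel degree and a pipe viewer might be combined. It is not the script used for the measurements above: the function names, the default of 4 threads, the port 8888, the host name and the paths are assumptions made for illustration, and limiting the parallel degree with pigz -p is only one plausible way to do it. The helpers fall back to gzip and cat when pigz or pv are not installed.

```shell
#!/bin/sh
# Sketch only: function names, thread count, host, port and paths are
# hypothetical placeholders, not taken from the original commands.

# par_compress N: gzip-format compression of stdin using N pigz threads,
# falling back to single-threaded gzip when pigz is not installed.
par_compress() {
    if command -v pigz >/dev/null 2>&1; then
        pigz -p "${1:-4}"
    else
        gzip
    fi
}

# par_decompress: inverse of par_compress. pigz writes gzip format,
# so either tool can read the stream.
par_decompress() {
    if command -v pigz >/dev/null 2>&1; then
        pigz -d
    else
        gzip -d
    fi
}

# progress: report throughput on stderr via pv when it is installed,
# otherwise pass the data through unchanged.
progress() {
    if command -v pv >/dev/null 2>&1; then
        pv
    else
        cat
    fi
}

# Source host, with pigz limited to 4 threads:
#   tar -cf - -C /path/to/datafiles . | progress | par_compress 4 | nc -l 8888
# Target host:
#   nc source-host 8888 | par_decompress | tar -xf - -C /path/to/restore
```

Passing the expected byte count to pv with its -s option lets it show a percentage and an ETA instead of throughput alone. A lower thread count leaves more CPU for the database at the cost of a slower copy.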