Earlier today I worked on migrating a D3 database server to a VMware ESX environment. The tool that we used for migration did a good job in converting the RHEL3 operating system and all the Linux filesystems, but failed to copy the D3’s raw data partition:
old-server:~# fdisk -l /dev/sda
[...]
/dev/sda12   10965   17816   55038658+   d3   Unknown
Now what? I didn’t have enough space on either the source server or the destination VM to dump the partition contents to a file, copy it across and load it back onto /dev/sda12 on the VM. It had to be done online.
Luckily, SSH has the ability to run a command remotely and feed its standard output to some other program running locally. Using that feature it’s easy to copy the raw partition: simply run dd if=/dev/sda12 on the source server and dd of=/dev/sda12 on the destination VM. The first dd, without any other parameters, prints whatever it reads from /dev/sda12 on the old source server to its standard output. The second dd, inversely, writes whatever it reads from standard input down to /dev/sda12 on the new virtual machine. Glue it together with this ssh command:
new-server:~# ssh old-server "dd if=/dev/sda12" | dd of=/dev/sda12
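The same reader-and-writer pairing can be exercised locally, with a scratch file standing in for the partition (the file names and 64kB size here are my own illustration, not from the original migration):

```shell
# A fake 64kB "partition" copied through a pipe, dd-to-dd, then verified.
SRC=$(mktemp); DST=$(mktemp)
dd if=/dev/urandom of="$SRC" bs=1k count=64 2>/dev/null   # source data
dd if="$SRC" 2>/dev/null | dd of="$DST" 2>/dev/null       # reader | writer
SRC_SUM=$(md5sum < "$SRC" | cut -c1-32)                   # checksum of source
DST_SUM=$(md5sum < "$DST" | cut -c1-32)                   # checksum of copy
echo "$SRC_SUM"
echo "$DST_SUM"
rm -f "$SRC" "$DST"
```

Identical checksums confirm the pipe delivered the data byte-for-byte; over the network the only change is wrapping the reading dd in ssh.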
That’s all sweet, but … dd doesn’t provide any progress tracking. In my case I had to transfer over 50GB of data and had only a vague idea of how fast it was going. Should I wait? Or leave it overnight? Hmm, hard to tell.
Finally I came up with a simple solution for checking the progress of dd: read a small data sample, say 1kB, from a given offset on both the source and destination partitions and compare their checksums:
old-server:~# dd if=/dev/sda12 bs=1k count=1 skip=5M | md5sum
7b10e9e1029c4c0f3901ee13db18a927  -
new-server:~# dd if=/dev/sda12 bs=1k count=1 skip=5M | md5sum
0f343b0931126a20f133d67c2b018a3b  -
OK, the checksums didn’t match. Yet. I kept re-running the command on the new server, and as soon as it returned the same checksum as the old one I knew it had just copied 5GB.
A side note here: I used skip=5M and claim it copied 5GB. Why is that? Because skip= skips the given number of ibs-sized blocks. In our example ibs=bs=1kB, so skip=5M skips 5 million (more precisely, 5 x 1024 x 1024) one-kilobyte blocks, which means it seeks to an offset of 5GB into /dev/sda12. And reads one kilobyte from there.
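The offset arithmetic can be checked directly in the shell (this is only the calculation; nothing here touches a disk):

```shell
# skip counts ibs-sized blocks: 5M blocks of 1kB each land 5GiB into the device.
BLOCKS=$(( 5 * 1024 * 1024 ))    # what dd parses skip=5M as
OFFSET=$(( BLOCKS * 1024 ))      # times bs=1k, giving the offset in bytes
echo "$OFFSET"                   # 5368709120 bytes = 5GiB
```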
Re-running the hashing command every minute or so manually is boring. Instead I wrote a little shell script to record the timestamp when the hashes at a given offset match:
#!/bin/bash
## check-progress.sh from http://hintshop.ludvig.co.nz/show/copy-raw-partition-over-net/
## (bash rather than sh, because of the ${HASH:0:32} substring expansion below)
DEVICE=$1
OFFSET=$2
REQ_HASH=$3
if [ "${REQ_HASH}" = "" ]; then
    echo "Usage: $0 {device} {offset} {required-hash}"
    exit 1
fi
while true
do
    # Read 1kB at the given offset and keep only the 32-character hash
    HASH=$(dd if="${DEVICE}" count=1 bs=1k skip="${OFFSET}" 2>/dev/null | md5sum)
    HASH=${HASH:0:32}
    if [ "${HASH}" = "${REQ_HASH}" ]; then
        echo "Hashes match: $(date)"
        exit 0
    fi
    echo "Not yet..."
    sleep 30
done
To use it I grabbed a hash value from a given offset on the old-server and then ran the script on the new-server:
new-server:~# ./check-progress.sh /dev/sda12 5M 7b10e9e1029c4c0f3901ee13db18a927
Not yet...
Not yet...
...
Hashes match: Sat Feb 14 11:22:33 NZDT 2009
Once it recorded the timestamp I set it up again with a new offset, say 6M, and waited. That way I was able to track the progress and roughly measure the transfer speed. Precise enough to realise it was doing about 8GB per 15 minutes. Then I knew I had enough time to go and get some lunch.
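The speed estimate falls out of two match timestamps one offset apart. A sketch with made-up numbers (the 112 seconds is illustrative; only the roughly 9MB/s conclusion lines up with the 8GB-per-15-minutes figure above):

```shell
# Two offsets 1M apart (i.e. 1GiB of data with bs=1k) matched 112 seconds apart.
GAP_MB=1024                    # MB between skip=5M and skip=6M
ELAPSED=112                    # seconds between the two "Hashes match" lines
RATE=$(( GAP_MB / ELAPSED ))   # integer MB per second
echo "${RATE} MB/s"
```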