Earlier today I worked on migrating a D3 database server to a VMware ESX environment. The tool that we used for migration did a good job in converting the RHEL3 operating system and all the Linux filesystems, but failed to copy the D3’s raw data partition:
old-server:~# fdisk -l /dev/sda
[...]
/dev/sda12   10965   17816   55038658+   d3   Unknown
Now what? I didn’t have enough space on either the source server or the destination VM to dump the partition contents to a file, copy it across and load it back onto /dev/sda12 on the VM. It had to be done online.
Luckily, SSH has the ability to run a command remotely and feed its standard output to some other program running locally. Using that feature it’s easy to copy the raw partition: simply run dd if=/dev/sda12 on the source server and dd of=/dev/sda12 on the destination VM. The first dd, without any other parameters, prints whatever it reads from /dev/sda12 on the old source server to its standard output. The second dd, inversely, writes whatever it reads from standard input down to /dev/sda12 on the new virtual machine. Glue it together with this ssh command:
new-server:~# ssh old-server "dd if=/dev/sda12" | dd of=/dev/sda12
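The same reader-and-writer pairing can be exercised locally, with a scratch file standing in for the partition (the file names and 64kB size here are my own illustration, not from the original migration):

```shell
# A fake 64kB "partition" copied through a pipe, dd-to-dd, then verified.
SRC=$(mktemp); DST=$(mktemp)
dd if=/dev/urandom of="$SRC" bs=1k count=64 2>/dev/null   # source data
dd if="$SRC" 2>/dev/null | dd of="$DST" 2>/dev/null       # reader | writer
SRC_SUM=$(md5sum < "$SRC" | cut -c1-32)                   # checksum of source
DST_SUM=$(md5sum < "$DST" | cut -c1-32)                   # checksum of copy
echo "$SRC_SUM"
echo "$DST_SUM"
rm -f "$SRC" "$DST"
```

Identical checksums confirm the pipe delivered the data byte-for-byte; over the network the only change is wrapping the reading dd in ssh.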
That’s all sweet, but … dd doesn’t provide any progress tracking. In my case I had to transfer over 50GB of data and had only a vague idea of how fast it was going. Should I wait? Or leave it overnight? Hmm, hard to tell.
Finally I came up with a simple solution for checking the progress of dd: read a small data sample, say 1kB, from a given offset on both the source and destination partitions and compare their checksums:
old-server:~# dd if=/dev/sda12 bs=1k count=1 skip=5M | md5sum
7b10e9e1029c4c0f3901ee13db18a927  -
new-server:~# dd if=/dev/sda12 bs=1k count=1 skip=5M | md5sum
0f343b0931126a20f133d67c2b018a3b  -
OK, the checksums didn’t match. Yet. I kept re-running the command on the new server, and as soon as it returned the same checksum as the old one I knew it had just copied 5GB.
A side note here: I used skip=5M and claim it copied 5GB. Why is that? Because skip= skips the given number of ibs-sized blocks. In our example ibs=bs=1kB, so skip=5M skips 5 million (more precisely, 5 x 1024 x 1024) one-kilobyte blocks, which means it seeks to an offset of 5GB into /dev/sda12. And reads one kilobyte from there.
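The offset arithmetic can be checked directly in the shell (this is only the calculation; nothing here touches a disk):

```shell
# skip counts ibs-sized blocks: 5M blocks of 1kB each land 5GiB into the device.
BLOCKS=$(( 5 * 1024 * 1024 ))    # what dd parses skip=5M as
OFFSET=$(( BLOCKS * 1024 ))      # times bs=1k, giving the offset in bytes
echo "$OFFSET"                   # 5368709120 bytes = 5GiB
```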
Re-running the hashing command every minute or so manually is boring. Instead I wrote a little shell script to record the timestamp when the hashes at a given offset match:
#!/bin/bash
## check-progress.sh from http://hintshop.ludvig.co.nz/show/copy-raw-partition-over-net/
## (bash rather than sh, because of the ${HASH:0:32} substring expansion below)
DEVICE=$1
OFFSET=$2
REQ_HASH=$3
if [ "${REQ_HASH}" = "" ]; then
    echo "Usage: $0 {device} {offset} {required-hash}"
    exit 1
fi
while true
do
    # Read 1kB at the given offset and keep only the 32-character hash
    HASH=$(dd if="${DEVICE}" count=1 bs=1k skip="${OFFSET}" 2>/dev/null | md5sum)
    HASH=${HASH:0:32}
    if [ "${HASH}" = "${REQ_HASH}" ]; then
        echo "Hashes match: $(date)"
        exit 0
    fi
    echo "Not yet..."
    sleep 30
done
To use it I grabbed a hash value from a given offset on the old-server and then ran the script on the new-server:
new-server:~# ./check-progress.sh /dev/sda12 5M 7b10e9e1029c4c0f3901ee13db18a927
Not yet...
Not yet...
...
Hashes match: Sat Feb 14 11:22:33 NZDT 2009
Once it recorded the timestamp I set it up again with a new offset, say 6M, and waited. That way I was able to track the progress and roughly measure the transfer speed. Precise enough to realise it was doing about 8GB per 15 minutes. Then I knew I had enough time to go and get some lunch.
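The speed estimate falls out of two match timestamps one offset apart. A sketch with made-up numbers (the 112 seconds is illustrative; only the roughly 9MB/s conclusion lines up with the 8GB-per-15-minutes figure above):

```shell
# Two offsets 1M apart (i.e. 1GiB of data with bs=1k) matched 112 seconds apart.
GAP_MB=1024                    # MB between skip=5M and skip=6M
ELAPSED=112                    # seconds between the two "Hashes match" lines
RATE=$(( GAP_MB / ELAPSED ))   # integer MB per second
echo "${RATE} MB/s"
```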