As a follow-up to the LXD article I wrote a while back, I thought I would provide an update on LXD in a production environment.
First, a recap of the key points from the previous article that this one builds on. The initial deployment of LXD was on an Ubuntu 16.04 VM running under VirtualBox, where I chose the ZFS filesystem rather than the native filesystem structure so we could take advantage of its copy-on-write and snapshot features.
My early fears that the ZFS pool would become a VirtualBox disk image problem, because of how heavily it thrashes the disk, did in fact come true: it generated large and unwieldy VM disk images. Frequently rotating VirtualBox snapshots to keep host disk space under control soon led to inefficient operational practices.
I must point out that this isn't due to any deficiency in VirtualBox, LXD or ZFS; it is purely a side effect of heavy use, as the development team found the containers really useful for improving the development cycle. Increased use led to increased load on VirtualBox and on the host OS storage.
All well and good for a while
Long before the situation led to a bare metal migration, the signs of continued growth signalled the need to head off any bottlenecks by making extra hardware resources available to the VM. As you would expect from virtualisation, this was an easy process.
Still, the extra performance made the VM a victim of its own success, and snapshot disk usage was becoming too much of a hassle to manage. Add to that the fact that the server was hosting the VM in a constrained way, and that we were considering more use of QEMU too (that's a story for another article), which would introduce issues with the current infrastructure. So a decision was made to take the LXD migration a step further: break it out of the VM and place it on bare metal.
Most of us who go through the virtualisation process take bare metal servers and virtualise them so we can cram as many machines into a box as possible. Fair enough. There is plenty of documentation on that process and it more or less works. In this case I was planning the opposite. Could it be done?
The answer is in fact “yes of course”! VirtualBox provides great tools for disk migration to and from a raw state. First however I had to contend with a rather large set of disk images on a live and critical machine. How long was this process going to take?
That answer was easy too: a long time. I also wasn't quite sure whether exporting a disk image would pull in all of its dependent linked images. Time was short before the next ideal window, so sticking to methods I knew worked, albeit slowly, was going to have to do. That was really my only unknown, and I'm glad about that. It's one of the great plus points of Linux: being able to move an entire installed machine to another hard disk and/or hardware with very few issues.
Migration began overnight while the devs weren't around. The first steps were easy and scriptable, though I kept an eye on them just in case. I wanted this to work the first time; otherwise I would have to wait for the next window.
With the VM shut down, I used the following to flatten out all of those large snapshot disk images:
vboxmanage export <vm> -o lxd.ova --manifest --ovf10
The result was still a large file, but it was just the one!
I then performed the reverse process, importing lxd.ova, on a system with absolutely heaps of space. Why? Well, think about it: once imported, that VM contains a lot of data, and extracting its disk image into raw format, at the full native disk size, takes even more space. As a helpful hint, during testing vboxmanage did in fact tell me that it wouldn't be able to fit the raw form onto one of my targets, so you don't need to worry about an unexpected disk-full error hours after you commit to this!
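Before committing to hours of conversion, it can be worth checking that the target really has room for the raw image, which will be as large as the VM's full virtual disk. A minimal sketch, assuming a GNU df recent enough to support `--output`; the `has_space` function is a hypothetical helper, not part of vboxmanage or any other tool:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the filesystem holding DIR has at least
# BYTES bytes available. Assumes GNU df (for --output).
has_space() {
  local dir=$1 need=$2 avail
  # -B1 reports bytes; tail skips the "Avail" header; tr strips df's padding
  avail=$(df --output=avail -B1 "$dir" | tail -n 1 | tr -d '[:space:]')
  # Anything that is not a plain number (df error, header only) means "no"
  case $avail in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$avail" -ge "$need" ]
}
```

For example, `has_space /srv/scratch 1099511627776` would check for 1 TiB of free space on a hypothetical /srv/scratch mount before running the clonehd step.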
Anyway, with the VM imported, and the single disk image ready to extract (I didn’t need to start the VM up of course), conversion to raw format was done with:
vboxmanage clonehd disk.vmdk disk.raw --format RAW
I prepared myself for a very long wait as the disk image expanded to the full raw disk size, and a couple of hours later it finished without error.
Once the ice age had passed, I prepared for the next and scariest stage: writing the raw disk image to a physical disk. I don't need to tell you how dangerous this is, do I? That's another reason I did this on a different machine with scratch disks, to reduce the risk of complete disaster.
I checked, and double checked, the destination drive name, and then I used:
dd if=disk.raw of=<hdd dev> bs=8MB
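Part of that check and double check can be scripted. This is a hedged sketch rather than a safety net: the hypothetical `check_dd_target` refuses anything that is not a block device, any device with a mounted partition, and any device smaller than the image. It assumes util-linux `lsblk` and `blockdev`, and `blockdev --getsize64` normally needs root:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check to run before: dd if=IMAGE of=DEVICE
check_dd_target() {
  local img=$1 dev=$2 img_size dev_size
  if [ ! -b "$dev" ]; then
    echo "refusing: $dev is not a block device" >&2
    return 1
  fi
  # lsblk lists the device and its partitions; any non-blank MOUNTPOINT is a red flag
  if lsblk -n -o MOUNTPOINT "$dev" | grep -q '[^[:space:]]'; then
    echo "refusing: $dev or one of its partitions is mounted" >&2
    return 1
  fi
  img_size=$(stat -c %s "$img") || return 1
  dev_size=$(blockdev --getsize64 "$dev") || return 1
  if [ "$img_size" -gt "$dev_size" ]; then
    echo "refusing: image is $img_size bytes but $dev is only $dev_size bytes" >&2
    return 1
  fi
  # Final eyeball check of exactly what is about to be overwritten
  lsblk -o NAME,SIZE,MODEL "$dev"
  echo "ok: $img ($img_size bytes) fits on $dev ($dev_size bytes)"
}
```

Usage would be `check_dd_target disk.raw <hdd dev>` immediately before the dd; it does not replace reading the device name twice.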
Of course, dd is very, very quiet, which is fine for a small job but not very helpful for one that was going to take a long time. It's worth remembering that sending:
kill -USR1 <dd pid>
will cause dd to print its current progress. Combining that with watch made the experience as fun as watching paint dry.
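The signal-and-watch routine can be folded into a single wrapper. A minimal sketch, assuming GNU dd (which prints its progress to stderr on SIGUSR1); `dd_progress` is a hypothetical name. Newer GNU coreutils (8.24 onwards) also accept `status=progress`, which makes this unnecessary where it is available:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: dd_progress INTERVAL dd-args...
# Runs dd in the background and sends it SIGUSR1 every INTERVAL seconds.
dd_progress() {
  local interval=$1 pid
  shift
  dd "$@" &
  pid=$!
  # Sleep before the first signal: SIGUSR1 arriving before dd has installed
  # its handler would terminate dd instead of printing progress.
  while kill -0 "$pid" 2>/dev/null; do
    sleep "$interval"
    kill -USR1 "$pid" 2>/dev/null || true
  done
  wait "$pid"   # return dd's own exit status
}
```

For the copy above, that would be `dd_progress 60 if=disk.raw of=<hdd dev> bs=8MB`, printing a progress line roughly once a minute.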
With that written out we were on the home stretch. Next I rebooted and started up GParted to extend the filesystem (yes, onto a much larger hard disk too, as I didn't fancy going through this again).
Another reboot and we were almost live, except, of course, for one aspect: device names. Coming from a VM meant different network interfaces, so I had to edit /etc/network/interfaces to use the correct ones and update the DHCP entries because the MAC address had changed.
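To find the interface names and MAC addresses the bare metal kernel actually assigned (for /etc/network/interfaces and the DHCP entries), a small sketch that reads Linux's /sys/class/net; `list_ifaces` is a hypothetical helper, and `ip link` shows the same information:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "NAME MAC" for every network interface (Linux sysfs)
list_ifaces() {
  local dir name
  for dir in /sys/class/net/*; do
    [ -e "$dir" ] || continue   # glob matched nothing
    name=${dir##*/}
    printf '%s %s\n' "$name" "$(cat "$dir/address" 2>/dev/null)"
  done
}
```

Comparing this output with the interface names referenced in /etc/network/interfaces shows which entries still point at the old VM devices.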
And there we had it: a VM broken out onto bare metal, and a much happier, easier to manage server.
To pre-empt the devs filling up the ZFS pool behind the LXD containers, I thought it would be a good idea to extend it, even though they are barely touching it thanks to compression being enabled. ZFS is not something I tinker with often, so it would be a good exercise to see what I needed to do.
Research prior to the migration turned up quite a few conflicting ways to extend a zpool, only one of which worked for me in testing, so that was the solution I went with. It's a simple one too, which I liked: add another virtual disk with the increased space, pause while the zpool mirrors onto it, and then detach the existing smaller one.
Before adding more space, there was an important setting I had to make:
zpool set autoexpand=on lxd
This allows the zpool to grow into all the available space; otherwise the expansion has to be requested explicitly each time (with zpool online -e).
I created the new virtual disk at the size we want, say 500GB in this example:
truncate -s 500G /var/lib/lxd/newzfs.img
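Note that `truncate` creates a sparse file: it reports the full size but allocates almost nothing until ZFS writes to it, so the host filesystem still needs real free space as the pool fills. A small sketch to see this for yourself, assuming GNU stat and du; `sparse_demo` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical demo: a truncated file has its full apparent size, while the
# space actually allocated on disk stays near zero until data is written.
sparse_demo() {
  local img=$1 size=$2
  truncate -s "$size" "$img" || return 1
  echo "apparent: $(stat -c %s "$img") bytes"
  echo "allocated: $(du -B1 "$img" | cut -f1) bytes"
}
```

Running `sparse_demo /tmp/demo.img 500G` should report an apparent size of 536870912000 bytes with little or nothing allocated; remove the file afterwards.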
Add it to the zpool:
zpool attach lxd /var/lib/lxd/zfs.img /var/lib/lxd/newzfs.img
ZFS saw it as a degraded disk and started the mirror (resilver) process, which I monitored via:
zpool status lxd
Once that process had completed (and it was very, very slow) I could remove the original disk with:
zpool detach lxd /var/lib/lxd/zfs.img
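Put together, the extension steps look roughly like the sketch below. It is a hypothetical wrapper, not the exact commands from the migration: it assumes the `scan:` line of `zpool status` contains "resilver in progress" while resilvering (as in the ZFS on Linux releases I've seen), and `DRY_RUN=1` only prints the commands so they can be reviewed first:

```bash
#!/usr/bin/env bash
# Hypothetical helper: grow a file-backed zpool by attaching a larger file as a
# mirror, waiting for the resilver, then detaching the original smaller file.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

grow_pool() {
  local pool=$1 old=$2 new=$3 size=$4 status
  run zpool set autoexpand=on "$pool" || return 1
  run truncate -s "$size" "$new" || return 1
  run zpool attach "$pool" "$old" "$new" || return 1
  # Never detach the original until the new file holds a full copy of the data
  if [ "${DRY_RUN:-0}" != 1 ]; then
    while :; do
      status=$(zpool status "$pool") || return 1
      case $status in
        *"resilver in progress"*) sleep 60 ;;
        *) break ;;
      esac
    done
  fi
  run zpool detach "$pool" "$old" || return 1
  run zpool list "$pool"
}
```

A dry run with the paths from this article would be `DRY_RUN=1 grow_pool lxd /var/lib/lxd/zfs.img /var/lib/lxd/newzfs.img 500G`, which prints the five commands in order without touching the pool.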
A quick check on the status with:
Revealed a new and expanded system. Should space demands increase dramatically, or if performance suffers (virtual drive files are not a great solution), I think I will switch to running ZFS on raw drives, which should be a similar process using device names in place of the file paths.
Now we have a wonderfully fast and, most importantly, non-VM system with no unmanageable disk images.