Hyper-V 3.0, Server 2012 deduplication (yay), and VHDX files

As you may have heard, Windows Server 2012 has hit RTM and should be available to TechNet and MSDN subscribers and Volume License holders in about a week or so.  This post will be a work in progress documenting the curious issues I’ve encountered while working with the exciting new deduplication feature of Server 2012 over the past six months or so, moving from Beta to RC and now RTM (where, sadly, I have confirmed the issue is still present).

I don’t know about you, but pretty much all of our servers these days are virtualized (we are a Hyper-V shop and I’m curious to see how this breaks down over in the VMDK camp), and the idea of “free” deduplication sounded like a fantastic boon for our file servers.  Less storage space being eaten up in the guests means smaller VHD (now VHDX) files, and less storage space eaten up across the board.  If only it were so!

First let me say that I have tested and confirmed this issue on a variety of hardware and configurations (Dell, HP, SAS, SATA, SAN, local storage, you name it) – so this definitely appears to be something inherent to Microsoft’s implementation.  What I have NOT done is test this on fixed VHDX files as they pose two problems:

1) they can’t be shrunk!

2) if you start at your smaller target size, they aren’t large enough to hold your initial data set prior to the dedup pass!

Like many of you, we use DFS Replication (and use it heavily!).  So a new file server build consists of installing the OS and the associated DFS (and now dedup) roles and features, then letting DFSR seed the data.  With a dynamic VHDX, even if we try to stay on top of it and run dedup passes multiple times during the initial seed, the VHDX file ultimately ends up quite a bit larger than the deduped data set.  Some of that is to be expected, but what I was not expecting were the results when I tried to reclaim all that space.

An example of one of our datasets is a 250 GB share that, after being fully deduped, shrank down to 150 GB of used space as reported by the guest OS (w00t!).  This is a fantastic savings.  Unfortunately, our VHDX file after all this was sitting right around 275 GB (some overhead for DFSR staging and other odds and ends is to be expected).

So I shut down the guest and used the Hyper-V management console to attempt to shrink the VHDX.  It worked for about 7 seconds or so and completed.  As you can imagine, the file was no smaller.  After digging up a couple of suggestions on the TechNet forums, I mounted the VHDX on the host (making sure that dedup was installed on the host first so it could properly work with the deduped volume) and defragged it.  I then used the Optimize-VHD cmdlet in “full” mode rather than the GUI to shrink the file, and it did shrink quite a bit, down to 222 GB.  A big improvement over 275, but not so great considering the actual data is only 150 GB and the pre-dedup dataset was only 250!  A second defrag and optimize pass squeezed another 7 GB of blood from the stone.

Obviously the VHDX file’s lack of “shrinkability” makes the gains from deduplication moot, especially considering all the manual steps (which of course can eventually be scripted and scheduled).
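To put those numbers side by side, here is a quick back-of-the-envelope calculation in Python.  It uses only the figures quoted above; the variable and function names are my own, and the "overhead" here is simply how much larger the VHDX file is than the deduped data it holds.

```python
# Figures quoted in the post, all in GB.
pre_dedup_gb = 250    # size of the share before deduplication
post_dedup_gb = 150   # used space reported by the guest after deduplication

# VHDX file size on the host at each stage.
vhdx_sizes_gb = {
    "before any shrink": 275,
    "after first defrag + optimize": 222,
    "after second defrag + optimize": 222 - 7,
}


def overhead_pct(vhdx_gb, data_gb):
    """Percentage by which the VHDX file exceeds the data it holds."""
    return round((vhdx_gb - data_gb) / data_gb * 100, 1)


# Savings seen inside the guest.
dedup_savings_pct = round((pre_dedup_gb - post_dedup_gb) / pre_dedup_gb * 100, 1)
print(f"Dedup savings inside the guest: {dedup_savings_pct}%")

# How far the host-side file lags behind those savings.
for stage, size_gb in vhdx_sizes_gb.items():
    pct = overhead_pct(size_gb, post_dedup_gb)
    print(f"{stage}: {size_gb} GB ({pct}% larger than the deduped data)")
```

Even after two passes, the file on the host is still well over 40% larger than the data inside it.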

Having now confirmed the same behavior under RTM with a couple of different server scenarios and datasets, my next plan is to take advantage of the fact that Windows Server Backup under 2012 is dedup-aware.  My hope is that WSB may be able to properly create a backup of the 150 GB dataset and restore it without bloating the VHDX file.  I will also attempt restoring to a fixed VHDX (although I’m not sure if WSB will accept a target volume smaller than the original volume was).  Stay tuned!

UPDATE 8/12: Well, Windows Server Backup doesn’t appear to be able to do an optimized (dedup-aware) backup of my disks.  I tried on a couple of different servers and get the same error when I select the deduped volume for backup.

UPDATE 8/19: At this point I need to move the dedup project forward, so my workaround has been this: take one of the DFS replication partners (we have two at each site) and configure it with a fixed-size VHDX, then replicate and dedup, enlarging the fixed VHDX manually as needed.  Once that one is finished and we know the proper target size, the other peers are rebuilt with a fixed VHDX of that size, once again seeding and deduping as we go to prevent the virtual disk from filling up.  The process is rather manual and requires a bit of attention, but it will do until a better solution can be found (the dedup passes can easily be scripted and scheduled).
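For anyone trying to picture how the target size falls out of that process, here is a rough simulation in Python.  It is only a sketch: the function name, the batch size, the grow step, and the assumption that every batch dedups at the same rate are all hypothetical, and it ignores DFSR staging and other overhead.  It touches no real disks.

```python
def size_fixed_vhdx(total_data_gb, keep_pct, start_gb, batch_gb, grow_step_gb):
    """Simulate seed -> dedup -> enlarge-if-needed; return the final disk size in GB.

    keep_pct is the percentage of each batch that remains after a dedup pass
    (60 means a 25 GB batch occupies 15 GB once optimized). Hypothetical model.
    """
    disk_gb = start_gb
    deduped_gb = 0.0   # space used by data that has already been deduped
    seeded_gb = 0      # raw data replicated so far

    while seeded_gb < total_data_gb:
        batch = min(batch_gb, total_data_gb - seeded_gb)

        # A new batch arrives un-deduped, so the disk must hold it at full size.
        while deduped_gb + batch > disk_gb:
            disk_gb += grow_step_gb   # the manual "enlarge the fixed VHDX" step

        seeded_gb += batch
        deduped_gb += batch * keep_pct / 100   # dedup pass over the new batch

    return disk_gb


# Example: a 250 GB share that dedups to 60% of its size, seeded 25 GB at a time
# onto a fixed disk that starts at 100 GB and is enlarged in 25 GB steps.
target_gb = size_fixed_vhdx(250, 60, 100, 25, 25)
print(f"Fixed VHDX size to use for the other peers: {target_gb} GB")
```

The point of the sketch is that the final size is driven by the deduped data plus the largest un-deduped batch in flight, which is why running dedup passes frequently during the seed keeps the target size down.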