Rogue VM Sprawl? Really?
I keep hearing about the impending doom of (specifically) rogue VM sprawl — our infrastructure overrun with the unchecked proliferation of virtual machines running amok across our enterprises. Oh the horror!
Most of the examples point to the consolidation of server VMs onto hosts, as delivered by virtualization.
I have to ask you though, given what it takes to spin up a VM on a platform such as VMware, how can you have a "rogue" VM sprawling its way across your enterprise!?
Someone — an authorized administrator — had to have loaded it into inventory, configured its placement on a virtual switch, and spun it up via VirtualCenter or some facsimile thereof depending upon platform.
That's the definition of a rogue? I can see where this may be a definitional issue, but the marketeers are getting frothed up over this very issue, whispering in your ear constantly about the impending demise of your infrastructure…and undetectable hypervisor rootkits, too. 🙂
It may be that the ease with which a VM *can* be spun up legitimately leads to the overly exuberant deployment of VMs without understanding the impact this might have on the infrastructure, but can we please stop grouping stupidity and poor capacity planning/impact analysis with rogueness? They're two different things.
If administrators are firing off VMs that are unauthorized, unhardened, and unaccounted for, you have bigger problems than that of virtualization and you ought to consider firing them off.
The inventory of active VMs is a reasonably easy thing to keep track of; if it's running, I can see it.
I know "where" it is and I can turn it off. To me, the bigger problem is represented by the offline VMs which can live outside that inventory window, just waiting to be reactivated from their hypervisorial hibernation.
But does that represent "rogue?"
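For what it's worth, those offline VMs aren't hard to hunt down if you go looking. Here's a minimal sketch of the idea, assuming you can export the list of registered VM config files from your management console and separately walk your datastores for `.vmx` files. The function name and the sample paths are made up for illustration; how you actually gather either list depends on your platform.

```python
from pathlib import PurePosixPath


def unregistered_vm_configs(registered_vmx_paths, discovered_files):
    """Return .vmx files found on storage that are not in the registered inventory.

    Both arguments are plain iterables of path strings. How they are
    collected (console export, datastore walk) is left as an assumption.
    """
    registered = {str(PurePosixPath(p)) for p in registered_vmx_paths}
    orphans = set()
    for name in discovered_files:
        path = PurePosixPath(name)
        # Only VM configuration files matter here; skip disks, logs, etc.
        if path.suffix.lower() != ".vmx":
            continue
        if str(path) not in registered:
            orphans.add(str(path))
    return sorted(orphans)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real environment.
    inventory = ["/vmfs/volumes/ds1/web01/web01.vmx"]
    on_disk = [
        "/vmfs/volumes/ds1/web01/web01.vmx",
        "/vmfs/volumes/ds1/web01/web01.vmdk",
        "/vmfs/volumes/ds1/old-dhcp/old-dhcp.vmx",
    ]
    for orphan in unregistered_vm_configs(inventory, on_disk):
        print("not in inventory:", orphan)
```

Anything this turns up is a VM that exists on storage but lives outside the inventory window, which is exactly the population worth a second look.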
You want an example of a threat which represents truly rogue VM "sprawl" that people ought to be afraid of? OK, here's one, and it happened to me. I talk about it all the time and people usually say "Oh, man, I never thought of that…", mostly because we're focused on server virtualization and not the client side.
We take distributed sniffer traces, track back through firewall, IDS, and IPS logs, and isolate the MAC address in the CAM tables of the 96-port switch to which the offending DHCP server appears to be plugged in, although we can't ping it.
My analyst is now on a mission to unplug the port, so he undocks his laptop and the alarms silence.
I look over at him. He has a strange look on his face. He docks his laptop again. Seconds later the alarms go off again.
The Culprit: Turns out said analyst was doing research at home on our W2K AD/DHCP server hardening scripts. He took our standard W2K server image, loaded it as a VM in VMware Workstation and used it at home to validate functionality.
The image he used had AD/DHCP services enabled.
When he was done at home the night before, he minimized VMware and closed his laptop.
When he came in to work the next morning, he simply docked and went about reading email, forgetting the VMW instance was still running. Doing what it does, it started responding to DHCP requests on the network.
Because he was using shared IP addresses for his VM and was "behind" the personal firewall on his machine, which by policy prohibits ICMP requests (but obviously not bootp/DHCP), we couldn't ping it or profile the workstation…
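If you'd rather catch this sort of thing before the alarms do, the check itself is simple. Below is a rough sketch, assuming you already have a feed of observed DHCP offers as (server MAC, server IP) pairs from your sniffers and an allowlist of sanctioned DHCP server MACs. The function names and the sample addresses are invented for illustration. The OUI prefixes are ones registered to VMware, but that hint only helps when the VM's own MAC is visible on the wire, which may not hold in every VM networking mode.

```python
# MAC prefixes (OUIs) registered to VMware; used only as a hint.
VMWARE_OUIS = {"00:05:69", "00:0c:29", "00:50:56"}


def normalize_mac(mac):
    """Lowercase a MAC address and use ':' as the separator."""
    return mac.strip().lower().replace("-", ":")


def find_rogue_dhcp_servers(offers, authorized_macs):
    """Return one finding per unauthorized DHCP server seen in `offers`.

    `offers` is an iterable of (server_mac, server_ip) pairs. Where they
    come from (sniffer export, IDS log) is an assumption of this sketch.
    """
    authorized = {normalize_mac(m) for m in authorized_macs}
    seen = set()
    findings = []
    for server_mac, server_ip in offers:
        mac = normalize_mac(server_mac)
        if mac in authorized or mac in seen:
            continue  # sanctioned server, or already reported
        seen.add(mac)
        findings.append({
            "mac": mac,
            "ip": server_ip,
            "vmware_oui": mac[:8] in VMWARE_OUIS,
        })
    return findings


if __name__ == "__main__":
    # Hypothetical observations, not data from the incident above.
    observed = [
        ("00:11:22:33:44:55", "10.0.0.2"),
        ("00-0C-29-AB-CD-EF", "192.168.1.1"),
        ("00:0c:29:ab:cd:ef", "192.168.1.1"),
    ]
    for finding in find_rogue_dhcp_servers(observed, ["00:11:22:33:44:55"]):
        print(finding)
```

An unsanctioned DHCP server is worth chasing regardless of the OUI flag; the flag just tells you whether to start by asking who has a VM minimized on their laptop.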
Now, that's a rogue VM. An accidental rogue VM. Imagine if it were an attacker. Perhaps he/she was a legitimate user but disgruntled. Perhaps he/she decided to use wireless instead of wired. How much fun would that be?
Stop with the "rogue (server) VM" FUD, wouldya?