
Virtual Routing – The Anti-Matter of Network SECURITY…

December 16th, 2008

Here's a nod to Rich Miller, who pointed me (route the node, not the packet) to a blog entry from Andreas Antonopoulos titled "Virtual Routing – The anti-matter of network routing."

The premise, as brought up by Doug Gourlay from Cisco at the C-Scape conference, was seemingly innocuous but quite cool:

"How about using netflow information to re-balance servers in a data center"

Routing: Controlling the flow of network traffic to an optimal path between two nodes

Virtual-Routing or Anti-Routing: VMotioning nodes (servers) to optimize the flow of traffic on the network.

Using netflow information, identify those nodes (virtual servers) that have the highest traffic "affinity" from a volume perspective (or some other desired metric, like desired latency, etc.) and move (VMotion, XenMotion) the nodes around to re-balance the network. For example, bring the virtual servers exchanging the most traffic to hosts on the same switch or even to the same host to minimize traffic crossing multiple switches. Create a whole-data-center mapping of traffic flows, solve for least switch hops per flow and re-map all the servers in the data center to optimize network traffic.
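
As a thought experiment, the rebalancing described above can be sketched as a toy greedy placement pass. Everything here — VM names, traffic volumes, host capacities — is invented for illustration; a real netflow-driven optimizer would be far more involved:

```python
# Toy sketch (all names and numbers hypothetical): greedily co-locate the
# VM pairs that exchange the most traffic, standing in for the
# netflow-driven re-balancing described above.

# Observed inter-VM traffic volume in bytes, e.g. derived from flow records.
traffic = {
    ("web1", "db1"): 9_000_000,
    ("web2", "db1"): 4_000_000,
    ("web1", "app1"): 1_500_000,
}

hosts = {"hostA": 2, "hostB": 2}  # host -> remaining VM slots

def rebalance(traffic, hosts):
    """Place the chattiest VM pairs on the same host first."""
    placement = {}
    capacity = dict(hosts)
    # Visit flows in descending order of volume.
    for (a, b), _vol in sorted(traffic.items(), key=lambda kv: -kv[1]):
        for vm in (a, b):
            if vm in placement:
                continue
            # Prefer the host already holding the peer, if it has room.
            peer = b if vm == a else a
            peer_host = placement.get(peer)
            if peer_host and capacity[peer_host] > 0:
                target = peer_host
            else:
                target = max(capacity, key=capacity.get)
            placement[vm] = target
            capacity[target] -= 1
    return placement

print(rebalance(traffic, hosts))
# web1 and db1 (the chattiest pair) land on the same host.
```

A real solver would minimize switch hops across the whole flow matrix rather than greedily pairing, but the shape of the decision is the same.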

My first reaction was, yup, that makes a lot of sense from a network point of view, and given who made the comment, it does make sense. Then I choked on my own tongue as the security weenie in me started in on the throttling process, reminding me that while this is fantastic from an autonomics perspective, it's missing some serious input variables.

Latency of the "network" and VM spin-up aside, the dirty little secret is that what's being described here is a realistic and necessary component of real-time (or adaptive) infrastructure. We ultimately need to get to the point where, within context, we have the ability to do this, but I want to remind folks that availability is only one leg of the stool. We've got the other nasty bits to concern ourselves with, too.

Let's look at this from two perspectives: the network plumber's and the security wonk's.

From the network plumber's purview, this sounds like an awesome idea: do what is difficult in non-virtualized environments and dynamically adjust and reallocate the "location" of an asset (and thus flows to/from it) in the network based upon traffic patterns and arbitrary metrics. Basically, optimize the network for the lowest latency and best performance or availability by moving VM's around and re-allocating them across the virtual switch fabric (née DVS) rather than adjusting how the traffic gets to static nodes.

It's a role reversal: the nodes become dynamic and the network becomes more static and compartmentalized.  Funny, huh?

The security wonk is unavailable for comment. He's just suffered a coronary event. Segmented network architectures, built upon business policy, security, compliance and risk tolerances, and in which assets are grouped by criticality, role or function as an expression of (gulp) compliance, make it very difficult to perform this level of automation via service governors today.

Again, the concept works great in a flat network where asset grouping is, for the most part, irrelevant (hopefully governed by a policy asserting as much) and where what you're talking about is balancing compute with network and storage. But the moment you introduce security, compliance and risk management as factors in the decision fabric, things get very, very difficult.

Now, if you're Cisco and VMware, the models for how security engines apply policy consistently across these fluid virtualized networks are starting to take shape. What we're missing is the set of compacts or contracts that consistently defines and enforces these policies no matter where workloads move (and controls *if* they can move), and a way to factor those requirements into the governance layer.
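
To make the idea of a "compact" concrete, here's a minimal sketch of a move-time policy check. Every zone name and policy field below is hypothetical; nothing here reflects an actual Cisco or VMware mechanism — it only illustrates the shape of a contract that travels with the VM and gets evaluated before any VMotion:

```python
# Minimal sketch (all names hypothetical): a per-VM policy "compact"
# evaluated by the governance layer before a move is permitted.
POLICY = {
    "db1":  {"may_move": True,  "allowed_zones": {"pci"}},
    "web1": {"may_move": True,  "allowed_zones": {"dmz"}},
    "hr1":  {"may_move": False, "allowed_zones": set()},  # pinned in place
}

HOST_ZONE = {"hostA": "pci", "hostB": "dmz", "hostC": "internal"}

def may_vmotion(vm, target_host):
    """Return True only if the VM's compact permits landing on target_host."""
    p = POLICY[vm]
    return p["may_move"] and HOST_ZONE[target_host] in p["allowed_zones"]

assert may_vmotion("db1", "hostA")       # PCI VM staying in the PCI zone
assert not may_vmotion("db1", "hostB")   # PCI VM crossing into the DMZ: denied
assert not may_vmotion("hr1", "hostC")   # pinned VM may not move at all
```

The hard part isn't the check itself; it's getting every mobility engine in the data center to consult the same contract before it acts.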

The standardization of governance approaches, even at the network layer, is lacking. There are lots of discrete tools available, but the level of integration is incomplete, as are the input streams and output telemetry.

If you take a look, as an example, at CIRBA's exceptional transformational analytics and capacity management solution, replete with their multi-dimensional array of business process, technical infrastructure and resource mapping, they have no input for risk assessment data, compliance or "security" as variables.

When you look at the utility brought forward by the dynamic, agile and flexible capabilities of virtualized infrastructure, it's hard not to extrapolate all the fantastic things we could do. 

Unfortunately, the crushing weight of what happens when we introduce security, compliance and risk management to the dance means we have a more sobering discussion about those realities.

Here's an example reduced to the ridiculous: we have an interesting time architecting networks to maximize throughput, reduce latency and maximize resilience in the face of what can happen with convergence issues and flapping when we have a "routing" problem.

Can you imagine what might happen when you start bouncing VM's around the network to maximize efficiency while disassociated security policy violations simultaneously make unavailable the very resources whose availability we seek to maximize? Fun, eh?
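
A toy model of that instability, with invented hosts and rules: two independent balancers, one packing by traffic affinity and one spreading by CPU, fight over the same VM and never converge:

```python
# Toy thrash model (hypothetical names): a network balancer keeps pulling
# two chatty VMs together while a CPU balancer keeps pushing them apart.

def network_balancer(placement):
    """Co-locate the chatty pair vm1/vm2 to minimize switch hops."""
    if placement["vm1"] != placement["vm2"]:
        placement["vm2"] = placement["vm1"]
        return True  # moved something
    return False

def cpu_balancer(placement):
    """Split heavy VMs across hosts to balance CPU load."""
    if placement["vm1"] == placement["vm2"]:
        placement["vm2"] = "hostB" if placement["vm1"] == "hostA" else "hostA"
        return True
    return False

placement = {"vm1": "hostA", "vm2": "hostB"}
moves = 0
for _ in range(10):  # ten scheduling rounds
    moves += network_balancer(placement)
    moves += cpu_balancer(placement)
print(moves)  # → 20: vm2 bounces every round; the system never settles
```

With no shared objective (and no security input at all), each controller undoes the other's work forever — exactly the "maestro's touch" problem, before you even add policy to the mix.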

While we're witnessing a phase shift in how we design and model our networks to support more dynamic resources and more templated networks, we can't continue to mention the benefits and simply assume we'll catch up on the magical policy side later.

So for me, Virtual Routing is the anti-matter of network SECURITY, not network routing…or maybe more succinctly, perhaps security doesn't matter at all?

/Hoff

Categories: Cisco, Virtualization, VMware
  1. December 17th, 2008 at 02:01 | #1

    Well, moving the resource doesn't mean it's unavailable for more than a few milliseconds, but you already know that.
    I think it's an interesting idea overall, and I think the security ramifications can be addressed through some sort of constraint language. Whether the implementation is worth it is another question.
    There are several examples of where self-optimising systems come up with solutions that baffle and surprise the creators of said systems. Especially in _really_ large systems.

  2. Andreas Antonopoulos
    December 17th, 2008 at 07:25 | #2

    This is exactly the kind of discussion we need to have about the possibility of policy based mass-moves that drive topology change. Security is affected, but could also be driving the policy for where nodes can go and where they can't. This is definitely a thorny issue, but it seems like a natural extrapolation of developments in virtualization. The plot thickens. The security pros remain employed…

  3. December 17th, 2008 at 09:49 | #3

    Hoff,
    Insightful post. I agree that, in the abstract, whizzing nodes around the network is the sort of thing that should give security people pause, at least when said node whizzes past a security zone boundary. In practice, though, most enterprise networks are flatland. They have very few security zones other than the DMZ plus a big squishy one for everything else.
    So, in most cases, the simple rule would be this: thou shalt not whizz thine VMs between the DMZ and the corporate network. Failure to do so shall cause the offending admin to be slapped across the face with a pickled herring.
    But this seems like common sense, right? Am I missing something?

  4. December 17th, 2008 at 10:21 | #4

    Re-posting my reply from Twitter to AJ:
    I think the only thing that's missing is that virtualization (internally and on the DMZ) is a forcing function for segmentation.
    Over the last 6 months, internal network segmentation has (informally, via conversational queries by me) shown an increase of orders of magnitude, driven primarily by non-network/security folks (the sysadmins) creating VLAN/vSwitch partitioning to contain VM's in zones for management.
    The SysAdmins are actually better equipped and in some cases more motivated to create this partitioning/zoning that gets us closer to a point that the bureaucracy of siloed functions has heretofore made difficult; because the SysAdmins "own" the (now) private virtual networks, they can do what they like. Sometimes this is a good thing, sometimes it is not.
    You'll note that the Cisco/VMware VN-Link clawback of the access layer by reclaiming the virtual switch with a Cisco version substitutes one challenge for another in this case…
    So while the DMZ argument is obvious, the non-DMZ segmentation coupled with RTI/internal cloud architectures are accelerating this.
    /Hoff

  5. December 18th, 2008 at 08:15 | #5

    Hoff:
    The implicit shift in this scenario to keep the security wonk out of the cardiac care unit is to decouple the implementation of security policy from the infrastructure needed to deliver it. So, if you go back to Doug's scenario, what if you could define an infrastructure security policy and have that policy follow an app/VM around the data center?
    We have done this to some degree with VN-Link and the Nexus 1000V–we can define a port policy for a VM that includes things like ACLs, private VLAN policy, Cisco TrustSec policy, and the like and have that policy follow the VM around a VI cluster regardless of where it ends up.
    Now, this is only a first step and Doug's example requires a much more sophisticated and encompassing implementation of this concept, but I think the overall approach is feasible. But that's just Omar the Plumber talking… :)

  6. December 18th, 2008 at 09:32 | #6

    Interesting. I used to work at NetScreen and we developed an interesting technology we called "Dynamic Routed VPNs." (I'm not in the network security world anymore so there isn't a business angle here).
    Moving on.
    A similar concept could apply here.
    First, an overly-simplified explanation of how the technology worked:
    1. Create a tunnel interface (think of it as a special loopback) that has associated IPsec parameters and keys.
    2. "Bind" a routing daemon to the tunnel interface and advertise routes.
    3. Send packets and they traverse the VPN via whatever route the routing tables tell it. You get dynamic failover between peers, etc.
    The interesting part of this is that you can "segment" your possible VPN tunnels since IPsec will only peer with other nodes that have the correct key information.
    For example, you can have a dynamic DMZ full-mesh VPN network that would automatically route to the "best" peer for traffic. That traffic could never be routed through an INTERNAL zone since the IPsec peers could never form a tunnel.
    Anyway, food for thought on how to create a dynamic infrastructure while maintaining some security (in fact, you could even ensure that only valid clients were allowed to use a specific tunnel using IPsec).

  7. December 18th, 2008 at 10:04 | #7

    Hello /hoff,
    First of all, we can attach a security template and policy to the VM and follow the move.
    Second, the automated (no-downtime) mobility feature requires that the VM stay within the same VLAN/Portgroup at all times. This usually means that network segmentation will remain intact. Additionally, we support a finer grain trust zone overlay to provide ACL and even packet level segmentation between virtual machines on the same virtualized network. So I don't see a problem at the network access layer at all.
    Third, I think you're right from a DOS/availability point of view as this behavior will place two or more critical machines on a single host. Or worse yet create a mobility thrash effect as the CPU load balancer moves machines apart and the network balancer keeps moving them back together — I've seen this happen in the lab already and it tends to require a VM maestro's touch to set the VM<->Host affinities right.
    Michael

  8. John Blessing
    December 18th, 2008 at 10:41 | #8

    Interesting point Andrew,
    'So, in most cases, the simple rule would be this: thou shalt not whizz thine VMs between the DMZ and the corporate network. Failure to do so shall cause the offending admin to be slapped across the face with a pickled herring.'
    What I find in my enterprise is that the original admin who starts these configurations which cause the VM's to 'whiz' around is not available to ask the question "why are all my VM's whizzing around?" because I was forced to do some administrative function (upgrade a critical LB server) that I wouldn't normally do. When the "freak-out" stage happens there isn't any traceability to who did it, because machines did it, because the machines read a line of code or an algorithm telling them to do it. The actual 'compliance' trail will be cold and we will have all these pickled herrings and no one to hit.

  9. colin
    December 18th, 2008 at 16:11 | #9

    In most non-virtualised environments, machines are classified based on the data that they contain. Virtual machines should not be classified any differently. Moving a VM from a particular classified network to another classification (either the same classification or higher, never lower) is not something that I can see being an issue for me. With the courting of the virtual environment by Cisco (the Nexus V switches), application of appropriate security policy based on either ACLs or load balanced firewalls should allow for this particular architecture to be achieved without the security angst that I can see here.
    Setting affinity and working out the algorithm that would force a VMotion would be the hard part..
    -colin.

  10. Walt
    December 24th, 2008 at 08:05 | #10

    This has been the thorn in the side for shops running i5/OS for years. Well, that and the fact that the TCP/IP stack for i5/OS can't support HTTP under load.
