Posts Tagged ‘Cloud Networking’

Dear Public Cloud Providers: Please Make Your Networking Capabilities Suck Less. Kthxbye

December 4th, 2009 6 comments

sucklessThere are lots of great discussions these days about how infrastructure and networking need to become more dynamic and intelligent in order to more fully enable the mobility and automation promised by both virtualization and cloud computing.  There are many examples of how that’s taking place in the enterprise.

Incumbent networking vendors and emerging cloud/network startups are coming to terms with the impact of virtualization and cloud as juxtaposed with that of (and you’ll excuse the term) “pure” cloud vendors and those more traditional (Inter)networking service providers who have begun to roll out Cloud services atop or alongside their existing portfolio of offerings.

  • On the one hand we see hardware-based networking vendors adding software-based virtual switching and virtual appliance extensions in order to claw back the networking and security functions which have been abstracted into the virtualization and cloud stacks.  This is a big deal in the enterprise and especially with vendors looking to stake a claim in the private cloud space which is the evolution of traditional datacenter capabilities extended with virtualization and leverages the attributes of Cloud to provide for a more frictionless computing experience.  Here is where we see innovation and evolution with the likes of converged data and storage networking and unified fabric solutions.

  • On the other hand we see massively-scaled public cloud providers and evolving (Inter)networking service providers who have essentially absorbed the networking layers into their cloud operating platforms and rely on the software functionality embedded within to manifest the connectivity required to enable service.  There is certainly networking hardware sitting beneath these offerings, but depending upon their provenance, there are remarkable differences in the capabilities and requirements between them and those mentioned above.  Mostly, these providers are really shouting for multi-terabit layer two switching fabric interconnects to which they interface their software-enabled compute platforms.  The secret sauce is primarily in software.

For the purpose of this post, I’m not going to focus on the private Cloud camp and enterprise cloud plays, or those “Cloud” providers who replicate the same architectures to serve these customers, rather, I want to focus on those service providers/Cloud providers who offer massively scalable Infrastructure and Platform-as-a-Service offerings as in the second example above and highlight two really important points:

  1. From a physical networking perspective, most of these providers rely, in some large part, on giant, flat, layer two physical networks with the actual “intelligence,” segmentation, isolation and logical connectivity provided by the hypervisor and their orchestration/provisioning/automation layers.
  2. Most of the networking implementations in these environments are seriously retarded as it relates to providing flexible and extensible networking topologies which make for n-Tier application mapping nightmares for an enterprise looking to move a reasonable application stack to their service.

I’ve been experimenting with taking several reasonably basic n-Tier app stacks which require mutiple levels of security, load balancing and message bus capabilities and design them using several cloud platform providers offerings today.

The dirty little secret is that there are massive trade-offs with each of them, mostly due to constraints related to the very basic networking and security functionality offered by the hypervisors that power their services today.  The networking is basic.  Just the way they like it. It sucks for me.

This is a problem I demonstrated in enterprise virtualization in my Four Horsemen of the Virtualization Apocalypse presentation two years ago.  It’s much, much worse in Cloud.

Not supporting multiple virtual interfaces, not supporting multiple IP addresses per instance/VM, not supporting multicast or broadcast capabilities for software-based load balancing (and resiliency of the LB engines themselves)…these are nasty issues that in many cases require wholesale re-engineering of app stacks and push things like resiliency and high availability into uncertain waters.

It’s also going to cost me more.

Sure, there are ways of engineering around these inadequacies, but they require additional levels of complexity, more cost, additional providers or instances and still leave me without many introspection options and detective and preventative security controls that I’m used to being able to rely on in traditional networking environments using colocation services or natively within the enterprise.

I’m sure I’ll see comments (public and private) suggesting all sorts of reasons why these are non-issues and how it’s silly to try and replicate the enterprise approach in the cloud.  I have 500 reasons why they’re wrong…the Fortune 500, that is.  You should also know I’m not apologizing for the sorry state of non-dynamic infrastructure, but I am suggesting that forcing me to re-tool app stacks to fit your flat network topologies without giving me better security and flexible connectivity options simply sucks.

In may cases, people just can’t get there from here.

I don’t want to have to re-architect my app stacks to work in the cloud simply because of a lack of maturity from a networking perspective.  I shouldn’t have to. That’s simply backward.  If the power of Cloud is its ability to quickly, flexibly, and easily allow me to provision, orchestrate and deploy services, that must include the network, also!

The networking and security capabilities of  public Cloud providers needs to improve — and quickly.  Applications that are not network topology-dependent and only require a single interface (or more specifically an IP address/socket) to communicate aren’t the problem.  It’s when you need to integrate applications and/or infrastructure solutions that require multiple interfaces, that *are* topology dependent and require insertion between these monolithic applications that things break down. Badly.

The “app on a stick” model doesn’t work when enterprises struggle with taking isolated clusters of applications (tiers) and isolate/protect them with physical or virtual appliances that require multiple interfaces to do so.  ACL’s don’t cut it, not when I need FW, IPS, DLP, WAF, etc. functionality.  Let’s not forget dedicated management, storage or backup interfaces.  These are many of the differences between public and private cloud offerings.

I can’t do many of the things I need to do easily in the Cloud today, not without serious trade-offs that incur substantial cost and given the immaturity of the market as a whole put me at risk.

For the large enterprise, if the fundamental networking and security architectures don’t allow for easy portability that does not require massive re-engineering of app stacks, these enterprises are going to turn to niche or evolving (Inter)networking providers who offer them the capability to do so, even if they’re not as massively scaleable, or they’ll simply build private clouds instead.


Great InformationWeek/Dark Reading/Black Hat Cloud & Virtualization Security Virtual Panel on 12/9

December 3rd, 2009 No comments


I wanted to let you know about about a cool virtual panel I’m moderating as part of the InformationWeek/Dark Reading/Black Hat virtual event titled “IT Security: The Next Decade” on December 9th.

There are numerous awesome speakers throughout the day, but the panel I’m moderating is especially interesting to me because I was able to get an amazing set of people to participate.  Here’s the rundown — check out the panelists:

Virtualization, Cloud Computing, And Next-Generation Security

The concept of cloud computing creates new challenges for security, because sensitive data may no longer reside on dedicated hardware.  How can enterprises protect their most sensitive data in the rapidly-evolving world of shared computing resources? In this panel, Black Hat researchers who have found vulnerabilities in the cloud and software-as-a-service models meet other experts on virtualization and cloud computing to discuss the question of cloud computing’s impact on security and the steps that will be required to protect data in cloud environments.

Panelists: Glenn Brunette, Distinguished Engineer and Chief Security Architect, Sun Microsystems; Edward Haletky, Virtualization Security Expert; Chris Wolf, Virtualization Analyst, Burton Group; Jon Oberheide, Security Researcher; Craig Balding, Cloud Security Expert,

Moderator: Christofer Hoff, Contributing Editor, Black Hat

I wanted the perspective of architects/engineers, practitioners, researchers and analysts — and I couldn’t have asked for a better group.

Our session is 6:15pm – 7:00 EST.

Hope to “see” you there.


The Cloud & eHarmony’s 29 Dimensions Of Compatability…

November 23rd, 2009 7 comments

I speak to many customers — large companies in numerous verticals and service providers —  who are for the reasons we are all very well aware of, engaging in projects large and small focused on Cloud adoption.

On the enterprise side, the dialog almost inevitably goes like this:

We’re working on taking applications and data which are not heavily regulated/compliance scoped, business critical or contain sensitive information and move them to a public cloud provider like AWS — we’re also considering virtual private clouds to use public cloud infrastructure in private ways.

We’ve had great success with low-hanging fruit and grid-like utility offerings, but we’re having a bear of a time with real “applications” — taking them as they run today internally and making them run the same way on someone elses’ kit.  It’s not always the application, either, but rather the attendant dependencies on other critical IT-centric functions that cause the issues (Ed: “metastructure”)

In parallel we’re engaging in building private clouds for critical applications that either have complex development and support/integration issues that are not ready for running on others’ infrastructure and/or have compliance and regulatory requirements that prevent us from moving them off our infrastructure.

We’re continuing to invest and optimize our internal virtualization deployments; we’re reducing footprint but really increasing compute, network and storage density.  Don’t let the smaller physical space fool you, we’re getting bigger in more efficient floor plans.  We’ve standardized on VMware. We’re figuring out how vSphere and vCloud intersect and what that means in the long term and how that impacts our choice of Cloud providers.

We understand that using the same vendor we use for virtualization to ultimately deliver our private cloud should yield easier portability and workload interoperability, but we’re worried about vendor lock-in…sort of.

We’d really like to be able to move workloads/applications/information in and out of private clouds to public/virtual public offerings and support workloads/applications/information that were born in the cloud on our private cloud, too.  These present a whole host of security and lifecycle management issues.

In the long term, what we want to do is build a self-service portal (not unlike that depending upon business logic and security/compliance requirements, etc. will allow a business constituent consumer to deploy packaged or bespoke workloads/application/information and not have to care about where it runs.

That would be nice.  We’d like to be able to do that with the thousands of applications we already support today.

We’re investigating cloud brokers currently, but most don’t do what they advertise they do or have severe limitations. While they often plug the gaps between the various cloud providers, we trade one vendor lock-in problem for another with custom orchestration and provisioning frameworks.  We’re trying to roll our own — cobbling together bits and pieces — but it’s an integration nightmare.

The lack of standard APIs and competing implementation semantics with immature sets of management, security, provisioning, orchestration and governance solutions really makes this all very, very difficult.

What should we do?

This story is the same over and over.

It’s literally the Cloud equivalent of’s 29 dimensions of compatibility; it’s such a multidimensional problem in large enterprises that have a huge number of applications (thousands) and a ton of sunk infrastructure, mature decades-old operational practices, cultural dispositions, and economic pressures that it’s hard to figure out what to do.

For large enterprises (and the service providers who cater to them) Cloud is not a simple undertaking, at least not to those who have to deal with bridging the gap between the “old world” and the new shiny bits glimmering off in the distance.

Consider that the next time you hear a story of cloud successes and scrutinize what that really means.


Incomplete Thought: The Cloud Software vs. Hardware Value Battle & Why AWS Is Really A Grid…

October 18th, 2009 2 comments

Some suggest in discussing the role and long-term sustainable value of infrastructure versus software in cloud that software will marginalize bespoke infrastructure and the latter will simply commoditize.

I find that an interesting assertion, given that it tends to ignore the realities that both hardware and software ultimately suffer from a case of Moore’s Law — from speeds and feeds to the multi-core crisis, this will continue ad infinitum.  It’s important to keep that perspective.

In discussing this, proponents of software domination almost exclusively highlight Amazon Web Services as their lighthouse illustration.  For the purpose of simplicity, let’s focus on compute infrastructure.

Here’s why pointing to Amazon Web Services (AWS) as representative of all cloud offerings in general to anchor the hardware versus software value debate is not a reasonable assertion:

  1. AWS delivers a well-defined set of services designed to be deployed without modification across a massive number of customers; leveraging a common set of standardized capabilities across these customers differentiates the service and enables low cost
  2. AWS enjoys general non-variability in workload from *their* perspective since they offer fixed increments of compute and memory allocation per unit measure of exposed abstracted and virtualized infrastructure resources, so there’s a ceiling on what workloads per unit measure can do. It’s predictable.
  3. From AWS’ perspective (the lens of the provider) regardless of the “custom stuff” running within these fixed-sized containers, the main focus of their core “cloud” infrastructure actually functions like a grid — performing what amounts to a few tasks on a finely-tuned platform to deliver such
  4. This yields the ability for homogeneity in infrastructure and a focus on standardized and globalized power efficient, low cost, and easy-to-replicate components since the problem of expansion beyond a single unit measure of maximal workload capacity is simply a function of scaling out to more of them (or stepping up to one of the next few rungs on the scale-up ladder)

Yup, I just said that AWS is actually a grid whose derivative output is a set of cloud services.

Why does this matter?  Because not all IaaS cloud providers are architected to achieve this — by design — and this has dramatic impact on where hardware and software, leveraged independently or as a total solution, play in the debate.

This is because AWS built and own the entire “CloudOS” stack from customized hardware through to the VMM, management and security planes (much as Google does the same) versus other providers who use what amounts to more generic software offerings from the likes of VMware and lean on API’s and an ecosystem to extend it’s capabilities as well as big iron to power it.  This will yield more customizable offerings that likely won’t scale as highly as AWS.

That’s because they’re not “grids” and were never designed to be.

Many other IaaS providers that have evolved from hosting are building their next-generation offerings from unified fabric and unified computing platforms (so-called “big iron”) which are the furtherest thing from “commodity” hardware you can get.  Further, SaaS and PaaS providers generally tend to do the same based on design goals and business models.  Remember, IaaS is not representative of all things cloud — it’s only one of the service models.

Comparing AWS to most other IaaS cloud providers is a false argument upon which to anchor the hardware versus software debate.


Amazon Web Services: It’s Not The Size Of the Ship, But Rather The Motion Of the…

October 16th, 2009 3 comments
From Hoff's Preso: Cloudifornication - Indiscriminate Information Intercourse Involving Internet Infrastructure

From Hoff's Preso: Cloudifornication - Indiscriminate Information Intercourse Involving Internet Infrastructure

Carl Brooks (@eekygeeky) gets some fantastic, thought-provoking interviews.  His recent article wherein he interviewed Peter DeSantis, VP of EC2, Amazon Web Services, was titled: “Amazon would like to remind you where the hype started” is another great example.

However, this article left a bad taste in my mouth and ultimately invites more questions than it answers. Frankly I felt like there was a large amount of hand-waving in DeSantis’ points that glossed over some very important issues related to security issues of late.

DeSantis’ remarks implied, per the title of the article, that to explain the poor handling and continuing lack of AWS’ transparency related to the issues people like me raise,  the customer is to blame due to hype and overly aggressive, misaligned expectations.

In short, it’s not AWS’ fault they’re so awesome, it’s ours.  However, please don’t remind them they said that when they don’t live up to the hype they help perpetuate.

You can read more about that here “Transparency: I Do Not Think That Means What You Think That Means…

I’m going to skip around the article because I do agree with Peter DeSantis on the points he made about the value proposition of AWS which ultimately appear at the end of the article:

“A customer can come into EC2 today and if they have a website that’s designed in a way that’s horizontally scalable, they can run that thing on a single instance; they can use [CloudWatch[] to monitor the various resource constraints and the performance of their site overall; they can use that data with our autoscaling service to automatically scale the number of hosts up or down based on demand so they don’t have to run those things 24/7; they can use our Elastic Load Balancer service to scale the traffic coming into their service and only deliver valid requests.”

“All of which can be done self-service, without talking to anybody, without provisioning large amounts of capacity, without committing to large bandwidth contracts, without reserving large amounts of space in a co-lo facility and to me, that’s a tremendously compelling story over what could be done a couple years ago.”

Completely fair.  Excellent way of communicating the AWS value proposition.  I totally agree.  Let’s keep this definitional firmly in mind as we go on.

Here’s where the story turns into something like a confessional that implies AWS is sadly a victim of their own success:

DeSantis said that the reason that stories like the DDOS on (and the non-cloud Sidekick story) is because people have come to expect always-on, easily consumable services.

“People’s expectations have been raised in terms of what they can do with something like EC2. I think people rightfully look at the potential of an environment like this and see the tools, the multi- availability zone, the large inbound transit, the ability to scale out and up and fundamentally assume things should be better. “ he said.

That’s absolutely true. We look at what you offer (and how you offered/described it above) and we set our expectations accordingly.

We do assume that things should be better as that’s how AWS has consistently marketed the service.

You can’t reasonably expect to bitch about people’s perception of the service based on how it’s “sold” and then turn around when something negative happens and suggest that it’s the consumers’ fault for setting their expectational compass with the course you set.

It *is* absolutely fair to suggest that there is no release from not using common sense, not applying good architectural logic to deployment of services on AWS, but it’s also disingenuous to expect much of the target market to whom you are selling understands the caveats here when so much is obfuscated by design.  I understand AWS doesn’t say they protect against every threat, but they also do not say they do not…until something happens where that becomes readily apparent 😉

When everything is great AWS doesn’t go around reminding people that bad things can happen, but when bad things happen it’s because of incorrectly-set expectations?

Here’s where the discussion turns to an interesting example —  the BitBucket DDoS issue.

For instance, DeSantis said it would be trivial to wash out standard DDOS attacks by using clustered server instances in different availability zones.

Okay, but four things come to mind:

  1. Why did it take 15 hours for AWS to recognize the DDoS in the first place? (They didn’t actually “detect” it, the customer did)
  2. Why did the “vulnerability” continue to exist for days afterward?
  3. While using different availability zones makes sense, it’s been suggested that this DDoS attack was internal to EC2, not externally-generated
  4. While it *is* good practice and *does* make sense, “clustered server instances in different avail. zones, costs money

Keep those things in the back of your mind for a moment…

“One of the best defenses against any sort of unanticipated spike is simply having available bandwidth. We have a tremendous amount on inbound transit to each of our regions. We have multiple regions which are geographically distributed and connected to the internet in different ways. As a result of that it doesn’t really take too many instances (in terms of hits) to have a tremendous amount of availability – 2,3,4 instances can really start getting you up to where you can handle 2,3,4,5 Gigabytes per second. Twenty instances is a phenomenal amount of bandwidth transit for a customer.” he said.

So again, here’s where I take issue with this “bandwidth solves all” answer. The solution being proposed by DeSantis here is that a customer should be prepared to launch/scale multiple instances in response to a DoS/DDoS, in effect making it the customers’ problem instead of AWS detecting and squelching it in the first place?

Further, when you think of it, the trickle-down effect of DDoS is potentially good for AWS’ business. If they can absorb massive amounts of traffic, then the more instances you have to scale, the better for them given how they charge.  Also, per my point #3 above, it looks as though the attack was INTERNAL to EC2, so ingress transit bandwidth per region might not have done anything to help here.  It’s unclear to me whether this was a distributed DoS attack at all.

Lori MacVittie wrote a great post on this very thing titled “Putting a Price on Uptime” which basically asks who pays for the results of an attack like this:

A lack of ability in the cloud to distinguish illegitimate from legitimate requests could lead to unanticipated costs in the wake of an attack. How do you put a price on uptime and more importantly, who should pay for it?

This is exactly the point I was raising when I first spoke of Economic Denial Of Sustainability (EDoS) here.  All the things AWS speaks to as solutions cost more money…money which many customers based upon their expectations of AWS’ service, may be unprepared to spend.  They wouldn’t have much better options (if any) if they were hosting it somewhere else, but that’s hardly the point.

I quote back to something I tweeted earlier “The beauty of cloud and infinite scale is that you get the benefits of infinite FAIL”

The largest DDOS attacks now exceed 40Gbps. DeSantis wouldn’t say what AWS’s bandwidth ceiling was but indicated that a shrewd guesser could look at current bandwidth and hosting costs and what AWS made available, and make a good guess.

The tests done here showed the capability  to generate 650 Mbps from a single medium instance that attacked another instance which, per Radim Marek, was using another AWS account in another availability zone.  So if the “largest” DDoS attacks now exceed 40 Gbps” and five EC2 instances can handle 5Gb/s, I’d need 8 instances to absorb an attack of this scale (unknown if this represents a small or large instance.)  Seems simple, right?

Again, this about absorbing bandwidth against these attacks, not preventing them or defending against them.  This is about not only passing the buck by squeezing more of them out of you, the customer.

“ I don’t want to challenge anyone out there, but we are very, very large environment and I think there’s a lot of data out there that will help you make that case.” he said.

Of course you wish to challenge people, that’s the whole point of your arguments, Peter.

How much bandwidth AWS has is only one part of the issue here.  The other is AWS’ ability to respond to such attacks in reasonable timeframes and prevent them in the first place as part of the service.  That’s a huge part of what I expect from a cloud service.

So let’s do what DeSantis says and set our expectations accordingly.


Cloud Providers and Security “Edge” Services – Where’s The Beef?

September 30th, 2009 16 comments

usbhamburgerPreviously I wrote a post titled “Oh Great Security Spirit In the Cloud: Have You Seen My WAF, IPS, IDS, Firewall…” in which I described the challenges for enterprises moving applications and services to the Cloud while trying to ensure parity in compensating controls, some of which are either not available or suffer from the “virtual appliance” conundrum (see the Four Horsemen presentation on issues surrounding virtual appliances.)

Yesterday I had a lively discussion with Lori MacVittie about the notion of what she described as “edge” service placement of network-based WebApp firewalls in Cloud deployments.  I was curious about the notion of where the “edge” is in Cloud, but assuming it’s at the provider’s connection to the Internet as was suggested by Lori, this brought up the arguments in the post
above: how does one roll out compensating controls in Cloud?

The level of difficulty and need to integrate controls (or any “infrastructure” enhancement) definitely depends upon the Cloud delivery model (SaaS, PaaS, and IaaS) chosen and the business problem trying to be solved; SaaS offers the least amount of extensibility from the perspective of deploying controls (you don’t generally have any access to do so) whilst IaaS allows a lot of freedom at the guest level.  PaaS is somewhere in the middle.  None of the models are especially friendly to integrating network-based controls not otherwise supplied by the provider due to what should be pretty obvious reasons — the network is abstracted.

So here’s the rub, if MSSP’s/ISP’s/ASP’s-cum-Cloud operators want to woo mature enterprise customers to use their services, they are leaving money on the table and not fulfilling customer needs by failing to roll out complimentary security capabilities which lessen the compliance and security burdens of their prospective customers.

While many provide commoditized solutions such as anti-spam and anti-virus capabilities, more complex (but profoundly important) security services such as DLP (data loss/leakage prevention,) WAF, Intrusion Detection and Prevention (IDP,) XML Security, Application Delivery Controllers, VPN’s, etc. should also be considered for roadmaps by these suppliers.

Think about it, if the chief concern in Cloud environments is security around multi-tenancy and isolation, giving customers more comfort besides “trust us” has to be a good thing.  If I knew where and by whom my data is being accessed or used, I would feel more comfortable.

Yes, it’s difficult to do properly and in many cases means the Cloud provider has to make a substantial investment in delivery platforms and management/support integration to get there.  This is why niche players who target specific verticals (especially those heavily regulated) will ultimately have the upper hand in some of these scenarios – it’s not socialist security where “good enough” is spread around evenly.  Services like these need to be configurable (SELF-SERVICE!) by the consumer.

An example? How about Google: where’s DLP integrated into the messaging/apps platforms?  Amazon AWS: where’s IDP integrated into the VMM for introspection?

I wrote a couple of interesting posts about this (that may show up in the automated related posts lists below):

My customers in the Fortune 500 complain constantly that the biggest providers they are being pressured to consider for Cloud services aren’t listening to these requests — or aren’t in a position to respond.

That’s bad for everyone.

So how about it? Are services like DLP, IDP, WAF integrated into your Cloud providers’ offerings something you’d like to see rather than having to add additional providers as brokers and add complexity and cost back into Cloud?


The Emotion of VMotion…

September 29th, 2009 8 comments
VMotion - Here's Where We Are Today

VMotion - Here's Where We Are Today

A lot has been said about the wonders of workload VM portability.

Within the construct of virtualization, and especially VMware, an awful lot of time is spent on VM Mobility but as numerous polls and direct customer engagements have shown, the majority (50% and higher) do not use VMotion.  I talked about this in a post titled “The VM Mobility Myth:

…the capability to provide for integrated networking and virtualization coupled with governance and autonomics simply isn’t mature at this point. Most people are simply replicating existing zoned/perimertized non-virtualized network topologies in their consolidated virtualized environments and waiting for the platforms to catch up. We’re really still seeing the effects of what virtualization is doing to the classical core/distribution/access design methodology as it relates to how shackled much of this mobility is to critical components like DNS and IP addressing and layer 2 VLANs.  See Greg Ness and Lori Macvittie’s scribblings.

Furthermore, Workload distribution (Ed: today) is simply impractical for anything other than monolithic stacks because the virtualization platforms, the applications and the networks aren’t at a point where from a policy or intelligence perspective they can easily and reliably self-orchestrate.

That last point about “monolithic stacks” described what I talked about in my last post “Virtual Machines Are the Problem, Not the Solution” in which I bemoaned the bloat associated with VM’s and general purpose OS’s included within them and the fact that VMs continue to hinder the notion of being able to achieve true workload portability within the construct of how programmatically one might architect a distributed application using an SOA approach of loosely coupled services.

Combined with the VM bloat — which simply makes these “workloads” too large to practically move in real time — if one couples the annoying laws of physics and current constraints of virtualization driving the return to big, flat layer 2 network architecture — collapsing core/distribution/access designs and dissolving classical n-tier application architectures — one might argue that the proposition of VMotion really is a move backward, not forward, as it relates to true agility.

That’s a little contentious, but in discussions with customers and other Social Media venues, it’s important to think about other designs and options; the fact is that the Metastructure (as it pertains to supporting protocols/services such as DNS which are needed to support this “infrastructure 2.0”) still isn’t where it needs to be in regards to mobility and even with emerging solutions like long-distance VMotion between datacenters, we’re butting up against laws of physics (and costs of the associated bandwidth and infrastructure.)

While we do see advancements in network-driven policy stickiness with the development of elements such as distributed virtual switching, port profiles, software-based vSwitches and virtual appliances (most of which are good solutions in their own right,) this is a network-centric approach.  The policies really ought to be defined by the VM’s themselves (similar to SOA service contracts — see here) and enforced by the network, not the other way around.

Further, what isn’t talked about much is something that @joe_shonk brought up, which is that the SAN volumes/storage from which most of these virtual machines boot, upon which their data is stored and in some cases against which they are archived, don’t move, many times for the same reasons.  In many cases we’re waiting on the maturation of converged networking and advances in networked storage to deliver solutions to some of these challenges.

In the long term, the promise of mobility will be delivered by a split into three four camps which have overlapping and potentially competitive approaches depending upon who is doing the design:

  1. The quasi-realtime chunking approach of VMotion via the virtualization platform [virtualization architect,]
  2. Integration distribution and “mobility” at the application/OS layer [application architect,] or
  3. The more traditional network-based load balancing of traffic to replicated/distributed images [network architect.]
  4. Moving or redirecting pointers to large pools of storage where all the images/data(bases) live [Ed. forgot to include this from above]

Depending upon the need and capability of your application(s), virtualization/Cloud platform, and network infrastructure, you’ll likely need a mash-up of all three four.  This model really mimics the differences today in architectural approach between SaaS and IaaS models in Cloud and further suggests that folks need to take a more focused look at PaaS.

Don’t get me wrong, I think VMotion is fantastic and the options it can ultimately delivery intensely useful, but we’re hamstrung by what is really the requirement to forklift — network design, network architecture and the laws of physics.  In many cases we’re fascinated by VM Mobility, but a lot of that romanticization plays on emotion rather than utilization.

So what of it?  How do you use VM mobility today?  Do you?


Incomplete Thought: Virtual Machines Are the Problem, Not the Solution…

September 25th, 2009 40 comments

simplicity_complexityI’m an infrastructure guy. A couple of days ago I had a lightbulb go on.  If you’re an Apps person, you’ve likely already had your share of illumination.  I’ve just never thought about things from this perspective.  Please don’t think any less of me 😉

You can bet I’m talking above my pay grade here, but bear with my ramblings for a minute and help me work through this (Update: I’m very happy to see that Surendra Reddy [@sureddy – follow him] did just that with his excellent post – cross-posted in the comments below, here. Also, check out Simon Crosby’s (Citrix CTO) post “Wither the venerable OS“)

It comes down to this:

Virtual machines (VMs) represent the symptoms of a set of legacy problems packaged up to provide a placebo effect as an answer that in some cases we have, until lately, appeared disinclined and not technologically empowered to solve.

If I had a wish, it would be that VM’s end up being the short-term gap-filler they deserve to be and ultimately become a legacy technology so we can solve some of our real architectural issues the way they ought to be solved.

That said, please don’t get me wrong, VMs have allowed us to take the first steps toward defining, compartmentalizing, and isolating some pretty nasty problems anchored on the sins of our fathers, but they don’t do a damned thing to fix them.

VMs have certainly allowed us to (literally) “think outside the box” about how we characterize “workloads” and have enabled us to begin talking about how we make them somewhat mobile, portable, interoperable, easy to describe, inventory and in some cases more secure. Cool.

There’s still a pile of crap inside ’em.

What do I mean?

There’s a bloated, parasitic resource-gobbling cancer inside every VM.  For the most part, it’s the real reason we even have mass market virtualization today.

It’s called the operating system:


If we didn’t have resource-inefficient operating systems, handicapped applications that were incestuously hooked to them, and tons of legacy networking stuff to deal with that unholy affinity, imagine the fun we could have.  Imagine how agile and flexible we could become.

But wait, isn’t server virtualization the answer to that?

Not really.  Server virtualization like that pictured in the diagram above is just the first stake we’re going to drive into the heart of the frankenmonster that is the OS.  The OS is like Cousin Eddie and his RV.

The approach we’ve taken today is that the VMM/Hypervisor abstracts the hardware from the OS.  The applications are still stuck on top of operating systems that don’t provide much in the way of any benefit given the emergence of development frameworks/languages such as J2EE, PHP, Ruby, .NET, etc. that were built around the notions of decoupled, distributed and mashable application “fabrics.”

Every ship travels with an anchor, in the case of the VM it’s the OS.

Imagine if these applications didn’t have to worry about the resource-hogging, control-freak, I/O limiting, protected mode schizophrenia and de-privileged ring spoofing of hypervisors associated with trying not conflict with or offend the OS’s sacred relationship with the hardware beneath it.

Imagine if these application constructs were instead distributed programmatically, could intercommunicate using secure protocols and didn’t have to deal with legacy problems. Imagine if the VMM/Hypervisor really was there to enable scale, isolation, security, and management.  We’d be getting rid of an entire layer.

If that crap in the middle of the sandwich makes for inefficiency, insecurity and added cost in virtualized enterprises, imagine what it does at the Infrastructure as a Service (IaaS) layer in Cloud deployments where VMs — in whatever form — are the basis for the operational models.  We have these fat packaged VMs with OS overhead and attack surfaces that really don’t need to be there.

For example, most of the pre-packaged AMIs found on AWS are bloated general purpose operating systems with some hardening applied (if at all) but there’s just all that code… sitting there…doing nothing except taking up storage, memory and compute resources.

Why do we need this?   Why don’t we at at least see more of a push towards JEOS (Just Enough OS) in the meantime?

I think most virtualization vendors today who are moving their virtualization offerings to adapt to Cloud, are asking themselves the same questions and answering them by realizing that the real win in the long term — once enterprises are done with consolidation and virtualization and hit the next “enterprise application modernization” cycle — will be  to develop and engineer applications directly around platforms which obviate the OS.

So these virtualization players are  making acquisitions to prepare them for this next wave — the real emergence of Platform as a Service (PaaS.)

Some like Microsoft with Azure are simply starting there.  Even SaaS vendors have gone down-stack and provided PaaS offerings to further allow for connectivity, integration and security in the place they think it belongs.

In the case of VMware and their acquisition of SpringSource, that piece of bloat in the middle can be seen as simply going away; whatever you call it, it’s about disintermediating the OS completely and it seems to me that the entire notion of vApps addresses this very thing.  I’m sure there are a ton of other offerings that I simply didn’t get before that are going to make me go “AHA!” now.

I’m not sure organizationally or operationally that most enterprises can get their arms around what it means to not have that OS layer in the middle in the short term, but this new platform-oriented Cloud is really interesting.  It makes those folks who may have made the conversion from server-hugger to VM-hugger and think they were done adapting, quite uncomfortable.

It makes me uncomfortable…and giddy.

All the things I know and understand about how things at the Infrastructure layer s interacts with applications and workloads at the Infostructure layer will drastically change.  The security models will change.  The solutions will change.  Even the notion of vMotion — moving VM’s around — will change.  In fact, in this model, vMotion isn’t really relevant.

Admittedly, I’ve had to call into question over the last few days just how relevant the notion of “Infrastructure 2.0” is within this model — at least how it’s described today.

Cloud v1.0 with all it’s froth and hype is going to be nothing compared to Cloud 2.0 — the revenge of SOA, web services, BPM, enterprise architecture and the developer.  Luckily for the sake of us infrastructure folks we still have time to catch up and see the light as the VMM buys us visibility and a management plane.  However, the protocols and models for how applications interact with the network are sure going to change and accelerate due to Cloud — at least they should.

Just look at how developments such as XMPP and LISP are going to play in a PaaS-centric world…

Like I said, it’s an incomplete thought and I’m not enlightened enough to frame a better discussion in written form, but I can’t wait until the next Infrastructure 2.0 Working Group to bring this up.

I have an appreciation for such a bigger piece of the conversation now.  I just need to get more edumacated.

My head ahsplodes.

Any of this crap make sense to you?


Quick Question: Any Public Cloud Providers Using Intel TXT?

September 15th, 2009 3 comments

Does anyone know of any Public Cloud Provider (or Private for that matter) that utilizes Intel’s TXT?

Specifically, does anyone know if Amazon makes use of Intel’s TXT via their Xen-derivative VMM?

Anyone care to share whether they know of any Cloud provider that PLANS to?

Thanks in advance.

Email responses welcome also [hoff @ packetfilter .com]


NESSessary Question: Will Virtualization Undermine Network Equipment Vendors?

August 30th, 2009 1 comment

Greg Ness touched off an interesting discussion when he asked “Will Virtualization Undermine Network Equipment Vendors?”  It’s a great read summarizing how virtualization (and Cloud) are really beginning to accelerate how classical networking equipment vendors are re-evaluating their portfolios in order to come to terms with these disruptive innovations.

I’ve written so much about this over the last three years and my response is short and sweet:

Virtualization has actually long been an enabler for network equipment vendors — not server virtualization, mind you, but network virtualization.  The same goes in the security space. The disruption caused by server virtualization is only acting as an accelerant — pushing the limits of scale, redefining organizational and operational boundaries, and acting as a forcing function causing wholesale reconsideration of archetypal network (and security) topologies.

The compressed timeframe associated with the disruption caused by virtualization and its adoption in conjunction with the arrival of Cloud Computing may seem unnatural given the relatively short window associated with its arrival, but when one takes the longer-term view, it’s quite natural.  We’ve seen it before in vignettes across the evolution of computing, but the convergence of economics, culture, technology and consumerism have amplified its relevance.

To answer Greg’s question, Virtualization will only undermine those network equipment vendors who were not prepared for it in the first place.  Those that were building highly virtualized, context-enabled routing, switching and security products will embrace this swing in the hardware/software pendulum and develop hybrid solutions that span the physical and virtual manifestations of what the “network” has become.

As I mentioned in my blog titled “Quick Bit: Virtual & Cloud Networking – Where It ISN’T Going…

Specifically, as it comes to understanding how the network plays in virtual and Cloud architectures, it’s not where the network *is* in the increasingly complex virtualized, converged and unified computing architectures, it’s where networking *isn’t.*

Where ISN'T The Network?

Where ISN'T The Network?

Take a look at your network equipment vendors.  Where do they play in that stack above?  Compare and contrast that with what is going on with vendors like Citrix/Xen with the Open vSwitch, VyattaArista with vEOS and Cisco with the Nexus 1000v*…interesting times for sure.


*Disclosure: I work for Cisco.