Archive for January, 2010

Where Are the Network Virtual Appliances? Hobbled By the Virtual Network, That’s Where…

January 31st, 2010 15 comments

Allan Leinwand from GigaOm wrote a great article asking “Where are the network virtual appliances?” This was followed up by another excellent post by Rich Miller.

Allan sets up the discussion describing how we’ve typically plumbed disparate physical appliances into our network infrastructure to provide discrete network and security capabilities such as load balancers, VPNs, SSL termination, firewalls, etc.  He then goes on to describe the stunted evolution of virtual appliances:

To be sure, some networking devices and appliances are now available in virtual form.  Switches and routers have begun to move toward virtualization with VMware’s vSwitch, Cisco’s Nexus 1000v, the open source Open vSwitch and routers and firewalls running in various VMs from the company I helped found, Vyatta.  For load balancers, Citrix has released a version of its Netscaler VPX software that runs on top of its virtual machine, XenServer; and Zeus Systems has an application traffic controller that can be deployed as a virtual appliance on Amazon EC2, Joyent and other public clouds.

Ultimately I think it prudent for discussion’s sake to separate routing, switching and load balancing (connectivity) from functions such as DLP, firewalls, and IDS/IPS (security) as lumping them together actually abstracts the problem which is that the latter is completely dependent upon the capabilities and functionality of the former.  This is what Allan almost gets to when describing his lament with the virtual appliance ecosystem today:

Yet the fundamental problem remains: Most networking appliances are still stuck in physical hardware — hardware that may or may not be deployed where the applications need them, which means those applications and their associated VMs can be left with major gaps in their infrastructure needs. Without a full-featured and stateful firewall to protect an application, it’s susceptible to various Internet attacks.  A missing load balancer that operates at layers three through seven leaves a gap in the need to distribute load between multiple application servers. Meanwhile, the lack of an SSL accelerator to offload processing may lead to performance issues and without an IDS device present, malicious activities may occur.  Without some (or all) of these networking appliances available in a virtual environment, a VM may find itself constrained, unable to take full advantage of the possible economic benefits.

I’ve written about this many, many times. In fact almost three years ago I created a presentation called  “The Four Horsemen of the Virtualization Security Apocalypse” which described in excruciating detail how network virtual appliances were a big ball of fail and would be for some time. I further suggested that much of the “best-of-breed” products would ultimately become “good enough” features in virtualization vendor’s hypervisor platforms.

Why?  Because there are some very real problems with virtualization (and Cloud) as it relates to connectivity and security:

  1. Most of the virtual network appliances, especially those “ported” from the versions that usually run on dedicated physical hardware (COTS or proprietary) do not provide feature, performance, scale or high-availability parity; most are hobbled or require per-platform customization or re-engineering in order to function.
  2. The resilience and high availability options from today’s off-the-shelf virtual connectivity does not pair well with the mobility and dynamism of de-coupled virtual machines; VMs are ultimately temporal and networks don’t like topological instability due to key components moving or disappearing
  3. The performance and scale of virtual appliances still suffer when competing for I/O and resources on the same physical hosts as the guests they attempt to protect
  4. Virtual connectivity is a generally a function of the VMM (or a loadable module/domain therein.) The architecture of the VMM has dramatic impact upon the architecture of the software designed to provide the connectivity and vice versa.
  5. Security solutions are incredibly topology sensitive.  Given the scenario in #1 when a VM moves or is distributed across the pooled infrastructure, unless the security capabilities are already present on the physical host or the connectivity and security layers share a control plane (or at least can exchange telemetry,) things will simply break
  6. Many virtualization (and especially cloud) platforms do not support protocols or topologies that many connectivity and security virtual appliances require to function (such as multicast for load balancing)
  7. It’s very difficult to mimic the in-line path requirements in virtual networking environments that would otherwise force traffic passing through the connectivity layers (layers 2 through 7) up through various policy-driven security layers (virtual appliances)
  8. There is no common methodology to express what security requirements the connectivity fabrics should ensure are available prior to allowing a VM to spool up let alone move
  9. Virtualization vendors who provide solutions for the enterprise have rich networking capabilities natively as well as with third party connectivity partners, including VM and VMM introspection capabilities. As I wrote about here, mass-market Cloud providers such as Amazon Web Services or Rackspace Cloud have severely crippled networking.
  10. Virtualization and cloud vendors generally force many security vs. performance tradeoffs when implementing introspection capabilities in their platforms: third party code running in the kernel, scheduler prioritization issues, I/O limitations, etc.
  11. Much of the basic networking capabilities are being pushed lower into silicon (into the CPUs themselves) which makes virtual appliances even further removed from the guts that enable them
  12. Physical appliances (in the enterprise) exist en-mass.  Many of them provide highly scalable solutions to the specific functions that Alan refers to.  The need exists, given the limitations I describe above, to provide for integration/interaction between them, the VMM and any virtual appliances in order to offload certain functions as well as provide coverage between the physical and the logical.

What does this mean?  It means that ultimately to ensure their own survival, virtualization and cloud providers will depend less upon virtual appliances and add more of the basic connectivity AND security capabilities into the VMMs themselves as its the only way to guarantee performance, scalability, resilience and satisfy the security requirements of customers. There will be new generations of protocols, APIs and control planes that will emerge to provide for this capability, but this will drive the same old integration battles we’re supposed to be absolved from with virtualization and Cloud.

Connectivity and security vendors will offer virtual replicas of their physical appliances in order to gain a foothold in virtualized/cloud environments in order to intercept traffic (think basic traps/ACL’s) and then interact with higher-performing physical appliance security service overlays or embedded line cards in service chassis.  This is especially true in enterprises but poses many challenges in software-only, mass-market cloud environments where what you’ll continue to get is simply basic connectivity and security with limited networking functionality.  This implies more and more security will be pushed into the guest and application logic layers to deal with this disconnect.

This is exactly where we are today with Cloud providers like Amazon Web Services: basic ingress-only filtering with a very simplistic, limited and abstracted set of both connectivity and security capability.  See “Dear Public Cloud Providers: Please Make Your Networking Capabilities Suck Less. Kthxbye”  Will they add more functionality?  Perhaps. The question is whether they can afford to in order to limit the impact that connecitivity and security variability/instability can bring to an environment.

That said, it’s certainly achievable, if you are willing and able to do so, to construct a completely software-based networking environment, but these environments require a complete approach and stack re-write with an operational expertise that will be hard to support for those who have spent the last 20 years working in a different paradigm and that’s a huge piece of this problem.

The connectivity layer — however integrated into the virtualized and cloud environments they seem — continues to limit how and what the security layers can do and will for some time, thus limiting the uptake of virtual network and security appliances.

Situation normal.


Reblog this post [with Zemanta]

Hacking Exposed: Virtualization & Cloud Computing…Feedback Please

January 30th, 2010 26 comments

Craig Balding, Rich Mogull and I are working on a book due out later this year.

It’s the latest in the McGraw-Hill “Hacking Exposed” series.  We’re focusing on virtualization and cloud computing security.

We have a very interesting set of topics to discuss but we’d like to crowd/cloud-source ideas from all of you.

The table of contents reads like this:

Part I: Virtualization & Cloud Computing:  An Overview
Case Study: Expand the Attack Surface: Enterprise Virtualization & Cloud Adoption
Chapter 1: Virtualization Defined
Chapter 2: Cloud Computing Defined

Part II: Smash the Virtualized Stack
Case Study: Own the Virtualized Enterprise
Chapter 3: Subvert the CPU & Chipsets
Chapter 4: Harass the Host, Hypervisor, Virtual Networking & Storage
Chapter 5: Victimize the Virtual Machine
Chapter 6: Conquer the Control Plane & APIs

Part III: Compromise the Cloud
Case Study: Own the Cloud for Fun and Profit
Chapter 7: Undermine the Infrastructure
Chapter 8: Manipulate the Metastructure
Chapter 9: Assault the Infostructure

Part IV: Appendices

We’ll have a book-specific site up shortly, but if you’d like to see certain things covered (technology, operational, organizational, etc.) please let us know in the comments below.

Also, we’d like to solicit a few critical folks to provide feedback on the first couple of chapters. Email me/comment if interested.


/Hoff, Craig and Rich.

Reblog this post [with Zemanta]

MashSSL – An Excellent Idea You’ve Probably Never Heard Of…

January 30th, 2010 No comments

I’ve been meaning to write about MashSSL for a while as it occurs to me that this is a particularly elegant solution to some very real challenges we have today.  Trusting the browser, operator of said browser or a web service when using multi-party web applications is a fatal flaw.

We’re struggling with how to deal with authentication in distributed web and cloud applications. MashSSL seems as though it’s a candidate for the toolbox of solutions:

MashSSL allows web applications to mutually authenticate and establish a secure channel without having to trust the user or the browser. MashSSL is a Layer 7 security protocol running within HTTP in a RESTful fashion. It uses an innovation called “friend in the middle” to turn the proven SSL protocol into a multi-party protocol that inherits SSL’s security, efficiency and mature trust infrastructure

Make sure you check out the sections on “Why and How,” especially the “MashSSL Overview” section which explains how it works.

I should mention the code is also open source.


Cloud: Security Doesn’t Matter (Or, In Cloud, Nobody Can Hear You Scream)

January 25th, 2010 9 comments

In the Information Security community, many of us have long come to the conclusion that we are caught in what I call my “Security Hamster Sine Wave Of Pain.”  Those of us who have been doing this awhile recognize that InfoSec is a zero-sum game; it’s about staving off the inevitable and trying to ensure we can deal with the residual impact in the face of being “survivable” versus being “secure.”

While we can (and do) make incremental progress in certain areas, the collision of disruptive innovation, massive consumerization of technology along with the slow churn of security vendor roadmaps, dissolving budgets, natural marketspace commoditzation and the unfortunate velocity of attacker innovation yields the constant realization that we’re not motivated or incentivized to do the right thing or manage risk.

Instead, we’re poked in the side and haunted by the four letter word of our industry: compliance.

Compliance is often dismissed as irrelevant in the consumer space and associated instead with government or large enterprise, but as privacy continues to erode and breaches make the news, the fact that we’re putting more and more of our information — of all sorts — in the hands of others to manage is again beginning to stoke an upsurge in efforts to somehow measure and manage visibility against a standardized baseline of general, common sense and minimal efforts to guard against badness.

Ultimately, it doesn’t matter how “secure” Cloud providers suggest they are.  It doesn’t matter what breakthroughs in technology sprout up in the face of this new model of compute. The only measure that counts in the long run is how compliant you are.  That’s what will determine the success of Cloud.  Don’t believe me? Look at how the leading vendors in Cloud are responding today to their biggest (potential) customers — taking the “one size fits all” model of mass-market Cloud and beginning to chop it up and create one-off’s in order to satisfy…compliance.

Why?  Because it’s easier to deal with the vagaries of trust and isolation and multi-tenant environments by eliminating the latter to increase the former. If an auditor/examiner doesn’t understand or cannot measure your compliance to those things he/she is tasked to evaluate you against, you’re sunk.

The only thing that will budge the needle on this issue is how agile those who craft the regulatory guidelines are or how you can clearly demonstrate why your compensating controls mitigate the risk of the provider of service if they cannot. Given the nature and behavior of those involved in this space and where we are with putting our eggs in a vaporous basket, I wouldn’t hold my breath.  Movement in this area is glacial at best and in many cases out of touch with the realities of just how disruptive Cloud Computing is.  All it will take is one monumental cock-up due to a true Cloudtastrophe and the Cloud will hit the fan.

As I have oft suggested, the core issue we need to tackle in Cloud is trust, since the graceful surrender of such is at the heart of what Cloud requires.  Trust is comprised of Security, Control, Service Levels and Compliance.  It’s relatively easy to establish where we are today with the first three, but the last one is MIA.  We’re just *now* seeing movement in the form of SIGs to deal with virtualization.  Cloud?

When the best you have is a SAS-70, it’s time to weep.  Conversely, wishing for more regulation will simply extend the cycle.

What can you do?  Simple. Help educate your auditors and examiners. Read the Cloud Security Alliance’s guidelines. Participate in making the Automated Audit, Assertion, Assessment, and Assurance API (A6) a success so we can at least gain back some visibility and transparency which helps demonstrate compliance, since that’s how we’re measured.  Ultimately, if you’re able, focus on risk assessment in helping to advise your constituent business customers on how to migrate to Cloud Computing safely.

There are TONS of things one can do in order to make up for the shortcomings of Cloud security today.  The problem is, most of them erode the benefits of Cloud: agility, flexibility, cost savings, and dynamism.  We need to make the business aware of these tradeoffs as well as our auditors because we’re stuck.  We need the regulators and examiners to keep pace with technology — as painful as that might be in the short term — to guarantee our success in the long term.

Manage compliance, don’t let it manage you because a Cloud is a terrible thing to waste.


Reblog this post [with Zemanta]

Incomplete Thought: Batteries – The Private Cloud Equivalent Of Electrical Utility…

January 24th, 2010 21 comments

While I think Nick Carr’s power generation utility analogy was a fantastic discussion catalyst for the usefulness of a utility model, it is abused to extremes and constrains what might ordinarily be more open-minded debate on the present and future of computing.

This is a debate that continues to rise every few days on Twitter and the Blogosphere, fueled mostly by what can only be described from either side of the argument as a mixture of ideology, dogma, passionate opinion, misunderstood perspective and a squinty-eyed mistrust of agendas.

It’s all a bit silly, really, as both Public and Private Cloud have their place; when, for how long and for whom is really at the heart of the issue.

The notion that the only way “true” benefits can be realized from Cloud Computing are from massively-scaled public utilities and that Private Clouds (your definition will likely differ) are simply a way of IT making excuses for the past while trying to hold on to the present, simply limits the conversation and causes friction rather than reduces it.  I believe that a hybrid model will prevail, as it always has.  There are many reasons for this. I’ve talked about them a lot.

This got me thinking about why and here’s my goofy thought for consideration of the “value” and “utility” of Private Cloud:

If the power utility “grid” represents Public Cloud, then perhaps batteries are a reasonable equivalent for Private Cloud.

I’m not going to explain this analogy in full yet, but wonder if it makes any sense to you.  I’d enjoy your thoughts on what you think I’m referring to. 😉


“Vint & Me” – Kickin’ Butt & Takin’ Names (Unfortunately Mine…)

January 21st, 2010 4 comments

I think perhaps my choice of words were met with an unfortunate style of punctuation I was not expecting…

The Internet — once again kicking security’s ass, Karate Kid style, no less…

It seems I’m going to have to sharpen my mad skills, as the previous two meetings have led to similar results:


Categories: Uncategorized Tags:

Cloud: Over Subscription vs. Over Capacity – Two Different Things

January 15th, 2010 10 comments

There’s been a very interesting set of discussions lately regarding performance anomalies across Cloud infrastructure providers.  The most recent involves Amazon Web Services and RackSpace Cloud. Let’s focus on the former because it’s the one that has a good deal of analysis and data attached to it.

Reuven Cohen’s post (Oversubscribing the Cloud) summarizing many of these concerns speaks to the meme wherein he points to Alan Williamson’s initial complaints (Has Amazon EC2 become over subscribed?) followed by CloudKick’s very interesting experiments and data (Visual Evidence of Amazon EC2 network issues) and ultimately Rich Miller’s summary including a response from Amazon Web Services (Amazon: We Don’t Have Capacity Issues)

The thing that’s interesting to me in all of this is yet another example of people mixing metaphors, terminology and common operating methodologies as well as choosing to suspend disbelief and the reality distortion field associated with how service providers actually offer service versus marketing it.

Here’s the kicker: over subscription is not the same thing as over capacity. BY DESIGN, modern data/telecommuication (and Cloud) networks are built using an over-subscription model.

On the other hand, the sad truth is that we will have over capacity issues in cloud; it’s simply a sad intersection of the laws of physics and the delicate balance associated with cost control and service delivery.

Let me frame the following with an example: when you purchase an “unlimited data plan” from a telco or hosting company, you’ll notice normally that this does not have latency or throughput figures attached to it…same with Cloud.  You shouldn’t be surprised by this. If you are, you might want to rethink your approach to service level expectation.

Short and sweet:

  1. There is no such thing as infinite scale.  There is no such thing as an “unlimited ____ plan.”* Even in Cloud. Every provider has limits, even if they’re massive. Adding the word Cloud simply squeezes the limit balloon from you to them and it’s a tougher problem to solve at scale. It doesn’t eliminate the issue, even with “elasticity.”
  2. Allow me to repeat: over subscription is not the same thing as over capacity. BY DESIGN, modern data/telecommuication (and Cloud) networks are built using an over-subscription model.  I don’t need to explain why, I trust.
  3. Capacity refers to the ability, within service level specifications, to meet the contracted needs of the customer and operate within acceptable thresholds. Depending upon how a provider measures that and communicates it to you, you may be horribly surprised if you chose the marketing over the engineering explanations of such.
  4. Capacity is also not the same as latency, is not the same as throughput…
  5. Over capacity means that the provider’s over-subscription modeling was flawed and suggests that the usage patterns overwhelmed the capacity threshold and they had no way of adding capacity in a manner which allows them to satisfy demand

Why is this important?  Because the “illusion” of infinite scale is just that.

The abstraction at the infrastructure layer of compute, network and storage — especially delivered in software — still relies on the underlying capacity of the pipes and bit-buckets that deliver them. It’s a never-ending see-saw movement of Metcalfe’s and Moore’s laws.

The discrete packaging of each virtualized CPU compute element sizing within an AWS or Rackspace is relatively easy to forecast and yields a reasonably helpful “fixed” capacity planning data point; it has a minima of zero and a maxima associated with the peak compute hours/vCPU clock rating of the instance.

The network piece and its relationship to the compute piece is where it gets interesting.  Your virtual interface ultimately is bundled together in aggregate with other tenants colocated on the same physical host and competes for a share of pipe (usually one or more single or trunked 1Gb/s or 10Gb/s Ethernet.) Network traffic in terms of measurement, capacity planning and usage must take into consideration the facts that it is both asymmetric, suffers from variability in bucket size, and is very, very bursty. There’s not generally a published service level associated with throughput in Cloud.

This complicates things when you consider that at this point scaling out in CPU is easier to do than scaling out in the network.  Add virtualization into the mix which drives big, flat, L2 networks as a design architecture layered with a control plane that is now (in the case of Cloud) mostly software driven, provisioned, orchestrated and implemented, and it’s no wonder that folks like Google, Amazon and Facebook are desparate for hugely dense, multi-terabit, wire speed L2 switching fabrics and could use 40 and 100Gb/s Ethernet today.

Check out this interesting article.

Oh, let’s not forget that there are also now providers who are deploying converged data/storage networking of said pipes with the likes of FCoE/DCE with all sorts of interesting ramifications on the above discussion.  If you thought it was tough to get your arms around before…

If you know much about Ethernet, congestion avoidance/recovery/control, QoS, etc. you know that it’s a complex beast. If service levels relating to network performance aren’t in your contract, you’re probably figuring out why right about now.

So, wrapping this up, I have to accept AWS’ statement that they “…do not have over-capacity issues,” because quite frankly there’s nothing to suggest otherwise.  That’s not to say there aren’t performance issues are related to something else (like software or hardware in the stack) but that’s not the same as being over capacity — and you’ll notice that they didn’t say they were not “over-subscribed” but rather they were not “over capacity.” 😉


*Just ask AT&T about their network and the iPhone. This *is* a case where their over-subscription planning failed in the face of capacity…and continues to.

Categories: Cloud Computing Tags: ,

Cloud Light Presents: Real Men Of Genius – Mr. Dump All Your Crap In the Cloud Guy.

January 11th, 2010 3 comments

It’s full of awesomesauce.


Cloud Light Presents…Real Men of Genius
{Real Men of Genius…}

Today we salute you, Mr. Dump-All-Your-Crap-In-the-Cloud Guy
{Mr. Dump-All-Your-Crap-In-the-Cloud Guy}

Some seek danger in cliff diving…others? Competitive eating…flamethrowing or ferret wrestling. But You? You put data in other people’s hands in the Cloud
{You’re asking for it}

Armed with a SAS-70 and a license to commit PCI, you live your life with a simple code: Finders keepers, losers weepers
{Finders Keepers}

Some people mock you, sure. But you paid $8.32 for your EC2 spot instances and well, you just can’t get that from Dreamhost
{who’s laughin’ now?}

So crack open a cloud instance, oh King of the Cloud…we’d give you our data, but you’ve probably already lost it
{Mr. Dump-All-Your-Crap-In-the-Cloud Guy}

Cloudheiser Bushed, Poughkipsie, New Jersey…

Recording & Playback of WebEx A6 Working Group Kick-Off Call from 1/8/2010 Available

January 10th, 2010 No comments

If you’re interested in the great discussion and presentations we had during the kickoff call for the A6 (Automated Audit, Assertion, Assessment, and Assurance API) Working Group, there are two options to listen/view the WebEx recording:

Topic: A6 API Working Group – Kickoff Call-20100108 1704
Create time: 1/8/10 10:07 am
File size: 33.23MB
Duration: 1 hour 1 minute
Description: Streaming recording link:
Download recording link:…

MAKE SURE YOU VIEW THE CHAT WINDOW << It contains some really excellent discussion points.

We had two great presentations from representatives from the OGF OCCI group and CSC’s Trusted Cloud Team.

I’ll be setting up regular calls shortly and a few people have reached out to me regarding helping form the core team to begin organizing the working group in earnest.

You can also follow along via the Google Group here.


In need of a cool logo for the group by the way… 😉

To Achieve True Cloud (X/Z)en, One Must Leverage Introspection

January 6th, 2010 No comments

Back in October 2008, I wrote a post detailing efforts around the Xen community to create a standard security introspection API (Xen.Org Launches Community Project To Bring VM Introspection to Xen🙂

The Xen Introspection Project is a community effort within to leverage the existing research presented above with other work not yet public to create a standard API specification and methodology for virtual machine introspection.

That blog was focused on introspection for virtualization proper but since many of the larger cloud providers utilize Xen virtualization as an underpinning of their service architecture and as an industry we’re suffering from a lack of visibility and deployable security capabilities, the relevance of VM and VMM introspection to cloud computing is quite relevant.

I thought I’d double around and see where we are.

It looks as though there’s been quite a bit of recent activity from the folks at Georgia Tech (XenAccess Project) and the University of Alaska at Fairbanks (Virtual Introspection for Xen) referenced in my previous blog.  The vCloud API proffered via the DMTF seems to also leverage (at least some of) the VMsafe API capabilities present in VMware‘s vSphere virtualization platform.

While details are, for obvious reasons sketchy, I am encouraged in speaking to representatives from a few cloud providers who are keenly interested in including these capabilities in their offerings.  Wouldn’t that be cool?

Adoption and inclusion of introspection capabilities will overcome some of the inherent security and visibility limitations we face in highly-virtualized multi-tenant environments due to networking constraints for integrating security functionality that I wrote about here.

I plan a follow-on blog in more detail once I finish some interviews.


Reblog this post [with Zemanta]