Posts Tagged ‘Service Levels (SLA)’

Google & AWS: Just Goes To Prove You Can Have Your Cloud and, um, Eat It Too…

September 25th, 2009 4 comments

…and by “eat it” I mean that how you think I mean that.  I feel for these guys, they have big targets on their backs, but that’s what happens when you’re a market leader.

To wit, there are two polarized views expressed every time Google or Amazon have an outage or service interruption given that both are constantly held up as the poster children for Cloud Computing:

  1. Cloud Computing isn’t ready for prime time; if Google or Amazon can go down, why/how can I trust them with my most critical assets!?
  2. Google and Amazon are just service providers; service providers have issues.  This isn’t a Cloud issue, it’s just a service issue.

The truth is somewhere in the middle.

Here’s my $0.02.  You may not like it.  Refunds will be processed by mail.

If you market yourself as the shit, you can expect some back when it hits the fan:

From Hoff's Preso: Cloudifornication - Indiscriminate Information Intercourse Involving Internet Infrastructure

From Hoff's Preso: Cloudifornication - Indiscriminate Information Intercourse Involving Internet Infrastructure

Stop apologizing and live up to the hype you’re helping create.


Hey, Uh, Someone Just Powered Off Our Firewall Virtual Appliance…

June 11th, 2009 11 comments

onoffswitchI’ve covered this before in more complex terms, but I thought I’d reintroduce the topic due to a very relevant discussion I just had recently (*cough cough*)

So here’s an interesting scenario in virtualized and/or Cloud environments that make use of virtual appliances to provide security capabilities*:

Since virtual appliances (VAs) are just virtual machines (VMs) what happens when a SysAdmin spins down or moves one that happens to be your shiny new firewall protecting your production VMs behind it, accidentally or maliciously?  Brings new meaning to the phrase “failing closed.”

Without getting into the vagaries of vendor specific mobility-enabled/enabling technologies, one of the issues with VMs/VAs is that there’s not really a good way of designating one as being “more important” or functionally differentiated such as “security” or “critical application” that would otherwise ensure a higher priority for service availability (read: don’t spin this down unless…) or provide a topological dependency hierarchy in virtualized network constructs.

Unlike physical environments where system administrators (servers) are segregated from access to network and security appliances, this isn’t the case in virtual environments. In Cloud environments (especially public, multi-tenant) where we are often reliant only upon virtual security capabilities since we have no option for physical alternatives, this is an interesting corner case.

We’ve talked a lot about visibility, audit and policy management in virtual environments and this is a poignant example.


*Despite the silly notion that the Google dudes tried to suggest I equated virtualization with Cloud as one-in-the-same, I don’t.

The Nines Have It…

June 8th, 2009 4 comments

fiveninesThere are numerous cliches and buzzwords we hear daily that creep into our lexicon without warrant of origin or meaning.

One of them that you’re undoubtedly used to hearing relates to the measurement of availability expressed as a percentage: the dreaded “nines.”

I read a story this morning on the launch of the “Stratus Trusted Cloud” that promises the following:

Since it is built on the industry’s most robust, scalable, fully redundant architecture, Stratus delivers unmatched performance, availability and security with 99.99% SLAs.

It’s interesting to note what 99.99% availability means within the context of an SLA — “four nines” means you have the equivalent of 52.6 minutes of resource unavailability per year.  That may sound perfectly wonderful and may even lead some to consider that this exceeds what many enterprises can deliver today (I’m interested in the veracity of these claims.)  However, I would ask you to consider this point:

I don’t have access to the contract/SLA to know whether this metric refers to total availability that includes both planned and unplanned downtime or only planned downtime.

This is pretty important, especially in light of what we’ve seen with other large and well-established Cloud service providers who offer similar or better  SLA’s (with or without real fiscal repercussion) and have experienced unplanned outages for hours on end.

Is four nines good enough for your most critical applications?  Do you measure this today?  Does it even matter?


Here’s a handy Wikipedia reference on availability table you can print out:

Availability % Downtime per year Downtime per month* Downtime per week
90% 36.5 days 72 hours 16.8 hours
95% 18.25 days 36 hours 8.4 hours
98% 7.30 days 14.4 hours 3.36 hours
99% 3.65 days 7.20 hours 1.68 hours
99.5% 1.83 days 3.60 hours 50.4 minutes
99.8% 17.52 hours 86.23 minutes 20.16 minutes
99.9% (“three nines”) 8.76 hours 43.2 minutes 10.1 minutes
99.95% 4.38 hours 21.56 minutes 5.04 minutes
99.99% (“four nines”) 52.6 minutes 4.32 minutes 1.01 minutes
99.999% (“five nines”) 5.26 minutes 25.9 seconds 6.05 seconds
99.9999% (“six nines”) 31.5 seconds 2.59 seconds 0.605 seconds

* For monthly calculations, a 30-day month is used.