Posts Tagged ‘Amazon’

Amazon Web Services Hires a CISO – Did You Know?

May 18th, 2010 beaker No comments

Just to point out a fact many of you may not be aware of: Amazon Web Services hired (or transferred (?), since he was an AWS insider) Stephen Schmidt as its CISO earlier this year.  He has a team that goes along with him, too.

That’s a very, very good thing. I, for one, am very glad to see it. Combine that with folks like Steve Riley and I’m enthusiastic that AWS will make some big leaps when it comes to visibility, transparency and interaction with the security community.

See. Christmas wishes can come true! Thanks, Santa! ;)

You can find more about Mr. Schmidt by checking out his LinkedIn profile.

/Hoff


Cloud: Over Subscription vs. Over Capacity – Two Different Things

January 15th, 2010 beaker 8 comments

There’s been a very interesting set of discussions lately regarding performance anomalies across Cloud infrastructure providers.  The most recent involves Amazon Web Services and RackSpace Cloud. Let’s focus on the former because it’s the one that has a good deal of analysis and data attached to it.

Reuven Cohen’s post (Oversubscribing the Cloud) summarizing many of these concerns speaks to the meme wherein he points to Alan Williamson’s initial complaints (Has Amazon EC2 become over subscribed?) followed by CloudKick’s very interesting experiments and data (Visual Evidence of Amazon EC2 network issues) and ultimately Rich Miller’s summary including a response from Amazon Web Services (Amazon: We Don’t Have Capacity Issues).

What’s interesting to me in all of this is that it’s yet another example of people mixing metaphors, terminology and common operating methodologies, as well as choosing to suspend disbelief and succumb to the reality distortion field associated with how service providers actually deliver service versus how they market it.

Here’s the kicker: over subscription is not the same thing as over capacity. BY DESIGN, modern data/telecommunication (and Cloud) networks are built using an over-subscription model.

On the other hand, the sad truth is that we will have over capacity issues in cloud; it’s simply a sad intersection of the laws of physics and the delicate balance associated with cost control and service delivery.

Let me frame the following with an example: when you purchase an “unlimited data plan” from a telco or hosting company, you’ll notice normally that this does not have latency or throughput figures attached to it…same with Cloud.  You shouldn’t be surprised by this. If you are, you might want to rethink your approach to service level expectation.

Short and sweet:

  1. There is no such thing as infinite scale.  There is no such thing as an “unlimited ____ plan.”* Even in Cloud. Every provider has limits, even if they’re massive. Adding the word Cloud simply squeezes the limit balloon from you to them and it’s a tougher problem to solve at scale. It doesn’t eliminate the issue, even with “elasticity.”
  2. Allow me to repeat: over subscription is not the same thing as over capacity. BY DESIGN, modern data/telecommunication (and Cloud) networks are built using an over-subscription model.  I don’t need to explain why, I trust.
  3. Capacity refers to the ability, within service level specifications, to meet the contracted needs of the customer and operate within acceptable thresholds. Depending upon how a provider measures that and communicates it to you, you may be horribly surprised if you chose the marketing over the engineering explanations of such.
  4. Capacity is also not the same as latency, is not the same as throughput…
  5. Over capacity means that the provider’s over-subscription modeling was flawed and suggests that the usage patterns overwhelmed the capacity threshold and they had no way of adding capacity in a manner which allows them to satisfy demand.
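
To make the distinction in the list above concrete, here is a minimal sketch in Python. The `Link` class, its field names and every number in it are hypothetical and chosen only for illustration; they describe no real provider. The point is that the over-subscription ratio is a design-time figure (what was sold versus what exists), while over capacity is a run-time condition (what is actually being asked for versus what exists).

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A shared pipe, described by hypothetical illustrative numbers."""
    capacity_mbps: float          # what the pipe can actually carry
    sold_mbps_per_tenant: float   # nominal rate each tenant was sold
    tenants: int

    def oversubscription_ratio(self) -> float:
        """Sold bandwidth divided by real bandwidth; above 1 is normal by design."""
        return (self.sold_mbps_per_tenant * self.tenants) / self.capacity_mbps

    def is_over_capacity(self, demand_mbps: float) -> bool:
        """True only when actual aggregate demand exceeds the pipe."""
        return demand_mbps > self.capacity_mbps


# Assumed example: 40 tenants sold 100 Mb/s each on a 1 Gb/s link.
link = Link(capacity_mbps=1000, sold_mbps_per_tenant=100, tenants=40)
print(link.oversubscription_ratio())   # 4.0
print(link.is_over_capacity(600))      # False: over-subscribed, yet fine
print(link.is_over_capacity(1200))     # True: the modeling has failed
```

A link can sit at 4:1 over-subscription indefinitely without trouble; it only becomes an over-capacity problem when real demand crosses the line and no capacity can be added in time.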

Why is this important?  Because the “illusion” of infinite scale is just that.

The abstraction at the infrastructure layer of compute, network and storage — especially delivered in software — still relies on the underlying capacity of the pipes and bit-buckets that deliver them. It’s a never-ending see-saw movement of Metcalfe’s and Moore’s laws.

The discrete packaging of each virtualized CPU compute element within an AWS or Rackspace offering is relatively easy to forecast and yields a reasonably helpful “fixed” capacity planning data point; it has a minimum of zero and a maximum associated with the peak compute hours/vCPU clock rating of the instance.

The network piece and its relationship to the compute piece is where it gets interesting.  Your virtual interface ultimately is bundled together in aggregate with other tenants colocated on the same physical host and competes for a share of pipe (usually one or more single or trunked 1Gb/s or 10Gb/s Ethernet links). Network traffic, in terms of measurement, capacity planning and usage, must take into consideration the facts that it is asymmetric, suffers from variability in bucket size, and is very, very bursty. There’s not generally a published service level associated with throughput in Cloud.
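
A rough sketch of why burstiness matters more than averages when tenants share a pipe. The demand figures below are invented for illustration, and `congested_intervals` is a hypothetical helper, not a measurement of any real host.

```python
def congested_intervals(demand_by_tenant, link_capacity_mbps):
    """Return the indexes of intervals where summed tenant demand exceeds the link."""
    totals = [sum(interval) for interval in zip(*demand_by_tenant)]
    return [i for i, total in enumerate(totals) if total > link_capacity_mbps]


# Hypothetical per-interval demand (Mb/s) for three tenants on one 1 Gb/s uplink.
demand = [
    [50, 60, 900, 40, 55],    # tenant A bursts in interval 2
    [30, 700, 200, 35, 30],   # tenant B bursts in interval 1
    [20, 300, 100, 25, 20],   # tenant C
]

average_mbps = sum(map(sum, demand)) / len(demand[0])
print(average_mbps)                         # 513.0, about half the link
print(congested_intervals(demand, 1000))    # [1, 2]
```

On average the link looks half empty, yet two of the five intervals are congested. Without a published throughput service level, the average is the only number most customers ever reason about.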

This complicates things when you consider that at this point scaling out in CPU is easier to do than scaling out in the network.  Add virtualization into the mix, which drives big, flat, L2 networks as a design architecture layered with a control plane that is now (in the case of Cloud) mostly software driven, provisioned, orchestrated and implemented, and it’s no wonder that folks like Google, Amazon and Facebook are desperate for hugely dense, multi-terabit, wire speed L2 switching fabrics and could use 40 and 100Gb/s Ethernet today.

Check out this interesting article.

Oh, let’s not forget that there are also now providers who are deploying converged data/storage networking of said pipes with the likes of FCoE/DCE with all sorts of interesting ramifications on the above discussion.  If you thought it was tough to get your arms around before…

If you know much about Ethernet, congestion avoidance/recovery/control, QoS, etc. you know that it’s a complex beast. If service levels relating to network performance aren’t in your contract, you’re probably figuring out why right about now.

So, wrapping this up, I have to accept AWS’ statement that they “…do not have over-capacity issues,” because quite frankly there’s nothing to suggest otherwise.  That’s not to say there aren’t performance issues related to something else (like software or hardware in the stack) but that’s not the same as being over capacity — and you’ll notice that they didn’t say they were not “over-subscribed” but rather they were not “over capacity.” ;)

/Hoff

*Just ask AT&T about their network and the iPhone. This *is* a case where their over-subscription planning failed in the face of capacity…and continues to.


Silent Lucidity: IaaS — Already A Dinosaur? The Evolution of PaaSasaurus Rex…

November 12th, 2009 beaker 8 comments

Sitting in an impressive room at the Google campus in Mountain View last month, I asked the collective group of brainpower a slightly rhetorical question:

How much longer do you feel pure-play Infrastructure-As-A-Service will be a relevant service model within the spectrum of cloud services?

I couched the question with previous “incomplete thoughts*” relating to the move “up-stack” by IaaS providers — providing value-added, at-cost services to both differentiate and soften the market for what I call the “PaaSification” of the consumer.  I also highlighted the move “down-stack” by SaaS vendors building out platforms to support a broader ecosystem and value proposition.

In the long term, I think ultimately the trichotomy of the SPI model will dissolve thanks to commoditization and the need for providers to differentiate — even at mass scale.  We’ll ultimately just talk about service delivery and the platform(s) used to deliver them.  Infrastructure will enable these services, of course, but that’s not where the money will come from.

Just look at the approach of providers such as Amazon, Terremark and Savvis and how they are already clawing their way up the PaaS stack, adding more features and functions that either equalize public cloud capabilities with those of the enterprise or even differentiate from it.  Look at Microsoft’s Azure.  How about Heroku, Engine Yard, Joyent?  How about VMware and Springsource?  All platform plays. Develop, click, deploy.

As I mention in my Cloudifornication presentation, I think that from a security perspective, PaaS offers the potential of eliminating entire classes of vulnerabilities in the application development lifecycle by enforcing sanitary programmatic practices across the derivative works built upon them.  I look forward also to APIs and standards that allow for consistency across providers. I think PaaS has the greatest potential to deliver this.

There are clearly trade-offs here, but as we start to move toward the two key differentiators (at least for public clouds) — management and security — I think the value of PaaS will really start to shine.

Probably just another bout of obviousness, but if I were placing bets, this is where I’d sink my nickels.

You?

/Hoff

* The most relevant “incomplete thought” is the one titled “Incomplete Thought: Virtual Machines Are the Problem, Not the Solution…” in which I kicked around the notion that virtualization-enabled IaaS and the VM containers they enable are simply an ugly solution to an uglier problem…


Dear Santa: All I Want For Christmas On My Amazon Wishlist Is a Straight Answer…

October 31st, 2009 beaker 4 comments

A couple of weeks ago amidst another interesting Amazon Web Services announcement featuring the newly-arrived Relational Database Service, Werner Vogels (Amazon CTO) jokingly retweeted a remark that someone made suggesting he was like “…Santa for nerds.”

All I want for Christmas is my elastic IP...

So, now that I have Werner following me on Twitter and a confirmed mailing address (clearly the North Pole) I thought I’d make my Christmas wish early this year.  I’ve put a lot of thought into this.

Just when I had settled on a shiny new gadget from the bookstore side of the house, I saw Amazon’s response to Eran Tromer’s (et al) research on Cloud Cartography featured in this Computerworld article written by my old friend Jaikumar Vijayan titled “Amazon downplays report highlighting vulnerabilities in its cloud service.”

I feature Eran and his team’s work in my Cloudifornication presentation.  You can read more about it on Craig’s blog here.

I quickly cast aside my yuletide treasure list and instead decided to ask Santa (Werner/AWS) for a most important present: a straight answer from AWS, delivered not by a PR spokeshole but by someone who speaks openly, transparently and in an engaging fashion with customers and the security community.

Here’s what torqued me (emphasis is mine):

In response, Amazon spokeswoman Kay Kinton said today that the report describes cloud cartography methods that could increase an attacker’s probability of launching a rogue virtual machine (VM) on the same physical server as another specific target VM.

What remains unclear, however, is how exactly attackers would be able to use that presence on the same physical server to then attack the target VM, Kinton told Computerworld via e-mail.

The research paper itself described how potential attackers could use so-called “side-channel” attacks to try and steal information from a target VM. The researchers had argued that a VM sitting on the same physical server as a target VM could monitor shared resources on the server to make highly educated inferences about the target VM.

By monitoring CPU and memory cache utilization on the shared server, an attacker could determine periods of high-activity on the target servers, estimate high-traffic rates and even launch keystroke timing attacks to gather passwords and other data from the target server, the researchers had noted.

Such side-channel attacks have proved highly successful in non-cloud contexts, so there’s no reason why they shouldn’t work in a cloud environment, the researchers postulated.

However, Kinton characterized the attack described in the report as “hypothetical,” and one that would be “significantly more difficult in practice.”

“The side channel techniques presented are based on testing results from a carefully controlled lab environment with configurations that do not match the actual Amazon EC2 environment,” Kinton said.

“As the researchers point out, there are a number of factors that would make such an attack significantly more difficult in practice,” she said.

So while the Amazon spokesperson admits the vulnerability/capability exists, rather than rationally address that issue, thank the researchers for pointing this out and provide customers some level of detail regarding how this vulnerability is mitigated, we get handwaving that attempts to have us not focus on the vulnerability, but rather the difficulty of a hypothetical exploit.  That example isn’t the point of the paper. The fact that I could deliver a targeted attack is.

Earth to Amazon: this sort of thing doesn’t work. It’s a lousy tactic.  It simply says that either you think we’re all stupid or you’re suffering from a very bad case of incident handling immaturity. Take a look around you, there are plenty of companies doing this right. You’re not one of them.  Consistently.

Tromer and crew gave a single example of how this vulnerability might be exploited that was latched on to by the AWS spokesperson as a way of defusing the seriousness of the underlying vulnerability by downplaying this sample exploit.  There are potentially dozens of avenues to be explored here.  Craig talked about many of them in his blog (above.)  What we got instead was this:

At the same time, Amazon takes all reports of vulnerabilities in its cloud infrastructure very seriously, she said. The company will continue to investigate potential exploits thoroughly and continue to develop features to bolster security for users of its cloud service, she said.

Amazon Web Services has already rolled out safeguards that prevent potential attackers from using the cartography techniques described in the paper, Kinton said without offering any details.

She also pointed to the recently launched Amazon Web Service Multi-Factor Authentication (AWS MFA) as another example of the company’s continuing effort to bolster cloud security. AWS MFA is designed to provide an extra layer of access control to a customer’s Web services account, Kinton said.

Did you catch “…without offering any details” or were you simply overwhelmed by the fact that you can use a token to authenticate your single-key driven AWS console instead?

I’m not interested in getting into a “full disclosure” battle here, but being dismissive, not providing clear-cut answers and being evasive without regard for transparency about issues like this or the DDoS attacks we saw with Bitbucket, etc. are going to backfire.  I posted about this before in previous blogs here and here.

If you want to be taken seriously by large enterprises and government agencies that require real answers to issues like this, you can engage with the security community or ignore us and get focused on by it (and me) until you decide that it’s a much better idea to do the former.  You’ll gain much more credibility and an eagerness to work with you instead of against you if you choose to use the force wisely ;)

Until then, may I suggest this?  I found it in the Amazon.com bookstore:

[Image: beinghonest]

…you can download it to your Kindle in under a minute.

/Hoff


Incomplete Thought: The Cloud Software vs. Hardware Value Battle & Why AWS Is Really A Grid…

October 18th, 2009 beaker 2 comments

Some suggest in discussing the role and long-term sustainable value of infrastructure versus software in cloud that software will marginalize bespoke infrastructure and the latter will simply commoditize.

I find that an interesting assertion, given that it tends to ignore the realities that both hardware and software ultimately suffer from a case of Moore’s Law — from speeds and feeds to the multi-core crisis, this will continue ad infinitum.  It’s important to keep that perspective.

In discussing this, proponents of software domination almost exclusively highlight Amazon Web Services as their lighthouse illustration.  For the purpose of simplicity, let’s focus on compute infrastructure.

Here’s why pointing to Amazon Web Services (AWS) as representative of all cloud offerings in general to anchor the hardware versus software value debate is not a reasonable assertion:

  1. AWS delivers a well-defined set of services designed to be deployed without modification across a massive number of customers; leveraging a common set of standardized capabilities across these customers differentiates the service and enables low cost
  2. AWS enjoys general non-variability in workload from *their* perspective since they offer fixed increments of compute and memory allocation per unit measure of exposed abstracted and virtualized infrastructure resources, so there’s a ceiling on what workloads per unit measure can do. It’s predictable.
  3. From AWS’ perspective (the lens of the provider) regardless of the “custom stuff” running within these fixed-sized containers, the main focus of their core “cloud” infrastructure actually functions like a grid — performing what amounts to a few tasks on a finely-tuned platform to deliver such
  4. This yields the ability for homogeneity in infrastructure and a focus on standardized and globalized power efficient, low cost, and easy-to-replicate components since the problem of expansion beyond a single unit measure of maximal workload capacity is simply a function of scaling out to more of them (or stepping up to one of the next few rungs on the scale-up ladder)

Yup, I just said that AWS is actually a grid whose derivative output is a set of cloud services.

Why does this matter?  Because not all IaaS cloud providers are architected to achieve this — by design — and this has a dramatic impact on where hardware and software, leveraged independently or as a total solution, play in the debate.

This is because AWS built and owns the entire “CloudOS” stack from customized hardware through to the VMM, management and security planes (much as Google does) versus other providers who use what amounts to more generic software offerings from the likes of VMware and lean on APIs and an ecosystem to extend its capabilities as well as big iron to power it.  This will yield more customizable offerings that likely won’t scale as highly as AWS.

That’s because they’re not “grids” and were never designed to be.

Many other IaaS providers that have evolved from hosting are building their next-generation offerings from unified fabric and unified computing platforms (so-called “big iron”) which are the furthest thing from “commodity” hardware you can get.  Further, SaaS and PaaS providers generally tend to do the same based on design goals and business models.  Remember, IaaS is not representative of all things cloud — it’s only one of the service models.

Comparing AWS to most other IaaS cloud providers is a false argument upon which to anchor the hardware versus software debate.

/Hoff


Amazon Web Services: It’s Not The Size Of the Ship, But Rather The Motion Of the…

October 16th, 2009 beaker 3 comments
From Hoff's Preso: Cloudifornication - Indiscriminate Information Intercourse Involving Internet Infrastructure

Carl Brooks (@eekygeeky) gets some fantastic, thought-provoking interviews.  His recent article, in which he interviewed Peter DeSantis, VP of EC2, Amazon Web Services, and which was titled “Amazon would like to remind you where the hype started,” is another great example.

However, this article left a bad taste in my mouth and ultimately invites more questions than it answers. Frankly I felt like there was a large amount of hand-waving in DeSantis’ points that glossed over some very important issues related to security issues of late.

DeSantis’ remarks implied, per the title of the article, that to explain the poor handling and continuing lack of AWS’ transparency related to the issues people like me raise,  the customer is to blame due to hype and overly aggressive, misaligned expectations.

In short, it’s not AWS’ fault they’re so awesome, it’s ours.  However, please don’t remind them they said that when they don’t live up to the hype they help perpetuate.

You can read more about that here: “Transparency: I Do Not Think That Means What You Think That Means…”

I’m going to skip around the article because I do agree with Peter DeSantis on the points he made about the value proposition of AWS which ultimately appear at the end of the article:

“A customer can come into EC2 today and if they have a website that’s designed in a way that’s horizontally scalable, they can run that thing on a single instance; they can use [CloudWatch] to monitor the various resource constraints and the performance of their site overall; they can use that data with our autoscaling service to automatically scale the number of hosts up or down based on demand so they don’t have to run those things 24/7; they can use our Elastic Load Balancer service to scale the traffic coming into their service and only deliver valid requests.”

“All of which can be done self-service, without talking to anybody, without provisioning large amounts of capacity, without committing to large bandwidth contracts, without reserving large amounts of space in a co-lo facility and to me, that’s a tremendously compelling story over what could be done a couple years ago.”

Completely fair.  Excellent way of communicating the AWS value proposition.  I totally agree.  Let’s keep this definition firmly in mind as we go on.
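
For readers who haven’t seen the monitor-then-scale loop DeSantis describes, here is a deliberately tiny sketch of the idea in Python. It is not the CloudWatch or autoscaling API; `desired_instances`, its thresholds and its limits are all hypothetical, and a real policy would add cooldowns and more metrics.

```python
def desired_instances(current, avg_cpu_percent, scale_up_at=70.0,
                      scale_down_at=30.0, minimum=1, maximum=20):
    """Return the instance count a simple threshold policy would ask for."""
    if avg_cpu_percent > scale_up_at:
        target = current + 1      # demand is high: add a host
    elif avg_cpu_percent < scale_down_at:
        target = current - 1      # demand is low: remove a host
    else:
        target = current          # inside the comfort band: do nothing
    # Never go below the floor or above the ceiling the customer set.
    return max(minimum, min(maximum, target))


print(desired_instances(2, 85.0))   # 3
print(desired_instances(2, 10.0))   # 1
print(desired_instances(1, 10.0))   # 1 (already at the floor)
```

Note what the policy cannot do: it reacts to load, not to intent. Legitimate traffic and attack traffic both push the metric over the threshold, which matters for the discussion that follows.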

Here’s where the story turns into something like a confessional that implies AWS is sadly a victim of their own success:

DeSantis said that the reason that stories like the DDOS on Bitbucket.org (and the non-cloud Sidekick story) is because people have come to expect always-on, easily consumable services.

“People’s expectations have been raised in terms of what they can do with something like EC2. I think people rightfully look at the potential of an environment like this and see the tools, the multi-availability zone, the large inbound transit, the ability to scale out and up and fundamentally assume things should be better.” he said.

That’s absolutely true. We look at what you offer (and how you offered/described it above) and we set our expectations accordingly.

We do assume that things should be better as that’s how AWS has consistently marketed the service.

You can’t reasonably expect to bitch about people’s perception of the service based on how it’s “sold” and then turn around when something negative happens and suggest that it’s the consumers’ fault for setting their expectational compass with the course you set.

It *is* absolutely fair to suggest that there is no release from not using common sense, not applying good architectural logic to deployment of services on AWS, but it’s also disingenuous to expect that much of the target market to whom you are selling understands the caveats here when so much is obfuscated by design.  I understand AWS doesn’t say they protect against every threat, but they also do not say they do not…until something happens where that becomes readily apparent ;)

When everything is great AWS doesn’t go around reminding people that bad things can happen, but when bad things happen it’s because of incorrectly-set expectations?

Here’s where the discussion turns to an interesting example —  the BitBucket DDoS issue.

For instance, DeSantis said it would be trivial to wash out standard DDOS attacks by using clustered server instances in different availability zones.

Okay, but four things come to mind:

  1. Why did it take 15 hours for AWS to recognize the DDoS in the first place? (They didn’t actually “detect” it, the customer did)
  2. Why did the “vulnerability” continue to exist for days afterward?
  3. While using different availability zones makes sense, it’s been suggested that this DDoS attack was internal to EC2, not externally-generated
  4. While it *is* good practice and *does* make sense, “clustered server instances in different avail. zones” costs money

Keep those things in the back of your mind for a moment…

“One of the best defenses against any sort of unanticipated spike is simply having available bandwidth. We have a tremendous amount on inbound transit to each of our regions. We have multiple regions which are geographically distributed and connected to the internet in different ways. As a result of that it doesn’t really take too many instances (in terms of hits) to have a tremendous amount of availability – 2,3,4 instances can really start getting you up to where you can handle 2,3,4,5 Gigabytes per second. Twenty instances is a phenomenal amount of bandwidth transit for a customer.” he said.

So again, here’s where I take issue with this “bandwidth solves all” answer. The solution being proposed by DeSantis here is that a customer should be prepared to launch/scale multiple instances in response to a DoS/DDoS, in effect making it the customers’ problem instead of AWS detecting and squelching it in the first place?

Further, when you think of it, the trickle-down effect of DDoS is potentially good for AWS’ business. If they can absorb massive amounts of traffic, then the more instances you have to scale, the better for them given how they charge.  Also, per my point #3 above, it looks as though the attack was INTERNAL to EC2, so ingress transit bandwidth per region might not have done anything to help here.  It’s unclear to me whether this was a distributed DoS attack at all.

Lori MacVittie wrote a great post on this very thing titled “Putting a Price on Uptime” which basically asks who pays for the results of an attack like this:

A lack of ability in the cloud to distinguish illegitimate from legitimate requests could lead to unanticipated costs in the wake of an attack. How do you put a price on uptime and more importantly, who should pay for it?

This is exactly the point I was raising when I first spoke of Economic Denial Of Sustainability (EDoS) here.  All the things AWS speaks to as solutions cost more money…money which many customers, based upon their expectations of AWS’ service, may be unprepared to spend.  They wouldn’t have much better options (if any) if they were hosting it somewhere else, but that’s hardly the point.
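
A back-of-the-envelope sketch of the EDoS point in Python. The prices and the transfer volume are hypothetical placeholders, not AWS rates; the 20 instances and 15 hours are borrowed from the figures quoted earlier in this post purely to have something to plug in.

```python
def attack_cost(extra_instances, hours, price_per_instance_hour,
                extra_gb_transferred, price_per_gb):
    """Estimate the bill for scaling out to absorb unwanted traffic."""
    compute = extra_instances * hours * price_per_instance_hour
    transfer = extra_gb_transferred * price_per_gb
    return compute + transfer


# Assumed inputs: 20 extra instances for 15 hours at a made-up $0.10 per
# instance-hour, plus 1,000 GB of unwanted traffic at a made-up $0.10 per GB.
cost = attack_cost(extra_instances=20, hours=15, price_per_instance_hour=0.10,
                   extra_gb_transferred=1000, price_per_gb=0.10)
print(f"${cost:.2f}")   # $130.00
```

The absolute number is invented and unimportant. What matters is who it lands on: every term in the sum is metered to the customer, and none of it distinguishes wanted traffic from unwanted.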

I quote back to something I tweeted earlier: “The beauty of cloud and infinite scale is that you get the benefits of infinite FAIL.”

The largest DDOS attacks now exceed 40Gbps. DeSantis wouldn’t say what AWS’s bandwidth ceiling was but indicated that a shrewd guesser could look at current bandwidth and hosting costs and what AWS made available, and make a good guess.

The tests done here showed the capability to generate 650 Mbps from a single medium instance that attacked another instance which, per Radim Marek, was using another AWS account in another availability zone.  So if the “largest DDoS attacks now exceed 40 Gbps” and five EC2 instances can handle 5Gb/s, I’d need 40 instances (eight groups of five) to absorb an attack of this scale (unknown if this represents a small or large instance.)  Seems simple, right?
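
Working that arithmetic through in Python, under stated assumptions: DeSantis’ “five instances, 5Gb/s” is read as roughly 1 Gb/s absorbed per instance, and the 650 Mbps figure is reused as an alternative per-instance rate only for comparison (the test measured what an instance could generate, not absorb). `instances_to_absorb` is a hypothetical helper.

```python
import math


def instances_to_absorb(attack_gbps, per_instance_gbps):
    """Smallest whole number of instances whose combined rate covers the attack."""
    return math.ceil(attack_gbps / per_instance_gbps)


# Assumption 1: about 1 Gb/s per instance (five instances handle 5 Gb/s).
print(instances_to_absorb(40, 1.0))    # 40, i.e. eight groups of five

# Assumption 2: about 650 Mb/s per instance, borrowed from the flood test.
print(instances_to_absorb(40, 0.65))   # 62
```

Either way, the answer is dozens of instances running for as long as the attack lasts, which is exactly the bill the EDoS argument is about.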

Again, this is about absorbing bandwidth against these attacks, not preventing them or defending against them.  This is about not only passing the buck but also squeezing more of them out of you, the customer.

“I don’t want to challenge anyone out there, but we are a very, very large environment and I think there’s a lot of data out there that will help you make that case.” he said.

Of course you wish to challenge people, that’s the whole point of your arguments, Peter.

How much bandwidth AWS has is only one part of the issue here.  The other is AWS’ ability to respond to such attacks in reasonable timeframes and prevent them in the first place as part of the service.  That’s a huge part of what I expect from a cloud service.

So let’s do what DeSantis says and set our expectations accordingly.

/Hoff


Transparency: I Do Not Think That Means What You Think That Means…

October 12th, 2009 beaker 5 comments

Ha ha! You fool! You fell victim to one of the classic blunders – The most famous of which is “never get involved in a cloud war in Asia” – but only slightly less well-known is this: “Never go against Werner when availability is on the line!”

As an outsider, it’s easy to play armchair quarterback, point fingers and criticize something as mind-bogglingly marvelous as something the size and scope of Amazon Web Services.  After all, they make all that complexity disappear under the guise of a simple web interface to deliver value, innovation and computing wonderment the likes of which are really unmatched.

There’s an awful lot riding on Amazon’s success.  They set the pace by which an evolving industry is now measured in terms of features, functionality, service levels, security, cost and the way in which they interact with customers and the community of ecosystem partners.

An interesting set of observations and explanations have come out of recent events related to degraded performance, availability and how these events have been handled.

When something bad happens, there are really two ways to play things:

  1. Be as open as possible, as quickly as possible and with as much detail as possible, or
  2. Release information only as needed, when pressured and keep root causes and their resolutions as guarded as possible

This, of course, is an over-simplification of the options, complicated by the need for privacy, protection of intellectual property, legal issues, compliance or security requirements.  That’s not really any different than any other sort of service provider or IT department, but then again, Amazon’s Web Services aren’t like any other sort of service provider or IT department.

So when something bad happens, it’s been my experience as a customer (and one that admittedly does not pay for their “extra service”) that sometimes notifications take longer than I’d like, status updates are not as detailed as I might like and root causes are sometimes cloaked in the air of the mysterious “network connectivity problem” — a replacement for the old corporate stand-by of “blame the firewall.”  There’s an entire industry cropping up to help you with these sorts of things.

Something like the BitBucket DDoS issue however, is not a simple “network connectivity problem.”  It is, however, a problem which highlights an oft-played pantomime of problem resolution involving any “managed” service being provided by a third party to which you as the customer have limited access at various critical points in the stack.

This outage represents a disconnect in experience versus expectation with how customers perceive the operational underpinnings of AWS’ operations and architecture and forces customers to reconsider how all that abstracted infrastructure actually functions in order to deliver what — regardless of what the ToS say — they want to believe it delivers.  This is that perception versus reality gap I mentioned earlier.  It’s not the redonkulous “end-of-cloud” scenarios parroted by the masses of the great un(cloud)washed, but it’s serious nonetheless.

As an example, BitBucket’s woes of 20+ hours of downtime due to UDP (and later TCP) DDoS floods led to the well-documented realization that support was inadequate, monitoring insufficient and security defenses lacking — from the perspective of both the customer and AWS*.  The reality is that based on what we *thought* we knew about how AWS functioned to protect against these sorts of things, these attacks should never have wrought the damage they did.  It seems AWS was equally surprised.

It’s important to note that these were revelations made in near real-time by the customer, not AWS.

Now, this wasn’t a widespread problem, so it’s understandable to a point why we didn’t hear a lot from AWS with regard to this issue. But after this all played out, when we look at what has been disclosed publicly by AWS, it appears the issue is still not remedied, and despite the promise to do better, a follow-on study suggests that the problem may not yet be well understood or solved by AWS (See: Amazon EC2 vulnerable to UDP flood attacks.) (Ed: After I wrote this, I got a notification that this particular issue has been fixed, which is indeed good news.)

Now, releasing details about any vulnerability like this could put many, many customers at risk from similar attacks, but the lack of transparency of service and architecture means that we’re left with more questions than answers. How can a customer (like me) today defend themselves against an attack like this while left in the lurch, not knowing what causes it or how to defend against it? What happens when the next one surfaces?

Can AWS even reliably detect this sort of thing given the “socialist security” implementation of good enough security spread across its constituent customers?

Security by obscurity in cloud cannot last as the gold standard.

This is the interesting part about the black-box abstraction that is Cloud, not just for Amazon, but any massively-scaled service provider; the more abstracted the service, the more dependent upon the provider or third parties we will become to troubleshoot issues and protect our assets.  In many cases, however, it will simply take much more time to resolve issues since visibility and transparency are limited to what the provider chooses or is able to provide.

We’re in the early days still of what customers know to ask about how security is managed in these massively scaled multi-tenant environments and since in some cases we are contractually prevented from exercising tests designed to understand the limits, we’re back to trusting that the provider has it handled…until we determine they don’t.

Put that in your risk management pipe and smoke it.

The network and systems that make up our cloud providers’ offerings must do a better job of stopping bad things from occurring before they reach our instances and workloads, or customers should simply expect that they get what they pay for.  If provider capabilities do not improve, then given less visibility and an inability to deploy compensating controls, we’re potentially in a much worse spot than having no protection at all.

This is another opportunity to quietly remind folks about the Audit, Assertion, Assessment and Assurance API (A6) that is being brought to life; there will hopefully be some exciting news here shortly about this project, but I see A6 as playing a very important role in providing a solution to some of the issues I mention here.  Ready when you are, Amazon.

If only it were so simple and transparent:

Inigo Montoya: You are using Bonetti’s Defense against me, ah?
Man in Black: I thought it fitting considering the rocky terrain.
Inigo Montoya: Naturally, you must suspect me to attack with Capa Ferro?
Man in Black: Naturally… but I find that Thibault cancels out Capa Ferro. Don’t you?
Inigo Montoya: Unless the enemy has studied his Agrippa… which I have.

/Hoff

*It’s only fair to mention that depending upon a single provider for service, no matter how good they may be and not taking advantage of monitoring services (at an extra cost,) is a risk decision that comes with consequences, one of them being longer time to resolution.


Cloud Providers and Security “Edge” Services – Where’s The Beef?

September 30th, 2009 beaker 16 comments

Previously I wrote a post titled “Oh Great Security Spirit In the Cloud: Have You Seen My WAF, IPS, IDS, Firewall…” in which I described the challenges for enterprises moving applications and services to the Cloud while trying to ensure parity in compensating controls, some of which are either not available or suffer from the “virtual appliance” conundrum (see the Four Horsemen presentation on issues surrounding virtual appliances.)

Yesterday I had a lively discussion with Lori MacVittie about the notion of what she described as “edge” service placement of network-based WebApp firewalls in Cloud deployments.  I was curious about the notion of where the “edge” is in Cloud, but assuming it’s at the provider’s connection to the Internet as was suggested by Lori, this brought up the arguments in the post above: how does one roll out compensating controls in Cloud?

The level of difficulty and need to integrate controls (or any “infrastructure” enhancement) definitely depends upon the Cloud delivery model (SaaS, PaaS, and IaaS) chosen and the business problem trying to be solved; SaaS offers the least amount of extensibility from the perspective of deploying controls (you don’t generally have any access to do so) whilst IaaS allows a lot of freedom at the guest level.  PaaS is somewhere in the middle.  None of the models are especially friendly to integrating network-based controls not otherwise supplied by the provider due to what should be pretty obvious reasons — the network is abstracted.

So here’s the rub: if MSSPs/ISPs/ASPs-cum-Cloud operators want to woo mature enterprise customers to use their services, they are leaving money on the table and not fulfilling customer needs by failing to roll out complementary security capabilities which lessen the compliance and security burdens of their prospective customers.

While many provide commoditized solutions such as anti-spam and anti-virus capabilities, more complex (but profoundly important) security services such as DLP (data loss/leakage prevention), WAF, Intrusion Detection and Prevention (IDP), XML Security, Application Delivery Controllers, VPNs, etc. should also be considered for roadmaps by these suppliers.

Think about it: if the chief concern in Cloud environments is security around multi-tenancy and isolation, giving customers more comfort besides “trust us” has to be a good thing.  If I knew where and by whom my data was being accessed or used, I would feel more comfortable.

Yes, it’s difficult to do properly and in many cases means the Cloud provider has to make a substantial investment in delivery platforms and management/support integration to get there.  This is why niche players who target specific verticals (especially those heavily regulated) will ultimately have the upper hand in some of these scenarios – it’s not socialist security where “good enough” is spread around evenly.  Services like these need to be configurable (SELF-SERVICE!) by the consumer.

An example? How about Google: where’s DLP integrated into the messaging/apps platforms?  Amazon AWS: where’s IDP integrated into the VMM for introspection?

I wrote a couple of interesting posts about this (that may show up in the automated related posts lists below):

My customers in the Fortune 500 complain constantly that the biggest providers they are being pressured to consider for Cloud services aren’t listening to these requests — or aren’t in a position to respond.

That’s bad for everyone.

So how about it? Are services like DLP, IDP, WAF integrated into your Cloud providers’ offerings something you’d like to see rather than having to add additional providers as brokers and add complexity and cost back into Cloud?

/Hoff


Google & AWS: Just Goes To Prove You Can Have Your Cloud and, um, Eat It Too…

September 25th, 2009 beaker 4 comments

…and by “eat it” I mean it how you think I mean it.  I feel for these guys; they have big targets on their backs, but that’s what happens when you’re a market leader.

To wit, there are two polarized views expressed every time Google or Amazon have an outage or service interruption given that both are constantly held up as the poster children for Cloud Computing:

  1. Cloud Computing isn’t ready for prime time; if Google or Amazon can go down, why/how can I trust them with my most critical assets!?
  2. Google and Amazon are just service providers; service providers have issues.  This isn’t a Cloud issue, it’s just a service issue.

The truth is somewhere in the middle.

Here’s my $0.02.  You may not like it.  Refunds will be processed by mail.

If you market yourself as the shit, you can expect some back when it hits the fan:

From Hoff's Preso: Cloudifornication - Indiscriminate Information Intercourse Involving Internet Infrastructure

Stop apologizing and live up to the hype you’re helping create.

/Hoff


Calling All Private Cloud Haters: Amazon Just Peed On Your Fire Hydrant…

August 26th, 2009 beaker 15 comments

Werner Vogels brought a smile to my face today with his blog titled “Seamlessly Extending the Data Center – Introducing Amazon Virtual Private Cloud.”  In short:

We have developed Amazon Virtual Private Cloud (Amazon VPC) to allow our customers to seamlessly extend their IT infrastructure into the cloud while maintaining the levels of isolation required for their enterprise management tools to do their work.

In one fell swoop, AWS has:

  • Legitimized Private Cloud as a reasonable, needed, and prudent step toward Cloud adoption for enterprises,
  • Substantiated the value proposition of Private Cloud as a way of removing a barrier to Cloud entry for enterprises, and
  • Validated the ultimate vision toward hybrid Clouds and Inter-Cloud

They made this announcement from the vantage point of operating as a Public Cloud provider — in many cases THE Public Cloud provider of choice for those arguing from an exclusionary perspective that Public Cloud is the only way forward.

Now, AWS’ position on Private Cloud is pretty clear; straight from the horse’s mouth, Werner says “Private Cloud is not the Cloud” (see below), but it’s also clear they’re willing to sell you some ;)

The cost for VPC isn’t exorbitant, but it’s not free, either, so the business case is clearly there (see the official VPC site): VPN connectivity is $0.05 per VPN connection-hour, with data transfer at $0.10 per GB inbound and from $0.17 down to $0.10 per GB outbound depending upon volume (with heavy data replication or intensive workloads, people are going to need to watch the odometer.)
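
To put the odometer comment in numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the rates quoted above and assumes the $0.05 VPN charge accrues for each hour a connection is up. The outbound volume tiers aren't spelled out here, so the outbound rate is a parameter you pick from the quoted $0.10 to $0.17 range rather than something the code derives. The function name and the sample workload are hypothetical.

```python
from decimal import Decimal

# Rates quoted in the post (USD). The billing unit for the VPN charge
# (per hour a connection is up) is an assumption of this sketch.
VPN_RATE_PER_HOUR = Decimal("0.05")
INBOUND_RATE_PER_GB = Decimal("0.10")
OUTBOUND_RATE_MAX = Decimal("0.17")  # highest quoted outbound rate
OUTBOUND_RATE_MIN = Decimal("0.10")  # lowest quoted outbound rate


def estimate_monthly_cost(hours_up, gb_in, gb_out,
                          outbound_rate=OUTBOUND_RATE_MAX):
    """Return a rough monthly bill in USD for one VPN connection.

    outbound_rate must be a Decimal inside the quoted range; the tier
    boundaries are not modeled because the post does not give them.
    """
    if not OUTBOUND_RATE_MIN <= outbound_rate <= OUTBOUND_RATE_MAX:
        raise ValueError("outbound rate outside the quoted $0.10-$0.17 range")
    vpn = VPN_RATE_PER_HOUR * hours_up
    inbound = INBOUND_RATE_PER_GB * gb_in
    outbound = outbound_rate * gb_out
    return vpn + inbound + outbound


if __name__ == "__main__":
    # Hypothetical workload: connection up all month (720 hours),
    # 500 GB replicated in, 200 GB pulled back out at the top rate.
    print(estimate_monthly_cost(720, 500, 200))
```

The connection charge is a fixed floor; with any serious replication the per-GB transfer lines dominate, which is why the outbound rate is the knob worth watching.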

I’m going to highlight a couple of nuggets from his post:

We continuously listen to our customers to make sure our roadmap matches their needs. One important piece of feedback that mainly came from our enterprise customers was that the transition to the cloud of more complex enterprise environments was challenging. We made it a priority to address this and have worked hard in the past year to find new ways to help our customers transition applications and services to the cloud, while protecting their investments in their existing IT infrastructure. …

Private Cloud Is Not The Cloud – These CIOs know that what is sometimes dubbed “private cloud” does not meet their goal as it does not give them the benefits of the cloud: true elasticity and capex elimination. Virtualization and increased automation may give them some improvements in utilization, but they would still be holding the capital, and the operational cost would still be significantly higher.

We have been listening very closely to the real requirements that our customers have and have worked closely with many of these CIOs and their teams to understand what solution would allow them to treat the cloud as a seamless extension of their datacenter, where their standard management practices can be applied with limited or no modifications. This needs to be a solution where they get all the benefits of cloud as mentioned above [Ed: eliminates cost, elastic, removes "undifferentiated heavy lifting"] while treating it as a part of their datacenter.

We have developed Amazon Virtual Private Cloud (Amazon VPC) to allow our customers to seamlessly extend their IT infrastructure into the cloud while maintaining the levels of isolation required for their enterprise management tools to do their work.

With Amazon VPC you can:

  • Create a Virtual Private Cloud and assign an IP address block to the VPC. The address block needs to be CIDR block such that it will be easy for your internal networking to route traffic to and from the VPC instance. These are addresses you own and control, most likely as part of your current datacenter addressing practice.
  • Divide the VPC addressing up into subnets in a manner that is convenient for managing the applications and services you want run in the VPC.
  • Create a VPN connection between the VPN Gateway that is part of the VPC instance and an IPSec-based VPN router on your own premises. Configure your internal routers such that traffic for the VPC address block will flow over the VPN.
  • Start adding AWS cloud resources to your VPC. These resources are fully isolated and can only communicate to other resources in the same VPC and with those resources accessible via the VPN router. Accessibility of other resources, including those on the public internet, is subject to the standard enterprise routing and firewall policies.
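
[Ed: To make the first two steps above concrete, here is a minimal planning sketch using Python's standard ipaddress module. It does not call any AWS API; it only shows what carving a CIDR block into subnets looks like. The 10.20.0.0/16 block, the /24 subnet size and the web/app/db names are all hypothetical stand-ins for whatever your own addressing practice dictates.]

```python
import ipaddress

# Hypothetical address block; substitute a range you own and route internally.
vpc_block = ipaddress.ip_network("10.20.0.0/16")

# Split the block into /24 subnets and name the first few by function.
subnets = list(vpc_block.subnets(new_prefix=24))
layout = dict(zip(["web", "app", "db"], subnets))


def in_vpc(address):
    """True if this address falls in the VPC block, i.e. traffic for it
    would be routed over the VPN under the setup described above."""
    return ipaddress.ip_address(address) in vpc_block


if __name__ == "__main__":
    for name, net in layout.items():
        print(name, net)
    print(in_vpc("10.20.5.9"), in_vpc("192.168.1.1"))
```

The same membership check is effectively what your internal routers perform once they are configured to send the VPC address block over the VPN.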

Amazon VPC offers customers the best of both the cloud and the enterprise managed data center:

  • Full flexibility in creating a network layout in the cloud that complies with the manner in which IT resources are managed in your own infrastructure.
  • Isolating resources allocated in the cloud by only making them accessible through industry standard IPSec VPNs.
  • Familiar cloud paradigm to acquire and release resources on demand within your VPC, making sure that you only use those resources you really need.
  • Only pay for what you use. The resources that you place within a VPC are metered and billed using the familiar pay-as-you-go approach at the standard pricing levels published for all cloud customers. The creation of VPCs, subnets and VPN gateways is free of charge. VPN usage and VPN traffic are also priced at the familiar usage based structure

All the benefits from the cloud with respect to scalability and reliability, freeing up your engineers to work on things that really matter to your business.

Jeff Barr did a great job of giving a little more detail on his blog but also brought up a couple of points I need to noodle on from a security perspective:

Because the VPC subnets are used to isolate logically distinct functionality, we’ve chosen not to immediately support Amazon EC2 security groups. You can launch your own AMIs and most public AMIs, including Microsoft Windows AMIs. You can’t launch Amazon DevPay AMIs just yet, though.

The Amazon EC2 instances are on your network. They can access or be accessed by other systems on the network as if they were local. As far as you are concerned, the EC2 instances are additional local network resources – there is no NAT translation. EC2 instances within a VPC do not currently have Internet-facing IP addresses.

We’ve confirmed that a variety of Cisco and Juniper hardware/software VPN configurations are compatible; devices meeting our requirements as outlined in the box at right should be compatible too. We also plan to support Software VPNs in the near future.

The notion of the VPC and associated VPN connectivity coupled with the “software VPN” statement above reminds me of Cohesive F/T’s VPN-Cubed solution.  While this is an IaaS-focused discussion, it’s only fair to bring up Google’s Secure Data Connector that was announced some moons ago from a SaaS/PaaS perspective, too.

I would be remiss in my musings were I not to also suggest that Cloud brokers and Cloud service providers such as RightScale, GoGrid, Terremark, etc. were on the right path in responding to customers’ needs well before this announcement.

Further, it should be noted that now that the 800lb Gorilla has planted a flag, this will bring up all sorts of additional auditing and compliance questions, as any sort of broad connectivity into and out of security zones and asset groupings always does.  See the PCI debate (How to Be PCI Compliant In the Cloud.)

At the end of the day, this is a great step forward, one I am happy to say that I’ve been talking about and presenting (see my Frogs presentation) for the last two years.

/Hoff
