Amazon Web Services: It’s Not The Size Of the Ship, But Rather The Motion Of the…
Carl Brooks (@eekygeeky) gets some fantastic, thought-provoking interviews. His recent article wherein he interviewed Peter DeSantis, VP of EC2, Amazon Web Services, was titled: “Amazon would like to remind you where the hype started” is another great example.
However, this article left a bad taste in my mouth and ultimately invites more questions than it answers. Frankly I felt like there was a large amount of hand-waving in DeSantis’ points that glossed over some very important issues related to security issues of late.
DeSantis’ remarks implied, per the title of the article, that to explain the poor handling and continuing lack of AWS’ transparency related to the issues people like me raise, the customer is to blame due to hype and overly aggressive, misaligned expectations.
In short, it’s not AWS’ fault they’re so awesome, it’s ours. However, please don’t remind them they said that when they don’t live up to the hype they help perpetuate.
You can read more about that here “Transparency: I Do Not Think That Means What You Think That Means…”
I’m going to skip around the article because I do agree with Peter DeSantis on the points he made about the value proposition of AWS which ultimately appear at the end of the article:
“A customer can come into EC2 today and if they have a website that’s designed in a way that’s horizontally scalable, they can run that thing on a single instance; they can use [CloudWatch to monitor the various resource constraints and the performance of their site overall; they can use that data with our autoscaling service to automatically scale the number of hosts up or down based on demand so they don’t have to run those things 24/7; they can use our Elastic Load Balancer service to scale the traffic coming into their service and only deliver valid requests.”
“All of which can be done self-service, without talking to anybody, without provisioning large amounts of capacity, without committing to large bandwidth contracts, without reserving large amounts of space in a co-lo facility and to me, that’s a tremendously compelling story over what could be done a couple years ago.”
Completely fair. Excellent way of communicating the AWS value proposition. I totally agree. Let’s keep this definitional firmly in mind as we go on.
Here’s where the story turns into something like a confessional that implies AWS is sadly a victim of their own success:
DeSantis said that the reason that stories like the DDOS on Bitbucket.org (and the non-cloud Sidekick story) is because people have come to expect always-on, easily consumable services.
“People’s expectations have been raised in terms of what they can do with something like EC2. I think people rightfully look at the potential of an environment like this and see the tools, the multi- availability zone, the large inbound transit, the ability to scale out and up and fundamentally assume things should be better. “ he said.
That’s absolutely true. We look at what you offer (and how you offered/described it above) and we set our expectations accordingly.
We do assume that things should be better as that’s how AWS has consistently marketed the service.
You can’t reasonably expect to bitch about people’s perception of the service based on how it’s “sold” and then turn around when something negative happens and suggest that it’s the consumers’ fault for setting their expectational compass with the course you set.
It *is* absolutely fair to suggest that there is no release from not using common sense, not applying good architectural logic to deployment of services on AWS, but it’s also disingenuous to expect much of the target market to whom you are selling understands the caveats here when so much is obfuscated by design. I understand AWS doesn’t say they protect against every threat, but they also do not say they do not…until something happens where that becomes readily apparent 😉
When everything is great AWS doesn’t go around reminding people that bad things can happen, but when bad things happen it’s because of incorrectly-set expectations?
Here’s where the discussion turns to an interesting example — the BitBucket DDoS issue.
For instance, DeSantis said it would be trivial to wash out standard DDOS attacks by using clustered server instances in different availability zones.
Okay, but four things come to mind:
- Why did it take 15 hours for AWS to recognize the DDoS in the first place? (They didn’t actually “detect” it, the customer did)
- Why did the “vulnerability” continue to exist for days afterward?
- While using different availability zones makes sense, it’s been suggested that this DDoS attack was internal to EC2, not externally-generated
- While it *is* good practice and *does* make sense, “clustered server instances in different avail. zones, costs money
Keep those things in the back of your mind for a moment…
“One of the best defenses against any sort of unanticipated spike is simply having available bandwidth. We have a tremendous amount on inbound transit to each of our regions. We have multiple regions which are geographically distributed and connected to the internet in different ways. As a result of that it doesn’t really take too many instances (in terms of hits) to have a tremendous amount of availability – 2,3,4 instances can really start getting you up to where you can handle 2,3,4,5 Gigabytes per second. Twenty instances is a phenomenal amount of bandwidth transit for a customer.” he said.
So again, here’s where I take issue with this “bandwidth solves all” answer. The solution being proposed by DeSantis here is that a customer should be prepared to launch/scale multiple instances in response to a DoS/DDoS, in effect making it the customers’ problem instead of AWS detecting and squelching it in the first place?
Further, when you think of it, the trickle-down effect of DDoS is potentially good for AWS’ business. If they can absorb massive amounts of traffic, then the more instances you have to scale, the better for them given how they charge. Also, per my point #3 above, it looks as though the attack was INTERNAL to EC2, so ingress transit bandwidth per region might not have done anything to help here. It’s unclear to me whether this was a distributed DoS attack at all.
Lori MacVittie wrote a great post on this very thing titled “Putting a Price on Uptime” which basically asks who pays for the results of an attack like this:
“A lack of ability in the cloud to distinguish illegitimate from legitimate requests could lead to unanticipated costs in the wake of an attack. How do you put a price on uptime and more importantly, who should pay for it?“
This is exactly the point I was raising when I first spoke of Economic Denial Of Sustainability (EDoS) here. All the things AWS speaks to as solutions cost more money…money which many customers based upon their expectations of AWS’ service, may be unprepared to spend. They wouldn’t have much better options (if any) if they were hosting it somewhere else, but that’s hardly the point.
I quote back to something I tweeted earlier “The beauty of cloud and infinite scale is that you get the benefits of infinite FAIL”
The largest DDOS attacks now exceed 40Gbps. DeSantis wouldn’t say what AWS’s bandwidth ceiling was but indicated that a shrewd guesser could look at current bandwidth and hosting costs and what AWS made available, and make a good guess.
The tests done here showed the capability to generate 650 Mbps from a single medium instance that attacked another instance which, per Radim Marek, was using another AWS account in another availability zone. So if the “largest” DDoS attacks now exceed 40 Gbps” and five EC2 instances can handle 5Gb/s, I’d need 8 instances to absorb an attack of this scale (unknown if this represents a small or large instance.) Seems simple, right?
Again, this about absorbing bandwidth against these attacks, not preventing them or defending against them. This is about not only passing the buck by squeezing more of them out of you, the customer.
“ I don’t want to challenge anyone out there, but we are very, very large environment and I think there’s a lot of data out there that will help you make that case.” he said.
Of course you wish to challenge people, that’s the whole point of your arguments, Peter.
How much bandwidth AWS has is only one part of the issue here. The other is AWS’ ability to respond to such attacks in reasonable timeframes and prevent them in the first place as part of the service. That’s a huge part of what I expect from a cloud service.
So let’s do what DeSantis says and set our expectations accordingly.