Archive for the ‘A6’ Category

Dear Verizon Business: I Have Some Questions About Your PCI-Compliant Cloud…

August 24th, 2010 5 comments

You’ll forgive my impertinence, but the last time I saw a similar claim of a PCI-compliant Cloud offering, it turned out rather anti-climactically for RackSpace/Mosso, so I just want to make sure I understand what is really being said.  I may be mixing things up in asking my questions, so hopefully someone can shed some light.

This press release announces that:

“…Verizon’s On-Demand Cloud Computing Solution First to Achieve PCI Compliance” and the company’s cloud computing solution called Computing as a Service (CaaS) which is “…delivered from Verizon cloud centers in the U.S. and Europe, is the first cloud-based solution to successfully complete the Payment Card Industry Data Security Standard (PCI DSS) audit for storing, processing and transmitting credit card information.”

It’s unclear to me (at least) what’s considered in scope and what level/type of PCI certification we’re talking about here since it doesn’t appear that the underlying offering itself is merchant or transactional in nature, but rather Verizon is operating as a service provider that stores, processes, and transmits cardholder data on behalf of another entity.

Here’s what the article says about what Verizon undertook for DSS validation:

To become PCI DSS-validated, Verizon CaaS underwent a comprehensive third-party examination of its policies, procedures and technical systems, as well as an on-site assessment and systemwide vulnerability scan.

I’m interested in the underlying mechanicals of the CaaS offering.  Specifically, it would appear that the platform — compute, network, and storage — is virtualized.  What is unclear is whether the [physical] resources allocated to a customer are dedicated or shared (multi-tenant), regardless of virtualization.

According to this article in The Register (dated 2009), the infrastructure is composed like this:

The CaaS offering from Verizon takes x64 server from Hewlett-Packard and slaps VMware’s ESX Server hypervisor and Red Hat Enterprise Linux instances atop it, allowing customers to set up and manage virtualized RHEL partitions and their applications. Based on the customer portal screen shots, the CaaS service also supports Microsoft’s Windows Server 2003 operating system.

Some details emerge from the Verizon website that describes the environment more:

Every virtual farm comes securely bundled with a virtual load balancer, a virtual firewall, and defined network space. Once the farm is designed, built, and named – all in a matter of minutes through the CaaS Customer Management Portal – you can then choose whether you want to manage the servers in-house or have us manage them for you.

If the customer chooses to manage the “servers…in-house (sic)” is the customer’s network, staff and practices now in-scope as part of Verizon’s CaaS validation? Where does the line start/stop?

I’m very interested in the virtual load balancer (Zeus ZXTM perhaps?) and the virtual firewall (vShield? Altor? Reflex? VMsafe-API enabled Virtual Appliance?)  What about other controls (preventive or detective, such as IDS, IPS, AV, etc.)?

The reason for my interest is how, if these resources are indeed shared, they are partitioned/configured and kept isolated, especially in light of the fact that:

Customers have the flexibility to connect to their CaaS environment through our global IP backbone or by leveraging the Verizon Private IP network (our Layer 3 MPLS VPN) for secure communication with mission critical and back office systems.

It’s clear that Verizon has no dominion over what’s contained in the VM’s atop the hypervisor, but what about the network to which these virtualized compute resources are connected?

So for me, this all comes down to scope. I’m trying to figure out what is actually included in this certification, and what components in the stack were audited and how.  It’s not clear I’m going to get answers, but I thought I’d ask anyway.

Oh, by the way, transparency and auditability would be swell for an environment such as this. How about CloudAudit? We even have a PCI DSS CompliancePack 😉
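For the curious, here is a rough sketch of what a PCI-oriented CloudAudit namespace might look like from the consumer side. The directory paths, entry names and assertion values below are purely illustrative placeholders (the real CompliancePack layout is defined by the working group), but they convey the idea of machine-readable evidence mapped to DSS requirements:

```python
# Illustrative only: hypothetical CloudAudit-style namespace entries for a
# PCI DSS CompliancePack. Real entry names are defined by the working group.
pci_namespace = {
    "compliance/pcidss/1.2/req-1/firewall-config.pdf": "network diagram + ruleset export",
    "compliance/pcidss/1.2/req-6.6/waf.assertion": "waf=true",
    "compliance/pcidss/1.2/req-10/log-retention.assertion": "retention_days=365",
    "compliance/pcidss/1.2/qsa-report/roc-summary.pdf": "signed Report on Compliance excerpt",
}

def missing_entries(namespace: dict, required: list) -> list:
    """Return the required entries a provider has not (yet) published."""
    return [path for path in required if path not in namespace]

required_for_review = [
    "compliance/pcidss/1.2/req-6.6/waf.assertion",
    "compliance/pcidss/1.2/qsa-report/roc-summary.pdf",
]
print(missing_entries(pci_namespace, required_for_review))  # -> []
```

The point being that a QSA, or a tool acting on their behalf, could check for the presence and content of such entries instead of chasing documents around by email.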

Question for my QSA peeps: Are service providers required to also adhere to sections like 6.6 (WAF/Binary analysis) for their offerings even if they are not acting as a merchant?

/Hoff


More On High Assurance (via TPM) Cloud Environments

April 11th, 2010 14 comments

Back in September 2009 after presenting at the Intel Virtualization (and Cloud) Security Summit and urging Intel to lead by example by pushing the adoption and use of TPM in virtualization and cloud environments, I blogged a simple question (here) as to the following:

Does anyone know of any Public Cloud Provider (or Private for that matter) that utilizes Intel’s TXT?

Interestingly the replies were few; mostly they were along the lines of “we’re considering it,” “…it’s on our long radar,” or “…we’re unclear if there’s a valid (read: economically viable) use case.”

At this year’s RSA Security Conference, however, EMC/RSA, Intel and VMware made an announcement regarding a PoC of their “Trusted Cloud Infrastructure,” describing efforts to utilize technology across the three vendors’ portfolios to make use of the TPM:

The foundation for the new computing infrastructure is a hardware root of trust derived from Intel Trusted Execution Technology (TXT), which authenticates every step of the boot sequence, from verifying hardware configurations and initialising the BIOS to launching the hypervisor, the companies said.

Once launched, the VMware virtualisation environment collects data from both the hardware and virtual layers and feeds a continuous, raw data stream to the RSA enVision Security Information and Event Management platform. The RSA enVision is engineered to analyse events coming through the virtualisation layer to identify incidents and conditions affecting security and compliance.

The information is then contextualised within the Archer SmartSuite Framework, which is designed to present a unified, policy-based assessment of the organisation’s security and compliance posture through a central dashboard, RSA said.

It should be noted that in order to take advantage of said solution, the following components are required: a future release of RSA’s Archer GRC console, the upcoming Intel Westmere CPU and a soon-to-be-released version of VMware’s vSphere.  In other words, this isn’t available today and will require upgrades up and down the stack.

Sam Johnston today pointed me toward an announcement from Enomaly referencing the “High Assurance Edition” of ECP, which claims assurance via the TPM beyond the boundary of the VMM, extending to the guest OS and the management system:

Enomaly’s Trusted Cloud platform provides continuous security assurance by means of unique, hardware-assisted mechanisms. Enomaly ECP High Assurance Edition provides both initial and ongoing Full-Stack Integrity Verification to enable customers to receive cryptographic proof of the correct and secure operation of the cloud platform prior to running any application on the cloud.

Full-Stack Integrity Verification provides the customer with hardware-verified proof that the cloud stack (encompassing server hardware, hypervisor, guest OS, and even ECP itself) is intact and has not been tampered with. Specifically, the customer obtains cryptographically verifiable proof that the hardware, hypervisor, etc. are identical to reference versions that have been certified and approved in advance. The customer can therefore be assured, for example, that:

  • The hardware has not been modified to duplicate data to some storage medium of which the application is not aware
  • No unauthorized backdoors have been inserted into the cloud management system
  • The hypervisor has not been modified (e.g. to copy memory state)
  • No hostile kernel modules have been injected into the guest OS

This capability therefore enables customers to deploy applications to public clouds with confidence that the confidentiality and integrity of their data will not be compromised.
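To make the “full-stack integrity” idea a bit more concrete, here is a minimal, purely illustrative sketch of the verification step: measured values (think TPM PCR contents covering firmware, hypervisor and guest components) are compared against reference measurements certified in advance. This is not Enomaly’s or Intel’s implementation, and a real remote-attestation exchange also involves a nonce and a quote signed by the TPM’s attestation identity key, but the basic customer-side check looks something like this:

```python
import hashlib

# Reference ("golden") measurements certified in advance. The values here are
# fabricated placeholders standing in for real PCR digests.
GOLDEN = {
    "firmware":   hashlib.sha1(b"approved firmware image").hexdigest(),
    "hypervisor": hashlib.sha1(b"approved hypervisor build").hexdigest(),
    "guest_os":   hashlib.sha1(b"approved guest kernel + modules").hexdigest(),
}

def tampered_components(reported: dict, golden: dict) -> list:
    """Return components whose reported measurement differs from the golden value.
    An empty list means the measured stack matches what was certified in advance."""
    return [name for name, digest in golden.items() if reported.get(name) != digest]

# Simulated quote from the platform under test, with one tampered component.
reported = dict(GOLDEN)
reported["guest_os"] = hashlib.sha1(b"guest kernel + injected module").hexdigest()

print(tampered_components(reported, GOLDEN))  # -> ['guest_os']
```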

Of particular interest was Enomaly’s enticement of service providers with the following claim:

…with Enomaly’s patented security functionality, can deliver a highly secure Cloud Computing service – commanding a higher price point than commodity public cloud providers.

I’m looking forward to exploring more regarding these two example solutions as they see the light of day (and how long this will take given the need for platform-specific upgrades up and down the stack) as well as whether or not customers are actually willing to pay — and providers can command — a higher price point for what these components may offer.  You can bet certain government agencies are interested.

There are potentially numerous benefits to the use of this technology, including security, compliance, assurance, audit and attestation capabilities (I hope also to incorporate more of what this might mean into the CloudAudit/A6 effort), but I’m very interested in the implications for (change) management and policy, especially across heterogeneous environments and the extension and use of TPMs across mobile platforms.

Of course, researchers are interested in these things too…see Rutkowska, et al. and “Attacking Intel Trusted Execution Technology” as an example.

/Hoff


Good Interview/Resource Regarding CloudAudit from SearchCloudComputing…

April 6th, 2010 No comments

The guys from SearchCloudComputing gave me a ring and we chatted about CloudAudit. The interview that follows is a distillation of that discussion and goes a long way toward answering many of the common questions surrounding CloudAudit/A6.  You can find the original here.

What are the biggest challenges when auditing cloud-based services, particularly for the solution providers?

Christofer Hoff: One of the biggest issues is their lack of understanding of how the cloud differs from traditional enterprise IT. They’re learning as quickly as their customers are. Once they figure out what to ask and potentially how to ask it, there is the issue surrounding, in many cases, the lack of transparency on the part of the provider to be able to actually provide consistent answers across different cloud providers, given the various delivery and deployment models in the cloud.

How does the cloud change the way a traditional audit would be carried out?

Hoff: For the most part, a good amount of the questions that one would ask specifically surrounding the infrastructure is abstracted and obfuscated. In many cases, a lot of the moving parts, especially as they relate to the potential to being competitive differentiators for that particular provider, are simply a black box into which operationally you’re not really given a lot of visibility or transparency.
If you were to host in a colocation provider, where you would typically take a box, the operating system and the apps on top of it, you’d expect, given who controls what and who administers what, to potentially see a lot more, as well as there to be a lot more standardization of those deployed solutions, given the maturity of that space.

How did CloudAudit come about?

Hoff: I organized CloudAudit. We originally called it A6, which stands for Automated Audit, Assertion, Assessment, and Assurance API. And as it stands now, it’s less in its first iteration about an API, and more specifically just about a common namespace and interface by which you can use simple protocols with good authentication to provide access to a lot of information that essentially can be automated in ways that you can do all sorts of interesting things with.

How does it work exactly?

Hoff: What we wanted to do is essentially keep it very simple, very lightweight and easy to implement without cloud providers having to make a lot of programmatic changes. Although we’re not prescriptive about how they do it (because each operation is different), we expect them to figure out how they’re going to get the information into this namespace, which essentially looks like a directory structure.

This kind of directory/namespace is really just an organized repository. We don’t care what is contained within those directories: .pdf, text documents, links to other websites. It could be a .pdf of a SAS 70 report with a signature that refers back to the issuing governing body. It could be logs, it could be assertions such as firewall=true. The whole point here is to allow these providers to agree upon the common set of minimum requirements.
We have aligned the first set of compliance-driven namespaces to that of the Cloud Security Alliance’s compliance control-mapping tool. So the first five namespaces pretty much run the gamut of what you expect to see most folks concentrating on in terms of compliance: PCI DSS, HIPAA, COBIT, ISO 27002 and NIST 800-53…Essentially, we’re looking at both starting with those five compliance frameworks, and allowing cloud providers to set up generic infrastructure-focused or operational namespaces also. So things that aren’t specific to a compliance framework, but that you may find of interest if you’re a consumer, auditor, or provider.
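As a rough illustration of how lightweight the consumer side is intended to be, the sketch below pulls a couple of entries over plain HTTPS from a hypothetical provider. The hostname, paths and assertion format are invented for the example, since the normative namespace names come from the working group’s mapping to the CSA controls:

```python
import urllib.request

# Hypothetical provider endpoint and namespace paths -- illustrative only.
BASE = "https://cloudaudit.example-provider.net/.well-known/cloudaudit"
ENTRIES = [
    "compliance/pcidss/1.2/req-1/firewall.assertion",
    "compliance/iso27002/10.10/logging.assertion",
]

def fetch_assertion(path: str) -> str:
    """Fetch one namespace entry over HTTPS; real deployments would layer
    authentication (HTTP auth, client certificates, etc.) on top."""
    with urllib.request.urlopen(f"{BASE}/{path}", timeout=10) as resp:
        return resp.read().decode("utf-8").strip()

if __name__ == "__main__":
    for entry in ENTRIES:
        try:
            print(entry, "->", fetch_assertion(entry))
        except OSError as exc:  # the example host does not exist
            print(entry, "-> unavailable:", exc)
```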

Who are the participants in CloudAudit?

Hoff: We have pretty much the largest cloud providers, as well as virtualization platform and cloud platform providers, on the planet. We’ve got end users, auditors, system integrators. You can get the list off of the CloudAudit website. There are folks from CSC, Stratus, Akamai, Microsoft, VMware, Google, Amazon Web Services, Savvis, Terremark, Rackspace, etc.

What are your short-term and long-term goals?

Hoff: Short-term goals are those that we are already trucking toward: to get this utilized as a common standard by which cloud providers, regardless of location — that could be internal private cloud or could be public cloud — essentially agree on the same set of standards by which consumers or interested parties can pull for information.

In the long-term, we wish to be able to improve visibility and transparency, which will ultimately drive additional market opportunities because, for example, if you have various levels of authentication, anywhere from anonymous to system administrator to auditor to fully trusted third party, you can imagine there’ll be a subset of anonymized information available that would actually allow a cloud broker or consumer to poll multiple cloud providers and actually make decisions based upon those assertions as to whether or not they want to do business with that cloud provider.

…It gives you an opportunity to shop wisely and ultimately compare services, or allow that to be done in an automated fashion. And while CloudAudit does not seek to make an actual statement regarding compliance, you will ultimately be provided with enough information to allow either automated tools or at least auditors to get back to the business of auditing rather than data collection. Because this data gathering can be automated, it means that instead of having a PCI audit once every year, or every 6 months, you can have it on a schedule that is much more temporal and on-demand.
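Carrying that thought one step further, here is a toy sketch of the broker logic described above: given assertion sets already pulled from each provider’s namespace (the provider names, assertion keys and values are hypothetical), a consumer or broker can shortlist the providers that meet its minimum requirements:

```python
# Toy broker logic over assertions already pulled from each provider's namespace.
# Provider names, assertion keys and values are hypothetical.
providers = {
    "provider-a": {"firewall": "true", "pcidss_validated": "true",  "audit_age_days": "30"},
    "provider-b": {"firewall": "true", "pcidss_validated": "false", "audit_age_days": "200"},
}

requirements = {"firewall": "true", "pcidss_validated": "true"}

def acceptable(assertions: dict, required: dict, max_audit_age: int = 90) -> bool:
    """A provider qualifies if every required assertion matches and its most
    recent audit evidence is fresh enough."""
    if any(assertions.get(key) != value for key, value in required.items()):
        return False
    return int(assertions.get("audit_age_days", "9999")) <= max_audit_age

shortlist = [name for name, facts in providers.items() if acceptable(facts, requirements)]
print(shortlist)  # -> ['provider-a']
```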

What will solution providers and resellers be able to take from it? How is it to their benefit to get involved?

Hoff: The cloud service providers themselves, for the most part, are seeing this as a tremendous opportunity to not only reduce cost, but also make this information more visible and available…The reality is, in many cases, to be frank, folks that make a living auditing actually spend the majority of their time in data collection rather than actually looking at and providing good, actual risk management, risk assessment and/or true interpretation of the actual data. Now the automation of that, whether it’s done on a standard or on an ad-hoc basis, could clearly put a crimp in their ability to collect revenues. So the whole point here is their “value-add” needs to be about helping customers to actually manage risk appropriately vs. just kind of becoming harvesters of information. It behooves them to make sure that the type of information being collected is in line with the services they hope to produce.

What needs to be done for this to become an industry standard?

Hoff: We’ve already written a normative spec that we hope to submit to the IETF. We have cross-section representation across industry, we’re building namespaces, specifications, and those are not done in the dark. They’re done with a direct contribution of the cloud providers themselves, because they understand how important it is to get this information standardized. Otherwise, you’re going to be having ad-hoc comparisons done of services which may not portray your actual security services capabilities or security posture accurately. We have a huge amount of interest, a good amount of participation, and a lot of alliances that are already bubbling with other cloud standards.

Cloud computing changes the game for many security services, including vulnerability management, penetration testing and data protection/encryption, not just audits. Is the CloudAudit initiative a piece of a larger cloud security puzzle?

Hoff: If anything, it’s a light bulb in the darkness. For us, it’s allowing these folks to adjust their tools to be able to consume the data that’s provided as part of the namespace within CloudAudit, and then essentially in the same way, we suggest human auditors focus more on interpreting that data rather than gathering it.
If gathering that data was unavailable to most of the vendors who would otherwise play in that space, due to either just that data not being presented or it being a violation of terms of service or acceptable use policy, the reality is that this is another way for these tool vendors to get back into the game, which is essentially then understanding the namespaces that we have, being able to modify their tools (which shouldn’t take much, since it’s already a standard-based protocol), and be able to interpret the namespaces to actually provide value with the data that we provide.
I think it’s an overall piece here, but again we’re really the conduit or the interface by which some of these technologies need to adapt. Rather than doing a one-off by one-off basis for every single cloud provider, you get a standardized interface. You only have to do it once.

Where should people go to get involved?

Hoff: If people want to get involved, it’s an open project. You can go to cloudaudit.org. There you’ll find links about us. There’ll be a link to the forum. The forum itself is currently a Google group, which you can sign up for and participate in. We have calls every Monday, which are posted on the forum and tell you how to connect. You can also replay the last of the many calls that we’ve had already, as we record them each time so that people have both the audio and visual versions of what we produce and how we’re going about this. It’s very transparent and very open, and we enjoy people getting involved. If you have something to add, please do.


Slides from My Cloud Security Alliance Keynote: The Cloud Magic 8 Ball (Future Of Cloud)

March 7th, 2010 No comments

Here are the slides from my Cloud Security Alliance (CSA) keynote from the Cloud Security Summit at the 2010 RSA Security Conference.

The punchline is as follows:

All this iteration and debate on the future of the “back-end” of Cloud Computing — the provider side of the equation — is ultimately less interesting than how the applications and content served up will be consumed.

Cloud Computing provides for the mass re-centralization of applications and data in mega-datacenters, while simultaneously incredibly powerful mobile computing platforms provide for the mass re-distribution of (in many cases the same) applications and data.  We’re fixated on the security of the former but ignoring that of the latter — at our peril.

People worry about how Cloud Computing puts their applications and data in other people’s hands. The reality is that mobile computing — and the clouds that are here already and will form because of them — already put, quite literally, those applications and data in other people’s hands.

If we want to “secure” the things that matter most, we must focus BACK on information centricity and building survivable systems.  I’ve written about the topics above many times, but this post from 2009 is quite apropos: The Quandary Of the Cloud: Centralized Compute But Distributed Data. You can find other posts on Information Centricity here.

Slideshare direct link here (embedded below).


Don’t Hassle the Hoff: Recent Press & Podcast Coverage & Upcoming Speaking Engagements

February 19th, 2010 No comments

Here is some of the recent coverage from the last couple of months or so on topics relevant to content on my blog, presentations and speaking engagements.  No particular order or priority and I haven’t kept a good record, unfortunately.

Important Stuff I’m Working On:

Press/Technology & Security eZines/Website/Blog Coverage/Meaningful Links:

Recent Speaking Engagements/Confirmed to speak at the following upcoming events:

  • Govt Solutions Forum Feb 1-2 (panel in DC)
  • Govt Solutions Forum Feb 24 D.C.
  • ESAF, San Francisco, March 1
  • Cloud Security Alliance Summit, San Francisco, March 1
  • RSA Security Conference March 1-5 San Francisco
  • Microsoft Bluehat Buenos Aires, Argentina – March 16-19th
  • ISSA General Assembly, Belgium
  • Infosec.be, Belgium
  • Codegate, South Korea, April 7-8
  • SOURCE Boston, April 21-23
  • Shot the Sheriff – Brazil – May 17th
  • Gluecon, Denver, May 26/27
  • FIRST, Miami, FL,  June 13-18
  • SANS DC – August 19th-20th

Conferences I am tentatively attending, trying to attend and/or working on logistics for speaking:

  • InterOp April 25-29 Vegas
  • Cisco Live – June 27th – July 1st Vegas
  • Blackhat 2010 – July 24-29 Vegas
  • Defcon
  • Notacon

Oh, let us not forget these top honors (buahahaha!)

  • Top 10 Sexy InfoSec Geeks (link)
  • The ThreatPost “All Decade Interview Team” (link)
  • ‘Cloud Hero’ and ‘Best Cloud Presentation’ – 2009 Cloudies Awards (link), and
  • 2010 RSA Social Security Bloggers Award nomination (link) 😉

[I often get a bunch of guff as to why I make these lists: ego, horn-tooting, self-aggrandizement. I wish I thought I were that important. 😉 The real reason is that it helps me keep track of useful stuff focused not only on my participation, but that of the rest of the blogosphere.]

/Hoff

The Automated Audit, Assertion, Assessment, and Assurance API (A6) Becomes: CloudAudit

February 12th, 2010 No comments

I’m happy to announce that the Automated Audit, Assertion, Assessment, and Assurance API (A6) working group is organizing under the brand of “CloudAudit.”  We’re doing so to enable reaching a broader audience, ensure it is easier to find us in searches and generally better reflect the mission of the group.  A6 remains our byline.

We’ve refined how we are describing and approaching solving the problems of compliance, audit, and assurance in the cloud space and part of that is reflected in our re-branding.  You can find the original genesis for A6 here in this series of posts. Meanwhile, you can keep track of all things CloudAudit at our new home: http://www.CloudAudit.org.

The goal of CloudAudit is to provide a common interface that allows Cloud providers to automate the Audit, Assertion, Assessment, and Assurance (A6) of their environments and allow authorized consumers of their services to do likewise via an open, extensible and secure API.  CloudAudit is a volunteer cross-industry effort from the best minds and talent in Cloud, networking, security, audit, assurance, distributed application and system architecture backgrounds.

Our execution mantra is to:

  • Keep it simple, lightweight and easy to implement; offer primitive definitions & language structure using HTTP(S)
  • Allow for extension and elaboration by providers and choice of trusted assertion validation sources, checklist definitions, etc.
  • Not require adoption of other platform-specific APIs
  • Provide interfaces to Cloud naming and registry services

The benefits to the cloud provider are clear: a single reference model that allows automation of many functions that today incur large costs in both manpower and time.  The base implementation is being designed to require little to no programmatic change to implement.  For the consumer and interested/authorized third parties, it allows on-demand examination of the same set of functions.
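To give a feel for how little machinery a minimal implementation could require, the sketch below publishes a hypothetical namespace as nothing more than static files behind Python’s built-in HTTP server. In practice a provider would of course put this behind TLS and appropriate authentication, and the directory names here are placeholders rather than the normative CloudAudit identifiers:

```python
import pathlib
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Lay out an illustrative namespace as plain files on disk; the directory names
# are placeholders, not the normative CloudAudit namespace identifiers.
root = pathlib.Path("cloudaudit-demo/.well-known/cloudaudit/compliance/pcidss/1.2")
root.mkdir(parents=True, exist_ok=True)
(root / "req-1.assertion").write_text("firewall=true\n")
(root / "req-10.assertion").write_text("log_retention_days=365\n")

class NamespaceHandler(SimpleHTTPRequestHandler):
    """Serve the namespace read-only from the demo directory."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, directory="cloudaudit-demo", **kwargs)

if __name__ == "__main__":
    # Try: curl http://localhost:8080/.well-known/cloudaudit/compliance/pcidss/1.2/req-1.assertion
    HTTPServer(("localhost", 8080), NamespaceHandler).serve_forever()
```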

Mapping to compliance, regulatory, service level, configuration, security and assurance frameworks as well as third party trust brokers is part of what A6 will also deliver.  CloudAudit is working closely with other alliance and standards body organizations such as the Cloud Security Alliance and ENISA.

If you want to know who’s working on making this a reality, there are hundreds of interested parties; consumers as well as providers such as: Akamai, Amazon Web Services, Microsoft, NetSuite, Rackspace, Savvis, Terremark, Sun, VMware, and many others.

If you would like to get involved, please join the CloudAudit Working Group or visit the homepage here.

Here is the slide deck from the 2/12/10 working group call (our second) and a link to the WebEx playback of the call.

Reblog this post [with Zemanta]

Recording & Playback of WebEx A6 Working Group Kick-Off Call from 1/8/2010 Available

January 10th, 2010 No comments

If you’re interested in the great discussion and presentations we had during the kickoff call for the A6 (Automated Audit, Assertion, Assessment, and Assurance API) Working Group, there are two options to listen/view the WebEx recording:

Topic: A6 API Working Group – Kickoff Call-20100108 1704
Create time: 1/8/10 10:07 am
File size: 33.23MB
Duration: 1 hour 1 minute
Description: Streaming recording link:
https://ciscosales.webex.com/ciscosales/ldr.php?AT=pb&SP=MC&rID=41631852rKey=178e8b04941e5672
Download recording link:
https://ciscosales.webex.com/ciscosales/lsr.php?AT=dw&SP=MC&rID=41631…

MAKE SURE YOU VIEW THE CHAT WINDOW << It contains some really excellent discussion points.

We had two great presentations from representatives of the OGF OCCI group and CSC’s Trusted Cloud Team.

I’ll be setting up regular calls shortly and a few people have reached out to me regarding helping form the core team to begin organizing the working group in earnest.

You can also follow along via the Google Group here.

/Hoff

In need of a cool logo for the group by the way… 😉

The A6 (Audit, Assertion, Assessment, and Assurance API) Working Group is Live. Please join & read the intro.

November 16th, 2009 No comments

For those of you following along at home, the A6 (Audit, Assertion, Assessment, and Assurance API) Working Group is Live.

I’ve set up the Google group, so please join & read the introduction here.

Hope to see you there.

/Hoff


Don’t Hassle the Hoff: Recent Press & Podcast Coverage & Upcoming Speaking Engagements

October 26th, 2009 1 comment


Here is some of the recent coverage from the last month or so on topics relevant to content on my blog, presentations and speaking engagements.  No particular order or priority and I haven’t kept a good record, unfortunately.

Press/Technology & Security eZines/Website/Blog Coverage/Meaningful Links:

Podcasts/Webcasts/Video:

Recent Speaking Engagements/Confirmed to speak at the following upcoming events:

  • Enterprise Architecture Conference, D.C.
  • Intel Security Summit 2009, Hillsboro OR
  • SecTor 2009, Toronto CA
  • EMC Innovation Forum, Franklin MA
  • NY Technology Forum, NY, NY
  • Microsoft Bluehat v9, Redmond WA
  • Office of the Comptroller of the Currency, San Antonio TX
  • Intercloud Working Group, GooglePlex CA 😉
  • CSC Leading Edge Forum, VA
  • DojoCon, VA

I also forgot to thank Eric Siebert for putting together the VMware Top 20 blog list and putting me on it as well as the fact that Rational Survivability made the Datamation 2009 Top 200 Tech Blogs list.

/Hoff

Transparency: I Do Not Think That Means What You Think That Means…

October 12th, 2009 7 comments

Ha ha! You fool! You fell victim to one of the classic blunders – The most famous of which is “never get involved in a cloud war in Asia” – but only slightly less well-known is this: “Never go against Werner when availability is on the line!”

As an outsider, it’s easy to play armchair quarterback, point fingers and criticize something as mind-bogglingly marvelous as a service of the size and scope of Amazon Web Services.  After all, they make all that complexity disappear under the guise of a simple web interface to deliver value, innovation and computing wonderment the likes of which are really unmatched.

There’s an awful lot riding on Amazon’s success.  They set the pace by which an evolving industry is now measured in terms of features, functionality, service levels, security, cost and the way in which they interact with customers and the community of ecosystem partners.

An interesting set of observations and explanations have come out of recent events related to degraded performance, availability and how these events have been handled.

When something bad happens, there are really two ways to play things:

  1. Be as open as possible, as quickly as possible and with as much detail as possible, or
  2. Release information only as needed, when pressured, and keep root causes and their resolutions as guarded as possible

This, of course, is an over-simplification of the options, complicated by the need for privacy, protection of intellectual property, legal issues, compliance or security requirements.  That’s not really any different than any other sort of service provider or IT department, but then again, Amazon’s Web Services aren’t like any other sort of service provider or IT department.

So when something bad happens, it’s been my experience as a customer (and one that admittedly does not pay for their “extra service”) that sometimes notifications take longer than I’d like, status updates are not as detailed as I might like, and root causes are sometimes cloaked in the air of the mysterious “network connectivity problem” — a replacement for the old corporate stand-by of “blame the firewall.”  There’s an entire industry cropping up to help you with these sorts of things.

Something like the BitBucket DDoS issue however, is not a simple “network connectivity problem.”  It is, however, a problem which highlights an oft-played pantomime of problem resolution involving any “managed” service being provided by a third party to which you as the customer have limited access at various critical points in the stack.

This outage represents a disconnect between experience and expectation in how customers perceive the operational underpinnings of AWS’ operations and architecture, and it forces customers to reconsider how all that abstracted infrastructure actually functions in order to deliver what — regardless of what the ToS say — they want to believe it delivers.  This is that perception versus reality gap I mentioned earlier.  It’s not the redonkulous “end-of-cloud” scenarios parroted by the masses of the great un(cloud)washed, but it’s serious nonetheless.

As an example, BitBucket’s woes of more than 20 hours of downtime due to UDP (and later TCP) DDoS floods led to the well-documented realization that support was inadequate, monitoring insufficient and security defenses lacking — from the perspective of both the customer and AWS*.  The reality is that based on what we *thought* we knew about how AWS functioned to protect against these sorts of things, these attacks should never have wrought the damage they did.  It seems AWS was equally surprised.

It’s important to note that these were revelations made in near real-time by the customer, not AWS.

Now, this wasn’t a widespread problem, so it’s understandable, to a point, why we didn’t hear a lot from AWS with regard to this issue. But after this all played out, when we look at what has been disclosed publicly by AWS, it appears the issue is still not remedied and, despite the promise to do better, a follow-on study seems to suggest that the problem may not yet be well understood or solved by AWS (See: Amazon EC2 vulnerable to UDP flood attacks). (Ed: After I wrote this, I got a notification that this particular issue has been fixed, which is indeed good news.)

Now, releasing details about any vulnerability like this could put many, many customers at risk from a similar attack, but the lack of transparency of service and architecture means that we’re left with more questions than answers. How can a customer (like me) defend themselves today against an attack like this while left in the lurch, not knowing what causes it or how to defend against it? What happens when the next one surfaces?

Can AWS even reliably detect this sort of thing given the “socialist security” implementation of good enough security spread across its constituent customers?

Security by obscurity in cloud cannot last as the gold standard.

This is the interesting part about the black-box abstraction that is Cloud, not just for Amazon, but any massively-scaled service provider; the more abstracted the service, the more dependent upon the provider or third parties we will become to troubleshoot issues and protect our assets.  In many cases, however, it will simply take much more time to resolve issues since visibility and transparency are limited to what the provider chooses or is able to provide.

We’re in the early days still of what customers know to ask about how security is managed in these massively scaled multi-tenant environments and since in some cases we are contractually prevented from exercising tests designed to understand the limits, we’re back to trusting that the provider has it handled…until we determine they don’t.

Put that in your risk management pipe and smoke it.

The network and systems that make up our cloud providers offerings must do a better job in stopping bad things from occurring before they reach our instances and workloads or customers should simply expect that they get what they pay for.  If the provider capabilities do not improve, combined with less visibility and an inability to deploy compensating controls, we’re potentially in a much worse spot than having no protection at all.

This is another opportunity to quietly remind folks about the Audit, Assertion, Assessment and Assurance (A6) API that is being brought to life; there will hopefully be some exciting news here shortly about this project, but I see A6 as playing a very important role in providing a solution to some of the issues I mention here.  Ready when you are, Amazon.

If only it were so simple and transparent:

Inigo Montoya: You are using Bonetti’s Defense against me, ah?
Man in Black: I thought it fitting considering the rocky terrain.
Inigo Montoya: Naturally, you must suspect me to attack with Capa Ferro?
Man in Black: Naturally… but I find that Thibault cancels out Capa Ferro. Don’t you?
Inigo Montoya: Unless the enemy has studied his Agrippa… which I have.

/Hoff

*It’s only fair to mention that depending upon a single provider for service, no matter how good they may be, and not taking advantage of monitoring services (at an extra cost) is a risk decision that comes with consequences, one of them being a longer time to resolution.