
Unsafe At Any Speed: The Darkside Of Automation

I’m a huge proponent of automation. Taking rote processes from the hands of humans & leveraging machines of all types to enable higher agility, lower cost and increased efficacy is a wonderful thing.

However, there’s a trade-off: as automation matures, feedback loops close more tightly and clock rates climb, leaving less and less time between executions. Our ability to detect and recover from a cascading failure, let alone prevent one, diminishes accordingly.

Take three interesting, yet unrelated, examples:

  1. The premise of the W.O.P.R. in WarGames — Joshua goes apeshit and almost starts WWIII by invoking a simulated game of global thermonuclear war
  2. The Airbus A380 failure – only the luck of having five pilots on board, and their skill in overriding hundreds of cascading automation failures after an engine failure, prevented a crash that would have killed hundreds of people.*
  3. The AWS EBS outage — the cloud version of Girls Gone Wild; automated replication caught in a FOR…NEXT loop (a toy sketch of that runaway dynamic follows below)
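
Since I mentioned the loop above: here’s a deliberately dumb toy simulation of the EBS re-mirroring dynamic. Every number and name in it is mine, purely for illustration, and bears no relation to Amazon’s actual figures.

```python
# Toy simulation of a "re-mirroring storm": every node that loses its
# replica hunts for spare capacity at once; when capacity runs out, the
# crunch strands more nodes. All numbers are illustrative.

def remirroring_storm(nodes=1000, free_slots=50, initially_stuck=80):
    stuck, healthy = initially_stuck, nodes - initially_stuck
    for tick in range(8):
        recovered = min(stuck, free_slots)     # stuck nodes grab free slots
        free_slots -= recovered
        stuck -= recovered
        healthy += recovered
        if free_slots == 0:                    # capacity exhausted...
            newly_stuck = healthy // 10        # ...strands a slice of the rest
            healthy -= newly_stuck
            stuck += newly_stuck
        print(f"tick {tick}: stuck={stuck} healthy={healthy} free={free_slots}")

remirroring_storm()
```

Run it and watch the stuck count grow every tick once free capacity hits zero: the recovery mechanism itself becomes the amplifier.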

These weren’t “maliciously initiated” issues; they were accidents. But how about “events” like Stuxnet? What about a former Gartner analyst having his home automation (CASA-SCADA) control system hax0r3d!? There’s another obvious one missing, but we’ll get to that in a minute (hint: Flash Crash).

How do we engineer enough failsafe logic, up and down the stack, that can function at the same scale as the decision and controller logic? How do we integrate and expose telemetry that can be produced and consumed fast enough to yield actionable results in a timeframe that allows for graceful failure and recovery (née survivability)?
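
For what one answer might look like, here’s a minimal circuit-breaker sketch: failsafe logic that consumes its own telemetry and trips at machine speed instead of waiting for a human. All the names and thresholds are my own illustration, not anyone’s production system.

```python
import random
import time
from collections import deque

def do_one_automated_action() -> bool:
    # Hypothetical stand-in for one step of the automated process;
    # fails 30% of the time so the breaker has something to trip on.
    return random.random() > 0.3

class CircuitBreaker:
    """Trips to a fail-safe state when the recent error rate crosses a
    threshold, so the automation halts itself at machine speed."""

    def __init__(self, window=100, max_error_rate=0.2, cooldown_s=1.0):
        self.results = deque(maxlen=window)   # rolling window of outcomes
        self.max_error_rate = max_error_rate
        self.cooldown_s = cooldown_s
        self.tripped_at = None

    def allow(self) -> bool:
        if self.tripped_at is not None:
            if time.monotonic() - self.tripped_at < self.cooldown_s:
                return False              # open: refuse work, fail gracefully
            self.tripped_at = None        # half-open: cautiously resume
            self.results.clear()
        return True

    def record(self, ok: bool) -> None:
        self.results.append(ok)
        errors = self.results.count(False)
        if len(self.results) >= 10 and errors / len(self.results) > self.max_error_rate:
            self.tripped_at = time.monotonic()   # trip: stop the cascade

breaker = CircuitBreaker()
for _ in range(1000):
    if breaker.allow():
        breaker.record(do_one_automated_action())
```

The point isn’t the breaker itself (a well-worn pattern); it’s that the detection logic runs in the same loop, at the same clock rate, as the thing it protects.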

One last example that is pertinent: high frequency trading (HFT) — highly automated, computer-driven, algorithm-based stock trading at speeds measured in millionths of a second.

Check out how this works in James Urquhart’s great Wisdom Of the Clouds post: “What Cloud Computing Can Learn From Flash Crash.”

In the HFT use case, ruthlessly squeezing nanoseconds from the processing loops — removing as much latency as possible from every element of the stack — is literally worth millions of dollars.

Technology vendors are doing many very interesting and innovative things architecturally to achieve these goals — some of them quite audacious — and anything that gets in the way or adds latency is generally not considered “useful.” Security usually tops that list.

There are most definitely security technologies that allow for very low-latency insertion of things like firewalls, with single-digit-microsecond latency figures (small packets), but interestingly enough we’re also governed by the annoying laws of physics: propagation delay, serialization delay, TCP/IP protocol overhead, and so on all add up.
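
A little back-of-the-envelope arithmetic shows how fast physics eats the budget; the link speed and distance below are arbitrary examples:

```python
# Rough latency budget for one small packet. All inputs are illustrative.
C = 299_792_458                 # speed of light in vacuum, m/s
FIBER_FACTOR = 0.68             # light travels at roughly 2/3 c in fiber

def serialization_delay_ns(frame_bytes: int, link_bps: float) -> float:
    # Time just to clock the bits onto the wire.
    return frame_bytes * 8 / link_bps * 1e9

def propagation_delay_us(distance_m: float) -> float:
    # Time for the signal to traverse the fiber; no vendor can remove this.
    return distance_m / (C * FIBER_FACTOR) * 1e6

# 64-byte frame on a 10 Gb/s link: ~51 ns of serialization delay.
print(f"serialization: {serialization_delay_ns(64, 10e9):.1f} ns")
# 10 km of fiber between you and the exchange: ~49 us of propagation delay.
print(f"propagation:   {propagation_delay_us(10_000):.1f} us")
```

Note the units: a firewall adding a few microseconds is in the same order of magnitude as the physics itself, which is exactly why it gets thrown overboard.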

Thus traditional approaches to “in-line” security — both detective and preventative — are generally not sustainable in these environments, and providing solutions that scale as well as these HFT systems do will require some deep thought. No small order.

I think this is another good use for “big data” and security data analytics. Consider very high-speed side-band systems that run alongside these HFT platforms and leverage the logic already present in the transactional trading systems; that would get us closer to solving the challenges of these environments. Integrate those signaling and telemetry planes with “fabric-enabled” security capabilities and we might get somewhere useful.
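
As a rough sketch of what such a side-band system might look like: a toy out-of-band monitor that taps a mirrored copy of the order stream and flags rate anomalies with a rolling z-score. It signals but never blocks, so it adds zero latency to the trading path. The field names, thresholds, and feed are all my invention.

```python
from collections import deque
from statistics import mean, stdev

class SideBandMonitor:
    """Consumes a mirrored telemetry feed (e.g. off a span port), so it
    sits beside the trading path rather than in it; it signals, never blocks."""

    def __init__(self, window=300, z_threshold=4.0):
        self.rates = deque(maxlen=window)   # rolling orders/sec samples
        self.z_threshold = z_threshold

    def observe(self, orders_per_sec: float) -> bool:
        """Return True if this sample looks anomalous vs. recent history."""
        anomalous = False
        if len(self.rates) >= 30:           # wait for a baseline
            mu, sigma = mean(self.rates), stdev(self.rates)
            if sigma > 0 and abs(orders_per_sec - mu) / sigma > self.z_threshold:
                anomalous = True            # raise a signal to the control plane
        self.rates.append(orders_per_sec)
        return anomalous

monitor = SideBandMonitor()
for sample in [100, 102, 99, 101, 98] * 10 + [5000]:   # synthetic feed
    if monitor.observe(sample):
        print(f"anomaly: {sample} orders/sec; signal the fabric to quarantine")
```

The “fabric-enabled” part is the hand-off at the end: the monitor only emits a signal, and enforcement happens elsewhere, at whatever speed the fabric can manage.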

This tees up nicely my buddy James Arlen’s talk at Black Hat on the insecurity of high frequency trading systems: “Security when nano seconds count.” You should plan on checking it out…I know I will.

/Hoff

*H/T to @reillyusa, who also pointed me to “Questions Raised About Airbus Automated Control System” regarding the doomed Air France 447 flight. Also, serendipitously, @etherealmind posted a link to a story titled “Volkswagen demonstrates ‘Temporary Auto Pilot’” — what could *possibly* go wrong? 😉

  1. July 29th, 2011 at 13:37 | #1

    Accidents that result from unforeseen interactions among systems within a larger complex system are called "normal accidents" in the systems-design literature. Unfortunately, in the long term they are unavoidable.

    Consider the book by Charles Perrow titled "Normal Accidents: Living with High-Risk Technologies."

  2. July 29th, 2011 at 14:29 | #2

    Great post. This reminded me of a similar blog post on the dangers of automation in price-setting on marketplaces like Amazon. Two used-book vendors each ran an algorithm that adjusted their price against the other's: one applied a slight discount, the other a markup. Since the markup outweighed the discount, the book's price climbed to about $24 million in a short time frame. The prevalence of APIs and distributed, disconnected logic will ensure this kind of situation happens again unless proper safety controls and rules are put in place: http://www.michaeleisen.org/blog/?p=358
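
    [To make that feedback loop concrete, here's a toy reconstruction. The multipliers approximate those reported in the linked post, so treat them, and the starting price, as illustrative.]

    ```python
    # Two repricing bots reacting to each other: one undercuts slightly,
    # the other marks up by more than the undercut, so the price grows
    # roughly 27% per repricing cycle. Coefficients are illustrative.
    price_a = price_b = 35.54           # hypothetical starting price
    cycles = 0
    while price_a < 24_000_000:
        price_a = 0.9983 * price_b      # bot A: slightly undercut bot B
        price_b = 1.270589 * price_a    # bot B: mark up over bot A
        cycles += 1
    print(f"${price_a:,.2f} after {cycles} repricing cycles")
    ```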

  3. July 30th, 2011 at 13:02 | #3

    Great example. Adding this to my collection re "Data-Driven Security"

  4. Guest
    August 12th, 2011 at 08:49 | #4

    Great post!
    For the record, AF447 was not the brand-new A380 but an A330. That's the only inconsistency I found 🙂 http://en.wikipedia.org/wiki/Air_France_Flight_447 http://en.wikipedia.org/wiki/Airbus_A330

  5. April 1st, 2012 at 11:27 | #5

    I think we will see a few more “sorcerer’s apprentice” moments before folks start to catch on. IT always operates in a systematic manner, and if one part of the system gets too far ahead of the rest, bad things happen (think of a three-legged race, but with more people tied together).

    Right now, we see an emphasis on automating and speeding up execution without commensurate updates to the control and auditing systems – like cranking up the horsepower in a car without upgrading the steering and brakes. At some point, you are going into a ditch.

    Inserting people in-line into the process is not feasible: we are comparatively slow and expensive, and some of these systems are so complex that they are beyond any one person's ability to understand (the Airbus example, the Flash Crash). I think Hoff has the right idea: we need to leverage automation and analytics on the control and audit side.

    Machines policing machines — how SkyNet-y. 🙂

    Omar
