San Francisco is DOWN: The Fragility of Web 2.0 Ecosystem – Common Sense Must Not Have Made the Feature List
When I fired up Firefox this morning (too much wine last night to care) I was surprised to say the least.
I am just awestruck by the fact that yesterday’s PG&E power outage in San Francisco took down some of the most popular social networking and blogging sites on the planet. TypePad (and associated services), Craigslist, Technorati, Netflix, etc… all DOWN.
(See the bottom of this post for a most interesting potential cause.)
I’m sure there were some very puzzled, distraught and disconnected people yesterday. No blogging, no Second Life, no on-line video rentals. Oh, the humanity!
I am, however, very happy for all of the people who were able to commiserate with one another as they apparently share the same gene that renders them ill-prepared for what is one of the most common causes of downtime on the planet: power outages.
Here’s what the TypePad status update said this morning:
Update: commenting is again available on TypePad blogs; thank you for your patience. We are continuing to monitor the service closely.
TypePad blogs experienced some downtime this afternoon due to a
power outage in San Francisco, and we wanted to provide you with the
basic information we have so far:
- The outage began around 1:50 pm Pacific Daylight Time
- TypePad blogs and the TypePad application were affected, as well as LiveJournal, Vox and other Six Apart-hosted services
- No data has been lost from blogs. We have restored access to blogs as well as access to the TypePad application.
- There may be some remaining issues for readers leaving comments on blogs; we are aware of this and are working as quickly as possible to resolve the [issue]. (See update above.)
- TypePad members with appropriate opt-in settings should have
received an email from us this afternoon about the outage. We will
send another email to members when the service has been fully restored.
- We will also be posting more details about today’s outage to Everything TypePad.
We are truly sorry for the frustration and inconvenience that
you’ve experienced, and will provide as much additional information as
possible as soon as we have it. We also appreciate the commiseration
from the teams at many of the other sites that were affected, such as
Craigslist, Technorati, Yelp, hi5 and several others.
I don’t understand how the folks responsible for service delivery of these sites, given the availability and affordability of technology and on-demand hosting capacity, don’t have BCP/DR sites or load-balanced, distributed data centers to absorb a hit like this. The management team of Six Apart has experience in companies that understand that the network and connectivity represent the lifeblood of their existence, so what the hell happened here that there’s no contingency for power outages?
Surely I’m missing something here.
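For the sake of argument, here is roughly the kind of contingency I have in mind, as a minimal sketch rather than anything these sites actually run. The site names, the priority ordering and the health check are all hypothetical; a real setup would sit behind DNS or a global load balancer and would also have to deal with data replication, which this ignores entirely.

```python
from dataclasses import dataclass
from typing import AbstractSet, Callable, Optional, Sequence


@dataclass(frozen=True)
class Site:
    """A data center that can serve traffic (names below are made up)."""
    name: str
    region: str


def pick_site(sites: Sequence[Site],
              is_healthy: Callable[[Site], bool]) -> Optional[Site]:
    """Return the first site, in priority order, that passes its health check."""
    for site in sites:
        if is_healthy(site):
            return site
    return None  # every site is down: nothing left to fail over to


# Hypothetical deployment: a primary in San Francisco plus a standby that
# does not share the same power grid.
SITES = [
    Site("sf-primary", "San Francisco"),
    Site("east-standby", "somewhere else"),
]


def simulate_outage(down: AbstractSet[str]) -> Optional[str]:
    """Pretend the named sites have lost power and report who serves traffic."""
    chosen = pick_site(SITES, lambda site: site.name not in down)
    return chosen.name if chosen else None
```

With only the San Francisco site dark, `simulate_outage({"sf-primary"})` hands traffic to the standby; with a single site and no standby there is simply nothing to pick, which is more or less what happened yesterday.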
Craigslist and Technorati are services I don’t pay for, so one might suggest taking the service disruption with a grain of SLA salt (or not, because it still doesn’t excuse not preparing for issues like this with contingencies) but TypePad is something I *pay* for. Even my little hosting company that houses my personal email and website has a clue. I’m glad I’m not a Netflix customer, either. At least I can walk down to Blockbuster…
Yes, I’m being harsh, but there’s no excuse for this sort of thing in today’s Internet-based economy. It affects too many people and services, and it really does show the absolute fragility of our Internet-tethered society.
Common sense obviously didn’t make the feature list on the latest production roll. Somebody other than me ought to be pissed off about this. Maybe when Data Center 3.0 is ready to roll, we won’t have to worry about this any longer.
Interestingly, one of the other stories about affected sites relayed the woes of 365 Main, a colocation company whose generators failed to start when the outage occurred. I met the CEO of 365 Main when he presented at the Interop data center summit on the topic of flywheel UPS systems, which are designed to bridge the gap between failure detection and GenStart. This didn’t seem to work as planned, either.
You can read all about this interesting story here. The timing was especially awkward because the company had issued a press release that same day about a customer’s two years of uninterrupted service.
Valleywag reported that the failure at 365 Main was caused by a drunk employee who went berserk! This seemed a little odd when I read it, but check out how the reporter from Valleywag is now eating some very nasty crow … his source was completely bogus!