Ouch!
The Pyro Poodle Puddle has been down all day, and Jane Hamsher has taken to Twitter to express her opinion of Rackspace.
I have a customer service tip for service providers: if you forget to tell people you have scheduled maintenance, then when they start calling, apologize and tell them that you will have the system back up as quickly as possible, but the fire department has to declare the area safe, and none of the injuries appeared to be life-threatening. Do not say, “Wow, I thought you knew we were going to do this.”
6 comments
I am just flabbergasted that they didn’t migrate her virtual machines to a different availability zone.
Oh wait, I’m thinking about Amazon AWS. An outfit run by professionals. Who have their own outages from time to time, but all friggin’ day?!
Rackspace charges hefty prices for hosting, so downtime this long is totally unacceptable. Visitors should have been redirected to an information page instead of getting a blank screen and a timeout.
My host is a small outfit, but they provide a lot more information than what Jane is receiving. Whoever is monitoring Twitter should be working as a gopher to fix the problem. Jane is losing revenue every second this continues, so they should expect her to move.
The new definition of customer service: What a stallion does to a mare.
Alrighty, then!
– Badtux the Snarky Penguin
PPP finally came back this afternoon, and now the rumor is that the cloud collapsed or there was a major meltdown at a server farm.
Jane talking about RAID problems didn’t make a lot of sense to me, because my host had a RAID problem last week and things just slowed down for a while as they worked on it. That was a load-balancing RAID with a drive slowing down as it headed to the land of Pat Boone CDs. Hell, when the electrician cut power to the mains without telling my host, they were only down 18 hours, including the time the electrician spent working. [Note: the work did not include my host’s racks, so they weren’t notified. Instead of just killing power to what he was working on, the electrician killed the mains.]
Turns out that Rackspace is in some deep yogurt with Wall Street over missed targets, and with governments at multiple levels in Texas over lower-than-promised job growth, which had been the basis of its tax incentives. This would be a good time to leave them.
If one of my RAID arrays goes down, I can flip to its mirror fairly easily for the NFS shares (just take over the IP address) and a little more slowly for the iSCSI shares (I have to go to the five VMs that are currently importing iSCSI shares and point them at the mirror target; I can’t just hijack the IP address because the target ID is different, which is just another way that iSCSI sucks). But short of the entire server burning down, that’s pretty hard to have happen: I have a fully prepped backup chassis that I can swap the drives into if the motherboard dies, and if a drive dies I get notified and swap in a new one (shrug). I’ve never had a RAID card or a backplane die; they appear to be far more reliable than disk drives. (My boot/OS drives are also mirrored, so if one of them dies I get notified too and can swap in a replacement.)
So yeah, if little ole’ me, with a budget of practically nothing, isn’t going to get zapped by a RAID array going down, it pretty much had to have been a cloud meltdown. That’s one reason why, when I was designing the new infrastructure, I didn’t go with OpenStack (or Eucalyptus, its main competitor). I looked at the number of moving pieces/parts and realized I would never be able to figure out how to get it going again if it decided to die. So I went with straightforward Linux storage and straightforward Linux libvirtd virtualization hosts rather than something fancy with lots of moving pieces/parts.
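Why NFS fails over by IP takeover while iSCSI needs every initiator re-pointed can be shown with a toy model. This is a minimal sketch under stated assumptions, not real NFS or iSCSI tooling: the IP addresses and target names (IQNs) are made up, and the model simplifies NFS clients to caring only about the server address while iSCSI initiators log in to a specific (portal IP, target name) pair.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class StorageArray:
    ip: str   # address clients connect to
    iqn: str  # iSCSI target name; assumed to differ between primary and mirror

@dataclass
class NfsMount:
    server_ip: str  # simplification: an NFS client only tracks the server address

    def reachable(self, arrays):
        return any(a.ip == self.server_ip for a in arrays)

@dataclass
class IscsiSession:
    portal_ip: str
    target_iqn: str  # an initiator logs in to one specific target name

    def reachable(self, arrays):
        return any(a.ip == self.portal_ip and a.iqn == self.target_iqn
                   for a in arrays)

def take_over_ip(failed: StorageArray, mirror: StorageArray) -> StorageArray:
    """The mirror assumes the failed array's IP; its own IQN is unchanged."""
    return replace(mirror, ip=failed.ip)

def simulate_failover(num_vms: int = 5):
    # Hypothetical addresses and IQNs, for illustration only.
    primary = StorageArray("10.0.0.10", "iqn.2014-05.example:primary")
    mirror = StorageArray("10.0.0.11", "iqn.2014-05.example:mirror")

    nfs = NfsMount(server_ip=primary.ip)
    vms = [IscsiSession(primary.ip, primary.iqn) for _ in range(num_vms)]

    # Primary dies; the mirror takes over its IP address.
    live = [take_over_ip(primary, mirror)]
    nfs_ok = nfs.reachable(live)
    iscsi_ok_before = sum(vm.reachable(live) for vm in vms)

    # Each VM's initiator must be pointed at the mirror's target by hand.
    for vm in vms:
        vm.portal_ip, vm.target_iqn = live[0].ip, live[0].iqn
    iscsi_ok_after = sum(vm.reachable(live) for vm in vms)
    return nfs_ok, iscsi_ok_before, iscsi_ok_after

print(simulate_failover())  # (True, 0, 5)
```

The NFS mount survives the IP takeover untouched, while none of the five iSCSI sessions do until each one is re-pointed, which is the extra step described above.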
But given that Rackspace *wrote* OpenStack (mostly), it’s still amazing that they couldn’t get it going again within a reasonably short amount of time…
The only reason my host had an outage is that the drive slowdown wasn’t reported as a failure, which annoyed the hell out of the techs there. They have off-site status reporting and assume that you understand geek-speak if you use them. The few minutes I was down were the result of the changeover and the virtual addressing they use to conserve their IPv4 addresses. If I went with the IPv6 option, I would get permanent addresses.
Coming after a bad 1st quarter, Rackspace really didn’t need this. The ‘cloud’ really rained on their parade. 😈