British Airways had a bad weekend. Its IT system fell over, leading to hundreds of flight cancellations and spoiling the travel plans of tens of thousands of passengers. It’s likely to cost the airline millions, and that’s not counting the hit to its reputation. A similar glitch cost Delta Air Lines US$100 million last year.
So, yeah. Bad weekend.
BA’s chief executive said the root cause was a “power-supply issue”. It’s too early to say exactly what went wrong (and what went right), but presumably there was a power failure at one or more of BA’s data centres.
That’s the same problem that hit Delta last year. Although backup power kicked in there, some Delta servers weren’t connected to it – so the whole system failed.
What can we learn at this stage? There’s no 100% foolproof solution, but there are ways you can minimise the risk of this happening – and reduce the impact if it does.
Use the cloud
As we wrote earlier in the year, cloud computing is your best preparation for power outages. Our cloud servers have uninterruptible power supplies and backup diesel generators, so they keep going if the power goes out. And we have data centres in multiple locations, so even if those backups failed, another data centre could take over.
Plan your systems carefully – and test them!
Of course, in Delta’s case, they had backup power – it just wasn’t designed correctly.
You need a system that’s designed to stay up even when it’s hit hard. No single points of failure, no servers left behind. Best way to do that? Work with people who have the experience and know-how. And you might not be surprised to hear that we reckon that’s us. It’s not unearned, though! Between us, we’ve designed thousands of systems over the years. We know the pitfalls. We’ve learnt from what’s worked… and, yeah, sometimes what hasn’t worked. And we test every system to make sure it holds up.
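To make "no single points of failure, no servers left behind" a bit more concrete, here is a minimal Python sketch of the kind of check you could run against a server inventory. It flags machines that aren't on backup power, and services that would be left with fewer than two powered servers in an outage. The inventory format and the names `Server`, `find_unprotected` and `find_single_points_of_failure` are made up for this example; a real audit would draw on your own asset register.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    service: str           # what the server runs, e.g. "bookings"
    on_backup_power: bool  # is it wired to the UPS/generator circuit?


def find_unprotected(servers):
    """Return the names of servers that would go dark in a power cut."""
    return sorted(s.name for s in servers if not s.on_backup_power)


def find_single_points_of_failure(servers):
    """Return services left with fewer than two powered servers in a power cut."""
    protected = defaultdict(int)
    for s in servers:
        # Count only servers on backup power; unprotected ones add zero.
        protected[s.service] += int(s.on_backup_power)
    return sorted(svc for svc, count in protected.items() if count < 2)


# Hypothetical inventory, for illustration only.
inventory = [
    Server("web-1", "website", True),
    Server("web-2", "website", True),
    Server("book-1", "bookings", True),
    Server("book-2", "bookings", False),
]

unprotected = find_unprotected(inventory)
weak_services = find_single_points_of_failure(inventory)
```

In this made-up inventory, `book-2` is flagged as unprotected, and `bookings` is flagged because only one of its servers would stay up.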
Have a business continuity plan
No business is 100% automatic. So make sure you’ve thought about what you’d do if the worst happened. Plan for every situation: power failure, natural disaster, cyber-attack. It’s infinitely better than an ambulance at the bottom of a cliff.
Put disaster recovery in place
Disaster recovery (DR) helps your business get back up and running in the event of a disaster. If your system crashes or comes under attack, you can switch over to a previously replicated version without the problems.
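As a rough sketch of the idea, the Python below picks the newest replica taken before an incident and works out how much recent work sits outside it. It assumes replicas are taken once a night; the function name `pick_recovery_point` and all the dates are invented for illustration and don't describe any particular DR product.

```python
from datetime import datetime


def pick_recovery_point(replica_times, incident_time):
    """Return the newest replica taken before the incident, or None if there isn't one."""
    clean = [t for t in replica_times if t < incident_time]
    return max(clean, default=None)


# Hypothetical nightly replicas taken at 1 am.
replicas = [
    datetime(2017, 6, 1, 1, 0),
    datetime(2017, 6, 2, 1, 0),
    datetime(2017, 6, 3, 1, 0),
]
incident = datetime(2017, 6, 3, 9, 30)  # made-up incident time

recovery_point = pick_recovery_point(replicas, incident)
# Work done between the last clean replica and the incident is what you'd need to redo.
data_at_risk = incident - recovery_point
```

With these example times, the recovery point is the 1 am replica on the day of the incident, so about eight and a half hours of changes would need to be re-entered. The more often you replicate, the smaller that window gets.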
TotalDR is our premium disaster recovery service, hosted in our Auckland and Christchurch data centres. It replicates your on-premises virtual machines each night. And we don’t just leave you to handle recovery yourself. TotalDR comes with guaranteed CommArc resource to get your IT environment working again in the wake of disaster.