This week, Australia’s news and social media channels have been flooded with complaints about one company: Optus.
More than 10 million customers and hundreds of businesses were affected when one of Australia’s biggest telecommunications providers suffered a network outage on Wednesday. People, businesses, and even vital services were left unable to access the internet or use phone systems for at least nine hours.
This got me thinking: what do we expect from businesses and organisations when something goes wrong?
Dealing with disasters
Now, I am not a youngster. I have faced my fair share of critical IT incidents over the past 20 years. I’ve seen an entire 1,500-person warehouse shut down, 100,000 team members affected by one error and, in one particularly memorable incident, the creation of enough cream cake orders to feed the equivalent of the Inner West.
But the one thing I learnt very early on from my mentor Steve Butler was that stakeholder and customer communication is paramount (or “king”, as Steve would often say in his Essex accent).
This was especially the case in one incident in 2005. While working in Service Delivery at one of the UK’s largest supermarkets, we experienced an Active Directory failure, causing a mass shutdown of the Windows XP environment. All desktops and laptops were essentially offline, with no email and no access to the majority of core systems. We were lucky that stores could still trade and warehouses could still pick and deliver to stores, but otherwise, it was Armageddon.
What did we do, what did we learn?
After the initial, “oh &*%# what just happened?” moment, the Business Continuity Plan (BCP) processes were invoked and ITIL 101 was followed.
1. The Ops Team opened up the room adjacent to the monitoring centre and created a crisis room.
2. An Incident Manager and Communications Manager were appointed.
3. Call trees were invoked to begin communications.
4. The CIO invoked executive communications.
5. A crisis bridge (open phone line) was set up to enable anyone to join and get hourly updates (remember – no email and back then no cloud).
6. A crisis bridge (incident) was set up for all technical teams to communicate on.
7. All line managers received a conference call brief.
8. All BlackBerry and mobile users were sent an SMS advising them of the situation and asking them to contact their managers.
9. 50% of the network and Microsoft teams were pulled into small squads to investigate.
10. All service delivery teams were engaged to communicate with their customers and advise on timelines.
11. A crisis record was created to document all steps taken and logs.
It was an all-hands-to-the-pumps situation, and the recovery took almost 60 hours. But throughout, we ensured vital communications were available to everyone affected. After three days of continuous communication and multiple handovers between exhausted teams, we were relieved when it was over. By Monday morning, the world continued as if nothing had happened.
So, back to my original question.
What do we expect from businesses when something goes wrong?
The key thing is: we expect stability from our service providers, with disasters like an Active Directory failure or network outages being a rare occurrence. But when an incident does occur, we expect to be communicated with, whether we’re an internal business team affected by the issue or a customer. We want to know what is going on, to receive regular updates through channels that are still available, and not to be left in the dark.
Communication is King, after all.