Having worked on 24/7 mission-critical systems, I know that a system down is the ultimate answer to the ‘what keeps you up at night?’ question, but is it acceptable today to have 24-hour outages? I can sympathise, as issues occur and we have all been there, trying to keep our heads and find the root cause and fix while everyone around us is losing theirs. IT has penetrated our world to the extent that we depend on it for many everyday activities, so as a provider there is nowhere to hide when there is a system fault, and customers have limited patience and will ultimately vote with their feet.
If HSBC suffered an update issue, it was probably the first failure after hundreds of successful updates, as the pace of system updates is now continuous and often seamless. How often have you been surprised by a new feature appearing in your favourite software overnight? Whether it is to stay relevant, find a competitive edge, drive efficiency, or save cost, businesses need to continuously update systems to survive. In the eagerness to push these updates through and realise the associated benefits ASAP, it can be easy to become complacent. For me there are two key areas requiring focus before any upgrade:
✅ QA – It may be the last step holding up getting your wonderful enhancement out to users, but it is the most vital step. The days of patching single versions of software with trepidation, never knowing if it would work until you finished, are long gone. It is now common practice to have multiple QA environments, use CI/CD principles to accommodate and test incremental code changes, automate testing, use standby systems, and utilise cloud technologies for temporary testing or to exploit their elasticity and scalability. Always have an efficient rollback plan to avoid a lengthy recovery process. It is wiser to be risk averse: if in doubt, roll back and come back to fight another day.
✅ Customer focus – The official line from HSBC was “an issue with our internal systems”. Whether that was down to an update, a capacity issue or something else, we will never know, but the timing could not have been worse: the morning of Black Friday, one of the busiest online transaction days of the year! Put yourself in your customers' shoes and consider the impact on them. Think not just about the timing of your update, but also about what is critical to them day to day when using your system, as that will focus your attention on the contingency plans you need to put in place. Have a comms plan, both to inform your customers of the impending update and to know what to say to them if something does go wrong. Good issue and impact management will always lessen the blow and restrict reputational damage.
Unfortunately, the fact that your systems have been up 99.99% of the time will go unnoticed and the headlines will be focused on the 0.01% downtime, so a little more focus on your update events is well worth the effort.
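To put those percentages into perspective, here is a rough back-of-the-envelope sketch of the arithmetic. It assumes availability is measured over a 365-day year, and the function names are my own, purely for illustration.

```python
# Rough availability arithmetic, assuming a 365-day measurement window.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by a given availability target."""
    return MINUTES_PER_YEAR * (100 - availability_pct) / 100


def availability_after_outage(outage_minutes: float) -> float:
    """Yearly availability (%) if the only downtime is a single outage."""
    return 100 * (1 - outage_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.99)
    after_day_out = availability_after_outage(24 * 60)
    print(f"99.99% uptime allows about {budget:.1f} minutes of downtime a year")
    print(f"One 24-hour outage leaves about {after_day_out:.2f}% availability")
```

On those assumptions, 99.99% leaves a budget of under an hour of downtime a year, so a single 24-hour outage uses up that budget many times over.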

