How Severe Is A Critical Power Outage For A Datacentre?

A power outage is a break in the mains power supply which can last from milliseconds to minutes or even hours. In recent years the number of power outages recorded within Western Europe has increased. Momentary breaks in the electrical supply are becoming more frequent due to more severe and disruptive weather conditions, the switching of substation transformers, grid breakdowns and a rising demand for electricity. Demand for electricity continues to rise as economies decarbonise and move to electric transportation whilst also introducing more power from renewables. This is in addition to an increasing dependence on datacentre services fuelled by Edge and 5G connectivity.

The Uptime Institute Tier-rating System for Datacentres

The Uptime Institute is well known for its datacentre resilience Tier-rating system and has now announced a new system for rating the severity of power outages. The resilience system has four tier ratings running from 1 to 4, with each representing a progressively higher level of resilience and uptime.

  • Tier 1 Datacentres: have a single path for critical power and cooling and few if any redundant and backup systems. The expected uptime is 99.671% which equates to 28.8 hours of downtime per annum. A single UPS system is an example.
  • Tier 2 Datacentres: have a single critical power and cooling path with some redundancy and backup systems. The expected uptime is 99.741% with around 22.7 hours of predicted annual downtime. An example is a UPS system with N+1 redundancy, i.e. two UPS systems operating in parallel to share the load, with each rated to provide the full power required by the load if the other unit fails.
  • Tier 3 Datacentres: have multiple critical power and cooling paths in place to allow concurrent maintenance without system downtime. The expected uptime is 99.982% with 1.6 hours of annual downtime.
  • Tier 4 Datacentres: are completely fault tolerant installations where each component and system has redundancy, including the incoming A and B supplies. The expected uptime is 99.995% with 26.3 minutes of annual downtime predicted.
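
The annual downtime quoted for each tier follows directly from its uptime percentage. As a minimal sketch, assuming a 365-day year of 8,760 hours (leap years ignored), the conversion can be written as below. The function name annual_downtime_hours is purely illustrative and is not part of any Uptime Institute tool.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def annual_downtime_hours(uptime_percent: float) -> float:
    """Return the expected downtime in hours per year for an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    # The unavailable fraction of the year multiplied by the hours in a year.
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


if __name__ == "__main__":
    # Uptime figures quoted above for Tier 3 and Tier 4.
    for uptime in (99.982, 99.995):
        hours = annual_downtime_hours(uptime)
        print(f"{uptime}% uptime: {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Run against the Tier 3 and Tier 4 figures, this gives roughly 1.6 hours and 26.3 minutes respectively, which matches the downtime stated for those tiers.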

The Tier-rating system can be just as easily applied to server rooms as to datacentres. The rating system not only allows for certification by the Uptime Institute but also for colocation datacentres to differentiate themselves when competing for business.

What the Tier-ratings also do is provide an indirect measure of how capably a datacentre can ride through a power outage, as the critical power path can include not only power distribution and UPS systems but also generating sets and even substation transformers.

How the use of ‘UPS as a Reserve’ will affect the Tier-rating system has yet to be decided. Here a UPS system is installed with a lithium-ion battery and may be connected to a National Grid demand side response (DSR) program. The UPS/li-ion battery enabled system can be used to operate the entire datacentre load to help reduce demand on the local grid, whilst the datacentre operator receives an annual service connection fee and then a feed-in-tariff-like payment for every minute of operation on battery when the grid mains power supply is available. The use of lithium-ion batteries also allows the UPS system to function as an energy storage system, storing energy either from the grid as a standard lead-acid battery does or from renewable power sources including local solar PV, wind turbine or hydro power installations.

The Uptime Institute Tier-rating System for Outage Severity

The new Outage Severity Rating (OSR) from the Uptime Institute is designed to help critical infrastructure and datacentre operators to better understand and classify the severity of outages in terms of how the outage incidents affect operations. The OSR system is the outcome of a three-year project monitoring power outages and investigating their causes and impact for digital infrastructure owners. From 2016 to 2017, the Uptime Institute recorded a 288% increase in power outages, with the increase proportionally related to the increasing complexity of the digital operation, whether this was in-house, colocation, cloud or some combination of these. The single biggest cause of outages was, however, power-related.

In less complex server room and datacentre environments an outage could be considered as a binary event with the services provided either being ‘online’ or ‘offline’. The Uptime Institute OSR is designed to help organisations focus on service resilience and interdependencies and to build in the appropriate Tier-rating to ensure business continuity. There are five classification ratings for outage severity:

  1. Negligible: whilst recorded and reported the outage has little or no obvious impact on business services and there are no service disruptions.
  2. Minimal: some IT business services suffer disruption or a degradation in service but with minimal impact on users, customers or reputation.
  3. Significant: here the outage is significant with obvious disruptions to customer and user services but the outage has a limited scope, duration and minimal or no financial effect. There could be some compliance or reputational impacts.
  4. Serious: there is service and/or operational disruption that can lead to financial loss, compliance breaches, reputation damage and possibly safety issues.
  5. Severe: a mission critical outage has occurred with major impacts, disrupting services and/or operations, with large financial losses, possible safety issues, compliance breaches, customer losses and damage to reputation.
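
For operators who log incidents internally, the five ratings above map naturally onto an ordered enumeration. The following is only a sketch of how such a log could be summarised. The names OutageSeverity, worst_outage and count_by_severity are hypothetical and do not come from the Uptime Institute.

```python
from enum import IntEnum


class OutageSeverity(IntEnum):
    """The five OSR categories listed above, ordered from least to most severe."""
    NEGLIGIBLE = 1
    MINIMAL = 2
    SIGNIFICANT = 3
    SERIOUS = 4
    SEVERE = 5


def worst_outage(ratings):
    """Return the most severe rating in a list, or None if the list is empty."""
    return max(ratings, default=None)


def count_by_severity(ratings):
    """Count how many logged outages fall into each category."""
    counts = {level: 0 for level in OutageSeverity}
    for rating in ratings:
        counts[OutageSeverity(rating)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical incident log for one year, purely illustrative.
    log = [OutageSeverity.NEGLIGIBLE, OutageSeverity.NEGLIGIBLE, OutageSeverity.SIGNIFICANT]
    print("Worst outage:", worst_outage(log).name)
    for level, count in count_by_severity(log).items():
        print(f"Category {level.value} ({level.name.title()}): {count}")
```

Using an integer-backed enumeration keeps the categories comparable, so the most severe incident in a reporting period can be found with a simple maximum.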

For more information visit: https://missioncriticalpower.uk/uptime-institute-announces-outage-severity-rating/

Both the Tier and OSR systems are linked by their use of critical power and cooling systems. Power-related incidents are the biggest cause of outages and can be mitigated by adopting a progressive approach that builds redundancy into each section of the critical power path.

The same can be said for cooling systems where, even when power is present, a failure of a computer room air conditioner or liquid cooling system can lead to over-temperature conditions, failure of a cold-aisle containment system and a potential cooling-related outage.

Using the Tier-rating system a server room or datacentre operator can judge how to provide the most appropriate level of power protection. The system is relatively straightforward from both a design and an audit or certification perspective. The Outage Severity Rating system can also be used as an audit tool for predicting how severe an outage could be to the operation of an installation, as well as for classifying an actual impact for organisation or industry reporting.

Summary

The datacentre industry is poor at sharing learning from outages and there are no statutory or association-led obligations to do so. It is only when a severe outage is publicly reported in the press that information is sometimes shared later as part of an investigation. The introduction of the Outage Severity Rating system may help to turn the tide on this, as the ratings provide a standardised way for organisations to classify the outages they experience.

As the number of outages continues to increase it is not inconceivable that most server rooms and datacentres experience the less severe classifications of outage at least once a year. Shared experience will help to improve server room and datacentre design as well as increase their availability within an ever more complex environment.