We will conduct our own internal investigation into how recovery went on our side, and we will improve our process to shorten recovery time from similar events.
The datacenter has shared their RFO with us:
Full Outage Summary
Utility power was lost, and both Gen A and Gen B started. Gen A tripped off immediately after load transfer for an undetermined reason. It kept trying to restart, completely draining and killing its battery even though it had recently passed its annual PM. UPS 1A and 2A fully discharged their batteries. Once that happened, STS 1A and 2A switched to their secondary source, the Catcher UPS, which was being fed from Gen B. This overloaded the Catcher UPS and forced it into bypass without fully discharging its batteries. The added load on Gen B caused significant voltage fluctuations, which led UPS 1B and 2B to declare the source unavailable; they stayed on battery until fully discharged and then went offline. Load on blocks 1A and 2A remained powered through the Catcher UPS in bypass, fed from Gen B. When utility power returned, the open transition between Gen B and utility caused the remaining online equipment to trip offline, which left all PDU main input breakers tripped. This is why customer load was not restored immediately when utility power returned.
Timeline
5:57 AM – Utility power to the facility was lost.
- Gen A starts but eventually fails and drains its battery trying to restart.
- Gen B starts and continues to operate until utility returns, but with significant voltage and frequency instability due to overloading.
5:58 AM – Overall System Status
- UPS 1A / 2A discharging due to no Generator power available.
- UPS 1B / 2B discharging due to ATS transition.
- Mechanical systems fed from Gen B back online (half capacity).
5:58 AM – ATS-1B and ATS-2B Load transferred to Generator.
6:01 AM – UPS-1A Offline, STS-1A transfers to Source 2 (Catcher UPS)
6:03 AM – UPS-2A Offline, STS-2A transfers to Source 2 (Catcher UPS)
6:04 AM – Load on the Catcher UPS exceeds its capacity, forcing the Catcher UPS into bypass, fed from Gen B.
6:05 AM – Gen B starts to experience significant voltage and frequency fluctuations due to overloading.
6:06 AM – Due to poor power quality from the generator, UPS-1B and UPS-2B declare input power substandard and resume discharging from batteries.
6:15 AM – UPS-1B, with input power considered bad, fully discharges its batteries and downstream load is lost. Downstream STS-1B is unable to switch to Source 2 (Catcher UPS) due to power quality and shuts down. Downstream PDUs open their main input breakers.
6:21 AM – UPS-2B, with input power considered bad, fully discharges its batteries and downstream load is lost. Downstream STS-2B is unable to switch to Source 2 (Catcher UPS) due to power quality and shuts down. Downstream PDUs open their main input breakers.
6:28 AM – Utility comes back, and all CRACs come back. When ATS-B performs the open transition from generator to utility, the remaining PDUs operating on Gen B lose power because no UPS has any battery capacity left, and their main input breakers open.
7:15 AM – Technician identifies UPS 1A, 1B and 2A are all offline. UPS 2B is online in bypass with no load.
8:20 AM – All UPSs reset and brought back online in normal operation.
8:50 AM – All tripped PDU main input breakers reset, and customer load restored.
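
To put the timeline in terms of durations, the elapsed time between milestones can be worked out with a small script. This is only an illustrative sketch: the timestamps are copied from the timeline above, while the dictionary keys and the helper names (`to_minutes`, `minutes_between`) are made up for this example and are not part of the RFO.

```python
# Timestamps copied from the RFO timeline (all on the same morning).
# The key names are hypothetical labels chosen for this sketch.
TIMELINE = {
    "utility_lost": "5:57 AM",
    "utility_returned": "6:28 AM",
    "ups_reset": "8:20 AM",
    "load_restored": "8:50 AM",
}


def to_minutes(stamp):
    """Convert a 12-hour 'H:MM AM/PM' string to minutes since midnight."""
    clock, meridiem = stamp.split()
    hours, minutes = map(int, clock.split(":"))
    # 12 AM maps to hour 0, 12 PM maps to hour 12.
    hours = hours % 12 + (12 if meridiem.upper() == "PM" else 0)
    return hours * 60 + minutes


def minutes_between(start_key, end_key, timeline=TIMELINE):
    """Elapsed minutes between two timeline events on the same day."""
    return to_minutes(timeline[end_key]) - to_minutes(timeline[start_key])


if __name__ == "__main__":
    print("Utility outage:", minutes_between("utility_lost", "utility_returned"), "min")
    print("Utility return to load restored:",
          minutes_between("utility_returned", "load_restored"), "min")
    print("Utility loss to load restored:",
          minutes_between("utility_lost", "load_restored"), "min")
```

By this arithmetic, utility power was out for 31 minutes, while customer load was restored 142 minutes after utility returned, or 173 minutes (2 hours 53 minutes) after the initial loss.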