Microsoft suffered an outage at one of its San Antonio, Texas data centers on Tuesday, September 4th, causing service interruptions for millions of Office 365 users and third-party services reliant on Azure Cloud Services.
Severe weather and lightning strikes in the San Antonio area compromised the cooling system in the affected data center, causing a temperature spike that triggered automatic shutdown procedures to prevent permanent damage to vital data center hardware. The automatic shutdown took portions of the data center offline, with cascading effects for Microsoft Office 365 users and for third-party platforms that rely on Azure-hosted Active Directory services to authorize and grant access to user accounts.
Engineers are in the process of restoring power to affected data center devices. Resources in South Central and potentially other regions may experience impact. Please refer to your portal, https://azure.microsoft.com/en-us/status/ and/or Twitter for updates.
— Azure Support (@AzureSupport) September 4, 2018
The dense configuration of modern data center hardware and the enormous amount of heat it generates make the cooling system one of the most vital components. Automatic shutdown procedures are common in data centers of this size to prevent a “meltdown” (irreparable damage to servers and other critical hardware), but powering everything back up can take time once the underlying issue has been resolved.
In this case, Microsoft first reported the issue with the cooling system at 2:30am PT, and as of 3:15pm PT the company reported that power had been fully restored to the affected systems. Unfortunately, restoring the power is only half the battle, as Microsoft’s data center engineers must then reset and restore the various services running on the affected hardware.
You can check the current status yourself here: https://azure.microsoft.com/en-us/status/