TechCrunch

August 21, 2015

Google data center loses data following four lightning strikes


Google must have done something to anger the gods, for they have blasted one of Google’s European data centers with lightning not once, not twice, not thrice, but four times. The incident was serious enough that the data center actually lost some data, which is exactly what data centers are supposed to avoid.
The incident hit Google’s europe-west1-b data center in Belgium, with errors running from August 13th through August 17th. This data center houses a variety of content, but the affected disks were serving Google Compute Engine (GCE) instances. The GCE service allows businesses to store data and run virtual machines in the cloud. After the four lightning strikes, some of these disks started returning I/O errors to their GCE instances. At the height of the calamity, about 5% of the disks in the data center were experiencing I/O errors. Google was able to restore many of the disks to working condition and salvage the data, but 0.000001% of the data in europe-west1-b was irrecoverably lost.
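To put those two percentages in perspective, here is a rough back-of-the-envelope sketch. Google has not published the capacity of europe-west1-b, so the one-petabyte figure below is purely a hypothetical round number chosen for illustration, and the function names are made up for this example.

```python
from fractions import Fraction


def percent_to_fraction(percent: str) -> Fraction:
    """Convert a percentage written as a decimal string into an exact fraction."""
    return Fraction(percent) / 100


# Figures quoted in the article.
PEAK_ERROR_SHARE = percent_to_fraction("5")        # disks seeing I/O errors at the peak
PERMANENT_LOSS = percent_to_fraction("0.000001")   # data that could not be recovered

# ASSUMPTION: a hypothetical 1 PB (10**15 bytes) of stored data.
# This is not a number reported by Google.
HYPOTHETICAL_CAPACITY_BYTES = 10**15


def bytes_affected(total_bytes: int, share: Fraction) -> int:
    """Return how many bytes a given share of the total represents."""
    return int(total_bytes * share)


if __name__ == "__main__":
    lost = bytes_affected(HYPOTHETICAL_CAPACITY_BYTES, PERMANENT_LOSS)
    print(f"Permanent loss is 1 part in {PERMANENT_LOSS.denominator:,}")
    print(f"Per hypothetical petabyte: {lost:,} bytes lost for good")
```

In other words, 0.000001% is one part in 100 million, which works out to about 10 MB for every petabyte stored. That is tiny in aggregate, though not for whoever owned those particular bytes.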
Big data centers have systems in place to prevent data loss in the event of electrical interference, and Google is obviously no exception. However, four successive lightning strikes on the electrical systems of its data center pushed the buffering and backups to their limits. The servers have battery backups, and the building itself has a full auxiliary power system. Google says both kicked in as expected to prevent damage to the disks. However, some recently written data was stored on systems that were more susceptible to power failure or repeated battery drain. Those systems account for the roughly 5% of disks that initially reported errors.



Google says it is already in the process of transitioning all its storage hardware away from the configuration that made this failure possible, and most of it is already running on the new system. That’s why only a small fraction of GCE instances were affected. So the good news is that even four lightning strikes on the data center’s power system weren’t enough to affect most of the disks Google is running. If this had happened a few months down the line, there might not have been any negative impact.
That might not be comforting to the handful of customers who permanently lost data in their GCE instances. While Google accepts full responsibility for the failure, it also points out that a GCE instance and its disks are by their nature tied to a single data center. Customers who are particularly worried might want to use GCE snapshots and Google Cloud Storage to keep copies of their data that do not depend on any one location.
