This is an excellent description and explanation of what happened: "The biggest-ever global outage: lessons for software engineers" (pragmaticengineer.com)

Coldstreams