What We Need to Learn After the CrowdStrike Outage
The Association for Computing Machinery’s US Technology Policy Committee (USTPC) recently issued a “Statement on Mass Cybersecurity Incidents Likely to Recur” in light of a significant global outage caused by CrowdStrike.
To recap: On July 18, 2024, CrowdStrike, a prominent US-based cybersecurity company, released a sensor configuration update that inadvertently caused a widespread outage, affecting an estimated 8.5 million Windows computers across the globe. The incident had far-reaching consequences, disrupting critical services such as airlines, 911 emergency systems, banks, government agencies, and hospitals and other healthcare facilities worldwide.
While CrowdStrike has provided some initial explanations of the outage, the USTPC emphasizes the need for a thorough and public investigation into the incident. Such an inquiry is crucial so that system operators, technologists, and policymakers can understand the incident's root causes and develop strategies to prevent similar occurrences in the future.
Jody Westby, CEO of Global Cyber Risk LLC and a principal author of the USTPC Statement, highlighted two significant vulnerabilities exposed by the CrowdStrike incident. First, it underscored the fragility of the global technical infrastructure. Second, it revealed that existing legal and policy frameworks are inadequate for responding effectively to such large-scale cyber disruptions. Westby stressed that both the technical and the legal infrastructure require substantial reinforcement, and she expressed hope that the USTPC Statement would draw attention to these critical needs.
The USTPC also noted the global nature of the outage, emphasizing the need for improved international cooperation and coordination in responding to cybersecurity incidents. During the CrowdStrike outage, information about the event, news of government response efforts, and technical guidance were all hard to come by, leaving many to navigate the crisis on their own, especially if their systems were affected.
Carl Landwehr, a visiting professor at the University of Michigan and another principal author of the USTPC Statement, described the scale of the CrowdStrike incident as unprecedented and deeply concerning, particularly because of its impact on critical infrastructure. However, he acknowledged that for computer scientists familiar with the underlying technologies, such incidents are not entirely surprising and, perhaps most alarming, are likely to recur.
Landwehr emphasized the importance of learning from this event to reduce the risk of similar disasters in the future. The USTPC, as a nonpartisan organization of computer scientists, has outlined the following key questions to guide a public investigation:
- How did some systems avoid the consequences of this error while others did not?
- Why was the errant software released without thorough testing?
- What lessons can we draw concerning the architecture and implementation of systems?
- What are the best practices for automatic system updates?
- Why were some systems able to come back up faster than others?
- What were the most efficient ways to restart systems that required manual intervention?
- What notification should be required?
USTPC members have called for the US government's Cyber Safety Review Board (CSRB) to lead a public investigation into the CrowdStrike incident. Such an investigation is vital for understanding the vulnerabilities the event exposed and for preventing future incidents of similar scale.
The full USTPC Statement is available for further reading, providing detailed insights and recommendations on the necessary steps forward.