The Day a Database Permission Change Broke the Internet. A Cloudflare Story.
November 20, 2025

A single database change briefly broke part of the Internet.
On November 18, the Internet failed in a way few people expected. Traffic through Cloudflare dropped sharply. Websites stalled. Authentication flows froze. Workers KV struggled under the weight of timeouts cascading across the globe. For nearly three hours, a company known for its resilience watched its backbone misfire in slow, rhythmic waves.
The cause was not malicious. It was not a DDoS attack or a routing catastrophe. It was the quiet consequence of a single change to a database’s permissions. A minor adjustment, routine in most organizations, touched a hidden part of Cloudflare’s architecture and awakened a dependency no one had considered dangerous. That subtle shift doubled the size of a configuration file that feeds a machine learning model inside Cloudflare’s Bot Management system. The file exceeded an internal limit deep within Cloudflare’s core proxy. The proxy reacted in the worst possible way and collapsed.
What happened next revealed how modern systems fail today. Nodes that loaded the expanded file went dark. Nodes that loaded the old file continued to serve traffic. The network oscillated, recovering for minutes at a time before failing again, as if trapped between two different realities. Cloudflare’s engineers initially suspected an external attack because the symptoms felt coordinated and hostile. Only later did the true cause emerge. A metadata query began returning more rows than before because new permissions exposed an additional schema. The downstream system that consumed those rows had never been designed to handle that variation.
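To make the mechanism concrete, here is a deliberately simplified sketch of that failure mode. It is not Cloudflare’s actual query, schema, or limit; the names and numbers are illustrative. It shows how a metadata query that never filters by database doubles its results the moment a second schema becomes visible, and how a downstream consumer with a hard cap turns that duplication into an outright failure rather than a degraded result.

```python
# Simplified illustration, not Cloudflare's actual query or schema: a metadata
# lookup that never constrains the database name returns one row per
# (database, table, column), so exposing a second schema to the same account
# silently doubles the result set. Names, counts, and the limit are illustrative.

FEATURE_LIMIT = 200  # hard cap assumed by the downstream consumer

# Rows in a ClickHouse-style system.columns catalog: (database, table, column)
catalog = [("default", "http_requests_features", f"feature_{i}") for i in range(120)]

def visible_columns(catalog, visible_databases):
    """Emulates a query like SELECT name FROM system.columns WHERE table = '...'.
    Note the missing filter on database: every visible schema contributes rows."""
    return [col for db, table, col in catalog
            if db in visible_databases and table == "http_requests_features"]

def build_feature_file(columns):
    """The consumer preallocates space for FEATURE_LIMIT features and refuses more."""
    if len(columns) > FEATURE_LIMIT:
        raise RuntimeError(f"{len(columns)} features exceeds limit of {FEATURE_LIMIT}")
    return {"features": columns}

# Before the permission change: only 'default' is visible -- 120 features, fine.
build_feature_file(visible_columns(catalog, {"default"}))

# The permission change exposes an underlying replica schema with identical tables.
catalog += [("r0", "http_requests_features", f"feature_{i}") for i in range(120)]

# Same query, same code -- now 240 rows, and the consumer fails hard.
try:
    build_feature_file(visible_columns(catalog, {"default", "r0"}))
except RuntimeError as err:
    print("panic:", err)
```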
Once every shard of the ClickHouse cluster adopted the new permissions, every file produced was oversized and every proxy that touched it entered the same panic. Cloudflare froze the file generator, inserted a known-good file, restarted core services, and worked through the tail of cascading failures until the last system recovered. By the end of the day, the company published a transparent postmortem that told a story far larger than the outage itself.
Cloudflare is one of the most capable engineering organizations in the world. Their systems are built to survive pressure that would overwhelm most companies. Their teams live in incident response. Their infrastructure is distributed, hardened, and instrumented with extraordinary detail. Yet the event that brought them down started with a quiet change in who could read what inside a database.
This should unsettle everyone who builds or operates modern systems. Today’s architectures are faster, more distributed, and more interdependent than at any other point in digital history. Everything regenerates. Everything adapts. Everything assumes the data beneath it will remain stable. When that foundation shifts, even slightly, the blast radius can reach across continents.
Outages like this are now board-level events, not operations incidents. Executives understand that failure at the data layer no longer results in a brief technical interruption. It creates exposure. It undermines trust. It invites questions from regulators and customers who expect reliability even in the face of rapid innovation.
Cloudflare’s outage was not a story about a proxy limit. It was a story about the unseen assumptions that hold modern systems together. A metadata query expected a single view of the world. A downstream component expected a fixed number of features. A global propagation system expected uniformity in the file it distributed. Each of those expectations was reasonable in isolation. Together, they created a perfect storm.
This is the fragility most enterprises underestimate. Many organizations still treat database change as something quieter and less consequential than application code. They wrap it in manual scripts, tribal reviews, and processes held together by institutional memory. They assume the database changes slowly and stays stable. In reality, it has become one of the most dynamic components of modern infrastructure. It shapes ML features, runtime logic, access control, personalization, routing decisions, scoring models, and analytics flows. When a schema, permission rule, or metadata contract shifts unexpectedly, it does not stay contained. It ripples outward into every system that depends on it.
The arrival of AI heightens this risk. Models depend on structured signals. Pipelines depend on predictable metadata. Agents generate SQL that reaches directly into production systems. Automated build systems treat data as a living input. A harmless variation in a table’s shape can distort predictions, corrupt features, and undermine trust in automated reasoning. Modern companies are building AI on top of a data layer that often lacks the same controls, lineage, and governance applied to code.
Cloudflare’s incident showed how dangerous that assumption has become. In most enterprises, the level of visibility Cloudflare has would be considered exceptional. The speed with which they diagnosed and recovered would be nearly impossible for most organizations to match. If a routine metadata change can break one of the most sophisticated networks on earth, what does that mean for the organizations that lack Cloudflare’s discipline and tooling?
The lesson from November 18 is not that Cloudflare stumbled. It is that the Internet runs on an increasingly delicate mesh of interconnected systems that depend on the stability of the data beneath them. When the data layer shifts without guardrails, everything above it inherits the risk. Application code will not save you. Infrastructure automation will not save you. Even best-in-class observability may only help you understand the blast after it has already begun.
The only real path forward is a new level of discipline at the data layer. Databases must be governed with the same rigor applied to application pipelines. Schema and metadata changes must be versioned, validated, and controlled. Drift across environments must be observable. The systems that depend on structured data must be able to trust that the shape of that data will not change without warning. Organizations that fail to adopt this posture will continue to experience failures that appear sudden, unpredictable, and inexplicable, even though the root cause is often simple and internal.
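What that discipline looks like in practice can be quite small. The sketch below is a minimal, hypothetical example rather than a prescribed tool: the expected shape of the metadata is declared and versioned alongside the change, and the pipeline refuses to propagate a generated file when the live catalog no longer matches that contract. The specific names are assumptions; the point is that the check exists and runs before anything ships.

```python
# Minimal sketch of a pre-propagation contract check. Every name here is
# hypothetical; the idea is that the expected shape of metadata is declared,
# versioned with the change that introduces it, and validated before rollout.

def validate(contract, live_catalog):
    """Compare the declared contract (table -> expected columns) against what
    the live metadata catalog actually reports, collecting any discrepancies."""
    errors = []
    for table, expected in contract.items():
        actual = live_catalog.get(table, [])
        if len(actual) != len(set(actual)):
            errors.append(f"{table}: duplicate columns reported -- check schema visibility")
        if sorted(set(actual)) != sorted(expected):
            errors.append(f"{table}: expected {len(expected)} columns, saw {len(actual)}")
    return errors

# Declared contract, committed alongside the change and reviewed like code.
contract = {"http_requests_features": [f"feature_{i}" for i in range(120)]}

# Stand-in for querying the live catalog (information_schema, system.columns, ...).
# After an unintended permission change, the same table shows up twice.
live_catalog = {"http_requests_features": [f"feature_{i}" for i in range(120)] * 2}

errors = validate(contract, live_catalog)
if errors:
    print("Refusing to propagate configuration:")
    for e in errors:
        print(" -", e)
```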
If your database changes are still moving through email threads and ticket queues, you are not governing a critical control point. You are hoping it holds.
Incidents like this will not stop. They will only get stranger and harder to diagnose as AI, automation, and distributed systems stack more logic on top of fragile data contracts. The one thing that can change is whether those contracts are governed or left to chance. On November 18, a database permission change broke the Internet. It is tempting to see this as a one-off incident. It is wiser to see it as a preview. This is how modern systems fail now. Not through a single dramatic blow, but through a tiny shift in the layer that everything else assumes is immovable. The next major outage will follow the same pattern. The question is whether the next organization is prepared.
The future of resilience begins with how you govern database change.