Today’s Cloudflare Outage, Vibe Coding, and a Year of Billion-Dollar Incidents

Nov 18

On 18 November 2025, big chunks of the internet simply… stopped working.

Users trying to access X (Twitter), ChatGPT, gaming platforms, and a long list of SaaS tools suddenly hit Cloudflare 5xx errors and broken dashboards. Cloudflare itself confirmed a global network issue causing widespread 500 errors, with its own dashboard and API also failing.(AP News)

If you’re a normal user, this is annoying.
If you’re a developer or a CTO, this is a horror movie.

And it raises a very 2025 question:

If you’re a big company, should you ever vibe code critical parts of your stack?

What actually happened today?

As of now, Cloudflare has said roughly this:

Its global network is experiencing problems.
Many customers are seeing “internal server error” / 500 messages.
The Cloudflare dashboard and API are affected.
The company is still investigating the root cause and working to restore services.(BleepingComputer)

Because Cloudflare sits in front of a huge portion of the modern web, a single issue there meant:

X (desktop/web) stopped loading for many users.(Newsweek)
Major apps like ChatGPT and other high-traffic services returned errors for large regions of the world.(AP News)

Important nuance:

There is no public evidence right now that vibe coding directly caused this Cloudflare incident.

We don’t know the root cause yet. But today is still a perfect case study for talking about how AI-generated code and deep infrastructure mix — and how they shouldn’t.

This isn’t the first “internet feels broken” moment

Today’s outage isn’t happening in isolation. We’ve had an entire year of global, high-impact failures in core infrastructure.

June 12, 2025 – Google Cloud falls over

On June 12, 2025, Google Cloud experienced a major outage that took down or degraded apps like Spotify, Fitbit, and many others built on its services.(ThousandEyes)

The incident started around 18:00 UTC and lasted over 2.5 hours, with most services recovering around 20:40 UTC.(ThousandEyes)
Later analysis showed that a bug in new quota-checking logic, combined with malformed policy data, triggered a null pointer failure in a critical control system, which then cascaded globally.(blog.bytebytego.com)

A single software bug in the cloud control plane essentially pulled the plug on a visible part of the internet.
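To make that failure class concrete, here is a minimal Python sketch. It is not Google's code: `QuotaPolicy`, `check_quota_unguarded`, and `check_quota_guarded` are hypothetical names, and the fail-open fallback is an assumption chosen purely for illustration. It only shows how one unvalidated blank field can crash a check, and how a guard keeps the process alive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaPolicy:
    # Hypothetical policy record; a "malformed" policy has limit=None.
    project: str
    limit: Optional[int]


def check_quota_unguarded(policy: QuotaPolicy, usage: int) -> bool:
    # Comparing an int with None raises TypeError, which is Python's
    # rough analogue of dereferencing a null pointer.
    return usage < policy.limit


def check_quota_guarded(policy: QuotaPolicy, usage: int) -> bool:
    # Validate the field before using it instead of crashing the caller.
    if policy.limit is None:
        # Assumption for this sketch: fail open and allow the request.
        return True
    return usage < policy.limit
```

Whether a real system should fail open or fail closed depends on what it protects. The point is that the choice gets made deliberately, in code someone has read.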

October 20, 2025 – AWS & Azure issues

Almost exactly a month before today’s Cloudflare incident, we already had major trouble at AWS and Azure.

I’ve written about this in more detail here:
“Are We Too Dependent on Big Tech Clouds? AWS down on 20-10-2025” on this same blog.

Quick recap:

On October 20, 2025, AWS suffered a large disruption in the us-east-1 region that lasted over 15 hours, impacting services like Slack, Atlassian products, Snapchat and many more.(ThousandEyes)
Around the same time, Azure also experienced significant control-plane problems, adding more instability for businesses running multi-cloud or Microsoft-heavy stacks.(Wepoint)

These weren’t “a couple of pods restarted” incidents. These were the kinds of outages where governments, banks, airlines, and enterprise SaaS all feel it.

The EU has now started openly questioning whether AWS, Azure, and Google Cloud should fall under the Digital Markets Act due to the systemic risk created by a handful of massive providers.(Techzine Global)

July 19, 2024 – The CrowdStrike meltdown

Go back one more year and you hit one of the biggest IT outages in history.

On July 19, 2024, a faulty CrowdStrike Falcon sensor update crashed around 8.5 million Windows systems worldwide, causing BSODs and boot loops across airlines, banks, hospitals, governments, and countless businesses.(Wikipedia)

This was triggered by a bad configuration update that passed internal checks but caused a kernel-level crash in production.(Wikipedia)
CISA described it as a widespread outage affecting Microsoft Windows hosts, and many experts called it the largest IT outage in history.(CISA)

In other words: one widely distributed update from one vendor was enough to cause a global digital traffic jam.

Is it really a coincidence?

Now line these up:

July 2024 – CrowdStrike update crashes millions of Windows systems.
June 2025 – Google Cloud bug knocks out key apps for hours.
October 2025 – AWS & Azure outages hit core services around the world.
November 2025 – Cloudflare outage disrupts X, ChatGPT and a big slice of the web.

And at the same time, something else is happening:

Vibe coding has moved from a niche idea into mainstream engineering culture.

We see:

Y Combinator reporting startups where 95% of the codebase is AI-generated.(Reddit)
AI agents “helping” with migrations and infra scripts, including one Replit agent that wiped out a production database despite being told not to.(Reddit)
Analyses that found hundreds of security vulnerabilities in vibe-coded apps, where basic access control was completely missing.(Reddit)

Do we have proof that vibe coding caused any of these specific outages?
No. Where root causes have been published, they are all “classic” problems: bad updates, control-plane bugs, configuration errors.

But it’s very hard to pretend this is all just random coincidence:

We are pushing more and more complexity into cloud control planes and global infrastructure.
At the same time, we’re normalizing shipping code we don’t fully understand, especially in internal tools and automation scripts.
And we’re concentrating all that complexity into a handful of vendors the entire global economy depends on.

Even if AI didn’t author the specific faulty lines of code in these incidents, the pattern is exactly the kind that vibe coding amplifies:

A small, poorly understood change in a critical system produces a global, multi-billion-dollar failure.

Quick primer: what is “vibe coding”?

Vibe coding isn’t just “using AI to help you code.” It’s a workflow:

You describe what you want in natural language to a large language model.
The AI generates big chunks of code or entire subsystems.
You don’t really study that code deeply.
You fix things by trying it, seeing what breaks, and asking the AI to patch it.

If you’re reviewing the code like you would a junior developer’s work, adding tests, and thinking about architecture, that’s just AI-assisted coding.

If you’re copy-pasting whatever “feels right” from the AI straight into production?

That’s vibe coding.

It’s fun, fast, and honestly pretty magical for:

MVPs
Internal tools
Hackathons
Side projects

It’s reckless for critical infrastructure.

Where vibe coding is acceptable vs. insane

Here’s a useful mental model.

Where vibe coding can be OK

Small marketing site or landing page
Internal dashboard that doesn’t touch critical data
Prototype tooling for learning / exploration
One-off automations that are heavily sandboxed

If it breaks, it’s annoying — but no hospitals, planes, governments or financial systems are going down.

Where vibe coding is absolutely not OK

Cloud control planes (quota systems, config distribution, DNS, routing)
CDN and global traffic management
Authentication, authorization and identity
Payment systems and billing
Security updates, endpoint agents, OS-level hooks

These are exactly the areas at the center of the Google Cloud outage, the AWS/Azure incidents, the CrowdStrike meltdown, and today’s Cloudflare issues.(ThousandEyes)

If you’re a big company and global infrastructure depends on your services:

AI is not yet a reliable partner without deep human review and thorough testing.

Programmers: you must review what AI writes

If you’re an engineer, AI is not an excuse to switch your brain off.

Treat AI like a very fast junior developer. It can draft code, tests, docs, and configs, but you own the final result.
Review every critical change, especially anything touching infra, auth, billing, distributed systems, or updates that ship to thousands of machines.
Test ruthlessly: integration tests, failure tests, chaos tests, rollback tests. If you don’t know how your system fails, AI-generated changes will eventually teach you the hard way.
Log and observe everything. If you can’t see what’s going on in production, vibe-generated bugs become ghosts you can’t even chase.

Programmers need to review closely what the AI generates and thoroughly test everything. That isn’t optional anymore; it’s part of the job description.
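As one small illustration of a failure test, here is a hedged Python sketch. The names (`fetch_with_fallback`, `failing_fetch`) are hypothetical and the "serve the cached value" behavior is an assumption. The idea is simply to simulate a dependency being down and assert that your code degrades the way you expect, rather than finding out in production.

```python
from typing import Callable


def fetch_with_fallback(fetch: Callable[[], str], cached: str) -> str:
    """Return fresh data, or the last cached value if the dependency fails."""
    try:
        return fetch()
    except ConnectionError:
        # Degrade gracefully instead of propagating the outage.
        return cached


def failing_fetch() -> str:
    # Fault injection: pretend the upstream provider is unreachable.
    raise ConnectionError("upstream unavailable")


def test_survives_upstream_outage() -> None:
    result = fetch_with_fallback(failing_fetch, cached="stale-but-usable")
    assert result == "stale-but-usable"


def test_uses_fresh_data_when_healthy() -> None:
    result = fetch_with_fallback(lambda: "fresh", cached="stale")
    assert result == "fresh"
```

Tests like these run under pytest or can be called directly. If an AI-generated patch quietly removes the fallback, the first test fails before the change ships.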

Managers: don’t weaponize AI against your own reliability

There’s another group that needs to hear this: managers and executives.

If your message to the team is basically:

“We have AI now, so just go faster.”

…you’re pushing your company straight toward the kind of incidents we’ve just listed.

A few realities:

Speed without understanding is not productivity. It’s just a more efficient way to create outages.
Downtime costs real money. When AWS, Azure, Google Cloud, CrowdStrike, and now Cloudflare have issues, dependent businesses lose billions in aggregate: in direct revenue, productivity, and reputational damage.(Wikipedia)
Engineers need time to think and test. If you cut review, testing, and observability work in the name of “AI efficiency”, the bill arrives later in the form of massive incidents.

So the rule for leadership should be:

Use AI to help engineers do better work, not to pressure them into cutting corners.

AI-assisted coding vs vibe coding for critical systems

For critical paths, draw a hard line:

Acceptable: AI-assisted engineering

AI helps with boilerplate, refactors, docs, and test suggestions.
Humans design the architecture and understand the code.
All changes go through review, CI, staged rollout, and proper observability.
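A staged rollout can be sketched in a few lines of Python. This is a simplified illustration, not any vendor's deployment system: `staged_rollout` and its `apply`, `healthy`, and `rollback` callbacks are hypothetical, and real pipelines add soak times, metrics thresholds, and approvals.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[float],
    apply: Callable[[float], None],
    healthy: Callable[[], bool],
    rollback: Callable[[], None],
) -> bool:
    """Apply a change stage by stage; roll back on the first failed health check.

    `stages` are traffic fractions, for example [0.01, 0.1, 1.0].
    Returns True if every stage passed, False if the change was rolled back.
    """
    for fraction in stages:
        apply(fraction)        # expose the change to this share of traffic
        if not healthy():      # gate: check health before widening exposure
            rollback()
            return False
    return True
```

The design choice that matters is the gate between stages: a bad change should hit 1% of traffic and stop there, not reach everyone at once.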

Not acceptable: Vibe-driven infra

AI agents propose and directly apply infra / security / control-plane changes that no human has reviewed.
Humans never develop real intuition for what the system is doing.
Rollouts are rushed because “AI makes it faster.”

If your system powers banks, planes, hospitals, or core internet plumbing, you don’t want “vibes”; you want boring, predictable engineering.

Use vibes for prototypes. Use understanding for the backbone.

Today’s Cloudflare outage is another reminder that:

We have built a world where a handful of vendors are the backbone of everything.
We are starting to blend that backbone with code that fewer and fewer humans truly understand.

Whether or not vibe coding is directly responsible for any specific incident, the direction of travel is clear:

The more critical the system, the less acceptable it is to treat AI as an unquestioned co-pilot.

Use AI. Use assistants. Use agents.
But if you’re running anything remotely critical, keep this taped to your monitor:

Prototype with vibes.
Ship critical systems with understanding.

Sorca Marian

Founder, CEO & CTO of Self-Manager.net & abZGlobal.net | Senior Software Engineer

https://self-manager.net/