A study from Stanford, Imperial College London, and the Internet Archive just put a number on what most marketers have been feeling. By mid-2025, roughly 35% of new websites published since ChatGPT launched in late 2022 were AI-generated or AI-assisted.
That's a third of the internet built since the model arrived. Three years. One in three new sites. The researchers used Pangram v3 detection software on samples pulled from the Wayback Machine across 33 months of data.
404 Media covered the working paper this week. The most quoted line is the right one: "After decades of humans shaping it, a significant portion of the internet has become defined by AI in just three years."
The volume isn't the surprising part. The pattern under the volume is.
The Internet Got More Positive and Less Diverse
The researchers tested six hypotheses. Only two held up. AI content didn't increase falsehoods, didn't reduce source citations, and didn't degrade outbound link density. The factual quality argument that everyone uses against AI content has weak empirical support.
What did show up: AI-generated content made the internet less semantically diverse and more positive overall. Different topics, different domains, different authors all converging toward similar phrasing, similar sentiment, similar structure.
That's the thing worth paying attention to. The web isn't getting worse in obvious ways. It's getting more uniform. Tone, vocabulary, framing, even the way arguments are constructed are clustering toward whatever the dominant models output.
The flattening isn't a content quality problem. It's a differentiation problem.
Why This Hits Marketing Specifically
Marketing departments were the first major buyers of generative AI for content. The original case was efficiency. Write more, faster, cheaper. The CFO loved it. The agency margins improved. Content calendars filled out.
What nobody priced into that decision was the cost of looking like everyone else.
I see this with our clients at difrnt. Their organic traffic on long-tail informational queries didn't drop because the content went bad. It dropped because their content stopped feeling distinct. Search engines and LLMs both started rewarding the source that read like an actual human wrote it from actual experience. The generic output, even when factually correct, stopped getting cited.
The article that wins now isn't longer or more comprehensive. It's the one that contains something the model couldn't have generated. A specific number from your last quarter. A counterintuitive observation from a real client. A detail nobody else has.
This shouldn't surprise anyone. When the cost of producing X falls to near zero, X stops being a moat. AI just made that true for written content at scale.
The Practical Response
The temptation right now is to fight the volume game by producing more. The math doesn't work. If 35% of new sites are AI-generated and that share is growing, you cannot out-volume your way to differentiation. You'll just add to the uniformity.
The response that does work has three parts.
First, audit your existing library for content that reads like it could have come from anywhere. That's the content losing visibility right now, regardless of word count or technical SEO. Replace it or kill it.
Second, treat original observation as the strategic asset it is. Numbers from your business. Findings from your client work. Patterns nobody else has the data to see. The internet has a surplus of well-written summaries of public information. It has a shortage of inputs that didn't already exist.
Third, accept that the people doing the work need to be visible. Brand recognition, named authors, named sources, real photos, real bios. Anonymous content from a generic site is now actively suspect. The signal value of "this came from a person who knows things" is going up, not down.
Marketing teams that internalize this will spend less on volume and more on capturing what only they can capture. That's not a workflow change. It's a budget allocation change.
A third of what's been built on the web over the past three years came from machines. The differentiation surface didn't grow. If anything, it shrank. Build for what AI can't write.
FAQ
How much of the internet is AI-generated as of 2026?
A study from Stanford, Imperial College London, and the Internet Archive estimated that approximately 35% of new websites published since late 2022 were AI-generated or AI-assisted by mid-2025. The researchers used Pangram v3 detection software on samples drawn from the Wayback Machine across a 33-month period.
Is AI content lower quality than human content?
The same study tested whether AI content increased falsehoods or reduced source citation density. Neither hypothesis was supported. What did show up was a measurable reduction in semantic diversity and a shift toward more positive sentiment overall. The internet is becoming more uniform, not necessarily less factual.
How should marketers adapt to AI-generated content saturation?
Stop competing on volume. AI content is now infinite and cheap. The advantage moves to material that AI cannot produce: original data from your business, specific observations from your client work, named authors with real expertise, and brand recognition. Distinctive sourcing now matters more than comprehensive coverage.
