AI Standardizes the Web: A Cheerful Uniformity That Worries

Key Takeaways

1. A study finds that by mid-2025 about 35% of new websites were partially or fully generated by AI, up from nearly zero at the end of 2022.

2. AI-generated texts are 33% more semantically similar to each other and 107% more positive in tone than texts written by humans.

3. The study found no evidence of an increase in factual errors, although the method used to check for them was limited.

💡 Why it matters: The saturation of the web by AI could reduce the diversity of ideas and influence public perception of information.

The Growing Impact of AI on the Web

A recent study conducted by Imperial College London, the Internet Archive, and Stanford University highlights the increasing influence of AI-generated texts on the Internet. According to its findings, by mid-2025 approximately 35% of new websites were partially or fully created by AI, a sharp increase from the end of 2022, when this proportion was nearly zero.

The researchers used a sample of English-language websites collected via the Internet Archive's Wayback Machine, covering the period from August 2022 to May 2025. To detect AI-generated texts, they relied on the Pangram v3 detector, known for its robustness in testing across five dimensions.

Semantic Uniformity and Increased Positivity

The study tested six hypotheses regarding the impact of AI, but only two were validated: semantic contraction and positivity shift. Semantic contraction means a reduction in the diversity of ideas online: AI-generated texts are 33% more similar to each other than texts written by humans. This could narrow the "Overton window" of online discourse.
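To make "more similar to each other" concrete, here is a minimal sketch of one way to score how alike a set of texts is: the mean cosine similarity over all pairs. It is not the study's method. The bag-of-words vectors are a crude stand-in for whatever embeddings the researchers used, and the function names and toy sentences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Word-count vector; an assumed, simplified stand-in for an embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(texts):
    """Average cosine similarity over every unordered pair of texts."""
    vectors = [bag_of_words(t) for t in texts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two texts")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def relative_increase(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return (new - baseline) / baseline * 100


# Toy corpora, invented purely for illustration.
uniform = [
    "our team delivers great solutions for your business",
    "our team delivers great results for your business",
    "our team delivers great value for your business",
]
varied = [
    "the ferry was late again and nobody explained why",
    "three recipes for sourdough that actually work",
    "our team delivers great solutions for your business",
]

print(mean_pairwise_similarity(uniform))
print(mean_pairwise_similarity(varied))
```

Under this kind of measure, a figure such as "33% more similar" would correspond to `relative_increase(score_ai, score_human)` returning about 33, where both scores are computed the same way on the two groups of texts.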

Moreover, AI-generated texts exhibit a tone that is 107% more positive than those written by humans. This tendency towards excessive optimism could marginalize dissenting voices. Jonas Dolezal, a researcher at Stanford, suggests that allowing AI models to have a more distinct personality could make them more creative. "Rather than forcing models to be perfectly compliant and agreeable, allowing them to have a more distinct personality or 'friction' could help them act as a creative partner rather than a replacement for the human voice," he told 404 Media.

Lack of Evidence for Increased Factual Errors

The four other hypotheses were not confirmed, including the hypothesis that AI-generated content increases factual errors. To test it, the researchers used GPT-4o-mini to extract verifiable claims, which were then checked by human annotators.

The researchers found no statistically significant correlation between the share of AI-generated content and factual errors. However, this result rests on a relatively narrow foundation: each annotator verified claims from five articles, for a subsample of about 250 websites. That is a small portion of the approximately 10,000 URLs per month over 33 months that underpin the full study. The method also captures only a narrow type of truth degradation: clearly refutable claims. More subtle forms, such as vague, suggestive, or simply unverifiable assertions, which are likely common in AI-generated text, escape the analysis entirely. In addition, because an AI model decides in advance which statements count as "verifiable" and are sent to annotators, the test is biased toward conservative results.
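A quick calculation shows how small the fact-checked subsample is next to the full dataset. This is a rough sketch using only the figures quoted above. It treats "websites" and "URLs" as comparable units, which is an assumption, and the variable names are illustrative.

```python
# Figures as reported in the article.
URLS_PER_MONTH = 10_000   # approximate URLs sampled per month
MONTHS = 33               # length of the study period
FACT_CHECK_SITES = 250    # approximate size of the fact-checked subsample

total_urls = URLS_PER_MONTH * MONTHS
coverage = FACT_CHECK_SITES / total_urls

print(f"{total_urls:,} URLs in the full study")
print(f"{coverage:.3%} of them covered by the fact-check subsample")
```

On these numbers the full study covers about 330,000 URLs, so the fact-checked subsample amounts to less than a tenth of one percent of it.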

Public Perception and Recommendations

A survey of 853 American adults revealed that public perception often diverges from the data. Most respondents believed negative hypotheses that the study did not validate, such as the disappearance of individual writing styles.

Individuals who rarely use AI were more likely to believe in negative effects than regular users (88.3% vs. 76.2%, a gap of 12.1 percentage points). Among AI skeptics, the gap was even wider (91.3% vs. 71.1%, or 20.2 percentage points).

The researchers warn that the high share of AI-generated content transforms the theoretical risk of "model collapse" into a practical problem. Instead of relying on post hoc detection, they recommend cryptographic provenance standards like C2PA, as well as a reevaluation of search and recommendation algorithms to reward semantic diversity.

Maty Bohacek from Stanford says the team is already working with the Internet Archive to turn the analysis into a tool that continuously monitors the share of AI-generated content on the web. "We are now working with the Internet Archive to turn this into a continuous tool that continues to provide this signal in the future, rather than a fixed snapshot limited by the static nature of an article," Bohacek told 404 Media.

The study has limitations that the researchers acknowledge. Only English texts were analyzed; other languages and formats such as images or videos were left out. The entire analysis relies on the reliability of the Pangram v3 detector, and its accuracy could change as language models continue to evolve. The data also comes solely from the Internet Archive, which does not represent the entirety of the web.
