ChatGPT and AI: Detection of Automated Texts Stalls

Key Takeaways

1. Three years after the emergence of ChatGPT, the identification of AI-generated texts remains uncertain.

2. Technical limitations hinder progress in detecting content generated by artificial intelligence.

3. The inaction of industry players contributes to the lack of effective solutions for differentiating AI texts.

Why it matters: The inability to distinguish AI texts raises issues of trust and authenticity in digital content.

Contents

The Failed Attempts of OpenAI
A Universal Watermark That Isn't One Yet
A Market Born Out of Urgency
An Indicator, Never a Certainty
The Web Indexes and Values Without Distinguishing Between Man and Machine

It took some time, but OpenAI can now recognize images generated by its models. Last May, the company publicly launched an online tool capable of detecting whether a visual was created using ChatGPT or its API, by cross-referencing C2PA metadata and the invisible watermark SynthID developed by Google DeepMind.

This long-awaited advance raises a recurring question: why, more than three years after the emergence of ChatGPT, does the detection of AI-generated text still lag behind? And will we ever be able to tell a text produced by a machine from one written by a human, especially when some studies, perhaps alarmist, estimate that more than half of the articles published on the web are now synthetic?

The Failed Attempts of OpenAI

Naturally under scrutiny on this issue because of its pioneering status, the company behind ChatGPT came close to a breakthrough. In August 2024, the Wall Street Journal revealed that OpenAI had, for about a year, been sitting on a text-watermarking system, "invisible to the naked eye," allowing it to determine with certainty whether all or part of a text had been generated using its large language models. According to internal documents reviewed by the American newspaper, this "anti-cheat" tool boasted a success rate of 99.9%. Its principle? Slightly altering the way the conversational agent composes its sentences to create a pattern imperceptible to a reader but detectable by an algorithm.

Promising on paper, the tool was never deployed on a large scale. "In trying to decide on the way forward, OpenAI employees were torn between their commitment to transparency and the desire to attract and retain users," wrote the Wall Street Journal. Some were concerned about potential workarounds, while others feared an impact on output quality. A user survey conducted in 2023 also weighed in: more than 30% of respondents stated they would use ChatGPT less if such technology were deployed, and 69% feared it would lead to false accusations of cheating.
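The principle reported by the Wall Street Journal, a statistical pattern invisible to readers but detectable by an algorithm, can be illustrated with a toy "green list" scheme of the kind described in academic work on text watermarking. This is a minimal sketch and not OpenAI's undisclosed method: the toy vocabulary, the hashing rule, the resampling bias, and the names `is_green`, `generate`, and `z_score` are all assumptions made for illustration.

```python
import hashlib
import math
import random

# Toy vocabulary standing in for a model's tokens (assumption).
VOCAB = [f"w{i}" for i in range(1000)]
GREEN_FRACTION = 0.5  # share of tokens treated as "green" at each step


def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly mark `token` as green, keyed on the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION


def generate(length: int, watermark: bool, rng: random.Random) -> list[str]:
    """Draw random tokens; with `watermark`, nudge the draw toward green tokens."""
    tokens = []
    prev = "<s>"
    for _ in range(length):
        candidate = rng.choice(VOCAB)
        if watermark:
            # Resample up to four times while the candidate is not green.
            for _ in range(4):
                if is_green(prev, candidate):
                    break
                candidate = rng.choice(VOCAB)
        tokens.append(candidate)
        prev = candidate
    return tokens


def z_score(tokens: list[str]) -> float:
    """Standard deviations by which the green count exceeds pure chance."""
    prev = "<s>"
    green = 0
    for token in tokens:
        green += is_green(prev, token)
        prev = token
    n = len(tokens)
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (green - expected) / std


if __name__ == "__main__":
    rng = random.Random(0)
    print("watermarked z-score:", round(z_score(generate(200, True, rng)), 1))
    print("plain z-score:      ", round(z_score(generate(200, False, rng)), 1))
```

In this toy setup, a marked sequence scores far above the chance level while an unmarked one stays close to zero. Rewording the text changes the token pairs and weakens the signal, which is the kind of workaround some OpenAI employees were worried about.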

This was not OpenAI's first failed attempt at detection. In January 2023, the Californian firm, then in the midst of a boom, launched a free tool called AI Text Classifier, which rated each analyzed text on a likelihood scale ranging from "very unlikely" to "likely AI-generated." Limited from the start, trained on English-language content, and usable only for texts of more than 1,000 characters, the tool was ridiculed by many media outlets and shut down six months later. "It correctly identified 26% of AI-generated texts as likely written by AI while incorrectly classifying 9% of human texts in that same category," the firm acknowledged in a blog post.
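A short calculation shows what these two figures imply in practice. The sketch below applies Bayes' rule to the 26% detection rate and 9% false-positive rate quoted by OpenAI; the shares of AI-written text in the checked corpus (10% and 50%) are hypothetical values chosen for illustration, as is the function name `flag_precision`.

```python
def flag_precision(true_positive_rate: float,
                   false_positive_rate: float,
                   ai_share: float) -> float:
    """Probability that a text flagged as 'likely AI' really is AI-written."""
    flagged_ai = true_positive_rate * ai_share
    flagged_human = false_positive_rate * (1 - ai_share)
    return flagged_ai / (flagged_ai + flagged_human)


TPR = 0.26  # share of AI texts correctly flagged (figure quoted by OpenAI)
FPR = 0.09  # share of human texts wrongly flagged (figure quoted by OpenAI)

if __name__ == "__main__":
    for ai_share in (0.10, 0.50):  # hypothetical corpus compositions
        p = flag_precision(TPR, FPR, ai_share)
        print(f"AI share {ai_share:.0%}: a flag is correct {p:.0%} of the time")
```

Under these assumptions, if only one text in ten is AI-written, a flag is right about 24% of the time; with an even split, it is right about 74% of the time. In both cases roughly three quarters of the AI texts go unflagged.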

A Universal Watermark That Isn't One Yet

In recent years, Google has also ventured into this field. Since 2023, the company has been developing SynthID, an invisible watermark initially reserved for images and gradually extended to the text, audio, and video produced by its models, from Gemini to Lyria and Veo.

At the Google I/O 2025 conference, the Mountain View firm reached a further milestone by unveiling SynthID Detector, a portal that lets users scan these different types of content for the watermark. The problem: although the technology has been made open source, the portal has never been opened to the general public and is currently accessible only to journalists, researchers, and professionals admitted from a waiting list. Moreover, like OpenAI's tools, SynthID only detects content generated by its maker's own models. A text created by ChatGPT, Le Chat, or Claude, which has not been marked by SynthID, will slip through the cracks. "This forces users to juggle multiple tools to verify the origin of content. Despite calls from researchers for a unified system, and attempts by major players like Google to have their standard adopted by others, the landscape remains fragmented," laments T.J. Thomson, an associate professor of visual communication at RMIT University in Melbourne, in an article published on The Conversation.

A Market Born Out of Urgency

The hesitancy of AI players on detection has, in any case, opened up a market. As early as January 2023, within weeks of ChatGPT's launch, several tools emerged to try to fill the gap, including GPTZero, Originality AI, and Winston in the English-speaking market, and Lucide AI, Draft & Goal, and Compilatio in the French-speaking market. GPTZero illustrates well how quickly the need made itself felt. Launched on January 2, 2023 by Edward Tian, then a student at Princeton, and briefly announced on Twitter, the tool attracted several thousand visitors on its first day, to the point of overwhelming the server, as reported by WIRED. The enthusiasm did not wane afterward: twelve months later, GPTZero had 4 million users, and its parent company was already profitable, according to TechCrunch.

While each of these solutions has crafted its own recipe to try to spot synthetic content, they all track similar indicators: punctuation, sentence structure, and the frequency of certain words or expressions. They also have their own specificities. Lucide AI, a French detection tool launched in 2024, pays particular attention to "clusters of words." "Humans write with more clusters of words than AIs," explains Arthur Villecourt, co-founder of the solution, referring to the groups of terms that a human author spontaneously associates by semantic proximity or stylistic habit. "This creates natural variations, while a model tends to produce texts with a more regular structure."

"These factors may represent 20 or 25% of the analysis," cautions Arthur Villecourt, for whom these signals alone are far from sufficient for reliable detection. To assign a credible probability score, Lucide AI has therefore developed its own algorithm, connected to an LLM and continuously trained on journalistic, academic, and AI-generated content. This "additional building block" is, according to him, what separates serious paid solutions from free detectors, which often analyze only punctuation or word frequency. "The LLM-based system is more expensive, but also more sophisticated," he adds. Another advantage of the approach: because it is continuously retrained on new text, the algorithm adapts more easily to advances in the models.

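The surface indicators mentioned above (punctuation, sentence structure, word frequency) can be sketched as a handful of simple measurements. This is an illustrative sketch only, not the method of Lucide AI, GPTZero, or any other vendor: the function name `surface_features`, the choice of features, and the regular expressions are assumptions, and no threshold for "human" or "AI" is implied.

```python
import re
import statistics
from collections import Counter

WORD = re.compile(r"[A-Za-z']+")


def surface_features(text: str) -> dict[str, float]:
    """Compute a few crude stylistic measurements on an English text."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = WORD.findall(text.lower())
    if not words:
        return {"sentence_length_stdev": 0.0, "punctuation_per_word": 0.0,
                "top_word_share": 0.0, "type_token_ratio": 0.0}
    lengths = [len(WORD.findall(s)) for s in sentences]
    punctuation = re.findall(r"[,;:!?()\-]", text)
    counts = Counter(words)
    return {
        # Variation in sentence length: a regular structure gives a low value.
        "sentence_length_stdev": statistics.pstdev(lengths),
        # Density of internal punctuation marks.
        "punctuation_per_word": len(punctuation) / len(words),
        # Share of the text taken up by its single most frequent word.
        "top_word_share": counts.most_common(1)[0][1] / len(words),
        # Vocabulary diversity: distinct words divided by total words.
        "type_token_ratio": len(counts) / len(words),
    }


if __name__ == "__main__":
    sample = "Hi. This one is a much longer sentence, really."
    for name, value in surface_features(sample).items():
        print(f"{name}: {value:.3f}")
```

Measurements like these are cheap to compute, which is consistent with Villecourt's remark that free detectors often stop there, while paid tools add a trained model on top.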