Brief IA

Authors Guild: AI Detectors Disagree on Texts

🤖 Models & LLM · Tom Levy

Key Takeaways
1. The Authors Guild tested five AI detectors on human-written texts to assess their accuracy.
2. Pangram and Grammarly successfully identified all texts as being written by humans.
3. Sidekicker and ZeroGPT falsely classified human writings as AI-generated, highlighting a reliability issue.
💡 Why it matters: The ability of AI detectors to distinguish human writing is crucial for the credibility of content and the fight against misinformation.

Full Analysis


In a test conducted by the Authors Guild, the AI detectors from Pangram and Grammarly correctly identified every human-written text as human. Originality.ai also performed well. The test used ten Guild articles published between 2020 and 2022, before generative AI became commonplace. Sidekicker performed worst, flagging every article as primarily AI-generated, including two rated 100% AI. ZeroGPT also proved unreliable, at times reporting high AI percentages for texts written entirely by humans.

False Positives Can Be Costly for Authors

However, the Guild, the oldest and largest professional organization for writers, warns that even the best-performing tools should never be the sole basis for a decision. These tools change constantly, and their accuracy cannot be taken for granted. Pangram's CEO, Max Spero, recently explained that his detector is essentially a black box, with no way to explain in detail why a text is flagged as AI-generated. According to Spero, language models betray themselves through their uniformity, particularly in how they construct arguments, whereas humans write with much more variety.

According to the Authors Guild, professionally written texts share many statistical patterns with AI outputs, simply because language models have been trained on this type of writing. Erroneous results can cost authors their contracts and reputations, so it is essential that publishers disclose their methods and always give authors the opportunity to defend themselves.

This creates a troubling paradox. A writer who has spent decades perfecting clarity, economy, and precision writes, by definition, in a way that overlaps with what AI has learned to produce. Detection tools cannot distinguish between a human writer who has mastered their craft and a machine that has learned to imitate it, because at the level where these tools operate, there may be few differences to find.

That said, the fact that Pangram and Originality.ai reliably identify human-written texts as human does not necessarily mean they are also good at detecting AI-generated ones. The results primarily show that these tools are tuned to minimize false positives, that is, cases where a human text is wrongly flagged as AI. Many texts written by or with AI could still go unnoticed. The reliability demonstrated in this test applies only to the correct recognition of human writing.
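To make the distinction concrete, here is a minimal Python sketch of the two metrics involved. The function names are illustrative assumptions, not part of the Guild's methodology; the counts come from the article itself (ten human-written articles, none flagged by Pangram, all flagged by Sidekicker). Because the test set contained no AI-written texts, the detection rate for AI text cannot be computed from it at all.

```python
from typing import Optional


def false_positive_rate(human_flagged: int, human_total: int) -> Optional[float]:
    """Share of human-written texts wrongly flagged as AI (None if no human texts)."""
    if human_total == 0:
        return None
    return human_flagged / human_total


def recall(ai_flagged: int, ai_total: int) -> Optional[float]:
    """Share of AI-written texts correctly flagged as AI (None if no AI texts)."""
    if ai_total == 0:
        return None
    return ai_flagged / ai_total


# Counts as reported in the article: ten human-written articles, zero AI-written ones.
pangram_fpr = false_positive_rate(human_flagged=0, human_total=10)
sidekicker_fpr = false_positive_rate(human_flagged=10, human_total=10)

# No AI-written texts were in the test set, so recall is undefined for every tool.
pangram_recall = recall(ai_flagged=0, ai_total=0)

print("Pangram false positive rate:", pangram_fpr)
print("Sidekicker false positive rate:", sidekicker_fpr)
print("Pangram recall on AI text:", pangram_recall)
```

A false positive rate of zero on human texts says nothing about recall: measuring that would require a separate set of texts known to be AI-generated.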

The Cultural Debate Surrounding Detection

Errors will continue to occur, which is why the utility of these detectors is often questioned. This is especially true since AI can be a genuinely useful writing tool, and the broader debate often conflates the use of AI for writing with the use of AI for thinking.

Proponents of detectors, such as Pangram CEO Max Spero, justify their business model by pointing to a social contract between writer and reader: the writer invests time and effort to shape an idea, and the reader invests time to engage with it. If AI reduces the cost of writing to zero, Spero said, the incentives break down, and people flood the Internet with worthless content that takes readers longer to consume than it took the author to produce.

Whether the value of a text comes from typing or from the selection of the subject, idea, perspective, story, research, argument, and judgment that underlie it is a completely different question. The same goes for whether AI text detection can actually do anything against the flood of worthless content.
