Brief IA

The Atlantic Reveals the Tracks Used to Train AI

AI Tools · Tom Levy

Key Takeaways
1. The Atlantic has launched AI Watchdog, a tool that reveals the music tracks used to train AI models.
2. Alex Reisner obtained four music databases, two of which contain millions of tracks.
3. The databases, searchable through AI Watchdog, include songs by Lady Gaga, Radiohead, and Wu-Tang Clan.
💡 Why it matters: This raises questions about copyright and the ethical use of creative works in AI training.

Full Analysis

The Atlantic has introduced a tool that lets users find out which pieces of music have been used to train certain AI models. The tool, called AI Watchdog, offers a look inside the vast music collections behind these systems.

Journalist Alex Reisner obtained the four large music databases behind the tool. These databases, now publicly searchable through AI Watchdog, contain information about the tracks used to train AI models.

Where does this data come from?

According to Reisner, two of these databases are especially large, containing approximately 12 million and 9 million tracks, respectively. The other two are smaller but still exceed 100,000 songs each. These datasets have been downloaded thousands of times, and several of them draw on sources that are freely accessible on the Internet.

A notable example is the Free Music Archive, a platform that allows streaming for personal use while requiring a license for commercial use. However, training an AI with this data is not as simple as downloading it. Reisner explains that three of the identified databases are actually lists of links to tracks hosted on platforms like YouTube or Spotify.

Developers then use automated tools to extract the audio files. Some of these tools bypass identification systems, advertisements, or other mechanisms intended to compensate artists, which violates the platforms' rules.

What can be explored on AI Watchdog?

By exploring these databases, users can discover renowned artists such as Lady Gaga, Fred again.., Radiohead, Aphex Twin, Wu-Tang Clan, and Bruce Springsteen. Even experimental composer Hainbach is included in this list, illustrating the diversity of content used to feed generative AIs.

AI Watchdog is not limited to music. The tool also lets users explore which books and other works have been used to train AI models. It does not, however, specify which companies have used this data. To date, only Google and Stability AI have acknowledged using some of these works in their research.

The tool gives users a way to check which content has contributed to the training of AI models. It also raises important questions about copyright and the ethical use of creative works in artificial intelligence.
