Internet Archive Threatened by AI and Storage Shortage
Internet Archive: Digital Memory in Danger
Internet Archive, often described as a digital Library of Alexandria, plays a crucial role in preserving the history of the web. With 210 petabytes of data, the organization strives to preserve almost all online content, whether or not it is still accessible. However, the rise of artificial intelligence (AI) now threatens this mission, one hard drive at a time.
Since its founding in 1996, Internet Archive, the organization behind the Wayback Machine, has faced numerous challenges. In October 2024, a cyberattack compromised the data of 31 million accounts and temporarily paralyzed the service. More recently, in early 2026, influential media outlets such as The Guardian and the New York Times removed their articles from the platform, fearing that their content would be used to train AI models. The number of archived press pages had already fallen by 87% between May and October 2025. Today, the problem is no longer only hackers or publishers, but also the prohibitive cost of hard drives.
Storage Shortage: A Major Challenge
Brewster Kahle, the founder of Internet Archive, recently told 404 Media that hard drives of 28 to 30 terabytes, essential for managing the 100 terabytes of new data added each day, have become either unavailable or excessively expensive. At a conference, the director of Western Digital confirmed that the company is "practically out of stock for the year 2026."
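To give a sense of scale, the figures quoted above can be turned into a rough drive count. The sketch below is only a back-of-the-envelope estimate: it uses the 100 TB per day and the 28 to 30 TB drive sizes from the article, and it assumes (hypothetically) that each terabyte is stored once, with no replication, parity, or formatting overhead. The function names are illustrative.

```python
import math

DAILY_INTAKE_TB = 100       # new data per day, as quoted in the article
DRIVE_SIZES_TB = (28, 30)   # drive capacities quoted in the article


def drives_per_day(intake_tb: float, drive_tb: float) -> float:
    """Raw number of drives filled per day (assumes no redundancy or overhead)."""
    return intake_tb / drive_tb


def drives_per_year(intake_tb: float, drive_tb: float, days: int = 365) -> int:
    """Whole drives needed to hold one year of intake, rounded up."""
    return math.ceil(intake_tb * days / drive_tb)


for size in DRIVE_SIZES_TB:
    per_day = drives_per_day(DAILY_INTAKE_TB, size)
    per_year = drives_per_year(DAILY_INTAKE_TB, size)
    print(f"{size} TB drives: {per_day:.2f} per day, {per_year} per year")
```

Under these assumptions, the archive would fill roughly 3.3 to 3.6 drives a day, or about 1,200 to 1,300 a year. Any real deployment that keeps more than one copy of the data would need proportionally more.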
The main cause of this shortage is AI, which has triggered an explosion in demand for storage in the data centers needed to train and run language models. At Western Digital, the enterprise sector, including data centers, cloud, and AI, now accounts for about 89% of revenue, while the consumer market makes up only 5%. This situation has led manufacturers to reduce production intended for the secondary market.
Prices for some models have doubled or even tripled since September 2025, and consumer SSDs show the same trend. In France, for example, a 2 TB Samsung 990 EVO Plus SSD cost 150 euros in April 2025 but reached 360 euros in January 2026. Similarly, a 2 TB WD Black SN850X went from 130 euros to over 300 euros during the same period.
The Wikimedia Foundation, which manages Wikipedia, is also feeling this pressure, facing "supply difficulties for memory and hard drives, extended delivery times for servers, and a reduced capacity to place new orders."
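The French price examples above can be checked against the "doubled or even tripled" claim with a short calculation. This is a minimal sketch using only the euro figures quoted in the article. Because the WD Black price is given as "over 300 euros", 300 is treated as a lower bound, and the helper names are illustrative.

```python
def price_multiplier(old_eur: float, new_eur: float) -> float:
    """How many times higher the new price is compared with the old one."""
    return new_eur / old_eur


def percent_increase(old_eur: float, new_eur: float) -> float:
    """Relative increase in percent."""
    return (new_eur - old_eur) / old_eur * 100


# (model, price in April 2025, price in January 2026), as quoted in the article.
# The second January price is a lower bound ("over 300 euros").
examples = [
    ("Samsung 990 EVO Plus 2 TB", 150, 360),
    ("WD Black SN850X 2 TB", 130, 300),
]

for model, old, new in examples:
    print(f"{model}: x{price_multiplier(old, new):.2f} "
          f"(+{percent_increase(old, new):.0f}%)")
```

Both examples land between a doubling and a tripling: about 2.4 times the earlier price for the Samsung drive and at least 2.3 times for the WD Black.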
A Business Model Under Pressure
Over three decades, Internet Archive has accumulated 210 petabytes of archives. To visualize this amount, it is equivalent to about 210,000 hard drives of 1 TB each, which, stacked on top of one another, would form a column over 5 kilometers high. The organization adds 100 TB of data daily and passed the mark of one trillion archived pages in October 2025.
Internet Archive's funding model relies entirely on donations. The contrast with France is stark. The BnF (Bibliothèque nationale de France) has ensured the legal deposit of the French web since 2006 with funding from the Ministry of Culture and holds approximately 45 billion archived files, while the INA (Institut national de l'audiovisuel) manages the legal deposit of media web content, with about 17.5 billion URLs. The American organization, by comparison, receives no public funding. When the price of storage doubles, its budget absorbs the increase with no safety net.
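The stacked-drive comparison above can be reproduced with simple arithmetic. The sketch below assumes decimal units (1 PB = 1,000 TB) and a drive height of 26.1 mm, the usual height of a 3.5-inch hard drive. That height is an assumption brought in for illustration, not a figure from the article.

```python
TB_PER_PB = 1_000          # decimal units, as used by drive manufacturers
DRIVE_HEIGHT_MM = 26.1     # assumed height of a standard 3.5-inch drive


def drive_count(archive_pb: float, drive_tb: float) -> int:
    """Number of drives of a given size needed to hold the archive once."""
    return round(archive_pb * TB_PER_PB / drive_tb)


def stack_height_km(n_drives: int, height_mm: float = DRIVE_HEIGHT_MM) -> float:
    """Height of a column of drives laid flat on top of one another."""
    return n_drives * height_mm / 1_000_000


def annual_growth_pb(daily_tb: float, days: int = 365) -> float:
    """Yearly growth in petabytes at a constant daily intake."""
    return daily_tb * days / TB_PER_PB


n = drive_count(210, 1)
print(f"{n:,} drives, stack of about {stack_height_km(n):.2f} km")
print(f"Growth at 100 TB/day: {annual_growth_pb(100):.1f} PB per year")
```

With these assumptions the column comes out at roughly 5.5 km, consistent with the "over 5 kilometers" figure, and a constant 100 TB per day adds about 36.5 PB a year.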
The dual threat facing Internet Archive is circular. AI siphons content from the web (which has led publishers to restrict access to their archives), while simultaneously drying up the market for the hard drives necessary for archiving that same web. The industry that feeds on data complicates the preservation of that data for everyone.
The Impact on European Users
For European users, the impact is not theoretical. The Wayback Machine is integrated into Google Search and remains a daily tool for researchers, journalists, and developers. If the organization fails to expand its storage capacity, it is the pages published from now on that will go unarchived. The memory of the Internet will not disappear all at once; it will simply cease to form.