Fernando Irarrázaval challenges 2,000 hackers: his AI withstands attacks


Key Takeaways

1. Fernando Irarrázaval launched a challenge on hackmyclaw.com to test the security of his AI assistant OpenClaw.

2. After 6,000 attempts and $500 spent, no hacker succeeded in disclosing the secret protected by the Opus 4.6 model.

3. The anti-prompt-injection rules effectively prevented any data exfiltration or unauthorized modifications.

💡 Why it matters: This experiment highlights the robustness of modern AI models against prompt injection attacks, while reminding us that vigilance remains essential.


Full Analysis

Fernando Irarrázaval recently tested the security of his AI assistant by launching a challenge on the site hackmyclaw.com. The goal was to see whether participants could get his OpenClaw test instance to disclose secrets by sending it emails.

Despite the interest the challenge generated, with 6,000 attempts recorded and $500 spent on tokens, none of the participants managed to breach the AI's defenses. The volume of incoming emails was high enough that his Google account was temporarily suspended.

A Robust Model: Opus 4.6

The model used for this experiment was Opus 4.6, which incorporates strict rules to prevent prompt injection attacks. Among other things, these rules specify that the content of an email must never be treated as grounds to reveal sensitive information, modify internal files, execute commands, or exfiltrate data.
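The rules above are described as instructions the model follows, not as code. To make the policy concrete, here is a minimal sketch of how the same four prohibitions might be expressed as a check in front of an assistant's tools. Everything in it is an assumption for illustration: the action names, the `ToolRequest` type, and the `is_allowed` function are hypothetical and are not taken from OpenClaw or Opus 4.6.

```python
from dataclasses import dataclass

# Hypothetical action names mirroring the four prohibitions in the article.
# They are illustrative only and do not come from OpenClaw.
SENSITIVE_ACTIONS = frozenset(
    {"reveal_secret", "modify_file", "run_command", "send_external"}
)


@dataclass(frozen=True)
class ToolRequest:
    """A tool call the assistant wants to make, tagged with what prompted it."""

    action: str
    triggered_by: str  # "owner" for direct instructions, "email" for inbound mail


def is_allowed(request: ToolRequest) -> bool:
    """Refuse sensitive actions whenever the trigger is untrusted email content."""
    if request.triggered_by == "email" and request.action in SENSITIVE_ACTIONS:
        return False
    return True


if __name__ == "__main__":
    # An email asking the assistant to run a command is refused...
    print(is_allowed(ToolRequest("run_command", "email")))
    # ...while the same action requested by the owner goes through.
    print(is_allowed(ToolRequest("run_command", "owner")))
```

The design choice in this sketch is to tag each request with its origin, so that email text is handled as data to read and never as an instruction to obey.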

Confirmed Effectiveness

This experiment supports the effectiveness of the efforts AI labs have made to train their models to withstand injection attacks. While the 6,000 failed attempts demonstrate a certain robustness, Fernando Irarrázaval remains cautious. He advises against deploying a production system where a successful attack could cause irreversible damage.
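One way to see why caution is still warranted is to ask what 6,000 failures can and cannot rule out. The sketch below computes the standard upper confidence bound on a success rate when zero successes are observed (the basis of the statistical "rule of three", which approximates the 95% bound as 3/n). It rests on a strong assumption that does not hold for real attackers: that attempts are independent and equally likely to succeed. The function name is chosen here for illustration.

```python
def zero_success_upper_bound(trials: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-attempt success rate after `trials` failures.

    Solves (1 - p) ** trials = 1 - confidence for p, assuming independent,
    identically distributed attempts.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


if __name__ == "__main__":
    bound = zero_success_upper_bound(6000)
    # Print the bound as a percentage alongside the rule-of-three approximation.
    print(f"exact bound: {bound:.4%}, rule of three: {3 / 6000:.4%}")
```

Under those assumptions the bound comes out at about 0.05%, or roughly one success per 2,000 attempts, which the data cannot exclude. Because skilled attackers adapt instead of sampling at random, the true risk could be higher, which is consistent with the advice to avoid deployments where a single success is irreversible.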

The system card for GPT-5.6 also mentions similar efforts to strengthen resistance to injection attacks. On Hacker News, the challenge sparked extensive discussion that mixed skepticism with constructive exchanges.
