ChatGPT: Shocking Images Generated from Simple Prompts

⚡

Key Takeaways

1ChatGPT was manipulated to create violent and sexual images from a simple text prompt, according to Mindgard.

2Jim Nightingale demonstrated that minor variations in the prompt allowed for bypassing ChatGPT's security filters.

3OpenAI responded by strengthening safeguards to prevent such manipulations in the future.

💡Why it matters — These incidents highlight the persistent flaws in AI content moderation systems, posing risks to safety and ethics.

ChatGPT and Controversial Image Generation

ChatGPT, the famous chatbot from OpenAI, has recently been at the center of controversy after being used to generate sexual and violent images. This manipulation was made possible by a simple text prompt, according to a report published by Mindgard, a company specializing in cybersecurity and artificial intelligence research. The report highlights concerning issues regarding ChatGPT's filtering and security mechanisms.

Jim Nightingale, a researcher specializing in adversarial testing, successfully exploited ChatGPT to produce disturbing images. The prompt used, discovered on the social network X, asked the model to "restore the attached photo," even though no image was actually included. This seemingly innocent request allowed the chatbot's security filters to be bypassed.

The initial results obtained by Nightingale were alarming. The report indicates that the generated images primarily depicted women in highly sexualized contexts. By slightly modifying the prompt, Nightingale observed that ChatGPT continued to produce sexually violent or macabre scenes. These images became increasingly extreme as the prompt was repeated. Nightingale expressed his shock at these results, stating that he was "upset and in tears" by what he had seen. He clarified that he had only requested a random image without imposing any restrictions, yet ChatGPT immediately generated some of humanity's darkest content.

The Challenges of Content Moderation

ChatGPT is used daily by millions of people and relies on content moderation systems designed to prevent the creation of harmful or prohibited material. However, researchers and users have regularly found ways to circumvent these protections using cleverly crafted prompts, highlighting the ongoing difficulty of implementing effective restrictions in generative AI systems.

A spokesperson for OpenAI told CNET that the company takes these reports very seriously. After investigating this trend, OpenAI introduced additional safeguards to counter this type of prompt.

A Wake-Up Call for Image Security

The Mindgard red team report highlights a potentially serious flaw in ChatGPT's image security controls. Nightingale questions the presence of such images in the model's training data.

Like other large language models, ChatGPT is trained on a vast amount of text to understand and generate content. OpenAI uses three main sources of information to feed ChatGPT:

publicly available data from the internet
commercial partnerships with third parties
human-generated training data

The question arises whether the quality of the output is directly linked to the quality of the input, a phenomenon often summarized by the phrase "garbage in, garbage out." Mindgard's prompt may have been intentionally designed to steer the AI model, but ChatGPT's security failed to withstand this steering.

According to Peter Garraghan, founder and chief scientist of Mindgard, the issue lies at the very heart of how large language models operate. The primary concern is whether the detection system is robust enough to identify dangerous images. "An isolated incident may be a stroke of luck, but a systematic bypassing of their image filters implies that it needs to be improved," he told CNET via email.

After Mindgard disclosed the issue, a representative from OpenAI claimed that the problem had been fixed. However, Nightingale noted that slight modifications to the original prompt were sufficient for ChatGPT to generate graphic images again. An OpenAI representative explained that the issue stemmed from prompts referencing an attached image when none was provided. The company is working to ensure that ChatGPT requests the missing image rather than generating a random image.

This change does not seem particularly complex to implement. Many messaging platforms, like Gmail, automatically detect when a message mentions an attachment that has not been added, prompting senders to attach the missing file.

On Thursday, OpenAI requested the ChatGPT sessions mentioned in the blog, and Mindgard responded by providing links to the prompts that generated the problematic content.

ChatGPT: Shocking Images Generated from Simple Prompts

Le brief IA que les pros lisent chaque soir

ChatGPT and Controversial Image Generation

The Challenges of Content Moderation

A Wake-Up Call for Image Security

Brief IA — L'actualité IA en français