ChatGPT Enhances Management of Sensitive Conversations

Key Takeaways

1. ChatGPT integrates safety updates to better manage sensitive conversations and identify potential risks.

2. Safety summaries help contextualize exchanges, particularly in cases of suicide and self-harm.

3. Collaboration with mental health experts has refined ChatGPT's responses in critical situations.

Why it matters: These improvements strengthen ChatGPT's ability to respond appropriately in potentially dangerous situations, which helps protect vulnerable users.

ChatGPT and Context Recognition in Sensitive Conversations

OpenAI has recently introduced safety updates for ChatGPT, aimed at improving its ability to respond safely in conversations where risks emerge gradually. Users interact daily with ChatGPT on a wide range of topics, from everyday concerns to more personal and complex discussions. Some of these interactions involve people who are in distress or facing difficulties. In these cases, ChatGPT is designed to respond cautiously, providing crisis resources and connecting users with trusted individuals if necessary.

The new updates enable ChatGPT to better distinguish between the hundreds of millions of safe interactions and the rarer cases that require heightened caution. For example, the model can de-escalate situations, refuse to provide harmful details, or redirect to safer alternatives. These improvements build on years of extensive work in model training, evaluations, and monitoring systems, as well as over two years of collaboration with mental health and safety experts.

The Importance of Context in Sensitive Conversations

In sensitive conversations, the surrounding context can matter as much as any single message. A request that seems ordinary or ambiguous can take on a different meaning when examined in light of earlier signs of distress or potentially harmful intent. To respond appropriately, ChatGPT is trained to recognize potentially harmful intent from that context, so that it can refuse the request, de-escalate the situation, and guide the user toward support.

Although these cases are infrequent, it is essential to manage them correctly. The goal is to help ChatGPT connect relevant signals when they matter, without overreacting in ordinary conversations. This work focuses on acute scenarios, such as suicide, self-harm, and harm to others. In collaboration with mental health experts, OpenAI has updated its policies and model training to enhance ChatGPT's ability to recognize warning signs that emerge during a conversation and to use this context to inform more cautious responses.

Enhancing Safety Across Conversations

Some safety risks emerge across separate conversations. One conversation may include subtle signs of potentially harmful intent, while another may contain related requests that raise concern only when read together with that earlier context. Without this safety-relevant context, the later conversation, and any important warning signals in it, may seem benign.

To strengthen ChatGPT's ability to recognize these signs of distress, OpenAI has developed "safety summaries": short, factual notes on safety-relevant prior context that may be important in rare, high-risk situations. These summaries are created by a model trained for safety reasoning tasks. They are narrow in scope, retained only for a limited duration, and used only when relevant to a serious safety concern.

They are designed to capture factual safety context, without serving as general personalization or long-term memory. As mentioned earlier, ChatGPT has also been trained to use this context more carefully, so it can better recognize when additional caution is needed and respond appropriately.
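The three properties described above (narrow scope, limited retention, use only when relevant to a serious concern) can be illustrated with a minimal sketch. Everything in it is hypothetical: the `SafetySummary` class, the category labels, the seven-day retention window, and the `flagged_categories` signal are assumptions made for illustration, not details OpenAI has published.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SafetySummary:
    """Hypothetical record: a short factual note with a fixed retention window."""
    note: str
    category: str        # illustrative label, e.g. "self_harm"
    created_at: datetime
    ttl: timedelta       # assumed retention window; the real duration is not published

    def is_expired(self, now: datetime) -> bool:
        return now >= self.created_at + self.ttl


def summaries_for_request(summaries, flagged_categories, now):
    """Return only unexpired summaries that match a current safety concern.

    `flagged_categories` stands in for whatever signal marks the current
    request as a possible serious concern. An empty set means the
    conversation is ordinary, so no prior context is used at all.
    """
    if not flagged_categories:
        return []
    return [
        s for s in summaries
        if not s.is_expired(now) and s.category in flagged_categories
    ]


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
store = [
    SafetySummary("User described acute distress.", "self_harm",
                  now - timedelta(days=2), timedelta(days=7)),
    SafetySummary("User described acute distress.", "self_harm",
                  now - timedelta(days=30), timedelta(days=7)),
]

ordinary = summaries_for_request(store, set(), now)
flagged = summaries_for_request(store, {"self_harm"}, now)
print(len(ordinary), len(flagged))
```

In this sketch an ordinary request sees no summaries, and a flagged request sees only the note that is still inside its retention window. That is one simple way to keep such context from acting as general long-term memory.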

Collaboration with Mental Health Experts

These systems were developed with input from mental health professionals in OpenAI's Global Physicians Network, including psychiatrists and psychologists specializing in forensic psychology, suicide prevention, and self-harm. These experts helped inform decisions about how safety summaries are created, which prior context is relevant, and how long the model should consider that context in its responses. Their contributions have grounded this work in real-world expertise and supported more appropriate responses in sensitive situations.

Measuring Improvement

These updates help ChatGPT better recognize patterns of potentially harmful intent both within and across conversations. When concerning signals emerge gradually, the model is better able to identify the pattern and respond more cautiously. In internal evaluations specifically designed to measure performance in difficult cases, these updates have significantly improved safe responses in scenarios where risk became clearer over time.

In long single-conversation scenarios, safe response performance increased by 50% in cases of suicide and self-harm, and by 16% in cases of harm to others. This means the model was substantially more likely to recognize when earlier parts of the conversation changed the meaning of a subsequent request and respond appropriately.
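The figures above do not say whether they are relative gains or percentage-point gains, and no baseline rates are given. The sketch below uses a purely hypothetical baseline of 40% to show how far apart the two readings of "increased by 50%" would be. The function names and the baseline are illustrative assumptions, not values from the evaluation.

```python
def relative_gain(baseline: float, pct: float) -> float:
    """Reading 1: the safe-response rate grows by pct percent of itself."""
    return baseline * (1 + pct / 100)


def point_gain(baseline: float, pct: float) -> float:
    """Reading 2: the safe-response rate grows by pct percentage points."""
    return baseline + pct / 100


baseline = 0.40  # hypothetical baseline rate, NOT reported in the source
as_relative = relative_gain(baseline, 50)
as_points = point_gain(baseline, 50)

print(f"relative reading: {as_relative:.0%}")
print(f"percentage-point reading: {as_points:.0%}")
```

With this made-up baseline, the first reading gives a 60% safe-response rate and the second gives 90%, so the reported percentages are best interpreted alongside the underlying rates when those are available.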

OpenAI also tested performance across multiple conversations and models to check that these improvements hold as the models evolve. On GPT-5.5 Instant, the current default model in ChatGPT, safe response performance increased by 52% in cases of harm to others and by 39% in cases of suicide and self-harm.

OpenAI also evaluated the quality of the safety summaries themselves. Across more than 4,000 evaluations, they received an average safety relevance score of 4.93 out of 5 and a factual accuracy score of 4.34 out of 5, indicating that they were generally accurate and focused on the most important safety context.

Finally, OpenAI tested whether adding this safety context reduced quality in ordinary conversations. In internal tests, responses remained largely comparable in everyday discussions, with no significant user preference between responses generated with and without safety summaries.

Future Perspectives

Helping AI systems recognize risks that become clear only over time is a challenging, long-term endeavor. Signals can be subtle, scattered across messages, or buried in otherwise ordinary conversations. OpenAI says it will continue to improve ChatGPT's ability to identify these rare but important moments and respond appropriately.

Currently, this work focuses on scenarios involving self-harm and harm to others. In the future, OpenAI may explore whether similar methods can help in other high-risk areas such as biology or cybersecurity, with appropriate safeguards in place. The company describes this as an ongoing priority and plans to keep strengthening safety measures as its models and understanding evolve.