Emotional AI: When Machines Try to Read Our Emotions

Emotional AI: A Technology on the Rise
Imagine yourself in a virtual meeting where an artificial intelligence (AI) system is analyzing every nuance of your conversation. You've been working hard, juggling tight deadlines, and when your manager asks how you're doing, you respond with an "I'm fine" accompanied by a smile. Yet your voice betrays a slight hesitation, and your shoulders subtly slump. These signals, which a human observer might read as latent stress, may go unnoticed by an AI model that merely sorts emotions into simple categories like "happy" or "sad." Without the intervention of a human manager, your fatigue and potential burnout might never be acknowledged.
Emotional AI, which attempts to deduce people's feelings from their facial expressions, tone of voice, and behavior, is increasingly present in various fields. It is used for employee well-being, during job interviews, on educational platforms, and even in driver monitoring systems. Tech companies like NiCE and Genesys are integrating AI to detect when a customer is frustrated, prompting agents to adjust their responses accordingly. Giants like Meta and startups such as Hume AI are developing more expressive voice systems capable of detecting emotional cues and adapting their communication.
Moreover, hundreds of companies offer AI virtual companionship applications, a rapidly growing sector that could reach a value of $555 billion by 2035. Companion robots, like ElliQ from Intuition Robotics, are designed to interact with elderly individuals to reduce their loneliness.
However, despite the rapid advancements in emotional AI, most current systems focus on detecting a limited number of signals to label a specific emotion. This framework is insufficient for understanding the complexity of human emotions, which are contextual and evolving. A laugh can signify joy or nervousness, and a raised voice can convey enthusiasm or frustration. Emotional reactions also vary among individuals, influenced by demographic, cultural, and contextual factors.
In summary, there is a gap between the expectations placed on AI and what it can actually achieve. This gap is what a new research field, human contextual AI, seeks to bridge. Rather than limiting itself to a single input, this approach evaluates an individual's personality and character, tracking emotions in real time through a combination of facial, vocal, linguistic, and behavioral signals. Responses are analyzed within the context of a specific environment, such as an interview or a coaching session, allowing computers to read the scene rather than being confined to the screen.
The Origins of Emotional AI
The idea of an AI capable of detecting emotions dates back nearly three decades to the MIT Media Lab, where Rosalind Picard, an electrical engineer and computer scientist, introduced the concept of "affective computing." Her work paved the way for the notion that computers could be trained to recognize and respond to human emotions.
Picard's early experiments focused on single modalities such as facial expressions, tone of voice, and physiological signals like skin conductance or heart rate. The goal was to give machines a window into human feelings, making them more empathetic. However, at the time, neither the technology nor the science was ready. Computing power was limited, sensors were rudimentary, and datasets were narrow and biased.
Over the decades, researchers and companies improved their ability to measure various human expressions. In the 2010s, sentiment analysis, which involves processing large volumes of text to detect emotional undertones, began to gain popularity. Simultaneously, marketing companies, including Neurologyca, started using videos and webcams to measure and catalog customer reactions. Biometric devices and activity trackers, such as Fitbits and Apple Watches, became ubiquitous, generating new streams of data on sleep, step counts, stress levels, and more.
Unsurprisingly, scientists quickly confirmed that larger volumes of personalized data led to greater accuracy in reading human emotions. In 2019, researchers at Cornell demonstrated that combining multiple types of signals improved emotion detection. Their system paired physiological data, such as brain activity measured by electroencephalography (EEG) and heart rate, with visual cues like facial expressions, outperforming systems relying on a single input. Around the same time, Picard and her team at MIT found that humanoid robots trained on person-specific data were significantly better at reading that person's reactions and feelings than robots acting without personalized data.
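To make the idea of combining signals concrete, here is a minimal sketch of one common approach, late fusion, in which each modality produces its own probability estimate and the estimates are averaged. It is not the method used in the studies mentioned above; the emotion labels, modality names, weights, and numbers are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical label set, for illustration only.
EMOTIONS = ("calm", "stressed", "excited")


def fuse(predictions: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted average of per-modality probability distributions."""
    total = sum(weights[m] for m in predictions)
    fused = {e: 0.0 for e in EMOTIONS}
    for modality, probs in predictions.items():
        w = weights[modality] / total  # normalize so the result sums to 1
        for e in EMOTIONS:
            fused[e] += w * probs[e]
    return fused


# Made-up outputs from three single-modality models.
predictions = {
    "face": {"calm": 0.6, "stressed": 0.2, "excited": 0.2},
    "heart_rate": {"calm": 0.1, "stressed": 0.5, "excited": 0.4},
    "voice": {"calm": 0.2, "stressed": 0.6, "excited": 0.2},
}
weights = {"face": 1.0, "heart_rate": 1.0, "voice": 1.0}

fused = fuse(predictions, weights)
face_only = max(predictions["face"], key=predictions["face"].get)
combined = max(fused, key=fused.get)
print(face_only, combined)
```

In this toy case the face alone suggests "calm," while the combined estimate leans toward "stressed," which mirrors the smiling "I'm fine" scenario from the opening of the article.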
More recent studies confirm these findings. In 2024, scientists in South Korea showed that merging physiological, environmental, and personal data to recognize emotions led to a 32% reduction in error. Another paper published in 2025 demonstrated that user-specific information greatly improved emotion recognition performance.
Today, our devices know who we are: our habits and tendencies, our likes and dislikes. They have also become smaller and more efficient. Tiny, low-power cameras and microphones embedded in phones, laptops, and virtual and augmented reality devices can simultaneously detect dozens of human signals, from eye movements and micro-expressions to breathing rhythms, voice modulation, and posture. Advances in computing have also made it possible to integrate audio, video, biometric, and textual data, often without even transmitting raw data to the cloud. Researchers from Stanford, Cambridge, and MIT, as well as Kyoto University in Japan and the Software College of Northeastern University in China, are exploring how merging these inputs can refine the sensitivity and accuracy of human-machine interactions.
And yet, despite so many breakthroughs, machines still cannot reliably interpret emotions or even physical stress. Last year, a survey published in the Journal of Psychopathology and Clinical Science revealed that stress scores on smartwatches rarely, if ever, matched the level of stress users felt. In fact, a quarter of respondents reported feeling the exact opposite of what their smartwatches reported.
Why this discrepancy? We have been quite successful at capturing signals, but not at interpreting them. A fitness tracker might infer from your heart rate that you are stressed and recommend reducing your workout, but it doesn't know if your elevated heart rate is due to excitement, fatigue, or an extra cup of coffee. Assessing emotions in real-world contexts is even more challenging. To solve this complex problem, machines need context.
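The difference between capturing a signal and interpreting it can be shown with a small sketch. The threshold, the context names, and the labels below are assumptions invented for this example, not how any real tracker works; the point is only that the same heart-rate reading yields different interpretations once context is supplied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    heart_rate_bpm: int
    resting_bpm: int


# Hypothetical mapping from situational context to a likely explanation.
HYPOTHETICAL_CONTEXT_LABELS = {
    "tight_deadline": "likely stress",
    "watching_match": "likely excitement",
    "after_poor_sleep": "likely fatigue",
    "just_had_coffee": "likely caffeine",
}

UNKNOWN = "elevated arousal (cause unknown)"


def interpret(reading: Reading, context: Optional[str] = None) -> str:
    """Label a heart-rate reading, using context when it is available."""
    # Assumed rule of thumb: 20% above resting counts as elevated.
    elevated = reading.heart_rate_bpm > reading.resting_bpm * 1.2
    if not elevated:
        return "no elevated arousal"
    if context is None:
        return UNKNOWN  # the signal alone cannot say why
    return HYPOTHETICAL_CONTEXT_LABELS.get(context, UNKNOWN)


reading = Reading(heart_rate_bpm=95, resting_bpm=60)
print(interpret(reading))
print(interpret(reading, "just_had_coffee"))
print(interpret(reading, "tight_deadline"))
```

Without context, the only honest output is that arousal is elevated for an unknown reason; the label "stress" becomes defensible only when something about the situation supports it.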
From Neuromarketing to Emotion Detection AI
My company, Neurologyca, was founded in Spain in 2015 and started in the field of neuromarketing. While working with major brands and European conglomerates, our co-founder, Juan Graña, realized that companies lacked solid data on consumers. At the time, most customer feedback came from surveys, asking questions like, "On a scale of 1 to 10, how happy does this car ad make you?" or "Which emoji best describes your mood?" Naturally, these overly simplistic tools led to high levels of self-assessment bias, as people often misjudge or misreport their own reactions.
To circumvent this issue, Neurologyca established laboratories, using neuroscience and cognitive science to more accurately capture human responses to products, logos, advertisements, and experiences. In addition to using biometric tools such as heart rate monitors, eye trackers, and EEGs, we recorded millions of video frames of human reactions, noting every specific context and the resulting facial and bodily movements. To do this, we mapped over 790 reference points, including the corners of the mouth, eye and pupil sizes, blink rates, and head tilt. All this data was collected and stored anonymously under strict European privacy standards.
Next, we linked this information to the results of neuroscience and behavioral science studies on how biometrics, speech patterns, and human movement relate to emotions—research that we continue to gather from academic institutions across Europe. We also created a database of situational contexts—such as "watching a dog food advertisement" or "listening to a new song"—and the human feelings they elicited.
In our work with companies, not only did this approach allow us to recognize nuanced emotions, but it also enabled us to identify which reactions indicated positive or negative outcomes. Take, for example, the context of horror movie trailers: our research helped us understand that the most successful ones evoke a very specific mix of emotions, namely a bit of fear, a bit of anxiety, but also a bit of joy. With this knowledge, we could quickly assess viewer reactions to help a production company adjust its trailer for the desired impact.
After a few years, we discovered that a model trained on our database could accurately assess emotions using just a webcam. We no longer needed to organize focus groups in rooms filled with equipment. Instead, we could do things like send a new perfume sample to paid participants around the world with a link. When people opened the link, it activated their cameras, allowing us to record their faces as they sniffed the perfume for the first time. Suddenly, we had expanded our understanding of human emotions.