Originally published on LinkedIn
We have been steadily improving Human-Machine Interaction (HMI). Text commands typed at a keyboard and clicks of a mouse have evolved into GUIs, and then into speech and gesture recognition that approaches natural conversation. The quality of these interactions will keep improving as systems begin to decipher visual cues and audio undertones. Interactions will be refined as machines begin to grasp the context of words and interpret the pauses between phrases. An entirely new world of frictionless interaction is on its way.
Facial recognition, voice synthesis, speech recognition and text-to-speech are the wonderful breakthroughs of the past decade that are inching us towards frictionless interaction with machines. But gaps remain that continue to confound computer science.
At Wipro, we have built an automated interviewer. It understands cues such as a candidate’s nod of the head, but it cannot tell whether the candidate is getting frustrated during the interview. But as we become better at emotional analysis, we will have systems that truly transform industries such as retail, banking, travel and education. And yes, HR will have better interviewers too!
Machines are getting better at interpreting micro emotions. They can scan a face, for example, and say whether the person is smiling or angry. But they cannot go much beyond that surface reading. I am reminded of a television series that was popular around 2010 called Lie to Me[i]. Its hero, the brilliant and acerbic Dr Cal Lightman, could pick up the subtlest movements in a person’s face and tell whether the person was lying or telling the truth. Lightman combined his observations of micro expressions with applied psychology to draw conclusions in a variety of criminal investigations. But more importantly, he had the luxury of context: he could interpret expressions because he knew the situations in which they occurred. This is the gap HMI technology has to bridge in the coming years. HMI that uses gestures and speech must also understand the underlying context, intent and emotion – stress, fatigue, irritation, boredom, frustration, dissatisfaction, happiness, pleasure – to improve the interaction.
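As a purely illustrative sketch (assuming the opencv-python package and the Haar cascade files it bundles), the snippet below shows how shallow today’s “surface reading” is: it can label a visible smile, but it knows nothing about why the person is smiling.

```python
# Minimal sketch: detect a face in an image and check whether it is smiling.
# Assumes the opencv-python package; uses the Haar cascades bundled with OpenCV.
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_smile.xml")

def read_expression(image_path: str) -> str:
    """Return a coarse, context-free label for the first face found."""
    image = cv2.imread(image_path)
    if image is None:
        return "no image"
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return "no face detected"
    x, y, w, h = faces[0]
    face_roi = gray[y:y + h, x:x + w]
    smiles = smile_cascade.detectMultiScale(face_roi, scaleFactor=1.7, minNeighbors=20)
    # The classifier sees pixels, not the situation: a nervous smile and a
    # genuine one look exactly the same to it.
    return "smiling" if len(smiles) > 0 else "not smiling"

if __name__ == "__main__":
    print(read_expression("candidate.jpg"))  # hypothetical input image
```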
Pioneers building computational platforms for natural, human-like interaction with machines will soon offer better tools for emotional sensing and contextual interpretation. Models that can interpret fuzzy attributes such as sarcasm, humor, cynicism and skepticism are also being built to support these systems. All of which tells us that multi-modal HMI is ready to bring about a major transformation in our relationship with machines.
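To make “multi-modal” concrete, here is a toy fusion sketch: per-modality valence scores (face, voice, text) are combined, and the text channel is discounted when the words contradict the nonverbal signals, a crude stand-in for catching sarcasm. The names, weights and thresholds are illustrative assumptions, not any vendor’s actual model.

```python
# Toy sketch of multi-modal late fusion. Everything here is hypothetical:
# in practice each score would come from a trained model per modality.
from dataclasses import dataclass

@dataclass
class ModalityReading:
    valence: float      # -1.0 (negative) .. +1.0 (positive)
    confidence: float   # 0.0 .. 1.0

def fuse(face: ModalityReading, voice: ModalityReading, text: ModalityReading) -> dict:
    """Combine face, voice and text readings into one estimate of the user's state."""
    nonverbal = (face.valence * face.confidence + voice.valence * voice.confidence) / \
                max(face.confidence + voice.confidence, 1e-6)
    # Crude sarcasm heuristic: if the words are positive but face and voice are
    # clearly negative, trust the nonverbal channels and flag the mismatch.
    sarcastic = text.valence > 0.4 and nonverbal < -0.3
    weights = {"nonverbal": 0.7, "text": 0.1 if sarcastic else 0.3}
    total = weights["nonverbal"] + weights["text"]
    overall = (nonverbal * weights["nonverbal"] + text.valence * weights["text"]) / total
    label = "frustrated" if overall < -0.3 else "content" if overall > 0.3 else "neutral"
    return {"valence": round(overall, 2), "label": label, "possible_sarcasm": sarcastic}

# Example: the words say "great, just great" (positive text), but the face and tone say otherwise.
print(fuse(ModalityReading(-0.6, 0.9), ModalityReading(-0.5, 0.8), ModalityReading(0.7, 0.9)))
```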
[i] https://en.wikipedia.org/wiki/Lie_to_Me