Artash Nath and Arushi Nath
On 28 September 2019, we were invited to speak on Artificial Intelligence and Human Interaction at the Google DevFest 2019 held at the George Brown College, Toronto.
There was a great lineup of speakers from Google, IBM, Taiga Robotics, Applied Brain Research, and the Ontario Government. It was an exciting event with over 200 people in the audience: researchers, academics, students, and people from government and tech companies. We thank the organizers for giving us the opportunity to speak at the event.
Google DevFest 2019 and Machine Learning
We spoke about Using Machine Learning to Detect Facial Emotions and gave a demonstration of our home-built robot, which can recognize facial emotions and move differently depending on whether the human it is interacting with is feeling happy or sad.
The idea behind this project came from our previous work on space exploration and machine learning. As astronauts embark on long-duration space travel, say to Mars, it will be important to monitor their mental health, moods, and feelings. Robots accompanying them should be capable of interacting with them at a more humane level, to lower the stress that comes from astronauts being away from their families and all things familiar to them.
Make an Astronaut Smile (MArS) Robot
Our Make an Astronaut Smile (MArS) bot uses a Convolutional Neural Network (CNN) built on Google TensorFlow for facial emotion recognition. It has 3 parts:
- Face Detection (using OpenCV)
- Emotion Evaluation (TensorFlow/Python, Keras)
- Robotics (Arduino and serial bridge)
The robot first has to identify the human in its surroundings. We used OpenCV to make this possible. The algorithm makes use of the geometry of human faces to lock onto humans.
It usually starts by searching for human eyes as they are one of the easiest features to detect because of their geometry and placement (pair of ovals placed horizontally with some distance between them).
After the eyes, it looks for other defining features, including the irises, nose, mouth, and eyebrows. Once the algorithm has collected several features that define a human face in the image, it runs additional validation tests and probability functions to confirm that a face has been detected.
Once a face has been detected, our MArS Bot moves on to evaluating the emotion on that face. But humans and their facial emotions are not stationary (or frozen): humans keep moving and their expressions keep changing, so a single image does not provide enough data. Instead, the robot keeps taking multiple images, around 20 images over a 4-second interval, to gather adequate data for making a prediction.
The images first need to be processed so that emotion evaluation can be sped up and happen in real time. The images are cropped to zoom in on just the human face and are converted to grayscale. Converting to grayscale is useful because we are interested not in color but in the placement of pixels and features, so no important information is lost. Smaller images mean faster processing.
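A minimal NumPy sketch of that preprocessing step (crop, grayscale, downscale) might look like the following. The 48x48 target size is an assumption on our part; it is a common input size for facial-emotion CNNs, not a figure stated in the post.

```python
import numpy as np

def preprocess(frame, box, size=48):
    """Crop a BGR frame to box = (x, y, w, h), grayscale, and downsample."""
    x, y, w, h = box
    face = frame[y:y + h, x:x + w]
    # Luminance weights for RGB -> gray, reversed here because the frame is BGR.
    gray = face @ np.array([0.114, 0.587, 0.299])
    # Crude nearest-neighbour downsampling to a fixed square size.
    ys = np.linspace(0, gray.shape[0] - 1, size).astype(int)
    xs = np.linspace(0, gray.shape[1] - 1, size).astype(int)
    return gray[np.ix_(ys, xs)] / 255.0  # scale pixel values to [0, 1]
```

The output is a small normalized array that the emotion classifier can consume quickly, which is the whole point of this stage.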
How do humans recognize whether another person is happy or sad? The process comes easily to us because we are familiar with the characteristics of human faces. We also had a head start of around 66 million years to build pathways between our eyes and brain for reading faces. For instance, when we smile, our mouths curve up, our eyes may appear more elongated, and our nostrils may flare, or some combination of these and other changes occurs. Similarly, when we are sad or frowning, the appearance of our eyes, mouth, forehead, and nose changes.
These physical changes are what a machine learning algorithm could train on. It could learn to identify facial changes associated with smiling and sad faces and assign weights to these changes depending on how strongly they are correlated with particular facial emotions.
In the case of the MArS Bot, we used a specific type of artificial neural network run on Google TensorFlow: the Convolutional Neural Network (CNN). The underlying idea is to enable robots to view the world around them as humans do and make sense of it, i.e., recognize faces and objects, understand languages and emotions, and gain intelligence from their surroundings. A Convolutional Neural Network combines computer vision with deep learning to differentiate one object from another and glean more information about the target image.
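A small CNN of the kind described could be built in Keras as below. The layer sizes and depths here are illustrative assumptions, a sketch rather than our exact architecture; only the 4-way output (happy, sad, surprised, angry) comes from the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(size=48, n_classes=4):
    """Illustrative CNN for classifying a grayscale face crop into 4 emotions."""
    return keras.Sequential([
        layers.Input(shape=(size, size, 1)),          # grayscale face crop
        layers.Conv2D(32, 3, activation="relu"),      # learn local edge/texture features
        layers.MaxPooling2D(),                        # downsample feature maps
        layers.Conv2D(64, 3, activation="relu"),      # learn higher-level facial features
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),  # happy / sad / surprised / angry
    ])

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The convolutional layers do the "computer vision" part (detecting spatial features) and the dense layers do the classification, mirroring the combination described above.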
Training and Testing Database + Data Augmentation
For supervised learning algorithms, we need training and testing databases to improve the accuracy of the model. The training data set is what the algorithm sees and learns from in order to create a model that best fits the data.
The testing data set is used to provide an unbiased evaluation of the final model. The algorithm is run over the training dataset several times (epochs) to keep improving the accuracy of its predictions, and is then validated on the test data set.
In our case, we had to train our robot to detect 4 emotion types: happy, sad, surprised, and angry. To start with, our training database held 2,000 images. When I (Artash) tested the model trained on these pictures, I achieved an accuracy score of 67%. But this was not good enough for me. I decided to enlarge the training database using data augmentation, which means making multiple copies of each image in the database and slightly altering them: slightly rotating them, shifting them, or even slightly blurring them. Even though the new images are all of the same people, a neural network treats them as distinct training examples.
I made 4 more copies of each image in my training database, each randomly rotated by an angle between -25 and +25 degrees. After the data augmentation process, I had a training database of 10,000 images (original images plus augmented copies). I retrained my CNN on this data set and achieved an improved accuracy of 80%, an improvement of 13 percentage points!
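The rotation-based augmentation described above can be sketched in a few lines with Pillow. The use of Pillow here is our assumption for illustration; the described recipe (4 extra copies, each rotated by a random angle in [-25, +25] degrees) is what the code implements.

```python
import random
from PIL import Image

def augment(image, copies=4, max_angle=25):
    """Return `copies` randomly rotated variants of a PIL image."""
    return [image.rotate(random.uniform(-max_angle, max_angle))
            for _ in range(copies)]

# Applying this to each of 2,000 originals adds 8,000 rotated copies,
# for a total training database of 10,000 images.
```

Each rotated copy keeps the original image size, so the augmented set can be fed to the same preprocessing and training pipeline unchanged.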
After evaluating the emotion on each of the 20 images taken over a 4-second interval, my program takes the mode of all the emotions found and prints it out. It then assigns each evaluated emotion a number.
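That majority vote over the 20 per-frame predictions is straightforward to express; the specific emotion-to-number mapping shown here is an assumption, since the post does not state which numbers correspond to which emotions.

```python
from collections import Counter

# Assumed numeric codes; the actual mapping used by the robot is not stated.
EMOTION_CODES = {"happy": 0, "sad": 1, "surprised": 2, "angry": 3}

def vote(predictions):
    """Return the most common emotion label among the frames and its code."""
    label = Counter(predictions).most_common(1)[0][0]
    return label, EMOTION_CODES[label]
```

Taking the mode smooths out the occasional misclassified frame, which is why a single snapshot is not enough but 20 frames are.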
The robot was built by Arushi using household objects such as an oil bottle, pool noodles, and a styrofoam ball. The robot's movements were provided by 3 servos controlled by an Arduino.
For instance, when the Robot detects a happy face, it turns its face into a happy face and raises both its arms to share the joy of the human. And when it detects a sad face, it lifts up its right arm to give a high five to cheer the human up.
The challenging part was linking the output (the predicted emotion) from the machine learning algorithm in Python to the Arduino. To do so, a serial bridge was used. Arushi programmed the Arduino to take inputs from Python and use them to create specific servo movements.
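On the Python side, a serial bridge like this typically sends the numeric emotion code as a single byte, which the Arduino sketch reads and maps to servo movements. This is a minimal sketch under assumptions: the port name, baud rate, and one-byte protocol are ours for illustration, not details stated in the post.

```python
def send_emotion(conn, code):
    """Write one numeric emotion code as a single byte over the serial link.
    The Arduino side would read it with Serial.read() and move the servos."""
    conn.write(bytes([code]))

# Usage (requires pySerial; port name and baud rate are assumptions):
# import serial
# with serial.Serial("/dev/ttyACM0", 9600, timeout=1) as conn:
#     send_emotion(conn, 0)  # e.g. tell the Arduino a happy face was seen
```

Keeping the protocol to a single byte per prediction keeps the Arduino's loop simple: read a byte, switch on its value, drive the three servos.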
End Result + Future + Collaboration
The end result was very neat, and the robot synchronizes perfectly with the facial emotion detected. Since its creation, it has cheered up thousands of humans, bringing smiles to their faces or sharing in their joy at interacting with a robot.
At Google DevFest 2019 we had scores of people interacting with our Robot and learning more about Robot-Human interaction. This project is only the beginning and we want to improve the Robot further by adding more senses to it, including sound, voice, touch, and smell.
We would welcome your support and guidance in taking this project further.