Artash Nath and Arushi Nath
On 28 September 2019, we were invited to speak on Artificial Intelligence and Human Interaction at the Google DevFest 2019 held at the George Brown College, Toronto.
There was a great lineup of speakers from Google, IBM, Taiga Robotics, Applied Brain Research, and the Ontario Government. It was an exciting event with over 200 people in the audience: researchers, academics, students, and people from government and tech companies. We thank the organizers for giving us the opportunity to speak at the event.
Google DevFest 2019 and Machine Learning
We spoke about Using Machine Learning to Detect Facial Emotions and gave a demonstration of our home-built robot, which can recognize facial emotions and move differently depending on whether the human it is interacting with is feeling happy or sad.
The idea behind this project came from our previous work on space exploration and machine learning. As astronauts embark on long-duration space travel, say to Mars, it will be important to monitor their mental health, moods, and feelings. Robots accompanying them should be capable of interacting with them at a more humane level, to lower the stress that comes from astronauts being away from their families and all things familiar to them.
Make an Astronaut Smile (MArS) Robot
Our Make an Astronaut Smile (MArS) bot uses a Convolutional Neural Network (CNN) built on Google TensorFlow for facial emotion recognition. It has 3 parts:
- Face Detection (using OpenCV)
- Emotion Evaluation (TensorFlow/Python, Keras)
- Robotics (Arduino and serial bridge)
The robot first has to identify the human in its surroundings. We used OpenCV to make this possible. The algorithm makes use of the geometry of human faces to lock onto humans.
It usually starts by searching for human eyes as they are one of the easiest features to detect because of their geometry and placement (pair of ovals placed horizontally with some distance between them).
After the eyes, it looks for other defining features, including the irises, nose, mouth, and eyebrows. Once the algorithm has collected several features that define a human face in the image, it runs additional validation tests and probability functions to confirm that a face has been detected.
Once a face has been detected, our MArS Bot moves on to evaluating the emotion on that face. But humans and their facial emotions are not stationary (or frozen): humans keep moving and their expressions keep changing, so a single image does not provide enough data. Instead, the robot keeps taking multiple images, around 20 images over a 4-second interval, to gather adequate data for making a prediction.
The images first need to be processed so that emotion evaluation can be sped up and happen in real time. The images are cropped to zoom in on just the human face and are converted to grayscale. Converting to grayscale is useful because we are interested not in color but in the placement of pixels and features, so no important information is lost. Smaller images mean faster processing.
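A minimal NumPy sketch of that preprocessing step (crop, grayscale, downscale) might look like the following. The 48x48 target size is an assumption on our part; it is a common input size for facial-emotion CNNs, not a figure stated in the post.

```python
import numpy as np

def preprocess(frame, box, size=48):
    """Crop a BGR frame to box = (x, y, w, h), grayscale, and downsample."""
    x, y, w, h = box
    face = frame[y:y + h, x:x + w]
    # Luminance weights for RGB -> gray, reversed here because the frame is BGR.
    gray = face @ np.array([0.114, 0.587, 0.299])
    # Crude nearest-neighbour downsampling to a fixed square size.
    ys = np.linspace(0, gray.shape[0] - 1, size).astype(int)
    xs = np.linspace(0, gray.shape[1] - 1, size).astype(int)
    return gray[np.ix_(ys, xs)] / 255.0  # scale pixel values to [0, 1]
```

The output is a small normalized array that the emotion classifier can consume quickly, which is the whole point of this stage.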
How do humans recognize whether another person is happy or sad? The process comes easily to us because we are familiar with the characteristics of human faces. We also had a head start of around 66 million years to build pathways between our eyes and brain for reading faces. For instance, when we smile, our mouths curve up, our eyes may appear more elongated, and our nostrils may flare, or some combination of these and other changes occurs. Similarly, when we are sad or frowning, the appearance of our eyes, mouth, forehead, and nose changes.
These physical changes are what a machine learning algorithm could train on. It could learn to identify facial changes associated with smiling and sad faces and assign weights to these changes depending on how strongly they are correlated with particular facial emotions.
In the case of the MArS Bot, we used a specific type of artificial neural network run on Google TensorFlow: the Convolutional Neural Network (CNN). The underlying idea is to enable robots to view the world around them as humans do and make sense of it, i.e., recognize faces and objects, understand languages and emotions, and gain intelligence from their surroundings. A Convolutional Neural Network combines computer vision with deep learning to differentiate one object from another and glean more information about the target image.
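A small CNN of the kind described could be built in Keras as below. The layer sizes and depths here are illustrative assumptions, a sketch rather than our exact architecture; only the 4-way output (happy, sad, surprised, angry) comes from the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(size=48, n_classes=4):
    """Illustrative CNN for classifying a grayscale face crop into 4 emotions."""
    return keras.Sequential([
        layers.Input(shape=(size, size, 1)),          # grayscale face crop
        layers.Conv2D(32, 3, activation="relu"),      # learn local edge/texture features
        layers.MaxPooling2D(),                        # downsample feature maps
        layers.Conv2D(64, 3, activation="relu"),      # learn higher-level facial features
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),  # happy / sad / surprised / angry
    ])

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The convolutional layers do the "computer vision" part (detecting spatial features) and the dense layers do the classification, mirroring the combination described above.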
Training and Testing Database + Data Augmentation
For supervised learning algorithms, we need training and testing databases to improve the accuracy of the model. The training data set is what the algorithm sees and learns from in order to create a model that best fits the data.
The testing data set is used to provide an unbiased evaluation of the final model. The algorithm is run over the training dataset several times (epochs) to keep improving the accuracy of its predictions, and is then validated on the test data set.
In our case, we had to train our robot to detect 4 emotion types: happy, sad, surprised, and angry. To start with, our training database held 2,000 images. When I (Artash) tested the model trained on these pictures, I achieved an accuracy score of 67%. But this was not good enough for me. I decided to enlarge the training database using data augmentation, which means making multiple copies of each image in the database and slightly altering them: slightly rotating them, shifting them, or even slightly blurring them. Even though the new images are all of the same people, a neural network treats them as distinct training examples.
I made 4 more copies of each image in my training database, each randomly rotated by an angle between -25 and +25 degrees. After the data augmentation process, I had a training database of 10,000 images (original images plus augmented copies). I retrained my CNN on this data set and achieved an improved accuracy of 80%, an improvement of 13 percentage points!
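The rotation-based augmentation described above can be sketched in a few lines with Pillow. The use of Pillow here is our assumption for illustration; the described recipe (4 extra copies, each rotated by a random angle in [-25, +25] degrees) is what the code implements.

```python
import random
from PIL import Image

def augment(image, copies=4, max_angle=25):
    """Return `copies` randomly rotated variants of a PIL image."""
    return [image.rotate(random.uniform(-max_angle, max_angle))
            for _ in range(copies)]

# Applying this to each of 2,000 originals adds 8,000 rotated copies,
# for a total training database of 10,000 images.
```

Each rotated copy keeps the original image size, so the augmented set can be fed to the same preprocessing and training pipeline unchanged.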
After evaluating the emotion on each of the 20 images taken over a 4-second interval, my program takes the mode of all the emotions found and prints it out. It then assigns each evaluated emotion a number.
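That majority vote over the 20 per-frame predictions is straightforward to express; the specific emotion-to-number mapping shown here is an assumption, since the post does not state which numbers correspond to which emotions.

```python
from collections import Counter

# Assumed numeric codes; the actual mapping used by the robot is not stated.
EMOTION_CODES = {"happy": 0, "sad": 1, "surprised": 2, "angry": 3}

def vote(predictions):
    """Return the most common emotion label among the frames and its code."""
    label = Counter(predictions).most_common(1)[0][0]
    return label, EMOTION_CODES[label]
```

Taking the mode smooths out the occasional misclassified frame, which is why a single snapshot is not enough but 20 frames are.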
The robot was built by Arushi using household objects such as an oil bottle, pool noodles, and a styrofoam ball. The robot's movements were provided by 3 servos controlled by an Arduino.
For instance, when the Robot detects a happy face, it turns its face into a happy face and raises both its arms to share the joy of the human. And when it detects a sad face, it lifts up its right arm to give a high five to cheer the human up.
The challenging part was linking the output (the predicted emotion) from the machine learning algorithm in Python to the Arduino. To do so, a serial bridge was used. Arushi programmed the Arduino to take inputs from Python and use them to create specific servo movements.
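On the Python side, a serial bridge like this typically sends the numeric emotion code as a single byte, which the Arduino sketch reads and maps to servo movements. This is a minimal sketch under assumptions: the port name, baud rate, and one-byte protocol are ours for illustration, not details stated in the post.

```python
def send_emotion(conn, code):
    """Write one numeric emotion code as a single byte over the serial link.
    The Arduino side would read it with Serial.read() and move the servos."""
    conn.write(bytes([code]))

# Usage (requires pySerial; port name and baud rate are assumptions):
# import serial
# with serial.Serial("/dev/ttyACM0", 9600, timeout=1) as conn:
#     send_emotion(conn, 0)  # e.g. tell the Arduino a happy face was seen
```

Keeping the protocol to a single byte per prediction keeps the Arduino's loop simple: read a byte, switch on its value, drive the three servos.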
End Result + Future + Collaboration
The end result was very neat, and the robot synchronizes perfectly with the facial emotion detected. Since its creation, it has cheered up thousands of humans, bringing smiles to their faces or sharing in their joy at interacting with a robot.
At Google DevFest 2019 we had scores of people interacting with our Robot and learning more about Robot-Human interaction. This project is only the beginning and we want to improve the Robot further by adding more senses to it, including sound, voice, touch, and smell.
We would welcome your support and guidance in taking this project further.