Artash Nath, Grade 8 student, 13 years, Toronto, Canada

The next ARIEL Science, Mission & Community 2020 Conference will take place from 14 to 16 January 2020 at the European Space Research and Technology Centre (ESTEC) in the Netherlands. Over 250 participants have registered for the conference.

I have been invited by the Conference Organisers to deliver a presentation on my project, "Hybrid Machine Learning Model to Remove Noise from Exoplanet Data", which uses simulated data from the ARIEL telescope.

The Atmospheric Remote-sensing Infrared Exoplanet Large-survey (ARIEL) is the fourth medium-class mission in the European Space Agency’s Cosmic Vision program. During its four-year mission, ARIEL will study what exoplanets are made of, how they formed and how they evolve, by surveying a diverse sample of about 1000 extrasolar planets simultaneously in visible and infrared wavelengths. It will measure the chemical composition and thermal structures of hundreds of transiting exoplanets, providing new insights into planetary science beyond our solar system.

The ARIEL Machine Learning Challenge

How I Came Up with the Hybrid Model for the ARIEL Machine Learning Challenge

In 2019, the ARIEL Data Challenge Series was launched to build a global community for exoplanet data solutions. The objective was to use Machine Learning (ML) to remove noise from exoplanet observations caused by starspots and by instrumentation in the simulated data from the ARIEL telescope.

As I love space, robotics, and machine learning, this challenge interested me, and I entered it as a solo team. I then worked over several weeks in my free time after school to understand the challenge objective, learn the science behind exoplanet transit light curves, and find out how the ARIEL telescope gathers data.


Overview of the Simulated Dataset from ARIEL

The challenge provided over 150,000 simulated observations, with transit light curves in 55 different wavelengths and 300 time-step data points each. In addition, 6 stellar parameters and the planet-star radius ratios were provided. It took a lot of time to pre-process the data, rearrange it, and divide it into training and testing sets before I could run different machine learning models to find the most accurate one.
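The rearranging and splitting steps described above can be sketched with NumPy. This is only a minimal sketch on random stand-in data, not the actual pre-processing from my notebook: the raw layout (wavelengths by time steps), the 80/20 split, and all function names here are assumptions made for illustration.

```python
import numpy as np

# Sizes taken from the challenge description; everything else is assumed.
N_WAVELENGTHS, N_STEPS, N_PARAMS = 55, 300, 6

def make_fake_observations(n_obs, seed=0):
    """Random stand-in data with the same shapes; NOT the real ARIEL files."""
    rng = np.random.default_rng(seed)
    # Assumed raw layout: one row per wavelength, one column per time step.
    curves = 1.0 + 0.01 * rng.normal(size=(n_obs, N_WAVELENGTHS, N_STEPS))
    params = rng.normal(loc=5.0, scale=2.0, size=(n_obs, N_PARAMS))
    ratios = rng.uniform(0.01, 0.2, size=(n_obs, N_WAVELENGTHS))
    return curves, params, ratios

def to_time_major(curves):
    """Rearrange (obs, wavelength, step) into (obs, step, wavelength)."""
    return np.transpose(curves, (0, 2, 1))

def split_indices(n_obs, test_fraction=0.2, seed=0):
    """Shuffle observation indices and return (train_idx, test_idx)."""
    order = np.random.default_rng(seed).permutation(n_obs)
    n_test = int(round(n_obs * test_fraction))
    return order[n_test:], order[:n_test]

def standardise(train, test):
    """Scale both sets using statistics from the training set only."""
    mean, std = train.mean(axis=0), train.std(axis=0) + 1e-8
    return (train - mean) / std, (test - mean) / std

curves, params, ratios = make_fake_observations(100)
curves = to_time_major(curves)
train_idx, test_idx = split_indices(len(curves))
train_params, test_params = standardise(params[train_idx], params[test_idx])
train_curves, test_curves = curves[train_idx], curves[test_idx]
train_ratios, test_ratios = ratios[train_idx], ratios[test_idx]
print(train_curves.shape, test_curves.shape)  # (80, 300, 55) (20, 300, 55)
```

Putting the time axis first gives each observation the (time steps, features) shape that sequence models usually expect, and scaling with training-set statistics only keeps information from the test set out of training.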

Data sets vs. Machine Learning Models
The LSTM-only model to handle sequential data
The Hybrid Machine Learning Model using LSTM and Feed Forward Neural Network

The model I finally came up with is a hybrid machine learning model. It uses a Long Short-Term Memory (LSTM) network, a form of Recurrent Neural Network (RNN), to handle the time-series (sequential) data, that is, the transit light curves. It uses a Feed-Forward Neural Network to handle the numerical data in the simulated ARIEL dataset, such as the mass, radius, temperature, period, and magnitude of the stars. I then applied a Concatenate layer to merge the outputs of the two networks and passed the result through Dense layers to get the final output. This hybrid model is more accurate than the LSTM-only model.
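The two-branch architecture described above can be sketched with the Keras functional API. This is a simplified illustration, not my exact model: the layer sizes, the optimizer, the loss, and the choice of 55 output values (one radius ratio per wavelength) are assumptions, and the inputs here are random numbers rather than ARIEL data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

N_STEPS, N_WAVELENGTHS, N_PARAMS = 300, 55, 6

def build_hybrid_model(lstm_units=32, dense_units=16):
    """Two-input model: LSTM branch for curves, dense branch for parameters."""
    # Branch 1: sequential data (time steps x wavelengths) goes through an LSTM.
    curves_in = keras.Input(shape=(N_STEPS, N_WAVELENGTHS), name="light_curves")
    seq_features = layers.LSTM(lstm_units)(curves_in)

    # Branch 2: numerical parameters go through a feed-forward layer.
    params_in = keras.Input(shape=(N_PARAMS,), name="stellar_params")
    num_features = layers.Dense(dense_units, activation="relu")(params_in)

    # Merge both branches, then map to the predicted radius ratios.
    merged = layers.Concatenate()([seq_features, num_features])
    hidden = layers.Dense(64, activation="relu")(merged)
    # Assumed output: one planet-star radius ratio per wavelength.
    ratios_out = layers.Dense(N_WAVELENGTHS, name="radius_ratios")(hidden)

    model = keras.Model(inputs=[curves_in, params_in], outputs=ratios_out)
    model.compile(optimizer="adam", loss="mse")  # assumed training setup
    return model

model = build_hybrid_model()

# Random stand-in batch of 4 observations, just to check the shapes.
rng = np.random.default_rng(0)
fake_curves = rng.normal(size=(4, N_STEPS, N_WAVELENGTHS)).astype("float32")
fake_params = rng.normal(size=(4, N_PARAMS)).astype("float32")
predictions = model.predict([fake_curves, fake_params], verbose=0)
print(predictions.shape)
```

The LSTM returns only its final state here, so each light-curve sequence is reduced to a single feature vector before it is concatenated with the features from the numerical branch.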


Results from my Machine Learning Model
Planet-Star Radius Ratio Predictions made using the Hybrid Model

Short Video of my Machine Learning Model

I have made a four-minute video that gives an overview of the challenge objective, datasets and pre-processing, methodology, the machine learning model applied and the results obtained. The video can be viewed at

You can download the complete presentation from here.

Free Online Tutorial: Jupyter Notebook

To expand the community of people interested in using Machine Learning on Space Data, I created a free online tutorial using Jupyter Notebook on Applying Hybrid Machine Learning to the ARIEL Simulated dataset.

Training participants at the Global BioSummit at the MIT Media Lab, Massachusetts, using my online module.

The tutorial takes less than 3 hours to complete, and several participants have already completed it. On 13 October 2019, I conducted a 3-hour training workshop at the MIT Media Lab for the participants of the Global BioSummit. I used the online module to train the participants in applying machine learning to the ARIEL dataset.

The online tutorial is available from my GitHub account.

I will be happy to conduct a workshop for your organization or at your next event.

For more information about the data pre-processing and the parameters I set up for my machine learning model, check out my previous blog entry: Predicting Exoplanetary Atmospheres Using Machine Learning: The ARIEL Data Challenge 2019.

The entire Machine Learning Model and the Online Tutorial are posted on my GitHub account at
