Artash Nath, Grade 12 Student. Toronto.

A two-day residential Underwater Acoustics Data Challenge Workshop 2023 was organized by the “Special Interest Group for Underwater Acoustics (SIGUA)” of the UK Acoustics Network on 11-12 September 2023 at Guyers House, near Bath, UK. The objective of the workshop was to explore solutions to research challenges set by industry.

Thirty people attended the workshop, mainly from the UK but some also from France and the Netherlands. I was the only representative from Canada. The event was organized as a hackathon, with underwater ocean environment data made available to the participants in advance. Over the two days, the participants worked in teams, based on the challenges that interested them, to develop innovative solutions. Each challenge was anchored by a company: Thales, Ultra, and ORE Catapult. Representatives from the companies were present to provide more contextual information and answer questions.

With teammates Valentin Bordoux, Matthew Garcia, and Artash Nath: taking on Challenge 3 (Sonobuoy Network for Marine Mammal Tracking)

When I worked on my www.MonitorMyOcean.com project to measure the quietening of global oceans during the COVID-19 lockdowns, I spent a lot of time gathering hydrophone data from different ocean observatories. While some of these sources were comfortable sharing their data, many were not, as hydrophone data has military uses too. This also explained why representatives from companies such as Thales, which undertakes contracts for the UK Government and the military, could not fully disclose all the purposes for which they use hydrophone data.

The Three Challenges

There were three challenges, each anchored by an industry partner:

Challenge 1: Marine Acoustic Sensing using Repurposed Fibreoptic Cables (Thales)

Underwater fibre-optic cables that transmit data across the world can act as vast arrays of hydrophones, sensing vibrations every couple of metres over thousands of kilometres. By measuring the backscatter of light in these fibres, it is possible to measure the underwater sound causing the vibrations.

The participants were given access to the Ocean Observatories Initiative RAPID dataset, in which two seafloor cables located offshore Oregon recorded data over a 4-day period in 2021 when they were not otherwise in use. By analyzing the data, it is possible to observe wave and tidal effects and extract the sounds of marine mammals (blue and fin whales) as well as shipping noise. The challenge was to develop algorithms that could extract information about these observed phenomena, characterize them, and, if possible, create visualizations. https://acoustics.ac.uk/underwater-acoustics-data-challenge-workshop-2023-thales/
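To give a flavour of what exploring this kind of data involves, here is a minimal Python sketch (not the workshop code) of making a spectrogram from a single cable channel to look for low-frequency whale calls and shipping noise. The sampling rate and array shape are assumptions, and the data below is placeholder noise standing in for real strain measurements.

```python
# A minimal sketch (not the workshop code) of inspecting one channel of
# cable-sensing data for low-frequency whale calls and shipping noise.
# The array below is placeholder noise standing in for real strain-rate
# data; the sampling rate and shape are assumptions.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

fs = 200.0                                     # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
das = rng.normal(size=(64, int(fs * 240)))     # (channels along cable, samples)

channel = das[das.shape[0] // 2]               # one virtual hydrophone
f, t, Sxx = spectrogram(channel, fs=fs, nperseg=1024, noverlap=768)

plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="auto")
plt.ylim(0, 100)                               # blue and fin whale calls sit below ~100 Hz
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("Spectrogram of one cable channel")
plt.show()
```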

Challenge 2: Passive Acoustic Underwater Detection and Tracking (Ultra)

Due to the abundance of noise in underwater environments, human sonar operators typically outperform traditional passive contact-follower algorithms when analyzing broadband waterfall displays. They are also better at detecting and tracking the more interesting “quieter contacts” in broadband sonar data. The challenge was to create robust algorithms that outperform the traditional ones without increasing the false alarm rate.

The participants were provided with 250 vignettes of synthetic Hull Mounted Sonar (HMS) data that contained signals from a number of contacts, as well as the corresponding ground truth tracks. These vignettes were 2-dimensional arrays where the columns correspond to the look angles of the HMS while the rows correspond to time. https://acoustics.ac.uk/underwater-acoustics-data-challenges-workshop-ultra-2023/
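For illustration only (I did not work on this challenge), here is a rough Python sketch of what one of these vignettes and its ground-truth track might look like when plotted as a waterfall. The shapes and noise model are synthetic stand-ins, not the real data.

```python
# A placeholder "vignette" with the same layout as described above
# (rows = time steps, columns = look angles), with a synthetic
# ground-truth track overlaid. Shapes and noise are stand-ins.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_time, n_bearing = 200, 96
vignette = rng.normal(0, 1, (n_time, n_bearing))      # background noise
track = np.linspace(20, 60, n_time).astype(int)       # a slowly drifting contact
vignette[np.arange(n_time), track] += 3.0             # add the contact's energy

plt.imshow(vignette, aspect="auto", origin="upper", cmap="gray_r")
plt.plot(track, np.arange(n_time), "r--", label="ground-truth track")
plt.xlabel("Look angle (bearing bin)")
plt.ylabel("Time step")
plt.legend()
plt.title("Synthetic broadband waterfall vignette")
plt.show()
```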

Challenge 3: Sonobuoy Network for Marine Mammal Tracking (ORE Catapult)

Marine mammal tracking reports for offshore energy project consenting are of increasing interest to local governments, as they reduce site consenting times for the development of offshore wind, tidal, and wave technologies. There is therefore great demand for underwater acoustic sensing networks that can track marine mammals.

The participants in this challenge were given access to the Detection, Classification, Localisation and Density Estimation (DCLDE) datasets from hydrophone arrays laid out in a 4×8 grid, separated by around 8 km, off the East Coast of Canada. The highly endangered North Atlantic Right Whale (NARW) is known to frequent this area. The 32-sonobuoy dataset contains data recorded over two days in multiplexed form, stored as .wav files, with channels ordered as omni, sine (East-West), and cosine (North-South). The main goal of this challenge was to develop methods for detecting, characterizing, localizing, and tracking marine mammal calls. The NARW signals mainly consist of upsweep and downsweep calls.
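To illustrate the channel layout, here is a minimal Python sketch of reading one sonobuoy recording and separating its three channels. It assumes the .wav files store the channels in the order stated above; the file name is hypothetical.

```python
# A minimal sketch of reading one sonobuoy recording and splitting its
# three channels, assuming each .wav file stores them in the stated
# order (omni, sine, cosine); the file name here is hypothetical.
import soundfile as sf

audio, fs = sf.read("sonobuoy_07.wav")   # expected shape: (samples, 3)
omni   = audio[:, 0]                     # omnidirectional pressure channel
sine   = audio[:, 1]                     # East-West directional channel
cosine = audio[:, 2]                     # North-South directional channel

print(f"sample rate: {fs} Hz, duration: {len(omni) / fs:.1f} s")
```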

Coincident shipboard and aerial NARW visual surveys, as well as oceanographic surveys by Slocum ocean gliders, were also conducted within each sonobuoy array over the two days. A spreadsheet containing information on the different whale sightings was included in the provided data to act as ground truth for testing algorithms. Furthermore, manual labelling of identical calls detected simultaneously at different sonobuoys was also included. https://acoustics.ac.uk/underwater-acoustics-data-challenge-workshop-2023-ore-catapult/

The Hackathon and Team-Making

On the evening prior to the start of the event, I met several participants, including the organizer, Dr. Alan Hunter, as well as researchers from the companies behind the three challenges. This gave me a better understanding of the participants' backgrounds, their interests, what they expected from the event, and which challenge interested them the most. Being the only high school student in the group, it was a wonderful learning opportunity to see such a broad gathering of people from different universities and institutions interested in the specific subject of underwater acoustics.

The next day, after a hearty breakfast, everyone met in the common room. Alan gave an introductory talk about the workshop objectives and the schedule ahead. It was followed by short presentations from representatives of the companies anchoring each of the three challenges. Many participants had already decided on the challenge they were most interested in and had identified groups they would like to work with.

Working on Challenge 3: Using Machine Learning to Identify Bioacoustic Signals

I was interested in Challenge 3, as I had been working on acoustic signals for a few months using hydrophone data from the Monterey Bay Aquarium Research Institute (MBARI) and as a member of a working group of the International Quiet Ocean Experiment (IQOE).

There were around ten people interested in the challenge. We all went into the courtyard and started brainstorming ideas on how to tackle the dataset. As some of the participants had already looked at the datasets and had a good understanding of them, a lot of great ideas were generated. We decided to break into smaller groups to build on those ideas. Three sub-groups were formed. I teamed up with Matthew Garcia, a master's student from the University of Bath, and Valentin Bordoux, a master's student from France.

As Matthew had been working on the DCLDE dataset for his master's degree, we were able to get a head start on the project. The main issue we discovered was that the annotations across multiple sonobuoys needed to localize calls were missing. Labelling 32 sonobuoys manually was a challenging task, so the university researchers who had created the annotations for the DCLDE dataset usually annotated only the first observed time of a specific whale call. This meant there were many unannotated signals.

Our immediate objective was to use the pre-existing annotations to train a machine-learning model to identify unannotated calls in the dataset. The model would slide a window over each recording and detect whether a given 5-second segment contained a signal. To do this, we needed to prepare a dataset containing samples of segments with signals in them as well as segments without any signal.

To make the dataset of positive detections, we created 5-second windows centred on each annotated call. For the negative detections, we picked random timestamps that did not overlap with any annotated call. Although these windows may have contained a few unannotated calls, the total duration of such calls would have been minuscule compared to the duration in which there were no calls at all.
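A rough Python sketch of this dataset-building step is below. The audio and annotation times are placeholders; in practice they came from the DCLDE recordings and the provided annotations, and the exact margins and counts we used may have differed.

```python
# A rough sketch of how the training examples were assembled. The audio
# and annotation times below are placeholders standing in for the real
# DCLDE data.
import numpy as np

fs = 2000                                     # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
omni = rng.normal(size=10 * 60 * fs)          # placeholder 10-minute recording
annotations = [35.2, 118.7, 304.5, 471.0]     # placeholder call times (s)

win_s = 5.0
half = win_s / 2
duration = len(omni) / fs

def cut(centre_s):
    """Extract a 5-second window centred on a given time."""
    start = int((centre_s - half) * fs)
    return omni[start:start + int(win_s * fs)]

# Positive examples: 5-second windows centred on each annotated call.
positives = [cut(t) for t in annotations if half < t < duration - half]

# Negative examples: random timestamps that do not overlap any annotation.
negatives = []
while len(negatives) < len(positives):
    t = rng.uniform(half, duration - half)
    if all(abs(t - a) > win_s for a in annotations):
        negatives.append(cut(t))
```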

We trained a Convolutional Neural Network (CNN) model on the positive and negative detections. As we only had 36 hours to work on our projects, we did not have time for much data preprocessing and fine-tuning, which is essential for achieving higher accuracy. Consequently, we could only achieve a test accuracy of 65%. Although low, it demonstrated that a highly accurate model capable of detecting new, unannotated calls was feasible. We applied our model to a sample buoy and were able to detect 7 new calls that had not been annotated before!
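For readers who want a concrete picture, here is a simplified Python sketch of this kind of detector. It is not our exact architecture or preprocessing; the spectrogram size, FFT settings, and 0.5 decision threshold are assumptions made for illustration.

```python
# A simplified sketch of this kind of call detector, not our exact
# architecture, preprocessing, or hyperparameters.
import numpy as np
import tensorflow as tf
from scipy.signal import spectrogram

def to_spectrogram(segment, fs):
    """Turn a 5-second waveform into a log-spectrogram 'image'."""
    _, _, Sxx = spectrogram(segment, fs=fs, nperseg=256, noverlap=128)
    return np.log10(Sxx + 1e-12)[..., np.newaxis]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(129, 77, 1)),         # depends on fs and FFT settings
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),     # call / no-call probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Training would use spectrograms of the positive and negative windows:
# X = np.stack([to_spectrogram(w, fs) for w in positives + negatives])
# y = np.array([1] * len(positives) + [0] * len(negatives))
# model.fit(X, y, validation_split=0.2, epochs=10)

# Detecting new calls on another buoy with a sliding 5-second window:
# win, hop = int(5.0 * fs), int(2.5 * fs)
# for start in range(0, len(omni) - win + 1, hop):
#     p = model.predict(to_spectrogram(omni[start:start + win], fs)[np.newaxis])[0, 0]
#     if p > 0.5:
#         print(f"possible call at {start / fs:.1f} s")
```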

We planned to continue working to improve the model and to extend the annotations to calls recorded on the surrounding hydrophones.

Group Presentations

On the last day, each team presented the work it had done to the entire group.

For Challenge 1, three teams presented their work. I found it interesting that one of the teams correlated the signals detected on the fibre-optic cables with AIS vessel data. They used different segments of the cables to triangulate the distance to ships. Teams were also able to observe the overlap between marine mammal calls and ship noise. In one instance, the sound of a ship passing close by overshadowed a mammal call, validating the known fact that growing shipping noise is hampering the ability of marine mammals to use sound for communication and navigation. Some of the teams did additional analysis to determine the vessel and propeller type from the sound signals.

Two teams tackled Challenge 2 on sonar tracking. One of the teams tried three different models: convolutional autoencoders to generate traces from a full image, a CNN model that predicted segments, and a CNN/RNN hybrid that attempted tracking in real time.

Four teams tackled Challenge 3. One team went in-depth into a single observed scenario, carefully analyzed a signal detected by 4 hydrophones, and used an algorithm to refine the time stamps of each call, allowing for accurate time-difference-of-arrival (TDOA) localization. Another team also went in-depth, tracing one call across 6 hydrophones and identifying the directional properties of an NARW call. Our group took a broader approach aimed at the entire dataset: our model would automatically create annotations, allowing models created by other teams to do more in-depth analysis of each specific scenario in the dataset.
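As an aside, the core idea behind TDOA estimation can be sketched in a few lines of Python. This is not any team's actual code; the signals below are synthetic placeholders, and the sampling rate is an assumption.

```python
# A minimal sketch of estimating a time difference of arrival (TDOA)
# between two buoys by cross-correlating the same call recorded on both.
# The signals are synthetic placeholders, not real sonobuoy data.
import numpy as np
from scipy.signal import correlate, correlation_lags

fs = 2000                                     # assumed sampling rate (Hz)
rng = np.random.default_rng(0)

# Placeholder: the same short call buried in noise, reaching buoy B 0.35 s
# later than buoy A.
call = np.sin(2 * np.pi * 150 * np.linspace(0, 0.5, fs // 2))
sig_a = rng.normal(0, 0.5, 4 * fs)
sig_b = rng.normal(0, 0.5, 4 * fs)
sig_a[fs:fs + len(call)] += call
delay = int(0.35 * fs)
sig_b[fs + delay:fs + delay + len(call)] += call

corr = correlate(sig_b, sig_a, mode="full")
lags = correlation_lags(len(sig_b), len(sig_a), mode="full")
tdoa = lags[np.argmax(corr)] / fs             # ~ +0.35 s: the call hit buoy A first
print(f"estimated TDOA: {tdoa:.3f} s")

# With TDOAs from several buoy pairs and known buoy positions, a call can
# be localized by intersecting the corresponding hyperbolae.
```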

Thanks, and Until the Next Workshop

I learned a lot during this hackathon-styled workshop. The opportunity to work alongside so many researchers and young professionals allowed me to pick up knowledge of new tools, algorithms, and developments in the field of underwater acoustics.

Thank you to the UK Acoustics Network for providing the overnight stay, breakfast, lunch, and dinner, and transportation to and from Bath Spa train station. I hope to join you again at the next workshop.
