aether-raid/atc-transcription
Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications

Marcus Wee Yu Zhe, Justin Wong Juin Hng, Lynus Lim, Joe Tan Yu Wei, Prannaya Gupta, Dillion Lim, Tew En Hao, Aloysius Han Keng Siew, Lim Yong Zhi

paper

This repository contains all relevant code and materials prepared for our paper, "Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications", presented at the 2025 International Conference on Military Communication and Information Systems (ICMCIS).

📜 Abstract

Effective communication in Air Traffic Control (ATC) is critical to maintaining aviation safety, yet the challenges posed by accented English remain largely unaddressed in Automatic Speech Recognition (ASR) systems. Existing models struggle with transcription accuracy for Southeast Asian-accented (SEA-accented) speech, particularly in noisy ATC environments. This study presents the development of ASR models fine-tuned specifically for Southeast Asian accents using a newly created dataset. Our research achieves significant improvements, achieving a Word Error Rate (WER) of 0.0982 or 9.82% on SEA-accented ATC speech. Additionally, the paper highlights the importance of region-specific datasets and accent-focused training, offering a pathway for deploying ASR systems in resource-constrained military operations. The findings emphasize the need for noise-robust training techniques and region-specific datasets to improve transcription accuracy for non-Western accents in ATC communications.
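The WER figure above counts word-level substitutions, insertions, and deletions divided by the number of reference words. As an illustration only (this is not the repository's evaluation code, which may use a library instead), a minimal self-contained computation looks like:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / reference
    word count, via a standard Levenshtein dynamic program over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substituted word in a four-word reference gives WER 0.25.
print(word_error_rate("cleared to land runway", "cleared to land runaway"))  # 0.25
```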

🔧 Installation and Set-Up

CUDA Environment

The code in this project was primarily run on a CUDA 12.4 system. Certain libraries may have mismatches with other CUDA versions, so please be wary of that.

If you are using Windows or Linux, we encourage installing CUDA, cuDNN, and cuBLAS. We have not yet tested this codebase on macOS.

Installing Dependencies

This project uses uv to manage all Python-based dependencies. Please follow the instructions on the site to install uv, and use it to create a virtual environment via the following command:

uv sync

This will install all the dependencies listed in pyproject.toml.

Additionally, please install ffmpeg as it is required as an audio backend for torchaudio.
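The exact install command depends on your platform (e.g. `apt install ffmpeg` on Debian/Ubuntu or `brew install ffmpeg` on macOS, both of which are assumptions about your setup). A quick way to check that ffmpeg is on your PATH:

```shell
# Verify ffmpeg is available for torchaudio's audio backend.
if command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg found: $(ffmpeg -version | head -n 1)"
else
    echo "ffmpeg missing: install it via your package manager"
fi
```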

💻 Getting Started

Project Structure

We have adopted a simple library structure within our codebase, with two major folders to worry about: infer/ and train/. The libraries were adapted from the old codebase, as well as a previous older library. Some of the docstring comments are LLM-generated; while they have been vetted, please be wary of potential hallucinations.

atc-transcription/
├── _archive/              # Archive of scripts
├── infer/                 # Inference Scripts
├── train/                 # Training Scripts
├── test_audio/            # Random WSSS clips
...                        # Other files
└── README.md              # Usage instructions for the codebase

Running Experiments

To run any script in the environment, use:

uv run script.py

📚 Resources

Please refer to our Technical Report on arXiv for more detail on the features and functionality of our codebase.

Dataset

The dataset is not released for public use. If you have access to our datasets on the aether-raid Hugging Face organization, here is a brief overview of the datasets we used:

  • SGdataset: Southeast Asian accented ATC communications dataset
  • Additional processed datasets created via:
    • Spectral Gating (nrSG Series)
    • Demucs (dSG Series)

Features

  • Noise Reduction: Two approaches are provided:
    • As a data processing step (creates a folder of noise-reduced WAV files)
    • As a preprocessing step during model training (more time-efficient)
  • Fine-tuning: Adaptation of Whisper ASR models specifically for Southeast Asian accented ATC communications
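As a rough illustration of the spectral-gating idea behind the nrSG series (this is a minimal NumPy sketch, not the repository's implementation, which may rely on a dedicated noise-reduction library): estimate a per-frequency noise floor from the quietest frames and zero out spectrogram bins below it.

```python
import numpy as np

def spectral_gate(signal, frame_len=512, hop=256, gain_db=12.0):
    """Minimal spectral-gating noise reduction sketch for a 1-D float signal."""
    # Frame the signal with a Hann window and take the STFT via np.fft.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)   # shape: (n_frames, frame_len // 2 + 1)
    mag = np.abs(spec)

    # Noise floor: mean magnitude of the ~10% quietest frames, per frequency bin,
    # raised by a safety margin of gain_db decibels.
    energy = mag.sum(axis=1)
    quiet = mag[np.argsort(energy)[: max(1, n_frames // 10)]]
    floor = quiet.mean(axis=0) * (10 ** (gain_db / 20))

    # Gate: zero out bins whose magnitude falls below the threshold.
    gated = np.where(mag < floor, 0.0, spec)

    # Overlap-add the inverse STFT to reconstruct the time-domain signal.
    out = np.zeros(len(signal))
    for i, frame in enumerate(np.fft.irfft(gated, n=frame_len, axis=1)):
        out[i * hop : i * hop + frame_len] += frame
    return out
```

Applied as a data-processing step, this would be run once per WAV file and the result written to a new folder; applied during training, the same transform runs on each batch as it is loaded.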

🖊️ Citation

@misc{wee2025adaptingautomaticspeechrecognition,
      title={Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications}, 
      author={Marcus Yu Zhe Wee and Justin Juin Hng Wong and Lynus Lim and Joe Yu Wei Tan and Prannaya Gupta and Dillion Lim and En Hao Tew and Aloysius Keng Siew Han and Yong Zhi Lim},
      year={2025},
      eprint={2502.20311},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.20311}, 
}
