aether-raid/atc-transcription
Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications

Marcus Wee Yu Zhe, Justin Wong Juin Hng, Lynus Lim, Joe Tan Yu Wei, Prannaya Gupta, Dillion Lim, Tew En Hao, Aloysius Han Keng Siew, Lim Yong Zhi

paper

This repository contains all relevant code and materials prepared for our paper, "Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications", presented at the 2025 International Conference on Military Communication and Information Systems (ICMCIS).

📜 Abstract

Effective communication in Air Traffic Control (ATC) is critical to maintaining aviation safety, yet the challenges posed by accented English remain largely unaddressed in Automatic Speech Recognition (ASR) systems. Existing models struggle with transcription accuracy for Southeast Asian-accented (SEA-accented) speech, particularly in noisy ATC environments. This study presents the development of ASR models fine-tuned specifically for Southeast Asian accents using a newly created dataset. Our research achieves significant improvements, achieving a Word Error Rate (WER) of 0.0982 or 9.82% on SEA-accented ATC speech. Additionally, the paper highlights the importance of region-specific datasets and accent-focused training, offering a pathway for deploying ASR systems in resource-constrained military operations. The findings emphasize the need for noise-robust training techniques and region-specific datasets to improve transcription accuracy for non-Western accents in ATC communications.
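The WER figure above counts word-level substitutions, insertions, and deletions divided by the number of reference words. As an illustration only (this is not the repository's evaluation code, which may use a library instead), a minimal self-contained computation looks like:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / reference
    word count, via a standard Levenshtein dynamic program over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substituted word in a four-word reference gives WER 0.25.
print(word_error_rate("cleared to land runway", "cleared to land runaway"))  # 0.25
```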

🔧 Installation and Set-Up

CUDA Environment

The code in this project was primarily run on a CUDA 12.4 system. Certain libraries may have mismatches with other CUDA versions, so please be wary of that.

If you are using Windows or Linux, we encourage installing CUDA, cuDNN, and cuBLAS. We have not yet tested this codebase on macOS.

Installing Dependencies

This project uses uv to manage all Python-based dependencies. Please follow the instructions on the site to install uv, and use it to create a virtual environment via the following command:

uv sync

This will install all the dependencies listed in pyproject.toml.

Additionally, please install ffmpeg as it is required as an audio backend for torchaudio.
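The exact install command depends on your platform (e.g. `apt install ffmpeg` on Debian/Ubuntu or `brew install ffmpeg` on macOS, both of which are assumptions about your setup). A quick way to check that ffmpeg is on your PATH:

```shell
# Verify ffmpeg is available for torchaudio's audio backend.
if command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg found: $(ffmpeg -version | head -n 1)"
else
    echo "ffmpeg missing: install it via your package manager"
fi
```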

💻 Getting Started

Project Structure

We have adopted a simple library structure within our codebase, with two major folders to worry about: infer/ and train/. The libraries were adapted from the old codebase, as well as a previous older library. Some of the docstring comments are LLM-generated; while they have been vetted, please be wary of potential hallucinations.

atc-transcription/
├── _archive/              # Archive of scripts
├── infer/                 # Inference Scripts
├── train/                 # Training Scripts
├── test_audio/            # Random WSSS clips
...                        # Other files
└── README.md              # Usage instructions for the codebase

Running Experiments

To run any script in the environment, use:

uv run script.py

📚 Resources

Please refer to our Technical Report on arXiv for more detail on the features and functionality of our codebase.

Dataset

The dataset is not released for public use. If you have access to our datasets on the aether-raid Hugging Face organization, here is a brief overview of the datasets we used:

  • SGdataset: Southeast Asian accented ATC communications dataset
  • Additional processed datasets created via:
    • Spectral Gating (nrSG Series)
    • Demucs (dSG Series)

Features

  • Noise Reduction: Two approaches are provided:
    • As a data processing step (creates a folder of noise-reduced WAV files)
    • As a preprocessing step during model training (more time-efficient)
  • Fine-tuning: Adaptation of Whisper ASR models specifically for Southeast Asian accented ATC communications
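As a rough illustration of the spectral-gating idea behind the nrSG series (this is a minimal NumPy sketch, not the repository's implementation, which may rely on a dedicated noise-reduction library): estimate a per-frequency noise floor from the quietest frames and zero out spectrogram bins below it.

```python
import numpy as np

def spectral_gate(signal, frame_len=512, hop=256, gain_db=12.0):
    """Minimal spectral-gating noise reduction sketch for a 1-D float signal."""
    # Frame the signal with a Hann window and take the STFT via np.fft.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)   # shape: (n_frames, frame_len // 2 + 1)
    mag = np.abs(spec)

    # Noise floor: mean magnitude of the ~10% quietest frames, per frequency bin,
    # raised by a safety margin of gain_db decibels.
    energy = mag.sum(axis=1)
    quiet = mag[np.argsort(energy)[: max(1, n_frames // 10)]]
    floor = quiet.mean(axis=0) * (10 ** (gain_db / 20))

    # Gate: zero out bins whose magnitude falls below the threshold.
    gated = np.where(mag < floor, 0.0, spec)

    # Overlap-add the inverse STFT to reconstruct the time-domain signal.
    out = np.zeros(len(signal))
    for i, frame in enumerate(np.fft.irfft(gated, n=frame_len, axis=1)):
        out[i * hop : i * hop + frame_len] += frame
    return out
```

Applied as a data-processing step, this would be run once per WAV file and the result written to a new folder; applied during training, the same transform runs on each batch as it is loaded.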

🖊️ Citation

@misc{wee2025adaptingautomaticspeechrecognition,
      title={Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications}, 
      author={Marcus Yu Zhe Wee and Justin Juin Hng Wong and Lynus Lim and Joe Yu Wei Tan and Prannaya Gupta and Dillion Lim and En Hao Tew and Aloysius Keng Siew Han and Yong Zhi Lim},
      year={2025},
      eprint={2502.20311},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.20311}, 
}
