whisper.axcl

This repository is a fork of whisper.axcl, an optimized implementation of OpenAI's Whisper model for the LLM8850 accelerator card.

Key Improvements

The original project loads the model on every command execution (about 10 seconds for the small model) and unloads it immediately after transcription. This fork loads the model once and then accepts a continuous stream of inputs (WAV file paths), which makes it suitable for high-frequency transcription.

A Flask server is also included to provide a continuous speech-to-text transcription service.
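
The load-once workflow described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not the fork's actual C++ code: `load_model` is a hypothetical stand-in that returns a fake transcriber, and `serve` is a made-up name.

```python
from typing import Callable, Iterable, Iterator


def load_model() -> Callable[[str], str]:
    """Hypothetical stand-in for the expensive model load (not the real API)."""
    def transcribe(wav_path: str) -> str:
        # A real implementation would run the encoder and decoder here.
        return f"<transcript of {wav_path}>"
    return transcribe


def serve(paths: Iterable[str]) -> Iterator[str]:
    """Load the model once, then transcribe every incoming WAV path."""
    transcribe = load_model()  # cost is paid once, not once per request
    for line in paths:
        wav_path = line.strip()
        if wav_path:  # ignore blank lines
            yield transcribe(wav_path)
```

Passing `sys.stdin` as `paths` would turn this into a loop that reads one WAV path per line until end of input, so the load cost is spread across all requests.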

Prerequisites

Building this project requires CMake. Install it first:

sudo apt update
sudo apt install -y cmake

Clone and build

cd
git clone https://github.com/PiSugar/whisper.axcl.git --depth=1
cd whisper.axcl
pip install -r server/requirements.txt --break-system-packages
./build.sh

Download Model

https://huggingface.co/M5Stack/whisper-small-axmodel

https://huggingface.co/M5Stack/whisper-tiny-axmodel

https://huggingface.co/M5Stack/whisper-base-axmodel

You can clone a model repository and reference its files in arguments.json for easier management:

{
  "encoder": "/home/pi/whisper-small-axmodel/ax650/small-encoder.axmodel",
  "decoder_main": "/home/pi/whisper-small-axmodel/ax650/small-decoder-main.axmodel",
  "decoder_loop": "/home/pi/whisper-small-axmodel/ax650/small-decoder-loop.axmodel",
  "position_embedding": "/home/pi/whisper-small-axmodel/small-positional_embedding.bin",
  "token": "/home/pi/whisper-small-axmodel/small-tokens.txt",
  "model_type": "small",
  "language": "en"
}

If language is not specified, the model will attempt to detect the language automatically.
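
Because a wrong path in arguments.json is an easy mistake to make, a small pre-flight check may help before starting the server. The sketch below is not part of the repository. It assumes the five path-valued keys shown in the example above, and the helper names `missing_files` and `check_arguments` are made up for illustration.

```python
import json
from pathlib import Path

# Keys whose values are file paths, taken from the example configuration.
PATH_KEYS = ("encoder", "decoder_main", "decoder_loop", "position_embedding", "token")


def missing_files(config: dict) -> list[str]:
    """Return the path keys that are absent or point at a nonexistent file."""
    return [
        key for key in PATH_KEYS
        if key not in config or not Path(config[key]).is_file()
    ]


def check_arguments(path: str = "arguments.json") -> list[str]:
    """Load the JSON file at `path` and report the problematic path keys."""
    with open(path, encoding="utf-8") as fh:
        return missing_files(json.load(fh))
```

An empty list from `check_arguments()` means every referenced model file was found.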

Run Server

Run the server from the project root:

bash serve.sh

API Usage

The server exposes a RESTful API for transcription on http://localhost:8801. Send a POST request to the /recognize endpoint with a JSON payload containing either the path to the audio file or the audio data encoded as base64.

At least one of filePath and base64 is required. If both are provided, the server first checks whether the file at filePath exists. If it does not, the server decodes the base64 data and saves it to a temporary file for transcription.

{
  "filePath": "/path/to/your/audio.wav",
  "base64": "base64_encoded_audio_data"
}

Use filePath only when client and server are on the same device or share the same file system.
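
The fallback rule described above can be sketched as follows. This is a minimal illustration under assumptions, not the server's actual code: the function name `resolve_audio_path` and the `.wav` suffix for the temporary file are made up.

```python
import base64
import os
import tempfile
from typing import Optional


def resolve_audio_path(file_path: Optional[str], b64_data: Optional[str]) -> str:
    """Pick the audio file to transcribe: existing path first, base64 second."""
    # Prefer the path when the file is actually present on this machine.
    if file_path and os.path.isfile(file_path):
        return file_path
    if b64_data:
        # Otherwise decode the payload into a temporary file (suffix assumed).
        raw = base64.b64decode(b64_data)
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            tmp.write(raw)
            return tmp.name
    raise ValueError("either an existing filePath or base64 data is required")
```

In this sketch the caller is responsible for deleting the temporary file once transcription finishes.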

Response:

{
  "recognition": "How is your day?"
}
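
A client for this endpoint might look like the sketch below, which uses only the Python standard library. The URL, field names, and response key come from this README. The `Content-Type` header, the timeout value, and the helper names `build_payload` and `recognize` are assumptions.

```python
import base64
import json
import urllib.request
from typing import Optional

# Host, port, and endpoint as given in this README.
RECOGNIZE_URL = "http://localhost:8801/recognize"


def build_payload(file_path: Optional[str] = None,
                  audio_bytes: Optional[bytes] = None) -> dict:
    """Build the JSON body; at least one of the two inputs must be given."""
    payload = {}
    if file_path is not None:
        payload["filePath"] = file_path
    if audio_bytes is not None:
        payload["base64"] = base64.b64encode(audio_bytes).decode("ascii")
    if not payload:
        raise ValueError("provide file_path, audio_bytes, or both")
    return payload


def recognize(payload: dict, url: str = RECOGNIZE_URL, timeout: float = 60.0) -> str:
    """POST the payload and return the text under the "recognition" key."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["recognition"]
```

With the server running, `recognize(build_payload(file_path="/path/to/your/audio.wav"))` would return the transcription string. For a remote client, pass the file contents as `audio_bytes` instead.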

Run as Systemd Service

The startup.sh script sets up the transcription server as a systemd service, so that it starts on boot and restarts on failure.

sudo bash startup.sh
sudo systemctl status whisper.service
