whisper.axcl

This repository is a fork of whisper.axcl, an optimized implementation of OpenAI's Whisper model for the LLM8850 accelerator card.

Key Improvements

The original project loads the model on every command execution (about 10 seconds for the small model) and unloads it immediately after transcription. This fork changes the workflow to load the model once and then accept continuous input (wav file paths), which makes it suitable for high-frequency transcription scenarios.

A Flask server is added to provide a continuous speech-to-text transcription service.

Prerequisites

Building this project requires cmake. Install it first:

sudo apt update
sudo apt install -y cmake

Clone and build

cd
git clone https://github.com/PiSugar/whisper.axcl.git --depth=1
cd whisper.axcl
pip install -r server/requirements.txt --break-system-packages
./build.sh

Download Model

https://huggingface.co/M5Stack/whisper-small-axmodel

https://huggingface.co/M5Stack/whisper-tiny-axmodel

https://huggingface.co/M5Stack/whisper-base-axmodel

You can clone the model repositories and reference their files in arguments.json for easier management:

{
  "encoder": "/home/pi/whisper-small-axmodel/ax650/small-encoder.axmodel",
  "decoder_main": "/home/pi/whisper-small-axmodel/ax650/small-decoder-main.axmodel",
  "decoder_loop": "/home/pi/whisper-small-axmodel/ax650/small-decoder-loop.axmodel",
  "position_embedding": "/home/pi/whisper-small-axmodel/small-positional_embedding.bin",
  "token": "/home/pi/whisper-small-axmodel/small-tokens.txt",
  "model_type": "small",
  "language": "en"
}

If language is not specified, the model will attempt to detect the language automatically.
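A wrong path in arguments.json is easy to miss until the server tries to load the model. The following is a minimal sketch of a pre-flight check, assuming the key names from the example above; the helper name missing_model_files is hypothetical and not part of this repository.

```python
import json
import os

# Keys whose values are file paths in the example arguments.json above.
PATH_KEYS = ("encoder", "decoder_main", "decoder_loop", "position_embedding", "token")


def missing_model_files(config_path):
    """Return (key, path) pairs from the config whose files are absent.

    Hypothetical helper for illustration; the project does not ship it.
    """
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)

    missing = []
    for key in PATH_KEYS:
        path = config.get(key)
        # A key that is absent or empty counts as missing too.
        if not path or not os.path.isfile(path):
            missing.append((key, path))
    return missing
```

Calling missing_model_files("arguments.json") from the project root before starting the server returns an empty list when every referenced file exists.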

Run Server

Run the following from the project root:

bash serve.sh

API Usage

The server exposes a RESTful API on http://localhost:8801 for transcription. Send a POST request to the /recognize endpoint with a JSON payload that contains either the path to the audio file or the audio data encoded as base64.

Either filePath or base64 is required. If both are provided, the server first checks whether the file at filePath exists. If it does not, the server decodes the base64 data and saves it to a temporary file for transcription.

{
  "filePath": "/path/to/your/audio.wav",
  "base64": "base64_encoded_audio_data"
}

Use filePath only when client and server are on the same device or share the same file system.
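The filePath-then-base64 fallback described above can be sketched as follows. This illustrates the documented behavior only and is not the server's actual code; the function name resolve_audio_path and the returned is_temporary flag are assumptions.

```python
import base64
import os
import tempfile


def resolve_audio_path(payload):
    """Pick the wav file to transcribe from a /recognize request body.

    Returns (path, is_temporary). Raises ValueError if neither field is usable.
    Hypothetical helper; the real server may be structured differently.
    """
    file_path = payload.get("filePath")
    # Prefer an existing file on disk when filePath is given.
    if file_path and os.path.isfile(file_path):
        return file_path, False

    encoded = payload.get("base64")
    if encoded:
        audio = base64.b64decode(encoded)
        # delete=False keeps the file after the with-block; the caller removes it.
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            tmp.write(audio)
        return tmp.name, True

    raise ValueError("filePath or base64 is required")
```

Returning a flag alongside the path lets the caller delete temporary files after transcription while leaving client-supplied files untouched.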

Response:

{
  "recognition": "How is your day?"
}
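For clients on another machine, the request and response shown above can be exercised with a small standard-library client. This is a sketch under assumptions: the helper names build_payload and recognize, the 60-second timeout, and the Content-Type header are illustrative choices, not something the repository specifies.

```python
import base64
import json
import urllib.request

SERVER_URL = "http://localhost:8801/recognize"


def build_payload(file_path=None, audio_bytes=None):
    """Build the JSON body for /recognize from a path, raw audio bytes, or both."""
    if file_path is None and audio_bytes is None:
        raise ValueError("filePath or base64 is required")
    payload = {}
    if file_path is not None:
        payload["filePath"] = file_path
    if audio_bytes is not None:
        payload["base64"] = base64.b64encode(audio_bytes).decode("ascii")
    return payload


def recognize(payload, url=SERVER_URL, timeout=60):
    """POST the payload and return the text from the "recognition" field."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed to be expected
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["recognition"]
```

With the server running, reading a wav file in binary mode and calling recognize(build_payload(audio_bytes=data)) sends the audio as base64, which works even when the client and server do not share a file system.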

Run as Systemd Service

The startup.sh script sets up the transcription server as a systemd service so that it starts on boot and restarts on failure. Run it, then check the service status:

sudo bash startup.sh
sudo systemctl status whisper.service
