Sonix is a Python library designed to extract rich analytical signals directly from audio files — without relying on transcripts or text analysis. It focuses purely on acoustic and prosodic features to help researchers, developers, and data scientists understand conversational dynamics, emotional tone, and speaking patterns.
Sonix provides end-to-end analysis of raw audio conversations, including:
| Category | Description | Example Metrics |
|---|---|---|
| 🎚️ Basic Audio Stats | Extracts simple sound metrics useful for quality and consistency checks. | RMS Energy, Duration, Silence Ratio |
| 🗣️ Voice Activity Detection (VAD) | Identifies speaking vs. silence segments. | Speech Segments, Turn Counts |
| 🎵 Pitch & Prosody | Analyzes intonation and variation in tone. | Average Pitch, Pitch Variance |
| 💬 Speech Tempo | Measures speaking rate and rhythm. | Words per Minute (approx), Speech Rate |
| 🔊 Energy Dynamics | Examines loudness variation to detect emphasis or excitement. | Mean Energy, Energy Variability |
| 😠 Emotion & Tone Estimation | Classifies emotional states using pretrained acoustic models. | Calm, Happy, Angry, Sad |
| ⏱️ Overlap & Turn-Taking | Detects interruptions and conversational overlap. | Speaker Overlap %, Turn Durations |
| 🌈 Spectral Features | Extracts frequency-domain data for ML and acoustic analysis. | MFCCs, Spectral Centroid, Roll-off |
| 🎯 Audio Quality | Evaluates clarity and background noise. | SNR (Signal-to-Noise Ratio), Distortion |
| 🧠 Derived Conversation Insights | Combines features to infer interaction quality. | Engagement Index, Talk/Listen Ratio |
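To make the first row concrete, here is a rough, illustrative sketch of how such basic stats can be computed directly with librosa (one of Sonix's dependencies). This is not Sonix's internal code, and the silence threshold is an assumption:

```python
# Illustrative only: computing the "Basic Audio Stats" row with librosa.
# Sonix's internal implementation may differ.
import librosa
import numpy as np

y, sr = librosa.load("conversation.wav", sr=16000, mono=True)

duration_sec = len(y) / sr
rms = librosa.feature.rms(y=y)[0]        # frame-wise RMS energy
mean_rms = float(np.mean(rms))

# Crude silence ratio: fraction of frames below a fixed RMS threshold
# (the 0.01 threshold is an assumption; tune it per recording setup).
silence_ratio = float(np.mean(rms < 0.01))

print(f"duration: {duration_sec:.1f}s, "
      f"mean RMS: {mean_rms:.4f}, silence ratio: {silence_ratio:.1%}")
```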
Typical use cases include:

- Conversation analytics for call centers or AI voice agents
- Measuring emotional tone or stress levels in speech
- Detecting dominance or interruptions in meetings
- Generating audio-based KPIs for human–AI interactions
- Building real-time feedback tools for voice communication training
Install via pip:

```bash
pip install Sonix
```

Quick start:

```python
from Sonix import AudioAnalyzer
# Load and analyze an audio file
analyzer = AudioAnalyzer("conversation.wav")
# Run full analysis
report = analyzer.analyze_all()
# Print summary
print(report.summary())
# Access individual feature groups
print(report.pitch.mean)
print(report.energy.variance)
print(report.emotion.probabilities)
```

Example output:

```json
{
"duration_sec": 180.4,
"speech_segments": 56,
"average_pitch_hz": 201.3,
"pitch_variance": 32.8,
"mean_energy": -20.5,
"energy_variability": 0.17,
"speech_rate_wpm": 142,
"emotion": {
"calm": 0.52,
"happy": 0.28,
"angry": 0.12,
"sad": 0.08
},
"overlap_ratio": 0.07,
"engagement_index": 0.81
}
```

Project structure:

```
Sonix/
│
├── core/
│   ├── audio_loader.py        # Handles input normalization, channel merging
│   ├── feature_extractor.py   # Extracts MFCCs, pitch, energy, etc.
│   ├── vad_detector.py        # Voice activity segmentation
│   ├── prosody_analyzer.py    # Pitch, tone, tempo analysis
│   ├── emotion_estimator.py   # Acoustic emotion classification
│   ├── quality_metrics.py     # Noise and clarity estimation
│   └── report_builder.py      # Combines results into structured JSON
│
├── models/
│   ├── emotion_model.onnx
│   └── vad_model.onnx
│
└── utils/
    ├── visualization.py       # Waveform and spectrum plots
    └── audio_helpers.py
```
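For intuition, a minimal sketch of what `audio_loader.py`-style normalization and channel merging might look like, built on librosa. This is an assumption for illustration, not Sonix's actual code:

```python
# Hypothetical sketch of audio_loader.py-style loading: downmix to mono,
# resample, and peak-normalize. Not Sonix's actual implementation.
import librosa
import numpy as np

def load_audio(path: str, sr: int = 16000, target_peak: float = 0.95):
    # librosa.load resamples to `sr` and downmixes to mono when mono=True
    y, sr = librosa.load(path, sr=sr, mono=True)
    peak = np.max(np.abs(y))
    if peak > 0:
        y = y * (target_peak / peak)  # peak-normalize to avoid clipping
    return y, sr
```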
Sonix builds on established audio libraries:

| Library | Purpose |
|---|---|
| librosa | Audio feature extraction |
| pyAudioAnalysis | Energy and tempo metrics |
| webrtcvad | Voice activity detection |
| torch / onnxruntime | Emotion classification models |
| numpy / scipy | Signal processing |
| matplotlib | Visualization (optional) |
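As an example of how the VAD layer can work, the snippet below follows webrtcvad's documented usage pattern; how Sonix wires it internally is not shown here and is assumed:

```python
# Standard webrtcvad usage: classify fixed-size PCM frames as speech/non-speech.
import webrtcvad

vad = webrtcvad.Vad(2)            # aggressiveness: 0 (lenient) to 3 (strict)
SAMPLE_RATE = 16000               # webrtcvad supports 8/16/32/48 kHz
FRAME_MS = 30                     # frames must be 10, 20, or 30 ms long
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # 16-bit mono PCM

def speech_flags(pcm: bytes):
    """Yield one True/False per frame: does webrtcvad hear speech?"""
    for start in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        yield vad.is_speech(pcm[start:start + FRAME_BYTES], SAMPLE_RATE)
```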
For finer control, compose a custom pipeline with only the stages you need:

```python
from Sonix.pipeline import AudioPipeline
pipeline = AudioPipeline([
    "vad",
    "pitch",
    "energy",
    "emotion",
    "turn_taking",
])
results = pipeline.run("meeting.wav")
results.plot_waveform()
```

Key output metrics:

| Metric | Description |
|---|---|
| `duration_sec` | Total length of the audio file |
| `speech_segments` | Number of detected speaking parts |
| `speech_rate_wpm` | Approximate speech tempo |
| `average_pitch_hz` | Average fundamental frequency |
| `energy_variability` | Standard deviation of loudness |
| `overlap_ratio` | Percentage of overlapping speech between speakers |
| `emotion` | Probabilities of acoustic emotion categories |
| `engagement_index` | Combined measure of voice energy, tempo, and tone consistency |
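The README does not spell out how `engagement_index` is computed; purely as an illustration, a composite of normalized energy, tempo, and tone-consistency terms might look like the sketch below. All scales and weights here are assumptions, not Sonix's actual formula:

```python
# Purely illustrative: one way a composite "engagement index" could be
# formed from the metrics above. Caps, scales, and weights are assumptions.
def engagement_index(energy_variability: float,
                     speech_rate_wpm: float,
                     pitch_variance: float) -> float:
    energy_term = min(energy_variability / 0.25, 1.0)        # assumed typical max
    tempo_term = min(speech_rate_wpm / 160.0, 1.0)           # ~160 wpm as "brisk"
    consistency_term = 1.0 / (1.0 + pitch_variance / 50.0)   # steadier tone = higher
    return 0.4 * energy_term + 0.4 * tempo_term + 0.2 * consistency_term
```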
Sonix also provides a command-line interface:

```bash
Sonix analyze conversation.wav --plot
```

Output includes a JSON summary plus a waveform visualization.
Integrate with your AI or analytics platform:
```python
from Sonix import AudioAnalyzer
analyzer = AudioAnalyzer("agent_call.wav")
signals = analyzer.get_signals()
agent_metrics = {
    "engagement": signals["engagement_index"],
    "emotion": signals["emotion"]["happy"],
    "speech_rate": signals["speech_rate_wpm"],
}
```

Roadmap:

- Real-time streaming analysis
- Speaker diarization and identification
- Gender and age acoustic profiling
- Conversation quality score model
- REST API integration
Developed by Dima Statz & Contributors
📫 Contributions welcome via pull requests and GitHub issues.
MIT License © 2025 Sonix Team
