A powerful Discord bot that brings text-to-speech functionality to your server using the GPT-SoVITS v2 API. Transform text messages into natural-sounding speech with multiple character voices, including support for custom user voice samples.
- Multiple Character Voices: Choose from a variety of pre-configured character voices
- Custom Voice Samples: Upload and use your own voice samples for personalized TTS
- Real-time Voice Chat Integration: Automatically convert text messages to speech in voice channels
- Intelligent Text Processing: Handles Discord mentions, markdown formatting, and special characters
- LLM Commands: Ask questions and get AI-powered responses with voice playback
- Image Analysis: Upload images along with questions for visual content analysis
- Conversation History: Maintains chat context for more natural conversations
- Gemini AI Integration: Powered by Google's Gemini language model
- Per-user Voice Settings: Each user can configure their preferred voice character
- Voice Channel Controls: Join/leave voice channels with simple commands
- TTS Toggle: Enable/disable TTS functionality per user
- Message Commands: Right-click any message to convert it to speech
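As a rough illustration of the "Intelligent Text Processing" feature, the sketch below strips Discord mentions, URLs, code blocks and common markdown markers before text is sent to TTS. The patterns and the function name `clean_for_tts` are assumptions made for this example; the bot's actual rules live in its source and may differ.

```python
import re

# Hypothetical patterns for illustration; not taken from the bot's source.
CODE_BLOCK_RE = re.compile(r"`{3}.*?`{3}", re.DOTALL)  # fenced code blocks
URL_RE = re.compile(r"https?://\S+")                   # bare links
CUSTOM_EMOJI_RE = re.compile(r"<a?:(\w+):\d+>")        # <:name:id> or <a:name:id>
MENTION_RE = re.compile(r"<(?:@[!&]?|#)\d+>")          # user, role and channel mentions
MARKDOWN_RE = re.compile(r"\*\*|__|~~|\|\||[*`]")      # bold, underline, strike, spoiler, code


def clean_for_tts(text: str) -> str:
    """Return text with Discord-specific markup removed so it reads naturally."""
    text = CODE_BLOCK_RE.sub(" ", text)       # code is rarely worth reading aloud
    text = URL_RE.sub(" ", text)              # drop links entirely
    text = CUSTOM_EMOJI_RE.sub(r"\1", text)   # keep only the emoji name
    text = MENTION_RE.sub(" ", text)          # raw IDs are meaningless when spoken
    text = MARKDOWN_RE.sub("", text)          # remove formatting markers, keep the words
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace
```

A real implementation would probably resolve mentions to display names instead of dropping them, which needs access to the guild's member list.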
- /ask - Ask a question to the AI and get voice response
- /chat - Have a conversation with the AI assistant
- /clear_chat - Clear conversation history
- /set_voice - Configure your preferred voice character
- /get_voice - Check your current voice settings
- /play_tts - Convert text to speech and play in voice channel
- /join_voice - Make the bot join your voice channel
- /leave_voice - Make the bot leave the current voice channel
- /tts_start - Enable TTS functionality
- /tts_stop - Disable TTS functionality
- /voice_add - Add a new voice character (requires manager role)
- /voice_remove - Remove an existing voice character
- /voice_edit - Edit a voice character's audio or reference text
- !!tts start - Enable automatic TTS for your messages
- !!tts stop - Disable automatic TTS for your messages
- Play TTS - Right-click any message to convert it to speech
- Python 3.11 or higher
- FFmpeg (included in the project)
- GPT-SoVITS v2 API server running on http://127.0.0.1:9880/tts or your custom endpoint
- Discord Bot Token
- Google Gemini API Key (for AI features)
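To make the GPT-SoVITS prerequisite more concrete, here is a minimal sketch of what a request to the `/tts` endpoint might look like. The JSON field names (`text`, `text_lang`, `ref_audio_path`, `prompt_text`, `prompt_lang`, `media_type`) follow the GPT-SoVITS `api_v2.py` interface as commonly documented and should be checked against your server version. The helper names and the `zh` language defaults are assumptions for this example. The standard library is used here to keep the sketch dependency-free, although the bot itself lists `requests` as a dependency.

```python
import json
import urllib.request


def build_tts_request(text, ref_audio_path, prompt_text,
                      api_url="http://127.0.0.1:9880/tts",
                      text_lang="zh", prompt_lang="zh"):
    """Build (but do not send) a POST request for a GPT-SoVITS v2 server."""
    payload = {
        "text": text,                      # text to synthesize
        "text_lang": text_lang,            # language of `text` (assumed default)
        "ref_audio_path": ref_audio_path,  # reference audio, as seen by the server
        "prompt_text": prompt_text,        # transcript of the reference audio
        "prompt_lang": prompt_lang,        # language of the transcript (assumed default)
        "media_type": "wav",               # requested audio container
    }
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(request, timeout=60.0):
    """Send the request and return the raw audio bytes from the response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `synthesize(build_tts_request(...))` requires the server to be running; building the request alone is a quick way to inspect the payload without touching the network.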
- Clone the repository

      git clone https://github.com/yourusername/TTS-Discord-Bot.git
      cd TTS-Discord-Bot

- Install dependencies

      pip install -r requirements.txt

- Configure environment variables. Create a .env file in the project root:

      DISCORD_TOKEN=your_discord_bot_token
      GUILD_ID=your_discord_guild_id
      GOOGLE_API_KEY=your_gemini_api_key
      # Optional overrides
      TTS_API_URL=http://127.0.0.1:9880/tts/
      TTS_TARGET_CHANNEL_ID=933384447145943071
      VOICE_TEXT_INPUT_CHANNEL_IDS=1087044327315878020,1047857030226006016
      VOICE_MANAGER_ROLE_ID=1003708775284342955
      MESSAGE_BOT_TARGET_USER_ID=998254901538861157
      VOICE_DIR=data/samples
      USER_SETTINGS_FILE=data/user_settings.json
      USER_VOICE_SETTINGS_FILE=data/user_voice.json
      REVERSE_MAPPING_FILE=data/game_id_to_user_id.json

- Set up voice samples
  - Place voice sample files in the data/samples/ directory
  - Configure sample metadata in data/sample_data.json

- Start the bot

      python run.py

  Or use the provided batch file on Windows:

      start.bat
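The environment variables above could be read along the following lines. This is a sketch under stated assumptions, not the project's actual `config.py`: the `BotConfig` class, the `load_config` and `parse_id_list` helpers, and the choice to treat the first three variables as required are all hypothetical. The default TTS URL is the one named in the prerequisites.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class BotConfig:
    """Hypothetical container for a subset of the settings."""
    discord_token: str
    guild_id: int
    google_api_key: str
    tts_api_url: str
    voice_text_input_channel_ids: Tuple[int, ...]


def parse_id_list(raw: str) -> Tuple[int, ...]:
    """Turn '1, 2,3' into (1, 2, 3), ignoring empty items."""
    return tuple(int(part) for part in raw.split(",") if part.strip())


def load_config(env: Mapping[str, str] = os.environ) -> BotConfig:
    """Read settings from a mapping, failing early when required ones are absent."""
    required = ("DISCORD_TOKEN", "GUILD_ID", "GOOGLE_API_KEY")  # assumed required
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return BotConfig(
        discord_token=env["DISCORD_TOKEN"],
        guild_id=int(env["GUILD_ID"]),
        google_api_key=env["GOOGLE_API_KEY"],
        tts_api_url=env.get("TTS_API_URL", "http://127.0.0.1:9880/tts"),
        voice_text_input_channel_ids=parse_id_list(
            env.get("VOICE_TEXT_INPUT_CHANNEL_IDS", "")
        ),
    )
```

With python-dotenv installed, calling `load_dotenv()` before `load_config()` would populate `os.environ` from the `.env` file. Passing a plain dict instead makes the function easy to test.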
TTS-Discord-Bot/
├── bot/                                        # Core bot modules
│   ├── api/                                    # API integrations
│   │   ├── gemini_api.py                       # Google Gemini API client
│   │   ├── gemini_chat_history.py              # Chat history management
│   │   └── tts_handler.py                      # TTS processing and audio generation
│   ├── commands/                               # Slash commands
│   │   ├── general.py                          # General utility commands
│   │   ├── llm_commands.py                     # AI/LLM related commands
│   │   ├── tts_commands.py                     # TTS configuration commands
│   │   └── voice_commands.py                   # Voice channel management
│   ├── events/                                 # Event listeners
│   │   ├── message_listener.py                 # Target channel message processing
│   │   ├── on_ready.py                         # Bot startup events
│   │   └── voice_chat_text_channel_listener.py # Voice chat text processing
│   └── message_command/                        # Context menu commands
│       ├── analyze_material.py                 # Material analysis
│       └── play_tts.py                         # Message TTS conversion
├── data/                                       # Data storage
│   ├── samples/                                # Voice sample files
│   ├── conversations/                          # Chat history storage
│   └── *.json                                  # Configuration files
├── utils/                                      # Utility functions
├── config.py                                   # Configuration settings
└── run.py                                      # Main entry point
Add new voice characters by:
- Placing audio files in data/samples/
- Adding character metadata to data/sample_data.json:

      {
        "character_name": {
          "file": "character_voice.wav",
          "text": "Sample text for this character"
        }
      }
You can also manage voices in Discord using /voice_add, /voice_remove and /voice_edit (requires the role specified by VOICE_MANAGER_ROLE_ID).
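Since each character entry pairs an audio file with its reference text, a small consistency check can catch typos before the bot starts. The sketch below assumes the `file`/`text` layout shown in the metadata example and the default `data/samples` directory. The function names are hypothetical and not part of the project.

```python
import json
from pathlib import Path


def find_sample_problems(metadata: dict, sample_dir: Path) -> list:
    """Return human-readable problems found in the voice metadata."""
    problems = []
    for name, entry in metadata.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        # Both keys must be present and non-empty strings.
        for key in ("file", "text"):
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{name}: missing or empty '{key}'")
        # The referenced audio file must exist in the sample directory.
        file_name = entry.get("file")
        if isinstance(file_name, str) and file_name.strip():
            if not (sample_dir / file_name).is_file():
                problems.append(f"{name}: audio file not found: {file_name}")
    return problems


def check_sample_data(metadata_path="data/sample_data.json",
                      sample_dir="data/samples") -> list:
    """Load the metadata file from disk and validate it against the sample directory."""
    metadata = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    return find_sample_problems(metadata, Path(sample_dir))
```

An empty list means every character has a non-empty reference text and an audio file that exists on disk.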
Configure target channels in config.py:
- TTS_TARGET_CHANNEL_ID: Channel for message-based TTS
- VOICE_TEXT_INPUT_CHANNEL_IDS: Channels for automatic voice chat TTS
- VOICE_MANAGER_ROLE_ID: Role allowed to add, remove and edit voice characters
- disnake: Discord API wrapper
- google-genai: Google Gemini API client
- pydub: Audio processing
- requests: HTTP requests
- python-dotenv: Environment variable management
- PyNaCl: Voice functionality
- Join a voice channel
- Use /tts_start to enable automatic TTS
- Type messages in configured channels to hear them spoken
- Use /set_voice to change your voice character
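The steps above imply a few conditions that must all hold before a message is spoken. The sketch below expresses them as a single predicate. The function and parameter names are hypothetical, and the bot's real listener may apply additional or different checks.

```python
def should_speak(author_id, channel_id, *, tts_enabled_users,
                 input_channel_ids, bot_in_voice, author_is_bot=False):
    """Decide whether a text message should be converted to speech."""
    if author_is_bot:
        return False  # never read out other bots (or the bot itself)
    if not bot_in_voice:
        return False  # nothing can be played without a voice connection
    if channel_id not in input_channel_ids:
        return False  # only channels listed in VOICE_TEXT_INPUT_CHANNEL_IDS
    return author_id in tts_enabled_users  # user opted in via /tts_start
```

Keeping the decision in a pure function like this makes it testable without a live Discord connection.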
- Use /ask "What is the weather like?" for one-time questions
- Use /chat "Hello!" to start a conversation
- Upload images with /ask for visual analysis
- Use /clear_chat to reset conversation history
- Upload your voice samples to data/samples/
- Select "自己聲音 (需要先上傳語音樣本）" ("Own voice (requires uploading a voice sample first)") in /set_voice
- The bot will use your custom voice for TTS
A Dockerfile is included for containerized deployment:
docker build -t tts-discord-bot .
docker run -d --env-file .env tts-discord-bot

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with disnake Discord API library
- Powered by GPT-SoVITS v2 for high-quality TTS
- AI features provided by Google Gemini
- Supports multiple languages, with a focus on Traditional Chinese