A multilingual conversational AI agent powered by ElevenLabs, featuring real-time audio visualization, geographic location awareness, and 30+ integrated tools including weather, news, search, image gallery, navigation, flight search, and academic research.
- Alec Fritsch (@flokzybtw)
- Mehmet Ali Dolgun (@psychip_)
This application demonstrates an advanced conversational AI interface with:
- Real-time voice conversation using ElevenLabs Conversational AI
- Multi-language support with Turkish, English, German, and Spanish
- Dynamic audio visualization with speech activity detection
- Geographic awareness with IP-based location detection
- 30+ integrated tools for weather, news, search, navigation, flights, and more
- Touch-friendly interface with automatic device detection
- Image gallery system with automated visual search and modal view
- Responsive web interface with mobile optimization
- web-search - Search the web using Google
  - "Search for quantum computing"
  - "Look up climate change effects"
- image-search - Find images across the web
  - "Show me pictures of Mount Everest"
  - "Find images of sports cars"
  - Automatically triggers when discussing celebrities, places, landmarks, movies, products, animals, or any visual subject
- latest-news - Get recent news articles by location or topic
  - "What's the latest news?"
  - "Get technology news"
  - "News about Istanbul"
  - Automatically filters out sports news unless specifically requested
- latest-earthquakes - Check recent earthquakes near location
  - "Any earthquakes nearby?"
  - "Recent earthquakes in California"
  - Reports magnitude, location, and depth
- get-weather - Get current weather and forecast
  - "What's the weather?"
  - "Weather in London"
  - "Will it rain today?"
- poi-search - Find nearby points of interest
  - "Find a hospital nearby"
  - "Where's the nearest gas station?"
  - "Show me restaurants"
  - Types: hospital, pharmacy, gas station, charging station, atm, parking, hotel, cafe, bank, police
- save-location - Save current location as KML file
  - "Save this location"
  - "Mark this as parking spot"
- local-events - Find upcoming local events
  - "What's happening this weekend?"
  - "Any concerts in Berlin?"
- get-address - Reverse geocoding to identify current location
  - "Where am I?"
  - "What is this place?"
  - "I'm lost"
- flight-search - Search for available flights between cities
  - "Find flights to Berlin"
  - "Flights from Istanbul to Berlin tomorrow"
  - "Fly to London today"
  - Automatically finds airport IATA codes via web search for any city
  - Supports date parsing (today, tomorrow, YYYY-MM-DD)
  - Converts USD prices to local currency
- Google Maps Navigation - Get driving directions
  - "Navigate to Istanbul"
  - "Take me to the airport"
  - "Directions to the nearest hospital"
- Hotel Search - Find accommodation via Hotels.com
  - "Find a hotel"
  - "Hotels in Paris"
  - "Where to stay in Tokyo"
- music-search - Search and play music
  - "Play Bohemian Rhapsody"
  - "Play Mozart Symphony No 40"
- YouTube Search - Find videos and music
  - "Show me the Thriller music video"
  - "How to tie a tie"
- SoundCloud Search - Find music, remixes, DJ sets
  - "Find lo-fi hip hop on SoundCloud"
  - "Search for deadmau5 live set"
- Amazon Search - Search for products
  - "Find wireless headphones on Amazon"
  - "Search for Sony cameras"
- eBay Search - Find used items, collectibles
  - "Find used MacBook Pro on eBay"
  - "Search for vintage watches"
- app-search - Find apps for your platform
  - "Find Spotify"
  - "Search for WhatsApp"
  - Auto-detects platform (Android/iOS/Windows/Linux)
- Google Scholar - Search academic papers across all disciplines
  - "Find research on climate change"
- Semantic Scholar - AI-powered academic search with citation context
  - "Find machine learning papers"
- PubMed - Medical and life sciences research
  - "Search for diabetes treatment research"
- JSTOR - Humanities and social sciences archives
  - "Find articles on ancient philosophy"
- ResearchGate - Academic networking and paper sharing
  - "Find papers on renewable energy"
- Reddit Search - Find community discussions
  - "Search Reddit for gaming PC builds"
- X/Twitter Search - Real-time updates and reactions
  - "Search Twitter for AI news"
- IMDB Search - Find movies, TV shows, actors
  - "Find Inception on IMDB"
  - "Search for Breaking Bad"
- calculator - Complex mathematical calculations
  - "Calculate square root of 144 plus 5 squared"
  - "Convert 3.5 inches to centimeters"
  - "Multiply matrix [1,2][3,4] by [5,6][7,8]"
- currency-convert - Convert between currencies
  - "Convert 100 USD to EUR"
  - "How much is 50 dollars in my currency?"
- visible-aircraft - Check aircraft overhead
  - "How many planes are in the sky?"
  - "Show visible aircraft"
- author - Generate long-form content (recipes, code, guides)
  - "Write a Python script to backup files"
  - "Give me a chocolate cake recipe"
  - "Create a Linux installation guide"
- pick-card - Randomly select and open an image with personalized comment
  - "Pick one"
  - "Show me one"
  - "Open one of those"
  - Agent provides unique contextual comments for each selection
- next-card - Navigate to next image in modal
  - "Next"
  - "Show me another"
- close-card - Close image modal
  - "Close"
  - "That's enough"
- take-note - Capture spoken notes
  - "Take a note: meeting at 3 PM"
  - "Remember to buy milk"
- save-name - Save your name for personalization
  - "My name is John"
  - "I'm Sarah"
- volume-adjust - Adjust master volume by 10%
  - "Turn it up"
  - "I can't hear you"
  - "Too loud"
  - Recognizes casual volume requests
- reset - Factory reset with data clearing
  - "Forget about me"
  - "Delete everything"
  - "Reset to factory settings"
  - Clears all user data and preferences
- end-session - End conversation
  - "Goodbye"
  - "End session"
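The flight-search tool is described as accepting dates given as today, tomorrow, or YYYY-MM-DD. The project's own parser is not shown here, so the following is only a minimal sketch of how such input could be normalized. The function names `parseFlightDate` and `toIsoDate` are hypothetical, and the sketch works in UTC and recognizes English keywords only, both of which are simplifying assumptions.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Format a UTC timestamp (milliseconds) as YYYY-MM-DD.
function toIsoDate(ms) {
  return new Date(ms).toISOString().slice(0, 10);
}

// Hypothetical helper: resolve "today", "tomorrow", or "YYYY-MM-DD" to an
// ISO date string, or return null when the input is not recognized.
function parseFlightDate(input, now = new Date()) {
  const text = String(input).trim().toLowerCase();

  // Midnight UTC of the current day. A real deployment would more likely
  // use the user's own timezone (assumption: UTC is good enough here).
  const todayMs = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());

  if (text === 'today') return toIsoDate(todayMs);
  if (text === 'tomorrow') return toIsoDate(todayMs + DAY_MS);

  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (!match) return null;

  const [, year, month, day] = match.map(Number);
  const ms = Date.UTC(year, month - 1, day);
  const parsed = new Date(ms);

  // Date.UTC silently rolls over impossible dates such as 2025-02-30,
  // so compare the parsed components against the input and reject mismatches.
  const valid =
    parsed.getUTCFullYear() === year &&
    parsed.getUTCMonth() === month - 1 &&
    parsed.getUTCDate() === day;

  return valid ? toIsoDate(ms) : null;
}
```

Returning `null` for unrecognized input leaves it to the caller to decide whether to ask the user again or fall back to a default date.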
- Tab - Toggle text input window
- ` (Backtick) - Toggle debug console
- Escape - Close image modal
- Arrow Left - Previous image in modal
- Arrow Right - Next image in modal
- Node.js with Express.js server
- Webpack for module bundling
- Web Audio API for real-time audio processing
- Canvas API for audio visualization and image gallery
- MaxMind GeoIP2 for location detection
- ElevenLabs API - Voice synthesis and conversation management
- SerpAPI - Web search, image search, news, events, and flight data
- OpenWeather API - Weather information and forecasts
- Google Places API - Points of interest search
- AltınKaynak API - Turkish Lira currency rates
- OpenExchangeRates API - Global currency conversion
- EMSC & USGS - Earthquake data feeds
- MaxMind GeoLite2 - Local IP geolocation
- AviationStack API - Visible aircraft tracking
- Math.js - Complex mathematical calculations
git clone https://github.com/psychip/berlin-hackathon
cd berlin-hackathon
npm install
Create a .env file in the root directory:
# ElevenLabs Configuration
XI_API_KEY=your_elevenlabs_api_key
AGENT_ID=your_elevenlabs_agent_id
# API Keys
SERPAPI_KEY=your_serpapi_key
OPENWEATHER_KEY=your_openweather_key
OPENEXCHANGERATES_KEY=your_openexchangerates_key
GPLACES_KEY=your_google_places_key
# Server Configuration
PORT=3388
The application includes MaxMind GeoLite2 databases for IP geolocation:
- db/GeoLite2-City.mmdb - City-level geolocation
- db/GeoLite2-ASN.mmdb - ISP/Organization data
These are included in the repository for development purposes.
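A server that depends on this many keys benefits from failing early when one is missing. The sketch below validates the variables from the .env example at startup. `loadConfig` and `REQUIRED_KEYS` are hypothetical names, and treating every listed key as required is an assumption. The sketch also assumes the file has already been loaded into `process.env` by whatever mechanism the project uses.

```javascript
// Keys taken from the .env example. Marking all of them as required
// is an assumption made for this sketch.
const REQUIRED_KEYS = [
  'XI_API_KEY',
  'AGENT_ID',
  'SERPAPI_KEY',
  'OPENWEATHER_KEY',
  'OPENEXCHANGERATES_KEY',
  'GPLACES_KEY',
];

// Hypothetical helper: read and validate configuration from an env object.
function loadConfig(env = process.env) {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  // Fall back to the port used in the .env example when PORT is unset.
  const port = Number.parseInt(env.PORT ?? '3388', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  const config = { port };
  for (const key of REQUIRED_KEYS) {
    config[key] = env[key];
  }
  return config;
}
```

Passing the environment as an argument keeps the function easy to test without touching the real process environment.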
npm run build
node server.js
The application will be available at http://localhost:3388
VOX/
├── src/ # Frontend source files
│ ├── app.js # Main application logic
│ ├── index.html # HTML template
│ └── styles.css # Stylesheets
├── dist/ # Built/compiled files
│ ├── bundle.js # Webpack compiled bundle
│ ├── index.html # Production HTML
│ └── static/ # Static assets (sound effects)
├── content/ # Agent configuration
│ ├── system.md # System prompt and tool definitions
│ ├── drift.md # Critical reminders
│ ├── character.md # Character definitions
│ ├── greetings.json # Greeting templates
│ └── tool.md # Tool implementation guide
├── db/ # Databases
│ ├── GeoLite2-*.mmdb # MaxMind GeoIP databases
│ ├── api.json # API endpoint configurations
│ ├── currency.json # Currency data
│ └── lang.json # Language settings
├── server.js # Express.js backend server
├── token.py # Token counter utility
├── webpack.config.js # Webpack configuration
└── package.json # Project dependencies
VOX supports 4 languages with full localization:
- Turkish (tr) - Türkçe - Default for Turkey
- English (en) - English - Default for most regions
- German (de) - Deutsch - Default for Germany, Austria, Switzerland
- Spanish (es) - Español - Default for Spain and Latin America
The language is detected automatically from the user's IP location and can be changed on the language selection screen shown at first launch.
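The regional defaults above amount to a lookup from country code to language code. The sketch below shows one possible shape for that lookup. It assumes the GeoIP lookup yields an ISO 3166-1 alpha-2 country code, and the names `COUNTRY_LANGUAGE` and `languageForCountry` are hypothetical. The Latin American entries are an illustrative subset, not the project's actual list.

```javascript
const DEFAULT_LANGUAGE = 'en';

// Country code -> language code, following the defaults listed above.
const COUNTRY_LANGUAGE = new Map([
  ['TR', 'tr'],
  ['DE', 'de'],
  ['AT', 'de'],
  ['CH', 'de'],
  ['ES', 'es'],
  // Illustrative subset of Latin American countries (assumption).
  ['MX', 'es'],
  ['AR', 'es'],
  ['CO', 'es'],
  ['CL', 'es'],
  ['PE', 'es'],
]);

// Hypothetical helper: pick the default language for a country code,
// falling back to English for unknown or missing values.
function languageForCountry(countryCode) {
  if (typeof countryCode !== 'string') return DEFAULT_LANGUAGE;
  return COUNTRY_LANGUAGE.get(countryCode.toUpperCase()) ?? DEFAULT_LANGUAGE;
}
```

For example, `languageForCountry('at')` yields `'de'`, while an unresolved lookup such as `languageForCountry(undefined)` falls back to `'en'`.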
- FFT Size: 256 (standard), 64 (low-end devices)
- Smoothing: 0.6 (standard), 0.25 (low-end)
- Speech Detection Threshold: 15
- Silence Detection: 800ms pause for sentence end
- Subtitle Speed: 75 characters per second
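The speech threshold and silence window can be read as a small state machine over successive audio frames. The sketch below is one way to express that, not the code in src/app.js. It assumes the threshold of 15 is compared against the average of a byte frequency frame (values 0 to 255), such as the 128 bins an `AnalyserNode` with an FFT size of 256 fills via `getByteFrequencyData`. `SpeechDetector` and `averageLevel` are hypothetical names.

```javascript
const SPEECH_THRESHOLD = 15; // from the settings above
const SILENCE_MS = 800;      // pause length that ends a sentence

// Average level of one frame of byte frequency data (values 0-255).
function averageLevel(frame) {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (const value of frame) sum += value;
  return sum / frame.length;
}

// Hypothetical detector: feed it frames with timestamps and it reports
// whether the speaker is talking, pausing, or has finished a sentence.
class SpeechDetector {
  constructor({ threshold = SPEECH_THRESHOLD, silenceMs = SILENCE_MS } = {}) {
    this.threshold = threshold;
    this.silenceMs = silenceMs;
    this.speaking = false;
    this.lastVoiceAt = null;
  }

  // Returns 'speech', 'pause', 'sentence-end', or 'silence'.
  update(frame, nowMs) {
    if (averageLevel(frame) > this.threshold) {
      this.speaking = true;
      this.lastVoiceAt = nowMs;
      return 'speech';
    }
    if (!this.speaking) return 'silence';
    if (nowMs - this.lastVoiceAt >= this.silenceMs) {
      this.speaking = false;
      return 'sentence-end';
    }
    return 'pause';
  }
}
```

Taking the timestamp as an argument, rather than reading the clock inside `update`, keeps the detector deterministic and easy to test.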
- UI Timeout: 5000ms (5 seconds) - configurable in src/app.js via TOUCH_UI_TIMEOUT
- Controls auto-hide after the timeout and reappear on touch
- Circle Radius: 80px
- Audio Multiplier: 40 (standard), 15 (low-end)
- Color Speed: 10
- Glow Effect: 8 (disabled on low-end devices)
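To make the circle radius and audio multiplier concrete, the sketch below computes where one spectrum bar would start and end on a circular display. How the project actually applies the multiplier is not documented here. This sketch assumes bar length is the normalized frequency value (0 to 255) scaled by the multiplier, with coordinates relative to the circle's center. `spectrumBar` and `VISUALIZER` are hypothetical names.

```javascript
// Standard-mode values from the settings above.
const VISUALIZER = { radius: 80, multiplier: 40 };

// Hypothetical helper: endpoints of bar `index` out of `barCount`,
// for a frequency value in 0-255 and a rotation offset in radians.
function spectrumBar(index, barCount, value, rotation = 0, settings = VISUALIZER) {
  const { radius, multiplier } = settings;

  // Spread the bars evenly around the full circle.
  const angle = rotation + (index / barCount) * 2 * Math.PI;

  // Assumption: a full-scale value extends the bar by `multiplier` pixels.
  const length = (value / 255) * multiplier;

  const cos = Math.cos(angle);
  const sin = Math.sin(angle);
  return {
    inner: { x: radius * cos, y: radius * sin },
    outer: { x: (radius + length) * cos, y: (radius + length) * sin },
  };
}
```

On a canvas, each bar would then be drawn as a line from `inner` to `outer` after translating the context to the circle's center. Increasing `rotation` a little on every frame produces the spinning effect.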
- Automatic device capability detection
- Low-end mode for devices with <8GB RAM
- Manual override: ?lowperf=true/false
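The override and the memory check can be combined into a single profile selection. The sketch below is an assumption about how that decision could look, not the project's implementation. The profile values are the standard and low-end settings listed above. The memory figure might come from `navigator.deviceMemory`, which is available only in Chromium-based browsers and reports at most 8. `PROFILES` and `selectProfile` are hypothetical names.

```javascript
// Standard and low-end values from the settings listed above.
const PROFILES = {
  standard: { fftSize: 256, smoothing: 0.6, audioMultiplier: 40, glow: 8 },
  lowEnd: { fftSize: 64, smoothing: 0.25, audioMultiplier: 15, glow: 0 },
};

// Hypothetical helper: choose a rendering profile from the URL query
// string and the device memory in GB (may be undefined).
function selectProfile(search, deviceMemoryGb) {
  // The manual override always wins over automatic detection.
  const override = new URLSearchParams(search).get('lowperf');
  if (override === 'true') return PROFILES.lowEnd;
  if (override === 'false') return PROFILES.standard;

  // Assumption: when memory is unknown, default to the standard profile.
  const lowEnd = typeof deviceMemoryGb === 'number' && deviceMemoryGb < 8;
  return lowEnd ? PROFILES.lowEnd : PROFILES.standard;
}
```

In the browser this might be called as `selectProfile(window.location.search, navigator.deviceMemory)`.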
- Real-time FFT analysis
- Circular spectrum display with rotation
- Speech activity detection with visual feedback
- Agent/user state differentiation
- Performance-adaptive rendering
- Animated image display with random placement
- Collision detection and smart layout
- Click to view full-size in modal
- Keyboard navigation (arrow keys)
- Automatic fade-out on disconnect
- Hover effects with scaling
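Random placement with collision detection is commonly done by trying random positions and rejecting any that overlap a card already on screen. The sketch below illustrates that general technique with axis-aligned rectangles. It is not the project's layout code, and `overlaps`, `placeCard`, and their parameters are hypothetical.

```javascript
// True when two rectangles overlap or sit closer together than `gap` pixels.
function overlaps(a, b, gap = 0) {
  return (
    a.x < b.x + b.width + gap &&
    b.x < a.x + a.width + gap &&
    a.y < b.y + b.height + gap &&
    b.y < a.y + a.height + gap
  );
}

// Hypothetical helper: find a random free spot for a card of `size` inside
// `bounds`, avoiding every rectangle in `placed`. Returns null when no free
// spot is found within `maxAttempts` tries.
function placeCard(size, placed, bounds, options = {}) {
  const { maxAttempts = 50, gap = 10, random = Math.random } = options;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const candidate = {
      x: random() * (bounds.width - size.width),
      y: random() * (bounds.height - size.height),
      width: size.width,
      height: size.height,
    };
    if (!placed.some((rect) => overlaps(candidate, rect, gap))) {
      return candidate;
    }
  }
  return null;
}
```

Capping the number of attempts keeps placement from looping forever on a crowded screen. The caller can then shrink the card or skip it when `null` comes back.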
- Automatic touch device detection
- Auto-hiding controls after 5 seconds
- Show on touch/tap
- Affects volume bar, call controls, topic display
- Intelligent sentence splitting (respects abbreviations like "Mr.", "Dr.")
- Dynamic display timing (30 chars/second)
- Automatic handling of transcription errors
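Sentence splitting that respects abbreviations can be sketched as a scan for terminal punctuation that skips a period directly following a known abbreviation. The abbreviation list and the names `splitSentences` and `displayMs` below are hypothetical, and the project's real rules may differ. The display rate is passed in as a parameter rather than fixed.

```javascript
// Illustrative abbreviation list (assumption), stored lowercase, no period.
const ABBREVIATIONS = new Set(['mr', 'mrs', 'ms', 'dr', 'prof', 'st', 'vs', 'etc']);

// Hypothetical helper: split text into sentences at ., ! or ? when followed
// by whitespace or end of text, except after a known abbreviation.
function splitSentences(text) {
  const sentences = [];
  const boundary = /[.!?]+(?=\s|$)/g;
  let start = 0;
  let match;

  while ((match = boundary.exec(text)) !== null) {
    const end = match.index + match[0].length;

    // The word immediately before the punctuation.
    const lastWord = text.slice(start, match.index).split(/\s+/).pop().toLowerCase();

    // A single period after an abbreviation does not end the sentence.
    if (match[0] === '.' && ABBREVIATIONS.has(lastWord)) continue;

    const sentence = text.slice(start, end).trim();
    if (sentence) sentences.push(sentence);
    start = end;
  }

  // Keep any trailing text that lacks terminal punctuation.
  const rest = text.slice(start).trim();
  if (rest) sentences.push(rest);
  return sentences;
}

// Display time in milliseconds for a sentence at a given rate.
function displayMs(sentence, charsPerSecond) {
  return Math.ceil((sentence.length * 1000) / charsPerSecond);
}
```

Because the boundary must be followed by whitespace or the end of the text, decimals such as "3.5" are not split.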
- Shows current conversation topic
- Color-coded tags
- Hover to view (desktop) or touch to show (mobile)
- Persists across sessions
- Time-based greetings
- Multi-language support: Turkish, English, German, Spanish
- Location and timezone awareness
- Session history tracking
- Error handling with audio feedback
- Proactive image search for visual subjects
- Automatic tool triggering based on context
Agent Not Connecting
- Verify ElevenLabs API key and Agent ID
- Check network connectivity
- Confirm microphone permissions
Performance Issues
- Try low performance mode: ?lowperf=true
- Close other audio applications
- Use supported browsers (Chrome, Firefox, Safari)
No Audio/Microphone
- Grant microphone permissions
- Check that the selected input device is the microphone, not Stereo Mix
- Verify no other application is using microphone
- Define tool in content/system.md with trigger patterns and examples
- Add API endpoint to db/api.json if needed
- Implement handler in server.js (for server-side tools)
- Add client-side handler in src/app.js if needed
- Test tool across all supported languages
- Create language folder in content/[language-code]/
- Add agent.md with localized instructions
- Add greetings.json with time-based greeting templates
- Update db/lang.json with language configuration
- Add language card to src/index.html
- Test all tools and responses in new language
Edit content/system.md. Changes take effect once the agent is restarted.
Modify TOUCH_UI_TIMEOUT constant in src/app.js (line 20).
- Chrome/Edge: Full support ✅
- Firefox: Full support ✅
- Safari: Full support ✅
- Mobile browsers: Touch-optimized ✅
This project is developed for educational and demonstration purposes as part of the {Tech:Europe} Berlin Hackathon 2025.
Built with ❤️ in 48 hours for the Berlin Hackathon