This project is still in the development phase. It aims to create a Telegram web app bot called "Sannie." Sannie crawls the BBC Burmese website to display news content, allowing Telegram users to read the news without leaving the app.
A Telegram account is required.
- Direct Link: Chat with the bot directly
- Search Method: Find the bot in Telegram's search bar: `@presenter_sannie_bot`
- Inline Button - Interactive buttons within messages
- Keyboard Button - Custom keyboard for easy navigation
- Inline Mode - Search and share content directly from any chat
- `/start` - greet and return the main web app
- `/help` - describe how to use this bot
- `/keyboard` - return the keyboard button
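
For orientation, here is a minimal sketch of how these commands could be wired up with python-telegram-bot (v20+). The actual `telegram-bot/app.py` is not reproduced here; the web-app URL, reply texts, and handler names below are placeholders.

```python
# Sketch only: assumes python-telegram-bot v20+; URL and texts are placeholders.
import os

from telegram import KeyboardButton, ReplyKeyboardMarkup, Update, WebAppInfo
from telegram.ext import ApplicationBuilder, CommandHandler, ContextTypes

WEB_APP_URL = os.getenv("WEB_APP_URL", "https://example.github.io/sannie/")  # placeholder

async def start(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # /start - greet and return the main web app
    await update.message.reply_text(f"Welcome to Sannie! Read BBC Burmese news here: {WEB_APP_URL}")

async def help_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # /help - describe how to use this bot
    await update.message.reply_text("Send a BBC Burmese link, or use /keyboard to browse by topic.")

async def keyboard(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # /keyboard - return a custom keyboard that opens the web app
    button = KeyboardButton("Open Sannie", web_app=WebAppInfo(url=WEB_APP_URL))
    await update.message.reply_text(
        "Use the button below:",
        reply_markup=ReplyKeyboardMarkup([[button]], resize_keyboard=True),
    )

if __name__ == "__main__":
    app = ApplicationBuilder().token(os.environ["BOT_TOKEN"]).build()
    app.add_handler(CommandHandler("start", start))
    app.add_handler(CommandHandler("help", help_command))
    app.add_handler(CommandHandler("keyboard", keyboard))
    app.run_polling()
```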
Once the bot is started, it automatically sends a greeting with a direct link to the website. The user can then:
- Enter a single link to read news content directly
- Browse by topic: Choose a topic, then a page, then copy a content link to read
The webscraper can:
- scrape all topics (all pages of each topic) from BBC Burmese
- scrape Burmese content with a filter
- export scraped data to a spreadsheet
- write spreadsheet data to Local DynamoDB
- store the current URL in a Redis cache so scraping can resume after a loss of internet connection
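
The scraper modules themselves are not shown in this README. As a rough illustration of the last two points, the sketch below (assuming `requests`, `beautifulsoup4`, and a local Redis instance; the link filter and cache key are invented for the example) fetches one page and checkpoints the current URL in Redis so an interrupted run can resume.

```python
# Illustrative sketch: the selector, cache key, and URL filter are assumptions,
# not taken from the project's scraper modules.
import redis
import requests
from bs4 import BeautifulSoup

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def scrape_page(url: str) -> list[dict]:
    """Fetch one BBC Burmese listing page and return title/link pairs."""
    # Checkpoint the URL first, so a dropped connection can resume from here.
    cache.set("current_url", url)
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    articles = []
    for link in soup.select("a[href]"):  # a real scraper would use a narrower selector
        href = link["href"]
        if "/burmese/" in href:  # crude filter for Burmese content
            articles.append({"title": link.get_text(strip=True), "url": href})
    return articles

if __name__ == "__main__":
    # Resume from the cached URL if a previous run was interrupted.
    start_url = cache.get("current_url") or "https://www.bbc.com/burmese"
    print(scrape_page(start_url))
```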
- To write spreadsheet data to cloud DynamoDB
- To use a modular approach rather than a single scraper script
- To make the repository more compact
- `/caching prototypes` - Development and testing files for the caching system
- `/db` - DynamoDB Local database files and scripts
- `/docs` - Frontend files for GitHub Pages hosting (more info in `flow.md`)
- `/img` - Project images and assets
- `/notebooks` - Jupyter notebooks for webscraper development and documentation
- `/spreadsheets` - Exported data (ignored in version control)
- `/telegram-bot` - Telegram bot scripts (requires a `.env` file for tokens)
  - `app.py` - Main bot application
  - `credentials.py` - Environment variable handler
  - `Dockerfile` - Bot container configuration
  - `Procfile` - Bot deployment configuration for Railway
  - `requirements.txt` - Bot-specific dependencies
- `/webscraper` - Main Python web scraping scripts and modules
  - `/modules` - Modular scraping scripts
- `api.py` - FastAPI server for web scraping endpoints (see the sketch below)
- `docker-compose.yml` - Orchestrates all Docker services (FastAPI, Redis, Telegram Bot)
- `Dockerfile` - FastAPI app container configuration
- `flow.md` - Control flow documentation
- `Procfile` - FastAPI app deployment configuration for Railway
- `pyproject.toml` - Project configuration and dependencies
- `requirements.txt` - All dependencies
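
`api.py` itself is not reproduced in this README. As a rough orientation to the architecture, a stripped-down FastAPI app of the same shape might look like the following; the route name and response are hypothetical, not the project's actual endpoints.

```python
# Hypothetical stand-in for api.py: route names and payloads are placeholders.
from fastapi import FastAPI

app = FastAPI(title="Sannie scraping API (sketch)")

@app.get("/topics")
def list_topics() -> dict:
    # A real endpoint would call the scraper modules and/or the Redis cache here.
    return {"topics": ["myanmar", "world", "sport"]}

if __name__ == "__main__":
    import uvicorn

    # Mirrors "uvicorn api:app" from the run instructions below.
    uvicorn.run(app, host="0.0.0.0", port=8000)
```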
- Web Scraper Development - Check all notebooks in the `/notebooks` folder for detailed documentation on how the customized web crawler evolved from scratch
- Integration - The web scraper is combined with two additional components:
- Frontend (web-hosted with GitHub Pages)
- Telegram bot (created to use the hosted frontend)
- Local Development Setup - Uses local Redis with manual execution of Python scripts:
  - `api.py` - FastAPI server for web scraping endpoints
  - `telegram-bot/app.py` - Telegram bot application
- Deployment Configuration - Railway hosts FastAPI, Telegram bot, and Redis with required environment variables set
- Testing and Verification - Test FastAPI endpoints via `<deployment-URL>/docs` or `<deployment-URL>/redoc` and verify the Redis cache
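
As a quick illustration of that verification step, a short script like the one below can confirm that the FastAPI app and the Redis cache respond; the base URL and Redis URL are assumptions to be replaced with the local or deployed values.

```python
# Verification sketch: BASE_URL and REDIS_URL are assumptions (local defaults shown).
import os

import redis
import requests

BASE_URL = os.getenv("API_BASE_URL", "http://localhost:8000")
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")

# /docs, /redoc, and /openapi.json are served by FastAPI itself,
# so a 200 on each means the app is up.
for path in ("/docs", "/redoc", "/openapi.json"):
    status = requests.get(BASE_URL + path, timeout=10).status_code
    print(f"{path}: {status}")

# Verify the Redis cache is reachable and peek at the stored keys.
cache = redis.Redis.from_url(REDIS_URL, decode_responses=True)
print("Redis ping:", cache.ping())
print("Cached keys:", cache.keys("*"))
```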
- Start FastAPI Backend
  ```bash
  # Option 1: Direct Python script
  python api.py

  # Option 2: CLI
  uvicorn api:app --reload
  ```

  - FastAPI will run on http://localhost:8000
  - API documentation available at http://localhost:8000/docs
- Start Redis Cache: The Redis client must already be installed on the local device
  ```bash
  # Open Ubuntu terminal and run:
  redis-cli
  ```

- Start Local Frontend Server
# Navigate to "/docs"
cd docs
# Start HTTP server (port 9000 is arbitrary - any port can be used)
python -m http.server 9000- Test Web App: Test web app locally at http://localhost:9000 by choosing topic → page → content → article
- Start/Test Telegram Bot
# Navigate to "/telegram-bot"
cd telegram-bot
# Start Telegram bot
python app.py- Start DynamoDB (Optional)
  ```bash
  DynamoDB_init.bat
  ```

- Start All Services: Use Docker Compose to start FastAPI, Redis, and the Telegram bot
  ```bash
  docker-compose up --build
  ```

  - FastAPI will run on http://localhost:8000
  - API documentation available at http://localhost:8000/docs
- Start Local Frontend Server
# Navigate to "/docs"
cd docs
# Start HTTP server (port 9000 is arbitrary - any port can be used)
python -m http.server 9000- Test All Services: Test web app locally at http://localhost:9000 by choosing topic → page → content → article
- Create Railway Account: Sign up at railway.app
- Connect GitHub Repository: Link GitHub repo to Railway
- Deploy FastAPI: Railway automatically detects and deploys the FastAPI app
- Add Redis Service: Create a Redis database service in Railway
- Deploy Telegram Bot: Create separate Railway service for the bot
- Set Environment Variables:
  - `REDIS_URL` - Redis connection string from Railway
  - `BOT_TOKEN` - Telegram bot token
  - `BOT_USERNAME` - Bot username
NOTE: The Python scripts (api.py and telegram-bot/app.py) work seamlessly for both local and cloud deployment without requiring endpoint or API URL modifications.
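
For illustration, an environment handler in the spirit of `credentials.py` might read those variables as below, falling back to local defaults so the same code runs unchanged on Railway and on a development machine; the actual file may differ, and `python-dotenv` is assumed for local `.env` loading.

```python
# Sketch of an environment handler; the project's credentials.py may differ.
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

# Locally this loads the .env file; on Railway the variables are already set,
# so this call is effectively a no-op there.
load_dotenv()

BOT_TOKEN = os.getenv("BOT_TOKEN")
BOT_USERNAME = os.getenv("BOT_USERNAME")
# Fall back to a local Redis instance when REDIS_URL is not provided.
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")

if not BOT_TOKEN:
    raise RuntimeError("BOT_TOKEN is not set - add it to .env or the Railway service variables")
```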
To install DynamoDB locally, check here.
- (Optional Config) If the compressed file is downloaded, extract it and move it to "C:".
To run DynamoDB locally, JDK 17 is recommended.
- (Optional Config) Place it in a directory such that java.exe can be invoked as follows:

  ```
  "C:\Program Files\Java\jdk-17\bin\java.exe"
  ```

To install the Redis client, check here. To install Ubuntu on Windows with WSL, check here.
Additional libraries and modules may also need to be installed.
- Create a virtual environment
  ```bash
  # Option 1: Python
  python -m venv <env-name>

  # Option 2: Conda
  conda create --name <env-name>
  ```

- Activate the virtual environment
  ```bash
  # Option 1: Python virtual environment
  <env-name>\Scripts\activate

  # Option 2: Conda virtual environment
  conda activate "C:\Users\<pc-username>\anaconda3\envs\<env-name>"
  ```

- Install uv
  ```bash
  pip install uv
  ```

- Navigate to the working directory
- Install dependencies
  ```bash
  uv pip install -r requirements.txt
  ```

- Download and install Docker Desktop
- Install dependencies
  ```bash
  docker-compose up --build
  ```

- Get bot credentials from @BotFather on Telegram
- Create a `.env` file in the root with bot credentials
  ```
  BOT_TOKEN=actual_bot_token_here
  BOT_USERNAME=bot_username_here
  ```

This project is intended for educational purposes.