Scrapes property listings from Zonaprop, Argenprop, and MercadoLibre, saving all details to a CSV and sending new listings via Telegram.
- src/: All main code modules (scraprop.py, scraper.py, utils.py)
- tests/: For future test scripts
- urls_to_scrap.txt: List of search URLs
- outputs/scraped_properties.csv: All scraped property data
- outputs/seen.txt: Tracks already-notified properties
- .env: Environment variables for Telegram
- outputs/: All output files (CSV, logs)
- Multi-source scraping: Zonaprop, Argenprop, and MercadoLibre
- LLM Analysis: AI-powered property scoring and analysis using Google Gemini
- Smart filtering: Penalizes commercial properties ("local", "deposito") with -15 points
- Live Google Sheets: Real-time updates to shareable Google Sheets (optional)
- CSV Export: All scraped data saved to outputs/scraped_properties.csv
- Telegram notifications: New high-scoring properties sent via Telegram
- Duplicate prevention: Tracks seen properties to avoid spam
- Detailed extraction: Price, expenses, neighbourhood, surface, rooms, descriptions, and more
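As an illustration of the duplicate-prevention feature, the sketch below keeps a plain-text file of already-notified URLs, one per line, in the spirit of outputs/seen.txt. The function names `load_seen`, `filter_new`, and `mark_seen` are hypothetical and are not taken from the project's code; the real implementation may store or compare entries differently.

```python
from pathlib import Path


def load_seen(path: Path) -> set[str]:
    """Return the URLs already notified; empty set if the file does not exist yet."""
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def filter_new(urls: list[str], seen: set[str]) -> list[str]:
    """Keep only URLs not seen before, preserving order and dropping repeats."""
    new: list[str] = []
    for url in urls:
        if url not in seen and url not in new:
            new.append(url)
    return new


def mark_seen(path: Path, urls: list[str]) -> None:
    """Append newly notified URLs so later runs skip them."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")
```

A run would call `load_seen`, notify only what `filter_new` returns, and then call `mark_seen` with those URLs.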
- Clone the repo
- Install dependencies (preferably in a conda environment or a venv):
pip install -r requirements.txt
- Set up environment variables in a .env file:
    # Required for Telegram notifications
    TELEGRAM_BOT_ID=your_bot_id
    TELEGRAM_ID=your_telegram_user_id
    # Required for LLM analysis
    GEMINI_API_KEY=your_gemini_api_key
    # Optional: Google Sheets integration (see setup guide)
    GOOGLE_SHEETS_CREDENTIALS_FILE=credentials.json
    GOOGLE_SHEETS_SHARE_EMAIL=your-email@gmail.com
    # GOOGLE_SHEET_ID=your_existing_sheet_id
- Add search URLs to urls_to_scrap.txt (one per line)
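To make the "one URL per line" format concrete, here is a minimal sketch of how such a file could be parsed and each URL routed to a source-specific scraper. The helper names and the domain table are assumptions for illustration (only the zonaprop.com.ar domain appears in this README); the project's own src/scraper.py may detect sources differently.

```python
from urllib.parse import urlparse

# Assumed domain-to-source mapping; adjust to the domains you actually search.
SOURCES = {
    "zonaprop.com.ar": "zonaprop",
    "argenprop.com": "argenprop",
    "mercadolibre.com.ar": "mercadolibre",
}


def parse_url_lines(lines):
    """Strip whitespace and skip blank lines, keeping one URL per entry."""
    return [line.strip() for line in lines if line.strip()]


def detect_source(url):
    """Return the source name for a URL, or None if the domain is not recognised."""
    host = (urlparse(url).hostname or "").lower()
    for domain, name in SOURCES.items():
        # Match the bare domain and any subdomain (www., inmuebles., ...).
        if host == domain or host.endswith("." + domain):
            return name
    return None
```

Typical use would be to open urls_to_scrap.txt, pass the file object to `parse_url_lines`, and skip or report any URL for which `detect_source` returns `None`.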
Set up live Google Sheets for real-time property data viewing:
- Follow the detailed setup guide: See setup_google_sheets.md
- Test the integration: Run python test_google_sheets.py
- Features:
- Automatic updates with every scraper run
- Data sorted by LLM score (best properties first)
- Shareable with colleagues or family
- Includes timestamps and all analysis data
- Shared history: Seen URLs tracked in Google Sheet (no more duplicate notifications)
- To run the main workflow:
python src/scraprop.py
- To test scrapers for each source:
python src/scraper.py
All scraped properties are saved to outputs/scraped_properties.csv with columns:
- Basic: url, price, expenses, neighbourhood, surface, rooms, description
- Analysis: score, score_breakdown, llm_neighbourhood, llm_surface_m2, etc.
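The column layout above can be sketched with the standard `csv` module. The `COLUMNS` list contains only the names spelled out in this README (the real file has more analysis columns, covered by "etc."), and `write_rows` and `read_sorted_by_score` are hypothetical helpers, not functions from the project.

```python
import csv
import io

# Only the columns named in this README; the actual CSV contains additional ones.
COLUMNS = [
    "url", "price", "expenses", "neighbourhood", "surface", "rooms",
    "description", "score", "score_breakdown", "llm_neighbourhood",
    "llm_surface_m2",
]


def write_rows(fh, rows):
    """Write a header plus rows; missing fields become "" and unknown keys are dropped."""
    writer = csv.DictWriter(fh, fieldnames=COLUMNS, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


def read_sorted_by_score(fh):
    """Read all rows back and return them with the highest score first."""
    rows = list(csv.DictReader(fh))
    return sorted(rows, key=lambda r: float(r["score"] or 0), reverse=True)
```

When writing to a real file rather than an in-memory buffer, open it with `newline=""` as the `csv` documentation recommends.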
- Live updates: Real-time data accessible from anywhere
- Sorted by score: Best properties appear at the top
- Shareable: Easy to share with others
- Timestamped: Last update time included
High-scoring new properties are sent with LLM analysis:
⭐ SCORE: 23
📊 Score Breakdown:
• Location (Belgrano): +10
• Ground Floor: +10
• Outdoor Space: +3
🏠 Analysis:
📍 Neighborhood: Belgrano
🌳 Ground Floor: Yes
🌿 Outdoor Space: Yes
📏 Surface: 80m²
💰 Price: $500,000
📍 Zona: Belgrano
💰 Precio: $500.000
📏 Sup.: 80 m²
🏠 Ambientes: 3
https://www.zonaprop.com.ar/propiedades/...
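The score in the sample message is the sum of its breakdown lines (10 + 10 + 3 = 23). The sketch below reproduces that arithmetic with fixed rules, including the -15 penalty for commercial keywords listed under the features. It is only an assumption-laden illustration: in the project the analysis is produced by Gemini, and the rule weights, the dictionary keys `ground_floor` and `outdoor_space`, and the function names are invented here to mirror the example.

```python
import re
import unicodedata

COMMERCIAL_KEYWORDS = ("local", "deposito")  # keywords named in the features list
COMMERCIAL_PENALTY = -15


def _words(text):
    """Lowercase, strip accents (so "depósito" matches "deposito"), split into words."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    return set(re.findall(r"[a-z]+", plain))


def score_property(prop):
    """Return (total, breakdown) using illustrative weights from the sample message."""
    breakdown = []
    if prop.get("neighbourhood") == "Belgrano":  # assumed preferred location
        breakdown.append(("Location (Belgrano)", 10))
    if prop.get("ground_floor"):
        breakdown.append(("Ground Floor", 10))
    if prop.get("outdoor_space"):
        breakdown.append(("Outdoor Space", 3))
    # Apply the penalty once, however many commercial keywords appear.
    if _words(prop.get("description", "")) & set(COMMERCIAL_KEYWORDS):
        breakdown.append(("Commercial property", COMMERCIAL_PENALTY))
    return sum(points for _, points in breakdown), breakdown


def format_message(prop):
    """Build the score part of a notification, followed by the listing URL."""
    total, breakdown = score_property(prop)
    lines = [f"⭐ SCORE: {total}", "📊 Score Breakdown:"]
    lines += [f"• {label}: {points:+d}" for label, points in breakdown]
    lines.append(prop["url"])
    return "\n".join(lines)
```

The resulting text could then be sent with the Telegram Bot API's `sendMessage` method, using the bot token and chat ID configured in .env.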
- Add or remove search URLs in urls_to_scrap.txt
- Adjust scraping logic in src/scraper.py for new fields
- Change CSV filename in src/scraprop.py if needed
To run every 6 hours and log output:
30 */6 * * * /path/to/python /path/to/scraprop/src/scraprop.py >> /path/to/scraprop/outputs/logs/scraprop-cron.log
- Place test scripts in the tests/ directory.
- (Coming soon: example test scripts for scrapers and utilities)
For questions or improvements, open an issue or PR.