Property Scraper

Scrapes property listings from Zonaprop, Argenprop, and MercadoLibre, saving all details to a CSV and sending new listings via Telegram.

Repo Structure

src/: All main code modules (scraprop.py, scraper.py, utils.py)
tests/: For future test scripts
urls_to_scrap.txt: List of search URLs, one per line
outputs/: All output files (CSV, logs)
outputs/scraped_properties.csv: All scraped property data
outputs/seen.txt: Tracks already-notified properties
.env: Environment variables for Telegram, Gemini, and (optionally) Google Sheets
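
The role of outputs/seen.txt can be illustrated with a minimal sketch. It assumes the file stores one URL per line, which this README does not confirm, and the helper names (load_seen, mark_seen, filter_new) are hypothetical rather than taken from src/:

```python
from pathlib import Path


def load_seen(path):
    """Return the set of URLs already recorded, or an empty set if the file is missing."""
    p = Path(path)
    if not p.exists():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def mark_seen(path, urls):
    """Append newly notified URLs, creating the parent directory if needed."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    with p.open("a", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")


def filter_new(urls, seen):
    """Keep only URLs not in `seen`, preserving order and dropping repeats."""
    known = set(seen)
    fresh = []
    for url in urls:
        if url not in known:
            known.add(url)
            fresh.append(url)
    return fresh
```

A run would typically call load_seen first, notify only the URLs returned by filter_new, and then pass those same URLs to mark_seen.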

Features

Multi-source scraping: Zonaprop, Argenprop, and MercadoLibre
LLM Analysis: AI-powered property scoring and analysis using Google Gemini
Smart filtering: Penalizes commercial properties ("local", "deposito") with -15 points
Live Google Sheets: Real-time updates to shareable Google Sheets (optional)
CSV Export: All scraped data saved to outputs/scraped_properties.csv
Telegram notifications: New high-scoring properties sent via Telegram
Duplicate prevention: Tracks seen properties to avoid spam
Detailed extraction: Price, expenses, neighbourhood, surface, rooms, descriptions, and more
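
As an illustration of the smart-filtering rule, the sketch below applies the -15 penalty when a listing description contains "local" or "deposito" as a whole word. The keyword matching, the accent stripping, and the choice to apply the penalty once per listing are assumptions; the actual scoring may be done differently (for example inside the LLM prompt), and commercial_penalty is a hypothetical name:

```python
import re
import unicodedata

COMMERCIAL_KEYWORDS = {"local", "deposito"}  # keywords named in this README
COMMERCIAL_PENALTY = -15


def _normalize(text):
    """Lowercase and strip accents so that "Depósito" matches "deposito"."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def commercial_penalty(description):
    """Return -15 if a commercial keyword appears as a whole word, else 0."""
    words = set(re.findall(r"[a-z]+", _normalize(description)))
    return COMMERCIAL_PENALTY if words & COMMERCIAL_KEYWORDS else 0
```

Matching whole words avoids penalizing unrelated terms such as "localidad" that merely contain a keyword.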

Setup

Clone the repo
Install dependencies (preferably in a conda or venv environment):
```
pip install -r requirements.txt
```

Set up environment variables in a .env file:

# Required for Telegram notifications
TELEGRAM_BOT_ID=your_bot_id
TELEGRAM_ID=your_telegram_user_id

# Required for LLM analysis
GEMINI_API_KEY=your_gemini_api_key

# Optional: Google Sheets integration (see setup guide)
GOOGLE_SHEETS_CREDENTIALS_FILE=credentials.json
GOOGLE_SHEETS_SHARE_EMAIL=your-email@gmail.com
# GOOGLE_SHEET_ID=your_existing_sheet_id

Add search URLs to urls_to_scrap.txt (one per line)
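
Before a first run it can help to check the configuration. The following is a minimal sketch, assuming the variables from .env have already been loaded into the process environment and that blank lines in urls_to_scrap.txt should be ignored; missing_env_vars and parse_search_urls are hypothetical helpers, not functions from this repo:

```python
import os

# Variables marked as required in the .env section above.
REQUIRED_VARS = ("TELEGRAM_BOT_ID", "TELEGRAM_ID", "GEMINI_API_KEY")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def parse_search_urls(text):
    """Split file contents into URLs, one per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

An empty list from missing_env_vars() means all required variables are present; parse_search_urls can be fed the contents of urls_to_scrap.txt read with open(...).read().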

Google Sheets Integration (Optional)

Set up live Google Sheets for real-time property data viewing:

Follow the detailed setup guide: See setup_google_sheets.md
Test the integration: Run python test_google_sheets.py
Features:
- Automatic updates with every scraper run
- Data sorted by LLM score (best properties first)
- Shareable with colleagues or family
- Includes timestamps and all analysis data
- Shared history: Seen URLs tracked in Google Sheet (no more duplicate notifications)

Running

To run the main workflow:
```
python src/scraprop.py
```
To test scrapers for each source:
```
python src/scraper.py
```

Output

CSV Export

All scraped properties are saved to outputs/scraped_properties.csv with columns:

Basic: url, price, expenses, neighbourhood, surface, rooms, description
Analysis: score, score_breakdown, llm_neighbourhood, llm_surface_m2, etc.
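
To inspect the CSV outside the scraper, a small sketch like the one below lists the highest-scoring rows. It relies only on the url and score columns named above and assumes score holds a number or is empty; top_properties is a hypothetical helper:

```python
import csv


def top_properties(csv_file, limit=5):
    """Return up to `limit` rows from an open CSV file, highest score first."""
    rows = list(csv.DictReader(csv_file))

    def score_of(row):
        # Treat a missing, empty, or non-numeric score as 0.
        try:
            return float(row.get("score") or 0)
        except ValueError:
            return 0.0

    return sorted(rows, key=score_of, reverse=True)[:limit]
```

Pass it a file opened with open("outputs/scraped_properties.csv", newline="", encoding="utf-8") and read row["url"] and row["score"] from each returned dictionary.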

Google Sheets (if configured)

Live updates: Real-time data accessible from anywhere
Sorted by score: Best properties appear at the top
Shareable: Easy to share with others
Timestamped: Last update time included

Telegram Notifications

High-scoring new properties are sent with LLM analysis:

⭐ SCORE: 23

📊 Score Breakdown:
  • Location (Belgrano): +10
  • Ground Floor: +10
  • Outdoor Space: +3

🏠 Analysis:
  📍 Neighborhood: Belgrano
  🌳 Ground Floor: Yes
  🌿 Outdoor Space: Yes
  📏 Surface: 80m²
  💰 Price: $500,000

📍 Zona: Belgrano
💰 Precio: $500.000
📏 Sup.: 80 m²
🏠 Ambientes: 3

https://www.zonaprop.com.ar/propiedades/...
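
In the sample above the total (23) equals the sum of the breakdown items (10 + 10 + 3). Assuming that always holds, the score section of the message could be built roughly as follows; format_score_section and the (label, points) input shape are hypothetical:

```python
def format_score_section(breakdown):
    """Build the score header and breakdown lines from (label, points) pairs."""
    total = sum(points for _, points in breakdown)
    lines = [f"⭐ SCORE: {total}", "", "📊 Score Breakdown:"]
    for label, points in breakdown:
        # The "+d" format always shows the sign, e.g. +10 or -15.
        lines.append(f"  • {label}: {points:+d}")
    return "\n".join(lines)
```

Negative items, such as the -15 commercial penalty, are rendered with their minus sign and reduce the total.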

Customization

Add or remove search URLs in urls_to_scrap.txt
Adjust scraping logic in src/scraper.py for new fields
Change CSV filename in src/scraprop.py if needed

Cron Example

To run every 6 hours (at minute 30) and log both output and errors:

30 */6 * * * /path/to/python /path/to/scraprop/src/scraprop.py >> /path/to/scraprop/outputs/logs/scraprop-cron.log 2>&1

Tests

Place test scripts in the tests/ directory.
(Coming soon: example test scripts for scrapers and utilities)

For questions or improvements, open an issue or PR.
