Tokopedia Product Scraper

A Python-based web scraper that extracts product information from Tokopedia, one of Indonesia's largest e-commerce platforms. It uses Playwright for browser automation so that dynamically loaded content is rendered before it is scraped, and it saves the collected data to a structured CSV file for easy analysis.

Features

- Scrapes product data from Tokopedia's search results.
- Extracts key product details:
  - Product Name
  - Price
  - Shop Name
  - Shop Location
  - Product Rating
  - Number of Items Sold
- Saves the scraped data into a CSV file named tokopedia_products.csv.
- Supports configurable search parameters through an environment (.env) file.
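
To make the output format concrete, here is a minimal sketch of one record with the six fields listed above and how such records could be written as CSV. The class name, field names, and sample values are hypothetical, and the sketch uses the standard-library csv module to stay dependency-free; the project itself lists pandas for the export, so the real column names and code may differ.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Product:
    # Hypothetical field names; the scraper's real CSV columns may differ.
    name: str
    price: str
    shop_name: str
    shop_location: str
    rating: str
    items_sold: str


def write_products(products, stream):
    """Write Product records to a text stream as CSV, header line first."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(Product)])
    writer.writeheader()
    for product in products:
        writer.writerow(asdict(product))


# Made-up sample record, for illustration only.
sample = Product("Example Keyboard", "Rp500.000", "Example Shop", "Jakarta", "4.9", "100+")
buffer = io.StringIO()
write_products([sample], buffer)
csv_text = buffer.getvalue()
```

To write a real file instead of an in-memory buffer, pass a handle such as `open("tokopedia_products.csv", "w", newline="", encoding="utf-8")` as the stream.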

Requirements

To run this scraper, you need the following installed:

- Python 3.8+
- All Python packages listed in requirements.txt. The main dependencies are:
  - playwright: For browser automation.
  - pandas: For data handling and CSV export.
  - python-dotenv: For managing environment variables.

Installation

Follow these steps to set up the project environment:

Clone the repository:

```
git clone <your-repository-url>
cd <repository-directory>
```

Install Python dependencies: Create a virtual environment (recommended) and install the required packages.

```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
```

Install Playwright browsers: Playwright requires browser binaries to be installed. Run the following command:
```
playwright install
```

Configuration

You can configure the scraper's behavior by creating a .env file in the root of the project directory.

Create a file named .env:
```
touch .env
```

Add the following configuration variables to the .env file. These values are examples; you can change them to suit your needs.

```
# The product search query
SEARCH_QUERY="mechanical keyboard"

# The maximum number of search result pages to scrape
MAX_PAGES=3
```
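
As a rough illustration of how a script might consume these two variables, the sketch below validates them from a mapping. The function name `load_config`, the returned keys, and the default of one page are assumptions, not taken from the project's main.py. In the project, python-dotenv's `load_dotenv()` would typically be called first so that the .env values are available in `os.environ`.

```python
import os


def load_config(env=None):
    """Read and validate scraper settings from a mapping (defaults to os.environ)."""
    # In the project, dotenv.load_dotenv() would run before this call
    # so that values from .env are present in os.environ.
    env = os.environ if env is None else env

    query = env.get("SEARCH_QUERY", "").strip()
    if not query:
        raise ValueError("SEARCH_QUERY must be set")

    # Environment values are strings, so MAX_PAGES is converted explicitly.
    # The default of 1 is an assumption for this sketch.
    max_pages = int(env.get("MAX_PAGES", "1"))
    if max_pages < 1:
        raise ValueError("MAX_PAGES must be at least 1")

    return {"search_query": query, "max_pages": max_pages}


# Example with the values shown above, passed in explicitly.
config = load_config({"SEARCH_QUERY": "mechanical keyboard", "MAX_PAGES": "3"})
```

Passing the mapping explicitly keeps the function easy to test without touching the real environment.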

Usage

Once you have completed the installation and configuration steps, you can run the scraper.

Execute the main script from your terminal:

```
python main.py
```

The script will launch a browser, navigate to Tokopedia, perform the search, and start scraping the data. A CSV file named tokopedia_products.csv will be created in the project's root directory with the scraped data.
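
The flow described above (launch a browser, open the search results, move through the pages) could look roughly like the following sketch, which uses Playwright's synchronous API. The function names, the search URL layout, and the `q` and `page` query parameters are assumptions and should be checked against the live site. Element selectors are deliberately left out because the README does not specify them.

```python
from urllib.parse import urlencode

# Assumed URL layout; verify against the live site before relying on it.
SEARCH_URL = "https://www.tokopedia.com/search"


def build_search_url(query, page):
    """Build the URL for one results page (parameter names are assumptions)."""
    return SEARCH_URL + "?" + urlencode({"q": query, "page": page})


def fetch_result_pages(query, max_pages, headless=True):
    """Yield the rendered HTML of each results page, one page at a time."""
    # Imported lazily so the URL helper can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        page = browser.new_page()
        try:
            for number in range(1, max_pages + 1):
                page.goto(build_search_url(query, number), wait_until="domcontentloaded")
                # A real scraper would wait for the product cards here,
                # since they are loaded dynamically after the initial document.
                yield page.content()
        finally:
            browser.close()
```

A caller would iterate over `fetch_result_pages("mechanical keyboard", 3)` and hand each HTML string to whatever parsing step extracts the product fields.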

Disclaimer

This scraper is intended for educational purposes only. Please be responsible and respect Tokopedia's terms of service. Avoid making an excessive number of requests in a short period to prevent overloading their servers. The developers of this tool are not responsible for any misuse.
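
One simple way to avoid sending requests in rapid succession is to pause between page loads. The helper below is a minimal sketch; its name and the default delay values are assumptions, not settings from this project.

```python
import random
import time


def polite_pause(base_seconds=2.0, jitter_seconds=1.0, sleep=time.sleep):
    """Sleep for base_seconds plus a random extra of up to jitter_seconds.

    Returns the delay that was used. The sleep function is injectable
    so the helper can be exercised without actually waiting.
    """
    delay = base_seconds + random.uniform(0, jitter_seconds)
    sleep(delay)
    return delay
```

Calling `polite_pause()` after each results page keeps the request rate low, and the random jitter avoids a perfectly regular request pattern.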
