A Python-based web scraper that extracts product information from Tokopedia, one of Indonesia's largest e-commerce platforms. It uses Playwright for browser automation to load dynamically rendered pages, and it saves the collected data to a structured CSV file for easy analysis.
- Scrapes product data from Tokopedia's search results.
- Extracts key product details:
  - Product Name
  - Price
  - Shop Name
  - Shop Location
  - Product Rating
  - Number of Items Sold
- Saves the scraped data into a CSV file named `tokopedia_products.csv`.
- Configurable search parameters through an environment file.
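As a rough illustration of the output format, the sketch below models one scraped row and writes rows to CSV. The `Product` class, its field names, and `save_products` are hypothetical names chosen for this example, not taken from the project. The project itself uses pandas for CSV export; this sketch swaps in the standard-library `csv` module so it runs without extra dependencies.

```python
import csv
from dataclasses import asdict, dataclass, fields


@dataclass
class Product:
    """One scraped search result. Field names are assumptions for illustration."""
    name: str
    price: str
    shop_name: str
    shop_location: str
    rating: str
    items_sold: str


def save_products(products, path="tokopedia_products.csv"):
    """Write the products to a CSV file with one column per Product field."""
    column_names = [field.name for field in fields(Product)]
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=column_names)
        writer.writeheader()
        for product in products:
            writer.writerow(asdict(product))
```

Values are kept as strings here because listing pages typically show formatted text; converting prices or ratings to numbers would be a separate cleaning step.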
To run this scraper, you need the following installed:
- Python 3.8+
- All Python packages listed in `requirements.txt`. The main dependencies are:
  - `playwright`: For browser automation.
  - `pandas`: For data handling and CSV export.
  - `python-dotenv`: For managing environment variables.
Follow these steps to set up the project environment:
- Clone the repository:

      git clone <your-repository-url>
      cd <repository-directory>

- Install Python dependencies: Create a virtual environment (recommended) and install the required packages.

      python -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`
      pip install -r requirements.txt

- Install Playwright browsers: Playwright requires browser binaries to be installed. Run the following command:

      playwright install

You can configure the scraper's behavior by creating a .env file in the root of the project directory.
- Create a file named `.env`:

      touch .env

- Add the following configuration variables to the `.env` file. These values are examples; you can change them to suit your needs.

      # The product search query
      SEARCH_QUERY="mechanical keyboard"
      # The maximum number of search result pages to scrape
      MAX_PAGES=3

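For orientation, here is a minimal sketch of how a script might read these two variables. The function names `parse_settings` and `load_settings_from_dotenv` are hypothetical, and the validation rules (a non-empty query, a fallback of one page when `MAX_PAGES` is missing) are assumptions rather than the project's documented behavior.

```python
import os


def parse_settings(env):
    """Read SEARCH_QUERY and MAX_PAGES from a mapping such as os.environ."""
    query = env.get("SEARCH_QUERY", "").strip()
    if not query:
        raise ValueError("SEARCH_QUERY must be set to a non-empty string")
    # Assumption: fall back to a single page when MAX_PAGES is not set.
    max_pages = int(env.get("MAX_PAGES", "1"))
    if max_pages < 1:
        raise ValueError("MAX_PAGES must be at least 1")
    return query, max_pages


def load_settings_from_dotenv():
    """Load .env into the process environment, then parse the settings."""
    # Imported lazily so parse_settings works without python-dotenv installed.
    from dotenv import load_dotenv

    load_dotenv()  # looks for a .env file; existing variables are not overridden
    return parse_settings(os.environ)
```

Separating parsing from loading keeps the validation easy to test with a plain dictionary.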
Once you have completed the installation and configuration steps, you can run the scraper.
Execute the main script from your terminal:

    python main.py

The script will launch a browser, navigate to Tokopedia, perform the search, and start scraping the data. A CSV file named `tokopedia_products.csv` will be created in the project's root directory with the scraped data.
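The flow described above can be sketched roughly as follows. This is not the project's actual `main.py`: the function names are hypothetical, and the search URL pattern with its `st`, `q`, and `page` query parameters is an assumption that may not match the live site. The sketch also navigates straight to a search URL rather than typing into a search box, and it stops at collecting raw page HTML because the selectors needed to extract product fields are not documented here.

```python
from urllib.parse import urlencode

# Assumed base URL for search results; verify against the live site.
SEARCH_URL = "https://www.tokopedia.com/search"


def build_search_url(query, page_number):
    """Build the URL for one page of search results (assumed URL pattern)."""
    params = {"st": "product", "q": query, "page": page_number}
    return f"{SEARCH_URL}?{urlencode(params)}"


def fetch_search_pages(query, max_pages):
    """Open each results page in a headless browser and return its HTML."""
    # Imported lazily so build_search_url can be used without Playwright.
    from playwright.sync_api import sync_playwright

    html_pages = []
    with sync_playwright() as playwright:
        browser = playwright.chromium.launch(headless=True)
        page = browser.new_page()
        for page_number in range(1, max_pages + 1):
            page.goto(build_search_url(query, page_number),
                      wait_until="domcontentloaded")
            html_pages.append(page.content())
        browser.close()
    return html_pages
```

A call such as `fetch_search_pages("mechanical keyboard", 3)` would visit three result pages, assuming the browsers from `playwright install` are present.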
This scraper is intended for educational purposes only. Please be responsible and respect Tokopedia's terms of service. Avoid making an excessive number of requests in a short period to prevent overloading their servers. The developers of this tool are not responsible for any misuse.
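One simple way to avoid sending requests in rapid succession is to pause between page loads. The helper below is a minimal sketch; `polite_delay` and its default timings are hypothetical and not part of the project, so adjust them to whatever is appropriate for your use.

```python
import random
import time


def polite_delay(base_seconds=2.0, jitter_seconds=1.0, sleep=time.sleep):
    """Pause for base_seconds plus a random jitter and return the delay used."""
    delay = base_seconds + random.uniform(0, jitter_seconds)
    sleep(delay)  # the sleep function is injectable so tests need not wait
    return delay
```

Calling `polite_delay()` once after each page load spaces requests at least two seconds apart with the defaults above.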