agumfanani19/website_scraping

Tokopedia Product Scraper

A Python web scraper that extracts product information from Tokopedia, one of Indonesia's largest e-commerce platforms. It uses Playwright for browser automation so that dynamically loaded search results can be rendered before scraping, and it saves the collected data to a structured CSV file for analysis.

Features

  • Scrapes product data from Tokopedia's search results.
  • Extracts key product details:
    • Product Name
    • Price
    • Shop Name
    • Shop Location
    • Product Rating
    • Number of Items Sold
  • Saves the scraped data into a CSV file named tokopedia_products.csv.
  • Configurable search parameters through an environment file.
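The CSV export described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names are assumptions that mirror the listed fields, and the standard-library `csv` module stands in for pandas so the example has no third-party dependencies.

```python
import csv
import io

# Column names are assumptions mirroring the fields listed in this README.
FIELDS = ["product_name", "price", "shop_name", "shop_location", "rating", "items_sold"]


def write_products_csv(rows, handle):
    """Write an iterable of product dicts to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Fields missing from a scraped row become empty cells.
        writer.writerow({name: row.get(name, "") for name in FIELDS})


# Hypothetical sample row; a real run would pass the scraped products
# and an open file such as tokopedia_products.csv instead of a buffer.
sample = [{"product_name": "Example Keyboard", "price": "Rp500.000", "shop_name": "Example Shop"}]
buffer = io.StringIO()
write_products_csv(sample, buffer)
csv_text = buffer.getvalue()
```

When writing to a real file with the `csv` module, open it with `newline=""` so that row terminators are not translated twice.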

Requirements

To run this scraper, you need the following installed:

  • Python 3.8+
  • All Python packages listed in requirements.txt. The main dependencies are:
    • playwright: For browser automation.
    • pandas: For data handling and CSV export.
    • python-dotenv: For managing environment variables.

Installation

Follow these steps to set up the project environment:

  1. Clone the repository:

    git clone <your-repository-url>
    cd <repository-directory>
  2. Install Python dependencies: Create a virtual environment (recommended) and install the required packages.

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    pip install -r requirements.txt
  3. Install Playwright browsers: Playwright requires browser binaries to be installed. Run the following command:

    playwright install

Configuration

You can configure the scraper's behavior by creating a .env file in the root of the project directory.

  1. Create a file named .env:

    touch .env
    
  2. Add the following configuration variables to the .env file. These values are examples; you can change them to suit your needs.

    # The product search query
    SEARCH_QUERY="mechanical keyboard"
    
    # The maximum number of search result pages to scrape
    MAX_PAGES=3
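
For illustration, here is one way these two variables might be read and validated. It is a sketch under assumptions: the function name `read_config` and the default of one page are hypothetical, and the code reads from a plain mapping so it runs without python-dotenv. In the real script, calling `load_dotenv()` from python-dotenv first would copy the `.env` values into `os.environ`.

```python
import os


def read_config(env=None):
    """Return (search_query, max_pages) from a mapping of environment values."""
    # Default to the process environment, which load_dotenv() would populate.
    env = os.environ if env is None else env

    query = env.get("SEARCH_QUERY", "").strip()
    if not query:
        raise ValueError("SEARCH_QUERY must be set to a non-empty string")

    # Environment values are strings, so MAX_PAGES needs an explicit conversion.
    # The fallback of 1 page is an assumption, not documented behavior.
    max_pages = int(env.get("MAX_PAGES", "1"))
    if max_pages < 1:
        raise ValueError("MAX_PAGES must be at least 1")

    return query, max_pages


# Same values as the example .env file above.
config = read_config({"SEARCH_QUERY": "mechanical keyboard", "MAX_PAGES": "3"})
```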

Usage

Once you have completed the installation and configuration steps, you can run the scraper.

Execute the main script from your terminal:

    python main.py

The script will launch a browser, navigate to Tokopedia, perform the search, and start scraping the data. A CSV file named tokopedia_products.csv will be created in the project's root directory with the scraped data.
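
The page-by-page flow described above could be organized along these lines. This is only a sketch: `scrape_all` and `scrape_page` are hypothetical names, and the per-page function is passed in as a parameter so the loop can be shown without a browser. In the real script that function would drive Playwright.

```python
def scrape_all(query, max_pages, scrape_page):
    """Collect rows from pages 1..max_pages using the supplied page scraper."""
    products = []
    for page_number in range(1, max_pages + 1):
        rows = scrape_page(query, page_number)
        if not rows:
            # Stop early once a results page comes back empty.
            break
        products.extend(rows)
    return products


def fake_scrape_page(query, page_number):
    """Stand-in for the browser step: returns one row for pages 1 and 2 only."""
    if page_number <= 2:
        return [{"product_name": f"{query} #{page_number}"}]
    return []


results = scrape_all("mechanical keyboard", 3, fake_scrape_page)
```

Keeping the loop separate from the browser code makes the `MAX_PAGES` limit easy to test without network access.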

Disclaimer

This scraper is intended for educational purposes only. Use it responsibly and respect Tokopedia's terms of service. Avoid sending many requests in a short period, so that you do not overload their servers. The developers of this tool are not responsible for any misuse.
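
One common way to space out requests is a randomized pause between page loads. The helper below is a hypothetical sketch, not part of this project, and the 2 to 5 second default range is an arbitrary assumption rather than a documented limit.

```python
import random
import time


def polite_pause(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    # The sleep function is injectable so the helper can be tested without waiting.
    sleep(delay)
    return delay


# Demonstration with a recorder in place of a real sleep.
recorded = []
delay = polite_pause(0.0, 0.01, sleep=recorded.append)
```

Calling `polite_pause()` between pages would add a few seconds to each run while keeping the request rate low.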
