E-Commerce Scrapingcourse (Python)

A basic e-commerce scraper that targets scrapingcourse.com.

Run on Intuned

APIs

| API | Description |
| --- | --- |
| list | Scrapes products from the e-commerce store with pagination support. Automatically triggers the details API for each product using extend_payload. |
| details | Extracts detailed information for a specific product, including price, SKU, category, descriptions, images (uploaded to S3), sizes, colors, and variants. |
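
To make the shape of a details result more concrete, here is a minimal sketch of a record holding the fields named above. The project itself keeps its Pydantic models in utils/types_and_schemas.py; this sketch uses standard-library dataclasses instead so it runs without extra dependencies. The class names, field names, and types are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ProductVariant:
    """One purchasable combination of options (hypothetical shape)."""
    sku: str
    size: Optional[str] = None
    color: Optional[str] = None


@dataclass
class ProductDetails:
    """Fields the details API is described as extracting (types are assumed)."""
    price: str
    sku: str
    category: str
    short_description: str = ""
    full_description: str = ""
    images: List[str] = field(default_factory=list)   # e.g. references to uploaded files
    sizes: List[str] = field(default_factory=list)
    colors: List[str] = field(default_factory=list)
    variants: List[ProductVariant] = field(default_factory=list)


def to_payload(details: ProductDetails) -> dict:
    """Convert the record (including nested variants) to a plain dict."""
    return asdict(details)
```

Calling `to_payload(...)` on a populated `ProductDetails` yields a plain dictionary that can be serialized to JSON.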

Getting started

Install dependencies

uv sync

If the intuned CLI is not installed, install it globally:

npm install -g @intuned/cli

After installing dependencies, the intuned command should be available in your environment.

Prepare the project

Before running any API, provision and deploy the project.

intuned dev provision
intuned dev deploy

Run an API

intuned dev run api list .parameters/api/list/default.json
intuned dev run api details .parameters/api/details/default.json

Project structure

/
├── api/
│   ├── list.py                       # Scrapes product list with pagination
│   └── details.py                    # Extracts detailed product information
├── utils/
│   └── types_and_schemas.py          # Type definitions and Pydantic models
├── intuned-resources/
│   └── jobs/
│       ├── list.job.jsonc            # Job definition for list API
│       └── details.job.jsonc         # Job definition for details API
├── .parameters/api/                  # Test parameters
├── Intuned.jsonc                     # Project config
├── pyproject.toml                    # Python dependencies
└── README.md

Key features

  • Automatic pagination: The list API automatically handles pagination to scrape multiple pages
  • Dynamic API chaining: Uses extend_payload to automatically trigger the details API for each product found
  • S3 file upload: Product images are automatically uploaded to S3 using save_file_to_s3
  • Job configuration: Configured as a job template with retry logic and concurrent request handling
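
The pagination and API chaining features above can be sketched roughly as follows. This is a self-contained illustration, not the project's api/list.py: the `extend_payload` defined here is a local stand-in that only records requests, and `FAKE_PAGES`, `fetch_page`, `run_list`, and the payload shape (`api` plus `parameters`) are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical in-memory store: page number -> product URLs on that page.
FAKE_PAGES: Dict[int, List[str]] = {
    1: ["https://example.test/product/a", "https://example.test/product/b"],
    2: ["https://example.test/product/c"],
}

# Requests recorded by the stand-in below.
queued: List[dict] = []


def extend_payload(payload: dict) -> None:
    """Local stand-in for the platform helper; it only records the request."""
    queued.append(payload)


def fetch_page(page_number: int) -> List[str]:
    """Hypothetical page fetcher; returns an empty list past the last page."""
    return FAKE_PAGES.get(page_number, [])


def run_list(max_pages: int = 10) -> List[str]:
    """Walk pages until one is empty, queuing a details request per product."""
    products: List[str] = []
    for page_number in range(1, max_pages + 1):
        urls = fetch_page(page_number)
        if not urls:
            break  # no more pages to scrape
        for url in urls:
            products.append(url)
            # Chain to the details API for this product (assumed payload shape).
            extend_payload({"api": "details", "parameters": {"url": url}})
    return products
```

The `max_pages` cap is a design choice in this sketch: it bounds the loop even if the stop condition (an empty page) is never reached.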

Related