CFScraper API

A scraper API service built with FastAPI, using Cloudscraper and SeleniumBase backends to bypass Cloudflare's anti-bot protection.

Features

  • FastAPI-based REST API with async support
  • Multiple scraper backends:
    • CloudScraper for Cloudflare bypass
    • SeleniumBase for JavaScript-heavy sites
  • Job queue system with Redis and in-memory options
  • Background job processing with status tracking
  • Database integration with SQLAlchemy
  • Health checks and monitoring

Installation

This project uses uv for dependency management. To get started:

# Install dependencies
uv sync

# Install development dependencies
uv sync --extra dev

Usage

Starting the Server

# Development server
uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Production server
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000

API Endpoints

Health Check

GET /health

Create Scraping Job

POST /api/v1/scrape
Content-Type: application/json

{
  "url": "https://example.com",
  "scraper_type": "cloudscraper",
  "method": "GET",
  "headers": {},
  "timeout": 30
}

Get Job Status

GET /api/v1/jobs/{task_id}

List Jobs

GET /api/v1/jobs?status=completed&limit=10

Queue Status

GET /api/v1/queue/status
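
The snippet below shows one way to drive these endpoints end to end from Python with the requests library. The request body matches the example above; the response fields (task_id, status) are assumptions inferred from the endpoint paths, so adjust them to the actual response schema.

import time
import requests

BASE_URL = "http://localhost:8000"

# Submit a scraping job
resp = requests.post(
    f"{BASE_URL}/api/v1/scrape",
    json={
        "url": "https://example.com",
        "scraper_type": "cloudscraper",
        "method": "GET",
        "headers": {},
        "timeout": 30,
    },
)
resp.raise_for_status()
task_id = resp.json()["task_id"]  # field name assumed from the job-status URL

# Poll the job until it reaches a terminal state
while True:
    job = requests.get(f"{BASE_URL}/api/v1/jobs/{task_id}").json()
    if job.get("status") in ("completed", "failed"):
        break
    time.sleep(1)

print(job)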

Demo Script

Run the demo script to see the API in action:

# Start the server first
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000

# In another terminal, run the demo
uv run python demo.py

Configuration

Environment variables can be set in a .env file:

DATABASE_URL=sqlite:///./cfscraper.db
REDIS_URL=redis://localhost:6379
MAX_CONCURRENT_JOBS=10
JOB_TIMEOUT=300
USE_IN_MEMORY_QUEUE=true
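
These variables are typically loaded into a single settings object at startup. The sketch below shows how that could be done with pydantic-settings; the class and field names are illustrative assumptions, not necessarily what app/core/ defines.

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Illustrative settings class; the real one lives under app/core/."""

    model_config = SettingsConfigDict(env_file=".env")

    # Defaults mirror the .env example above
    database_url: str = "sqlite:///./cfscraper.db"
    redis_url: str = "redis://localhost:6379"
    max_concurrent_jobs: int = 10
    job_timeout: int = 300
    use_in_memory_queue: bool = True

settings = Settings()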

Architecture

Core Components

  1. FastAPI Application (app/main.py)
    • REST API with async support
    • Health checks and monitoring
    • Automatic database initialization
  2. Scraper Classes (app/scrapers/)
    • Base scraper interface
    • CloudScraper implementation
    • SeleniumBase implementation
    • Factory pattern for scraper creation
  3. Job Queue System (app/utils/queue.py)
    • Abstract queue interface (see the sketch after this list)
    • In-memory queue for development
    • Redis queue for production
    • Job status tracking
  4. Database Models (app/models/)
    • SQLAlchemy models for jobs and results
    • Job status tracking
    • Result storage
  5. Background Processing (app/utils/executor.py)
    • Async job execution
    • Concurrent job handling
    • Error handling and retries
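
To show how the in-memory and Redis backends can share one interface, here is a minimal sketch of an abstract job queue. The method names are illustrative assumptions and may differ from what app/utils/queue.py actually defines.

from abc import ABC, abstractmethod
from typing import Any, Optional

class JobQueue(ABC):
    """Abstract queue interface shared by the in-memory and Redis backends."""

    @abstractmethod
    async def enqueue(self, job: dict[str, Any]) -> str:
        """Add a job to the queue and return its task id."""

    @abstractmethod
    async def dequeue(self) -> Optional[dict[str, Any]]:
        """Pop the next pending job, or return None if the queue is empty."""

    @abstractmethod
    async def get_status(self, task_id: str) -> str:
        """Return the job's current status (e.g. queued, running, completed)."""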

Testing

Run the test suite:

uv run pytest tests/ -v
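
As a starting point for new tests, the snippet below exercises the health endpoint with FastAPI's TestClient. It assumes app.main:app can be imported without external services running and that /health returns HTTP 200, matching the endpoint listed above.

from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)

def test_health_check():
    response = client.get("/health")
    assert response.status_code == 200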

Development

Project Structure

cfscraper/
├── alembic/           # Database migration scripts
├── app/
│   ├── api/           # API routes
│   ├── core/          # Core configuration
│   ├── models/        # Database models
│   ├── scrapers/      # Scraper implementations
│   ├── utils/         # Utilities (queue, executor)
│   └── main.py        # FastAPI application
├── docs/              # Documentation
├── examples/          # Demo scripts
├── tests/             # Test files
├── pyproject.toml     # Project configuration
└── README.md          # This file

Adding New Scrapers

  1. Create a new scraper class inheriting from BaseScraper (see the sketch after this list)
  2. Implement the required methods
  3. Register it in the ScraperFactory
  4. Add it to the ScraperType enum
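
A minimal sketch of steps 1 and 2 is shown below; the import path and the scrape signature are assumptions, so match them to the actual BaseScraper definition in app/scrapers/.

# Illustrative only: the import path and method signature are assumptions.
from app.scrapers.base import BaseScraper

class MyScraper(BaseScraper):
    """Skeleton for a custom scraper backend."""

    async def scrape(self, url: str, method: str = "GET",
                     headers: dict | None = None, timeout: int = 30):
        # Fetch the page with your tool of choice and return the response data
        raise NotImplementedError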

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

About

[WIP] An experimental AI-assisted coding project.
