A Model Context Protocol (MCP) server that enables Claude Desktop to perform advanced web scraping and crawling operations. Extract structured data, analyze website architectures, and discover content relationships - all through natural conversation with Claude.
- Static & Dynamic Scraping: Handle both regular HTML and JavaScript-rendered pages (see the sketch after this list)
- Website Crawling: Discover and map entire website structures
- Data Extraction: Extract specific elements using CSS selectors
- Batch Operations: Process multiple URLs efficiently
- Link Analysis: Understand how pages connect and reference each other
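Under the hood, static pages can be fetched with a plain HTTP client, while JavaScript-rendered pages need a real browser to execute scripts first. Here is a minimal sketch of the two approaches and of CSS-selector extraction; it illustrates the general technique, not this repository's actual implementation:

```python
# Illustration only - not this repository's actual code.
import requests
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def scrape_static(url: str) -> str:
    """Fetch server-rendered HTML with a plain HTTP request."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

def scrape_dynamic(url: str) -> str:
    """Render the page in headless Chromium so JavaScript runs first."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
        return html

# CSS-selector extraction works the same way on either result:
soup = BeautifulSoup(scrape_static("https://books.toscrape.com/"), "html.parser")
titles = [a["title"] for a in soup.select("article.product_pod h3 a")]
print(titles[:3])
```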
- Python 3.10 or higher
- WSL2 with Ubuntu (for Windows users)
- Claude Desktop application
- uv package manager
git clone https://github.com/samirsaci/mcp-webscraper.git
cd mcp-webscraper
If you don't have uv installed:
curl -LsSf https://astral.sh/uv/install.sh | sh
# Initialize the project and its virtual environment
uv init .
uv add "mcp[cli]"
source .venv/bin/activate
uv pip install -r requirements.txt
Don't forget to install the Playwright browser, which is required to scrape dynamic content:
uv run playwright install chromium
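To confirm that Chromium installed correctly before running the full test, you can run a quick standalone sanity check like this one (not part of the repository):

```python
# Quick sanity check that Playwright can drive Chromium - standalone snippet.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://books.toscrape.com/")
    print("Page title:", page.title())  # "All products | Books to Scrape - Sandbox"
    browser.close()
```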
Run the test script to verify everything works, using a website that loves to be scraped, https://books.toscrape.com/:
uv run python test_local.py
Expected Output:
Static Scraping Success: True
HTML length: 51294
---------
Dynamic Scraping Success: True
HTML length: 51004
---------
Testing Crawler...
Crawler Success: True
Pages crawled: 5
Pages discovered: 437
Failed URLs: 0
First 3 pages discovered:
1. All products | Books to Scrape - Sandbox
URL: https://books.toscrape.com/
Links found: 73
Depth: 0
2. All products | Books to Scrape - Sandbox
URL: https://books.toscrape.com/index.html
Links found: 73
Depth: 1
3. Books | Books to Scrape - Sandbox
URL: https://books.toscrape.com/catalogue/category/books_1/index.html
Links found: 73
Depth: 1
Statistics:
Total unique links: 104
Max depth reached: 1
Avg load time: 0.21s
For Windows Users with WSL
- Locate your Claude Desktop configuration file:
File -> Settings -> Edit Config
- Add the WebScrapingServer configuration:
{
  "mcpServers": {
    "WebScrapingServer": {
      "command": "wsl",
      "args": [
        "-d",
        "Ubuntu",
        "bash",
        "-lc",
        "cd ~/path/to/mcp-webscraper && uv run --with 'mcp[cli]' mcp run scrapping.py"
      ]
    }
  }
}
Important: Replace ~/path/to/mcp-webscraper with the actual path to your project folder in WSL. To find it, run this from the project folder:
pwd
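For reference, if Claude Desktop and the server run on the same Linux or macOS machine (no WSL), the equivalent entry might look like the sketch below; it assumes bash and uv are on your PATH, so adjust the path to your setup:

```json
{
  "mcpServers": {
    "WebScrapingServer": {
      "command": "bash",
      "args": [
        "-lc",
        "cd /path/to/mcp-webscraper && uv run --with 'mcp[cli]' mcp run scrapping.py"
      ]
    }
  }
}
```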
After updating the configuration:
- Completely quit Claude Desktop (not just close the window)
- Start Claude Desktop again
- Look for the 🔌 icon in the text input area
- Click it to verify "WebScrapingServer" appears
Once configured, you can ask Claude to:
"Scrape the homepage of example.com and tell me what you find"
"Please help me crawl my personal blog https://yourblog.com with a limit of 150 pages.
I would like to understand how the articles refer to each other.
Can you help me perform this type of analysis?"
mcp-webscraper/
├── models/
│ └── scraping_models.py # Pydantic models for data validation
├── utils/
│ └── web_scraper.py # Core WebScraper class
├── scrapping.py # MCP server implementation
├── test_local.py # Local testing script
├── requirements.txt # Python dependencies
├── README.md # This file
└── scraping_server.log # Server logs (created at runtime)
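The Pydantic models give every tool a validated, predictable result shape. As a rough illustration of the style (the field names below are hypothetical; see models/scraping_models.py for the real definitions):

```python
# Hypothetical illustration - the real models live in models/scraping_models.py.
from pydantic import BaseModel, HttpUrl

class ScrapeResult(BaseModel):
    url: HttpUrl               # page that was fetched
    success: bool              # whether the fetch succeeded
    html: str = ""             # raw HTML, empty on failure
    error: str | None = None   # error message, if any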
The server exposes these tools to Claude:
- scrape_url: Get raw HTML from any webpage
- extract_data: Extract multiple elements using CSS selectors
- extract_first: Get a single element from a page
- batch_scrape: Process multiple URLs
- crawl_website: Discover and map website structure
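For context, MCP tools like these are typically registered with the MCP Python SDK's FastMCP class. The sketch below shows the general pattern in simplified form, not the exact code in scrapping.py:

```python
# Simplified sketch of an MCP tool definition - see scrapping.py for the real server.
import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("WebScrapingServer")

@mcp.tool()
def scrape_url(url: str) -> str:
    """Return the raw HTML of a webpage."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    mcp.run()
```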
If the server does not appear in Claude, first try restarting Claude Desktop by terminating its process.
If that does not work:
- Check the log file:
cat scraping_server.log
- Verify the path in config matches your WSL path:
pwd
The output should match what you have in your config file.
- Test the server directly:
uv run python scrapping.py
If JavaScript scraping fails, try reinstalling the browser:
uv run playwright install chromium
Ensure WSL2 is properly installed:
# Run this in Windows PowerShell opened as Administrator
wsl --status
MIT License - feel free to use this in your own projects!
Senior Supply Chain and Data Science consultant with international experience in Logistics and Transportation operations. For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting or LinkedIn.