This package contains three main tools for working with proxies, plus a sample sources file:

- `proxy_scraper.py` - Scrapes proxies from websites
- `proxy_checker.py` - Tests proxy lists for working proxies
- `proxy_tool.py` - Integrated tool that combines both functions
- `proxy_sources.txt` - Sample URLs for proxy sources
## Installation

```bash
pip install -r requirements.txt
```
## Usage

### proxy_scraper.py

```bash
python proxy_scraper.py
```

Interactive mode: you'll be prompted to enter URLs or to load them from a file.

### proxy_checker.py

```bash
python proxy_checker.py
```

Checks the proxies in `proxy_list.txt` by default.
### proxy_tool.py

```bash
# Interactive mode - prompts you to choose scrape/check/both
python proxy_tool.py

# Scrape and check in one go (automatically uses proxy_sources.txt if no URLs provided)
python proxy_tool.py --mode both

# Scrape only (automatically uses proxy_sources.txt if available)
python proxy_tool.py --mode scrape

# Scrape only with specific URLs
python proxy_tool.py --mode scrape --urls https://example.com/proxies

# Check only
python proxy_tool.py --mode check --proxy-file my_proxies.txt

# Use URLs from file
python proxy_tool.py --mode both --url-file proxy_sources.txt

# Custom settings
python proxy_tool.py --mode both --max-workers 100 --timeout 5 --max-pages 10
```
### Command-line options

- `--mode`: scrape, check, or both
- `--urls`: Space-separated list of URLs to scrape
- `--url-file`: File containing URLs, one per line (see the examples after this list)
- `--proxy-file`: File containing proxies to check
- `--max-workers`: Number of threads (default: 50)
- `--timeout`: Timeout in seconds (default: 10)
- `--max-pages`: Max pages to scrape per site (default: 5)
- `--delay`: Delay between scraping requests in seconds (default: 1)
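For reference, both input files are plain text. A `--url-file` such as `proxy_sources.txt` holds one source URL per line (the URLs below are placeholders, not real proxy sources):

```text
https://example.com/free-proxy-list
https://example.org/proxies
```

A `--proxy-file` is assumed to hold one `ip:port` pair per line (the addresses below are documentation-range placeholders):

```text
203.0.113.7:8080
198.51.100.23:3128
```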
## Output files

The tools create the following files:

- `proxy_list/proxy_list.txt` - All scraped proxies (no duplicates)
- `proxy_list/scrape_summary_YYYYMMDD_HHMMSS.json` - Scraping statistics
- `working_http_proxies_YYYYMMDD_HHMMSS.txt` - Working HTTP proxies
- `working_https_proxies_YYYYMMDD_HHMMSS.txt` - Working HTTPS proxies
- `working_socks4_proxies_YYYYMMDD_HHMMSS.txt` - Working SOCKS4 proxies
- `working_socks5_proxies_YYYYMMDD_HHMMSS.txt` - Working SOCKS5 proxies
- `failed_proxies_YYYYMMDD_HHMMSS.txt` - Proxies that failed the check
## Features

### Scraper

- Automatic pagination detection
- Multithreaded scraping
- Regex-based proxy extraction (see the sketch after this list)
- Respectful scraping with delays
- Automatic duplicate removal
- Single output file for easy use
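To illustrate the extraction step, here is a minimal sketch of regex-based `ip:port` extraction with set-based duplicate removal. It is not the scraper's actual code; the pattern and function name are assumptions for this example:

```python
import re

# Matches ip:port pairs such as 203.0.113.7:8080 (illustrative pattern,
# not necessarily the one proxy_scraper.py uses).
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})\b")

def extract_proxies(html: str) -> set[str]:
    """Pull ip:port strings out of a page; the set drops duplicates."""
    proxies = set()
    for ip, port in PROXY_RE.findall(html):
        # Cheap sanity check: every octet in 0-255 and the port in range.
        if all(0 <= int(o) <= 255 for o in ip.split(".")) and 0 < int(port) <= 65535:
            proxies.add(f"{ip}:{port}")
    return proxies
```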
### Checker

- Tests all proxy types (HTTP/HTTPS/SOCKS4/SOCKS5)
- Multithreaded checking for speed (see the sketch after this list)
- Real-time progress display
- Automatic duplicate removal before checking
- Detailed success rate statistics
- Separate output files by type
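To show how multithreaded checking works in principle, here is a minimal sketch using `requests` and a thread pool. The test URL, worker count, and function names are assumptions for this example, not the checker's actual implementation:

```python
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

TEST_URL = "https://httpbin.org/ip"  # assumed test endpoint

def check_proxy(proxy: str, timeout: int = 10) -> bool:
    """Return True if a request succeeds through the given HTTP proxy.

    SOCKS4/SOCKS5 would use socks4:// / socks5:// URL schemes instead and
    require the requests[socks] extra.
    """
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    try:
        resp = requests.get(TEST_URL, proxies=proxies, timeout=timeout)
        return resp.status_code == 200
    except requests.RequestException:
        return False

def check_all(proxy_list: list[str], max_workers: int = 50) -> list[str]:
    """Check proxies concurrently and return the ones that respond."""
    working = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(check_proxy, p): p for p in proxy_list}
        for future in as_completed(futures):
            if future.result():
                working.append(futures[future])
    return working
```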
## Quick start

- Easy start - interactive mode:

  ```bash
  python proxy_tool.py
  ```

  Then choose: 1=Scrape, 2=Check, 3=Both

- Scrape proxies from multiple sources (automatically uses proxy_sources.txt):

  ```bash
  python proxy_tool.py --mode scrape
  ```

- Check the scraped proxies:

  ```bash
  python proxy_tool.py --mode check
  ```

- Or do both in one command:

  ```bash
  python proxy_tool.py --mode both --max-workers 100
  ```
## Tips

- Start with fewer workers (10-20) when scraping, to stay respectful to websites
- Use more workers (50-100) when checking, since it only tests connectivity
- Adjust the timeout based on your internet speed and proxy locations
- Some websites may block automated scraping - use delays and rotate User-Agents if needed (see the sketch after this list)
- The tools automatically handle duplicates and invalid proxy formats
- Place your proxy source URLs in `proxy_sources.txt` for automatic loading
- The scraper saves to `proxy_list.txt`, which the checker automatically uses
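If a site blocks the default client, one simple mitigation is rotating the User-Agent header between requests. This is a minimal sketch of that idea, not a feature built into the tools; the header strings are ordinary browser User-Agents chosen for illustration:

```python
import random
import time

import requests

# Small pool of browser-like User-Agent strings (illustrative values).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0",
]

def polite_get(url: str, delay: float = 1.0) -> requests.Response:
    """GET a page with a random User-Agent and a delay between requests."""
    time.sleep(delay)  # mirrors the --delay idea of respectful scraping
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return requests.get(url, headers=headers, timeout=10)
```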
## Troubleshooting

- Import errors: Run `pip install -r requirements.txt`
- No proxies found: Check that the URLs are accessible and actually contain proxy lists
- Slow checking: Reduce the timeout or increase `--max-workers`
- Memory issues: Reduce `--max-workers` or process smaller batches