Mbrell's WEBSCRAPING Tool

A Rust-based web scraping tool that downloads complete web pages with all their assets (HTML, CSS, JavaScript, and images).

Features

  • Downloads HTML content from any URL
  • Automatically extracts and downloads CSS stylesheets
  • Downloads JavaScript files
  • Downloads all images from the page
  • Organizes downloaded content in a structured directory
  • Handles relative and absolute URLs
  • Uses async/await for efficient downloading
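
As a rough illustration of what handling relative and absolute URLs involves, here is a minimal, standard-library-only sketch. The function name resolve_url is hypothetical and not taken from this tool's source. It covers only absolute, protocol-relative, root-relative, and plain relative links. It does not normalize ./ or ../ segments, and it assumes the base URL has no query string. A real implementation would more likely lean on a dedicated URL library.

```rust
/// Hypothetical helper: turn a link found in a page into an absolute URL.
/// Simplified on purpose: no `./` or `../` normalization, and the base URL
/// is assumed to carry no query string or fragment.
fn resolve_url(base: &str, href: &str) -> String {
    // Case 1: the link is already absolute.
    if href.starts_with("http://") || href.starts_with("https://") {
        return href.to_string();
    }

    // Split the base into origin ("https://example.com") and path ("/blog/post.html").
    let scheme_end = base.find("://").map(|i| i + 3).unwrap_or(0);
    let (origin, path) = match base[scheme_end..].find('/') {
        Some(i) => base.split_at(scheme_end + i),
        None => (base, "/"),
    };

    // Case 2: protocol-relative link ("//cdn.example.com/app.js") reuses the base scheme.
    if let Some(rest) = href.strip_prefix("//") {
        return format!("{}{}", &base[..scheme_end], rest);
    }

    // Case 3: root-relative link ("/img/logo.png") is appended to the origin.
    if href.starts_with('/') {
        return format!("{}{}", origin, href);
    }

    // Case 4: plain relative link is resolved against the directory of the base path.
    let dir_end = path.rfind('/').unwrap_or(0);
    format!("{}{}{}", origin, &path[..=dir_end], href)
}

fn main() {
    let page = "https://example.com/blog/post.html";
    for href in ["styles.css", "/img/logo.png", "//cdn.example.com/app.js"] {
        println!("{} -> {}", href, resolve_url(page, href));
    }
}
```

Checking the protocol-relative case before the root-relative one matters, because a link starting with // also starts with /.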

Requirements

  • Rust (latest stable version)
  • Cargo package manager

Installation

  1. Clone or download the source code
  2. Build the project (Cargo fetches and compiles the dependencies):
    cargo build

Usage

  1. Set the url variable in the source code to the page you want to download
  2. Run the tool:
    cargo run

The tool will:

  • Create a downloaded_content directory
  • Download the HTML page as index.html
  • Download CSS files with a css_ prefix
  • Download JavaScript files with a js_ prefix
  • Download images with an img_ prefix
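
The naming scheme above can be sketched as follows. This is an assumption about how the prefixes might be applied, not the tool's actual code: AssetKind and local_file_name are hypothetical names, and the fallback name "asset" for URLs without a file name is invented for the example.

```rust
use std::path::Path;

/// Asset kinds that get a file name prefix, per the list above.
#[derive(Clone, Copy)]
enum AssetKind {
    Css,
    Js,
    Image,
}

impl AssetKind {
    fn prefix(self) -> &'static str {
        match self {
            AssetKind::Css => "css_",
            AssetKind::Js => "js_",
            AssetKind::Image => "img_",
        }
    }
}

/// Hypothetical helper: derive a local file name from an asset URL by taking
/// the last path segment and prepending the prefix for its kind.
fn local_file_name(kind: AssetKind, url: &str) -> String {
    // Ignore any query string or fragment.
    let no_suffix = url.split(|c: char| c == '?' || c == '#').next().unwrap_or(url);
    // Skip the scheme so the host is never mistaken for a file name.
    let after_scheme = match no_suffix.find("://") {
        Some(i) => &no_suffix[i + 3..],
        None => no_suffix,
    };
    // Everything from the first slash onward is the path.
    let path = match after_scheme.find('/') {
        Some(i) => &after_scheme[i..],
        None => "",
    };
    // Last non-empty segment, or an invented fallback when there is none.
    let last = path.rsplit('/').find(|s| !s.is_empty()).unwrap_or("asset");
    format!("{}{}", kind.prefix(), last)
}

fn main() {
    let out_dir = Path::new("downloaded_content");
    let assets = [
        (AssetKind::Css, "https://example.com/styles.css"),
        (AssetKind::Js, "https://example.com/script.js?v=3"),
        (AssetKind::Image, "https://example.com/logo.png"),
    ];
    for (kind, url) in assets {
        let target = out_dir.join(local_file_name(kind, url));
        println!("{} -> {}", url, target.display());
    }
}
```

Note that two assets with the same file name in different remote directories would map to the same local name under this scheme, so a real tool may need extra disambiguation.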

Example Output

Created directory: 'downloaded_content'
Downloading content from 'https://example.com/'...
Saved HTML content: downloaded_content/index.html
Downloading CSS: https://example.com/styles.css
Saved CSS: downloaded_content/css_styles.css
Downloading JavaScript: https://example.com/script.js
Saved JavaScript: downloaded_content/js_script.js
Downloading Image: https://example.com/logo.png
Saved image: downloaded_content/img_logo.png
Process completed! All files are in 'downloaded_content' directory.

Note

Make sure you have permission to scrape the target website and comply with robots.txt and terms of service.
