Fetch webpages

Fetch the HTML and assets of webpages and save them to disk for later browsing and retrieval.

Run

Using Python

Prerequisites

Install Python 3.
Install the required packages:

pip install -r requirements.txt

From inside the program folder, run main.py directly from the command line:

$ cd program
$ python3 main.py https://www.google.com https://www.github.com --metadata --assets
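The command line takes one or more URLs followed by optional flags. The README does not show how main.py parses them, so the following is only a sketch, assuming Python's standard argparse module; build_parser is a hypothetical name, not a function from this repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command line."""
    parser = argparse.ArgumentParser(
        description="Fetch webpages and save them to disk."
    )
    # One or more positional URLs, e.g. https://www.google.com
    parser.add_argument("urls", nargs="+", help="webpage URLs to fetch")
    # Optional boolean flags; each is False unless given on the command line
    parser.add_argument(
        "--metadata",
        action="store_true",
        help="include statistics about each loaded webpage",
    )
    parser.add_argument(
        "--assets",
        action="store_true",
        help="download assets into the page folder (currently images only)",
    )
    return parser
```

Calling build_parser().parse_args() with no arguments would read sys.argv, leaving the URLs in args.urls and the flags in args.metadata and args.assets.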

Using Docker

From inside the program folder, build the image and run it with Docker:

$ cd program
$ docker image build -t fetch-webpages:latest .
$ docker container run --rm -v ${PWD}:/fetch fetch-webpages:latest https://www.google.com https://www.github.com --metadata --assets

Using bash

From the root folder, run the bash script:

$ ./fetch https://www.google.com https://www.github.com --metadata --assets

Parameters

Add --metadata to include statistics about each loaded webpage.
Add --assets to download assets (img, css, js, etc.) into the same folder. (Note: currently only images are downloaded, due to lack of time.)
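As an illustration of what --metadata and --assets might involve, here is a sketch using the standard html.parser module that counts links and collects absolute image URLs from a page. The PageStats class, the collect function, and the choice of statistics are hypothetical; the README does not say which statistics main.py actually reports.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageStats(HTMLParser):
    """Hypothetical collector: counts links and records image URLs."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.num_links = 0
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            # Count every anchor that has a non-empty href
            self.num_links += 1
        elif tag == "img" and attributes.get("src"):
            # Resolve relative paths like "/logo.png" against the page URL
            self.image_urls.append(urljoin(self.base_url, attributes["src"]))


def collect(html: str, base_url: str) -> PageStats:
    """Parse an HTML string and return the gathered statistics."""
    stats = PageStats(base_url)
    stats.feed(html)
    stats.close()
    return stats
```

The collected image_urls are the kind of list an --assets step could then download into the page's folder.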

Results

Each webpage is saved in its own folder under the output directory in the current working directory.
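The README does not specify how the folders inside output are named. A minimal sketch, assuming one folder per host name and the HTML saved as index.html (both hypothetical choices, as are the folder_for and save_page names), could look like this:

```python
from pathlib import Path
from urllib.parse import urlparse


def folder_for(url: str, root: Path) -> Path:
    """Hypothetical naming scheme: <root>/output/<host>."""
    # Replace ":" (from ports like localhost:8000) so the name is valid on Windows
    host = urlparse(url).netloc.replace(":", "_") or "unknown-host"
    return root / "output" / host


def save_page(url: str, html: str, root: Path) -> Path:
    """Write the page HTML to <folder>/index.html and return that path."""
    folder = folder_for(url, root)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "index.html"
    target.write_text(html, encoding="utf-8")
    return target
```

Under this scheme, two pages from the same host would share one folder and overwrite each other; a real implementation might also encode the URL path in the folder name.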

Repository: pulkitmittal/fetch-webpages