CrawlyOEG/WebCrawler

WebCrawler

WebCrawler is a library to obtain articles related to light pollution.

© 2018 Jorge Galán - OEG-UPM. Available under Apache License 2.0. See LICENSE.

Features

  • Download any article from ZENODO and DARKSKY. While Zenodo is a website with many articles on different topics, DarkSky is focused on light pollution.
  • Get the information about these articles as JSON, including title, abstract, authors, keyWords, doi and licence.

Requirements

  • Make sure you have the latest version of the Google Chrome browser on your computer.
  • You need a persistent internet connection.

Download

Download a version of WebCrawler from our releases page. Each release includes a jar and an exe; the exe needs to be given permission to run on your machine.

Usage

WebCrawler provides a command line application:

$ java -jar WebCrawler.jar --help
usage: PDFExtractor [-h] [-i <inputFolder>] [-k <keywords>] [-s
       <sourceWeb>]
Mised argument
 -h,--help                  Indicate how yo use the program.
 -i,--input <inputFolder>   [REQUIRED] Input folder where download the
                            content. Ex: /Users/jesus/aFolder
 -k,--keyword <keywords>    [REQUIRED] Keyword to search the PDF files
 -s,--sources <sourceWeb>   [OPTIONAL] Choose the information source.
                            (ZENODO, DARKSKY, ALL). Default: ALL
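
As an illustration of the options listed above, a typical invocation might look like the following sketch. The output folder `./articles` and the keyword `skyglow` are hypothetical values chosen for this example, and the script assumes that `WebCrawler.jar` sits in the current directory. Only the `-i`, `-k` and `-s` flags shown in the help text are used.

```shell
# Hypothetical output folder; created up front in case the tool
# does not create it on its own (an assumption, not documented).
OUT_DIR="./articles"
mkdir -p "$OUT_DIR"

# Run the crawler only if the jar is actually present.
if [ -f WebCrawler.jar ]; then
  # -i: folder where the content is downloaded (required)
  # -k: keyword to search for (required); "skyglow" is an example value
  # -s: source to query (optional; ZENODO, DARKSKY or ALL, default ALL)
  java -jar WebCrawler.jar -i "$OUT_DIR" -k "skyglow" -s ZENODO
else
  echo "WebCrawler.jar not found; download it from the releases page first" >&2
fi
```

Omitting `-s` should query both sources, since the help text gives ALL as the default.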

Building from Source

Clone this repo and run:

mvn clean compile assembly:single

The resulting jar is then available in the project's target folder.

OEG Laboratory STARS4ALL
