It would be nice if this could be used 'offline', so I could scrape something like 'file://home/me/site/mypage.html'. Or perhaps there could be a way to feed the scraper raw HTML instead of a URL.

Replies: 2 comments 2 replies
-
For now I have a silly workaround, but it still visits a website even if it ignores the content.

class BodyScraperPlugin {
  constructor(body) {
    this.body = body;
  }

  apply(registerAction) {
    registerAction('afterResponse', async ({ response }) => {
      if (response.headers['content-type'].includes('text/html')) {
        console.log('html AFTER RESPONSE ' + response.url);
        return this.body; // replace the fetched page with the locally supplied HTML
      }
      return response.body; // pass non-HTML resources through unchanged
    });
  }
}

const options = {
  urls: ['https://nodejs.org'], // dummy URL, since the plugin discards this content
  directory: saveDir,
  plugins: [new BodyScraperPlugin(article.content)]
};

const result = await scrape(options);
0 replies
-
Hello @jonocodes
I would recommend running an HTTP server in the directory with the needed files.
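A minimal sketch of that idea, using Node's built-in http, fs, and path modules; the ./site directory, port 8080, and mypage.html are hypothetical placeholders, not anything from website-scraper itself:

const http = require('http');
const fs = require('fs');
const path = require('path');

const root = path.resolve('./site'); // hypothetical directory holding the local files

// Map a few common extensions so responses carry a usable content-type header.
const types = { '.html': 'text/html', '.css': 'text/css', '.js': 'text/javascript', '.png': 'image/png' };

const server = http.createServer((req, res) => {
  // No path-traversal guard here; this is a local-only sketch.
  const filePath = path.join(root, req.url === '/' ? 'index.html' : req.url);
  fs.readFile(filePath, (err, data) => {
    if (err) {
      res.writeHead(404);
      res.end('Not found');
      return;
    }
    res.writeHead(200, { 'Content-Type': types[path.extname(filePath)] || 'application/octet-stream' });
    res.end(data);
  });
});

server.listen(8080, () => {
  // The local page is now reachable over plain HTTP, e.g.:
  // await scrape({ urls: ['http://localhost:8080/mypage.html'], directory: saveDir });
});

Once the server is up, the scraper sees an ordinary HTTP site, so no plugin changes are needed.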
2 replies