It would be nice if this could be used 'offline', so I could scrape something like 'file://home/me/site/mypage.html'. Or perhaps there could be a way to feed the scraper raw HTML instead of a URL.

Replies: 2 comments 2 replies
-
For now I have a silly workaround, but it still visits a website even if it ignores the content.

class BodyScraperPlugin {
  constructor(body) {
    this.body = body;
  }

  apply(registerAction) {
    registerAction('afterResponse', async ({ response }) => {
      if (response.headers['content-type'].includes('text/html')) {
        console.log('html AFTER RESPONSE ' + response.url);
        return this.body; // replace the fetched page with the locally supplied HTML
      }
      return response.body; // pass non-HTML resources through unchanged
    });
  }
}

const options = {
  urls: ['https://nodejs.org'], // dummy URL, since the plugin discards this content
  directory: saveDir,
  plugins: [new BodyScraperPlugin(article.content)]
};

const result = await scrape(options);
0 replies
-
Hello @jonocodes
I would recommend running an HTTP server in the directory with the needed files.
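A minimal sketch of that idea, using Node's built-in http, fs, and path modules; the ./site directory, port 8080, and mypage.html are hypothetical placeholders, not anything from website-scraper itself:

const http = require('http');
const fs = require('fs');
const path = require('path');

const root = path.resolve('./site'); // hypothetical directory holding the local files

// Map a few common extensions so responses carry a usable content-type header.
const types = { '.html': 'text/html', '.css': 'text/css', '.js': 'text/javascript', '.png': 'image/png' };

const server = http.createServer((req, res) => {
  // No path-traversal guard here; this is a local-only sketch.
  const filePath = path.join(root, req.url === '/' ? 'index.html' : req.url);
  fs.readFile(filePath, (err, data) => {
    if (err) {
      res.writeHead(404);
      res.end('Not found');
      return;
    }
    res.writeHead(200, { 'Content-Type': types[path.extname(filePath)] || 'application/octet-stream' });
    res.end(data);
  });
});

server.listen(8080, () => {
  // The local page is now reachable over plain HTTP, e.g.:
  // await scrape({ urls: ['http://localhost:8080/mypage.html'], directory: saveDir });
});

Once the server is up, the scraper sees an ordinary HTTP site, so no plugin changes are needed.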
2 replies