Documentation

Maharishi edited this page Jun 22, 2025 · 2 revisions

Function: url_to_llm_text(url)

url (str): URL to scrape.
keep_images (bool): Keep image links. If False, image links are removed from the text, which saves tokens when the LLM does not need them. Default: True
remove_svg_image (bool): Remove .svg image links. SVG files are usually not needed when scraping. Default: True
remove_gif_image (bool): Remove .gif image links. GIF files are usually not needed when scraping. Default: True
remove_image_types (list): Image extensions to remove, given as a list, e.g. ['.png', '.jpg']. Default: []
keep_webpage_links (bool): Keep webpage links. If the scraping job does not need links, set this to False to reduce the input token count for the LLM. Default: True
remove_script_tag (bool): Remove script tags. Default: True
remove_style_tag (bool): Remove style tags. Default: True
remove_tags (list): List of tags to remove. Default: []
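
The image options above interact, so a small sketch may help show how they could combine. This is not the library's implementation: `should_keep_image` is a hypothetical helper written only for illustration, and it assumes that `keep_images=False` overrides the other image flags and that extensions are matched case-insensitively on the URL path.

```python
from posixpath import splitext
from urllib.parse import urlparse


def should_keep_image(image_url, keep_images=True, remove_svg_image=True,
                      remove_gif_image=True, remove_image_types=None):
    """Hypothetical helper: decide whether an image link survives filtering."""
    # Assumption: keep_images=False drops every image link outright.
    if not keep_images:
        return False

    # Collect the extensions to drop, normalised to lower case.
    blocked = {ext.lower() for ext in (remove_image_types or [])}
    if remove_svg_image:
        blocked.add(".svg")
    if remove_gif_image:
        blocked.add(".gif")

    # Inspect only the URL path so a query string does not hide the extension.
    extension = splitext(urlparse(image_url).path)[1].lower()
    return extension not in blocked


if __name__ == "__main__":
    links = [
        "https://example.com/logo.svg",
        "https://example.com/photo.png",
        "https://example.com/banner.jpg?w=800",
    ]
    # With the documented defaults plus remove_image_types=['.jpg'],
    # only the .png link is kept.
    print([u for u in links if should_keep_image(u, remove_image_types=[".jpg"])])
```

Under these assumptions, the defaults drop .svg and .gif links while keeping other image types, and `remove_image_types` extends that block list.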