Skip to content

Conversation

josecelano
Copy link
Member

@josecelano josecelano commented Jul 4, 2023

  • Purify HTML to avoid potential XSS attacks.
  • Remove external URLS to protect users' privacy.

Details

The sanitize function:

  • Removes harmful code like the one in the example.
  • Remove internal links. It removes the "href" attribute.
  • Replace image sources with the direct base64 image data from the backend image proxy, only for valid image formats. Invalid format images are removed from the DOM.
  • Valid images are images whose source URL ends with a valid extension: ["png", "PNG", "jpg", "JPG", "jpeg", "JPEG"]. The backend is supposed to support only PNG and JPG images.
  • Other HTML tags like <embed> are removed.

The sanitize function is being tested with this sample torrent description:

Harmful script (the button show not show the alert):

<p>Click the button to display an alert box.</p>

<button onclick="alert('Hello! I am an alert box!')">Try it</button>

Valid PNG image in markdown format (it should show the image using the proxy)

![Torrust Logo](https://raw.githubusercontent.com/torrust/torrust-index-backend/develop/docs/media/torrust_logo.png)

Another valid (JPG) image in markdown format (it should show the image using the proxy)

![Mandelbrot Set](https://upload.wikimedia.org/wikipedia/commons/2/21/Mandel_zoom_00_mandelbrot_set.jpg)

Valid image (PNG) in html format (it should show the image using the proxy):

<img src="https://raw.githubusercontent.com/torrust/torrust-index-backend/develop/docs/media/torrust_logo.png">

Invalid image (TIF) in html format (it should remove the image):

<img src="https://commons.wikimedia.org/wiki/Category:TIFF_files#/media/File:Arkansas_Constitution_of_1836,_page_1.tif">

Invalid html link (it should remove the href attribute value):

<a href="https://commons.wikimedia.org/wiki/Category:TIFF_files#/media/File:Arkansas_Constitution_of_1836,_page_1.tif">Arkansas Constitution of 1836. Page 1</a>

<embed type="video/webm"
       src="/media/cc0-videos/flower.mp4"
       width="250"
       height="200">

Todo

  • Basic sanitize function
  • Remove other HTML tags that can contain external sources like <video>
  • Add tests: unit and E2E. This could be hard to test as base64 encoded images are big strings. Maybe with a very small image.
  • Refactor: extract sanitize function from component to a service

I think these are the HTML tags that can contain sources linking to external resources:

  • <a>: The href attribute can contain an external URL.
  • <img>: The src attribute can point to an external image URL.
  • <script>: The src attribute can point to an external JavaScript file.
  • <link>: Used for linking CSS files, favicons, etc., the href attribute can contain an external URL.
  • <iframe>: The src attribute can contain an external URL to embed content from another site.
  • <object> and <embed>: These tags are used to embed multimedia content like Flash or PDFs, and their data and src attributes respectively can contain external URLs.
  • <audio> and <video>: The src attribute can contain an external URL to a media file.
  • <source>: This tag, used inside ,
  • <form>: The action attribute can contain an external URL where the form data is sent when submitted.
  • <meta http-equiv="refresh" content="0; url=http://example.com/" />: This meta tag can be used to redirect to an external URL.

The dompurifier only removes unsafe code, but we also want to remove external links to avoid tracking the users.

To sanitize markdown in torrent description.
- Purify HTML to avoid potencial XSS attacks.
- Remove external URLS to protect users' privacy.
@josecelano
Copy link
Member Author

josecelano commented Jul 4, 2023

@da2ce7 maybe we can remove all HTML tags except <img> tags and other tags converted from markdown, for example, the markdown syntax for bold text: ** -> <strong>.

Here are some typical mappings:

- Headers (# through ######) convert to <h1> through <h6>.
- Emphasis (* or _) converts to <em>.
- Strong emphasis (** or __) converts to <strong>.
- Strikethrough (~~) converts to <del>.
- Links ([text](url)) convert to <a href="url">text</a>.
- Images (![alt text](url)) convert to <img src="url" alt="alt text">.
- Unordered lists (*, -, or +) convert to <ul> with <li> elements.
- Ordered lists (1., 2., etc.) convert to <ol> with <li> elements.
- Inline code (`code`) converts to <code>.
- Code blocks (javascript ` `` or python etc.) convert to<pre><code>`.
- Blockquotes (> or >>) convert to <blockquote>.
- Horizontal rules (---, ***, or ___) convert to <hr>.

…description

We allow only HTML tags that are used when converting from markdown to
HTML. We want to avoid using html tag that might contain external
sources ("src" attributes) which can be used to track users.
Decouple function from Vue component so that:

- They can be tested independently
- They can be reused easily in other components
@josecelano
Copy link
Member Author

I'm going to close this PR as it's because it's near impossible to setup unit testing for Nuxt+TypeScript+Jest; I'm getting a never-ending stream of errors :-. I will open a different issue to setup the configuration for unit testing.

@josecelano josecelano merged commit d3ae471 into torrust:develop Jul 5, 2023
@josecelano
Copy link
Member Author

I'm going to close this PR as it's because it's near impossible to setup unit testing for Nuxt+TypeScript+Jest; I'm getting a never-ending stream of errors :-. I will open a different issue to setup the configuration for unit testing.

I've opened a PR #159 with my attempt to set up unit testing scaffolding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Review lint warning: warning 'v-html' directive can lead to XSS attack vue/no-v-html

1 participant