
@mishig25 (Contributor) commented Sep 23, 2023

Problem description

`<` and `{` are special characters in Svelte syntax, so you can't write doc content like `4<5` directly. To work around this, our Python code escaped those characters with their HTML entities (`&lt;` and `&lcub;`) using regexes. These regexes were slow, which was fixed in #373 and #394. However, there were still some errors.

Solution

Fix it at the Markdown/mdsvex AST level using a custom remark plugin.
AST Playground

Description of the custom remark plugin

While the AST is being traversed, if a node's type is `"text"` (rather than `"html"`), then `<` and `{` are escaped as `&#60;` and `&#123;`. This solution works great! The only catch is that we lose the full capability of Svelte syntax: `{4+5}` is rendered as the literal string `{4+5}` rather than evaluated as the JS expression `9`. I think that is fine, since we don't have any inline JS expressions in the docs' mdx files. Custom components (such as `<Tip>`, `<Question>`) should still work as expected.
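The traversal described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual plugin: it assumes a simplified mdast-like node shape (`MdNode`) and a hand-rolled recursive walk instead of a tree-walking library, and the names `escapeSvelteChars`, `escapeTextNodes` and `remarkEscapeSvelteChars` are hypothetical. It also only shows the tree rewrite; how the escaped values pass through the rest of the mdsvex pipeline is not covered here.

```typescript
// Simplified mdast-like node (assumption: only the fields this sketch needs).
interface MdNode {
  type: string;
  value?: string;
  children?: MdNode[];
}

// Replace the Svelte-significant characters with numeric HTML entities.
// Order does not matter here: neither entity contains "<" or "{".
function escapeSvelteChars(text: string): string {
  return text.replace(/</g, "&#60;").replace(/\{/g, "&#123;");
}

// Recursively walk the tree and rewrite only "text" nodes, so "html" nodes
// such as <Tip> or <Question> are left untouched and still act as components.
function escapeTextNodes(node: MdNode): void {
  if (node.type === "text" && typeof node.value === "string") {
    node.value = escapeSvelteChars(node.value);
  }
  for (const child of node.children ?? []) {
    escapeTextNodes(child);
  }
}

// Hypothetical remark-style plugin: returns a transformer that mutates the tree in place.
function remarkEscapeSvelteChars() {
  return (tree: MdNode): void => {
    escapeTextNodes(tree);
  };
}

// Usage: a paragraph containing "4<5" and "{4+5}", next to a <Tip> html node.
const tree: MdNode = {
  type: "root",
  children: [
    { type: "paragraph", children: [{ type: "text", value: "4<5 and {4+5}" }] },
    { type: "html", value: "<Tip>" },
  ],
};
remarkEscapeSvelteChars()(tree);
console.log(tree.children![0].children![0].value); // 4&#60;5 and &#123;4+5}
console.log(tree.children![1].value); // <Tip>
```

Because only `"text"` nodes are rewritten, `{4+5}` ends up as literal text rather than a Svelte expression, which matches the trade-off described above.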

As a side effect, this PR should also decrease build time, since it removes many regex checks in Python and reuses the AST traversal that mdsvex already performs.

todo:

mishig25 added a commit to huggingface/course that referenced this pull request Sep 23, 2023
mishig25 added a commit to huggingface/transformers that referenced this pull request Sep 24, 2023
@mishig25 mishig25 marked this pull request as ready for review September 24, 2023 15:03
@mishig25 mishig25 merged commit 5869a5a into main Sep 24, 2023
@mishig25 mishig25 deleted the remark_escape_chars branch September 24, 2023 15:06
@Wauplin (Contributor) commented Sep 25, 2023

Really nice, thanks @mishig25 ! 🚀

@coyotte508 (Member) commented:

Thanks, will help a lot for huggingface.js too iirc
