@Wauplin Wauplin commented Aug 29, 2023

Related to #373, which broke the docs in huggingface_hub (see #373 (comment)). The issue is caused by a docstring that contains both a "<" and a ">" that do not form an HTML tag (e.g. "... <5MB ... something -> something else"). This PR fixes it, hopefully without breaking other docs.

In detail:

  • Update _re_lt_html:
    • remove re.DOTALL (unused in the regex)
    • add re.IGNORECASE
    • update from \w+ to [a-z] after the first "<": the start of a tag must be a letter, not any alphanumeric character. This way strings like "<5MB" are no longer captured, which fixes the huggingface_hub issue.
  • Move _re_lt_html and _re_lcub_svelte regexes outside of convert_md_to_mdx to compile them only once.
  • Add a regression test.
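
To illustrate the change described above, here is a minimal sketch. The two patterns are simplified, hypothetical stand-ins and not the actual `_re_lt_html` definitions from this PR; `tag_names` and `DOCSTRING` are likewise made up for the example. It only shows why requiring a letter after "<" stops "<5MB ... ->" from being read as a tag, and that compiling at module level happens once at import time.

```python
import re

# Hypothetical, simplified patterns for illustration only; these are NOT
# the real `_re_lt_html` regexes. Compiled at module level so that
# compilation happens once, not on every call.

# Old behaviour: the tag name may start with any word character (\w),
# so "<5MB ... ->" looks like an opening tag named "5MB".
RE_TAG_OLD = re.compile(r"<(\w+)[^>]*>")

# New behaviour: the tag name must start with a letter.
# re.IGNORECASE lets [a-z] match uppercase letters too (e.g. "<Tip>").
RE_TAG_NEW = re.compile(r"<([a-z]\w*)[^>]*>", re.IGNORECASE)


def tag_names(pattern: re.Pattern, text: str) -> list[str]:
    """Return the names of everything `pattern` treats as an HTML tag."""
    return pattern.findall(text)


# A docstring with a bare "<" and a later "->" that do not form a tag.
DOCSTRING = "Files <5MB are sent in one request -> no chunking. See <Tip> below."

print(tag_names(RE_TAG_OLD, DOCSTRING))  # ['5MB', 'Tip']
print(tag_names(RE_TAG_NEW, DOCSTRING))  # ['Tip']
```

With the old pattern, "<5MB are sent in one request ->" is captured as a single bogus tag; with the new one, only `<Tip>` is recognised.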

@Wauplin Wauplin requested review from mishig25 and xenova August 29, 2023 13:20
@mishig25 mishig25 left a comment

lgtm!

Wauplin commented Aug 29, 2023

Thanks for the quick review, @mishig25!

@Wauplin Wauplin merged commit c8152d4 into main Aug 29, 2023
@Wauplin Wauplin deleted the fix-html-escape-regex branch August 29, 2023 13:34
@mishig25

@Wauplin please let me know if the hub client docs CI passes now 👍

Wauplin commented Aug 29, 2023

@mishig25 Just confirmed in https://github.com/huggingface/huggingface_hub/actions/runs/6013363450/job/16310680128: CI is green again 🎉
