Improve performance by removing SHA #618

@za3k

The performance could be faster.

When I profile my blog build, about 60% of the time spent in markdown2.py is spent running SHA hashes.

Could you explain the logic of _hash_html_block_sub and _hash_text generally? Why are we even running SHA inside a markdown converter?

It looks like this is some sort of escape mechanism, maybe? That is, we generate a key, replace the HTML with the key (so it doesn't look like HTML to other stages of the parser that should ignore it), do some processing on the surrounding text, and finally replace all the keys with the original HTML?
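A minimal sketch of the escape mechanism guessed at above, to make the question concrete. This is not markdown2's actual code: `protect_html`, `restore_html`, `convert`, the `<div>` pattern and the emphasis rule are all hypothetical stand-ins, and `secrets.token_hex` is used here as one way to get a random key.

```python
import re
import secrets

# Hypothetical pattern: treat any <div>...</div> as a raw HTML block.
_HTML_BLOCK = re.compile(r"<div>.*?</div>", re.DOTALL)


def protect_html(text: str, stash: dict[str, str]) -> str:
    """Replace each HTML block with an opaque key and remember the original."""
    def _sub(match: re.Match) -> str:
        key = "md5-" + secrets.token_hex(16)  # 32 random hex digits
        stash[key] = match.group(0)
        return key
    return _HTML_BLOCK.sub(_sub, text)


def restore_html(text: str, stash: dict[str, str]) -> str:
    """Swap every key back for the HTML it stands for."""
    for key, html in stash.items():
        text = text.replace(key, html)
    return text


def convert(text: str) -> str:
    stash: dict[str, str] = {}
    text = protect_html(text, stash)
    # Stand-in for the real Markdown processing: *x* becomes <em>x</em>.
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return restore_html(text, stash)
```

In this sketch the key only needs to be unique and unlikely to appear in the input, because the original HTML is looked up in `stash` rather than recomputed from the key. Whether markdown2 relies on the key being derivable from the text is exactly the open question.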

If so, generating a random string rather than a hash would serve just as well?

Ex.

from hashlib import sha256
def _hash_text(s: str) -> str:
    return 'md5-' + sha256(SECRET_SALT + s.encode("utf-8")).hexdigest()[32:]

could be replaced by the much faster

import random
hex_digits = "0123456789abcdef"
def _hash_text(s: str) -> str:
    # The input is ignored; the key is just 32 random hex digits.
    return 'md5-' + ''.join(random.choice(hex_digits) for _ in range(32))

for a quick fix.

(Rough estimate: if hashing is about 60% of the time and the replacement costs almost nothing, Markdown conversion becomes about 1 / (1 - 0.6) = 2.5X faster.)
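The speedup is worth measuring rather than assuming. Below is a small `timeit` sketch comparing the two variants shown above. `SECRET_SALT` is given a fixed placeholder value here (its real definition in markdown2 is not shown in this issue), and `hash_key`, `random_key` and `compare` are hypothetical names.

```python
import random
import timeit
from hashlib import sha256

# Assumption: placeholder salt, standing in for markdown2's SECRET_SALT.
SECRET_SALT = b"example-salt"
HEX_DIGITS = "0123456789abcdef"


def hash_key(s: str) -> str:
    """Current approach: salted SHA-256, keeping the last 32 hex digits."""
    return "md5-" + sha256(SECRET_SALT + s.encode("utf-8")).hexdigest()[32:]


def random_key(s: str) -> str:
    """Proposed approach: 32 random hex digits, independent of the input."""
    return "md5-" + "".join(random.choice(HEX_DIGITS) for _ in range(32))


def compare(sample: str, number: int = 10_000) -> dict[str, float]:
    """Return the seconds taken by `number` calls of each variant."""
    return {
        "sha256": timeit.timeit(lambda: hash_key(sample), number=number),
        "random": timeit.timeit(lambda: random_key(sample), number=number),
    }


if __name__ == "__main__":
    # Try a short span, a medium block, and a large HTML block.
    for size in (16, 1_000, 100_000):
        print(size, compare("x" * size, number=1_000))
```

The hash cost grows with the length of the input while the random key does not, so the result probably depends on how large the hashed blocks in a given document are. Note also that `hash_key` returns the same key for the same text, whereas `random_key` does not, which matters if any caller expects repeatable keys.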
