Skip to content

Conversation

@Andrew-William-Smith
Copy link

@Andrew-William-Smith Andrew-William-Smith commented Sep 23, 2024

Following largely the same approach as #1, this patch fixes the improper string termination formerly performed for BadPattern exception messages, which are sourced from pcre2_get_error_message(). It also increases the size of the buffer into which these error messages are copied, choosing the same constant of 256 as used by most other open-source projects (as found by a quick survey of uses of this function on GitHub): currently, the maximum length of any of PCRE2's error messages is 92 bytes including the null terminator, so this change moves us well out of the range of our buffer being too small.

Prior to this patch:

# Pcre2.regexp "(";;
Exception:
Pcre2.Error
 (Pcre2.BadPattern
   ("missing closing parenthesis\000\001\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000\b\000\000\000\000\000\000�»�\004\001\000\000\000\003\000\000\000\000\000\000\000ü\004\000\000\000\000\000\000A\000\000\000\000\000\000\006\000\004\000\000\000\000\000\000",
   1)).

With this patch:

# Pcre2.regexp "(";;
Exception: Pcre2.Error (Pcre2.BadPattern ("missing closing parenthesis", 1)).

Previously, we would overallocate the string passed back to the OCaml
runtime containing the PCRE2 error message; now, we copy the raw error
to an intermediate buffer on the stack before allocating an OCaml string
of exactly the proper length. This change both fixes the absence of the
string terminator and consumes less memory overall.

Additionally, I've increased the size of the error message buffer to
256; while 128 is sufficient for the current set of errors supported by
PCRE2, the maximum length of any of which is 92 including the NUL
terminator, 256 gives us some breathing room should longer messages
begin to be used. This length is also a form of unofficial standard for
PCRE error message buffer sizes, being used in the Julia runtime and
other high-profile open-source applications. Since the OCaml string we
allocate will be exactly the length of the actual error message, the
impact this additional stack allocation has is negligible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant