Harmless buffer overflow reads reported by valgrind and asan when validating without length set

We’ve recently switched GLib’s UTF-8 validation functions (`g_utf8_validate()` and `g_str_is_ascii()`) to use c-utf8. In doing so, we found that asan (via oss-fuzz) and valgrind report buffer overflow reads from the code. See https://gitlab.gnome.org/GNOME/glib/-/issues/3493

The buffer overflow reads are caused by the SIMD 4-byte read reading 1–3 bytes off the end of the input string buffer. They are harmless, as the 1–3 byte over-read is guaranteed to not cross a page boundary (as those are at least 4-byte aligned), and the read data is only used up to the detected nul termination byte. There are no buffer overflow writes.

These buffer overflow reads probably haven’t been detected in c-utf8 before because, as far as I can tell, its use within systemd is limited to situations where the input string length is known, so the “read until you hit the nul terminator” code path is not taken.

While these buffer overflow reads are harmless, and not a security vulnerability, they introduce noise in valgrind and asan output, potentially hiding actual vulnerabilities, and preventing those tools being used to automatically verify that code is completely free of overflows.

We’ve taken a couple of steps in GLib to try and mitigate these false positives (see the [‘related merge requests’](https://gitlab.gnome.org/GNOME/glib/-/issues/3493#related-merge-requests) for the GLib issue):
 * Annotate the validation functions as `__attribute__((no_sanitize_address))` to turn asan off for them
 * Add an [ifunc](https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-ifunc-function-attribute) to switch to an alternative implementation when running under valgrind, which essentially calls `strlen()` on the input string first, then passes it to the length-is-known codepath, avoiding the buffer overflow read

These mitigations are not perfect (ifuncs [don’t appear to be well supported on musl](https://gitlab.gnome.org/GNOME/glib/-/issues/3511) and appear to [cause crashes when statically linked](https://gitlab.gnome.org/GNOME/glib/-/issues/3526)), but they’re something.

I’m filing this issue to:
 * Make you aware that the same false positives from asan and valgrind will affect c-utf8 if it’s used with a string input length unset
 * Suggest you might want to take similar mitigations, or find better mitigations (we’re still deciding whether our mitigations are actually going to work reliably, or are going to cause more trouble)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Harmless buffer overflow reads reported by valgrind and asan when validating without length set #6

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Harmless buffer overflow reads reported by valgrind and asan when validating without length set #6

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions