
utf8.read function producing wrong strings #1473

@masad-frost


protobuf.js version: 6.10.1

The utf8.read function sometimes seems to insert extra Unicode characters.

Here's a repro (https://repl.it/@masfrost/pbjs-bad-decode) where I check utf8.read against the WHATWG TextEncoder. The repro file is not very minimal, but this issue seems to be pretty common for us when decoding strings.
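For anyone who wants to check this without the repl, a differential test along these lines decodes the UTF-8 bytes of many sample strings and compares each result against the WHATWG TextDecoder. This is a minimal sketch: `candidateDecode` is a hypothetical stand-in (a plain decoder for well-formed UTF-8) that you would swap for the function under test, for example a wrapper that calls utf8.read(bytes, start, end), assuming that signature. The sample strings are also just assumptions; a few are long so that any internal chunking in a decoder gets exercised.

```typescript
// Signature of the decoder under test: decode bytes[start, end) to a string.
type Decoder = (bytes: Uint8Array, start: number, end: number) => string;

const reference = new TextDecoder("utf-8");

// Hypothetical candidate: a straightforward decoder for well-formed UTF-8.
// Replace it with the implementation you actually want to test.
function candidateDecode(bytes: Uint8Array, start: number, end: number): string {
  const units: number[] = [];
  let i = start;
  while (i < end) {
    const b = bytes[i++];
    let cp: number;
    if (b < 0x80) {
      cp = b; // 1-byte sequence (ASCII)
    } else if (b < 0xe0) {
      cp = ((b & 0x1f) << 6) | (bytes[i++] & 0x3f); // 2-byte sequence
    } else if (b < 0xf0) {
      cp = ((b & 0x0f) << 12) | ((bytes[i++] & 0x3f) << 6) | (bytes[i++] & 0x3f);
    } else {
      cp = ((b & 0x07) << 18) | ((bytes[i++] & 0x3f) << 12) |
           ((bytes[i++] & 0x3f) << 6) | (bytes[i++] & 0x3f);
    }
    if (cp > 0xffff) {
      // Code points above the BMP become a UTF-16 surrogate pair.
      cp -= 0x10000;
      units.push(0xd800 + (cp >> 10), 0xdc00 + (cp & 0x3ff));
    } else {
      units.push(cp);
    }
  }
  // Convert in slices to stay well below engine argument-count limits.
  let out = "";
  for (let k = 0; k < units.length; k += 4096) {
    out += String.fromCharCode(...units.slice(k, k + 4096));
  }
  return out;
}

// Returns the first sample whose decoding differs from TextDecoder, or null.
function findMismatch(decode: Decoder, samples: string[]): string | null {
  const encoder = new TextEncoder();
  for (const s of samples) {
    const bytes = encoder.encode(s);
    const expected = reference.decode(bytes);
    const actual = decode(bytes, 0, bytes.length);
    if (actual !== expected) return s;
  }
  return null;
}

// Mix of 1-, 2-, 3- and 4-byte sequences, including long strings.
const samples: string[] = [
  "hello",
  "héllo wörld",
  "日本語",
  "emoji 😀🎉",
  "a".repeat(8191) + "😀" + "b".repeat(9000),
  "😀".repeat(20000),
];

console.log(findMismatch(candidateDecode, samples)); // null when every sample agrees
```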

FYI for future readers: we monkey-patched the library to force it to use TextDecoder/TextEncoder here: https://github.com/replit/crosis/blob/v5.0.3/src/fixUtf8.ts
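The linked fixUtf8.ts is the actual patch; the general idea of routing UTF-8 reads and writes through the standard codecs can be sketched independently as below. The helper names (`utf8Read`, `utf8Write`, `utf8Length`) and their argument shapes are hypothetical, chosen to resemble a hand-rolled utf8 helper; how you wire them into the library depends on what you are patching. Note that TextDecoder substitutes U+FFFD for malformed byte sequences by default.

```typescript
// Hypothetical drop-in helpers backed by the WHATWG codecs.
const decoder = new TextDecoder("utf-8");
const encoder = new TextEncoder();

// Decode bytes[start, end) into a JS string. subarray() creates a view,
// so no bytes are copied before decoding.
function utf8Read(buffer: Uint8Array, start: number, end: number): string {
  return decoder.decode(buffer.subarray(start, end));
}

// Number of bytes the UTF-8 encoding of `s` occupies.
function utf8Length(s: string): number {
  return encoder.encode(s).length;
}

// Encode `s` into `buffer` starting at `offset`; returns the number of bytes
// written. The caller must size `buffer` using utf8Length().
function utf8Write(s: string, buffer: Uint8Array, offset: number): number {
  const bytes = encoder.encode(s);
  buffer.set(bytes, offset);
  return bytes.length;
}

const text = "héllo 😀";
const buf = new Uint8Array(utf8Length(text));
const n = utf8Write(text, buf, 0);
console.log(utf8Read(buf, 0, n) === text); // true
```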

I think using the standard TextEncoder/TextDecoder might be the best thing to do here: encoding is just too complicated, and I'm sure these standard libraries are faster. Happy to put up a PR if that's an option; otherwise, I don't really have enough time to go spelunking into UTF-8 land.
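The issue doesn't pin down what goes wrong inside utf8.read. As an illustration only of how hand-rolled decoding can insert an extra character, here is a hypothetical chunked string builder (not protobuf.js code). Code units are staged in a reused array and flushed once at least CHUNK units are staged; a surrogate pair can overshoot that limit by one slot, and flushing the whole array instead of its first `i` entries later re-emits the stale slot. CHUNK is deliberately tiny so the effect is visible on a short input.

```typescript
// Hypothetical chunked UTF-16 builder, for illustration only.
const CHUNK = 4; // tiny on purpose; real decoders use much larger chunks

function buildString(codePoints: number[], truncateOnFlush: boolean): string {
  const parts: string[] = [];
  const chunk: number[] = []; // reused staging array, never shrunk
  let i = 0; // number of live entries in `chunk`
  for (let cp of codePoints) {
    if (cp > 0xffff) {
      // A supplementary character needs a surrogate pair (two code units),
      // so `chunk` can grow one slot past CHUNK here.
      cp -= 0x10000;
      chunk[i++] = 0xd800 + (cp >> 10);
      chunk[i++] = 0xdc00 + (cp & 0x3ff);
    } else {
      chunk[i++] = cp;
    }
    if (i >= CHUNK) {
      // Bug when truncateOnFlush is false: the whole array is flushed,
      // including any stale slot left over from an earlier, longer chunk.
      const staged = truncateOnFlush ? chunk.slice(0, i) : chunk;
      parts.push(String.fromCharCode(...staged));
      i = 0;
    }
  }
  // Final flush only ever emits the live entries.
  parts.push(String.fromCharCode(...chunk.slice(0, i)));
  return parts.join("");
}

const input = Array.from("abc😀defg", (c) => c.codePointAt(0)!);
console.log(buildString(input, false).length); // 10: one extra code unit
console.log(buildString(input, true) === "abc😀defg"); // true
```

With `truncateOnFlush` set to false, "abc😀defg" comes back with a stray low surrogate (U+DE00) appended; slicing to the first `i` entries on every flush fixes it.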
