This repository was archived by the owner on Feb 18, 2025. It is now read-only.

Description
EncodeForRegExpEscape step 4.e (which would be reached if input c were a Space_Separator supplementary code point in [U+10000, U+10FFFF]) results in a return value like \u{…}. The interpretation of such pattern text depends on the regular expression flags. In the presence of a "u" or "v" flag, it is interpreted as a |RegExpUnicodeEscapeSequence| that matches a code point with the contained hexadecimal value. Otherwise, it is interpreted as either a syntax error or (only in a host supporting Annex B, and only when the hexadecimal representation of the code point consists solely of decimal digits) as a quantified |ExtendedAtom| "u" with the specified decimal count of repetitions (e.g., /^\u{10000}$/.test("u".repeat(10000)) is true).
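To make the flag dependence concrete, here is a minimal sketch that compiles the same pattern source with and without the "u" flag. The variable names are illustrative only, and the outcome without flags depends on whether the host implements Annex B, so that branch is wrapped in a try/catch rather than assumed.

```javascript
// The same pattern source, compiled with and without a Unicode flag.
const patternSource = "^\\u{10000}$";

// With "u", \u{10000} is a RegExpUnicodeEscapeSequence matching U+10000.
const unicodeMode = new RegExp(patternSource, "u");
const matchesCodePoint = unicodeMode.test("\u{10000}");
const matchesRepeatedU = unicodeMode.test("u".repeat(10000));

// Without "u" or "v", the result is host-dependent: a host without Annex B
// should throw a SyntaxError, while an Annex B host reads the pattern as the
// letter "u" followed by the quantifier {10000}.
let legacyResult;
try {
  legacyResult = new RegExp(patternSource).test("u".repeat(10000));
} catch (e) {
  legacyResult = e instanceof SyntaxError ? "SyntaxError" : "unexpected error";
}

console.log({ matchesCodePoint, matchesRepeatedU, legacyResult });
```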
Rather than returning results subject to conditional interpretation, EncodeForRegExpEscape should return a \u…\u… surrogate pair |RegExpUnicodeEscapeSequence| for such inputs. Such escapes work in both Unicode and non-Unicode regular expressions, e.g. /^\uD834\uDF06$/u.test("𝌆") and /^\uD834\uDF06$/v.test("𝌆") and /^\uD834\uDF06$/.test("𝌆") are all true.
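A rough sketch of what the surrogate pair form could look like follows. The helper name `surrogatePairEscape` is hypothetical and is not taken from the proposal's spec text; it only illustrates the standard UTF-16 split of a supplementary code point into lead and trail code units.

```javascript
// Hypothetical helper (not part of the proposal): encode a supplementary
// code point as two \uXXXX escapes, one per UTF-16 surrogate code unit.
function surrogatePairEscape(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("expected a code point in [U+10000, U+10FFFF]");
  }
  const offset = codePoint - 0x10000;
  const lead = 0xD800 + (offset >> 10);     // high 10 bits of the offset
  const trail = 0xDC00 + (offset & 0x3FF);  // low 10 bits of the offset
  const hex = (unit) => unit.toString(16).toUpperCase().padStart(4, "0");
  return `\\u${hex(lead)}\\u${hex(trail)}`;
}

const escaped = surrogatePairEscape(0x1D306); // pattern text for U+1D306 (𝌆)

// The same escaped text matches with no flags and with "u".
const results = ["", "u"].map((flags) =>
  new RegExp(`^${escaped}$`, flags).test("\u{1D306}")
);
console.log(escaped, results);
```

The "v" flag is left out of the loop only because older hosts reject it as an unknown flag; in hosts that support it, the escaped text is expected to match there as well, as the examples above state.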
Or alternatively (and preferably IMO), EncodeForRegExpEscape should not escape all white space. I'm not certain why it does so right now, but looking back I suspect it is due to a misinterpretation of #30 (which requests escaping of control characters, and even more specifically line terminators—and even that isn't necessary).