Char.GetUnicodeCategory returns wrong category for certain Latin-1 characters #10990

@GrabYourPitchforks

Description

In a nutshell, there are certain characters where CharUnicodeInfo.GetUnicodeCategory returns the correct value, but Char.GetUnicodeCategory returns the wrong value. One such character is U+00B6 PILCROW SIGN, where CharUnicodeInfo returns OtherPunctuation (which is correct) and where Char returns OtherSymbol (which is incorrect). This also affects the behavior of dependent methods like Char.IsPunctuation and Char.IsLower.
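The discrepancy described above can be checked with a small console program. This is a minimal sketch, not a definitive repro: the `Program` class and variable names are only for illustration, and the printed values depend on the runtime version in use, so no expected output is shown.

```csharp
using System;
using System.Globalization;

class Program
{
    static void Main()
    {
        char pilcrow = '\u00B6'; // U+00B6 PILCROW SIGN

        // Ask both APIs for the category of the same character.
        UnicodeCategory viaChar = char.GetUnicodeCategory(pilcrow);
        UnicodeCategory viaInfo = CharUnicodeInfo.GetUnicodeCategory(pilcrow);

        Console.WriteLine($"Char.GetUnicodeCategory:            {viaChar}");
        Console.WriteLine($"CharUnicodeInfo.GetUnicodeCategory: {viaInfo}");

        // Char.IsPunctuation is one of the dependent methods mentioned above.
        Console.WriteLine($"Char.IsPunctuation:                 {char.IsPunctuation(pilcrow)}");

        // False on runtimes that exhibit the behavior reported in this issue.
        Console.WriteLine($"Results agree:                      {viaChar == viaInfo}");
    }
}
```

On a runtime that exhibits the reported behavior, the first two lines would show different categories for the same character.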

MSDN says this behavior is intentional to preserve back-compat, but it is extraordinarily confusing for two methods with the same name to behave differently.

One solution would be to update Char.GetUnicodeCategory to stay in sync with CharUnicodeInfo.GetUnicodeCategory. This is a breaking change, but it's the type of breaking change that is normally allowed in side-by-side major version updates.

An alternative is to mark Char.GetUnicodeCategory, Char.IsPunctuation, Char.IsLower, etc. as obsolete and to direct users to call into CharUnicodeInfo instead. This preserves existing behavior and provides a migration story that moves developers onto the APIs that provide correct results.
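As a rough illustration of what that migration could look like in calling code, the sketch below derives the punctuation and lowercase checks from CharUnicodeInfo.GetUnicodeCategory. The `UnicodeChar` helper class is hypothetical and not part of .NET; it only assumes that "punctuation" means the seven punctuation members of the `UnicodeCategory` enum and that "lower" means `LowercaseLetter`.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not an existing .NET type.
static class UnicodeChar
{
    // True when CharUnicodeInfo reports any of the punctuation categories.
    public static bool IsPunctuation(char c)
    {
        switch (CharUnicodeInfo.GetUnicodeCategory(c))
        {
            case UnicodeCategory.ConnectorPunctuation:
            case UnicodeCategory.DashPunctuation:
            case UnicodeCategory.OpenPunctuation:
            case UnicodeCategory.ClosePunctuation:
            case UnicodeCategory.InitialQuotePunctuation:
            case UnicodeCategory.FinalQuotePunctuation:
            case UnicodeCategory.OtherPunctuation:
                return true;
            default:
                return false;
        }
    }

    // True when CharUnicodeInfo reports the lowercase letter category.
    public static bool IsLower(char c) =>
        CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.LowercaseLetter;
}

class Program
{
    static void Main()
    {
        char pilcrow = '\u00B6'; // U+00B6 PILCROW SIGN
        Console.WriteLine(UnicodeChar.IsPunctuation(pilcrow));
        Console.WriteLine(UnicodeChar.IsLower('a'));
    }
}
```

Routing every check through one category lookup keeps the helpers consistent with CharUnicodeInfo by construction, which is the property the obsoletion proposal is after.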

Metadata

    Labels

    area-System.Globalization
    breaking-change: Issue or PR that represents a breaking API or functional change over a prerelease.
    enhancement: Product code improvement that does NOT require public API changes/additions
