Skip to content

Conversation

@stephentoub
Copy link
Member

Fixes #68868

@stephentoub stephentoub added this to the 7.0.0 milestone May 13, 2022
@ghost
Copy link

ghost commented May 13, 2022

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost
Copy link

ghost commented May 13, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #68868

Author: stephentoub
Assignees: stephentoub
Labels:

area-System.Runtime, new-api-needs-documentation

Milestone: 7.0.0

Copy link
Member

@GrabYourPitchforks GrabYourPitchforks May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question: is ((uint)(c - 'A') & ~0x20u) < 26 a more effiicent pattern because it allows folding the MOV and SUB into a single LEA operation? (I don't know the answer to this re: perf, but it is a codegen size savings on x86.)

(Tanner seemed to think it should be left as-is for readability. I'd defer to his expertise in the absence of supporting evidence otherwise.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a sharplab example that shows the lea? I don't see a codegen size difference on x86, but I might be doing it wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using System;

foreach (Char ch in "Hello!")
{
    if (IsAsciiLetter1(ch)) Console.WriteLine(ch);
}

foreach (Char ch in "Hello!")
{
    if (IsAsciiLetter2(ch)) Console.WriteLine(ch);
}

static bool IsAsciiLetter1(char c) => (uint)((c | 0x20) - 'a') <= 'z' - 'a';
static bool IsAsciiLetter2(char c) => (uint)((c - 'A') & ~0x20) < 26;
; Core CLR 6.0.322.12309 on amd64

<Program>$.<Main>$(System.String[])
    L0000: push rdi
    L0001: push rsi
    L0002: push rbp
    L0003: push rbx
    L0004: sub rsp, 0x28
    L0008: mov rcx, 0x19f2a7d1c18
    L0012: mov rsi, [rcx]
    L0015: mov rdi, rsi
    L0018: xor ebx, ebx
    L001a: mov ebp, [rdi+8]
    L001d: test ebp, ebp
    L001f: jle short L0041
    L0021: movsxd rcx, ebx
    L0024: movzx ecx, word ptr [rdi+rcx*2+0xc]
    L0029: mov eax, ecx
    L002b: or eax, 0x20 ; <-- OR between MOV and ADD, preventing combination
    L002e: add eax, 0xffffff9f
    L0031: cmp eax, 0x19
    L0034: ja short L003b
    L0036: call 0x00007ffbe91e41d8
    L003b: inc ebx
    L003d: cmp ebp, ebx
    L003f: jg short L0021
    L0041: mov rdi, rsi
    L0044: xor ebx, ebx
    L0046: test ebp, ebp
    L0048: jle short L0068
    L004a: movsxd rcx, ebx
    L004d: movzx ecx, word ptr [rdi+rcx*2+0xc]
    L0052: lea eax, [rcx-0x41]
    L0055: and eax, 0xffffffdf ; <-- AND after MOV and ADD, allowing them to be combined into LEA
    L0058: cmp eax, 0x1a
    L005b: jae short L0062
    L005d: call 0x00007ffbe91e41d8
    L0062: inc ebx
    L0064: cmp ebp, ebx
    L0066: jg short L004a
    L0068: add rsp, 0x28
    L006c: pop rbx
    L006d: pop rbp
    L006e: pop rsi
    L006f: pop rdi
    L0070: ret

<Program>$.<<Main>$>g__IsAsciiLetter1|0_0(Char)
    L0000: movzx eax, cx
    L0003: or eax, 0x20
    L0006: add eax, 0xffffff9f
    L0009: cmp eax, 0x19
    L000c: setbe al
    L000f: movzx eax, al
    L0012: ret

<Program>$.<<Main>$>g__IsAsciiLetter2|0_1(Char)
    L0000: movzx eax, cx
    L0003: add eax, 0xffffffbf
    L0006: and eax, 0xffffffdf
    L0009: cmp eax, 0x1a
    L000c: setb al
    L000f: movzx eax, al
    L0012: ret

Copy link
Member

@GrabYourPitchforks GrabYourPitchforks May 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably outside the scope of this PR, but we should probably change the FromChar helper to return -1 instead of 255 on failure. It would also simplify lots of consuming code since they could just check "< 0" and take advantage of all the magic bit-twiddling that comes along with it.

Sample: GrabYourPitchforks@f4f546c

I can submit once your PR is finished.

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented May 16, 2022

My signoff is still valid for iteration 89413a6.

@GrabYourPitchforks
Copy link
Member

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stephentoub stephentoub merged commit 9260c24 into dotnet:main May 18, 2022
@stephentoub stephentoub deleted the charhelpers branch May 18, 2022 15:04
Copy link
Contributor

@dakersnar dakersnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit late to the party but I really like this API! As a developer, remembering the manual bit math for char range checking is really annoying. Looks good!

internal static bool IsValidDriveChar(char value)
{
return (value >= 'A' && value <= 'Z') || (value >= 'a' && value <= 'z');
return (uint)((value | 0x20) - 'a') <= (uint)('z' - 'a');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use the new API, or is it too low level to be dependent on it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this file is compiled into a library that has a pre-.NET 7 target. It wasn't worth ifdef'ing it to use the new API in some builds and the open-coded version in others.

@ghost ghost locked as resolved and limited conversation to collaborators Jun 17, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[API Proposal]: char helpers for common ASCII categories

5 participants