-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Add more char.Is* helpers for common ASCII sets #69318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
|
Tagging subscribers to this area: @dotnet/area-system-runtime Issue DetailsFixes #68868
|
src/libraries/System.Net.HttpListener/src/System/Net/Managed/HttpListenerRequest.Managed.cs
Outdated
Show resolved
Hide resolved
...libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open question: is ((uint)(c - 'A') & ~0x20u) < 26 a more effiicent pattern because it allows folding the MOV and SUB into a single LEA operation? (I don't know the answer to this re: perf, but it is a codegen size savings on x86.)
(Tanner seemed to think it should be left as-is for readability. I'd defer to his expertise in the absence of supporting evidence otherwise.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a sharplab example that shows the lea? I don't see a codegen size difference on x86, but I might be doing it wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using System;
foreach (Char ch in "Hello!")
{
if (IsAsciiLetter1(ch)) Console.WriteLine(ch);
}
foreach (Char ch in "Hello!")
{
if (IsAsciiLetter2(ch)) Console.WriteLine(ch);
}
static bool IsAsciiLetter1(char c) => (uint)((c | 0x20) - 'a') <= 'z' - 'a';
static bool IsAsciiLetter2(char c) => (uint)((c - 'A') & ~0x20) < 26;; Core CLR 6.0.322.12309 on amd64
<Program>$.<Main>$(System.String[])
L0000: push rdi
L0001: push rsi
L0002: push rbp
L0003: push rbx
L0004: sub rsp, 0x28
L0008: mov rcx, 0x19f2a7d1c18
L0012: mov rsi, [rcx]
L0015: mov rdi, rsi
L0018: xor ebx, ebx
L001a: mov ebp, [rdi+8]
L001d: test ebp, ebp
L001f: jle short L0041
L0021: movsxd rcx, ebx
L0024: movzx ecx, word ptr [rdi+rcx*2+0xc]
L0029: mov eax, ecx
L002b: or eax, 0x20 ; <-- OR between MOV and ADD, preventing combination
L002e: add eax, 0xffffff9f
L0031: cmp eax, 0x19
L0034: ja short L003b
L0036: call 0x00007ffbe91e41d8
L003b: inc ebx
L003d: cmp ebp, ebx
L003f: jg short L0021
L0041: mov rdi, rsi
L0044: xor ebx, ebx
L0046: test ebp, ebp
L0048: jle short L0068
L004a: movsxd rcx, ebx
L004d: movzx ecx, word ptr [rdi+rcx*2+0xc]
L0052: lea eax, [rcx-0x41]
L0055: and eax, 0xffffffdf ; <-- AND after MOV and ADD, allowing them to be combined into LEA
L0058: cmp eax, 0x1a
L005b: jae short L0062
L005d: call 0x00007ffbe91e41d8
L0062: inc ebx
L0064: cmp ebp, ebx
L0066: jg short L004a
L0068: add rsp, 0x28
L006c: pop rbx
L006d: pop rbp
L006e: pop rsi
L006f: pop rdi
L0070: ret
<Program>$.<<Main>$>g__IsAsciiLetter1|0_0(Char)
L0000: movzx eax, cx
L0003: or eax, 0x20
L0006: add eax, 0xffffff9f
L0009: cmp eax, 0x19
L000c: setbe al
L000f: movzx eax, al
L0012: ret
<Program>$.<<Main>$>g__IsAsciiLetter2|0_1(Char)
L0000: movzx eax, cx
L0003: add eax, 0xffffffbf
L0006: and eax, 0xffffffdf
L0009: cmp eax, 0x1a
L000c: setb al
L000f: movzx eax, al
L0012: ret
src/libraries/System.Private.CoreLib/src/System/Globalization/DateTimeFormatInfoScanner.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Net.Http/src/System/Net/Http/Headers/AltSvcHeaderParser.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably outside the scope of this PR, but we should probably change the FromChar helper to return -1 instead of 255 on failure. It would also simplify lots of consuming code since they could just check "< 0" and take advantage of all the magic bit-twiddling that comes along with it.
Sample: GrabYourPitchforks@f4f546c
I can submit once your PR is finished.
|
My signoff is still valid for iteration 89413a6. |
|
/azp run runtime |
|
Azure Pipelines successfully started running 1 pipeline(s). |
dakersnar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit late to the party but I really like this API! As a developer, remembering the manual bit math for char range checking is really annoying. Looks good!
| internal static bool IsValidDriveChar(char value) | ||
| { | ||
| return (value >= 'A' && value <= 'Z') || (value >= 'a' && value <= 'z'); | ||
| return (uint)((value | 0x20) - 'a') <= (uint)('z' - 'a'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this use the new API, or is it too low level to be dependent on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this file is compiled into a library that has a pre-.NET 7 target. It wasn't worth ifdef'ing it to use the new API in some builds and the open-coded version in others.
Fixes #68868