-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Improve Dictionary TryInsert CQ #15462
Conversation
| for (int i = _buckets[targetBucket]; i >= 0; i = _entries[i].next) | ||
| int i = _buckets[targetBucket]; | ||
| Entry[] entries = _entries; | ||
| while (i >= 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what way is while better than a for loop?
In general (so not specific to the changes here) we're going from foreach - > for loop - > while (with locals).
What's next? :p
Having such manually optimized code in this repo is ok, but I honestly hope the JIT can be improved further so that (performant) user code can remain simple and readable. And yes, I know there's work in progress (eg. https://github.com/dotnet/coreclr/issues/15130)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what way is while better than a for loop?
When the for loop is not really a for loop, but a while loop in disguise.
The "increment" portion ; i = _entries[i].next) causes an additional bounds check as a for but not as a while
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're at try while ((uint)i < (uint)entries.Length), usually this doesn't work in loops but I've seen a case where it did, and I think it was a while too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I probably need to take a closer look at this, it's not readily obvious why while is different than for in this case. Does the difference persist if you move int i = _buckets[targetBucket]; after Entry[] entries = _entries;?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a bit sensitive
PR
Total bytes of diff: -3687 (-0.11% of base)
(5 range checks, bytes of code 565)
-249 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
(6 range checks, bytes of code 894)
-3 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this
PR + for loop (local var)
Total bytes of diff: -2860 (-0.08% of base)
(6 range checks, bytes of code 594)
-162 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
(7 range checks, bytes of code 902)
+5 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this
PR + for loop (local var) + (uint)i < (uint)entries.Length condition
Total bytes of diff: -3093 (-0.09% of base)
(6 range checks, bytes of code 579)
-207 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
(7 range checks, bytes of code 891)
-6 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this
PR + switching Entry[] entries = with int i =
Total bytes of diff: -3659 (-0.11% of base)
(5 range checks, bytes of code 569)
-237 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
(6 range checks, bytes of code 894)
-3 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this
PR + while ((uint)i < (uint)entries.Length)
Total bytes of diff: -3607 (-0.11% of base)
(5 range checks, bytes of code 573)
-225 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
(6 range checks, bytes of code 870)
-27 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this
PR + while ((uint)i < (uint)entries.Length) + switching Entry[] entries = with int i =
Total bytes of diff: -3594 (-0.11% of base)
(5 range checks, bytes of code 576)
-216 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
(6 range checks, bytes of code 870)
-27 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this
6 range checks for the reference version seems a bit high (over the 5 for non reference key)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simpler one to look at might be #15460
Changing to (uint)i < (uint)entries.Length doesn't skip bounds check there either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
018d676 to
d76207e
Compare
|
Might have gone a bit far, but changes are 3 range checks rather than 6 |
; Lcl frame size = 56
G_M59118_IG01:
push r15
push r14
push r13
push r12
push rdi
push rsi
push rbp
push rbx
sub rsp, 56
mov rsi, rcx
mov edi, edx
mov ebx, r8d
mov ebp, r9d
G_M59118_IG02:
cmp gword ptr [rsi+8], 0
jne SHORT G_M59118_IG03
mov rcx, rsi
xor edx, edx
call Dictionary`2:Initialize(int):this
G_M59118_IG03:
mov r14, gword ptr [rsi+24]
mov rcx, r14
mov edx, edi
lea r11, [(reloc)]
cmp dword ptr [rcx], ecx
call qword ptr [r11]IEqualityComparer`1:GetHashCode(int):int:this
mov r15d, eax
and r15d, 0xD1FFAB1E
xor r12d, r12d
mov rax, gword ptr [rsi+8]
mov rcx, rax
mov r8d, dword ptr [rax+8]
mov eax, r15d
cdq
idiv edx:eax, r8d ; % Length
cmp edx, r8d
jae G_M59118_IG20 ; CORINFO_HELP_RNGCHKFAIL
movsxd rdx, edx
lea r13, bword ptr [rcx+4*rdx+16]
mov eax, dword ptr [r13]
mov r9, gword ptr [rsi+16]
G_M59118_IG04: ; loop start
mov r10d, dword ptr [r9+8]
cmp r10d, eax
jbe G_M59118_IG11
movsxd rdx, eax
mov rax, rdx
shl rax, 4
cmp dword ptr [r9+rax+16], r15d
jne SHORT G_M59118_IG05
mov gword ptr [rsp+28H], r9
mov qword ptr [rsp+30H], rax
lea r10, bword ptr [r9+rax+16]
mov bword ptr [rsp+20H], r10
mov edx, dword ptr [r10+8]
mov rcx, r14
mov r8d, edi
lea r11, [(reloc)]
cmp dword ptr [rcx], ecx
call qword ptr [r11]IEqualityComparer`1:Equals(int,int):bool:this
test eax, eax
mov rax, qword ptr [rsp+30H]
mov r9, gword ptr [rsp+28H]
jne SHORT G_M59118_IG06
G_M59118_IG05:
lea r10, bword ptr [r9+rax+16]
mov rax, r10
mov eax, dword ptr [rax+4]
inc r12d
jmp SHORT G_M59118_IG04 ; loop end
G_M59118_IG06:
mov r13, bword ptr [rsp+20H]
movzx rcx, bpl
cmp ecx, 1
jne SHORT G_M59118_IG08
mov dword ptr [r13+12], ebx
inc dword ptr [rsi+68]
mov eax, 1
G_M59118_IG07:
add rsp, 56
pop rbx
pop rbp
pop rsi
pop rdi
pop r12
pop r13
pop r14
pop r15
ret
G_M59118_IG08:
cmp ecx, 2
je G_M59118_IG19
G_M59118_IG09:
xor eax, eax
G_M59118_IG10:
add rsp, 56
pop rbx
pop rbp
pop rsi
pop rdi
pop r12
pop r13
pop r14
pop r15
ret
G_M59118_IG11:
xor ebp, ebp
xor r14d, r14d
mov ecx, dword ptr [rsi+64]
test ecx, ecx
jle SHORT G_M59118_IG12
mov r12d, dword ptr [rsi+60]
mov r14d, 1
dec ecx
mov dword ptr [rsi+64], ecx
jmp SHORT G_M59118_IG14
G_M59118_IG12:
mov ecx, dword ptr [rsi+56]
mov r12d, ecx
cmp r10d, r12d
jne SHORT G_M59118_IG13
call HashHelpers:ExpandPrime(int):int
mov edx, eax
mov rcx, rsi
xor r8d, r8d
call Dictionary`2:Resize(int,bool):this
mov ebp, 1
G_M59118_IG13:
lea eax, [r12+1]
mov dword ptr [rsi+56], eax
mov r9, gword ptr [rsi+16]
mov rcx, r9
mov r9, rcx
G_M59118_IG14:
test ebp, ebp
jne SHORT G_M59118_IG15
jmp SHORT G_M59118_IG16
G_M59118_IG15:
mov rax, gword ptr [rsi+8]
mov rcx, rax
mov r8d, dword ptr [rax+8]
mov eax, r15d
cdq
idiv edx:eax, r8d ; % Length
cmp edx, r8d
jae G_M59118_IG20 ; CORINFO_HELP_RNGCHKFAIL
movsxd rax, edx
lea r13, bword ptr [rcx+4*rax+16]
G_M59118_IG16:
cmp r12d, dword ptr [r9+8]
jae G_M59118_IG20 ; CORINFO_HELP_RNGCHKFAIL
movsxd rax, r12d
shl rax, 4
lea rax, bword ptr [r9+rax+16]
test r14d, r14d
je SHORT G_M59118_IG17
mov ecx, dword ptr [rax+4]
mov dword ptr [rsi+60], ecx
G_M59118_IG17:
mov dword ptr [rax], r15d
mov ecx, dword ptr [r13]
mov dword ptr [rax+4], ecx
mov dword ptr [rax+8], edi
mov dword ptr [rax+12], ebx
mov dword ptr [r13], r12d
inc dword ptr [rsi+68]
mov eax, 1
G_M59118_IG18:
add rsp, 56
pop rbx
pop rbp
pop rsi
pop rdi
pop r12
pop r13
pop r14
pop r15
ret
************** Beginning of cold code **************
G_M59118_IG19:
mov ecx, edi
call ThrowHelper:ThrowAddingDuplicateWithKeyArgumentException(int)
int3
G_M59118_IG20:
call CORINFO_HELP_RNGCHKFAIL
int3
; Total bytes of code 508, prolog size 27 for method Dictionary`2:TryInsert(int,int,ubyte):bool:this |
|
|
||
| Array.Copy(_entries, 0, entries, 0, count); | ||
|
|
||
| _entries = entries; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not look right - this is assigning into argument.
|
|
||
| private void Resize(int newSize, bool forceNewHashCodes) | ||
| [MethodImpl(MethodImplOptions.NoInlining)] | ||
| private void Rehash() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether separating this into a non-inlineable is helping in either JIT or AOT case. There is a fixed memory overhead per generic method even when it is not JITed (ie the cost of the generics is more than the cost of the code). I would keep it where it was.
| { | ||
| // Identical initialize code from Dictionary Resize, externalized to not replicate from generic instantiation | ||
| int[] buckets = new int[newSize]; | ||
| for (int i = 0; i < buckets.Length; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would omit this change from this PR. We have rejected similar micro-optimizations that make the code hard to follow and maintains.
If we were to reconsider them, I think switching buckets to have one-based index so that it does not need to be initialized to -1 would look better than this and likely save more.
|
Will incorporate feedback, but just testing @dotnet-bot test Windows_NT x64 Checked corefx_baseline |
d76207e to
603f233
Compare
|
@dotnet-bot test Windows_NT x64 Checked corefx_baseline |
|
Incorporated into #15419 as a commit |
3 range checks rather than 6
2 are mod Length based (1 of them is conditional) https://github.com/dotnet/coreclr/issues/15472
1 can't be helped