-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Split Dictionary Resize -> Expand and Rehash #15455
Conversation
|
Before ; Lcl frame size = 88
...
G_M29783_IG10:
413BD8 cmp ebx, r8d
0F8393000000 jae G_M29783_IG18
4863D3 movsxd rdx, ebx
488D1452 lea rdx, [rdx+2*rdx]
498D4CD510 lea rcx, bword ptr [r13+8*rdx+16]
448B4910 mov r9d, dword ptr [rcx+16]
4585C9 test r9d, r9d
7C1C jl SHORT G_M29783_IG11
418BC1 mov eax, r9d
99 cdq
F7FF idiv edx:eax, edi
413BD6 cmp edx, r14d
7373 jae SHORT G_M29783_IG18
4863C2 movsxd rax, edx
8B448510 mov eax, dword ptr [rbp+4*rax+16]
894114 mov dword ptr [rcx+20], eax
4863C2 movsxd rax, edx
895C8510 mov dword ptr [rbp+4*rax+16], ebx
G_M29783_IG11:
FFC3 inc ebx
8B44244C mov eax, dword ptr [rsp+4CH]
3BD8 cmp ebx, eax
8944244C mov dword ptr [rsp+4CH], eax
7CB8 jl SHORT G_M29783_IG10
...
; Total bytes of code 497, prolog size 29 for method Dictionary`2:Resize(int,bool):thisAfter ; Lcl frame size = 64
...
G_M13443_IG06:
443BE1 cmp r12d, ecx
735E jae SHORT G_M13443_IG10
4963C4 movsxd rax, r12d
488D0440 lea rax, [rax+2*rax]
4D8D44C610 lea r8, bword ptr [r14+8*rax+16]
418B4010 mov eax, dword ptr [r8+16]
85C0 test eax, eax
7C1A jl SHORT G_M13443_IG07
99 cdq
F7FF idiv edx:eax, edi
3BD5 cmp edx, ebp
7343 jae SHORT G_M13443_IG10
4863C2 movsxd rax, edx
8B448310 mov eax, dword ptr [rbx+4*rax+16]
41894014 mov dword ptr [r8+20], eax
4863C2 movsxd rax, edx
4489648310 mov dword ptr [rbx+4*rax+16], r12d
G_M13443_IG07:
41FFC4 inc r12d
453BE7 cmp r12d, r15d
7CC5 jl SHORT G_M13443_IG06
...
; Total bytes of code 265, prolog size 24 for method Dictionary`2:Expand(int):this |
|
@dotnet-bot test Windows_NT x64 Checked corefx_baseline |
|
@mikedn is this better? int bucket = (int)((uint)hashCode % (uint)prime);It produces the following asm G_M13443_IG06:
443BE1 cmp r12d, ecx
735F jae SHORT G_M13443_IG10
4963C4 movsxd rax, r12d
488D0440 lea rax, [rax+2*rax]
4D8D44C610 lea r8, bword ptr [r14+8*rax+16]
418B4010 mov eax, dword ptr [r8+16]
85C0 test eax, eax
7C1B jl SHORT G_M13443_IG07
33D2 xor rdx, rdx
F7F7 div edx:eax, edi
3BD5 cmp edx, ebp
7343 jae SHORT G_M13443_IG10
4863C2 movsxd rax, edx ; can these be avoided?
8B448310 mov eax, dword ptr [rbx+4*rax+16]
41894014 mov dword ptr [r8+20], eax
4863C2 movsxd rax, edx ; can these be avoided?
4489648310 mov dword ptr [rbx+4*rax+16], r12d
G_M13443_IG07:
41FFC4 inc r12d
453BE7 cmp r12d, r15d
7CC4 jl SHORT G_M13443_IG06 |
|
Error is api change /cc @ahsonkhan |
for (int i = 0; i < count; i++)
{
ref Entry entry = ref entries[i];
int hashCode = entry.hashCode;
if (hashCode >= 0)
{
ref int bucket = ref buckets[((uint)hashCode % (uint)prime)];
entry.next = bucket;
bucket = i;
}
}Changes it to G_M13443_IG06:
443BE1 cmp r12d, ecx
735D jae SHORT G_M13443_IG10
4963C4 movsxd rax, r12d
488D0440 lea rax, [rax+2*rax]
4D8D44C610 lea r8, bword ptr [r14+8*rax+16]
418B4010 mov eax, dword ptr [r8+16]
85C0 test eax, eax
7C19 jl SHORT G_M13443_IG07
33D2 xor rdx, rdx
F7F6 div edx:eax, esi
3BD5 cmp edx, ebp
7341 jae SHORT G_M13443_IG10
4863C2 movsxd rax, edx
488D448310 lea rax, bword ptr [rbx+4*rax+16]
8B10 mov edx, dword ptr [rax]
41895014 mov dword ptr [r8+20], edx
448920 mov dword ptr [rax], r12d
G_M13443_IG07:
41FFC4 inc r12d
453BE7 cmp r12d, r15d
7CC6 jl SHORT G_M13443_IG06But again don't know if that's any better |
|
Can skip the range checks with Unsafe Add; but I assume that wouldn't be palatable? |
|
What is the total size of Dictionary instantiation before/after this change? Or what is the System.Private.CoreLib.dll native image size before/after this change? |
Better how? Unsigned division is just as costly as signed division.
Probably it's a wash. Or worse. That |
@jkotas This seems like it should work when jitting but does it work when crossgening? I guess not, |
Agree. The trade-off is different for different configs. Also, the implementation is shared with .NET Native / CoreRT and Xamarin / Mono that have full AOT targets. For changes like this, it is important to understand what we are gaining and losing. |
|
Thinking about this some more: I think it is better tradeoff, considering the different configs, to keep Resize and Expand as a single method. It helps AOT configs much more than it hurts the JIT configs. |
This due to all instance methods getting baked into type on AOT compile; regardless of usage? Would it be better then to externalize Rehash then only for |
|
Externalizing Dictionary Rehash |
2533f64 to
5c0429a
Compare
|
static class |
|
@dotnet-bot test Windows_NT x64 Checked corefx_baseline |
|
The spliting the methods into two achieves:
Can we achieve most of the benefit by just skipping the for-loop that recomputes the hashcodes for valuetypes in the unified method? |
|
It either recomputes the hashcodes or expands; but never both; so its a function with two exclusive modes. However, currently it always reallocates the arrays no matter which its doing and loops twice for rehashing. The efficiency of rehashing probably isn't greatly important as rehashing will only ever happen once per dictionary; however resizing will happen lots. So its avoiding the for-loop that recomputes the hashcodes; which is a code path that is only ever run 0 - 1 times for a From a code size point of view, skipping it for value types gets most of the way there: https://github.com/dotnet/coreclr/compare/master...benaadams:rehash-skip-structs?expand=1 Would you prefer that as a PR instead? |
|
Or... could split it to two methods and if out the body for structs? |
3486e24 to
9333726
Compare
|
Changed to if out resize body for structs |
9333726 to
abfe1ad
Compare
Right - the efficiency of rehashing is not that important. It will only happen once, and only when the system is under likely DoS attack - close to never in practice. The most important thing is that it works reliably and does make things worse by exposing new bugs. It may be one of the reasons why why the implementation was done as a small modification of Resize. I would prefer to keep it that way. Minimizing rehasing-specific code reduces chances that there is rehasing-specific bug. |
|
Makes sense |
|
This PR had a couple of other good changes (caching fields in locals) that are pure improvements. It would be great if you can submit those - I will be happy to take them. |
Unify
OnDeserializationinitialization with .ctorInitializeSplit
Resize->ExpandandRehashRehashing is only used in
stringkey variant so it doesn't need to be compiled in every ExpandRehashing doesn't expand the Dictionary so it can reuse the existing arrays
Expand doesn't need the extra param and loop that it always skips
Loop CQ improvements
Shrinks the generic Resize (as Expand) from 497 bytes to 265 bytes
PTAL @jkotas