Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@benaadams
Copy link
Member

@benaadams benaadams commented Dec 11, 2017

Total bytes of diff: -6455 (-0.19% of base)
    diff is an improvement.

Total byte diff includes 1389 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    2 unique methods,     1389 unique bytes

Top file improvements by size (bytes):
       -6455 : System.Private.CoreLib.dasm (-0.19% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method regessions by size (bytes):
        1287 : System.Private.CoreLib.dasm - Dictionary`2:Rehash():this (0/11 methods)
         102 : System.Private.CoreLib.dasm - HashHelpers:ResizeDictionaryArrays(ref,ref,int,int):ref (0/1 methods)

Top method improvements by size (bytes):
       -3749 : System.Private.CoreLib.dasm - Dictionary`2:Resize(int,bool):this (29 methods)
        -671 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,struct,ubyte):bool:this (4 methods)
        -462 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,int,ubyte):bool:this (3 methods)
        -452 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(struct,ref,ubyte):bool:this (3 methods)
        -360 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
        -296 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(struct,struct,ubyte):bool:this (2 methods)
        -240 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(long,ref,ubyte):bool:this (2 methods)
        -156 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this
        -156 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,long,ubyte):bool:this
        -154 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,bool,ubyte):bool:this
        -154 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,char,ubyte):bool:this
        -146 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(struct,int,ubyte):bool:this
        -146 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(struct,ubyte,ubyte):bool:this
        -128 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(short,long,ubyte):bool:this
        -115 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(long,short,ubyte):bool:this
        -115 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ubyte,ubyte):bool:this
        -115 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,int,ubyte):bool:this
        -115 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(long,bool,ubyte):bool:this
        -114 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(char,ref,ubyte):bool:this

21 total methods with size differences (19 improved, 2 regressed), 16724 unchanged.

3 range checks rather than 6
2 are mod Length based (1 of them is conditional) https://github.com/dotnet/coreclr/issues/15472
1 can't be helped

@benaadams benaadams changed the title Dictionary TryInsert CQ Improve Dictionary TryInsert CQ Dec 11, 2017
for (int i = _buckets[targetBucket]; i >= 0; i = _entries[i].next)
int i = _buckets[targetBucket];
Entry[] entries = _entries;
while (i >= 0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what way is while better than a for loop?
In general (so not specific to the changes here) we're going from foreach - > for loop - > while (with locals).
What's next? :p

Having such manually optimized code in this repo is ok, but I honestly hope the JIT can be improved further so that (performant) user code can remain simple and readable. And yes, I know there's work in progress (eg. https://github.com/dotnet/coreclr/issues/15130)

Copy link
Member Author

@benaadams benaadams Dec 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what way is while better than a for loop?

When the for loop is not really a for loop, but a while loop in disguise.

The "increment" portion ; i = _entries[i].next) causes an additional bounds check as a for but not as a while

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're at try while ((uint)i < (uint)entries.Length), usually this doesn't work in loops but I've seen a case where it did, and I think it was a while too.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I probably need to take a closer look at this, it's not readily obvious why while is different than for in this case. Does the difference persist if you move int i = _buckets[targetBucket]; after Entry[] entries = _entries;?

Copy link
Member Author

@benaadams benaadams Dec 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a bit sensitive

PR

Total bytes of diff: -3687 (-0.11% of base)
(5 range checks, bytes of code 565)
-249 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
(6 range checks, bytes of code 894)
-3 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this 

PR + for loop (local var)

Total bytes of diff: -2860 (-0.08% of base)
(6 range checks, bytes of code 594)
-162 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
(7 range checks, bytes of code 902)
+5 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this

PR + for loop (local var) + (uint)i < (uint)entries.Length condition

Total bytes of diff: -3093 (-0.09% of base)
(6 range checks, bytes of code 579)
-207 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
(7 range checks, bytes of code 891)
-6 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this

PR + switching Entry[] entries = with int i =

Total bytes of diff: -3659 (-0.11% of base)
(5 range checks, bytes of code 569)
-237 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
(6 range checks, bytes of code 894)
-3 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this

PR + while ((uint)i < (uint)entries.Length)

Total bytes of diff: -3607 (-0.11% of base)
(5 range checks, bytes of code 573)
-225 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
(6 range checks, bytes of code 870)
-27 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this 

PR + while ((uint)i < (uint)entries.Length) + switching Entry[] entries = with int i =

Total bytes of diff: -3594 (-0.11% of base)
(5 range checks, bytes of code 576)
-216 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
(6 range checks, bytes of code 870)
-27 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this 

6 range checks for the reference version seems a bit high (over the 5 for non reference key)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simpler one to look at might be #15460

Changing to (uint)i < (uint)entries.Length doesn't skip bounds check there either

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benaadams
Copy link
Member Author

benaadams commented Dec 12, 2017

Might have gone a bit far, but changes are

Total bytes of diff: -6455 (-0.19% of base)
    diff is an improvement.

Total byte diff includes 1389 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    2 unique methods,     1389 unique bytes

Top file improvements by size (bytes):
       -6455 : System.Private.CoreLib.dasm (-0.19% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method regessions by size (bytes):
        1287 : System.Private.CoreLib.dasm - Dictionary`2:Rehash():this (0/11 methods)
         102 : System.Private.CoreLib.dasm - HashHelpers:ResizeDictionaryArrays(ref,ref,int,int):ref (0/1 methods)

Top method improvements by size (bytes):
       -3749 : System.Private.CoreLib.dasm - Dictionary`2:Resize(int,bool):this (29 methods)
        -671 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,struct,ubyte):bool:this (4 methods)
        -462 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,int,ubyte):bool:this (3 methods)
        -452 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(struct,ref,ubyte):bool:this (3 methods)
        -360 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this (3 methods)
        -296 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(struct,struct,ubyte):bool:this (2 methods)
        -240 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(long,ref,ubyte):bool:this (2 methods)
        -156 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,ref,ubyte):bool:this
        -156 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,long,ubyte):bool:this
        -154 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,bool,ubyte):bool:this
        -154 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,char,ubyte):bool:this
        -146 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(struct,int,ubyte):bool:this
        -146 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(struct,ubyte,ubyte):bool:this
        -128 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(short,long,ubyte):bool:this
        -115 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(long,short,ubyte):bool:this
        -115 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ubyte,ubyte):bool:this
        -115 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,int,ubyte):bool:this
        -115 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(long,bool,ubyte):bool:this
        -114 : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(char,ref,ubyte):bool:this

21 total methods with size differences (19 improved, 2 regressed), 16724 unchanged.

3 range checks rather than 6
2 are mod Length based (1 of them is conditional) https://github.com/dotnet/coreclr/issues/15472
1 can't be helped

@benaadams
Copy link
Member Author

; Lcl frame size = 56
G_M59118_IG01:
       push     r15
       push     r14
       push     r13
       push     r12
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 56
       mov      rsi, rcx
       mov      edi, edx
       mov      ebx, r8d
       mov      ebp, r9d
G_M59118_IG02:
       cmp      gword ptr [rsi+8], 0
       jne      SHORT G_M59118_IG03
       mov      rcx, rsi
       xor      edx, edx
       call     Dictionary`2:Initialize(int):this
G_M59118_IG03:
       mov      r14, gword ptr [rsi+24]
       mov      rcx, r14
       mov      edx, edi
       lea      r11, [(reloc)]
       cmp      dword ptr [rcx], ecx
       call     qword ptr [r11]IEqualityComparer`1:GetHashCode(int):int:this
       mov      r15d, eax
       and      r15d, 0xD1FFAB1E
       xor      r12d, r12d
       mov      rax, gword ptr [rsi+8]
       mov      rcx, rax
       mov      r8d, dword ptr [rax+8]
       mov      eax, r15d
       cdq      
       idiv     edx:eax, r8d                       ; % Length
       cmp      edx, r8d
       jae      G_M59118_IG20                      ; CORINFO_HELP_RNGCHKFAIL
       movsxd   rdx, edx
       lea      r13, bword ptr [rcx+4*rdx+16]
       mov      eax, dword ptr [r13]
       mov      r9, gword ptr [rsi+16]
G_M59118_IG04:                                     ; loop start   
       mov      r10d, dword ptr [r9+8]
       cmp      r10d, eax
       jbe      G_M59118_IG11
       movsxd   rdx, eax
       mov      rax, rdx
       shl      rax, 4
       cmp      dword ptr [r9+rax+16], r15d
       jne      SHORT G_M59118_IG05
       mov      gword ptr [rsp+28H], r9
       mov      qword ptr [rsp+30H], rax
       lea      r10, bword ptr [r9+rax+16]
       mov      bword ptr [rsp+20H], r10
       mov      edx, dword ptr [r10+8]
       mov      rcx, r14
       mov      r8d, edi
       lea      r11, [(reloc)]
       cmp      dword ptr [rcx], ecx
       call     qword ptr [r11]IEqualityComparer`1:Equals(int,int):bool:this
       test     eax, eax
       mov      rax, qword ptr [rsp+30H]
       mov      r9, gword ptr [rsp+28H]
       jne      SHORT G_M59118_IG06
G_M59118_IG05:
       lea      r10, bword ptr [r9+rax+16]
       mov      rax, r10
       mov      eax, dword ptr [rax+4]
       inc      r12d
       jmp      SHORT G_M59118_IG04               ; loop end    
G_M59118_IG06:
       mov      r13, bword ptr [rsp+20H]
       movzx    rcx, bpl
       cmp      ecx, 1
       jne      SHORT G_M59118_IG08
       mov      dword ptr [r13+12], ebx
       inc      dword ptr [rsi+68]
       mov      eax, 1
G_M59118_IG07:
       add      rsp, 56
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       ret      
G_M59118_IG08:
       cmp      ecx, 2
       je       G_M59118_IG19
G_M59118_IG09:
       xor      eax, eax
G_M59118_IG10:
       add      rsp, 56
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       ret      
G_M59118_IG11:
       xor      ebp, ebp
       xor      r14d, r14d
       mov      ecx, dword ptr [rsi+64]
       test     ecx, ecx
       jle      SHORT G_M59118_IG12
       mov      r12d, dword ptr [rsi+60]
       mov      r14d, 1
       dec      ecx
       mov      dword ptr [rsi+64], ecx
       jmp      SHORT G_M59118_IG14
G_M59118_IG12:
       mov      ecx, dword ptr [rsi+56]
       mov      r12d, ecx
       cmp      r10d, r12d
       jne      SHORT G_M59118_IG13
       call     HashHelpers:ExpandPrime(int):int
       mov      edx, eax
       mov      rcx, rsi
       xor      r8d, r8d
       call     Dictionary`2:Resize(int,bool):this
       mov      ebp, 1
G_M59118_IG13:
       lea      eax, [r12+1]
       mov      dword ptr [rsi+56], eax
       mov      r9, gword ptr [rsi+16]
       mov      rcx, r9
       mov      r9, rcx
G_M59118_IG14:
       test     ebp, ebp
       jne      SHORT G_M59118_IG15
       jmp      SHORT G_M59118_IG16
G_M59118_IG15:
       mov      rax, gword ptr [rsi+8]
       mov      rcx, rax
       mov      r8d, dword ptr [rax+8]
       mov      eax, r15d
       cdq      
       idiv     edx:eax, r8d                       ; % Length
       cmp      edx, r8d
       jae      G_M59118_IG20                      ; CORINFO_HELP_RNGCHKFAIL
       movsxd   rax, edx
       lea      r13, bword ptr [rcx+4*rax+16]
G_M59118_IG16:
       cmp      r12d, dword ptr [r9+8]
       jae      G_M59118_IG20                      ; CORINFO_HELP_RNGCHKFAIL
       movsxd   rax, r12d
       shl      rax, 4
       lea      rax, bword ptr [r9+rax+16]
       test     r14d, r14d
       je       SHORT G_M59118_IG17
       mov      ecx, dword ptr [rax+4]
       mov      dword ptr [rsi+60], ecx
G_M59118_IG17:
       mov      dword ptr [rax], r15d
       mov      ecx, dword ptr [r13]
       mov      dword ptr [rax+4], ecx
       mov      dword ptr [rax+8], edi
       mov      dword ptr [rax+12], ebx
       mov      dword ptr [r13], r12d
       inc      dword ptr [rsi+68]
       mov      eax, 1
G_M59118_IG18:
       add      rsp, 56
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       ret      
************** Beginning of cold code **************
G_M59118_IG19:
       mov      ecx, edi
       call     ThrowHelper:ThrowAddingDuplicateWithKeyArgumentException(int)
       int3     
G_M59118_IG20:
       call     CORINFO_HELP_RNGCHKFAIL
       int3     
; Total bytes of code 508, prolog size 27 for method Dictionary`2:TryInsert(int,int,ubyte):bool:this


Array.Copy(_entries, 0, entries, 0, count);

_entries = entries;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not look right - this is assigning into argument.


private void Resize(int newSize, bool forceNewHashCodes)
[MethodImpl(MethodImplOptions.NoInlining)]
private void Rehash()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether separating this into a non-inlineable is helping in either JIT or AOT case. There is a fixed memory overhead per generic method even when it is not JITed (ie the cost of the generics is more than the cost of the code). I would keep it where it was.

{
// Identical initialize code from Dictionary Resize, externalized to not replicate from generic instantiation
int[] buckets = new int[newSize];
for (int i = 0; i < buckets.Length; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would omit this change from this PR. We have rejected similar micro-optimizations that make the code hard to follow and maintains.

If we were to reconsider them, I think switching buckets to have one-based index so that it does not need to be initialized to -1 would look better than this and likely save more.

@benaadams
Copy link
Member Author

Will incorporate feedback, but just testing

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@benaadams
Copy link
Member Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@benaadams benaadams mentioned this pull request Dec 15, 2017
@benaadams
Copy link
Member Author

Incorporated into #15419 as a commit

@benaadams benaadams closed this Dec 16, 2017
@benaadams benaadams deleted the dict-tryinsert-cq branch October 14, 2019 00:07
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants