Skip to content

Conversation

@xtqqczze
Copy link
Contributor

internal static uint ResetLowestSetBit(uint value)
{
// It's lowered to BLSR on x86
return value & (value - 1);
}

Without any lowering, the implementation is still likely more efficient than mask &= ~(uint)(0b11 << bitPos) as it avoids a loop-carried dependency.

@ghost ghost added community-contribution Indicates that the PR has been added by a community member area-System.Memory labels Jun 20, 2023
@ghost
Copy link

ghost commented Jun 20, 2023

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

internal static uint ResetLowestSetBit(uint value)
{
// It's lowered to BLSR on x86
return value & (value - 1);
}

Without any lowering, the implementation is still likely more efficient than mask &= ~(uint)(0b11 << bitPos) as it avoids a loop-carried dependency.

Author: xtqqczze
Assignees: -
Labels:

area-System.Memory, community-contribution

Milestone: -

// the full string always.
int bitPos = BitOperations.TrailingZeroCount(mask);
nint charPos = (nint)((uint)bitPos / 2); // div by 2 (shr) because we work with 2-byte chars
nint charPos = (nint)((uint)BitOperations.TrailingZeroCount(mask) / sizeof(ushort));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think the cast to uint is needed here

Copy link
Contributor Author

@xtqqczze xtqqczze Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the uint cast for consistency and to avoid a sign-extension in the expression offset + charPos.

https://csharp.godbolt.org/z/chWM3rYbc

Copy link
Contributor Author

@xtqqczze xtqqczze Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

887ccac: uint.TrailingZeroCount(mask) is easier to read though.

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, makes sense to me, godbolt example: https://godbolt.org/z/EYMfbr44K

@xtqqczze
Copy link
Contributor Author

LGTM, makes sense to me, godbolt example: https://godbolt.org/z/EYMfbr44K

Data from llvm-mca for verification: https://www.diffchecker.com/90uyQUWF/

On Haswell the RThroughput ratio is 0.72 (an improvement).

// the full string always.
int bitPos = BitOperations.TrailingZeroCount(mask);
nint charPos = (nint)((uint)bitPos / 2); // div by 2 (shr) because we work with 2-byte chars
nint charPos = (nint)(uint.TrailingZeroCount(mask) / sizeof(ushort));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nint charPos = (nint)(uint.TrailingZeroCount(mask) / sizeof(ushort));
nint charPos = (nint)(uint.TrailingZeroCount(mask) / sizeof(char));

@stephentoub This would be slightly clearer, but might not be worth spinning CI again?

@adamsitnik adamsitnik merged commit 426d18a into dotnet:main Jun 22, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Memory community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants