Skip to content

The property function StableStringHash is relatively weak on some input #7131

@xoofx

Description

@xoofx

Hey,
I'm starting to use the property function StableStringHash $(MSBuildProjectName)-$([MSBuild]::StableStringHash($(MSBuildProjectFile)).ToString("x8")) for hashing a project filepath for the purpose that when you want to output to an intermediate folder that is shared (in a different obj/* location), you need to have a unique filename for the project folder.

But while looking at the generated hashes, I'm seeing a weird pattern in some numbers, where a single character change was not causing enough shuffle of the bits, which indicates that the hash algorithm is weak:

LibChild3_30-5e56f133
LibChild3_31-66f1f133
LibChild3_32-5220f133
LibChild3_33-5abbf133
LibChild3_34-7c62f133
LibChild3_35-84fdf133
LibChild3_36-6f8cf133
LibChild3_4-a48ac640 
LibChild3_5-f243a77b 
LibChild3_6-091903ca 
LibChild3_7-56d1e505 
LibChild3_8-5e70f9dc 
LibChild3_9-ac29db17 

You could see that LibChild3_30 and all up to LibChild3_36 are sharing the same 16 bits f133, while the other below from LibChild3_4 to LibChild3_9 are changing as you would expect.

So first, I found that I made a mistake, and I should have used MSBuildProjectFullPath instead of MSBuildProjectFile... as it was only hashing the filename but not the fullpath (and if we have a same project filename in different folders, we still want this to hash properly the folder)

But still, it doesn't look good at all...

So looking at the code:

internal static int GetHashCode(string fileVersion)
{
unsafe
{
fixed (char* src = fileVersion)
{
int hash1 = (5381 << 16) + 5381;
int hash2 = hash1;
int* pint = (int*)src;
int len = fileVersion.Length;
while (len > 0)
{
hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
if (len <= 2)
{
break;
}
hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ pint[1];
pint += 2;
len -= 4;
}
return hash1 + (hash2 * 1566083941);
}
}
}

and hashing simple strings like this:

$([MSBuild]::StableStringHash('10'))   => cdbab78f
$([MSBuild]::StableStringHash('11'))   => cdb9b78f
$([MSBuild]::StableStringHash('100'))  => b7435729
$([MSBuild]::StableStringHash('101'))  => 59eacbc4
$([MSBuild]::StableStringHash('1000')) => 00f35729
$([MSBuild]::StableStringHash('1001')) => 758e5729

And we can see that the algorithm is messing with string with a length that is even here. Hashing 10 vs 11 is generating cdbab78f and cdb9b78f, and just changed by a few bits (!)

I don't know where this algorithm comes from, but it doesn't look good as it is bit shifting+xor, instead of better approaches like xor+multiply_by_prime_number for simple hash. For example it could use FNV-1A and it would provide a much better hash.

But, I assume that now that StableStringHash is out, we cannot really change it right? (as programs are relying on it being stable... 🤔 )

So one possible way to workaround it is to hash(str + hash(str)) and that's probably what I'm gonna do...

Metadata

Metadata

Assignees

Labels

help wantedIssues that the core team doesn't plan to work on, but would accept a PR for. Comment to claim.triaged

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions