Skip to content

Conversation

ishnagy
Copy link
Contributor

@ishnagy ishnagy commented Aug 5, 2025

reduce insertion count in SparkBloomFilterSuite to mitigate long running time

What changes were proposed in this pull request?

This change reduces the insertion count in the SparkBloomFilterSuite testsuite to the bare minimum that's necessary to demonstrate the int truncation bug in the V1 version of BloomFilterImpl.

Why are the changes needed?

#50933 introduced a new SparkBloomFilterSuite testsuite which increased the test running time of the common/sketch module from about 7s to a whopping 12minutes. This change is a workaround to decrease the test running time, until we can devise a way to then (and only then) trigger these long running tests when there are actual changes done in common/sketch.

Does this PR introduce any user-facing change?

No

How was this patch tested?

the minimum insertion count was selected based on the following measurements with the V1 version of the BloomFilterImpl:

100M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (3.050257 %) [00m18s] T:  ~9.6%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (3.053887 %) [00m09s] T:  ~9.3%

150M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (3.080157 %) [00m28s] T: ~15.0%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (3.079987 %) [00m15s] T: ~15.4%

200M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (3.861257 %) [00m37s] T: ~19.8%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (3.860424 %) [00m20s] T: ~20.6%

250M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (3.676172 %) [00m47s] T: ~25.1%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (3.675387 %) [00m25s] T: ~25.8%

300M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (3.210548 %) [00m57s] T: ~30.5%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (3.209847 %) [00m30s] T: ~30.1%

350M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (5.377388 %) [01m07s] T: ~35.8%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (5.377483 %) [00m36s] T: ~37.1%

400M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (8.170380 %) [01m17s] T: ~41.2%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (8.170716 %) [00m40s] T: ~41.2%

500M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (15.392861 %) [01m36s] T: ~51.3%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (15.391692 %) [00m50s] T: ~51.5%

1G
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (59.890330 %) [03m07s] T: 100.0%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (59.888499 %) [01m37s] T: 100.0%

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Aug 5, 2025
@ishnagy
Copy link
Contributor Author

ishnagy commented Aug 5, 2025

I have added the requested insertion count reduction, please have a look,
@dongjoon-hyun
@LuciferYang
@beliefer
@peter-toth

@the-sakthi
Copy link
Member

LGTM!

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

Thank you, @ishnagy , @peter-toth , @the-sakthi .

Merged to master.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-53077][CORE] reduce insertion count in SparkBloomFilterSuite [SPARK-53077][CORE][TESTS][FOLLOWUP] Reduce insertion count in SparkBloomFilterSuite Aug 5, 2025
@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Aug 5, 2025

To @LuciferYang , I'm going to merge this as a followup of your PR because the original JIRA issue title was "Re-enable SparkBloomFilterSuite"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants