Commit c0f1ab5

puzzlet authored and sigmavirus24 committed
don't overfit for short multibyte sequences <https://bugzilla.mozilla.org/show_bug.cgi?id=306272>
1 parent 2016966 commit c0f1ab5

1 file changed, +2 -1 lines changed

charade/chardistribution.py

Lines changed: 2 additions & 1 deletion
@@ -40,6 +40,7 @@
 ENOUGH_DATA_THRESHOLD = 1024
 SURE_YES = 0.99
 SURE_NO = 0.01
+MINIMUM_DATA_THRESHOLD = 3
 
 
 class CharDistributionAnalysis:
@@ -82,7 +83,7 @@ def get_confidence(self):
         """return confidence based on existing data"""
         # if we didn't receive any character in our consideration range,
         # return negative answer
-        if self._mTotalChars <= 0:
+        if self._mTotalChars <= 0 or self._mFreqChars <= MINIMUM_DATA_THRESHOLD:
             return SURE_NO
 
         if self._mTotalChars != self._mFreqChars:
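
The effect of the changed guard can be illustrated with a small standalone sketch. It is not part of the commit: the helper names rejects_before and rejects_after and the sample counts are hypothetical, and only the two conditions and the MINIMUM_DATA_THRESHOLD value are taken from the diff. The arguments stand in for self._mTotalChars and self._mFreqChars.

```python
# Constant value copied from the diff; everything else here is illustrative.
MINIMUM_DATA_THRESHOLD = 3


def rejects_before(total_chars, freq_chars):
    """Old guard: return SURE_NO early only when no characters were counted."""
    # freq_chars is unused here, matching the pre-commit condition.
    return total_chars <= 0


def rejects_after(total_chars, freq_chars):
    """New guard: also return SURE_NO early when too few frequent chars were seen."""
    return total_chars <= 0 or freq_chars <= MINIMUM_DATA_THRESHOLD


# Hypothetical counts for a very short multibyte input:
# two characters counted, both of them in the "frequent" set.
short_input = (2, 2)
# Hypothetical counts for a longer input.
longer_input = (50, 40)

print(rejects_before(*short_input), rejects_after(*short_input))    # False True
print(rejects_before(*longer_input), rejects_after(*longer_input))  # False False
```

With the old condition, the short input falls through to the rest of get_confidence, which the diff does not show. With the new condition it is rejected immediately, which appears to be the intent of the commit message. The comparison is <=, so an input with exactly three frequent characters is still rejected.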
