[SentenceDetector] Added Flag for returning custom bounds #10567
Description
Currently, if users provide custom bounds to the SentenceDetector, the text matched by those bounds is not returned at all. This PR adds a flag that keeps the matched custom bounds in the split sentences. Users can also choose a sentence break policy: prepend (the break is placed before the match, so the match starts the next sentence) or append (the break is placed after the match, so the match ends the current sentence).
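The three behaviours can be sketched in plain Python without Spark. This is a minimal illustration only: split_with_custom_bounds is a hypothetical helper, not the Spark NLP implementation (which also applies the detector's own sentence rules), and the strategy names "none", "prepend" and "append" are assumptions modelled on the customBoundsStrategy parameter listed in the summary, with "none" standing for the previous behaviour.

```python
import re
from typing import List

# Assumed strategy names; "none" reproduces the old behaviour (bounds dropped).
STRATEGIES = ("none", "prepend", "append")


def split_with_custom_bounds(text: str, bounds: List[str], strategy: str = "none") -> List[str]:
    """Hypothetical sketch: split `text` wherever any regex in `bounds` matches.

    strategy:
      "none"    - drop the matched bound
      "append"  - break after the match, so it ends the current sentence
      "prepend" - break before the match, so it starts the next sentence
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"strategy must be one of {STRATEGIES}, got {strategy!r}")
    # Combine all custom bounds into one alternation.
    pattern = re.compile("|".join(f"(?:{b})" for b in bounds))
    sentences: List[str] = []
    start = 0  # start offset of the sentence currently being built
    for m in pattern.finditer(text):
        # The current sentence ends after the match (append) or before it.
        end = m.end() if strategy == "append" else m.start()
        sentences.append(text[start:end])
        # The next sentence begins at the match (prepend) or right after it.
        start = m.start() if strategy == "prepend" else m.end()
    sentences.append(text[start:])
    # Trim whitespace and discard empty fragments.
    return [s.strip() for s in sentences if s.strip()]


text = "This is a sentence. This is another one; and a third"
for strategy in STRATEGIES:
    print(strategy, split_with_custom_bounds(text, [r"\.", ";"], strategy))
# none    ['This is a sentence', 'This is another one', 'and a third']
# prepend ['This is a sentence', '. This is another one', '; and a third']
# append  ['This is a sentence.', 'This is another one;', 'and a third']
```

The only difference between the strategies is where the current sentence ends and where the next one starts relative to each match.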
Example
With
setCustomBounds([r"\.", ";"])
running without the new flag results in:
With the new flag:
the result is:
Similarly, with prepend:
the result is:
["1. This is a list", "1.1 This is a subpoint", "2. Second thing", "2.2 Second subthing"]

All test cases are here.
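The prepend result above can be reproduced with a small standalone sketch. The input text and the bound regex r"\n\d+(?:\.\d+)?\.?" (a newline followed by list numbering) are assumptions chosen for illustration, and split_on_bounds is a hypothetical helper, not Spark NLP code.

```python
import re
from typing import List


def split_on_bounds(text: str, bound: str, keep_bound: bool) -> List[str]:
    """Hypothetical sketch: split `text` at each match of `bound`.

    keep_bound=True mimics "prepend": the break goes before the match,
    so the match opens the next sentence.
    keep_bound=False drops the match, like the old behaviour.
    """
    sentences, start = [], 0
    for m in re.finditer(bound, text):
        sentences.append(text[start:m.start()])
        start = m.start() if keep_bound else m.end()
    sentences.append(text[start:])
    return [s.strip() for s in sentences if s.strip()]


text = "1. This is a list\n1.1 This is a subpoint\n2. Second thing\n2.2 Second subthing"
bound = r"\n\d+(?:\.\d+)?\.?"  # assumed bound: newline + numbering such as "1.1" or "2."

print(split_on_bounds(text, bound, keep_bound=True))
# ['1. This is a list', '1.1 This is a subpoint', '2. Second thing', '2.2 Second subthing']
print(split_on_bounds(text, bound, keep_bound=False))
# ['1. This is a list', 'This is a subpoint', 'Second thing', 'Second subthing']
```

When the match is dropped, the list numbering is lost from every item after the first. Prepending the break keeps each number attached to its own item.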
Summary of the changes
customBoundsStrategy

How Has This Been Tested?
New and existing tests pass.