-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Add regression tests for captures in positive lookbehind assertions #120983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: stephentoub <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses a bug where compiled regex engines incorrectly removed captures made within positive lookbehind assertions. The issue has already been fixed in the current codebase, and this PR adds comprehensive regression tests to prevent future regressions.
Key Changes:
- Added 4 new test cases specifically validating capture counts and values in lookbehind patterns
- Tests cover lazy quantifiers, bounded quantifiers, simple lookbehinds, and multiple groups in lookbehinds
- All tests properly skip the NonBacktracking engine which doesn't support lookbehinds
|
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions |
|
/ba-g unrelated wasm failure (" session not created: DevToolsActivePort file doesn't exist (SessionNotCreated)") |
@stephentoub What does it mean? In the past regression tests, did u only check whether a match succeeds and the overall matched values (the For slightly complex regex, even if all values in the final result (the matched value and all captures value) are exactly the same, the underlying matching process might still be incorrect. The safest approach is to verify that the captured group content changes synchronously during the intermediate steps of matching. For me, getting the captured values during the intermediate matching process isn’t very convenient. I have tried using the following steps to obtain the values of an intermediate matching process:
However, |
Closes #117605
Summary
This PR adds comprehensive regression tests to verify that captures made within positive lookbehind assertions are preserved correctly across all regex engines (Interpreter, Compiled, and SourceGenerated).
Background
Issue #117605 reported that in Compiled mode, the .NET regex engine was incorrectly removing captures made by capturing groups within positive lookbehind assertions, while the Interpreted mode preserved them correctly. For example:
Changes
Upon investigation, I found that the issue has already been fixed in the current codebase. Both Compiled and Interpreted modes now produce identical results for all test cases mentioned in the issue.
However, the existing tests in
Match_MemberData_Casesonly verified match success and matched values, not the actual capture counts and values. This PR adds comprehensive tests toMatch_Advanced_TestDatathat specifically verify:+?,{2,}?, etc.)Tests Added
(?<=(abc)+?123)a- Verifies lazy quantifier in lookbehind(?<=(abc){2,}?123)a- Verifies bounded quantifier with 2 captures(?<=(abc)+?)a- Verifies basic lazy quantifier behavior(?<=(?'1'abc)+?(?'2')123)a- Verifies multiple named groups in lookbehindAll tests properly handle the NonBacktracking engine (which doesn't support lookbehinds) and run against all other available engines.
Test Results
All 30,335 regex tests pass successfully with no regressions.
Original prompt
Fixes #117651
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.