Fix handling of null
with spread operator and improve test suite using mutation testing
#129
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Each commit can be reviewed independently. 4a9c4e2 and c920ac6 were largely done by an AI agent. I used VS code with the GitHub Copilot Chat extension in agent mode with Claude Sonnet 4 as the model.
4a9c4e2 was done using the following initial prompt:
The agent more or less figured out what to do, but was trying to introduce new variants of
assertIntercept
, so I had to prompt a bit further to refine things. There was also some back-and-forth where the agent got pretty confused about regex escaping, but it was able to clean things up with some further prompting and suggestions of what approach to use.c920ac6 was done using the following initial prompt:
That more or less worked, but there were some missing imports. The agent wanted to reset
SandboxTransformerTest
using git history and try again, but I stopped it and suggested that the problem might just be some missing imports, at which point it made the necessary changes successfully.58d6069 and 5df914c were mostly done by me, although I used the agent to set up Pitest (https://pitest.org/). I asked the agent to summarize the Pitest report and indicate important cases where the test suite could be improved. The summary and suggestions were fine but not great.
I asked the agent to write some new tests to cover the gaps show by the mutation testing, and it was able to create new tests following the pattern of the existing one, but it went a bit too far and introduced a lot of redundant tests. For example, it tried to generate 6 new spread operator-related tests when all we really needed was one to cover the case of
null
in the list being spread over (e.g.['a', null, 'ab']*.length()
). Interestingly, this specific case actually surfaced a bug in howgroovy-sandbox
handlesnull
with the spread operator (doesn't really matter for Pipeline because spread method calls are not supported in CPS mode). It was interesting to see the agent check the expected results by runninggroovy
, although it didn't notice on its own that our implementation had a bug. I tried getting the agent to refine the tests, but ultimately it was simpler to just do that on my own looking at the mutation testing report for interesting coverage gaps.I did try to see if the agent could generate something like
blocksUnintendedCallsToNonSyntheticConstructors2
, and although it did try quite a lot of things, it never managed to trigger the critical case. This was not that surprising though to me - this case is pretty subtle, the logic is totally unique to our project, and it even took me a while to get the test right.Testing done
See above as well as new automated tests.
Submitter checklist