-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Tokenizer: add support for PHP8 attributes #3203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tokenizer: add support for PHP8 attributes #3203
Conversation
24288d1 to
3c01426
Compare
|
Hi @alekitto Thank you so much for submitting this and for your willingness to take this on! I'd like to take some time to go through this in detail and also do some benchmark checks, as I'm sure @gsherwood will as well, but at first glance it looks like a great step forward. Some suggestions/feedback based on first impressions only:
|
3c01426 to
40b9c23
Compare
|
@jrfnl Thanks for the quick feedback. I'll answer here one question at a time:
Not really, it changed four or five times while I was coding this and this was the final form. I just transformed it into a proper method.
No, I think I need some help for this point: what kind of tests can I execute to have some useful information?
Yes, you're right. I moved that part in
Sorry, can't see it, maybe I'm just too tired 😅
👍 Done.
Correct. I converted
Tried to do it on
Ok, I've added lists of tokens in
I missed that! Added
Moved |
|
@alekitto Thanks a lot for your quick turn around and the updates. I'm sort of between PCs at the moment, but will try to do some additional testing as soon as I'm properly back up and running again. I may have more questions after that. Re: benchmarking - you could try to do some runs with BlackFire to see the performance impact of the changes and try some variations to see if it becomes better/worse. You'll need to create an account and install the Blackfire agent and phar locally - see: https://blackfire.io/docs/up-and-running/installation I'll leave a comment in-line to pinpoint the array return type tokenizer code change. |
Never mind that. Looks like that change got removed in the mean time. |
|
@gsherwood As this introduces a new token, this PR - if accepted - will have to go into a minor rather than a patch release. With that in mind, could this PR still be earmarked for 3.6.0 ? Would be a shame to leave it until 3.7.0 which still may be a while. |
|
@jrfnl I tried some variations and benchmarked them with Blackfire. PHPUnit executing I chose to replace only one I think this is the most performant path since Anyway I was thinking that the |
|
@alekitto Thanks for running those tests & benchmarking.
We can hope, but that's not necessarily true, especially for open source projects. So a project (potentially) containing PHP 8 code, doesn't necessarily mean that a PHPCS run for such a package will be done using PHP 8.x. Different people in a team may be working on different versions of the code and because of that may use a different local PHP version, CI may use yet another PHP version. This is all the more reason why the tokenizer should be stable and return the same tokens independently of the PHP version PHPCS is being run on as otherwise sniffs may return different results when run on different PHP versions. |
Agree. My thoughts was meant on the performance side of the problem: a 3ms loss per file could be acceptable on a path that is unlikely to be called. If a case of performance issues will be discovered, will be debugged and benchmarked to resolve that particular issue. Anyway my thoughts are probably philosophy applied to code (and could be ignored easily). I'll wait for your feedback if something new has to be fixed. Thanks again for your work. |
|
Just to help myself out with a review, below are the tokens produced by <?php
#[CustomAttribute, AttributeWithParams(\'foo\'), AttributeWithParams(\'foo\', bar: [\'bar\' => \'foobar\'])]
function attribute_grouping_test() {}
function multiple_attributes_on_parameter_test(#[ParamAttribute, AttributeWithParams(/* another comment */ \'foo\')] int $param) {}
Tokens in PHP < 8Tokens in PHP 8+When I get around to reviewing the PR properly, I'll use this output to make sure the backfill is working correctly. Given the backfill uses |
|
Just a thought about the attribute tokenization from a sniff writing perspective as (formatting) rules for tokens within an attribute may be different from the rules for (formatting) tokens in a non-attribute context. In a sniff which registers the Would it be a good idea to add the The alternative would be that sniffs would have to walk the token tree to see if a token is within an attribute, which I fear will quickly become a performance drain. |
|
@jrfnl Ok, done. |
|
@alekitto Thanks, you are quick ;-) I actually meant for my remark to be the starting point for discussion about this. Another things which comes to mind in that context: I also seem to remember that there is/was some discussion about possibly allowing nested attributes in PHP in the future. Not sure what the current status is of that, but if that is a realistic possibility in the near future, this may need to be handled via a What do you think ? |
|
I was thinking about it today. Probably there is this possibility already: attribute on closure or parameter of a closure passed as argument of attribute (don't know if it allowed, but I think it is). A I'll try this tomorrow and write here the result. |
|
I think you should test also attributes with namespaces, eg. |
|
Good point by @kukulich. On that note and taking other tokenizer differences across PHP versions into account, the |
Yes, I think it's necessary. |
Shouldn't be necessary. The tokens from Only nested attributes can break something right now. Anyway, I'll the test for FCQN token ASAP. |
|
Additional unit tests with as many attribute code variants as possible, including verifying the token codes/types of the tokens within the attribute for all of them, should be able to verify whether or not it is necessary, as well as to safeguard it for the future. Sorry to keep hammering on the test redundancy, but for things like this, I really feel it is important to cover this sufficiently. |
|
@jrfnl Nothing to be sorry about :) I just wanted to inform everyone of my thoughts while I was writing this: I tried to make this as little as possibile injecting the re-parsed tokens in the token stream to allow them to be parsed without losing the other tokenizer's changes/backfills. However, nested attributes tokenization was harder than I thought but it's done. It is not valid PHP code (the RFC does not allow nested attributes), but should be tokenized correctly. EDIT: I've not added the |
|
Ok, added |
|
@alekitto Tested in slevomat/coding-standard#1158 and it works! Nice work 👍 |
9a8821a to
e464df8
Compare
|
Just dropping a note here about a follow-up action item - this doesn't have to be addressed in this PR! A scan/search needs to be done for sniffs which may need adjusting to allow for attributes. Sniff(s) for which it is already known that they will most likely need adjusting:
|
|
Also |
|
@alekitto Can you solve the conflict please? |
e464df8 to
cd75b91
Compare
|
@kukulich rebased on |
|
How can I help with this one? |
|
@vladyslavstartsev Well, both this PR as well as #3226 (+ the follow-up PRs for sniff changes) still need to be merged for full PHP 8 support. AFAIK, the intention is to release 3.6.0 once these two PRs (+ sniff follow ups if they can be pulled quickly) have been merged. Any testing you can do with either of those two currently open PRs would be helpful. For match structures (PR #3226), you can find a list of sniffs already identified to need changes and their status here: #3037 (comment) I don't think such a list exists yet for attributes. |
cd75b91 to
539db72
Compare
…side an attribute
539db72 to
6367e71
Compare
|
Thanks for all the effort you put into getting attribute support working. I've now reviewed and merged, and will start looking at sniffs that might be impacted, starting with the side effects sniff. |
Up to now, when a Phar was being created using PHP < 8.0 with the PHP compactor enabled, PHP 8.0+ attributes would be incorrectly removed as if they were comments. This has now been fixed. Both the solution implemented, as well as the unit tests, are largely inspired by the solution for handling attributes when code is being tokenized on PHP < 8.0, as implemented in `PHP_CodeSniffer` in squizlabs/PHP_CodeSniffer#3203 and squizlabs/PHP_CodeSniffer#3299. Props to alekitto. Fixes 567 Co-authored-by: Alessandro Chitolina <[email protected]>
Up to now, when a Phar was being created using PHP < 8.0 with the PHP compactor enabled, PHP 8.0+ attributes would be incorrectly removed as if they were comments. This has now been fixed. Both the solution implemented, as well as the unit tests, are largely inspired by the solution for handling attributes when code is being tokenized on PHP < 8.0, as implemented in `PHP_CodeSniffer` in squizlabs/PHP_CodeSniffer#3203 and squizlabs/PHP_CodeSniffer#3299. Props to alekitto. Fixes 567 Co-authored-by: Alessandro Chitolina <[email protected]>
I tried to add PHP8 attributes parsing functionalities in PHP tokenizer.
Hope this could help completing PHP8 support on this great project.
PS: Since its my first PR here, please indicate me if there's something wrong with it. Thanks