-
Notifications
You must be signed in to change notification settings - Fork 13.9k
Rewrite the compiletest directive parser #128070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The job Click to see the possible cause of the failure (guessed by this bot)
|
Thank you for taking this on! I've been wanting to do this myself, but never quite gotten around to it. I'll take a closer look at the approach idea tomorrow. I want to do a survey of what compiletest currently does, and look at the problems before I look at your solution sketch. That is, I want to try to come up independently with a potential solution sketch first, and then compare notes with what you have here in case we miss things or have other considerations. |
I have not yet looked at your PR specifics (intentionally as aforementioned), but here are some of the thoughts I had previously (I will check against this when I actually look at your solution sketch):
|
FYI: there has been ideas to eventually migrate to cf. a series of MCPs: |
After I take an initial look, I'll post this PR for discussion to the #t-compiler zulip thread, this could use a few more eyes and brains on a solution sketch and see how we can make this more robust and maintainable (whatever we have now is the opposite of maintainable unfortunately, it's mostly due to organic growth). |
Just some quick replies:
Well you can look at it 😆 only thing is the code isn't as clean as it could be - functions not in logical ordering, rough namings, TODOs.
With respect to this, are you only talking about handling of the directives within At least some of this can probably be accomplished here as far as making it easier to split things out, but I think I might need an example to see exactly what you mean.
Totally possible, this can be filtered either at parse time or as a pass at validation time. At its basic level, this can just be a pass that populates test-specific
I think that the prefix directives could probably use some thought about what should be accepted. Maybe they take some other form like However, regarding fail-fast and helpful suggestions: yes, huge goal here! E.g. typing
As in, autogenerated? I had a few ideas for this but didn't yet implement it.
Here, regex is limited to a few places that have a somewhat wide range of valid inputs:
Everything else is just standard string operations.
Could you give an example of this, assuming there is more than just |
Not all of these are related to parsing, but I think we may be able to generally handle them better: Currently, uppercase directives that start with rust/src/tools/compiletest/src/load_cfg/itemlist.rs Lines 391 to 407 in db8bba1
(first item is regex that matches an item pattern, second is the fallback for near misses to warn on, third constructs the item from the regex capture) Maybe we could do something better, unsure what that would be. The syntax errors/warnings could be addressed here easily enough. I think I have this raise an error if you type Absolutely a goal here. Easy enough to add. I am not sure exactly where this behavior comes from, but seems like it might be fixable now even? Almost solved - the rust/src/tools/compiletest/src/load_cfg/prepare.rs Lines 145 to 175 in db8bba1
Should raise an error, but I need to add a test here. |
Yeah, I was just jotting down some thoughts and info/context I had so they existed somewhere, you don't necessarily need to respond :3 Also I would maybe wait a bit on the overall design direction, no need to fix the details just yet |
You collecting it together is much appreciated :) I guess the decision of whether to use ui_test or to try to improve what is in-tree remains the biggest question still. |
Err sorry let me clarify, ui_test only supports ui tests currently (as in the ui test suite style tests), it wouldn't replace compiletest 1-to-1, so we still want and need compiletest to be maintainable in any forseeable future. So yes, this PR is very valuable. If anything, ui_test might be what compiletest uses under the hood (for ui tests and perhaps more), but compiletest definitely isn't going away. |
I've reposted my previous comment on #t-compiler, and I would move this discussion to the zulip stream for more visiblity to other T-compiler people. Link: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Reworking.20compiletest.20directive.20parsing. |
const fn directive(self) -> &'static str { | ||
match self { | ||
CommentTy::Slashes => "//@", | ||
CommentTy::Hash => "#@", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remark: unfortunately, Makefile
s are still using the same directive prefix #
as comments #
, because I didn't try to port them when I ported //
-> //@
since we're porting Makefile
s into rmake.rs
anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is an interesting twist - I think it would be easy enough to just ignore unknown directives for specific kinds of tests rather than raising errors. Unless #121876 happens to be done sooner than expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense to me. We don't really need to error for existing Makefile tests, and at least we are guarding the test suite with a tidy check to disallow new Makefiles to make it in.
There might also be rustdoc annotations, but yes that sounds about right to me.
I'm not 100% sure you mean by this, do you mean
If the comment starts with
Yes, sounds good.
Yeah, we do want to be careful about how much false positives (looks like directive but are actually intended comments) this detection does.
Yes, that's intended as a short term bandaid.
Yes, definitely. In general, we want to reject directives that are invalid for a test suite, as well
I can't immediately recall either, sorry! But e.g. currently
I do like how this looks a lot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some partial reviews (I haven't gotten through the whole PR)
match self { | ||
CommentTy::Slashes => "//@", | ||
CommentTy::Hash => "#@", | ||
CommentTy::Semi => ";@", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: I don't think I've ported any ; <directive-names>
directives over either, are these for like assembly tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking LLVM of IR, but duh that's not our job to test 🤦
#[derive(Debug)] | ||
struct FullError { | ||
msg: Box<str>, | ||
fname: Option<PathBuf>, | ||
pos: Option<LineCol>, | ||
context: Vec<Box<str>>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remark: I really like having better context for error messages 👍
pub type Error = Box<dyn std::error::Error>; | ||
pub type Result<T, E = Error> = std::result::Result<T, E>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remark: we might even use some kind of anyhow/thiserror in the future.
// TODO: parse revisions into `ItemVal`s. | ||
|
||
#[derive(Clone, Debug, PartialEq)] | ||
pub enum ItemVal<'src> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion: so this is what I had in mind when I mentioned test suite specific directives: some of these directives make sense for ui/ui-fulldeps but not for run-make, for example, and I think we might want to separate ItemVal
(I would even just call this a Directive
?) for specific test suites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directive
does make sense, my naming isn't as clean as it could be 😄. I think that having all directives listed in one place simplifies things, but I was planning to handle this. roughly:
trait TestKind {
type Config;
/// Called within `itemlist` to filter which directives we look for
fn accepts_directive(m: &MatchItem /* I know, MatchItem is a bad name */) -> bool;
/// Called within `prepare` to set its own config
fn visit_build_config(item: &Item, cfg: &mut Self::Config) -> Result<()>;
}
This would be the only test-specific interface needed - just needs to say whether or not it supports a specific directive kind, and then say how to update its config when it gets to relevant items.
(Still mind mapping that bit out, but I think this seems reasonably clean and encapsulated)
/* Directives that are single keys but have variable suffixes */ | ||
/// Ignore something specific about this test. | ||
/// | ||
/// `//@ ignore-x86` | ||
Ignore { | ||
what: &'src str, | ||
}, | ||
/// Components needed by the test | ||
/// | ||
/// `//@ needs-rust-lld` | ||
Needs { | ||
what: &'src str, | ||
}, | ||
/// Run only if conditions are met | ||
/// | ||
/// `//@ only-linux` | ||
Only { | ||
what: &'src str, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion (not necessarily for this PR): only-*
and ignore-*
are very tricky (if we look at header/cfg.rs
) in that the suffixes are parsed with a sequence of precedence-based matchers. If you specify e.g. ignore-x86
, you have to look and reason really hard to realize "oh it's the target_cfg.all_archs
that's matching and taking effect". We probably want to eventually straight up ask for a longer prefix to help make it excruciatingly clear what we're trying to match on (this would actually better match ui_test
)
//@ only-arch-x86
/// | ||
/// `//@ normalize-stdout: foo -> bar` | ||
/// `//@ normalize-stderr-32bit: bar -> baz` | ||
/// `//@ normalize-stderr-test: "h[[:xdigit:]]{16}" -> "h[HASH]"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion: I don't recall, does normalize-*
handle string escapes in regexes correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't yet gotten to the reimplementation yet, but I think it just does a regex match. Meaning if there are multiple ->
in the string, results might be surprising.
We should probably figure out some quoting rules and use them everywhere relevant. Do you think it makes sense to do something like shell, '...'
for literals and "..."
allowing escapes? (and interpolations, but hopefully not relevant here).
/// | ||
/// `//@[revname] directive` | ||
RevisionSpecificItems { | ||
revs: &'src str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion: we probably want revisions to be represented as BTreeSet<&'src str>
, maybe. Or better yet, we could try to do something similar to ui_test
and have revision-specific directive set:
enum UiRevision {
None,
Named(String),
}
/// Revision specific?
struct UiDirectiveSet {
check_pass: bool,
edition: String,
// ...
}
type UiDirectives = BTreeMap<UiRevision, UiDirectiveSet>;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ui_test
form is almost exactly what I was going for, just haven't gotten to yet :) my plan is to parse all config that is not specific to revisions into one Config
, then clone it per revision, then update the config with anything that is per revision (output lives around here https://github.com/rust-lang/rust/pull/128070/files#diff-4653d6449a3cc26e6235d71fbf2919348e09fe8eaa72db6fc8e623eaa92ce1f5R38-R43, didn't handle the 0-revision case yet).
/// True if this is an `//@` directive, i.e. something in a header` | ||
pub fn is_header_directive(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion: compiletest mixes the terms directives, headers and cfgs, but AFAICT they are trying to say the same thing. Headers do not, in fact, have to appear in the head of the file (that was a historical restriction AFAIK).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just been using "header directives" to refer to //@
things, "UI directives" to refer to //~ ERROR
things, and "filecheck directives" to refer to // CHECK:
things. But I agree this isn't as clear as it could be (nor consistent across documentation) - do you have preferred names? "Config directive" instead of "header directives" maybe.
(I actually currently enforce that config directives are above UI or filecheck directives as kind of a tidy thing, since having config directives later in the file seems easy to miss. I did see on Zulip though that sometimes you want to be able to append directives - maybe we can leave the check as-is but add a --no-check-directive-order
flag to opt out. Seems like possible overkill, not sure if that is worth it).
@jieyouxu hey dude this needs another review pass, don't forgor |
Oh I also need to actually work on it more, just haven't gotten back to it yet :) |
I keep meaning to do another review pass but keep forgor-ing. Maybe just wait for me a little bit at least for another review pass. |
☔ The latest upstream changes (presumably #139552) made this pull request unmergeable. Please resolve the merge conflicts. |
It has come up a few times that the way we handle files compile test files has some shortcomings. This is the start of an attempt to rewrite it in a way that gives us (hopefully) a lot more flexibility to identify errors.
It uses a new parser that is pretty simple, with this basic architecture:
Try to parse each line as an
Item
, which is anything we care about: header directives, UI error annotations, LLVM filecheck annotations (so we can validate those).If nothing shows up, try to parse as a near miss. This should catch errors in the syntax and things like typos.
If it didn't match an
Item
nor its similar parser, ignore the lineBuild these all into a
Vec<Item>
, the output ofitemlist.rs
We can then do a handful of passes to either check for errors or perform some behavior. Some important ones:
Config
Config
The goal at this point is to catch anything that is correct syntax but invalid for whatever reason, and to construct configuration to run tests.
Rather than
panic!
, errors get raised and tagged with location info (line, col, file) as they bubble up.I think this should help solve a number of problems:
strsim
) or to catch things that look like they are supposed to be UI directives but aren't (e.g.//~[revision]
instead of//[revision]~
)It is pretty early and I haven't started connecting everything into the existing infrastructure, but I just wanted to get your feedback before I get too far along @jieyouxu. So please don't read the code too much, not super clean :)
r? @jieyouxu