Initial experiments with serialized testing #593
base: master
Conversation
Let me start with a dumb question: I think what this PR is aiming to do is provide a way of saying "for a grammar G and an input I, let me write a test for it". And, I think, furthermore it's trying to allow users to say "I'm only going to write a test for a subset of I w.r.t. G"?
I'm not certain I understand the second part, specifically the "subset of I w.r.t. G" bit.
So essentially there are 4 kinds of tests supported, depending on which of the optional fields are given (each has a default value when omitted). I guess I would say this more formally as: "for a grammar G, an input I, and an expected result E, the test is the comparison of E against the actual result of parsing I with G".
grammar_testing/src/lib.rs
#[serde(default)]
pass: Option<bool>,
#[serde(default)]
errors: Option<Vec<String>>,
FWIW, I'm not really a big fan of this structure layout. The fact that you can specify ast, errors, and pass all at once leads to nonsensical values like:
{
    pass: true,
    errors: ["some error"],
}
The layout was largely driven by the serialized input; I think ideally this would be some kind of enum:
struct GrammarTest {
    input: String,
    expected_output: Option<GrammarTestOutput>,
}

enum GrammarTestOutput {
    PassFail { pass: bool },
    AST { ast: ASTRepr },
    Errors { errors: Vec<String> },
}
I will see if I can't get the same or effectively similar serialized input to map to that changed structure, perhaps using the unwrap_variant_newtypes extension. It doesn't sound like it will, but it seems worth playing around with.
I messed around with an alternative structure, using the unwrap_variant_newtypes extension:
#[derive(Deserialize, Serialize, PartialEq, Debug, Clone)]
#[serde(deny_unknown_fields)]
pub struct GrammarTest {
    input: String,
    /// A `None` value indicates expecting input to successfully parse.
    expect: Option<TestOutput>,
}

#[derive(Deserialize, Serialize, PartialEq, Debug, Clone)]
pub enum TestOutput {
    Fail,
    AST(ASTRepr),
    Errors(Vec<String>),
}
/// Not real parsing output, just testing round tripping of the struct serialization.
[
    (input: "b"),
    (input: "a", expect: AST("start", [("character", "a")])),
    (input: "abc", expect: Fail),
    (input: "abc", expect: Errors(["some error"])),
]
Even though it is a bit wordier, it feels like it might be an improvement, and it isn't that much more verbose?
Edit: One other thing worth considering is only enabling the extensions by default for serialization. That then requires the input to start with attributes enabling them before they apply during deserialization. I don't really have a strong preference about that.
#![enable(implicit_some)]
#![enable(unwrap_variant_newtypes)]
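For reference, here is a minimal sketch of the two modes being described, assuming ron 0.8's Options/Extensions API; the GrammarTest shape is simplified purely for illustration and is not the real grammar_testing type:

use ron::{extensions::Extensions, Options};
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct GrammarTest {
    input: String,
    // Stand-in for the real TestOutput enum, just to keep the sketch small.
    expect: Option<String>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Extensions enabled programmatically as defaults: the input needs no header.
    let opts = Options::default()
        .with_default_extension(Extensions::IMPLICIT_SOME | Extensions::UNWRAP_VARIANT_NEWTYPES);
    let t: GrammarTest = opts.from_str(r#"(input: "a", expect: "Fail")"#)?;
    println!("{t:?}");

    // Plain deserialization: the same input must opt in via attributes at the top.
    let src = "#![enable(implicit_some)]\n(input: \"a\", expect: \"Fail\")";
    let t: GrammarTest = ron::from_str(src)?;
    println!("{t:?}");
    Ok(())
}

The second form is the stricter option floated above: inputs only get the extensions if they explicitly opt in.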
AFAICS you only need an enum here? So I think one would have:
enum Test {
    TestError { input: String, errors: Vec<String> },
    TestFail { input: String },
    TestSuccess { input: String, ast: Option<...> },
}
One could split TestSuccess into two. I also wonder if Fail and Error even need splitting. They could probably be TestError { input: String, errors: Option<Vec<String>> }?
Perhaps, I'll give that a try.
9e2f213 is based off that idea; it does combine TestError and TestFail into just one, though now that I think about it, I kind of wonder if TestFail { input, errors } isn't preferable if errors is now optional.
I guess the thing to note is this always requires the outer enum variant; where previously we could just say:
(input: "...")
now we must say either TestSuccess(input: "...") or TestError(input: "...").
Right, so matching against serde probably isn't a good idea. Can we define our own, stable, output perhaps? Could it be as simple as the indented "sideways tree!" output that nimbleparse gives?
Yeah, I agree that neither for string matching nor for tree matching does serde really work here :(
My fear with string matching is that (it seems to me like) we are likely to end up with multiple levels of escaping required: for instance, encoding into a string we'll have nested quotation marks (avoidable with raw strings), but the wildcards for fuzzy matching will also need escaping when they can be present in the output.
I was really trying/hoping to avoid that kind of escaping, so that the stuff embedded within the tests was representative of the stuff being matched.
We have managed to get rid of escaping quotes with raw strings; I'm not aware of anything analogous for wildcard matching though. So one of the benefits I perceive of tree matching is that the wildcards are embedded as e.g. _ nodes within the tree, rather than embedded as characters within the string, and that kind of matching can be done without escaping data within actual tree nodes.
I guess that's where most of my hesitance towards string matching lies: we can't really just pick non-conflicting characters to use for wildcards, since the grammars are arbitrary. So I would like to play with tree matching.
Another minor interesting thought is diffing: eventually, instead of saying AST(x) != AST(y) for a random, potentially long AST, we could try to produce a nice diff of the ASTs. While we can diff trees and text without wildcards/fuzzy matching, I've never really thought about diffing fuzzy matches.
Maybe that is just a case where you only produce diffs when the "expected output" has no wildcards.
I guess I don't have any good intuition regarding diff production for the fuzzy matching case, other than it definitely feels like a special case.
String matching definitely has challenges, but in terms of quoting, doesn't embedding tests in another language (in this case, Rust) always involve this in some way?
That said, maybe we can break this into two. One is the general "testing grammar inputs" problem. Second is what we're serialising / testing against. Is that split correct?
I guess my only hesitation to splitting them up is that "testing grammar inputs" does lock us into an input format. Let me use, as an example, a slightly extended ron format which allows _ to be used. The extension is somewhat inspired by the rust/ML _ wildcard, but also by the syntax used e.g. in this highly relevant article: https://compiler.club/pattern-matching-in-trees/
[TestSuccess(input: "a", ast: (_, [("character", "a")]))]
In theory though, maybe we could add a Wildcard or Wild variant to ASTRepr, or whatever match/pattern structure we end up using for AST matching. I believe this or something similar might be compatible with Ron, but in some ways using _ feels more natural. So perhaps I was wrong that serde won't work for tree matching; it may, but it lacks syntactic sugar.
[TestSuccess(input: "a", ast: (Wildcard, [("character", "a")]))]
Hopefully this also explains why I was hoping we could perhaps avoid escaping wildcards: we're still encoding a tree-like data structure, but wildcards are embedded within the structure as unitary values, rather than within any character representation.
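To make that concrete, here is a minimal sketch of matching where the wildcard is a node of the tree rather than a character inside a string. The ASTRepr/Wildcard names are hypothetical stand-ins, not the actual grammar_testing types, and this variant treats a wildcard as matching a whole subtree (a label-only wildcard would be a small variation):

// Hypothetical pattern/tree representation: the wildcard is a unitary node,
// so node labels never need escaping.
#[derive(Debug, Clone)]
enum ASTRepr {
    Node(String, Vec<ASTRepr>),
    Wildcard,
}

impl ASTRepr {
    /// Does the pattern `self` match the concrete tree `tree`?
    /// Wildcard matches any subtree; otherwise labels and arity must agree.
    fn matches(&self, tree: &ASTRepr) -> bool {
        match (self, tree) {
            (ASTRepr::Wildcard, _) => true,
            (ASTRepr::Node(pl, pc), ASTRepr::Node(tl, tc)) => {
                pl == tl
                    && pc.len() == tc.len()
                    && pc.iter().zip(tc).all(|(p, t)| p.matches(t))
            }
            _ => false,
        }
    }
}

fn main() {
    use ASTRepr::*;
    let node = |l: &str, c: Vec<ASTRepr>| Node(l.to_string(), c);
    // Tree for the "a" example above; the pattern wildcards the child subtree.
    let tree = node("start", vec![node("character", vec![node("a", vec![])])]);
    let pattern = node("start", vec![Wildcard]);
    assert!(pattern.matches(&tree));
}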
Hope I'm not being difficult here; it just feels like it might be worth attempting (or maybe dejagnu just spoiled string-based testing for me).
I guess some things are worth pointing out: the paper I had linked to on pattern matching in trees addresses a slightly different problem than what we want; its algorithm produces all the subtrees that match a pattern tree.
What I believe we're after is more akin to the tree inclusion problem, which asks, given a pattern tree P and a tree T, whether we can produce P from T by deleting nodes in T.
That is to say, the tree inclusion problem omits wildcards and holes, while pattern matching in trees admits them. We could slightly modify the rules of the tree inclusion problem to restrict the number of deletions allowed, e.g. a hole might allow only one deletion, while a wildcard allows any number of deletions.
I suppose it's interesting for two reasons: the lack of wildcards in the original formulation, but also just the slightly different formulation of the problem.
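To pin that formulation down, here is a naive, exponential-time sketch of the tree inclusion check exactly as stated above (no wildcards or holes yet), where deleting a node splices its children into its place; the Tree type is a hypothetical stand-in:

#[derive(Debug, Clone)]
struct Tree {
    label: String,
    children: Vec<Tree>,
}

/// Can the pattern tree `p` be obtained from `t` by deleting nodes of `t`?
fn included(p: &Tree, t: &Tree) -> bool {
    forest_included(std::slice::from_ref(p), std::slice::from_ref(t))
}

/// Can the forest `ps` be obtained from the forest `ts` by deletions?
/// Deleting a node replaces it, in place, by its children.
fn forest_included(ps: &[Tree], ts: &[Tree]) -> bool {
    match (ps.split_first(), ts.split_first()) {
        // Nothing left to produce: everything remaining in `ts` can be deleted.
        (None, _) => true,
        // Pattern nodes left over, but no tree nodes remain.
        (Some(_), None) => false,
        (Some((p, prest)), Some((t, trest))) => {
            // Option 1: keep `t`'s root and match it against `p`'s root.
            let keep = p.label == t.label
                && forest_included(&p.children, &t.children)
                && forest_included(prest, trest);
            // Option 2: delete `t`'s root, splicing its children into the forest.
            let delete = || {
                let mut spliced = t.children.clone();
                spliced.extend_from_slice(trest);
                forest_included(ps, &spliced)
            };
            keep || delete()
        }
    }
}

fn main() {
    let leaf = |l: &str| Tree { label: l.into(), children: vec![] };
    let node = |l: &str, c: Vec<Tree>| Tree { label: l.into(), children: c };
    // Deleting the "Term" node of the larger tree yields the pattern.
    let t = node("Expr", vec![node("Term", vec![leaf("INT")])]);
    let p = node("Expr", vec![leaf("INT")]);
    assert!(included(&p, &t));
}

Restricting how many deletions a hole or wildcard may absorb would then just mean threading a deletion budget through that recursion.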
I quickly started looking at wildcards, got distracted, and dropped the ball. Sorry! Let me take a step back and say what I think we've got here:
Have I got that right? Next question: if things don't match in (2), do we just get "fail" or is the aim to give a detailed diff of some sort?
No worries; this is, I guess, the kind of thing I'd prefer to spend time on and get right rather than get done quickly. What I currently have implemented: (1) is almost right, except we do not have a stable textual version of the AST. For point (2), serialization is not used for the actual testing, but only to encode ASTs into tests; most of my thoughts have aimed to preserve this basic structure.
Then for the wildcards/matching: that is to say, we would do the matching on the tree structure itself. The downsides being that I'm not super familiar with the state of the art of these tree/pattern matching algorithms, and in many cases they seem to be NP-hard; anyhow, that is at least where my head had largely been with this patch. For your next question: once we have an encoded AST we can either do something dumb (probably a good starting point) and just dump both ASTs, or in the future we could do something smarter like produce a diff of the ASTs. My hope was to have diffing added in a follow-up, but keep it in mind so that we can reasonably implement it in the future. I think for things with patterns the easiest thing to do is to output both the AST and the pattern; maybe there is some sensible diff algorithm based on partial matches, but it isn't clear to me.
Quick dumb question, which might just be terminological: when you say "AST" I think I would say "parse tree"? In other words, we mean a tree that still has "all the stuff in it from the grammar", much of which a compiler might later choose to discard because it has no (subsequent) semantic effect. I think the fundamental question in terms of effort then becomes:
So because (2) is so limited, it makes us think "how do we introduce some sort of fuzziness" to matching so that not every test is invalidated as soon as G changes? There are various tree-based algorithms which could help with the fuzziness. I have seen someone implement those in the past, and it's doable, but not a 60 minute job by any means! Another question is "how do we want users to encode these tests?" Do we want them to express them directly in Rust? Or some sort of DSL? Or both? I'm actually starting to wonder if what we're slowly groping our way towards is a "grammar testing DSL". I don't know if anything like this exists, but we can fairly easily imagine something. Let's imagine I have a calculator grammar G; I might then write multiple tests in our imaginary DSL (which, because I'm unimaginative, I'm thinking of in lang_tester-esque syntax) like this:
and:
What about fuzzy matching? Well, I've deliberately used nimbleparse-ish parse-tree-indented syntax above. I think that lends itself well to very basic fm-textual wildcarding:
where I could change parts of the expected tree into wildcards. You may notice, though, that I've still had to specify my indentation level very carefully in that test -- I think the easy way around this would be to say that indentation levels in this DSL are relative, not absolute. This is a long morning ramble, and I might have completely failed to understand subtle points here, so buyer beware :)
Yeah, when I say "AST" I mean "parse tree", or "serializable generic parse tree", (I have a genetic issue with my hands that makes most of my fingers not actually bend, so I tend to fall back on names that are as few characters as possible, and shorten them by frequency of use, sorry)
Yeah, indeed. Let me also take a few steps back and lay out some more basic thoughts (see e.g. https://github.com/ratmice/crimson/blob/main/nimbleparse.toml). One of the big limitations I was hoping to lift with this was the ability to encode multiple tests into a single file, using a vector of inputs, simply because I found myself with a lot of really small input files with silly names like "struct1.test", "struct2.test"; it would be nice to have fewer files, combining them into one file containing a vector of test inputs named "struct.test". From there, the next logical step is also encoding the expected parse tree, and the expected error if any. For embedding inputs/outputs, my concern here is more that we aren't making any format choices that preclude us from encoding the expected outputs in a nice way, rather than that being the primary mode of use.
For something like nimbleparse_lsp, testing occurred at (within reason) every character press changing the test files or the grammar/lexer. (Essentially there was a timeout: when you stopped editing for a few seconds it would generate an RTParserBuilder and start feeding that RTParserBuilder the test input.) So I would definitely prefer it not be in Rust, because that requires a whole compilation cycle, which isn't really feasible at interactive speeds; generating an RTParserBuilder is already pushing it for large grammars. So I definitely was going for a DSL rather than anything directly testing Rust user actions; I think the latter is likely to need a significantly different design. (At the very least for nimbleparse_lsp, which intends to run in the browser, so we're trying to avoid porting the rust compiler to run in the browser and generate WebAssembly which would subsequently get run in the browser.) So my focus has definitely been on a DSL over Rust for expressing tests.
So I guess my difficulty with this syntax is that it isn't really explicit about newlines. This explicitness about the input is why I went with Ron for the first try: it has both the power of raw strings for longer multi-line examples, and explicit quoting (requiring escapes) otherwise.
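For example (just illustrative RON, not the actual test schema), the same multi-line input could be written either way:

// With an ordinary escaped string:
(input: "struct \"S\" {\n  x: u8,\n}")

// Or with a raw string spanning real newlines:
(input: r#"struct "S" {
  x: u8,
}"#)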
Yeah, here is where I start worrying about the case where the parse tree has the wildcard character embedded in its actual data.
It just seems to me like it'll end up with special cases where we need to escape to avoid considering output to be a wildcard, which we can avoid with tree matching in the case of something like:
Because wildcards are part of the tree format, rather than being a parameter to a terminal or non-terminal node. I'm still using ron with explicit types here, i.e. the explicit Wildcard variant; using a DSL we could replace that with a bare _.
I guess my concern is that if we treat wildcards as characters embedded within strings, we are back to needing escaping.
Hopefully that goes into more detail at least about my whole hang-up with this approach.
Sorry, you're the victim of my laziness and haste. I was assuming that we format terminals quoted (and escaped as necessary). Then we can allow the user to specify their own wildcard if we want, though I'm not even sure that's necessary.
Makes sense. Initially, I guess my hesitation to doing a custom format was just that the rust raw string format isn't something we can easily replicate with a conventional lexer, since it is context sensitive. Above and beyond that, I'm not certain we can rely on lrpar without introducing bootstrapping issues, I think, since it is in lrlex. Anyhow, I think I have a good enough idea now that I should probably be able to start playing with implementations.
Bootstrapping introduces all sorts of fun and games. I'm not sure I'd advocate for it TBH if there's even the slightest whiff of problems! For such a simple DSL it isn't too difficult to write a recursive descent parser (a la cfgrammar) if necessary.
Yeah, indeed. I also realized that even if we did avoid raw string context sensitivity we probably still couldn't use lrlex, just because of bootstrapping: it would introduce a circular dependency.
So coming back to this: I think I am still strongly in favour of the DSL route. The exact syntax we could endlessly bikeshed if we wanted to :) I think the crucial thing is (1) to allow multiple tests in a file (fortunately this is easy!) and (2) some concept of wildcarding so that users can have tests that don't require full rewriting for every minor tweak of a grammar.
Sorry, I have basically been chewing on the yaml-like syntax; I really am basically totally unfamiliar with yaml. I assume with those we would use the yaml-like vector syntax for (1) multi-tests.
What I have a bit of difficulty wrapping my head around is that we've now got multiple layers of whitespace sensitivity. One of the things that bothers me a bit is that it seems to require you to reindent the tree data to conform to the yaml syntax, which could inhibit just copy/pasting tree data from nimbleparse into a test file. Anyhow, one of my thoughts has been just doing a first pass, trying to reuse the existing serialization.
I am not sure I would recommend yaml TBH. It's a very horrible syntax when it comes to newlines. I think we can just use our own (e.g. in something approaching lang_tester format).
Ahh, sorry, I thought the lang_tester format was using yaml.
Anyhow, I guess my question of how we want to handle the multi-test case files remains; otherwise perhaps we could throw curly braces around it and use the normal vector style?
Curly brackets are fine with me!
So, sorry I haven't gotten back to this one yet; I haven't been in a place where I could page it all back into my head, nor reasonably flush all the other stuff out. That's largely done now, and I just need to find the time/motivation to get started again on this (I always have trouble getting started). Sorry, and I appreciate the patience.
Totally understood, and I for one am grateful for whatever time you're able to find, whenever that is!
This is mostly just for collecting feedback, as this patch is fairly incomplete.
It implements the serialization stuff, and it adds a -s switch to nimbleparse to output the serialized format. It also adds some design docs giving a basic overview of what I intend to do, but this is necessarily incomplete, and at times speculative, based upon how I hope or imagine it might be made to work.
I don't think it is too far from actually deserializing tests and doing the pass/fail checking, but before going through the trouble of implementing that, it seemed worth doing a fairly complete proof of concept of the serialization format stuff.
The basic goals here have been to keep the syntax fairly noise-free while being well typed (i.e. instead of dynamically handling both single tests and vectors of tests, using a separate extension for the vector of tests), and to work well with both raw strings and escaped strings, for grammars containing quotation marks, newlines, raw strings, etc.
While doing this, I noted that test_files currently only allows a single glob, but when we start dealing with multiple file extensions, we probably are going to need a way to specify multiple globs.
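Purely as a hypothetical sketch of that last point (not a committed design, and the exact file and key this would live in are an assumption), the change would essentially be from one glob to a list of globs:

# Currently something like: test_files = "tests/*.test"
# A possible future form, one glob per test-file extension:
test_files = ["tests/*.test", "tests/*.grmtest"]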