Commit 2eb908d

Update README.md
1 parent c42d323 commit 2eb908d

1 file changed: +4 -13 lines changed

Tools/cases_generator/README.md

Lines changed: 4 additions & 13 deletions
@@ -4,6 +4,7 @@ What's currently here:
 
 - lexer.py: lexer for C, originally written by Mark Shannon
 - plexer.py: OO interface on top of lexer.py; main class: `PLexer`
+- parser.py: Parser for instruction definition DSL; main class `Parser`
 - `generate_cases.py`: driver script to read `Python/bytecodes.c` and
 write `Python/generated_cases.c.h`
 
@@ -18,16 +19,16 @@ The DSL for the instruction definitions in `Python/bytecodes.c` is described
 Note that there is some dummy C code at the top and bottom of the file
 to fool text editors like VS Code into believing this is valid C code.
 
-## A bit about the parsers
+## A bit about the parser
 
-The parser classes use use a pretty standard recursive descent scheme,
+The parser class uses a pretty standard recursive descent scheme,
 but with unlimited backtracking.
 The `PLexer` class tokenizes the entire input before parsing starts.
 We do not run the C preprocessor.
 Each parsing method returns either an AST node (a `Node` instance)
 or `None`, or raises `SyntaxError` (showing the error in the C source).
 
-All parsing methods are decorated with `@contextual`, which automatically
+Most parsing methods are decorated with `@contextual`, which automatically
 resets the tokenizer input position when `None` is returned.
 Parsing methods may also raise `SyntaxError`, which is irrecoverable.
 When a parsing method returns `None`, it is possible that after backtracking
@@ -36,13 +37,3 @@ a different parsing method returns a valid AST.
 Neither the lexer nor the parsers are complete or fully correct.
 Most known issues are tersely indicated by `# TODO:` comments.
 We plan to fix issues as they become relevant.
-
-A particular problem is that we don't know which identifiers are typedefs.
-This makes some parts of the C grammar ambiguous, for example,
-`(x)*y` could be a cast to type `x` of the pointer dereferencing `*y`,
-or it could mean to compute `x` times `y`.
-Similarly, `(x)(y)` could cast `y` to type `x`,
-or call function `x` with argument `y`.
-Our parser currently interprets such cases as casts.
-We will solve this when we need to understand expressions in more detail
-(for example, by providing a list of known typedefs generated by hand).
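
For readers unfamiliar with the backtracking scheme the README describes, here is a minimal sketch of how a `@contextual`-style decorator could reset the input position when a parsing method returns `None`. It is not the code in `parser.py` or `plexer.py`: the `TinyParser` class, its `pos` attribute, and the `expect`, `name`, `call`, and `primary` methods are hypothetical names chosen for illustration, and the tokens are plain strings rather than lexer tokens.

```python
import functools


def contextual(method):
    """Illustrative decorator: restore the token position if `method` returns None."""
    @functools.wraps(method)
    def wrapper(self):
        start = self.pos          # remember where this attempt began
        node = method(self)
        if node is None:
            self.pos = start      # backtrack so another method can try
        return node
    return wrapper


class TinyParser:
    """Hypothetical recursive descent parser over a pre-tokenized input."""

    def __init__(self, tokens):
        self.tokens = list(tokens)  # the whole input is tokenized up front
        self.pos = 0

    def expect(self, text):
        # Consume the next token only if it matches `text`.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return text
        return None

    @contextual
    def name(self):
        # name: IDENTIFIER
        if self.pos < len(self.tokens) and self.tokens[self.pos].isidentifier():
            tok = self.tokens[self.pos]
            self.pos += 1
            return ("name", tok)
        return None

    @contextual
    def call(self):
        # call: name '(' ')'
        callee = self.name()
        if callee and self.expect("(") and self.expect(")"):
            return ("call", callee[1])
        return None

    def primary(self):
        # Try `call` first; if it returned None the position was reset,
        # so `name` starts from the same token.
        return self.call() or self.name()
```

With this sketch, `TinyParser(["x", "+"]).primary()` first tries `call`, which consumes `x`, fails on `(`, and is rewound by the decorator; `name` then succeeds from the original position. Raising `SyntaxError` instead of returning `None` would bypass the reset entirely, which matches the README's description of such errors as irrecoverable.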
