-
Notifications
You must be signed in to change notification settings - Fork 482
Skeleton for QGM generation from Hir #8801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
wangandi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of my feedback are just requests for more comments in the .rs files.
src/sql/src/query_model/mod.rs
Outdated
| pub limit: Option<Expr>, | ||
| pub offset: Option<Expr>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can limit and offset ever be anything beside literal uints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the standard, but at least Postgres supports anything that is a constant within the context of the query block, for example, lateral column references.
asenac=> select * from t1, lateral(select * from t2 limit t1.f1 offset t1.f2) foo;
f1 | f2 | f1 | f2
----+----+----+----
(0 rows)
asenac=>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is translating HirRelationExpr::TopK to QGM currently an open question according to #8073?
Hmm...neither HirRelationExpr nor MirRelationExpr supports limit or offset being anything besides literal uints. Thus, I lean towards having limit and offset be uints until 1) we decide to work on adding support for generalized limit/offset to the public interface or 2) we have a DSL for creating a QGM directly so we can test the generalized limit/offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because TopK is the combination QGM's Grouping and Select operator in a single operator, so I guess the only option we have is to add a TopK box in QGM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, there is an issue tracking adding support for non-literal limit/offset https://github.com/MaterializeInc/materialize/issues/7071
aalexandrov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks for starting this! I have some remarks but most of them are minor.
My main concern in terms around the (1) precision, (2) encapsulation, and (3) ergonomics the current QGM model encoding.
Regarding (1), the design doc lists quite a few constraints that are currently implicit. I think it will be better if we can model them directly as syntactic restrictions, even at the cost of a bit bloated model code .
Regarding (2), the model currently models vertex (box) and edge (quantifier) identity not through references, but through usize-typed IDs, and requires explicit lookup to follow those. This leaves the door open for dangling references and requires explicit GC calls to free up the model from unused vertices or objects.
Regarding (3), I hope that the need of a dedicated type QueryBox that factors out duplicated attributes can be removed with a lightweight rule-based macro.
I'll try to sketch something minimal along the above lines, but I'm sure that @asenac has already invested exploring similar ideas, so it will be great to have a summary of his experience here as well.
b2c71d4 to
2f3de5a
Compare
|
Regarding (1), I think there is a trade-off between enforcing constraint through syntax and providing a generic API that can be used to traverse and alter a graph generically. I think we could enforce some of those constraints through a |
1e069cc to
5d3c6a7
Compare
antiguru
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I left some minor comments.
|
|
||
| /// A Query Graph Model instance represents a SQL query. | ||
| /// See: https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/20210707_qgm_sql_high_level_representation.md | ||
| #[derive(Debug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Derive Default.
| for (correlated_q, quantifiers) in correlation_info.iter() { | ||
| for q in quantifiers.iter() { | ||
| self.new_line(&format!( | ||
| "Q{0} -> Q{1} [ label = \"correlation\", style = filled, color = red ]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Use a different color since some folks can't see red as a distinct color.
12c7ae9 to
bb24199
Compare
1c11b61 to
ccb5c9a
Compare
wangandi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments re: panics in the code.
| BoxScalarExpr::Literal(row.clone(), col_type.clone()) | ||
| } | ||
| HirScalarExpr::Column(c) => { | ||
| assert!(c.level == 0, "correlated columns not yet supported"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this to assert_eq!(c.level, 0, "correlated columns ...") so that the level prints out if the equality fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assert is removed by the following patch in #8969, so it won't last very long
| } | ||
| position -= ib.columns.len(); | ||
| } | ||
| panic!("column not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the panics in this file return the column that was not found? Also, should they be errors since eventually QGM will be used for SQL resolution, so it may be necessary to throw an error in the case of a malformed SQL query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll change this to be an error instead. But this panic is only reached in case of a malformed Hir, which the planner should never produce. Perhaps I should use unreachable! instead.
aalexandrov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
main got broken due to merge skew from these two PRs: MaterializeInc#8976 MaterializeInc#8801 Signed-off-by: Petros Angelatos <[email protected]>




Motivation
This PR is the first deliverable of the QGM effort (#8073). It contains the model definition as described in the corresponding design doc, plus a module that converts queries from Hir into QGM.
The support for the rest of Hir's operator will be added in separate PRs.
Another followup PRs will contain the rewrite engine for transforming query graph models as well as the module for lowering QGM to Mir.
The code added by this PR is not used by the actual product and it's only compiled in test config, since it is only exercised by unit tests.
This is a subset of #8775.
Tips for reviewer
This first commit contains the definition of the Query Graph Model as described here, the representation that will be used for performing SQL-to-SQL transformations. It also includes code for generating graphviz graphs from query graph models. This generator will be used extensively for debugging and testing.
The second commit contains the skeleton of the module that converts Hir into QGM. It also includes a basic data driven test dumping the resulting graph.
TestCatalogneeds to be extended so that actual tables and views can be injected into it to test queries referencing actual catalog items.The third commit adds a README file basically explaining how to render the graphviz graphs that are the output of the tests, for visually validating the.
The forth commit adds support for Hir's Map, Project and Literal. This is a commit that can be used as a reference when extending the Hir->QGM generator.
Checklist
user-facing behavior changes.