Skip to content

Conversation

@asenac
Copy link
Contributor

@asenac asenac commented Oct 26, 2021

Motivation

  • This PR adds a known-desirable feature. MaterializeInc/database-issues#2474

This PR is the first deliverable of the QGM effort (#8073). It contains the model definition as described in the corresponding design doc, plus a module that converts queries from Hir into QGM.

The support for the rest of Hir's operator will be added in separate PRs.

Another followup PRs will contain the rewrite engine for transforming query graph models as well as the module for lowering QGM to Mir.

The code added by this PR is not used by the actual product and it's only compiled in test config, since it is only exercised by unit tests.

This is a subset of #8775.

Tips for reviewer

This first commit contains the definition of the Query Graph Model as described here, the representation that will be used for performing SQL-to-SQL transformations. It also includes code for generating graphviz graphs from query graph models. This generator will be used extensively for debugging and testing.

The second commit contains the skeleton of the module that converts Hir into QGM. It also includes a basic data driven test dumping the resulting graph. TestCatalog needs to be extended so that actual tables and views can be injected into it to test queries referencing actual catalog items.

The third commit adds a README file basically explaining how to render the graphviz graphs that are the output of the tests, for visually validating the.

The forth commit adds support for Hir's Map, Project and Literal. This is a commit that can be used as a reference when extending the Hir->QGM generator.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR adds a release note for any
    user-facing behavior changes.

@asenac
Copy link
Contributor Author

asenac commented Oct 26, 2021

The graphs in basic test included in this PR are:

basic-0000 dot
basic-0001 dot
basic-0002 dot
basic-0003 dot

@aalexandrov aalexandrov added the A-optimization Area: query optimization and transformation label Oct 26, 2021
Copy link
Contributor

@wangandi wangandi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my feedback are just requests for more comments in the .rs files.

Comment on lines 387 to 395
pub limit: Option<Expr>,
pub offset: Option<Expr>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can limit and offset ever be anything beside literal uints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the standard, but at least Postgres supports anything that is a constant within the context of the query block, for example, lateral column references.

asenac=> select * from t1, lateral(select * from t2 limit t1.f1 offset t1.f2) foo;
 f1 | f2 | f1 | f2 
----+----+----+----
(0 rows)

asenac=>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is translating HirRelationExpr::TopK to QGM currently an open question according to #8073?

Hmm...neither HirRelationExpr nor MirRelationExpr supports limit or offset being anything besides literal uints. Thus, I lean towards having limit and offset be uints until 1) we decide to work on adding support for generalized limit/offset to the public interface or 2) we have a DSL for creating a QGM directly so we can test the generalized limit/offset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because TopK is the combination QGM's Grouping and Select operator in a single operator, so I guess the only option we have is to add a TopK box in QGM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, there is an issue tracking adding support for non-literal limit/offset https://github.com/MaterializeInc/materialize/issues/7071

Copy link
Contributor

@aalexandrov aalexandrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks for starting this! I have some remarks but most of them are minor.

My main concern in terms around the (1) precision, (2) encapsulation, and (3) ergonomics the current QGM model encoding.

Regarding (1), the design doc lists quite a few constraints that are currently implicit. I think it will be better if we can model them directly as syntactic restrictions, even at the cost of a bit bloated model code .

Regarding (2), the model currently models vertex (box) and edge (quantifier) identity not through references, but through usize-typed IDs, and requires explicit lookup to follow those. This leaves the door open for dangling references and requires explicit GC calls to free up the model from unused vertices or objects.

Regarding (3), I hope that the need of a dedicated type QueryBox that factors out duplicated attributes can be removed with a lightweight rule-based macro.

I'll try to sketch something minimal along the above lines, but I'm sure that @asenac has already invested exploring similar ideas, so it will be great to have a summary of his experience here as well.

@asenac asenac force-pushed the qgm-model-hir branch 6 times, most recently from b2c71d4 to 2f3de5a Compare November 2, 2021 12:06
@asenac
Copy link
Contributor Author

asenac commented Nov 2, 2021

Regarding (1), I think there is a trade-off between enforcing constraint through syntax and providing a generic API that can be used to traverse and alter a graph generically. I think we could enforce some of those constraints through a validate method that, in debug mode is invoked before and after each rewrite rule and, in release mode, it gets called at least once, which would still allow us to write generic code independent from the box type.

@asenac asenac force-pushed the qgm-model-hir branch 2 times, most recently from 1e069cc to 5d3c6a7 Compare November 3, 2021 14:47
@aalexandrov aalexandrov requested review from a team and removed request for a team November 3, 2021 17:51
Copy link
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left some minor comments.


/// A Query Graph Model instance represents a SQL query.
/// See: https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/20210707_qgm_sql_high_level_representation.md
#[derive(Debug)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derive Default.

for (correlated_q, quantifiers) in correlation_info.iter() {
for q in quantifiers.iter() {
self.new_line(&format!(
"Q{0} -> Q{1} [ label = \"correlation\", style = filled, color = red ]",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Use a different color since some folks can't see red as a distinct color.

@asenac asenac force-pushed the qgm-model-hir branch 7 times, most recently from 12c7ae9 to bb24199 Compare November 4, 2021 13:09
@asenac asenac force-pushed the qgm-model-hir branch 2 times, most recently from 1c11b61 to ccb5c9a Compare November 4, 2021 16:33
Copy link
Contributor

@wangandi wangandi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments re: panics in the code.

BoxScalarExpr::Literal(row.clone(), col_type.clone())
}
HirScalarExpr::Column(c) => {
assert!(c.level == 0, "correlated columns not yet supported");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this to assert_eq!(c.level, 0, "correlated columns ...") so that the level prints out if the equality fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is removed by the following patch in #8969, so it won't last very long

}
position -= ib.columns.len();
}
panic!("column not found")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the panics in this file return the column that was not found? Also, should they be errors since eventually QGM will be used for SQL resolution, so it may be necessary to throw an error in the case of a malformed SQL query.

Copy link
Contributor Author

@asenac asenac Nov 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll change this to be an error instead. But this panic is only reached in case of a malformed Hir, which the planner should never produce. Perhaps I should use unreachable! instead.

Copy link
Contributor

@aalexandrov aalexandrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@asenac asenac merged commit a6bb9a2 into MaterializeInc:main Nov 8, 2021
petrosagg added a commit to petrosagg/materialize that referenced this pull request Nov 10, 2021
main got broken due to merge skew from these two PRs:

MaterializeInc#8976
MaterializeInc#8801

Signed-off-by: Petros Angelatos <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-optimization Area: query optimization and transformation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants