Make tests for `simplify` and `Simplifer` consistent #1376

Merged

alamb merged 6 commits into apache:master from alamb:alamb/simplify_simplify2

Dec 7, 2021

Contributor

alamb commented Nov 28, 2021 • edited

Which issue does this PR close?

~~Builds on #1374 and #1375~~

Re #1160 (more sophisticated expression simplification)

Rationale for this change

There is redundant code in simplify and Simplifier, which makes it hard to know what is simplified and what is not. The tests in this file are also hard to work with.

What changes are included in this PR?

Refactor the tests for simplify and Simplifier into a consistent style (so that when I combine the logic of the two, it will be clear that behavior has not changed).

Are there any user-facing changes?

No

github-actions bot added the datafusion label

alamb changed the title ~~(WIP) Move ConstEvaluator into simplity_expressions.rs~~ (WIP) Move ConstEvaluator into simplify_expressions.rs

alamb changed the title ~~(WIP) Move ConstEvaluator into simplify_expressions.rs~~ (WIP) Make tests for simplify and Simplifer consistent

alamb force-pushed the alamb/simplify_simplify2 branch from 22be605 to a029d37

December 2, 2021 22:45


          Begin to combine simplify and Simplifer: consolidate tests

370f8ca

alamb force-pushed the alamb/simplify_simplify2 branch from a029d37 to 370f8ca

December 2, 2021 22:52


          Merge remote-tracking branch 'apache/master' into alamb/simplify_simplify2

8d607f5

alamb marked this pull request as ready for review

December 4, 2021 17:37

alamb changed the title ~~(WIP) Make tests for simplify and Simplifer consistent~~ Make tests for simplify and Simplifer consistent

alamb requested review from andygrove, houqp and jimexist

December 4, 2021 17:39

Contributor Author

alamb commented Dec 4, 2021

This PR is now ready for review 🙏

alamb mentioned this pull request

Fix bugs with nullability during rewrites: Combine simplify and Simplifier #1401

Merged

capkurmagati reviewed

Contributor

capkurmagati left a comment

Thanks! Looks much cleaner.
Left several nit comments.

datafusion/src/optimizer/simplify_expressions.rs Outdated

    
                  }

                      Ok(())

                  fn lit_true() -> Expr {

Contributor

capkurmagati Dec 6, 2021

I wonder whether you intended to use lit_true() and lit(true) in different situations.

Contributor Author

alamb Dec 6, 2021

No, this is a great point. I will change to use lit(true) and lit(false) -- that is much better, I think.
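To illustrate the point in this thread, here is a minimal, self-contained sketch of why a one-off helper such as lit_true() can be redundant next to a generic lit(...). The `ToyExpr` type and `ToyLiteral` trait are hypothetical stand-ins invented for this example; they are not DataFusion's actual types or signatures.

```rust
// Toy stand-in for an expression type; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Bool(bool),
    Int(i64),
}

// Hypothetical trait so that one generic `lit` covers every literal type.
trait ToyLiteral {
    fn into_expr(self) -> ToyExpr;
}

impl ToyLiteral for bool {
    fn into_expr(self) -> ToyExpr {
        ToyExpr::Bool(self)
    }
}

impl ToyLiteral for i64 {
    fn into_expr(self) -> ToyExpr {
        ToyExpr::Int(self)
    }
}

// Generic constructor: works for any type implementing `ToyLiteral`.
fn lit<T: ToyLiteral>(value: T) -> ToyExpr {
    value.into_expr()
}

// One-off helper of the kind questioned in the review comment.
// It builds exactly the same value as `lit(true)`.
fn lit_true() -> ToyExpr {
    ToyExpr::Bool(true)
}
```

Because both spellings build the same value, using only the generic form keeps the tests uniform and removes the question of whether the two were meant for different situations.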

datafusion/src/optimizer/simplify_expressions.rs

    
         //
         // Make sure c1 column to be used in tests is not boolean type
-        assert_eq!(col("c1").get_type(&schema)?, DataType::Utf8);
+        assert_eq!(col("c1").get_type(&schema).unwrap(), DataType::Utf8);

Contributor

capkurmagati Dec 6, 2021

Shall we add a test for a non-boolean type in the logical plan? Something like:

    #[test]
    fn test_simplify_skip_nonboolean_type() {
        let table_scan = test_table_scan();
        let plan = LogicalPlanBuilder::from(table_scan)
            .filter(col("d").eq(lit(false)).not())
            .unwrap()
            .project(vec![col("a")])
            .unwrap()
            .build()
            .unwrap();

        let expected = "\
        Projection: #test.a\
        \n  Filter: NOT #test.d = Boolean(false)\
        \n    TableScan: test projection=None";

        assert_optimized_plan_eq(&plan, expected);
    }

Contributor Author

alamb Dec 6, 2021

So I think that plan would error at runtime, because "d" is defined to be DataType::UInt32, and if you try to do UInt32 != Bool you get an error:

❯ select 1 != true;
Plan("'Int64 != Boolean' can't be evaluated because there isn't a common type to coerce the types to")

Though it does work if we have an explicit CAST:

❯ select 1 != cast(true as int);
+------------------------------------------+
| Int64(1) != CAST(Boolean(true) AS Int32) |
+------------------------------------------+
| false                                    |
+------------------------------------------+
1 row in set. Query took 0.003 seconds.

I guess I was thinking the plan tests are for ensuring that simplify is being called correctly on the expressions in the plan, rather than testing the various Expr corner cases of the simplification logic itself

Contributor

capkurmagati Dec 7, 2021

> I guess I was thinking the plan tests are for ensuring that simplify is being called correctly on the expressions in the plan, rather than testing the various Expr corner cases of the simplification logic itself

Thanks for the explanation. Makes sense to me.
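The coercion behavior discussed in this thread can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyType` and `common_type` are hypothetical names, the rules cover just three types, and the error text merely mimics the message quoted above rather than reproducing DataFusion's real coercion logic.

```rust
// Toy data types; NOT DataFusion's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Boolean,
    Int32,
    Int64,
}

// Hypothetical helper: find a common type for comparing `l != r`,
// or report that no coercion exists.
fn common_type(l: ToyType, r: ToyType) -> Result<ToyType, String> {
    use ToyType::*;
    match (l, r) {
        // Identical types need no coercion.
        (a, b) if a == b => Ok(a),
        // Mixed integer widths widen to the larger type.
        (Int32, Int64) | (Int64, Int32) => Ok(Int64),
        // Anything else (e.g. integer vs boolean) has no common type.
        (a, b) => Err(format!(
            "'{:?} != {:?}' can't be evaluated because there isn't a common type to coerce the types to",
            a, b
        )),
    }
}
```

In this toy model, comparing Int64 with Boolean fails, while comparing Int64 with Int32 (the situation after an explicit CAST to an integer type) succeeds.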

datafusion/src/optimizer/simplify_expressions.rs

    
                          col("c2").not(),

                      );

                  // TODO rename to simplify

                  fn do_simplify(expr: Expr) -> Expr {

Contributor

capkurmagati Dec 6, 2021

👍 and looks ready to rename?

Contributor Author

alamb Dec 6, 2021

I actually have plans to rename it in a follow-on PR (because I want to remove simplify) -- you can see what I have in mind here: #1401

xudong963 reviewed

Member

xudong963 left a comment

Nice work ❤️ @alamb. I reviewed it carefully and left some suggestions. The overall code is definitely cleaner, and the refactor makes sense!

datafusion/src/optimizer/simplify_expressions.rs Outdated

datafusion/src/optimizer/simplify_expressions.rs

    
                      let expr_a = binary_expr(col("c"), Operator::Multiply, lit(1));

                      let expr_b = binary_expr(lit(1), Operator::Multiply, col("c"));

                      let expected = col("c");

                  fn test_simplify_multiply_by_one() {

Member

xudong963 Dec 6, 2021

Maybe you can also add test_simplify_multiply_by_zero -> 0?

Contributor Author

alamb Dec 6, 2021

I tried that, and it turns out that rewrite rule is not implemented LOL 👍

diff --git a/datafusion/src/optimizer/simplify_expressions.rs b/datafusion/src/optimizer/simplify_expressions.rs
index 3c4af838b..68d45c446 100644
--- a/datafusion/src/optimizer/simplify_expressions.rs
+++ b/datafusion/src/optimizer/simplify_expressions.rs
@@ -851,6 +851,16 @@ mod tests {
         assert_eq!(simplify(&expr_b), expected);
     }
 
+    #[test]
+    fn test_simplify_multiply_by_zero() {
+        let expr_a = binary_expr(col("c2"), Operator::Multiply, lit(0));
+        let expr_b = binary_expr(lit(0), Operator::Multiply, col("c2"));
+        let expected = lit(0);
+
+        assert_eq!(simplify(&expr_a), expected);
+        assert_eq!(simplify(&expr_b), expected);
+    }
+
     #[test]
     fn test_simplify_divide_by_one() {
         let expr = binary_expr(col("c2"), Operator::Divide, lit(1));

Resulted in

---- optimizer::simplify_expressions::tests::test_simplify_multiply_by_zero stdout ----
thread 'optimizer::simplify_expressions::tests::test_simplify_multiply_by_zero' panicked at 'assertion failed: `(left == right)`
  left: `#c2 * Int32(0)`,
 right: `Int32(0)`', datafusion/src/optimizer/simplify_expressions.rs:860:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Contributor Author

alamb Dec 6, 2021

Tracking in #1406
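For readers following the thread, here is a minimal sketch of what a multiply-by-one rule and a possible multiply-by-zero rule could look like. Everything here is an assumption made for illustration: `ToyExpr`, `toy_simplify`, and the `operands_non_nullable` flag are hypothetical and do not reflect DataFusion's implementation. The flag is included because in SQL, NULL * 0 is NULL, so rewriting x * 0 to 0 is only safe when x cannot be NULL.

```rust
// Toy expression tree; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column(String),
    Int(i32),
    Multiply(Box<ToyExpr>, Box<ToyExpr>),
}

fn col(name: &str) -> ToyExpr {
    ToyExpr::Column(name.to_string())
}

fn lit(v: i32) -> ToyExpr {
    ToyExpr::Int(v)
}

fn mul(l: ToyExpr, r: ToyExpr) -> ToyExpr {
    ToyExpr::Multiply(Box::new(l), Box::new(r))
}

// Hypothetical rewrite pass:
//   x * 1, 1 * x  => x
//   x * 0, 0 * x  => 0   (only when the caller states operands are non-nullable)
fn toy_simplify(expr: ToyExpr, operands_non_nullable: bool) -> ToyExpr {
    match expr {
        ToyExpr::Multiply(l, r) => {
            // Simplify children first so rules apply bottom-up.
            let l = toy_simplify(*l, operands_non_nullable);
            let r = toy_simplify(*r, operands_non_nullable);
            match (l, r) {
                (ToyExpr::Int(1), other) | (other, ToyExpr::Int(1)) => other,
                (ToyExpr::Int(0), _) | (_, ToyExpr::Int(0)) if operands_non_nullable => lit(0),
                (l, r) => mul(l, r),
            }
        }
        other => other,
    }
}
```

With the flag set to false, `c2 * 0` is left untouched, which matches the shape of the failing assertion shown above (`#c2 * Int32(0)` on the left).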

datafusion/src/optimizer/simplify_expressions.rs

    
     #[test]
-    fn test_simplify_negated_and() -> Result<()> {
+    fn test_simplify_negated_and() {

Member

xudong963 Dec 6, 2021

Why can't this be simplified to false?

Contributor Author

alamb Dec 6, 2021

There is no theoretical reason. I think this test is to ensure a particular corner case of how the rewrite rule for A and (B and A) is implemented.

Contributor Author

alamb Dec 6, 2021

Tracking in #1406

datafusion/src/optimizer/simplify_expressions.rs Outdated

    
                  fn test_simplify_and_and_false() -> Result<()> {

                      let expr =

                          binary_expr(lit(ScalarValue::Boolean(None)), Operator::And, lit(false));

                  fn test_simplify_and_and_false() {

Member

xudong963 Dec 6, 2021

Maybe it should be test_simplify_null_and_false?

Contributor Author

alamb Dec 6, 2021

Good 👀 👍
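The expression in this test, a NULL boolean literal AND false, follows SQL three-valued logic. The sketch below models that logic with `Option<bool>`, where `None` stands for NULL; the function name `and3` is hypothetical and this is not DataFusion code.

```rust
// SQL-style three-valued AND. `None` models NULL.
fn and3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        // false dominates: anything AND false is false, even NULL.
        (Some(false), _) | (_, Some(false)) => Some(false),
        // true AND true is the only way to get true.
        (Some(true), Some(true)) => Some(true),
        // Every remaining combination involves NULL and no false.
        _ => None,
    }
}
```

Under this model NULL AND false is false, whereas NULL AND true stays NULL, which is why a name like test_simplify_null_and_false describes the case more precisely.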

xudong963 reviewed

datafusion/src/optimizer/simplify_expressions.rs

    
                          .project(vec![col("a")])?

                          .build()?;

                          .filter(col("b").not_eq(lit(true)))

                          .unwrap()

Member

xudong963 Dec 6, 2021

I noticed you replaced ? with unwrap. Why? Just curious; both ways make sense to me.

Contributor Author

alamb Dec 6, 2021

It helps me more easily find the source of the problem -- by using unwrap(), whenever an error happens you can set RUST_BACKTRACE=1 and know exactly at what site the problem happened. If the tests return an Err, often the error does not have sufficient context to know exactly where it was generated.
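The trade-off described here can be sketched with plain standard-library Rust. The names `build_step`, `with_question_mark`, and `with_unwrap` are hypothetical and stand in for fallible plan-building calls in a test.

```rust
// Hypothetical fallible step standing in for a plan-building call.
fn build_step(ok: bool) -> Result<u32, String> {
    if ok {
        Ok(1)
    } else {
        Err("step failed".to_string())
    }
}

// Style 1: propagate with `?`. The caller only receives the error value,
// which here says nothing about which call produced it.
fn with_question_mark() -> Result<u32, String> {
    let a = build_step(true)?;
    let b = build_step(false)?;
    Ok(a + b)
}

// Style 2: unwrap at each call site. A failure panics right here, so the
// panic message (and a backtrace, if RUST_BACKTRACE=1 is set) points at
// the failing line.
fn with_unwrap() -> u32 {
    let a = build_step(true).unwrap();
    let b = build_step(false).unwrap();
    a + b
}
```

Both styles fail on the second step, but only the unwrap() version reports the source location of the failing call by default.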


          Update datafusion/src/optimizer/simplify_expressions.rs

19d8e45

Co-authored-by: Carlos <[email protected]>

alamb mentioned this pull request

Add additional simplification rules #1406

Closed

alamb added 3 commits

December 6, 2021 18:25


          fix test name

81cfea4


          Merge branch 'alamb/simplify_simplify2' of github.com:alamb/arrow-datafusion into alamb/simplify_simplify2

bc3993c


          Use lit(true) and lit(false)

7b31d94

xudong963 approved these changes

Member

xudong963 left a comment

LGTM 👍

Contributor Author

alamb commented Dec 7, 2021

Thanks for the careful review @xudong963 and @capkurmagati

alamb merged commit 32e24d2 into apache:master

alamb deleted the alamb/simplify_simplify2 branch

December 7, 2021 22:25

alamb mentioned this pull request

Introduce ProjectionMask To Allow Nested Projection Pushdown #2581

Open

unkloud pushed a commit to unkloud/datafusion that referenced this pull request


          fix: disable checking for uint_8 and uint_16 if complex type readers are enabled (apache#1376)

57a4dca


Reviewers

xudong963 approved these changes

andygrove Awaiting requested review

houqp Awaiting requested review

jimexist Awaiting requested review

capkurmagati left review comments

Labels

None yet