- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.7k
          Make tests for simplify and Simplifer consistent
          #1376
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
simplify and Simplifer consistent
      22be605    to
    a029d37      
    Compare
  
    a029d37    to
    370f8ca      
    Compare
  
    simplify and Simplifer consistentsimplify and Simplifer consistent
      | This PR is now ready for review 🙏 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks much cleaner.
Left several nit comments.
| } | ||
|  | ||
| Ok(()) | ||
| fn lit_true() -> Expr { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder you intended to use lit_true() and lit(true) in different situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this is a great point. I will change to use lit(true) and lit(false) -- that is much better I think
| // | ||
| // Make sure c1 column to be used in tests is not boolean type | ||
| assert_eq!(col("c1").get_type(&schema)?, DataType::Utf8); | ||
| assert_eq!(col("c1").get_type(&schema).unwrap(), DataType::Utf8); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we add a test for nonboolean type for logical plan? Something like
  #[test]
    fn test_simplity_skip_nonboolean_type() {
        let table_scan = test_table_scan();
        let plan = LogicalPlanBuilder::from(table_scan)
            .filter(col("d").eq(lit(false)).not())
            .unwrap()
            .project(vec![col("a")])
            .unwrap()
            .build()
            .unwrap();
        let expected = "\
        Projection: #test.a\
        \n  Filter: NOT #test.d = Boolean(false)\
        \n    TableScan: test projection=None";
        assert_optimized_plan_eq(&plan, expected);
    }There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think that plan would error at runtime (because "d" is defined to be DataTypre::UInt32 and if you try to do UInt32 != Bool you get an error:
❯ select 1 != true;
Plan("'Int64 != Boolean' can't be evaluated because there isn't a common type to coerce the types to")
Though it does work if we have an explicit CAST:
❯ select 1 != cast(true as int);
+------------------------------------------+
| Int64(1) != CAST(Boolean(true) AS Int32) |
+------------------------------------------+
| false                                    |
+------------------------------------------+
1 row in set. Query took 0.003 seconds.
I guess I was thinking the plan tests are for ensuring that simplify is being called correctly on the expressions in the plan, rather than testing the various Expr corner cases of the simplification logic itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I was thinking the plan tests are for ensuring that simplify is being called correctly on the expressions in the plan, rather than testing the various Expr corner cases of the simplification logic itself
Thanks for the explanation. Makes sense to me.
| col("c2").not(), | ||
| ); | ||
| // TODO rename to simplify | ||
| fn do_simplify(expr: Expr) -> Expr { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 and looks ready to rename?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually have plans to rename it in a follow on PR (b/c I want to remove simpify) -- you can see what I have in mind here: #1401
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work ❤️ @alamb. I reviewed it carefully and left some of my suggestions. The overall code is definitely cleaner and makes sense to refactor!
| let expr_a = binary_expr(col("c"), Operator::Multiply, lit(1)); | ||
| let expr_b = binary_expr(lit(1), Operator::Multiply, col("c")); | ||
| let expected = col("c"); | ||
| fn test_simplify_multiply_by_one() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can also add test_simlify_multiply_by_zero -> 0 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that, and it turns out that rewrite rule is not implemented LOL 👍
diff --git a/datafusion/src/optimizer/simplify_expressions.rs b/datafusion/src/optimizer/simplify_expressions.rs
index 3c4af838b..68d45c446 100644
--- a/datafusion/src/optimizer/simplify_expressions.rs
+++ b/datafusion/src/optimizer/simplify_expressions.rs
@@ -851,6 +851,16 @@ mod tests {
         assert_eq!(simplify(&expr_b), expected);
     }
 
+    #[test]
+    fn test_simplify_multiply_by_zero() {
+        let expr_a = binary_expr(col("c2"), Operator::Multiply, lit(0));
+        let expr_b = binary_expr(lit(0), Operator::Multiply, col("c2"));
+        let expected = lit(0);
+
+        assert_eq!(simplify(&expr_a), expected);
+        assert_eq!(simplify(&expr_b), expected);
+    }
+
     #[test]
     fn test_simplify_divide_by_one() {
         let expr = binary_expr(col("c2"), Operator::Divide, lit(1));Resulted in
---- optimizer::simplify_expressions::tests::test_simplify_multiply_by_zero stdout ----
thread 'optimizer::simplify_expressions::tests::test_simplify_multiply_by_zero' panicked at 'assertion failed: `(left == right)`
  left: `#c2 * Int32(0)`,
 right: `Int32(0)`', datafusion/src/optimizer/simplify_expressions.rs:860:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracking in #1406
|  | ||
| #[test] | ||
| fn test_simplify_negated_and() -> Result<()> { | ||
| fn test_simplify_negated_and() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't be simplified to false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no theoretical reason. I think this test is to ensure a particular corner case of how the rewrite rule for A and (B and A) is implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracking in #1406
| fn test_simplify_and_and_false() -> Result<()> { | ||
| let expr = | ||
| binary_expr(lit(ScalarValue::Boolean(None)), Operator::And, lit(false)); | ||
| fn test_simplify_and_and_false() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it should be test_simplify_null_and_false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good 👀 👍
| .project(vec![col("a")])? | ||
| .build()?; | ||
| .filter(col("b").not_eq(lit(true))) | ||
| .unwrap() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed you replaced ? with unwrap. Why,  just curious. Both ways make sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It helps me more easily find the source of the problem -- by using unwap() whenever an error happens, you can set RUST_BACKTRACE=1 and know exactly at what site the problem happened. If the tests return Error, often the Error does not have sufficient context to know exactly where it was generated.
Co-authored-by: Carlos <[email protected]>
…afusion into alamb/simplify_simplify2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
| Thanks for the careful review @xudong963 and @capkurmagati | 
Which issue does this PR close?
Builds on #1374 and #1375Re #1160 (more sophisticated expression simplification)
Rationale for this change
There is redundant code in
simplifyandSimplifierwhich makes it hard to know what is simplified and what is not. It is also hard to work with the tests in this fileWhat changes are included in this PR?
simplifyandSimplifierto be a consistent style (so that when I combine the logic of the two it will be clear that behavior has not changed)Are there any user-facing changes?
No