python `lit` function to support bool and byte vec #1152

jimexist · 2021-10-20T15:54:33Z

Which issue does this PR close?

Closes #

Rationale for this change

Make the Python `lit` function support bool and byte vec values.

What changes are included in this PR?

Are there any user-facing changes?

python/src/functions.rs

houqp

LGTM other than the error message that @xudong963 mentioned 👍

pjmore

An alternative to manually trying to extract values is to create an enum with the allowed Rust types and derive `FromPyObject` for it. I added an implementation of `Literal` for `&[u8]` to check this locally.


// Rust types accepted as Python literals. The derived `FromPyObject`
// tries the variants in declaration order and keeps the first that extracts.
#[derive(FromPyObject)]
enum PythonLiteral<'a> {
    UInt(u64),
    Int(i64),
    Float(f64),
    Str(&'a str),
    Boolean(bool),
    Binary(&'a [u8]),
}

impl<'a> Literal for PythonLiteral<'a> {
    // Delegate to the `Literal` impl of the wrapped Rust value.
    fn lit(&self) -> logical_plan::Expr {
        match self {
            PythonLiteral::Int(val) => val.lit(),
            PythonLiteral::UInt(val) => val.lit(),
            PythonLiteral::Float(val) => val.lit(),
            PythonLiteral::Str(val) => (*val).lit(),
            PythonLiteral::Boolean(val) => val.lit(),
            PythonLiteral::Binary(val) => (*val).lit(),
        }
    }
}

/// Expression representing a constant value
#[pyfunction]
#[pyo3(text_signature = "(value)")]
fn lit(value: &PyAny) -> PyResult<expression::Expression> {
    let py_lit = value.extract::<PythonLiteral>()?;
    let expr = py_lit.lit();
    Ok(expression::Expression { expr })
}

This produces error messages like:

TypeError: failed to extract enum PythonLiteral ('Union[Int, UInt, Float, String, Boolean, Binary]')
       - variant Int (Int): 'list' object cannot be interpreted as an integer
       - variant UInt (UInt): 'list' object cannot be interpreted as an integer
       - variant Float (Float): must be real number, not list
       - variant String (String): 'list' object cannot be converted to 'PyString'
       - variant Boolean (Boolean): 'list' object cannot be converted to 'PyBool'
       - variant Binary (Binary): 'list' object cannot be converted to 'PyBytes'
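
Because the derived extraction tries the variants one after another, the order in which they are declared can decide which variant a value lands in. The following is a minimal, std-only sketch of that "first match wins" behaviour. It does not use PyO3 at all: `PyValue`, `Lit`, `Extractor`, `kind`, `extract` and the `as_*` helpers are hypothetical stand-ins invented for illustration, and the rule that a bool is accepted by the integer extractor is an assumption modelled on Python's `bool` being a subclass of `int`.

```rust
// Hypothetical stand-in for a dynamically typed Python value.
#[derive(Debug, Clone, PartialEq)]
enum PyValue {
    Bool(bool),
    Int(i64),
    Str(String),
    List(Vec<PyValue>),
}

// Hypothetical stand-in for the literal enum from the comment above.
#[derive(Debug, PartialEq)]
enum Lit {
    Boolean(bool),
    Int(i64),
    Str(String),
}

// One extractor per variant; each either succeeds or explains why it failed.
type Extractor = fn(&PyValue) -> Result<Lit, String>;

fn kind(v: &PyValue) -> &'static str {
    match v {
        PyValue::Bool(_) => "bool",
        PyValue::Int(_) => "int",
        PyValue::Str(_) => "str",
        PyValue::List(_) => "list",
    }
}

fn as_bool(v: &PyValue) -> Result<Lit, String> {
    match v {
        PyValue::Bool(b) => Ok(Lit::Boolean(*b)),
        other => Err(format!("variant Boolean: '{}' is not a bool", kind(other))),
    }
}

fn as_int(v: &PyValue) -> Result<Lit, String> {
    match v {
        PyValue::Int(i) => Ok(Lit::Int(*i)),
        // Assumption: a bool can be interpreted as an integer, as in Python.
        PyValue::Bool(b) => Ok(Lit::Int(i64::from(*b))),
        other => Err(format!("variant Int: '{}' is not an integer", kind(other))),
    }
}

fn as_str(v: &PyValue) -> Result<Lit, String> {
    match v {
        PyValue::Str(s) => Ok(Lit::Str(s.clone())),
        other => Err(format!("variant Str: '{}' is not a string", kind(other))),
    }
}

// Try each extractor in the given order; return the first success,
// or an error that lists every per-variant failure.
fn extract(v: &PyValue, order: &[Extractor]) -> Result<Lit, String> {
    let mut errors = Vec::new();
    for try_variant in order {
        match try_variant(v) {
            Ok(lit) => return Ok(lit),
            Err(e) => errors.push(e),
        }
    }
    Err(format!("failed to extract enum Lit\n  - {}", errors.join("\n  - ")))
}
```

In this model, passing `PyValue::Bool(true)` with the order `[as_int, as_bool, as_str]` yields `Lit::Int(1)`, while `[as_bool, as_int, as_str]` yields `Lit::Boolean(true)`. Whether PyO3's real integer extraction accepts a Python `bool` in the same way is worth checking before settling on the variant order; if it does, declaring `Boolean` ahead of the integer variants would be one way to keep bools from being turned into integers.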

jimexist · 2021-10-21T03:23:20Z

thanks - it's a better solution so let me adapt it

houqp · 2021-10-21T04:33:30Z

Nice tip @pjmore, definitely looks much cleaner :)

python lit function to support bool and byte vec

b1a1653

jimexist requested review from alamb and houqp and removed request for houqp October 20, 2021 15:54

github-actions bot added the datafusion label Oct 20, 2021

xudong963 reviewed Oct 20, 2021


houqp approved these changes Oct 20, 2021

pjmore reviewed Oct 20, 2021

update per comment

d793ac2

jimexist force-pushed the python-lit-bool-bytes branch from 779f806 to d793ac2 October 21, 2021 04:23

houqp approved these changes Oct 21, 2021

houqp added this to the 6.0.0 milestone Oct 21, 2021

jimexist merged commit f455357 into apache:master Oct 21, 2021

jimexist deleted the python-lit-bool-bytes branch October 21, 2021 05:04

houqp added the enhancement New feature or request label Nov 4, 2021
