Skip to content

Conversation

comphead
Copy link
Contributor

@comphead comphead commented Jul 23, 2025

Which issue does this PR close?

A followup on #1997
Closes #1993 .

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@codecov-commenter
Copy link

codecov-commenter commented Jul 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.75%. Comparing base (f09f8af) to head (660c8ba).
⚠️ Report is 340 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2040      +/-   ##
============================================
+ Coverage     56.12%   58.75%   +2.63%     
- Complexity      976     1253     +277     
============================================
  Files           119      137      +18     
  Lines         11743    13164    +1421     
  Branches       2251     2390     +139     
============================================
+ Hits           6591     7735    +1144     
- Misses         4012     4196     +184     
- Partials       1140     1233      +93     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mbutrovich
Copy link
Contributor

mbutrovich commented Jul 23, 2025

This PR apache/datafusion#16290 changed the signature for some functions to return Utf8View. I can start bringing in changes to support Utf8View from my branch (https://github.com/mbutrovich/datafusion-comet/tree/german_style_strings) but this has a huge blast radius for Comet. I'll investigate further.

@comphead
Copy link
Contributor Author

Fixed windows and crypto tests, the ignore nulls correctness issue is weird. Ive localized it with 3 rows test

  test("first/last with ignore null") {
    val data = Range(0, 3).flatMap(n => Seq((n, 1), (n, 2))).toDF("a", "b")
    withTempDir { dir =>
      val filename = s"${dir.getAbsolutePath}/first_last_ignore_null.parquet"
      data.repartition(2).write.parquet(filename)
      withSQLConf(CometConf.COMET_BATCH_SIZE.key -> "1") {
        spark.read.parquet(filename).createOrReplaceTempView("t1")
        for (expr <- Seq("first", "last")) {
          // deterministic query that should return one non-null value per group
          val df = spark.sql(
            s"SELECT a, $expr(IF(b==1,null,b)) IGNORE NULLS FROM t1 GROUP BY a ORDER BY a")
          checkSparkAnswerAndOperator(df)
        }
      }
    }
  }

It has something with Batch size and number of input files.

@andygrove
Copy link
Member

I ran TPC-H benchmarks locally and saw no difference in performance compared to main branch

@comphead comphead marked this pull request as ready for review August 1, 2025 01:27
@comphead
Copy link
Contributor Author

comphead commented Aug 1, 2025

@mbutrovich @andygrove please have a look

@comphead comphead requested a review from andygrove August 1, 2025 01:27

let sort = Arc::new(
SortExec::new(LexOrdering::new(exprs?), Arc::clone(&child_copied))
SortExec::new(LexOrdering::new(exprs?).unwrap(), Arc::clone(&child_copied))
Copy link
Member

@andygrove andygrove Aug 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for other reviewers: the unwrap() here is on an Option. This seems safe because it would only be None if the vector of sort expressions were empty.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @comphead

@mbutrovich mbutrovich merged commit d688655 into apache:main Aug 1, 2025
91 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade to DataFusion 49.0.0
4 participants