Introduction
- Follow on to Nov 5, 2024: This week in DataFusion #13265
This ticket is a weekly-ish summary of interesting things happening in DataFusion. Note this is not a complete list (it is what I remember / can find). Please feel free to leave comments on this ticket about things that I may have missed or that you think should get wider attention from the community.
Loosely inspired by https://this-week-in-rust.org/
Highlights
- 🚀 DataFusion 43 currently has the fastest results on ClickBench (blog coming soon)
- 📝 Completed [EPIC] Automatically generate all function documentation from code #12740. Thanks to @Omega359 and @jonathanc-n for driving the project to completion
- ⚒️ Completed [Epic] Unify WindowFunction Interface (remove built in list of BuiltInWindowFunctions) #8709 -- thanks @jcsherin @jatin510 for helping push it over the line
DataFusion Related Reading List
- Comparing approaches to User Defined Functions in Apache DataFusion using Python from @timsaucer
- New [Concepts, Readings, Events](https://datafusion.apache.org/user-guide/concepts-readings-events.html) page
Recent Releases
- Release DataFusion 42.2.0 #13166
- Release DataFusion 43.0.0 #12470 (thanks @andygrove)
- Release sqlparser-rs version 0.52.0 datafusion-sqlparser-rs#1423 (huge kudos to @iffyio for all the reviews)
Highlights from last week(s):
(I am sorry if I missed you -- please add a note to this ticket with anything you would like to highlight)
Performance
- @Rachelint completed Support vectorized append and compare for multi group by #12996 for another boost on ClickBench performance
- @jonathanc-n feat: Support faster multi-column grouping (GroupColumn) for Date/Time/Timestamp types #13457
🐛 🔨
- @findepi fixed LIKE in various places: Test LIKE with implicit \ escape #13288, Expand LIKE simplification: cover NULL pattern/expression and constant #13260
- Also fixed Produce informative error on physical schema mismatch #13434 and Fix invalid swap for LeftMark nested loops join #13426
- @findepi also fixed Fix join on arrays of unhashable types and allow hash join on all types supported at run-time #13388
Features
- @goldmedal added Introduce INFORMATION_SCHEMA.ROUTINES table #13255
- Also @goldmedal added Introduce information_schema.parameters table #13341
- Custom type planner (also from @goldmedal): Introduce TypePlanner for customizing type planning #13294
- @jonahgao completed support for PREPARE and EXECUTE, in several PRs such as feat: basic support for executing prepared statements #13242 and refactor: move PREPARE/EXECUTE into LogicalPlan::Statement #13311
Logical Types
- @jayzhan211 is reworking signatures with Use LogicalType for TypeSignature Numeric and String, Coercible #13240 and Support TypeSignature::Nullary #13354
SortMergeJoin correctness and stability fixes (@comphead):
- Move filtered SMJ Full filtered join out of join_partial phase #13369
- Minor: SortMergeJoin small refactoring #13398
Aggregation testing coverage
- @jonathanc-n keeps improving coverage, PR after PR
Unparser (Plan --> String)
- @blaginin with PRs such as Allow aggregation without projection in Unparser #13326
- @goldmedal Support unparsing Array plan to SQL string #13418
- @Sevenannn Fix duckdb & sqlite character_length scalar unparsing #13428 and Fix Binary & Binary View Unparsing #13427
- @phillipleblanc in Support Utf8View in Unparser expr_to_sql #13462
Others
- @alamb started documenting how to use multiple threadpools, see Improve documentation (and ASCII art) about streaming execution, and thread pools #13423
Major Projects / Discussions under way
- [Epic] A Collection of Additional UTF8View support tickets #13504 with @timsaucer and @Omega359
- Release Minor DataFusion 43.1.0 release #13499 to try and get delta-rs updated
Looking to get more involved? Try code review!
DataFusion has a long history of community members contributing in all aspects of the project. Reviewing PRs is an especially great way to get introduced to the project, help the community and grow your own knowledge -- researching and understanding the code enough to review PRs also often inspires additional ideas for improvements.
We have docs about reviews. The TL;DR is: look for test coverage, check whether the change is understandable and well documented, and consider whether the code can be improved. When you think the PR looks good to merge, try @ mentioning one of the committers.
Help wanted
Please feel free to leave your own comments on this ticket if you are looking for help.
Community
- Weekly Call
- Slack/Discord: info links
Upcoming meetups:
- 2024 Dec 18 Chicago: https://lu.ma/eq5myc5i @adriangb @timsaucer
- DISCUSSION: January 2025 DataFusion Meetup in Amsterdam / CIDR 2025 #12988
- 2025 Jan 15 Boston
Background:
Previous update: