Blog: Apache DataFusion is now the fastest single node engine for querying Apache Parquet files #33
…rying Apache Parquet files
fyi @Weijun-H @dharanad @Lordworms, @goldmedal @wiedld, @tlm365 @my-vegetable-has-exploded @doupache, @jayzhan211, @xinlifoobar, @Kev1n8
@tshauck, @austin362667, @demetribu, @PsiACE, @devanbenz, @thinh2, @Omega359 @XiangpengHao, @ariesdevil, @tustvold, @RinChanNOWW, @a10y @Dandandan @viirya @itsjunetime, @eejbyfeldt and @Rachelint
@korowa @pmcgleenon
I mentioned you and your work in this blog post -- thank you again 🙏
For names, I copy/pasted whatever was publicly available on your GitHub profiles. If you would like different names / attributions (or none at all) please propose a change 🙏
Also, if you remember others who should be on this list, please let me know
| a challenge!), and we have subsequently rallied to steadily improve the | ||
| performance release on release as shown in Figure 2. | ||
|  | ||
| [Mehmet Ozan Kabak]: https://www.linkedin.com/in/mehmet-ozan-kabak/) | 
FYI @ozankabak
        
          
      Co-authored-by: Bruce Ritchie <[email protected]> Co-authored-by: Patrick McGleenon <[email protected]>
| That's great news. Congrats! | 
| For the ClickBench run for DataFusion what is the  | 
| 
 I don't think the scripts change the default setting -- the scripts used are here: https://github.com/ClickHouse/ClickBench/tree/main/datafusion Here is the PR to update for 43.0.0: ClickHouse/ClickBench#251 | 
        
          
      Co-authored-by: Alex Huang <[email protected]> Co-authored-by: Patrick McGleenon <[email protected]> Co-authored-by: Jay Zhan <[email protected]>
…ite into alamb/clickbench_blog
        
          
      …bench.md Co-authored-by: Tai Le Manh <[email protected]>
|  | ||
| # Rallying The Community around Performance | ||
|  | ||
| In July, 2024 [Mehmet Ozan Kabak], CEO of [Synnada], [called on the community to | 
Is this central to this post? I don't mean to "discredit" the call by any means, but I'm not sure the work described in this post was driven by this comment?
I certainly was inspired by the comment to help me focus where I spent my time reviewing PRs and helping push them through, though it is a good point that this may imply it motivated others as well, when I don't really know what did.
Perhaps we could rephrase the motivation with something like this?
"Performance has long been a focus for DataFusion: one of the core benefits of DataFusion is its performance, which both excites contributors and attracts users. There seems to have been a renewed focus on performance recently, including a call in July 2024 from Mehmet ..."?
Sounds good to me
I rephrased in 2507e67
…ite into alamb/clickbench_blog
| Thank you to everyone who reviewed this PR. I plan to merge / publish it later today unless there are any other comments | 
| Amazing work Andrew! You and all of the DataFusion contributors should be incredibly proud of this accomplishment. | 
| Let's get this published to the world | 

Let's celebrate the accomplishment of getting to the top of the ClickBench leaderboard