-
Notifications
You must be signed in to change notification settings - Fork 1.7k
some dependency removals and setup for refactor of FileScanConfig
#14543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @alamb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @logan-keede -- this looks great. Thank you so much for making this small PRs.
I had a small suggestion but I think we could do it as a follow on PR too
| /// Check if repartition is supported | ||
| fn supports_repartition(&self, config: &FileScanConfig) -> bool { | ||
| !(config.file_compression_type.is_compressed() || config.new_lines_in_values) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better we did not provide a default implementation for supports_repartition
Assuming that a file source can repartition based on its compression and newlines seems very specific to CSV.
| /// Check if repartition is supported | |
| fn supports_repartition(&self, config: &FileScanConfig) -> bool { | |
| !(config.file_compression_type.is_compressed() || config.new_lines_in_values) | |
| } | |
| /// Check if repartition is supported | |
| fn supports_repartition(&self, config: &FileScanConfig) -> bool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to include it in my next PR, which should be tomorrow hopefully.
just for clarity do you want to implement something like this for other supported file format
fn supports_repartition(&self, config: &FileScanConfig) -> bool {
!(config.file_compression_type.is_compressed())
}
and original for csv. I do not have much context on this function, So i can not comment on whether newlines is useful only in csv or not. I will try to look into it tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries! Thanks -- I pushed a change to this branch -- hopefully that was ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(let's just hope I haven't broken the CI checks. 😆 😭 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep it moving!
|
Thanks again @logan-keede |
Which issue does this PR close?
datafusion-catalog-listingout ofcore/src/datasource#14462datafusioncrate (datafusion/core) #14444.SessionStatewithSessionand progress towards movingFileFormatFactoryout ofdatasource#14517Rationale for this change
SessionStateneeds to be replaced withSessionindatasourceto split it out ofcore.get_file_format_factoryneeds to be implemented forSessiondatafusion/datafusion/core/src/datasource/file_format/mod.rs
Line 64 in 304488d
FileScanConfigtodatafusion-catalog-listing(or some other place, but out of core)What changes are included in this PR?
Refactoring,
SessionState->dyn Session+ some downcast to be removed later.AvroSourceinfile_format/mod.rs(thanks to @mertak-synnada for helping out)major: major movecoupling due to new merges seems to have made this a bit difficult than before, I am hoping we can go with this while I sort other parts out.FileScanConfigtodatafusion-catalog-listing(upcoming, we can probably do it in a separate PR)Are these changes tested?
Are there any user-facing changes?
There should not be