-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat: add sql
feature to make sql planning optional
#17332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I've adjusted the description because I think that also closes #17258 🥳 |
rust-version: stable | ||
- name: Run datafusion-common tests | ||
run: cargo test --profile ci -p datafusion-common --features=pyarrow | ||
run: cargo test --profile ci -p datafusion-common --features=pyarrow,sql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we extend the CI to also verify that ALL crates at least cargo check
with and without sql support? This YAML file has a few examples/listings of feature tests with cargo check
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the existing coverage and it seems like the current pattern is to test features in CI here:
I took the liberty of adding the appropriate entry for the sql
feature in this PR and documeting the feature
I was playing with this PR this morning to see if Comet could use it, and it looks like |
❤️ ❤️ ❤️ ❤️ ❤️ |
that would be great |
I did a quick check and with the latest push + comet I no longer have any |
I'll give this a shot, thanks @timsaucer! |
…plan in an effort to remove the hash functions from the final wasm blob. Probably need to revert or put under a feature flag.
c5852b7
to
7066b0e
Compare
I had to rebase, so you might want to pull again |
sql
feature sql
feature to make sql planning optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @timsaucer
Sorry for the delay in this PR. I merged up from main to resolve some conflicts and added a new test and docs. Hopefully it is good to go now.
Thanks again!
rust-version: stable | ||
- name: Run datafusion-common tests | ||
run: cargo test --profile ci -p datafusion-common --features=pyarrow | ||
run: cargo test --profile ci -p datafusion-common --features=pyarrow,sql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the existing coverage and it seems like the current pattern is to test features in CI here:
I took the liberty of adding the appropriate entry for the sql
feature in this PR and documeting the feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see a big win from the final Comet binary size, but I think any encapsulation improvements are welcome. Thanks @timsaucer!
Right - I don't think you will see a binary difference, especially in release mode. But hopefully you will get some improvements in build time and number of dependencies. |
🎉 |
} | ||
|
||
#[cfg(not(feature = "sql"))] | ||
pub(crate) fn parse_identifiers(s: &str) -> Result<Vec<String>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should have two different, divergent implementations of parse_identifiers and parse_identifiers_normalized. This leads to unpredictable runtime behavior when integrating different components using DataFusion (one using sql
feature and one not).
If the new implementation is equivalent to sqlparser-based, it should be the only implementation.
If it is not, it should not exist. All the APIs using it should be guarded by sql
feature flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[derive(Clone, PartialEq, Eq, PartialOrd, Hash, Debug)] | ||
#[cfg(not(feature = "sql"))] | ||
pub struct ExceptSelectItem { | ||
pub first_element: Ident, | ||
pub additional_elements: Vec<Ident>, | ||
} | ||
#[cfg(not(feature = "sql"))] | ||
impl Display for ExceptSelectItem { | ||
fn fmt(&self, f: &mut Formatter) -> fmt::Result { | ||
write!(f, "EXCEPT ")?; | ||
if self.additional_elements.is_empty() { | ||
write!(f, "({})", self.first_element)?; | ||
} else { | ||
write!( | ||
f, | ||
"({}, {})", | ||
self.first_element, | ||
display_comma_separated(&self.additional_elements) | ||
)?; | ||
} | ||
Ok(()) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expr crate should become un-dependent from sqlparser::ast types, so these copies should be the only types we use.
Is there a follow-up issue planned to cleanup these duplicate types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which issue does this PR close?
sqlparser
dependency optional #17258Rationale for this change
For some users, adding in sqlparser greatly increases build times for their projects even though they are not using sql. This feature makes
sql
an optional dependency to improve build times and reduce the number of dependencies.What changes are included in this PR?
Adds in a
sql
feature in thecore
crate and makessqlparser
an optional dependency. The rest of the changes are to account for the fact that some of the structs insqlparser
are used throughout the repository.Are these changes tested?
Ideally this is effectively a no-op change. By enabling
sql
(or simply using the default features) there is no change and all tests should run as before. I have also run all tests with the--no-default-features
flag.Are there any user-facing changes?
There is one change that may be impactful to some users. By adding in the
sql
feature, if users are not using default features of the datafusion core crate, they will need to addsql
to theirCargo.toml
.Additional context
When building on a mac m4, running a
cargo clean
and then checking time and build size these are my local results for just building the core datafusion library:Net results:
The absolute time numbers aren't so bad when you build just this library by itself, but as part of larger projects it has a compounding effect.