-
Notifications
You must be signed in to change notification settings - Fork 129
Feat/parameterized sql queries #964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This user interface looks nice 😎 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that whenever a file is queried now, it'll also be registered as a table. This does have some impact, as it means these tables are now also returned whenever a context is queried for all registered tables. Meaning any sort of visualization or automation based on that would behave differently depending on whether certain queries were run.
I personally find this very surprising. I would not expect read_parquet
to secretly register_parquet
as that is not what this library did before, nor is it what the rust library does.
Are you sure this is fine and won't affect people? At the very least, shouldn't there be a way to filter these out easily, to cleanly differentiate between auto-registered and explicitly registered tables?
I very much see the value of the parameterized sql feature, but this seems like a very crude way of doing it.
python/datafusion/context.py
Outdated
skip_metadata: bool = True, | ||
schema: pyarrow.Schema | None = None, | ||
file_sort_order: list[list[Expr]] | None = None, | ||
table_name: str | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, I've seen this parameterized as read_parquet().to_view()
(e.g., Ibis, DuckDB, SedonaDB)
I'm going to update this PR to use |
…orm string replacement via parsed tokens
0f2dccf
to
b6bf566
Compare
Which issue does this PR close?
Closes #513
This is built on top of #1267
I will rebase once that PR merges.
Rationale for this change
Users would like to use DataFrames as a parameter inside an SQL query. With this change, you can do the following:
This change allows for string replacement of any placeholder in the SQL statement. For most python objects this is calling
str()
on them. For DataFrame objects we register a temporary view and replace the parameter with the generated name of the view.What changes are included in this PR?
param_values
to allowprepare
statement style replacement of scalar valuesAre there any user-facing changes?
Existing code is not impacted, but new parameters are added.
Example
From the updated user documentation: