Feat/parameterized sql queries #964

timsaucer · 2024-12-06T02:00:21Z

Which issue does this PR close?

Closes #513

This is built on top of #1267
I will rebase once that PR merges.

Rationale for this change

Users would like to use DataFrames as a parameter inside an SQL query. With this change, you can do the following:

from datafusion import SessionContext
ctx = SessionContext()
df_customer = ctx.read_parquet("examples/tpch/data/customer.parquet")
ctx.sql("select c_custkey, c_name from $df", df=df_customer)

This change allows string replacement of any placeholder in the SQL statement. For most Python objects, the replacement is the result of calling str() on them. For DataFrame objects, we register a temporary view and replace the placeholder with the generated name of the view.
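To make the replacement rule concrete, here is a minimal pure-Python sketch of the idea. It is not the code in this PR: FakeDataFrame, render_sql, and the views dictionary are hypothetical stand-ins, and a regular expression is used in place of the SQL token parsing the PR performs. Unlike a token-based approach, this regex would also rewrite a $name that appears inside a string literal.

```python
import re
import uuid


class FakeDataFrame:
    """Hypothetical stand-in for a DataFrame; not the datafusion class."""

    def __init__(self, label: str) -> None:
        self.label = label


def render_sql(query: str, views: dict, **params: object) -> str:
    """Replace each $name placeholder with a string form of its argument.

    FakeDataFrame arguments are stored in `views` under a generated name and
    the placeholder is replaced by that name. Everything else uses str().
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for placeholder ${name}")
        value = params[name]
        if isinstance(value, FakeDataFrame):
            view_name = f"__view_{uuid.uuid4().hex}"
            views[view_name] = value  # stands in for registering a temporary view
            return view_name
        return str(value)

    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", substitute, query)


views = {}
df_customer = FakeDataFrame("customer")
sql = render_sql(
    "select c_custkey, c_name from $df where c_custkey < $limit",
    views,
    df=df_customer,
    limit=10,
)
print(sql)
```

The printed query has $df replaced by a generated view name and $limit replaced by 10; the generated name is random, so it differs between runs.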

What changes are included in this PR?

Add param_values to allow prepared-statement-style replacement of scalar values
Add token parsing of placeholders to perform string replacement
Parse string-converted arguments into tokens to validate the final SQL
Verify that generated strings contain exactly one statement, to guard against injection of malicious code
Add user documentation
Add unit tests
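The single-statement check can be illustrated with a simplified sketch. This is an assumption-laden stand-in, not the PR's validation: count_statements and ensure_single_statement are hypothetical names, and the scanner only understands single-quoted string literals (with '' as an escaped quote). It does not handle comments or quoted identifiers, which a real SQL tokenizer would.

```python
def count_statements(sql: str) -> int:
    """Count non-empty statements, ignoring semicolons inside single-quoted strings."""
    count = 0
    has_content = False  # True once the current statement has a non-space character
    in_string = False
    i = 0
    while i < len(sql):
        ch = sql[i]
        if in_string:
            if ch == "'":
                if i + 1 < len(sql) and sql[i + 1] == "'":
                    i += 1  # '' is an escaped quote, stay inside the literal
                else:
                    in_string = False
        elif ch == "'":
            in_string = True
            has_content = True
        elif ch == ";":
            if has_content:
                count += 1
            has_content = False
        elif not ch.isspace():
            has_content = True
        i += 1
    if has_content:
        count += 1  # final statement without a trailing semicolon
    return count


def ensure_single_statement(sql: str) -> str:
    """Return sql unchanged, or raise if it is not exactly one statement."""
    found = count_statements(sql)
    if found != 1:
        raise ValueError(f"expected exactly one SQL statement, found {found}")
    return sql


ensure_single_statement("select * from t where name = 'a;b'")  # accepted
```

With this check, an argument whose string form is something like "1; drop table t" turns the final query into two statements and is rejected before it runs.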

Are there any user-facing changes?

Existing code is not impacted, but new parameters are added.

Example

From the updated user documentation:

MrPowers · 2024-12-06T14:45:05Z

This user interface looks nice 😎

matko

I see that whenever a file is queried now, it'll also be registered as a table. This does have some impact: these tables are now also returned whenever a context is queried for all registered tables, so any visualization or automation based on that list would behave differently depending on which queries were run.

I personally find this very surprising. I would not expect read_parquet to secretly call register_parquet, as that is not what this library did before, nor is it what the Rust library does.

Are you sure this is fine and won't affect people? At the very least, shouldn't there be a way to filter these out easily, to cleanly differentiate between auto-registered and explicitly registered tables?

I very much see the value of the parameterized SQL feature, but this seems like a very crude way of doing it.

src/functions.rs

python/datafusion/context.py

paleolimbot · 2025-09-18T18:04:31Z

python/datafusion/context.py

        skip_metadata: bool = True,
        schema: pyarrow.Schema | None = None,
        file_sort_order: list[list[Expr]] | None = None,
+        table_name: str | None = None,


For what it's worth, I've seen this parameterized as read_parquet().to_view() (e.g., Ibis, DuckDB, SedonaDB)

timsaucer · 2025-10-05T19:38:20Z

I'm going to update this PR to use $param instead of {param}, but I don't like my current approach because it will not work with DataFrames that contain custom table providers. We need a more robust solution, which I am investigating.

timsaucer mentioned this pull request Jan 26, 2025

Why uuid is only assigned for create_dataframe, not assigned for read_xxx #996

Open

matko reviewed Jan 28, 2025

paleolimbot reviewed Sep 18, 2025

timsaucer mentioned this pull request Oct 1, 2025

Is it possible to pass query parameters? (:param or ?) #513

Open

timsaucer marked this pull request as draft October 11, 2025 14:33

timsaucer added 5 commits October 12, 2025 09:13

Add temporary view option for into_view

94b6f55

Intermediate work on parameterizing queries

eab2793

Reworking to do token parsing of sql query instead of string manipulation

4d3d602

Switching to explicit param_values or named parameters that will perform string replacement via parsed tokens

f10d958

Add additional unit tests for parameterized queries

b6bf566

timsaucer force-pushed the feat/parameterized-sql-queries branch from 0f2dccf to b6bf566 Compare October 12, 2025 13:45

timsaucer added 3 commits October 12, 2025 10:34

merge conflict

0734413

license text

5ecb1ba

Add documentation

90e0cd8

timsaucer marked this pull request as ready for review October 13, 2025 12:32


4 participants
