Skip to content

Conversation

@Michael-J-Ward
Copy link
Contributor

DRAFT

My first attempt was to convert with_column to a sugar API for df.select(vec![wildcard(), expr.alias(name)]), which mostly matches my mental model of what the function should do.

There are currently failing test cases surrounding ambiguous and/or "over-write an existing column" use cases. I suspect proper use of WildcardOptions will let me resolve those.

Which issue does this PR close?

Closes #12425.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

My mental model for `with_column` is sugar for `df.select([wildcard(), expr.alias(name)])`.

The bug test-case passes with this implementation, but a few other test cases fail.
Those test cases resolve around ambiguity in column names, which I *think* I can solve
by using `WildcardOptions`.
@github-actions github-actions bot added the core Core DataFusion crate label Sep 11, 2024
@Michael-J-Ward Michael-J-Ward changed the title With column bug fix: calling with_column twice and using a window function for the 2nd call returns an error Sep 11, 2024
@timsaucer
Copy link
Member

Alternative approach: #12431

The sugared version uses a wildcard to express the previous columns
The ambiguous projection now does not occur until the `physical` stage.

I assume there's a way to expand the wildcard and catch it during the `logical` stage, but I haven't found it yet.
…ing wildcardoptions

There's still a TODO because I'm not sure **how** to construct the WildcardOptions,
but this captures the semantic behavior.
@comphead
Copy link
Contributor

is this PR similar to #12431 ?

@Michael-J-Ward
Copy link
Contributor Author

@comphead yes. I started this draft late last night but hit a block, then Tim put up his working implementation this morning.

Purely for the benefit of my own understanding, could you take a look at my proposed impl and let me know if that captures the intended semantic behavior of the function?

I'm still building my understanding of the datafusion internals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Calling with_column twice generates an error when the second call uses a window function

3 participants