Skip to content

Conversation

@jkosh44
Copy link
Contributor

@jkosh44 jkosh44 commented Jun 28, 2025

Which issue does this PR close?

Rationale for this change

This commit adds support for the Arrow Dictionary type in Substrait plans.

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

Add support for Arrow Dictionary type in Substrait plans

@github-actions github-actions bot added the substrait Changes to the substrait crate label Jun 28, 2025
@jkosh44 jkosh44 force-pushed the substrait-dictionary branch 2 times, most recently from a28e926 to 82327f4 Compare June 28, 2025 20:28
@jkosh44 jkosh44 changed the title Add support for Arrow Time types in Substrait Add support for Arrow Dictionary type in Substrait Jun 28, 2025
@jkosh44 jkosh44 force-pushed the substrait-dictionary branch from 82327f4 to dfac1e0 Compare June 28, 2025 20:54
@jkosh44 jkosh44 marked this pull request as ready for review June 28, 2025 20:54
@jkosh44
Copy link
Contributor Author

jkosh44 commented Jun 28, 2025

@gabotechs FYI

Comment on lines 56 to 59
/// Used for the arrow type [`DataType::Map`].
///
/// [`DataType::Map`]: datafusion::arrow::datatypes::DataType::Map
pub const DICTIONARY_CONTAINER_TYPE_VARIATION_REF: u32 = 3;
Copy link
Contributor

@gabotechs gabotechs Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I see the other *_CONTAINER_TYPE_VARIATION_REF are referring to the kind of variations you'd get for the different binary/string/list variants (e.g. Utf8, LargeUtf8 and Utf8View) which seem unrelated to Maps or Dictionaries.

I think the fact that the r#type::Map type chose to use the DEFAULT_CONTAINER_TYPE_VARIATION_REF for their current implementation is an unfortunate choice with no consequences, as the value is 0 anyways.

As the DataType::Dictionary or DataType::Map do not have the DataType::LargeDictionary, DataType::DictionaryView,DataType::LargeMap or DataType::MapView, I don't think they should be using the *_CONTAINER_TYPE_VARIATION_REF variations. What I'd do instead is to have a different set of variation refs for them, something like:

pub const DEFAULT_MAP_TYPE_VARIATION_REF: u32 = 0;
pub const DICTIONARY_MAP_TYPE_VARIATION_REF: u32 = 1;

what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I 100% agree, I've updated it use the consts you've suggested.

jkosh44 added 2 commits July 4, 2025 18:09
This commit adds support for the Arrow Dictionary type in Substrait
plans.

Resolves apache#16273
@jkosh44 jkosh44 force-pushed the substrait-dictionary branch from dfac1e0 to 35ff2ed Compare July 4, 2025 22:16
Copy link
Contributor

@gabotechs gabotechs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome! this one LGTM, cc @alamb

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jkosh44 and @gabotechs

@alamb alamb merged commit d359d64 into apache:main Jul 7, 2025
27 checks passed
LiaCastaneda pushed a commit to DataDog/datafusion that referenced this pull request Jul 28, 2025
* Add support for Arrow Dictionary type in Substrait

This commit adds support for the Arrow Dictionary type in Substrait
plans.

Resolves apache#16273

* Add more specific type variation consts

(cherry picked from commit d359d64)
LiaCastaneda added a commit to DataDog/datafusion that referenced this pull request Jul 28, 2025
* Add support for Arrow Dictionary type in Substrait (apache#16608)

* Add support for Arrow Dictionary type in Substrait

This commit adds support for the Arrow Dictionary type in Substrait
plans.

Resolves apache#16273

* Add more specific type variation consts

(cherry picked from commit d359d64)

* Remove DataDog Specific Workaround

---------

Co-authored-by: Joseph Koshakow <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[substrait] [sqllogictest] Unsupported cast type: Dictionary(Int32, Utf8)

3 participants