-
Notifications
You must be signed in to change notification settings - Fork 389
PR #1169 #1206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PR #1169 #1206
Conversation
Add ns TimeType - Remove ns warning - Add ns downcast
…eError Remove ns raise TypeError test, because it will no longer raise a TypeError E Failed: DID NOT RAISE <class 'TypeError'>
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Becasue test no longer raise ns TypeError.
The test should raise for ns if the option to downcast is not set. Which tests are you seeing this problem?
| elif primitive.unit == "ns": | ||
| if self._downcast_ns_timestamp_to_us: | ||
| logger.warning("Iceberg does not yet support 'ns' timestamp precision. Downcasting to 'us'.") | ||
| else: | ||
| raise TypeError( | ||
| "Iceberg does not yet support 'ns' timestamp precision. Use 'downcast-ns-timestamp-to-us-on-write' configuration property to automatically downcast 'ns' to 'us' on write." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is still necessary because iceberg doesnt support ns. this workaround help us downcast a column with ns to us
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add back this code snippet.
pyiceberg/io/pyarrow.py
Outdated
| elif pa.types.is_date32(primitive): | ||
| return DateType() | ||
| elif isinstance(primitive, pa.Time64Type) and primitive.unit == "us": | ||
| elif isinstance(primitive, pa.Time64Type) and primitive.unit in ["us", "ns"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont need to check for precision here, since its us and ns are supported.
Although we need to add code to downcast ns to us, similar to below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to below
Thank you for your response, but could you explain it more clearly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont need to check for precision here, since its us and ns are supported.
Time64Type only supports us and ns, so the extra check for primitive.unit in ["us", "ns"] is redundant. Although, it'll be helpful if another precision was added in the future.
Although we need to add code to downcast ns to us, similar to below
Because Iceberg doesn't support ns, it has to downcast ns to us, or errors.
This behavior is controlled by https://github.com/apache/iceberg-python/pull/1206/files#diff-8d5e63f2a87ead8cebe2fd8ac5dcf2198d229f01e16bb9e06e21f7277c328abdL1078-L1084
Which we also need to include here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this?
elif pa.types.is_timestamp(primitive):
primitive = cast(pa.TimestampType, primitive)
if primitive.unit in ("s", "ms", "us"):
# Supported types, will be upcast automatically to 'us'
pass
elif primitive.unit == "ns":
if self._downcast_ns_timestamp_to_us:
primitive = cast(pa.TimestampType, "us")
else:
raise TypeError(
"Iceberg does not yet support 'ns' timestamp precision. Use 'downcast-ns-timestamp-to-us-on-write' configuration property to automatically downcast 'ns' to 'us' on write."
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, do the same for Time64Type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the _downcast_ns_timestamp_to_us part for sure. Im not sure if we need to do cast though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your response, But I am confused.
Did you mean that I need a new class variable _downcast_ns_time64type_to_us and a new condition to check?
if self._downcast_ns_time64type_to_us:
# what should be done here?
Or should I add some code under the original condition:
if self._downcast_ns_timestamp_to_us:
# Add something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this function is converting pyarrow types to Iceberg types.
In the example of timestamp type (pa.types.is_timestamp(primitive)), we check if the unit (or precision) is ns. If it is and the downcast-ns-timestamp-to-us-on-write is set, we want to downcast the unit from ns to us. If downcast-ns-timestamp-to-us-on-write is not set, raise an error because ns is not supported.
If its not, we don't do anything special and just return the Iceberg type.
We want to do the same thing with the Time64Type type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
essentially check if the unit is ns. if it is, check downcast-ns-timestamp-to-us-on-write. if the option is set, we can downcast from ns to us.
If its not, raise an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the new commit. Thanks.
…se a TypeError" This reverts commit b3e83cf.
Add downcast condition on Time64Type
|
@sungwy can you take a look and see if the downcast logic makes sense here? |
Return TImeType not raise error here
ruff reformat
|
hey @JE-Chen, #1169 was already assigned to @zaryab-ali and I didn't realize this PR was not started by the assignee. |
Did you mean to change #1169 like this PR? @kevinjqliu
I think we need to edit the test as well.
Becasue test no longer raise ns TypeError.