Skip to content

Conversation

@JE-Chen
Copy link
Contributor

@JE-Chen JE-Chen commented Sep 25, 2024

Did you mean to change #1169 like this PR? @kevinjqliu
I think we need to edit the test as well.
Becasue test no longer raise ns TypeError.

JE-Chen and others added 2 commits September 25, 2024 20:09
Add ns TimeType
- Remove ns warning
- Add ns downcast
…eError

Remove ns raise TypeError test, because it will no longer raise a TypeError
E       Failed: DID NOT RAISE <class 'TypeError'>
Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Becasue test no longer raise ns TypeError.

The test should raise for ns if the option to downcast is not set. Which tests are you seeing this problem?

Comment on lines 1078 to 1084
elif primitive.unit == "ns":
if self._downcast_ns_timestamp_to_us:
logger.warning("Iceberg does not yet support 'ns' timestamp precision. Downcasting to 'us'.")
else:
raise TypeError(
"Iceberg does not yet support 'ns' timestamp precision. Use 'downcast-ns-timestamp-to-us-on-write' configuration property to automatically downcast 'ns' to 'us' on write."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still necessary because iceberg doesnt support ns. this workaround help us downcast a column with ns to us

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add back this code snippet.

elif pa.types.is_date32(primitive):
return DateType()
elif isinstance(primitive, pa.Time64Type) and primitive.unit == "us":
elif isinstance(primitive, pa.Time64Type) and primitive.unit in ["us", "ns"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont need to check for precision here, since its us and ns are supported.

Although we need to add code to downcast ns to us, similar to below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to below

Thank you for your response, but could you explain it more clearly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont need to check for precision here, since its us and ns are supported.

Time64Type only supports us and ns, so the extra check for primitive.unit in ["us", "ns"] is redundant. Although, it'll be helpful if another precision was added in the future.

Although we need to add code to downcast ns to us, similar to below

Because Iceberg doesn't support ns, it has to downcast ns to us, or errors.
This behavior is controlled by https://github.com/apache/iceberg-python/pull/1206/files#diff-8d5e63f2a87ead8cebe2fd8ac5dcf2198d229f01e16bb9e06e21f7277c328abdL1078-L1084

Which we also need to include here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this?

        elif pa.types.is_timestamp(primitive):
            primitive = cast(pa.TimestampType, primitive)
            if primitive.unit in ("s", "ms", "us"):
                # Supported types, will be upcast automatically to 'us'
                pass
            elif primitive.unit == "ns":
                if self._downcast_ns_timestamp_to_us:
                    primitive = cast(pa.TimestampType, "us")
                else:
                    raise TypeError(
                        "Iceberg does not yet support 'ns' timestamp precision. Use 'downcast-ns-timestamp-to-us-on-write' configuration property to automatically downcast 'ns' to 'us' on write."
                    )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, do the same for Time64Type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the _downcast_ns_timestamp_to_us part for sure. Im not sure if we need to do cast though

Copy link
Contributor Author

@JE-Chen JE-Chen Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your response, But I am confused.
Did you mean that I need a new class variable _downcast_ns_time64type_to_us and a new condition to check?

if self._downcast_ns_time64type_to_us:
    # what should be done here?

Or should I add some code under the original condition:

if self._downcast_ns_timestamp_to_us:
    # Add something here. 

Copy link
Contributor

@kevinjqliu kevinjqliu Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this function is converting pyarrow types to Iceberg types.

In the example of timestamp type (pa.types.is_timestamp(primitive)), we check if the unit (or precision) is ns. If it is and the downcast-ns-timestamp-to-us-on-write is set, we want to downcast the unit from ns to us. If downcast-ns-timestamp-to-us-on-write is not set, raise an error because ns is not supported.

If its not, we don't do anything special and just return the Iceberg type.

We want to do the same thing with the Time64Type type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

essentially check if the unit is ns. if it is, check downcast-ns-timestamp-to-us-on-write. if the option is set, we can downcast from ns to us.
If its not, raise an error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the new commit. Thanks.

@kevinjqliu
Copy link
Contributor

@sungwy can you take a look and see if the downcast logic makes sense here?

JE-Chen and others added 2 commits October 1, 2024 14:18
Return TImeType not raise error here
ruff reformat
@kevinjqliu
Copy link
Contributor

hey @JE-Chen, #1169 was already assigned to @zaryab-ali and I didn't realize this PR was not started by the assignee.
Can you collaborate on #1215 to resolve this issue together?
Sorry for the confusion here

@JE-Chen JE-Chen closed this by deleting the head repository Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants