amzn-choeric
Contributor

Issue #, if available:
aws/sagemaker-scikit-learn-container#106

Description of changes:
Bumping the PyArrow version to support numpy versions >= 1.21.0, which include a backward-incompatible change. The numpy upgrade is also required to address certain security vulnerabilities. Because this is a major version jump and consumers of this package have build scripts locked to an older PyArrow version, I am also bumping the version of this package.

One compatibility issue was observed, caused by a minor inheritance-related adjustment to a PyArrow class we use. The adjustment reordered how the class's data members are laid out in memory, which shifted the offsets we relied on and resulted in segmentation faults. This was not immediately obvious, because the failure actually happens much later down the line. I did look into alternatives to this somewhat brittle approach before finding a simple fix, but the alternatives appear to be slightly more involved and may require benchmark considerations.
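
For readers unfamiliar with this class of failure, here is a minimal sketch of how an inheritance change can shift member offsets. It uses Python's `ctypes` purely as an illustration; it is not the PyArrow class involved, nor the fix applied in this PR. All names (`OldBase`, `NewBase`, `OldWrapper`, `NewWrapper`, `ref_count`, `extra`, `data_ptr`) are hypothetical.

```python
import ctypes


# Hypothetical base class as the older library version might lay it out.
class OldBase(ctypes.Structure):
    _fields_ = [("ref_count", ctypes.c_int64)]


# Hypothetical base class after an inheritance-related change adds a member.
class NewBase(ctypes.Structure):
    _fields_ = [("ref_count", ctypes.c_int64), ("extra", ctypes.c_int64)]


# The derived layout a consumer was compiled against (old) ...
class OldWrapper(OldBase):
    _fields_ = [("data_ptr", ctypes.c_void_p)]


# ... and the layout the new library actually produces.
class NewWrapper(NewBase):
    _fields_ = [("data_ptr", ctypes.c_void_p)]


# Subclass fields are appended after the base fields, so the extra base
# member pushes data_ptr further into the object.
old_offset = OldWrapper.data_ptr.offset
new_offset = NewWrapper.data_ptr.offset

# Build an object with the new layout, then reinterpret its bytes using the
# stale (old) layout, as code relying on hard-coded offsets effectively does.
actual = NewWrapper()
actual.extra = 7
actual.data_ptr = 0x1000
stale_view = OldWrapper.from_buffer_copy(bytes(actual))

# The stale view reads the bytes of `extra` where it expects the pointer.
stale_read = stale_view.data_ptr

print("old offset:", old_offset, "new offset:", new_offset)
print("actual pointer:", hex(actual.data_ptr), "stale read:", stale_read)
```

In native code, the value read through the stale offset would later be dereferenced as a pointer, which is consistent with a crash surfacing far from the point where the layout assumption was made.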

Testing was done by building this within the scikit-learn container and running unit and integration tests.

@amzn-choeric amzn-choeric merged commit 588a28d into master Jan 27, 2023
@amzn-choeric amzn-choeric deleted the pyarrow branch January 27, 2023 16:23