PyArrow Version Update & v0.8.0 Bump #31
Merged
Issue #, if available:
aws/sagemaker-scikit-learn-container#106
Description of changes:
Bumping the PyArrow version to support NumPy versions >= 1.21.0, which include a backward-incompatible change. The NumPy upgrade is also required to address certain security vulnerabilities. Because this is a major PyArrow version jump and consumers of this package have build scripts pinned to an older PyArrow version, I am also bumping the version of this package (to v0.8.0).
There was one compatibility issue: a minor inheritance-related change to a PyArrow class we use altered the order in which its data members are laid out in memory. As a result, our references to those members at their previously expected offsets pointed at the wrong data, which led to segmentation faults. This was not immediately obvious because the crash happens much later than the faulty access. Before settling on a simple fix, I looked into alternatives to this somewhat brittle approach, but they appear to be more involved and may need to be benchmarked.
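To illustrate the failure mode, here is a minimal sketch that uses Python's `ctypes` to mimic C-style struct layouts. `RecordV1`, `RecordBase` and `RecordV2` are hypothetical stand-ins, not the actual PyArrow class or the code changed in this PR. The sketch only assumes that an upstream inheritance change places base-class members ahead of the fields we read, so an offset computed against the old layout now lands on a different member.

```python
import ctypes

# Hypothetical "old" layout: two data members, no base class.
class RecordV1(ctypes.Structure):
    _fields_ = [("length", ctypes.c_int64), ("values", ctypes.c_void_p)]

# Hypothetical base class introduced by an upstream inheritance change.
class RecordBase(ctypes.Structure):
    _fields_ = [("type_id", ctypes.c_int64)]

# Hypothetical "new" layout: same members, but ctypes appends a subclass's
# _fields_ after its base's fields, so every member is shifted back.
class RecordV2(RecordBase):
    _fields_ = [("length", ctypes.c_int64), ("values", ctypes.c_void_p)]

def describe(cls):
    """Map each field declared directly on cls to its byte offset."""
    return {name: getattr(cls, name).offset for name, _ in cls._fields_}

print("old:", describe(RecordV1))
print("new:", describe(RecordV2))

# Offset of `values` that code built against the old layout still uses.
STALE_VALUES_OFFSET = RecordV1.values.offset

record = RecordV2(type_id=1, length=42, values=None)
# Reading at the stale offset silently returns `length` instead of the pointer.
stale = ctypes.c_int64.from_buffer(record, STALE_VALUES_OFFSET).value
print("value read where the pointer was expected:", stale)
```

Nothing fails at the misread itself. In C or C++ the bogus value would only cause a crash once it is later dereferenced as a pointer, which is consistent with the segmentation faults surfacing well after the faulty access.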
Testing was done by building this package within the scikit-learn container and running the unit and integration tests.