Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] Update pyarrow dependency #1926

Closed
johnkerl opened this issue Nov 17, 2023 · 12 comments
Closed

[python] Update pyarrow dependency #1926

johnkerl opened this issue Nov 17, 2023 · 12 comments

Comments

@johnkerl
Copy link
Member

johnkerl commented Nov 17, 2023

This is for the pyarrow CVE. @bkmartinjr has verified we don't use the vulnerable code path, but, it's good optics for us to update.

@johnkerl johnkerl self-assigned this Nov 17, 2023
@johnkerl johnkerl changed the title [python] Update pyarrow dependency to != 14.0.0 [python] Update pyarrow dependency Nov 17, 2023
@johnkerl
Copy link
Member Author

johnkerl commented Nov 21, 2023

Here are the issues:

  • @bkmartinjr has already verified that TileDB-SOMA does not use the codepath affected, so we are not vulnerable. Nonetheless it's good optics for us to update.
  • Affected range is pyarrow from 0.14.0 to 14.0.0
  • pyarrow 14.0.1 is available, but does not work with Python 3.7
  • We are still supporting Python 3.7 for a few months now -- see also [python] Drop 3.7 support #1790
  • My experience with version-dependent conditional imports and typeguard has been very negative -- it's just a mess
  • @thetorpedodog pointed out there is a hotfix approach

Here is my proposal:

  • At present we use only the hotfix approach -- import pyarrow_hotfix -- and we don't touch any pyarrow version pins
  • Mark all affected code spots with TODO linking to this issue
  • Once we drop Python 3.7 support ([python] Drop 3.7 support #1790) then:

cc @thetorpedodog @bkmartinjr @jdblischak @Shelnutt2

@thetorpedodog
Copy link
Contributor

sounds good. there also is no rush to remove the hotfix; if pyarrow is already secure it does nothing, so in the worst case it’s an extra import and some light processing.

@bkmartinjr
Copy link
Member

Proposal works for me.

Alt, which you are free to ignore: given what I've read about the actual fix (i.e., in code we don't use), it would also be OK to:

  • update the dependency using pins (i.e., >=14.0.1) for all python variants > 3.7
  • leave 3.7 as is

@ihnorton
Copy link
Member

+1 from me

@johnkerl
Copy link
Member Author

johnkerl commented Nov 30, 2023

Additional non-joy: we found on #1936 that with pyarrow >= 13.0 we have

import pyarrow
import tiledb
tiledb.open('s3://anything/at/all')
Fatal error condition occurred in /Users/runner/work/crossbow/crossbow/vcpkg/buildtrees/aws-c-common/src/v0.8.9-fed0b55d0f.clean/source/allocator.c:121: allocator != ((void*)0)

on MacOS either x86_64 or arm64.

This should be followed up on.

Stack trace: https://gist.github.com/johnkerl/eb5874e94d0cc4768114faadcb989e83

@johnkerl
Copy link
Member Author

johnkerl commented Dec 1, 2023

This should be followed up on.

Here is the follow-up:

@johnkerl
Copy link
Member Author

johnkerl commented Dec 1, 2023

Given the above, namely:

I conclude that we'll need to simply stick with pyarrow_hotfix long-term.

All relevant PRs on this repo have been implemented and merged.

@johnkerl johnkerl closed this as completed Dec 1, 2023
@johnkerl johnkerl reopened this Dec 1, 2023
@johnkerl
Copy link
Member Author

johnkerl commented Dec 1, 2023

Update: there is a mitigation in aws-sdk-cpp 1.11.179 (core currently uses 1.11.160)

so we can handle this, but, only after a core bump.

@johnkerl
Copy link
Member Author

johnkerl commented Dec 1, 2023

No longer blocks 1.6

@ivirshup
Copy link
Collaborator

ivirshup commented Jun 6, 2024

pyarrow >= 13.0 on MacOS results in a fatal error with import pyarrow and import tiledb

I can't reproduce this behavior from pyarrow>=14.0.2.

Nvm it's dependent on import order. So importing tiledb before pyarrow is fine, but the reverse errors.

In light of that, could upper version bound here get removed? There are no wheel available for pyarrow versions this old for python 3.12, which is giving some CI grief over on cellxgene_census: https://github.com/chanzuckerberg/cellxgene-census/actions/runs/9405819588/job/25907864054?pr=1189

@johnkerl
Copy link
Member Author

Please keep this issue open until these are resolved:

@johnkerl
Copy link
Member Author

Resolved as of https://github.com/single-cell-data/TileDB-SOMA/releases/tag/1.14.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants