-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-6655: [Python] Filesystem bindings for S3 #5423
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5423 +/- ##
=========================================
Coverage ? 80.84%
=========================================
Files ? 90
Lines ? 8854
Branches ? 0
=========================================
Hits ? 7158
Misses ? 1342
Partials ? 354 Continue to review full report at Codecov.
|
@pitrou PTAL, I still need to fix the AppVeyor builds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather see the testing simplified. Also I'm not sure we want to depend on the minio
client for testing. Actual checks are already done on the C++ side.
python/pyarrow/tests/test_fs.py
Outdated
with pytest.raises(ValueError): | ||
aaa_stat.size | ||
assert mtime_almost_equal(aaa_stat.mtime, fs.mtime(_aaa)) | ||
# type is inconsistent base_name has a trailing slas for 'aaa' and 'aaa/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pitrou if the directory's path ends with a slash then the file type is File, otherwise it is Directory, is this the expected outcome?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which situation exactly? For guidance, you can take a look at the C++ tests, they should tell you what is expected.
@@ -340,6 +340,12 @@ class ObjectInputFile : public io::RandomAccessFile { | |||
RETURN_NOT_OK(CheckClosed()); | |||
RETURN_NOT_OK(CheckPosition(position, "read")); | |||
|
|||
nbytes = std::min(nbytes, content_length_ - position); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pitrou this is the fix for stream.read()
issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! And thanks for the tests too.
@pitrou I have a question about S3Options above. Follow-up issues: |
@kszucs Sorry, what is the S3Options question already? Would be nice if you could create those JIRAs and point them to me :-) |
@pitrou Created the JIRAs. That question I'm referring to is the keywords vs options object one. It is hidden by default in the GitHub thread, here is the link #5423 (comment) |
@kszucs I agree that using kwargs directly on S3FileSystem may look reasonable with the current settings. But S3 may/will need to grow more settings in the future. For example see the settings supported by boto3. Once there are many settings it is nicer to have property access on an options object, like we already have on the CSV and JSON options objects. |
@pitrou now that has convinced me :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's getting good! Just a couple questions below.
@@ -164,6 +168,10 @@ export PYARROW_BUILD_TYPE=$ARROW_BUILD_TYPE | |||
export PYARROW_WITH_PARQUET=1 | |||
export PYARROW_WITH_PLASMA=1 | |||
export PYARROW_WITH_ORC=1 | |||
export PYARROW_WITH_S3=1 | |||
if [ "$ARROW_TRAVIS_S3" == "1" ]; then | |||
export PYARROW_WITH_S3=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I suppose there's a leftover here or above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARROW_TRAVIS_S3 is still optional, so we need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but look at the diff :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean, have I removed ARROW_TRAVIS_S3 or PYARROW_WITH_S3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean this:
export PYARROW_WITH_S3=1
if [ "$ARROW_TRAVIS_S3" == "1" ]; then
export PYARROW_WITH_S3=1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:), fixing it
@@ -340,6 +340,12 @@ class ObjectInputFile : public io::RandomAccessFile { | |||
RETURN_NOT_OK(CheckClosed()); | |||
RETURN_NOT_OK(CheckPosition(position, "read")); | |||
|
|||
nbytes = std::min(nbytes, content_length_ - position); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! And thanks for the tests too.
python/pyarrow/_s3fs.pyx
Outdated
|
||
Parameters | ||
---------- | ||
access_key: str, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should update the docstring now :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed.
@pitrou updated |
depends on #5423 Closes #5484 from kszucs/s3-conda and squashes the following commits: 7a3c67c <Krisztián Szűcs> test s3fs import ff6eb22 <Krisztián Szűcs> Include s3 support in the conda packages Authored-by: Krisztián Szűcs <[email protected]> Signed-off-by: Wes McKinney <[email protected]>
- Add support for S3FileSystem in the python bindings. - Fixed issue with reading all the content of an S3 object - Introduce `minio_server` fixture for parametrized testing of all filesystem implementations - Fixed s3fs parquet test and updated it to use minio_server fixture Closes #5423 from kszucs/s3 and squashes the following commits: 384c960 <Krisztián Szűcs> Resolve review comments 73e6625 <Krisztián Szűcs> S3Options 98bd91a <Krisztián Szűcs> remove commented tests db89859 <Krisztián Szűcs> rename to s3fs 4478458 <Krisztián Szűcs> fix read() issue c1df10b <Krisztián Szűcs> initialization in first use 192ab65 <Krisztián Szűcs> flake8 f70f9fb <Krisztián Szűcs> remove minio-client dependency d399643 <Krisztián Szűcs> simplify test suite fee57a9 <Krisztián Szűcs> remove accidentally committed files 751cfd4 <Krisztián Szűcs> resolve a couple of review comments; enum workaround 45436f7 <Krisztián Szűcs> cython flake8 38dcb88 <Krisztián Szűcs> rat c541b3e <Krisztián Szűcs> comment left 098048a <Krisztián Szűcs> more compat 00340ed <Krisztián Szűcs> fixture error handling 2be25ce <Krisztián Szűcs> auto initialize s3 on import 88e0c9f <Krisztián Szűcs> py2 compat 8585a60 <Krisztián Szűcs> py2 compat d372287 <Krisztián Szűcs> install minio in the conda-toolchain build 041cad4 <Krisztián Szűcs> executable flag fb0f281 <Krisztián Szűcs> travis minio install script 8cbe0ee <Krisztián Szűcs> travis osx 72e56a6 <Krisztián Szűcs> enable S3 in travis python builds 7800c75 <Krisztián Szűcs> appveyor flag 7daf566 <Krisztián Szűcs> fix syntax error in travis script 68eb591 <Krisztián Szűcs> enable PYARROW_WITH_S3 on appveyor 2cb19d1 <Krisztián Szűcs> conditional import of test dependencies efa05d2 <Krisztián Szűcs> use minio for dask.s3fs test too f25ae5a <Krisztián Szűcs> travis 9042c7e <Krisztián Szűcs> use S3FS_DIR 9ce7180 <Krisztián Szűcs> cmake format; fix orc cimport 45a2a17 <Krisztián Szűcs> docstrings c0b9162 <Krisztián Szűcs> test requirements; flake8 44aedfd <Krisztián Szűcs> stat test a343950 <Krisztián Szűcs> testing suite 1551b52 <Krisztián Szűcs> wip dd41d21 <Krisztián Szűcs> imports 5200af1 <Krisztián Szűcs> s3 filesystem bindings Authored-by: Krisztián Szűcs <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
depends on #5423 Closes #5484 from kszucs/s3-conda and squashes the following commits: 7a3c67c <Krisztián Szűcs> test s3fs import ff6eb22 <Krisztián Szűcs> Include s3 support in the conda packages Authored-by: Krisztián Szűcs <[email protected]> Signed-off-by: Wes McKinney <[email protected]>
minio_server
fixture for parametrized testing of all filesystem implementations