Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add endpoint url parameter #34

Merged
merged 4 commits into from
Jan 18, 2023
Merged

add endpoint url parameter #34

merged 4 commits into from
Jan 18, 2023

Conversation

sattler
Copy link
Contributor

@sattler sattler commented Jan 17, 2023

I added a parameter to support custom endpoints.
The changes are minimal invasive and should not change any existing behavior or break backwards compatibility.

I tested these changes with a local minio deployment.

@ktrueda
Copy link
Owner

ktrueda commented Jan 17, 2023

Thank you for your PR. Its' great.

I tried to use your code, but it didn't work for wildcard path on s3 like
parquet-tools csv -n 3 --endpoint-url https://s3.ap-northeast-1.amazonaws.com "s3://bucket-name/*".

Probably you have to modify util.py like

    def resolve_wildcard(self) -> List[ParquetFile]:
        list_res = self.aws_session.client('s3', endpoint_url=self.endpoint_url)\
            .list_objects_v2(
            Bucket=self.bucket,
            Prefix=self.key[:-1]  # remove *
        )
        if list_res['IsTruncated']:
            raise Exception(f'Too much file match s3://{self.bucket}/{self.key}')

        if list_res['KeyCount'] == 0:
            return []
        keys = [e['Key'] for e in list_res['Contents']]
        return sorted(
            [S3ParquetFile(aws_session=self.aws_session, bucket=self.bucket, key=key, endpoint_url=self.endpoint_url) for key in keys], // you have to add endpoint_url argment
            key=lambda x: x.key
        )

Could you confirm this?

@ktrueda
Copy link
Owner

ktrueda commented Jan 17, 2023

S3ParquetFile is also used in https://github.com/ktrueda/parquet-tools/blob/master/tests/test_util.py .
You have to update this file and pass this test.

@sattler
Copy link
Contributor Author

sattler commented Jan 17, 2023

thank you for point this out. I will make the according changes!

@sattler
Copy link
Contributor Author

sattler commented Jan 17, 2023

I added a simple default value for the endpoint_url.
Therefore, no superfluous parameter definitions are needed for the test cases while still allowing to be set when needed.
Test are now also passing :)

Copy link
Owner

@ktrueda ktrueda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

@ktrueda ktrueda merged commit fd97598 into ktrueda:master Jan 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants