S3Config.credentials_provider not used in write path #3367

Open
kevinzwang opened this issue Nov 20, 2024 · 0 comments
Labels: bug (Something isn't working), p1 (Important to tackle soon, but preemptable by p0)

Describe the bug

When you write to an S3 bucket with credentials supplied by a user-provided function via S3Config.credentials_provider, Daft does not currently pass those credentials on to the writer, so the write fails to authenticate.

To Reproduce

The following write fails to authenticate:

import daft
import datetime
import boto3

def get_credentials():
    session = boto3.Session()
    creds = session.get_credentials()
    return daft.io.S3Credentials(
        key_id=creds.access_key,
        access_key=creds.secret_key,
        session_token=creds.token,
        expiry=datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1),
    )

s3_config = daft.io.S3Config(credentials_provider=get_credentials, region_name="us-west-1")
io_config = daft.io.IOConfig(s3=s3_config)

df = daft.from_pydict({"foo": [1, 2, 3]})

df.write_parquet("s3://path/to/bucket.parquet", io_config=io_config)

Expected behavior

Daft should behave the same for reads (which currently work) and writes. On a write, it should fetch the credentials from the credentials provider (or reuse cached credentials if they were already fetched and have not expired) and pass them along to the PyArrow writer.
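
To make the expected "fetch or reuse cached credentials" behavior concrete, here is a minimal sketch in plain Python. It is not Daft's implementation: `Credentials` and `CachedCredentialsProvider` are hypothetical names, and `Credentials` only mirrors the fields passed to daft.io.S3Credentials in the reproduction above.

```python
import datetime
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    """Hypothetical stand-in with the same fields as the S3Credentials call above."""
    key_id: str
    access_key: str
    session_token: Optional[str] = None
    expiry: Optional[datetime.datetime] = None


class CachedCredentialsProvider:
    """Hypothetical wrapper: calls the user function only when needed."""

    def __init__(self, provider: Callable[[], Credentials]):
        self._provider = provider
        self._cached: Optional[Credentials] = None

    def get(self, now: Optional[datetime.datetime] = None) -> Credentials:
        # `now` is injectable so the expiry logic can be exercised in tests.
        now = now or datetime.datetime.now(datetime.timezone.utc)
        expired = (
            self._cached is not None
            and self._cached.expiry is not None
            and self._cached.expiry <= now
        )
        # Refresh on first use or once the cached credentials have expired.
        if self._cached is None or expired:
            self._cached = self._provider()
        return self._cached
```

Under this sketch, both the read path and the write path would call `get()` on the same wrapper, so a write issued shortly after a read would reuse the credentials the read already fetched.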

Component(s)

Parquet, CSV, Other

Additional context

Relevant part of the code where we set the PyArrow filesystem credentials for writing: https://github.com/Eventual-Inc/Daft/blob/main/daft/filesystem.py#L215-L235
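
As a rough illustration of what passing the credentials to the PyArrow writer might involve, the sketch below maps the field names used in the reproduction onto the keyword arguments accepted by `pyarrow.fs.S3FileSystem` (`access_key`, `secret_key`, `session_token`, `region`). The helper name `to_pyarrow_s3_kwargs` is hypothetical and this is not the code at the link above. Note the naming mismatch: Daft's `key_id` corresponds to PyArrow's `access_key`, and Daft's `access_key` corresponds to PyArrow's `secret_key`.

```python
from typing import Any, Dict, Optional


def to_pyarrow_s3_kwargs(
    key_id: str,
    access_key: str,
    session_token: Optional[str] = None,
    region_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for pyarrow.fs.S3FileSystem."""
    kwargs: Dict[str, Any] = {
        "access_key": key_id,      # Daft key_id -> PyArrow access_key
        "secret_key": access_key,  # Daft access_key -> PyArrow secret_key
    }
    # Only include optional values that were actually provided.
    if session_token is not None:
        kwargs["session_token"] = session_token
    if region_name is not None:
        kwargs["region"] = region_name
    return kwargs
```

The resulting dictionary could then be unpacked as `pyarrow.fs.S3FileSystem(**kwargs)` when constructing the filesystem for the write.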

@kevinzwang added the bug and needs triage labels on Nov 20, 2024
@desmondcheongzx added the p1 label and removed the needs triage label on Nov 26, 2024