feat: AWS Athena backend (and general AWS connections) #7682
Comments
Possibly could build off of https://github.com/laughingman7743/PyAthena? I remember suggesting this a long time ago, but there were concerns about how it could be tested without, you know, having an AWS account in CI, etc.
This feature would be valuable to me too. It'd probably be good to reuse functionality that's already common and built out in other AWS-maintained packages. For example, the AWS SDK for Pandas uses boto3 sessions for authentication. The authentication there includes a default session, which is a nice way to connect to AWS so long as the environment is already configured to work with other AWS tools like the AWS CLI. I tend to rely on the priority search authentication in there to autoload from a credentials/config file made with the AWS CLI to refresh any session tokens, but I know others may prefer refreshing the standardized environment variables for AWS authentication instead.

One other pro of this approach is that the config/credentials files used by boto3 sessions are also what pyarrow implemented for its authentication into AWS and for reading/writing parquet files to S3. So this may work nicely with to_parquet/read_parquet and S3 file systems as well. Similarly, it's what PyAthena, mentioned above, also uses. In practice this is also just nice to work with in my experience: get the AWS authentication working once, and then I can use the same configs for multiple packages (AWS SDK for Pandas, PyArrow, boto3, PyAthena, etc.).

Separate from the authentication topic... the AWS SDK for Pandas might make for a good SQL backend for Athena as well, as it implements the standard SQL operators directly in the Athena and Glue services. Likely that'd mean the Ibis connection object would need to cover some config options, the main one being the different approaches to getting data from AWS back to the Python session, which have a big impact on performance. But if all we need is authentication, then the SQL dialect in Trino (what Athena is based on) ought to get us pretty close too.

Hope the references above are helpful if this gets picked up, thanks!
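For readers unfamiliar with the "priority search" mentioned above, here is a rough standard-library sketch of the idea: check the standard AWS environment variables first, then fall back to the shared credentials file written by the AWS CLI. This is only an illustration and not boto3's actual implementation, which consults more sources (assumed roles, SSO, instance metadata, and so on). The function name `resolve_aws_credentials` and the returned dict layout are made up for this example.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_aws_credentials(
    env: Optional[Mapping[str, str]] = None,
    profile: str = "default",
) -> Optional[dict]:
    """Hypothetical helper: look up credentials in a fixed priority order."""
    env = os.environ if env is None else env

    # 1. Environment variables take priority when both keys are set.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return {
            "access_key": env["AWS_ACCESS_KEY_ID"],
            "secret_key": env["AWS_SECRET_ACCESS_KEY"],
            "token": env.get("AWS_SESSION_TOKEN"),
            "source": "env",
        }

    # 2. Otherwise read the shared credentials file (default: ~/.aws/credentials).
    path = Path(
        env.get("AWS_SHARED_CREDENTIALS_FILE")
        or Path.home() / ".aws" / "credentials"
    )
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is ignored and leaves the parser empty
    profile = env.get("AWS_PROFILE", profile)
    if parser.has_section(profile):
        access_key = parser.get(profile, "aws_access_key_id", fallback=None)
        secret_key = parser.get(profile, "aws_secret_access_key", fallback=None)
        if access_key and secret_key:
            return {
                "access_key": access_key,
                "secret_key": secret_key,
                "token": parser.get(profile, "aws_session_token", fallback=None),
                "source": "shared-credentials-file",
            }

    # Nothing found in either place.
    return None
```

In practice a backend would more likely hand this job to a boto3 session and let it do the lookup, which is the point of the comment above: configure credentials once and every package that builds on boto3 picks them up.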
Agreed that we should work towards making it easier to add support for backends that are ostensibly derivative of existing systems. It's very likely that we won't get to this until after #7580 (or a sequence of its changes) is merged and released, as we'd like to get away from sqlalchemy before supporting more backends.
If someone wants to try handing a
I've started working on an Athena backend.
Consider how to support generic AWS authentication and backends for services, namely but not limited to Athena
Hi @lostmygithubaccount,
Thanks for the reply, and sorry for the delayed response; I was a bit occupied with other work, so I couldn't spend time on this.
I had a look at the PostgreSQL backend, but I'm wondering how to connect to Athena using PostgreSQL. At the moment, all I have is AWS credentials such as AccessKeyId and SecretAccessKey, and I'm not sure how to pass these in the args.
If possible, could you please post a sample code snippet that connects to AWS Athena using the PostgreSQL backend?
Originally posted by @uramith in #7229 (reply in thread)
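As a stopgap for the question above, here is a minimal sketch of connecting to Athena with an access key pair through PyAthena (the package linked in the comments) rather than through a PostgreSQL backend, since Athena is queried through the AWS API, not a PostgreSQL connection. This is not an Ibis API. The helper names `athena_connect_kwargs` and `connect_to_athena` are hypothetical, and the bucket, region, and key values are placeholders. It assumes `pyathena` is installed.

```python
from typing import Any, Dict


def athena_connect_kwargs(
    access_key_id: str,
    secret_access_key: str,
    s3_staging_dir: str,
    region_name: str,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the keyword arguments for pyathena.connect."""
    return {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        # Athena writes query results to this S3 location.
        "s3_staging_dir": s3_staging_dir,
        "region_name": region_name,
    }


def connect_to_athena(**kwargs: Any):
    """Open a PyAthena connection; imported lazily so the helper above has no dependencies."""
    from pyathena import connect  # third-party: pip install pyathena

    return connect(**kwargs)


if __name__ == "__main__":
    # Placeholder values; replace with real credentials and a bucket you own.
    kwargs = athena_connect_kwargs(
        access_key_id="YOUR_ACCESS_KEY_ID",
        secret_access_key="YOUR_SECRET_ACCESS_KEY",
        s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
        region_name="us-west-2",
    )
    cursor = connect_to_athena(**kwargs).cursor()
    cursor.execute("SELECT 1")
    print(cursor.fetchall())
```

Passing keys explicitly works, but leaving them out and letting the environment or the AWS CLI credentials file supply them is usually safer than hard-coding secrets.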