Athena Query Utility

Overview

This Python script provides a utility for executing queries on Amazon Athena and retrieving the results. It caches query results in local files, so a repeated query can be answered from disk instead of being re-run on Athena. It also supports multi-threaded query execution and integration with Jupyter notebooks for interactive use.

Features

Execute Athena Queries: Run SQL queries on Amazon Athena and retrieve results.
Caching: Save query results to local files to avoid redundant queries.
Multi-threading: Execute queries in separate threads for non-blocking operations.
Jupyter Notebook Integration: Display results directly within Jupyter notebooks.

Requirements

Python 3.x
boto3
pandas
ipywidgets (for Jupyter notebook integration)

Installation

Install the required Python packages using pip:

pip install boto3 pandas ipywidgets

Usage

Setting Up AWS Clients

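Before running queries, create Athena and S3 clients from your AWS profile. The helper below pins both clients to the us-west-2 region; change region_name if your Athena resources live elsewhere: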

from boto3 import Session
from botocore.config import Config

def get_athena_s3_clients(profile_name='your-aws-profile'):
    session = Session(profile_name=profile_name)
    athena = session.client('athena', config=Config(region_name='us-west-2', signature_version='v4'))
    s3 = session.client('s3', config=Config(region_name='us-west-2', signature_version='v4'))
    return athena, s3

athena_client, s3_client = get_athena_s3_clients(profile_name='your-aws-profile')

Running a Query

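Use the athena_query function to execute a query. The example below binds the shared arguments once with functools.partial, so each call only needs the SQL string. Caching of results is optional and is governed by the use_cache parameter.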

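from functools import partial

# Assumes athena_client.py from this repository is on the import path.
from athena_client import athena_query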

query = partial(
    athena_query,
    database='your-database',
    athena=athena_client,
    s3=s3_client,
    workgroup='your-workgroup'
)

result = query("SELECT * FROM your_table LIMIT 10")
print(result)
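
For readers curious what happens between submitting the SQL and getting a result back, the following is a minimal sketch of the usual boto3 pattern: start the query, then poll its status until it reaches a terminal state. It is not the script's actual athena_query implementation. The function name run_athena_query_sketch and the poll_seconds parameter are hypothetical, and the sketch stops at the finished execution record rather than downloading results.

```python
import time


def run_athena_query_sketch(athena, sql, database, workgroup=None,
                            output_location=None, poll_seconds=1.0):
    """Start an Athena query and block until it finishes (illustrative only).

    `athena` is expected to behave like a boto3 Athena client.
    """
    params = {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
    }
    # Both settings are optional; a workgroup may already define an output location.
    if workgroup is not None:
        params["WorkGroup"] = workgroup
    if output_location is not None:
        params["ResultConfiguration"] = {"OutputLocation": output_location}

    execution_id = athena.start_query_execution(**params)["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        execution = athena.get_query_execution(
            QueryExecutionId=execution_id
        )["QueryExecution"]
        state = execution["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = execution["Status"].get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {execution_id} ended in state {state}: {reason}")
    return execution
```

With a real client this could be called as run_athena_query_sketch(athena_client, "SELECT 1", "your-database", workgroup="your-workgroup"). The returned record describes the finished execution, including where Athena wrote the result file in S3.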

Using Cache

To use caching, run your query as shown above. The script keys the cache on the SQL query string, so a query with an identical string can be answered from the local cache file instead of being sent to Athena again.
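
To illustrate what caching keyed on the SQL string can look like, here is a small sketch that hashes the query text to pick a cache file. It is an assumption-laden illustration, not the script's save_cache and get_from_cache code: the names cache_path_for, store_cached, load_cached and cached_query, the .athena_cache directory, and the JSON file format are all hypothetical.

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".athena_cache")  # hypothetical cache location


def cache_path_for(sql, cache_dir=CACHE_DIR):
    """Map a SQL string to a cache file named after its SHA-256 digest."""
    digest = hashlib.sha256(sql.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.json"


def store_cached(sql, rows, cache_dir=CACHE_DIR):
    """Write JSON-serializable rows to the cache file for this SQL string."""
    path = cache_path_for(sql, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows))


def load_cached(sql, cache_dir=CACHE_DIR):
    """Return cached rows for this SQL string, or None on a cache miss."""
    path = cache_path_for(sql, cache_dir)
    if not path.exists():
        return None
    return json.loads(path.read_text())


def cached_query(sql, run_query, cache_dir=CACHE_DIR):
    """Return cached rows if present; otherwise call run_query(sql) and cache its result."""
    rows = load_cached(sql, cache_dir)
    if rows is None:
        rows = run_query(sql)
        store_cached(sql, rows, cache_dir)
    return rows
```

One consequence of keying on the raw string is that two queries differing only in whitespace or letter case map to different cache files, so each is executed and cached separately.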

Running a Query in a Thread

To execute a query in a separate thread and display results in a Jupyter notebook, use the thread_query function:

thread = thread_query(query="SELECT count(*) FROM your_table WHERE month=202201")
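
The general idea of a non-blocking query can be sketched with the standard threading module: run the work in a background thread and hand the outcome to a callback. This is not the script's thread_query implementation; run_in_thread and on_done are hypothetical names, and the sketch works with any callable rather than with Athena specifically.

```python
import threading


def run_in_thread(func, *args, on_done=print, **kwargs):
    """Run func(*args, **kwargs) in a background thread.

    The return value, or the exception raised by func, is passed to on_done.
    The started Thread is returned so the caller can join() it if needed.
    """
    def worker():
        try:
            outcome = func(*args, **kwargs)
        except Exception as exc:  # report the failure instead of losing it in the thread
            outcome = exc
        on_done(outcome)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

In a notebook, on_done could be a function that renders the result into an output widget, which keeps the cell free for other work while the query runs. Calling thread.join() waits for the result when it is needed.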

Functions

save_cache(key, value, query_execution): Saves query results to a cache file.
get_from_cache(key): Retrieves query results from a cache file.
athena_query(query, database, athena, s3, workgroup=None, output_location=None, print_function=print, use_cache=None): Executes a query on Athena and returns the result.
thread_query(query): Executes a query in a separate thread for non-blocking operations in Jupyter notebooks.
get_athena_s3_clients(profile_name='your-aws-profile'): Sets up and returns Athena and S3 clients using the specified AWS profile.

Example

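# Example usage in a Jupyter notebook
from functools import partial

# Assumes athena_client.py from this repository is on the import path.
from athena_client import athena_query, get_athena_s3_clients, thread_query

# Initialize clients
athena_client, s3_client = get_athena_s3_clients(profile_name='your-aws-profile')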

# Partial function with common parameters
query = partial(
    athena_query,
    database='your-database',
    athena=athena_client,
    s3=s3_client,
    workgroup='your-workgroup'
)

# Run a query
result = query("SELECT * FROM your_table LIMIT 10")
print(result)

# Run a threaded query in a Jupyter notebook
thread = thread_query(query="SELECT count(*) FROM your_table WHERE month=202201")

Author

Michael Permana ([email protected])
