Update Glue DB name and dataset name as per EdData 4.0.0 release #4

Merged: 1 commit, May 24, 2021
8 changes: 4 additions & 4 deletions README.md
@@ -1,5 +1,5 @@
# eddata-sdk
-SDK to query EdCast's data lake
+SDK to query EdCast's data lake. This is a wrapper script which uses AWS Athena python SDK to query and download the data.

# Installation
This python based utility can be installed and run from a Unix or Windows environment. Below are the steps to install and run the utility.
@@ -13,15 +13,15 @@ This python based utility can be installed and run from a Unix or Windows enviro

# Command to run
````
-python3 edc_data_export.py --region us-east-1 --query "select * from user_card_performance_reporting_i where day=’2020-04-01’" --aws_access_key_id <<sample_access_key>> --aws_secret_access_key <<sample_secret_key>> --filename download_data.csv --s3bucket edcast-provided-bucket-name --org_id 100000 --env prod
+python3 edc_data_export.py --region us-east-1 --query "select * from user_card_performance_reporting_i_v where day=’2020-04-01’" --aws_access_key_id <<sample_access_key>> --aws_secret_access_key <<sample_secret_key>> --filename download_data.csv --s3bucket edcast-provided-bucket-name --org_id 100000 --env prod
````
Above command runs and stores the extracted data in CSV format in the download_data.csv.
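
The sample query above wraps its date literal in typographic quotes (`’`), which a SQL engine is unlikely to treat as string delimiters. A minimal sketch, assuming those quotes are a copy-paste artifact rather than intentional: `normalize_quotes` is a hypothetical helper, not part of `edc_data_export.py`, that maps curly quotes to their ASCII equivalents before a query is passed to `--query`.

```python
# Hypothetical helper; not part of edc_data_export.py.
# Maps typographic (curly) quotes to plain ASCII quotes.
_QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_quotes(query: str) -> str:
    """Return the query with curly quotes replaced by ASCII quotes."""
    return query.translate(_QUOTE_MAP)


if __name__ == "__main__":
    raw = ("select * from user_card_performance_reporting_i_v "
           "where day=\u20192020-04-01\u2019")
    print(normalize_quotes(raw))
```

Typing the query with plain `'` characters in the shell avoids the need for any such step.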

# Downloading each Eddata dataset
-This is the section where downloading of data in the past hour for every single eddata dataset is described below
+This is the section where downloading of data in the past hour for every single eddata dataset is described below.

# Sample Queries
-Some sample queries that can be used in the utility
+Some sample queries that can be used in the utility.

````
select * from user_card_performance_reporting_i_v where day=’2020-04-01’
22 changes: 11 additions & 11 deletions edc_data_export.py
@@ -176,13 +176,13 @@ def get_help():
for table in TABLES:
    print(" - {0}".format(table))
print("\n 3. Example queries are")
-print(" - \"select * from user_card_performance_reporting_i where day=’2020-04-01’\"")
-print(" - \"select * from user_card_performance_reporting_i where day between ‘2020-04-01’ and ‘2020-04-04’\"")
-print(" - \"select * from user_card_performance_reporting_i where day between ‘2020-04-01’ and ‘2020-04-04’ and user_email=’[email protected]’\"")
-print(" - \"select * from user_card_performance_reporting_i where day between ‘2020-04-01’ and ‘2020-04-04’ and user_first_name like ’admin%\"")
-print(" - \"select * from user_card_performance_reporting_i where day between ‘2020-04-01’ and ‘2020-04-04’ and card_tile like ‘admin%’\"")
-print(" - \"select * from user_assignments_performance_i where day between ‘2020-04-01’ and ‘2020-04-04’ and assignment_state=’completed’\"")
-print(" - \"select * from group_assignments_performance_i where day between ‘2020-04-01’ and ‘2020-04-04’ \"")
+print(" - \"select * from user_card_performance_reporting_i_v where day=’2020-04-01’\"")
+print(" - \"select * from user_card_performance_reporting_i_v where day between ‘2020-04-01’ and ‘2020-04-04’\"")
+print(" - \"select * from user_card_performance_reporting_i_v where day between ‘2020-04-01’ and ‘2020-04-04’ and user_email=’[email protected]’\"")
+print(" - \"select * from user_card_performance_reporting_i_v where day between ‘2020-04-01’ and ‘2020-04-04’ and user_first_name like ’admin%\"")
+print(" - \"select * from user_card_performance_reporting_i_v where day between ‘2020-04-01’ and ‘2020-04-04’ and card_tile like ‘admin%’\"")
+print(" - \"select * from user_assignments_performance_i_v where day between ‘2020-04-01’ and ‘2020-04-04’ and assignment_state=’completed’\"")
+print(" - \"select * from group_assignments_performance_i_v where day between ‘2020-04-01’ and ‘2020-04-04’ \"")
print("\n - Enter your aws_access_key_id: 'AWS_ACCESS_KEY_ID' the key ID that was provided by EdCast support team.")
print("\n - Enter your aws_secret_access_key: 'AWS_SECRET_ACCESS_KEY' the secret key that was provided by EdCast support team.")
print("\n - Enter the file location along with the filename to be saved: 'FILENAME' the path and the filename where you want to save the file. The only supported file type is csv.")
@@ -259,8 +259,8 @@ def download_file():
exit()


-TABLES = ['user_card_performance_reporting_i', 'group_performance_reporting_i',
-'channel_performance_reporting_i', 'group_assignments_performance_i', 'user_assignments_performance_i']
+TABLES = ['user_card_performance_reporting_i_v', 'group_performance_reporting_i_v',
+'channel_performance_reporting_i_v', 'group_assignments_performance_i_v', 'user_assignments_performance_i_v']

if ('--{}'.format('help') in sys.argv):
get_help()
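
The `TABLES` change in this hunk appends `_v` to each of the five dataset names, so queries saved against the old names would need the same rename. A minimal sketch of that rewrite, assuming the only difference is the `_v` suffix: `OLD_TABLES` and `migrate_query` are hypothetical names introduced here for illustration and are not part of `edc_data_export.py`.

```python
import re

# Pre-4.0.0 dataset names, taken from the removed TABLES list in this diff.
OLD_TABLES = [
    "user_card_performance_reporting_i",
    "group_performance_reporting_i",
    "channel_performance_reporting_i",
    "group_assignments_performance_i",
    "user_assignments_performance_i",
]

# Word boundaries keep names that already end in "_v" from matching again,
# because "_" counts as a word character.
_OLD_NAME = re.compile(r"\b(" + "|".join(map(re.escape, OLD_TABLES)) + r")\b")


def migrate_query(query: str) -> str:
    """Hypothetical helper: append "_v" to any old dataset name in the query."""
    return _OLD_NAME.sub(lambda m: m.group(1) + "_v", query)


if __name__ == "__main__":
    print(migrate_query(
        "select * from user_assignments_performance_i "
        "where assignment_state='completed'"
    ))
```

Running the helper twice on the same query leaves it unchanged after the first pass, which makes it safe to apply to a mixed set of old and new queries.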
@@ -324,9 +324,9 @@ def download_file():
})

if env == 'qa':
-    database = "edc_customer_database_{0}".format(str(org_id))
+    database = "v1_edc_qa_analytics_customer_database_{0}".format(str(org_id))
else:
-    database = "edc_"+env+"_analytics_customer_database_{}".format(str(org_id))
+    database = "v1_edc_"+env+"_analytics_customer_database_{}".format(str(org_id))

workgroup = "{}".format(org_id)
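
With the new names in this hunk, the `qa` branch yields the same string the `else` branch would produce for `env == 'qa'`, so both cases follow one pattern. A minimal sketch of that pattern as a single function, assuming no other environment needs a different scheme: `glue_database_name` is a hypothetical name, not a function in `edc_data_export.py`.

```python
def glue_database_name(env: str, org_id) -> str:
    """Hypothetical helper mirroring the post-change naming in this diff.

    Builds "v1_edc_<env>_analytics_customer_database_<org_id>".
    """
    return "v1_edc_{0}_analytics_customer_database_{1}".format(env, org_id)


if __name__ == "__main__":
    # Values taken from the README example: --org_id 100000 --env prod
    print(glue_database_name("prod", 100000))
    print(glue_database_name("qa", 100000))
```

Keeping the explicit `if env == 'qa'` branch, as the diff does, remains reasonable if the QA naming is expected to diverge again.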
