Merge pull request #4 from edcast/venky/noch/update-dataset-name-4.0.0
Update Glue DB name and dataset name as per EdData 4.0.0 release
venky-edcast authored May 24, 2021
2 parents 6a9bdfd + d7ef361 commit 567bfa7
Showing 2 changed files with 15 additions and 15 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -1,5 +1,5 @@
# eddata-sdk
-SDK to query EdCast's data lake
+SDK to query EdCast's data lake. This is a wrapper script which uses AWS Athena python SDK to query and download the data.

# Installation
This python based utility can be installed and run from a Unix or Windows environment. Below are the steps to install and run the utility.
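The README line added in this hunk describes edc_data_export.py as a wrapper around the AWS Athena Python SDK. A rough sketch of that query lifecycle, assuming boto3's Athena client (this is not the script's own code): start the query in the org's workgroup, poll `get_query_execution` until a terminal state, and return the S3 location of the result CSV. `run_athena_query` and its parameters are hypothetical names; the client is passed in so the polling logic can be exercised without AWS credentials.

```python
import time


def run_athena_query(client, query, database, workgroup, output_location,
                     poll_seconds=1.0, sleep=time.sleep):
    """Start an Athena query and block until it reaches a terminal state.

    `client` is assumed to be a boto3 Athena client, for example
    boto3.client("athena", region_name="us-east-1"). Returns the S3 URI of
    the result CSV that Athena writes under `output_location`.
    """
    started = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until Athena reports SUCCEEDED, FAILED or CANCELLED.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            return execution["ResultConfiguration"]["OutputLocation"]
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "")
            raise RuntimeError(f"Athena query {query_id} {state}: {reason}")
        sleep(poll_seconds)  # QUEUED or RUNNING: wait, then poll again
```

Injecting `sleep` keeps the default behavior (real `time.sleep`) while letting a test replace the wait with a no-op.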
@@ -13,15 +13,15 @@ This python based utility can be installed and run from a Unix or Windows enviro

# Command to run
````
-python3 edc_data_export.py --region us-east-1 --query "select * from user_card_performance_reporting_i where day=’2020-04-01’" --aws_access_key_id <<sample_access_key>> --aws_secret_access_key <<sample_secret_key>> --filename download_data.csv --s3bucket edcast-provided-bucket-name --org_id 100000 --env prod
+python3 edc_data_export.py --region us-east-1 --query "select * from user_card_performance_reporting_i_v where day=’2020-04-01’" --aws_access_key_id <<sample_access_key>> --aws_secret_access_key <<sample_secret_key>> --filename download_data.csv --s3bucket edcast-provided-bucket-name --org_id 100000 --env prod
````
Above command runs and stores the extracted data in CSV format in the download_data.csv.

# Downloading each Eddata dataset
-This is the section where downloading of data in the past hour for every single eddata dataset is described below
+This is the section where downloading of data in the past hour for every single eddata dataset is described below.

# Sample Queries
-Some sample queries that can be used in the utility
+Some sample queries that can be used in the utility.

````
select * from user_card_performance_reporting_i_v where day=’2020-04-01’
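The README command passes every setting as a `--name value` flag. A hypothetical `argparse` parser for those same flags might look like the following; the flag names come from the README command and the CSV-only rule from the script's help text, while `build_parser` and `csv_path` are illustrative names. The actual script inspects `sys.argv` directly, so this is a sketch, not its implementation.

```python
import argparse


def csv_path(value):
    # The script's help text says CSV is the only supported output type.
    if not value.lower().endswith(".csv"):
        raise argparse.ArgumentTypeError("filename must end in .csv")
    return value


def build_parser():
    # Hypothetical parser for the flags used in the README command;
    # argparse also generates -h/--help output automatically.
    parser = argparse.ArgumentParser(
        description="Query EdCast's data lake through Athena and save the result as CSV."
    )
    parser.add_argument("--region", required=True, help="AWS region, e.g. us-east-1")
    parser.add_argument("--query", required=True, help="Athena SQL to run")
    parser.add_argument("--aws_access_key_id", required=True)
    parser.add_argument("--aws_secret_access_key", required=True)
    parser.add_argument("--filename", required=True, type=csv_path,
                        help="local path of the CSV file to write")
    parser.add_argument("--s3bucket", required=True,
                        help="bucket name provided by EdCast")
    parser.add_argument("--org_id", required=True,
                        help="organization id, used in the Glue database name")
    parser.add_argument("--env", required=True, help="e.g. prod or qa")
    return parser
```

Usage: `args = build_parser().parse_args()` then read `args.query`, `args.org_id`, and so on. A filename such as `out.txt` is rejected before any AWS call is made.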
22 changes: 11 additions & 11 deletions edc_data_export.py
@@ -176,13 +176,13 @@ def get_help():
for table in TABLES:
    print(" - {0}".format(table))
print("\n 3. Example queries are")
-print(" - \"select * from user_card_performance_reporting_i where day=’2020-04-01’\"")
-print(" - \"select * from user_card_performance_reporting_i where day between ‘2020-04-01’ and ‘2020-04-04’\"")
-print(" - \"select * from user_card_performance_reporting_i where day between ‘2020-04-01’ and ‘2020-04-04’ and user_email=’[email protected]\"")
-print(" - \"select * from user_card_performance_reporting_i where day between ‘2020-04-01’ and ‘2020-04-04’ and user_first_name like ’admin%\"")
-print(" - \"select * from user_card_performance_reporting_i where day between ‘2020-04-01’ and ‘2020-04-04’ and card_tile like ‘admin%’\"")
-print(" - \"select * from user_assignments_performance_i where day between ‘2020-04-01’ and ‘2020-04-04’ and assignment_state=’completed’\"")
-print(" - \"select * from group_assignments_performance_i where day between ‘2020-04-01’ and ‘2020-04-04’ \"")
+print(" - \"select * from user_card_performance_reporting_i_v where day=’2020-04-01’\"")
+print(" - \"select * from user_card_performance_reporting_i_v where day between ‘2020-04-01’ and ‘2020-04-04’\"")
+print(" - \"select * from user_card_performance_reporting_i_v where day between ‘2020-04-01’ and ‘2020-04-04’ and user_email=’[email protected]\"")
+print(" - \"select * from user_card_performance_reporting_i_v where day between ‘2020-04-01’ and ‘2020-04-04’ and user_first_name like ’admin%\"")
+print(" - \"select * from user_card_performance_reporting_i_v where day between ‘2020-04-01’ and ‘2020-04-04’ and card_tile like ‘admin%’\"")
+print(" - \"select * from user_assignments_performance_i_v where day between ‘2020-04-01’ and ‘2020-04-04’ and assignment_state=’completed’\"")
+print(" - \"select * from group_assignments_performance_i_v where day between ‘2020-04-01’ and ‘2020-04-04’ \"")
print("\n - Enter your aws_access_key_id: 'AWS_ACCESS_KEY_ID' the key ID that was provided by EdCast support team.")
print("\n - Enter your aws_secret_access_key: 'AWS_SECRET_ACCESS_KEY' the secret key that was provided by EdCast support team.")
print("\n - Enter the file location along with the filename to be saved: 'FILENAME' the path and the filename where you want to save the file. The only supported file type is csv.")
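Athena's SQL dialect delimits string literals with straight ASCII single quotes ('), so the typographic quotes (‘ ’) in the example queries above have to be retyped as `'` before such a query will parse. A small hypothetical helper, `day_range_query`, that builds the date-filtered form of those examples with the 4.0.0 dataset names, rejecting unknown tables and malformed dates:

```python
from datetime import date

# Dataset names after the EdData 4.0.0 rename (the new TABLES list in this commit).
TABLES = [
    "user_card_performance_reporting_i_v",
    "group_performance_reporting_i_v",
    "channel_performance_reporting_i_v",
    "group_assignments_performance_i_v",
    "user_assignments_performance_i_v",
]


def day_range_query(table, start, end=None):
    """Build a `select * ... where day ...` query using straight ASCII quotes.

    Hypothetical helper: `start` and `end` are ISO dates (YYYY-MM-DD). They are
    parsed first, so only well-formed dates reach the SQL string.
    """
    if table not in TABLES:
        raise ValueError(f"unknown dataset: {table}")
    start_day = date.fromisoformat(start).isoformat()
    if end is None:
        return f"select * from {table} where day='{start_day}'"
    end_day = date.fromisoformat(end).isoformat()
    return f"select * from {table} where day between '{start_day}' and '{end_day}'"
```

Parsing the dates before interpolating them also keeps arbitrary text out of the generated query.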
@@ -259,8 +259,8 @@ def download_file():
exit()


-TABLES = ['user_card_performance_reporting_i', 'group_performance_reporting_i',
-'channel_performance_reporting_i', 'group_assignments_performance_i', 'user_assignments_performance_i']
+TABLES = ['user_card_performance_reporting_i_v', 'group_performance_reporting_i_v',
+'channel_performance_reporting_i_v', 'group_assignments_performance_i_v', 'user_assignments_performance_i_v']

if ('--{}'.format('help') in sys.argv):
    get_help()
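Every dataset rename in this hunk appends `_v` to the old name, so saved pre-4.0.0 queries can be updated mechanically. A minimal sketch; `migrate_query` is a hypothetical helper, not part of the SDK:

```python
import re

# Pre-4.0.0 dataset names, taken from the TABLES list this commit replaces.
OLD_TABLES = [
    "user_card_performance_reporting_i",
    "group_performance_reporting_i",
    "channel_performance_reporting_i",
    "group_assignments_performance_i",
    "user_assignments_performance_i",
]

# Word boundaries match whole names only, so a name that already ends
# in _v (e.g. ..._reporting_i_v) is not renamed a second time.
_OLD_NAME = re.compile(r"\b(" + "|".join(map(re.escape, OLD_TABLES)) + r")\b")


def migrate_query(query):
    """Rewrite pre-4.0.0 table names to their 4.0.0 `_v` equivalents."""
    return _OLD_NAME.sub(lambda m: m.group(1) + "_v", query)
```

Because already-migrated names are left alone, running the helper twice on the same query is harmless.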
@@ -324,9 +324,9 @@ def download_file():
})

if env == 'qa':
-    database = "edc_customer_database_{0}".format(str(org_id))
+    database = "v1_edc_qa_analytics_customer_database_{0}".format(str(org_id))
else:
-    database = "edc_"+env+"_analytics_customer_database_{}".format(str(org_id))
+    database = "v1_edc_"+env+"_analytics_customer_database_{}".format(str(org_id))

workgroup = "{}".format(org_id)
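After this change both branches of the `env` check produce the same pattern: with `env == 'qa'`, the generic expression yields `v1_edc_qa_analytics_customer_database_<org_id>`, which is exactly what the `qa` branch hard-codes. (Before the change the `qa` name had a different shape, which is why the branch existed.) A sketch of the naming as single functions; `glue_database` and `athena_workgroup` are hypothetical names:

```python
def glue_database(env, org_id):
    """Glue database name for an org, following the EdData 4.0.0 naming in this commit.

    One format string covers both the 'qa' case and every other env.
    """
    return "v1_edc_{0}_analytics_customer_database_{1}".format(env, org_id)


def athena_workgroup(org_id):
    # The script uses the org id itself as the Athena workgroup name.
    return str(org_id)
```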

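The hunk headers show the script has a `download_file()` function, whose body is not part of this diff. Once Athena reports success, the result is a CSV object at an `s3://` URI, so a download step could look like the sketch below. It assumes boto3's S3 client, whose `download_file(Bucket, Key, Filename)` method copies an object to a local path; `split_s3_uri` and `download_result` are hypothetical names.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key/path.csv' into ('bucket', 'key/path.csv')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_result(s3_client, output_location, filename):
    """Copy Athena's result CSV to a local file and return the local path.

    `s3_client` is assumed to be boto3.client("s3").
    """
    bucket, key = split_s3_uri(output_location)
    s3_client.download_file(bucket, key, filename)
    return filename
```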
