HHA507 Data Science / Assignment 1
1. Find or create an Excel (.xls) file that contains at least two tabs. Load the first tab as a data frame named 'tab1', and load the second tab as a second data frame named 'tab2'.
hha-data-ingestion-alice/ingestion.py
Lines 10 to 21 in 1e29573
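As a rough sketch of this step (not necessarily what ingestion.py does), pandas can read both tabs in one call when given a list of sheet positions. The helper name `load_two_tabs` and the file name `example.xls` are assumptions for illustration, and reading a legacy .xls file relies on xlrd being installed.

```python
import pandas as pd


def load_two_tabs(path):
    """Read the first two sheets of an Excel workbook as data frames."""
    # sheet_name=[0, 1] requests the first and second sheets by position;
    # pandas then returns a dict keyed by those positions.
    sheets = pd.read_excel(path, sheet_name=[0, 1])
    return sheets[0], sheets[1]
```

With a real workbook, `tab1, tab2 = load_two_tabs('example.xls')` would give the two data frames the task asks for.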
2. Find one open-source JSON API from CMS and retrieve it using the requests package; call the dataset 'apiDataset'.
    data = requests.get('https://data.cms.gov/data-api/v1/dataset/ad73e4d3-925b-4055-ad9b-7f0015e906c8/data')
    data = data.json()
hha-data-ingestion-alice/ingestion.py
Lines 24 to 31 in 1e29573
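A slightly more defensive variant of the request shown above could look like the sketch below. The helper name `fetch_json`, its `get` parameter (there so the HTTP call can be swapped out) and the 30-second timeout are assumptions for illustration, not part of the original script.

```python
import requests

# URL taken from the request shown above.
CMS_URL = "https://data.cms.gov/data-api/v1/dataset/ad73e4d3-925b-4055-ad9b-7f0015e906c8/data"


def fetch_json(url, get=requests.get, timeout=30):
    """Fetch `url` and return the decoded JSON body."""
    response = get(url, timeout=timeout)
    # Raise an HTTPError on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.json()
```

Calling `apiDataset = fetch_json(CMS_URL)` would then store the decoded JSON under the name the task requests.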
- Limit your query to the first 100 rows from each dataset, returned as either a data frame or a dictionary; call the first dataset 'bigquery1' and the second dataset 'bigquery2'.
hha-data-ingestion-alice/ingestion.py
Lines 34 to 46 in 1e29573
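One possible shape for this step, assuming the google-cloud-bigquery client and a service-account .json key file, is sketched below. The helper names `build_limit_query` and `run_query`, the `key_path` argument, and any table ID passed in are hypothetical; the actual tables queried in ingestion.py are not shown here.

```python
def build_limit_query(table_id, limit=100):
    """Return SQL that selects the first `limit` rows of `table_id`."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Backticks let BigQuery accept a fully qualified project.dataset.table ID.
    return f"SELECT * FROM `{table_id}` LIMIT {limit}"


def run_query(sql, key_path):
    """Run `sql` with a service-account key and return a pandas data frame."""
    # Imported here so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client.from_service_account_json(key_path)
    # to_dataframe() needs pandas and db_dtypes to be installed.
    return client.query(sql).to_dataframe()
```

With valid credentials, something like `bigquery1 = run_query(build_limit_query('<project>.<dataset>.<table>'), '<key file>.json')` would return the first 100 rows; the placeholders in angle brackets must be filled in by the reader.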
    import requests
    import json
    import pandas as pd  # pandas for general file types
    from google.cloud import bigquery  # bigquery client for BigQuery datasets
    import xlrd  # xlrd for Excel .xls files and tab names
    import openpyxl  # openpyxl to read Excel 2010 .xlsx files
    import db_dtypes  # db_dtypes to resolve section 3 compatibility issues
- Kaggle
- Healthdata.gov
- CMS
- Instructions for connecting to bigquery via a client (e.g., python)
- Helper YouTube Video that walks through how to create the special .json file to connect