Create Python script to clean 2015 through 2019 data #1844
Labels
Complexity: Medium
Feature: Data Quality
p-feature: data
Role: Data Science
Data management, loading, or analysis
Size: 1pt
Can be done in 6 hours
Overview
We need to clean the 2015 through 2019 data that the city's 311 Data Service Request APIs return so that it can be accessed through our Search and Filters modal.
Action Items
We want to know how to clean the data from 2015 to 2019 in order to make it consistent with our 2021-2024 data. To achieve this, complete the following:
- check_column_count.py (from R1)
- inspect_csv.py (from R1)
- 311-data/scripts/clean-2015-through-2020
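As a rough illustration of the kind of check the first action item points at, here is a minimal sketch that compares each row's field count against the header row. It is not the actual check_column_count.py (read the documentation at the top of that file for its real behavior). The function name `column_count_report` and the sample column names and rows are hypothetical, made up for this example only.

```python
import csv
import io
from collections import Counter


def column_count_report(lines):
    """Summarize field counts for CSV input (hypothetical helper).

    Returns a tuple (expected, counts, bad_rows):
      expected: number of fields in the header row (0 if the input is empty)
      counts:   Counter mapping field count -> number of data rows with that count
      bad_rows: list of (line_number, field_count) for rows that differ from the header
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return 0, Counter(), []

    expected = len(header)
    counts = Counter()
    bad_rows = []
    for row in reader:
        counts[len(row)] += 1
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far,
            # so it points at the line where this row ended.
            bad_rows.append((reader.line_num, len(row)))
    return expected, counts, bad_rows


if __name__ == "__main__":
    # Made-up sample data; the real files would come from the sources in R2.
    sample = io.StringIO(
        "SRNumber,CreatedDate,RequestType\n"
        "1-1,01/01/2015,Graffiti Removal\n"
        "1-2,01/02/2015,Bulky Items,extra\n"
        '1-3,01/03/2015,"Dead Animal, Removal"\n'
    )
    expected, counts, bad_rows = column_count_report(sample)
    print("expected fields:", expected)
    print("field count distribution:", dict(counts))
    print("rows to inspect (line, fields):", bad_rows)
```

Using `csv.reader` rather than splitting on commas means quoted fields that contain commas are not reported as extra columns, which matters when deciding whether a row is actually malformed.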
Resources/Instructions
R1: Relevant files and functions used in our build process, as well as tools used to determine where and how our datasets needed to be cleaned:
- 311-data/scripts/migrateOldHfDataset.py
  - Like 311-data/scripts/updateHfDataset.py, except that it allows you to pass in a year as an argument.
  - dlData(year): you'll notice we are drawing data from a personal repo; see resources to know where that data comes from
  - hfClean(year): this method has two parts:
  - hfUpload(year): this won't be needed for the purposes of this ticket
  - process_data(...): this is our main control flow for the script, which is determined by command line arguments
- 311-data/scripts/csv_debug_tools
  - check_column_count.py: read the documentation at the top of the file
  - inspect_csv.py: read the documentation at the top of the file
R2: Data Sources