Merge pull request #1382 from hackforla/1381-get_request_data_csvpy-i…
…s-limited-to-collecting-20k-requests

fix bug when checking break condition for get_request_data_csv
nichhk authored Sep 29, 2022
2 parents 41124d6 + 315c3b2 commit ff44de1
Showing 2 changed files with 24 additions and 18 deletions.
2 changes: 1 addition & 1 deletion server/utils/README.md
@@ -2,6 +2,6 @@ The utils directory will serve as a location for mutltiple utilty tools. get_re

The 311 request data from [lacity.org](https://data.lacity.org/browse?q=MyLA311%20Service%20Request%20Data%20&sortBy=relevance) has 34 columns. The get_request_data_csv.py script can be run from the command line with the arguments start_date and end_date to retrieve the 311 request data from the [311 data server](https://dev-api.311-data.org/docs). The 311 server processes the data from lacity.org. The data cleaning procedure is described [here](https://github.com/hackforla/311-data/blob/dev/docs/data_loading.md). The result is written to a CSV file and saved in the user's current working directory. A preview of the data_final dataframe is printed in the command line.

-Example: `python get_311_request_data_csv.py "2021-01-01" "2021-01-03"` will return 261 rows and 15 columns.
+Example: `python get_request_data_csv.py "2021-01-01" "2021-01-03"` will return 261 rows and 15 columns.

![image](https://user-images.githubusercontent.com/10836669/188473763-52bc9474-0878-432c-b4e8-6e4ff21dcda2.png)
40 changes: 23 additions & 17 deletions server/utils/get_request_data_csv.py
@@ -4,47 +4,53 @@

 REQUESTS_BATCH_SIZE = 10000


 def get_311_request_data(start_date, end_date):
     """Fetches 311 requests from the 311 data server.
     Retreives 311 requests from the 311 data server for a given start_date and end_date.
     Args:
         start_date: The date from which the 311 request data have to be collected. Datatype: Datetime.
         end_date: The date upto which the 311 request data have to be fetched. Datatype: Datetime.
     Return:
         Dataframe data_final is returned with 15 columns. The dataframe is saved as a CSV file ('data_final.csv') in the current directory.
     """

     skip = 0
     all_requests = []
     while True:
-        url=f'https://dev-api.311-data.org/requests?start_date={start_date}&end_date={end_date}&skip={skip}&limit={REQUESTS_BATCH_SIZE}'
+        url = f'https://dev-api.311-data.org/requests?start_date={start_date}&end_date={end_date}&skip={skip}&limit={REQUESTS_BATCH_SIZE}'
         response = requests.get(url)
         data = response.json()
         all_requests.extend(data)
         skip += REQUESTS_BATCH_SIZE
-        if len(data) < skip:
-            break
-    data_final = pd.DataFrame(all_requests)
-    data_final.sort_values(by='createdDate', inplace = True, ignore_index = True)
+        if len(data) < REQUESTS_BATCH_SIZE:
+            break
+    data_final = pd.DataFrame(all_requests)
+    data_final.sort_values(by='createdDate', inplace=True, ignore_index=True)
     return data_final


 def main():
     """Prints out the preview of the dataframe data_final in the command line.
     The result is written to a csv file and saved in the current working directory of the user.
     """

-    parser = argparse.ArgumentParser(description='Gets 311 request data from the server')
-    parser.add_argument('start_date', type=str, help='The start date that has to be entered')
-    parser.add_argument('end_date', type=str, help='The end data that has to be entered')
-
+    parser = argparse.ArgumentParser(
+        description='Gets 311 request data from the server')
+    parser.add_argument('start_date', type=str,
+                        help='The start date that has to be entered')
+    parser.add_argument('end_date', type=str,
+                        help='The end data that has to be entered')
     args = parser.parse_args()
     start_date = args.start_date
     end_date = args.end_date
-    data_final = get_311_request_data(start_date, end_date)
+    data_final = get_311_request_data(start_date, end_date)
     data_final.to_csv('data_final.csv')
-    print(data_final)
-
-if __name__ == "__main__":
+    print(data_final)
+
+
+if __name__ == "__main__":
     main()
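
The effect of the break-condition change can be checked without calling the server. The following is a minimal sketch and not part of the commit: `fetch_page` is a hypothetical stand-in for the `/requests` endpoint that serves fake records, and `collect` mirrors the loop in `get_311_request_data` with the stop condition passed in as a function. With 35,000 records available, the old condition (`len(data) < skip`) stops after the second batch because `skip` has already grown to 20,000, while the new condition (`len(data) < REQUESTS_BATCH_SIZE`) stops only once a short batch arrives.

```python
REQUESTS_BATCH_SIZE = 10000


def fetch_page(total, skip, limit):
    """Hypothetical stand-in for the API: up to `limit` fake records from offset `skip`."""
    return list(range(skip, min(skip + limit, total)))


def collect(total, should_stop):
    """Mirror of the pagination loop, with the break condition injected."""
    skip = 0
    all_requests = []
    while True:
        data = fetch_page(total, skip, REQUESTS_BATCH_SIZE)
        all_requests.extend(data)
        skip += REQUESTS_BATCH_SIZE
        if should_stop(len(data), skip):
            break
    return all_requests


def old_condition(batch_len, skip):
    # Pre-fix check: compares the batch length with the running offset.
    return batch_len < skip


def new_condition(batch_len, skip):
    # Post-fix check: a batch shorter than the limit means no more data.
    return batch_len < REQUESTS_BATCH_SIZE


old_count = len(collect(35000, old_condition))
new_count = len(collect(35000, new_condition))
print(old_count, new_count)  # prints "20000 35000"
```

When the total is an exact multiple of the batch size, the new condition costs one extra request that returns an empty batch, and the loop then ends.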
