This project contains code for predicting and evaluating crime on LA data (/crimePrediction). It also includes Twitter data scraping code (/twitter-scrape-master, reference: https://www.dataquest.io/blog/streaming-data-python/) and other related code.
To relate streetlights to crime, a CSV file named LA_STLIGHT.csv is required. This file will be uploaded in the future.
For the crime prediction code, change directory to crimePrediction/. Download the LAPD crime data CSV file from 'https://data.lacity.org/A-Safe-City/Crime-Data-From-2010-to-Present/y8tr-7khq' and save it in the crimePrediction folder as 'lapd_Crime_Data_From_2010_to_Present.csv'.
Requirements :
- Run 'pip install -r requirements.txt'
- You will need to install the pyproj library separately, according to your OS. For Ubuntu, download it from https://pypi.python.org/pypi/pyproj, unzip it, and follow the instructions in its readme to install.
Data file : lapd_Crime_Data_From_2010_to_Present.csv
Python files : Please refer to pythonFiles.txt
Before making predictions, two Python scripts divide the data into grid cells and build a time series for each cell: 1. makeGrid.py and 2. divideByTime.py
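The gridding and time-series step can be pictured with the minimal sketch below. It is not the code in makeGrid.py or divideByTime.py. The function names, the cell size in degrees, the monthly period and the sample incidents are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def cell_index(lat, lon, lat_min, lon_min, cell_size):
    """Map a coordinate to a (row, col) grid cell; cell_size is in degrees (assumed)."""
    row = int((lat - lat_min) // cell_size)
    col = int((lon - lon_min) // cell_size)
    return row, col


def monthly_series(records, lat_min, lon_min, cell_size):
    """Count incidents per grid cell for each (year, month) period."""
    counts = defaultdict(lambda: defaultdict(int))
    for lat, lon, day in records:
        cell = cell_index(lat, lon, lat_min, lon_min, cell_size)
        counts[cell][(day.year, day.month)] += 1
    # Convert to plain dicts with periods in chronological order.
    return {cell: dict(sorted(series.items())) for cell, series in counts.items()}


# Hypothetical incidents: (latitude, longitude, date). Not taken from the LAPD file.
records = [
    (34.05, -118.25, date(2015, 1, 3)),
    (34.06, -118.24, date(2015, 1, 20)),
    (34.05, -118.25, date(2015, 2, 11)),
    (34.21, -118.45, date(2015, 1, 7)),
]

series = monthly_series(records, lat_min=33.7, lon_min=-118.7, cell_size=0.1)
for cell, per_month in sorted(series.items()):
    print(cell, per_month)
```

Each cell ends up with its own sequence of counts, which is the kind of per-cell time series the prediction scripts are described as consuming.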
Finally, crimePrediction.py is used to make predictions. To use K-fold cross-validation instead, use crimePredictionKFold.py together with divideByTimeClustering.py and makeGrid.py.
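As a rough illustration of K-fold cross-validation with linear regression, the sketch below uses only the standard library. It does not reproduce crimePredictionKFold.py. The single feature (the previous period's count), the contiguous fold layout, the helper names and the sample counts are assumptions.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, near-equal folds."""
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def cross_validated_mse(xs, ys, n_folds):
    """Average the held-out mean squared error over all folds."""
    fold_errors = []
    for train_idx, test_idx in kfold_indices(len(xs), n_folds):
        slope, intercept = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        squared = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / len(fold_errors)


# Hypothetical counts for one grid cell, one value per time period.
counts = [4, 6, 8, 10, 12, 14, 16]
xs, ys = counts[:-1], counts[1:]  # feature: previous count, target: current count
mse = cross_validated_mse(xs, ys, n_folds=3)
print("cross-validated MSE:", mse)
```

Every sample is held out exactly once, so the averaged error reflects performance on data the model did not see during fitting.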
Note : If you wish to change the data used, make modifications in makeGrid.py
The grid clustering approach can be tested using gridClustering.py
To compute the score under the new resource allocation metric, use ra_improved.py
Suggested flow:
- Run makeGrid.py on appropriate data
- Run divideByTimeClustering.py on data generated by makeGrid.py
- Generate predictions using linear regression (crimePredictionKFold.py) and clustered linear regression (gridClustering.py)
- Run ra_improved.py to get plots.
A script can be used to automate the whole process as desired. Refer to script_clustering.sh and make relevant changes.
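For those who prefer Python to a shell script, the suggested flow could be driven by something like the sketch below. It is not a port of script_clustering.sh. The script order follows the suggested flow, but running each script with no arguments from inside crimePrediction/, and the `--run` flag, are assumptions.

```python
import subprocess
import sys

# Order taken from the suggested flow. Invoking each script without
# arguments is an assumption; adjust to match your setup.
PIPELINE = [
    "makeGrid.py",
    "divideByTimeClustering.py",
    "crimePredictionKFold.py",
    "gridClustering.py",
    "ra_improved.py",
]


def build_commands(scripts, python=sys.executable):
    """Return one command (argument list) per script, in order."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    commands = build_commands(scripts)
    for command in commands:
        print("Running:", " ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical flag: pass --run to execute; the default only lists the steps.
    run_pipeline(PIPELINE, dry_run="--run" not in sys.argv)
```

Stopping at the first failure avoids running later steps on missing or stale intermediate files.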