Norman PD Incidents Extractor is a Python-based utility that extracts incident data from an incident PDF file, given its URL (the file is hosted on the Norman Police Department's website).
The project's Python code follows the PEP 8 style guide.
This utility uses a number of open source projects:
- PyPDF2 - Utility to read and write PDFs with Python
- Pytest - Testing framework that supports complex functional testing
- Pytest-cov - Coverage plugin for pytest
- Pandas - Flexible and powerful data analysis / manipulation library for Python
- JupyterLab - Browser-based computational environment for Python
- autopep8 - Tool that automatically formats Python code to conform to the PEP 8 style guide
- Clone this repository and move into the folder.
$ git clone https://github.com/Biswas-N/Norman-PD-incidents-extractor.git
$ cd Norman-PD-incidents-extractor
- Install dependencies using Pipenv.
$ pipenv install
- Run the utility.
$ make
Note: The project includes a Makefile with commonly used commands. Running make executes the command pipenv run python main.py --incidents <Sample URL>.
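As an illustration of how a script could accept the --incidents option used in the command above, here is a minimal sketch built on the standard library's argparse. It is not taken from the project's main.py; the function name build_parser, the help text, and the example URL are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single required --incidents option (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Extract incidents from a Norman PD incident PDF."
    )
    # The flag name --incidents comes from the README; everything else is assumed.
    parser.add_argument(
        "--incidents",
        type=str,
        required=True,
        help="URL of the incident PDF file",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
# The URL below is a placeholder, not a real incident report.
args = build_parser().parse_args(["--incidents", "https://example.com/incidents.pdf"])
print(args.incidents)
```

Marking the option as required makes the script exit with a usage message when no URL is supplied, rather than failing later during download.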
The documentation about code structure and extraction algorithm can be found here.
This utility is tested using pytest.
Documentation about the tests can be found here. Use the commands below to run the tests on your local system.
- Install dev-dependencies.
$ pipenv install --dev
- Run tests using the Makefile.
$ make test
- Run test coverage.
$ make cov
- The utility assumes that only the Location and Nature columns may be empty, either individually or together. If any other column contains an empty value, the utility may fail to extract incidents.
- The utility assumes that each incident has exactly five columns (Datetime, Incident Number, Location, Nature and Incident ORI). If the column layout changes, the utility may fail to extract incidents.
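The two assumptions above can be illustrated with a minimal sketch that pads a row of extracted fields to the five expected columns. This is not the project's extraction algorithm; the helper name normalize_row, the sample values, and the handling of the four-field case are assumptions made for this example.

```python
from typing import List, Sequence

# Column names as listed in the README.
COLUMNS = ("Datetime", "Incident Number", "Location", "Nature", "Incident ORI")


def normalize_row(fields: Sequence[str]) -> List[str]:
    """Pad a row to five columns (hypothetical helper, not the project's code)."""
    cleaned = [field.strip() for field in fields]
    if len(cleaned) == 5:
        # All columns present.
        return cleaned
    if len(cleaned) == 4:
        # One of Location/Nature is missing. Which one is ambiguous from the
        # count alone; this sketch arbitrarily keeps the value as Location.
        return [cleaned[0], cleaned[1], cleaned[2], "", cleaned[3]]
    if len(cleaned) == 3:
        # Both Location and Nature are missing.
        return [cleaned[0], cleaned[1], "", "", cleaned[2]]
    # Any other count violates the assumptions, so fail loudly.
    raise ValueError(f"Expected 3 to 5 fields, got {len(cleaned)}")


# Sample values are made up for illustration.
row = normalize_row(["2/1/2021 0:04", "2021-00001234", "OK0140200"])
print(dict(zip(COLUMNS, row)))
```

Raising an error for any other field count mirrors the stated limitation: a row with an empty value outside Location and Nature, or with a different column layout, cannot be mapped reliably.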