ENV.md

File metadata and controls

126 lines (100 loc) · 4.61 KB

Notes on the env

I use JupyterLab with a variety of Python packages on Linux. Here's a simple recipe to set it up from scratch. Your mileage may vary depending on your OS.

  1. Make sure Anaconda is installed, then git clone the repository from GitHub and cd into the nyc-stew folder.

  2. In the nyc-stew folder you should see the environment.yml file. Use it to build the env with conda: conda env create -f environment.yml

  3. Once the env has been created, activate it: conda activate stew

  4. At this point you will have a working JupyterLab with the necessary packages. Launch it with jupyter lab.

  5. You're ready to explore, understand, develop, ...
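If something fails to import after the steps above, a quick sanity check from a notebook cell or a Python prompt can help. This is only a minimal sketch: the package names in EXPECTED are assumptions and are not taken from environment.yml, so adjust them to match the real file. CONDA_DEFAULT_ENV is the variable conda sets when an env is activated.

```python
import importlib.util
import os


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this env."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def active_conda_env():
    """Name of the active conda env, or None if conda has not set one."""
    return os.environ.get("CONDA_DEFAULT_ENV")


# Hypothetical package list; check environment.yml for the real one.
EXPECTED = ["pandas", "geopandas", "pyarrow", "jupyterlab"]

print("active env:", active_conda_env())
print("missing packages:", missing_packages(EXPECTED))
```

If the active env is not stew, or the missing list is not empty, rerun steps 2 and 3.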

Startup imports

I am lazy with imports. I set up a default start script in ~/.ipython/profile_default/startup so I don't have to think about specifics in a notebook.

I have included start.py in the notebooks folder. If you don't want to set up a default, just add a code cell (to each notebook) with %run start.py.
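If you do want the default, one way to install it is to copy start.py into the IPython startup directory. The helper below is a sketch, not part of the repository: install_startup_script is a hypothetical name, and it assumes the default IPython profile location mentioned above.

```python
import shutil
from pathlib import Path


def install_startup_script(script, startup_dir=None):
    """Copy `script` into an IPython startup directory and return the new path."""
    script = Path(script)
    if startup_dir is None:
        # Default profile location; differs if you use a custom IPython profile.
        startup_dir = Path.home() / ".ipython" / "profile_default" / "startup"
    startup_dir = Path(startup_dir)
    startup_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(script, startup_dir / script.name))


# Usage, run from the repository root:
# install_startup_script("notebooks/start.py")
```

IPython runs every .py file in that directory each time a kernel starts, so the imports are available in every notebook without a %run cell.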

Data

My-o-my. I've covered a lot of ground with the data. In general the data flow is:

1. Find data and save to the raw directory.

My raw directory looks like this:

data/raw
├── 311
├── admin-boundaries
├── DEM
├── DEP
├── NYC-2017-STEW-MAP-Public-Version2
├── NYCFutureHighTideWithSLR.gdb
├── NYC_STEWMAP_2017_Networks_Version2_Public.xlsx
├── NYCWRP_Shapefiles_2016
├── slr_metadata.pdf
└── weather

8 directories, 2 files

I have some organization. As of this writing (05/31/2022), it holds 31G. Way too much for GitHub.

2. Process the raw data and place it in data/processed.

My processed directory looks like:

data/processed/
├── 311
│   ├── dep-clean-geo.parq
│   ├── dep-full.parq
│   ├── dob-clean-geo.parq
│   ├── dob-full.parq
│   ├── dot-clean-geo.parq
│   ├── dot-full.parq
│   ├── dpr-clean-geo.parq
│   ├── dpr-full.parq
│   ├── dsny-clean-geo.parq
│   ├── dsny-full.parq
│   ├── hpd-clean-geo.parq
│   └── hpd-full.parq
├── admin-boundaries
│   ├── boroughs.parq
│   ├── brooklyn.parq
│   ├── CDTA.parq
│   ├── census-tracts-2020.parq
│   └── NTA.parq
├── brooklyn
│   ├── brooklyn-2021-311.parq
│   ├── brooklyn-311-elevation.parq
│   ├── brooklyn-boundary.parq
│   ├── brooklyn-catch-basins.parq
│   ├── brooklyn-census-tracts.parq
│   ├── brooklyn-community-districts-ta.parq
│   ├── brooklyn-dem.parq
│   ├── brooklyn-extreme-flood.parq
│   ├── brooklyn-moderate-flood.parq
│   ├── brooklyn-ms4-drainage.parq
│   ├── brooklyn-ms4-outfalls.parq
│   ├── brooklyn-neighborhoods-ta.parq
│   ├── brooklyn-rainfall-2021.parq
│   ├── brooklyn-slr-2050-08.parq
│   ├── brooklyn-slr-2050-11.parq
│   ├── brooklyn-slr-2050-16.parq
│   ├── brooklyn-slr-2050-21.parq
│   ├── brooklyn-slr-2050-30.parq
│   ├── brooklyn-turfs.parq
│   ├── primst-turfs-counts.parq
│   └── primst-with-alters.parq
├── db
│   ├── popids2.p
│   └── popids.p
├── DCP
│   ├── slr-2050-08.parq
│   ├── slr-2050-11.parq
│   ├── slr-2050-16.parq
│   ├── slr-2050-21.parq
│   ├── slr-2050-30.parq
│   └── slr_metadata.pdf
├── DEP
│   ├── 2021-311.parq
│   ├── brooklyn-extreme.parq
│   ├── catch-basins.parq
│   ├── Data_Dictionary_ExtremeFlood.xlsx
│   ├── extreme-flood-map.parq
│   ├── moderate-flood-map.parq
│   ├── ms4-drainage.parq
│   └── ms4-outfalls.parq
├── office-locations.parq
├── SN
│   ├── connections.parq
│   └── elements.parq
└── turfs.parq

7 directories, 58 files

It contains 3.2G. You can look at the notebooks and see what goes into the transformations.

Note that I try to use Parquet files wherever I can. They are much faster to load and more economical on disk.
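To poke at the processed files outside the notebooks, something like the following may be enough. It is a sketch under assumptions: list_parquet and load_table are hypothetical helpers, pandas needs pyarrow or fastparquet installed to read Parquet, and files that carry geometry may need geopandas.read_parquet instead of pandas.read_parquet.

```python
from pathlib import Path


def list_parquet(root):
    """Return all .parq files under `root`, sorted, as paths relative to it."""
    root = Path(root)
    return sorted(p.relative_to(root) for p in root.rglob("*.parq"))


def load_table(path):
    """Read one Parquet file into a DataFrame (pandas is imported lazily)."""
    import pandas as pd  # requires pyarrow or fastparquet

    return pd.read_parquet(path)


# Usage, run from the repository root:
# for rel in list_parquet("data/processed/brooklyn"):
#     print(rel)
# df = load_table("data/processed/brooklyn/brooklyn-rainfall-2021.parq")
```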

3. For the first release, I am including data/processed/brooklyn/.

This directory contains 72M, which is somewhat more manageable.