Workshop Overview

This repository is a collection of notebooks, Python modules, and data about small businesses in San Diego. It demonstrates a workflow for analyzing and combining multiple sources of content about the local community. It is a starting point for two potential projects:

- Scalesd accelerator program supporting small businesses in San Diego.
- Sustainable Communities web challenge sponsored by Code for Atlanta.

Introduction - Data Science Workflow

My Data Science Workflow covers the following steps:

The questions and the data come from multiple sources. They focus on local businesses and the communities where they operate. The foundation data starts with business firms and Business Improvement Districts (BIDs).
Analysis is based on Extract, Transform, Load/Link (wrangling). I use Python and a variety of libraries to manipulate the data. Important libraries used in the analysis include pandas, geopandas, and numpy. As we refine the analysis, specific questions are answered and new ones are uncovered.
My development environment is JupyterLab, which provides a rich environment for exploration. I use several widget packages to analyze and visualize the data. The key packages are ipywidgets, ipyleaflet, and bqplot.
As any analysis unfolds, new questions are always uncovered. This drives us to new and different sources of information for answers. Part of our research is finding information sources that address these new questions. Part of our understanding comes from applying different analytic techniques, such as spatial analysis, time series, and crossfiltering. Our motivation is always to add more structure to the data we have!
As I develop analytic notebooks, I keep an eye out for opportunities to develop packages/modules that can be shared. I have one simple example in the src directory for NAICS. As you look at the notebooks, you should see examples of repetitive hacks. These should be converted to code! Techniques to link, fuse, and share are very important.
The workflow allows me to package and document the processes used in the analysis. Notebooks (like this one) can be published and shared. The workflow provides a structured approach to uncovering details in the data and software interfaces for various services, and ultimately to creating a structured knowledge repository that supports multiple projects.
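The Extract, Transform, Link steps above can be sketched with pandas. This is a minimal illustration, not the code in wrangling.ipynb: the sample rows, the column names (`business_name`, `naics_code`, `zip_code`, `bid_name`), the `BIDS` lookup table, and the three function names are all assumptions made up for the example.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real business CSV lives in the data directory
# and its column names may differ.
RAW_CSV = """business_name,naics_code,zip_code,bid_name
Cafe Uno ,722515,92104,el cajon blvd
Book Nook,451211,92116,El Cajon Blvd
Fix-It Shop,811111,92101,
"""

# Hypothetical lookup table standing in for the Business Improvement District data.
BIDS = pd.DataFrame(
    {
        "bid_name": ["El Cajon Blvd", "Downtown"],
        "bid_id": [1, 2],
    }
)


def extract(source):
    """Read the raw CSV, keeping every column as text so codes are not parsed as numbers."""
    return pd.read_csv(source, dtype=str)


def transform(df):
    """Trim stray whitespace and normalize the capitalization of BID names."""
    out = df.copy()
    out["business_name"] = out["business_name"].str.strip()
    out["bid_name"] = out["bid_name"].str.strip().str.title()
    return out


def link(businesses, bids):
    """Left-join businesses to BIDs so businesses without a BID are kept."""
    return businesses.merge(bids, on="bid_name", how="left")


linked = link(transform(extract(io.StringIO(RAW_CSV))), BIDS)
print(linked[["business_name", "bid_name", "bid_id"]])
```

A left join is used here so that a business with no matching district stays in the result with a missing `bid_id`, which makes gaps in the linkage visible instead of silently dropping rows.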

This workshop demonstrates how these elements are combined for analysis and product development.

Contents of the workshop repository

The data directory contains the various CSV files and shapefiles.
The notebooks directory contains the initial notebooks.
The order I'd recommend (sort of) at first glance:
- preparation.md - Markdown description of questions and data sources
- wrangling.ipynb - Primary notebook to process the business csv
- naics.ipynb - A deep dive on NAICS codes
- query.ipynb - A scratch pad to play with different queries (more to come on this one)
- El Cajon Blvd BID.ipynb - Visualize the boulevard BID
- tldr.ipynb - Jumping to the end first?
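For readers new to NAICS before opening naics.ipynb: the first two digits of a NAICS code identify its sector. The sketch below is a hypothetical helper, not the module in the src directory, and its `NAICS_SECTORS` table is deliberately partial.

```python
# Partial sector table for illustration only; the full list is published
# with the official NAICS manual.
NAICS_SECTORS = {
    "23": "Construction",
    "44": "Retail Trade",
    "45": "Retail Trade",
    "54": "Professional, Scientific, and Technical Services",
    "72": "Accommodation and Food Services",
    "81": "Other Services (except Public Administration)",
}


def naics_sector(code):
    """Return (sector_prefix, sector_title) for a NAICS code given as str or int."""
    digits = str(code).strip()
    if len(digits) < 2 or not digits.isdigit():
        raise ValueError(f"not a NAICS code: {code!r}")
    prefix = digits[:2]
    # Sectors missing from the partial table are reported as unknown.
    return prefix, NAICS_SECTORS.get(prefix, "Unknown sector")


print(naics_sector("722515"))
```

Accepting both strings and integers is a convenience for data that has already been parsed as numbers; keeping codes as strings throughout is usually simpler.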

This is just the starting point. There is much more to do!
