Open Data Search

This project is a part of the Data Science Working Group at Code for San Francisco. Other DSWG projects can be found at the main GitHub repo.

-- Project Status: Active

Project Intro/Objective

The purpose of this project is to use analytics and topic modelling of search text to improve the user experience at https://data.sfgov.org/

Partner

SF Open Data
https://datasf.org/opendata/
Partner contact: Jason Lally, @jasonlally

Methods Used

Data Analysis
Descriptive Statistics/Data Visualization
Natural Language Processing
Word2Vec Modelling

Technologies

R
Python
- pandas, spaCy

Project Description

The major goals of the project are as follows:

Clean and process search terms and categorize them by quality
Apply Natural Language Processing and Topic Modelling to valid search terms, and cluster the terms to gauge potential demand for data sources
Provide actionable insights to improve search functionality on the site
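To make the first goal concrete, here is a minimal sketch of normalizing raw search terms and sorting them into quality buckets. The category names and rules (`empty`, `non_alphabetic`, `url`, `too_short`, `valid`) are hypothetical and chosen only for illustration; the project's actual cleaning and categorization logic lives in its search-processing notebook and may differ.

```python
import re


def normalize(term: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", term.strip().lower())


def categorize(term: str) -> str:
    """Assign a search term to a coarse quality bucket.

    These buckets are hypothetical examples, not the project's own rules.
    """
    cleaned = normalize(term)
    if not cleaned:
        return "empty"
    if re.fullmatch(r"[\d\W_]+", cleaned):
        return "non_alphabetic"  # only digits, punctuation, or underscores
    if re.match(r"https?://", cleaned):
        return "url"  # users sometimes paste links into the search box
    if len(cleaned) < 3:
        return "too_short"
    return "valid"


if __name__ == "__main__":
    for raw in ["  Building   Permits ", "12345", "", "sf", "http://example.com"]:
        print(repr(raw), "->", categorize(raw))
```

Only terms that land in the `valid` bucket would then move on to the NLP and clustering steps.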

Needs of this project

NLP/Topic Modelling Expertise

Getting Started

Clone this repo (for help, see this tutorial).
Data is being kept here.
Data processing/transformation scripts:
- Data Combiner (data_combiner.R) - Script that combines raw data from .tsv files into a single .csv file
- Search Data Processing Jupyter Notebook (search_data_processing.ipynb) - Notebook that cleans, processes and categorizes search terms
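The repository's Data Combiner is an R script. As a rough illustration of the same idea in Python, the sketch below concatenates every .tsv file in a folder into one .csv file. It assumes all input files share an identical header row and are UTF-8 encoded; the function name `combine_tsv_files` and the demo column names `term` and `count` are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def combine_tsv_files(input_dir: Path, output_csv: Path) -> int:
    """Concatenate every .tsv file in input_dir into a single .csv file.

    Assumes all files share the same header row; the header is written once.
    Returns the number of data rows written.
    """
    rows_written = 0
    header = None
    with output_csv.open("w", newline="", encoding="utf-8") as out_f:
        writer = csv.writer(out_f)
        # Sort so the output order is deterministic across runs.
        for tsv_path in sorted(input_dir.glob("*.tsv")):
            with tsv_path.open(newline="", encoding="utf-8") as in_f:
                reader = csv.reader(in_f, delimiter="\t")
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"Header mismatch in {tsv_path.name}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


if __name__ == "__main__":
    # Demo with two made-up .tsv files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        (tmp_dir / "week1.tsv").write_text("term\tcount\nparking\t3\n", encoding="utf-8")
        (tmp_dir / "week2.tsv").write_text("term\tcount\nbudget\t5\n", encoding="utf-8")
        out_path = tmp_dir / "combined.csv"
        print(combine_tsv_files(tmp_dir, out_path), "rows written")
        print(out_path.read_text(encoding="utf-8"))
```

Failing loudly on a header mismatch is a deliberate choice: silently appending rows with different columns would corrupt the combined file.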

Featured Notebooks/Analysis

- Exploration of Search Google Analytics
- Search Data Modelling Jupyter Notebook - Notebook with vectorization of search terms using a pre-trained word2vec model
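One common way to vectorize a multi-word search term with a pre-trained word2vec model is to average the vectors of its in-vocabulary words, then compare terms by cosine similarity; similar terms can then be clustered together. The sketch below assumes that averaging approach, which may not match what the Search Data Modelling notebook does. The tiny `TOY_EMBEDDINGS` table is made up and stands in for a real pre-trained model (for example, one loaded with gensim's `KeyedVectors`).

```python
import math
from typing import Dict, List, Optional

Vector = List[float]

# Made-up 3-dimensional vectors standing in for a pre-trained word2vec model.
# Real word2vec vectors usually have hundreds of dimensions.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "parking": [0.9, 0.1, 0.0],
    "meters": [0.8, 0.2, 0.1],
    "citations": [0.7, 0.3, 0.0],
    "budget": [0.0, 0.9, 0.4],
    "salaries": [0.1, 0.8, 0.5],
}


def term_vector(term: str, embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of the in-vocabulary words in a search term.

    Out-of-vocabulary words are skipped; returns None if none are known.
    """
    vectors = [embeddings[w] for w in term.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    meters = term_vector("parking meters", TOY_EMBEDDINGS)
    citations = term_vector("parking citations", TOY_EMBEDDINGS)
    budget = term_vector("city budget salaries", TOY_EMBEDDINGS)  # "city" is skipped
    print(cosine_similarity(meters, citations))
    print(cosine_similarity(meters, budget))
```

With these toy vectors, the two parking-related terms score far more similar to each other than either does to the budget term, which is the property a clustering step would rely on.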

Contributing DSWG Members

Team Lead (Contact): Rocio S Ng (@Rocio)

Other Members:

Name	Slack Handle
Bao Lin Liu	@jbaolinliu
Scott Brenstuhl	@scott_brenstuhl

Contact

If you haven't joined the SF Brigade Slack, you can do that here.
Our Slack channel is #datasci-open-data_src.
Feel free to contact team leads with any questions or if you are interested in contributing!



