
Figure out how to model vaccination data #175

Open
Mr0grog opened this issue Jan 22, 2021 · 5 comments

Comments

@Mr0grog
Collaborator

Mr0grog commented Jan 22, 2021

Three Bay Area counties now have vaccination dashboards:

We need to figure out how to best represent this data in our scraper output. What’s common between these dashboards? What’s different? What’s most important?


Updated 2021-01-22: Added Marin County
Updated 2021-02-05: Added Alameda, San Mateo, Napa from @kengo-sony

@Mr0grog
Collaborator Author

Mr0grog commented Jan 23, 2021

Marin is also now including vaccination data at the bottom of their main dashboard page: https://coronavirus.marinhhs.org/surveillance#vaccines

@kengo-sony
Contributor

kengo-sony commented Feb 4, 2021

Found three more counties that have vaccination dashboards: Alameda, San Mateo, and Napa.

@Mr0grog
Collaborator Author

Mr0grog commented Feb 15, 2021

Interesting discovery: the state is currently publishing some good data on their vaccinations page at: https://covid19.ca.gov/vaccines/. For each county, they’ve got:

  • Total vaccinations administered
  • % Administered by age
  • % Administered by gender
  • % Administered by race/ethnicity

Some counties are definitely not publishing these stats themselves (and for some counties we haven’t identified a dashboard or dataset at all), so the state data fills those gaps and also gives us a standard set of categories. Using this state data might be the place to start for now.
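As a starting point for modeling, one day’s worth of the four state-reported stats for a single county could be represented roughly like this. This is a minimal sketch, assuming the percentage breakdowns are captured as label-to-percent maps; the class, field, and method names are hypothetical and not defined by the state page or this project:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CountyVaccinationSnapshot:
    """One county's vaccination stats as shown on the state page on one day.

    Hypothetical model: each breakdown maps a category label (e.g. an age
    band) to the percent of doses administered in that category.
    """

    county: str
    date: str  # ISO date the data was collected, e.g. "2021-02-15"
    total_administered: int
    pct_by_age: Dict[str, float] = field(default_factory=dict)
    pct_by_gender: Dict[str, float] = field(default_factory=dict)
    pct_by_race_ethnicity: Dict[str, float] = field(default_factory=dict)

    def breakdown_sums(self) -> Dict[str, float]:
        """Sum each breakdown; a sum far from 100 hints at missing categories."""
        return {
            "age": sum(self.pct_by_age.values()),
            "gender": sum(self.pct_by_gender.values()),
            "race_ethnicity": sum(self.pct_by_race_ethnicity.values()),
        }

    def estimated_counts(self, breakdown: Dict[str, float]) -> Dict[str, int]:
        """Convert one percentage breakdown into approximate dose counts.

        Only approximate, since published percentages may be rounded.
        """
        return {
            label: round(self.total_administered * pct / 100)
            for label, pct in breakdown.items()
        }
```

Keeping the breakdowns as open-ended maps rather than fixed fields avoids baking in the state’s current category labels, which could change over time.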

Downsides:

  • This data is not in the state open data portal or Snowflake (yet?)
  • None of the data points are offered in timeseries form, so we need to build a timeseries ourselves by scraping daily.
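Because the state page only shows current values, a daily scrape has to accumulate the timeseries itself. A minimal sketch, assuming each scrape yields one cumulative “total administered” number per county; the store and function names are hypothetical, and fetching/parsing the page is left out:

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory store: (county, ISO date) -> cumulative doses administered.
Timeseries = Dict[Tuple[str, str], int]


def record_snapshot(series: Timeseries, county: str, date: str, total: int) -> None:
    """Store one day's scraped cumulative total; a re-scrape on the same day overwrites it."""
    series[(county, date)] = total


def daily_new_doses(series: Timeseries, county: str) -> List[Tuple[str, int]]:
    """Difference consecutive cumulative totals for one county.

    If a day was missed, the difference covers every day since the previous
    scrape, so it is not strictly a per-day number.
    """
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    dates = sorted(d for (c, d) in series if c == county)
    return [
        (cur, series[(county, cur)] - series[(county, prev)])
        for prev, cur in zip(dates, dates[1:])
    ]
```

Keying on (county, date) makes repeated scrapes on the same day idempotent, which matters if the scraper is re-run after a failure.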

Since we need to build the timeseries ourselves and because the state data covers a broader set of counties than we do here, I went ahead and started the ball rolling at https://github.com/Mr0grog/ca-covid-vaccination-stats. If that works out well, we should see about integrating the code (or just the data) here.

@Lynguyen237

@Mr0grog Rob, how did your attempt to build time series data from the state website go?
I took a stab at documenting the metrics provided by each county (not finished yet): https://docs.google.com/spreadsheets/d/1fhdF587nhBlychWvmMwPs2PwGy9ffqiH06ZZhEitVfo/edit#gid=262116661 If the state already has the data, with common metrics shared by all counties, I think that would be our best bet.

@Mr0grog
Collaborator Author

Mr0grog commented Apr 29, 2021

The state is now publishing timeseries data for all the same info I was scraping at: https://data.ca.gov/dataset/covid-19-vaccine-progress-dashboard-data

I think you should probably just use the new state dataset. A quick glance over your spreadsheet suggests that both the state data feeds and my scraper cover all the stats you’ve listed (except neighborhood), and they do so for all counties.
