New Data: CA COVID-19 Hospital Data #29

Closed
elaguerta opened this issue Apr 14, 2020 · 5 comments

Comments

@elaguerta
Collaborator

County or Counties: All
Data source: https://data.chhs.ca.gov/dataset/california-covid-19-hospital-data-and-case-statistics/resource/6cd8d424-dfaa-4bdd-9410-a3d656e1176e?inner_span=True
Output to:
Update frequency/time: Daily, likely in the morning, updated to present day

@benghancock
Collaborator

Hi @elaguerta - I'd love to help with this issue, if you're still looking for help on it. The CSV data looks pretty tidy and, as noted, covers hospital data for all counties. The columns are:

['county', 'todays_date', 'hospitalized_covid_confirmed_patients',
       'hospitalized_suspected_covid_patients', 'hospitalized_covid_patients',
       'all_hospital_beds', 'icu_covid_confirmed_patients',
       'icu_suspected_covid_patients', 'icu_available_beds']

How do you envision this integrating with the county scrapers? Would each county scraper also pull from this dataset and filter for the selected county? Or would it stay separate, with the hospital data scraped on its own?
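
For reference, a minimal sketch of checking a downloaded CSV against the column list above. The function name `missing_columns` and the sample row are made up for illustration; only the column names come from this comment.

```python
import csv
import io

# Column names as listed in the comment above.
EXPECTED_COLUMNS = [
    "county",
    "todays_date",
    "hospitalized_covid_confirmed_patients",
    "hospitalized_suspected_covid_patients",
    "hospitalized_covid_patients",
    "all_hospital_beds",
    "icu_covid_confirmed_patients",
    "icu_suspected_covid_patients",
    "icu_available_beds",
]


def missing_columns(csv_text):
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in EXPECTED_COLUMNS if name not in header]


# Made-up sample: the expected header plus one row of placeholder values.
SAMPLE_CSV = (
    ",".join(EXPECTED_COLUMNS)
    + "\n"
    + "Example County,2020-04-14,1,2,3,4,5,6,7\n"
)

print(missing_columns(SAMPLE_CSV))
```

A check like this would flag a renamed or dropped column early, before any county filtering runs.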

@elaguerta
Collaborator Author

Hi @benghancock, for now I think we should keep it separate because it comes from a completely different source than the county scrapers use. Any integration needed by the front-end design, or by our future API users, can happen downstream. To make that easier, I'd suggest you structure your top-level method to return data for just one county, selected by a name or other identifier passed as an argument.
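
A rough sketch of what such a single-county entry point could look like, assuming the records are already loaded as dicts keyed by the CSV column names. The name `get_county` and the case-insensitive matching are assumptions for illustration, not something decided in this thread.

```python
def get_county(records, county):
    """Return only the records whose 'county' field matches the given name.

    Matching ignores case and surrounding whitespace (an assumption; the
    thread does not say how county names should be compared).
    """
    wanted = county.strip().lower()
    return [
        record
        for record in records
        if str(record.get("county", "")).strip().lower() == wanted
    ]


# Made-up records, only to show the call shape.
records = [
    {"county": "Alameda", "todays_date": "2020-04-14", "icu_available_beds": "10"},
    {"county": "Marin", "todays_date": "2020-04-14", "icu_available_beds": "5"},
    {"county": "Alameda", "todays_date": "2020-04-15", "icu_available_beds": "9"},
]

alameda = get_county(records, "alameda")
print(len(alameda))
```

Keeping the county as a plain argument means downstream code can call the same function for any county without knowing how the data was fetched.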

@benghancock
Collaborator

Got it, that makes sense. I'll plan to start working on this later today. I've forked the project and will create a branch for this issue. Thanks!

@benghancock
Collaborator

I've got the pagination working: it loops through and collects all records returned by the API. I still need to collect the notes for each field and document the output data model. By default the response returns every record going back to March, which is a lot (5,000+ across all counties) and will keep growing. Do we want to limit the time period, or just pull everything so we have complete data each time?
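
To make the two options concrete, here is a minimal sketch of an offset-based pagination loop with an optional date cutoff. It is not the code from the PR: `fetch_page` is a hypothetical callable standing in for the real HTTP request, and the `since` filter assumes `todays_date` is an ISO-style `YYYY-MM-DD` string, which the thread does not confirm.

```python
def collect_all_records(fetch_page, page_size=100, since=None):
    """Collect records page by page until a page comes back short.

    fetch_page(offset, limit) is a hypothetical callable returning a list of
    record dicts; in practice it would wrap the API request.
    since is an optional 'YYYY-MM-DD' string; records with an earlier
    'todays_date' are dropped after fetching (assumes ISO-formatted dates,
    which sort correctly as plain strings).
    """
    records = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        records.extend(page)
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size
    if since is not None:
        records = [r for r in records if r["todays_date"] >= since]
    return records


# Made-up data standing in for the API: 25 daily rows for one county.
fake_rows = [
    {"county": "Example", "todays_date": f"2020-04-{day:02d}"}
    for day in range(1, 26)
]


def fake_fetch(offset, limit):
    """Serve slices of fake_rows the way a paginated endpoint might."""
    return fake_rows[offset:offset + limit]


everything = collect_all_records(fake_fetch, page_size=10)
recent = collect_all_records(fake_fetch, page_size=10, since="2020-04-20")
print(len(everything), len(recent))
```

In this sketch the cutoff is applied after fetching, so it trims the output but not the number of requests; filtering on the server side would depend on what the API supports.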

@benghancock
Collaborator

Hi @elaguerta - I've put in a PR that hopefully delivers what's needed for this.

elaguerta added a commit that referenced this issue Jul 29, 2020
Issue #29: Get CA COVID-19 Hospital Data