To communicate more information to downstream users of the data about data types, representations, and known issues, I think it would be useful to build a data dictionary. This could just be a plain text file, with sections for each category of data we collect, and sub-sections for each field.
The entries could look something like this:
"cases"
~~~~~~~
Description:
    The number of COVID-19 cases recorded in the county for the given date
Fields:
    "date" : date
        The date of the observation
    "cases" : integer
        The number of positive COVID-19 cases observed on the date. Figure may be
        preliminary and is subject to change
Notes:
    [ ... notes go here ...]
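If it helps to keep the layout consistent across entries, the text could be generated rather than typed by hand. Here is a minimal sketch of that idea; the `Entry` and `Field` classes, the `render_entry` function, and the four-space indent are all hypothetical names and choices of mine, not anything that exists in the repo today.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """One documented field (hypothetical structure)."""
    name: str
    dtype: str
    description: str


@dataclass
class Entry:
    """One data category in the dictionary (hypothetical structure)."""
    name: str
    description: str
    fields: List[Field] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def render_entry(entry: Entry, indent: str = "    ") -> str:
    """Render an entry as plain text with a tilde underline under the title."""
    title = f'"{entry.name}"'
    lines = [title, "~" * len(title), "Description:",
             indent + entry.description, "Fields:"]
    for f in entry.fields:
        lines.append(f'{indent}"{f.name}" : {f.dtype}')
        lines.append(indent * 2 + f.description)
    lines.append("Notes:")
    for note in entry.notes:
        lines.append(indent + note)
    return "\n".join(lines)


if __name__ == "__main__":
    cases = Entry(
        name="cases",
        description="The number of COVID-19 cases recorded in the county for the given date",
        fields=[
            Field("date", "date", "The date of the observation"),
            Field("cases", "integer", "The number of positive COVID-19 cases observed on the date."),
        ],
        notes=["[ ... notes go here ...]"],
    )
    print(render_entry(cases))
```

Hand-editing the text file directly works just as well; generating it mainly guards against entries drifting apart in style as more people contribute.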
Each of the data scrapers contains metadata, docstrings, comments, etc., that could help the public understand the data more clearly. The goal would be to put this information all in one place, in an easy-to-digest way. Keeping the file in plain text makes it portable and ensures that it's human-readable (with some lightweight ASCII styling).
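As a starting point for gathering what the scrapers already say about themselves, something like the following could pull their docstrings into one place. This is only a sketch: it assumes each scraper is a plain Python function or class, and `collect_docstrings` and the example `scrape_cases` are hypothetical names, not part of the code base.

```python
import inspect
from typing import Callable, Dict, Iterable


def collect_docstrings(scrapers: Iterable[Callable]) -> Dict[str, str]:
    """Map each scraper's name to its cleaned-up docstring."""
    docs = {}
    for scraper in scrapers:
        # inspect.getdoc strips indentation and returns None if there is no docstring
        doc = inspect.getdoc(scraper)
        docs[scraper.__name__] = doc if doc else "(no docstring)"
    return docs


if __name__ == "__main__":
    def scrape_cases():
        """Fetch daily case counts (hypothetical example scraper)."""

    for name, doc in collect_docstrings([scrape_cases]).items():
        print(f"{name}: {doc}")
```

The output would still need a human pass before going into the dictionary, since docstrings are written for developers rather than for downstream data users.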
To foster easier collaboration in creating this doc, I'm suggesting that we just keep it as a file in the base dir of the repo on a separate branch while it is being drafted. That way, collaborators can make changes locally and push them up, and others can comment and contribute. Keeping it as a file in the repo has the added benefit of keeping the dictionary in sync with the rest of the code base; changes to the code that affect how the data is represented should be accompanied by updates to the data dictionary.
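One way to help the dictionary stay in sync with the code would be a small check that flags fields a scraper emits but the dictionary does not describe. The sketch below assumes the `"name" : type` field-line format from the example entry; `documented_fields` and `undocumented_fields` are hypothetical helpers, and how the check gets wired into tests or CI is left open.

```python
import re
from typing import Iterable, Set

# Matches field lines such as:    "cases" : integer
# Section titles like "cases" have no colon after them and are skipped.
FIELD_LINE = re.compile(r'^\s*"(?P<name>[^"]+)"\s*:\s*\S+')


def documented_fields(dictionary_text: str) -> Set[str]:
    """Return the field names that appear in the data dictionary text."""
    names = set()
    for line in dictionary_text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            names.add(match.group("name"))
    return names


def undocumented_fields(record_keys: Iterable[str], dictionary_text: str) -> Set[str]:
    """Return keys present in scraper output but missing from the dictionary."""
    return set(record_keys) - documented_fields(dictionary_text)


if __name__ == "__main__":
    text = '"cases"\n~~~~~~~\nFields:\n    "date" : date\n    "cases" : integer\n'
    print(undocumented_fields(["date", "cases", "deaths"], text))
```

A check like this only catches missing field names; changes to a field's meaning or representation would still rely on reviewers asking for a dictionary update.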
I'm happy to get this rolling and would appreciate any help and/or feedback!