Create a Data Dictionary to Describe Data, Known Issues, and Caveats #212

benghancock · 2021-06-04T04:35:58Z

I think it would be useful to build a data dictionary that tells downstream users of the data about data types, representations, and known issues. This could just be a plain text file, with a section for each category of data we collect and a sub-section for each field.

The entries could look something like this:

"cases"
~~~~~~~
Description:
  The number of COVID-19 cases recorded in the county for the given date

Fields:
  "date" : date
    The date of the observation

  "cases" : integer
    The number of positive COVID-19 cases observed on the date. The figure
    may be preliminary and is subject to change.

    Notes:
    [ ... notes go here ...]

Each of the data scrapers contains metadata, docstrings, comments, etc., that could help the public understand the data more clearly. The goal would be to put all of this information in one place, in an easy-to-digest form. Keeping the file in plain text makes it portable and ensures that it's human readable (with some lightweight ASCII styling).
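As a rough sketch of how entries in the layout above could be produced from structured metadata rather than typed by hand, something like the following might work. The function name `render_entry` and the `(name, type, description)` tuple shape are assumptions made for illustration; nothing like this exists in the repo yet.

```python
from textwrap import fill, indent


def render_entry(name, description, fields):
    """Render one data dictionary section as plain text.

    `fields` is a list of (field_name, type_name, text) tuples.
    This is a hypothetical helper, not part of the existing code base.
    """
    title = f'"{name}"'
    lines = [
        title,
        "~" * len(title),  # underline matches the quoted title's width
        "Description:",
        indent(fill(description, width=76), "  "),
        "",
        "Fields:",
    ]
    for field_name, type_name, text in fields:
        lines.append(f'  "{field_name}" : {type_name}')
        lines.append(indent(fill(text, width=72), "    "))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


entry = render_entry(
    "cases",
    "The number of COVID-19 cases recorded in the county for the given date",
    [
        ("date", "date", "The date of the observation"),
        ("cases", "integer", "The number of positive COVID-19 cases observed on the date."),
    ],
)
print(entry)
```

Hand-editing the text file directly is just as valid; a helper like this would only matter if the field metadata ends up living next to the scrapers.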

To foster easier collaboration in creating this doc, I'm suggesting that we just keep it as a file in the base dir of the repo on a separate branch while it is being drafted. That way, collaborators can make changes locally and push them up, and others can comment and contribute. Keeping it as a file in the repo has the added benefit of keeping the dictionary in sync with the rest of the code base; changes to the code that affect how the data is represented should be accompanied by updates to the data dictionary.
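To make the "keep it in sync" idea a bit more concrete, here is a minimal sketch of a check that could run in tests or CI. It assumes field lines in the dictionary look like `"name" : type` with leading indentation, as in the example entry; the function names and the sample record are hypothetical.

```python
import re

# Matches indented field lines such as:   "cases" : integer
FIELD_LINE = re.compile(r'^[ \t]+"(?P<name>[^"]+)"[ \t]*:[ \t]*\S+', re.MULTILINE)


def documented_fields(dictionary_text):
    """Return the set of field names described in the dictionary text."""
    return {match.group("name") for match in FIELD_LINE.finditer(dictionary_text)}


def undocumented_fields(record, dictionary_text):
    """Return the keys of a scraped record that the dictionary does not mention."""
    return sorted(set(record) - documented_fields(dictionary_text))


sample_dictionary = '''"cases"
~~~~~~~
Fields:
  "date" : date
    The date of the observation

  "cases" : integer
    The number of positive COVID-19 cases observed on the date.
'''

# Hypothetical scraper output with one field the dictionary does not cover.
sample_record = {"date": "2021-06-04", "cases": 10, "deaths": 1}
print(undocumented_fields(sample_record, sample_dictionary))
```

A check like this only catches missing fields, not stale descriptions, so it would complement review rather than replace it.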

I'm happy to get this rolling and would appreciate any help and/or feedback!

benghancock · 2021-06-04T16:38:24Z

I've started this as DICTIONARY.md in the base of the repo, on branch 212-create-data-dictionary.

benghancock added the "documentation" label · Jun 4, 2021
