Add brief description of RSI model to quickstart #492

Merged
merged 96 commits into from
Aug 29, 2024

Conversation

@kmoscoe (Contributor) commented Aug 28, 2024

Also added a note to the workflow, and renamed the "custom data" page to be clearer about what it's about.

kmoscoe and others added 30 commits July 11, 2024 13:23
Though the style guide says to just use imperatives, "get started" just sounds weird. Also this is more consistent with "troubleshooting"
@kmoscoe kmoscoe requested a review from keyurva August 28, 2024 23:31
... | ... | ... |

There are a few important things to note:
- There are only 3 columns: one representing a place (`countryAlpha3Code`); one representing a date (`date`); and one representing a [_statistical variable_](/glossary.html#variable), which is a Data Commons concept for a metric: `average_annual_wage` and `gender_wage_gap`. (Actually, there can be any number of statistical variable columns -- but no other types of additional columns -- and these two CSV files could be combined into one.)
Contributor

We should also explain the countryAlpha3Code part in brief or point to a section that does.

Contributor Author

Great point! Done!


There are a few important things to note:
- There are only 3 columns: one representing a place (`countryAlpha3Code`); one representing a date (`date`); and one representing a [_statistical variable_](/glossary.html#variable), which is a Data Commons concept for a metric: `average_annual_wage` and `gender_wage_gap`. (Actually, there can be any number of statistical variable columns -- but no other types of additional columns -- and these two CSV files could be combined into one.)
- Every row is a separate [_observation_](/glossary.html#observation), or a value of the variable for a given place and time.
Contributor

This is true of the 2 sample files, but if a file has multiple variable columns, then each row could represent multiple observations.

Contributor Author

OK, this is related to the second bullet where I mention that; added another mention, PTAL.

- There are only 3 columns: one representing a place (`countryAlpha3Code`); one representing a date (`date`); and one representing a [_statistical variable_](/glossary.html#variable), which is a Data Commons concept for a metric: `average_annual_wage` and `gender_wage_gap`. (Actually, there can be any number of statistical variable columns -- but no other types of additional columns -- and these two CSV files could be combined into one.)
- Every row is a separate [_observation_](/glossary.html#observation), or a value of the variable for a given place and time.

This is the scheme to which your data must conform if you want to take advantage of Data Commons' simple import facility. If your data doesn't follow this model, you'll need to do some more work to prepare or configure it for correct loading. (That topic is discussed in detail in [Preparing and loading your data](custom_data.md).)
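As a rough illustration of the model described in the bullets above, the following sketch expands each CSV row into one observation per statistical variable column. It is not the Data Commons importer: the sample values are made up, `rows_to_observations` is a hypothetical helper, and it assumes the place column comes first, the date column second, and that an empty cell means "no observation".

```python
import csv
import io

# Made-up sample in the shape the bullets describe; values are illustrative only.
SAMPLE_CSV = """countryAlpha3Code,date,average_annual_wage,gender_wage_gap
BEL,2000,54577,13.6
BEL,2001,54743,
"""


def rows_to_observations(csv_text):
    """Yield (place, date, variable, value) tuples, one per non-empty variable cell.

    Assumption (not confirmed by the docs under review): column 1 is the place,
    column 2 is the date, and every remaining column is a statistical variable.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    place_col, date_col, *variable_cols = reader.fieldnames
    for row in reader:
        for variable in variable_cols:
            cell = row[variable].strip()
            if cell:  # assumption: an empty cell means there is no observation
                yield (row[place_col], row[date_col], variable, float(cell))


observations = list(rows_to_observations(SAMPLE_CSV))
for obs in observations:
    print(obs)
```

With one variable column, each row yields a single observation; with several, a row yields as many observations as it has filled-in variable cells.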
Contributor

Wondering if we should use the word "format" rather than "scheme" to describe the shape of the data. WDYT?

Contributor Author

Sure, if you think that's better.

glossary.md Outdated
### [Statistical Variable](https://datacommons.org/browser/StatisticalVariable){: target="_blank"}
{: #variable}

Any type of metric, statistic, or measure that can be measured at a place and time. Examples include [median income of persons older than 16](https://datacommons.org/browser/Median_Income_Person_16OrMoreYears){: target="_blank"}, [number of female high school graduates aged 18 to 24](https://datacommons.org/browser/Count_Person_18To24Years_EducationalAttainmentHighSchoolGraduateIncludesEquivalency_Female){: target="_blank"}, [unemployment rate](https://browser.datacommons.org/browser/UnemploymentRate_Person){: target="_blank"}, or [percentage of persons with diabetes](https://browser.datacommons.org/browser/Percent_Person_WithDiabetes){: target="_blank"}. A complete list of variables can be found in the [Knowledge Graph](https://datacommons.org/browser/StatisticalVariable){: target="_blank"}.
Contributor

So it's typically associated with a place and time but in the more generic sense, it can be associated with any entity in the graph (e.g. business unit, power plant, etc.) and time.

Contributor Author

So I actually didn't write or update this, I only put it in alphabetical order. :-) But updated as you suggest.

@keyurva (Contributor) left a comment

Thanks Kara!

custom_dc/launch_cloud.md Outdated
- There are only 3 columns: one representing a place (`countryAlpha3Code`); one representing a date (`date`); and one representing a [_statistical variable_](/glossary.html#variable), which is a Data Commons concept for a metric: `average_annual_wage` and `gender_wage_gap`. (Actually, there can be any number of statistical variable columns -- but no other types of additional columns -- and these two CSV files could be combined into one.)
- Every row is a separate [_observation_](/glossary.html#observation), or a value of the variable for a given place and time.
- There are only 3 columns: one representing a place (`countryAlpha3Code`, a [special Data Commons place type](/custom_dc/custom_data.html#special-names)); one representing a date (`date`); and one representing a [_statistical variable_](/glossary.html#variable), which is a Data Commons concept for a metric: `average_annual_wage` and `gender_wage_gap`. (Actually, there can be any number of statistical variable columns -- but no other types of additional columns -- and these two CSV files could be combined into one.)
- Every row is a separate [_observation_](/glossary.html#observation), or a value of the variable for a given place and time. In the case where multiple statistical variable columns, each row would, of course, consist of multiple observations.
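The parenthetical about combining the two CSV files could look roughly like the sketch below, which merges single-variable files on the (`countryAlpha3Code`, `date`) pair. This is only one plausible reading: `combine` is a hypothetical helper, the sample values are invented, and leaving a cell empty where a file has no value for a given place and date is an assumption rather than documented Data Commons behavior.

```python
import csv
import io

# Made-up single-variable files; values are illustrative only.
WAGES = """countryAlpha3Code,date,average_annual_wage
BEL,2000,54577
BEL,2001,54743
"""
GAPS = """countryAlpha3Code,date,gender_wage_gap
BEL,2000,13.6
"""


def combine(*csv_texts, key_cols=("countryAlpha3Code", "date")):
    """Merge CSV texts that share the key columns into one multi-variable CSV."""
    merged = {}     # (place, date) -> {variable name: cell text}
    variables = []  # variable columns, in first-seen order
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames:
            if name not in key_cols and name not in variables:
                variables.append(name)
        for row in reader:
            key = tuple(row[c] for c in key_cols)
            values = {k: v for k, v in row.items() if k not in key_cols}
            merged.setdefault(key, {}).update(values)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([*key_cols, *variables])
    for key in sorted(merged):
        # Assumption: a missing value is written as an empty cell.
        writer.writerow([*key, *(merged[key].get(v, "") for v in variables)])
    return out.getvalue()


combined = combine(WAGES, GAPS)
print(combined)
```

In the combined file, a row with both variable cells filled in carries two observations, which is the case the added sentence in the second bullet describes.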
Contributor

nit: Sentence needs some rewording? "In the case where there are multiple..." or something along those lines.

Contributor Author

Oops, sorry, yes. Thanks for catching.

@kmoscoe kmoscoe merged commit cafd60b into datacommonsorg:master Aug 29, 2024
1 check passed

2 participants