Add docs for pandas API. (datacommonsorg#55)

miss-o-soup · Aug 26, 2020 · 6e21c83
1 parent 87314ca
Showing 16 changed files with 357 additions and 28 deletions.
diff --git a/Gemfile.lock b/Gemfile.lock
@@ -16,9 +16,9 @@ GEM
     colorator (1.1.0)
     commonmarker (0.17.13)
       ruby-enum (~> 0.5)
-    concurrent-ruby (1.1.6)
-    dnsruby (1.61.3)
-      addressable (~> 2.5)
+    concurrent-ruby (1.1.7)
+    dnsruby (1.61.4)
+      simpleidn (~> 0.1)
     em-websocket (0.5.1)
       eventmachine (>= 0.12.9)
       http_parser.rb (~> 0.6.0)
@@ -31,9 +31,9 @@ GEM
     ffi (1.13.1)
     forwardable-extended (2.6.0)
     gemoji (3.0.1)
-    github-pages (206)
+    github-pages (207)
       github-pages-health-check (= 1.16.1)
-      jekyll (= 3.8.7)
+      jekyll (= 3.9.0)
       jekyll-avatar (= 0.7.0)
       jekyll-coffeescript (= 1.1.1)
       jekyll-commonmark-ghpages (= 0.1.6)
@@ -67,7 +67,8 @@ GEM
       jekyll-theme-time-machine (= 0.1.1)
       jekyll-titles-from-headings (= 0.5.3)
       jemoji (= 0.11.1)
-      kramdown (= 1.17.0)
+      kramdown (= 2.3.0)
+      kramdown-parser-gfm (= 1.1.0)
       liquid (= 4.0.3)
       mercenary (~> 0.3)
       minima (= 2.5.1)
@@ -80,20 +81,20 @@ GEM
       octokit (~> 4.0)
       public_suffix (~> 3.0)
       typhoeus (~> 1.3)
-    html-pipeline (2.13.0)
+    html-pipeline (2.14.0)
       activesupport (>= 2)
       nokogiri (>= 1.4)
     http_parser.rb (0.6.0)
     i18n (0.9.5)
       concurrent-ruby (~> 1.0)
-    jekyll (3.8.7)
+    jekyll (3.9.0)
       addressable (~> 2.4)
       colorator (~> 1.0)
       em-websocket (~> 0.5)
       i18n (~> 0.7)
       jekyll-sass-converter (~> 1.0)
       jekyll-watch (~> 2.0)
-      kramdown (~> 1.14)
+      kramdown (>= 1.17, < 3)
       liquid (~> 4.0)
       mercenary (~> 0.3.3)
       pathutil (~> 0.9)
@@ -191,7 +192,10 @@ GEM
       gemoji (~> 3.0)
       html-pipeline (~> 2.2)
       jekyll (>= 3.0, < 5.0)
-    kramdown (1.17.0)
+    kramdown (2.3.0)
+      rexml
+    kramdown-parser-gfm (1.1.0)
+      kramdown (~> 2.0)
     liquid (4.0.3)
     listen (3.2.1)
       rb-fsevent (~> 0.10, >= 0.10.3)
@@ -215,6 +219,7 @@ GEM
     rb-fsevent (0.10.4)
     rb-inotify (0.10.1)
       ffi (~> 1.0)
+    rexml (3.2.4)
     rouge (3.19.0)
     ruby-enum (0.8.0)
       i18n
@@ -228,13 +233,18 @@ GEM
     sawyer (0.8.2)
       addressable (>= 2.3.5)
       faraday (> 0.8, < 2.0)
+    simpleidn (0.1.1)
+      unf (~> 0.1.4)
     terminal-table (1.8.0)
       unicode-display_width (~> 1.1, >= 1.1.1)
     thread_safe (0.3.6)
     typhoeus (1.4.0)
       ethon (>= 0.9.0)
     tzinfo (1.2.7)
       thread_safe (~> 0.1)
+    unf (0.1.4)
+      unf_ext
+    unf_ext (0.0.7.7)
     unicode-display_width (1.7.0)
     zeitwerk (2.4.0)
 
@@ -246,4 +256,4 @@ DEPENDENCIES
   jekyll-feed (~> 0.6)
 
 BUNDLED WITH
-   2.0.2
+   2.1.4
diff --git a/api/pandas/index.md b/api/pandas/index.md
@@ -0,0 +1,84 @@
+---
+layout: default
+title: Pandas
+nav_order: 3
+parent: API
+has_children: true
+---
+# Data Commons Pandas API
+
+The **Data Commons Pandas API** is a superset of the Data Commons Python API:
+all functions from the Python API are also accessible from
+the Pandas API, and supplemental functions help with directly creating
+[pandas](https://pandas.pydata.org/)
+objects using data from the Data Commons knowledge graph for common pandas
+use cases. Please see the [Data Commons API Overview](/api) for more details
+on the design and structure of the API.
+
+Before proceeding, make sure you have followed the setup instructions below.
+
+## Getting Started
+
+To get started using the Pandas API:
+
+*   Install the API using `pip`.
+*   (Optional) Create an API key and enable the **Data Commons API**.
+*   Begin developing with the Pandas API.
+
+### Installing the Pandas API
+
+First, install the `datacommons_pandas` package through `pip`.
+
+```bash
+$ pip install datacommons_pandas
+```
+
+For more information about installing `pip` and setting up other parts of
+your Python development environment, please refer to the
+[Python Development Environment Setup Guide](https://cloud.google.com/python/setup.html)
+for Google Cloud Platform.
+
+### Creating an API Key (Optional)
+
+If you would like to provide an API key, follow the steps in [the API setup
+guide](/api/setup.html). Data Commons *does not charge* users, but uses the
+API key for understanding API usage.
+
+With the API key created and the Data Commons API activated, you can now get
+started using the Pandas API. There are two ways to provide your key
+to the Pandas API package.
+
+1.  You can set the API key by calling `datacommons_pandas.set_api_key`.
+    Start by importing `datacommons_pandas`, then set the API key like so.
+
+    ```python
+    import datacommons_pandas as dcpd
+
+    dcpd.set_api_key('YOUR-API-KEY')
+    ```
+
+    This will create an environment variable in your Python runtime called
+    `DC_API_KEY` holding your key. Your key will then be used whenever
+    the package sends a request to the Data Commons graph.
+
+1.  You can export an environment variable in your shell like so.
+
+    ```bash
+    export DC_API_KEY='YOUR-API-KEY'
+    ```
+
+    After you've exported the variable, you can start using the Data Commons
+    package.
+
+    ```python
+    import datacommons_pandas as dcpd
+    ```
+
+    This route is particularly useful if you are building applications that
+    depend on this API, and are deploying them to hosting services.
+
+### Using the Pandas API
+
+You are ready to go! From here you can view our [tutorials](/tutorials.html) on how to use the
+API to perform certain tasks, or see a full list of functions, classes and
+methods available for use in the sidebar.
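
Note on `api/pandas/index.md`: both routes described in the key-setup section end with the key stored in the `DC_API_KEY` environment variable. The sketch below illustrates that flow using only the standard library. The helper name `resolve_api_key` is hypothetical and is not part of `datacommons_pandas`; it only mirrors the behaviour the docs describe for `set_api_key`.

```python
import os

ENV_VAR = "DC_API_KEY"  # variable name taken from the docs above


def resolve_api_key(explicit_key=None):
    """Hypothetical helper: return the key to use, or None if none is set.

    An explicitly passed key wins and is stored in the process environment,
    mirroring what the docs say set_api_key does. Otherwise fall back to a
    key that was exported in the shell.
    """
    if explicit_key:
        os.environ[ENV_VAR] = explicit_key
        return explicit_key
    return os.environ.get(ENV_VAR)


# Route 1: key supplied in code.
os.environ.pop(ENV_VAR, None)
key_from_code = resolve_api_key("YOUR-API-KEY")

# Route 2: key already exported (simulated here by writing to os.environ).
os.environ[ENV_VAR] = "EXPORTED-KEY"
key_from_env = resolve_api_key()
```

Because both routes converge on one environment variable, deployed applications can likely rely on the exported variable alone and keep the key out of source code.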
diff --git a/api/pandas/multivariate_dataframe.md b/api/pandas/multivariate_dataframe.md
@@ -0,0 +1,88 @@
+---
+layout: default
+title: Multivariate Table as pd.DataFrame
+nav_order: 3
+parent: Pandas
+grand_parent: API
+---
+
+# Get Multivariate DataFrame
+
+## `datacommons_pandas.build_multivariate_dataframe(places, stat_vars)`
+
+Returns a `pandas.DataFrame` with [`places`](https://datacommons.org/browser/Place)
+as index and [`stat_vars`](https://datacommons.org/browser/StatisticalVariable)
+as columns, where each cell is the latest observed statistic for
+its `Place` and `StatisticalVariable`.
+
+See the [full list of `StatisticalVariable`s](/statistical_variables.html).
+
+**Arguments**
+
+*   `places (Iterable of str)`: A list of dcids of the
+    [`Place`](https://datacommons.org/browser/Place)s to query for.
+
+*   `stat_vars (Iterable of str)`: A list of dcids of the
+    [`StatisticalVariable`](https://datacommons.org/browser/StatisticalVariable)s
+    to query for.
+
+**Returns**
+
+A `pandas.DataFrame` with [`places`](https://datacommons.org/browser/Place)
+(str)
+as index and [`stat_vars`](https://datacommons.org/browser/StatisticalVariable)
+(str) as columns, where each cell is the latest observed statistic (float) for
+its `Place` and `StatisticalVariable`.
+
+**Raises**
+
+* `ValueError` - If no statistical values are found for the given parameters.
+
+Be sure to initialize the library. See the
+[datacommons_pandas library setup guide](/api/pandas/) for more details.
+
+You can find a list of `StatisticalVariable`s with human-readable names [here](/statistical_variables.html).
+
+## Examples
+
+We would like to get a DataFrame of
+
+- [Count_Person](https://datacommons.org/browser/Count_Person)
+- [Median_Age_Person](https://datacommons.org/browser/Median_Age_Person)
+- [UnemploymentRate_Person](https://datacommons.org/browser/UnemploymentRate_Person)
+
+for
+[the United States](https://datacommons.org/browser/country/USA),
+[California](https://datacommons.org/browser/geoId/06), and
+[Santa Clara County](https://datacommons.org/browser/geoId/06085).
+
+```python
+>>> import datacommons_pandas as dcpd
+>>> dcpd.build_multivariate_dataframe(["country/USA", "geoId/06", "geoId/06085"],
+                  ["Count_Person", "Median_Age_Person", "UnemploymentRate_Person"])
+             Count_Person  Median_Age_Person  UnemploymentRate_Person
+place                                                                
+country/USA     328239523               37.9                      NaN
+geoId/06         39512223               36.3                     15.1
+geoId/06085       1927852               37.0                     10.7
+```
+
+In the next example, there is no data for either
+`RetailDrugDistribution_DrugDistribution_14Hydroxycodeinone` or
+`RetailDrugDistribution_DrugDistribution_Amphetamine` for non-USA
+places, so the API raises a `ValueError`:
+
+```python
+>>> import datacommons_pandas as dcpd
+>>> dcpd.build_multivariate_dataframe(
+      ["country/MEX", "nuts/AT32"],
+      ["RetailDrugDistribution_DrugDistribution_14Hydroxycodeinone",
+      "RetailDrugDistribution_DrugDistribution_Amphetamine"
+      ]
+    )
+ValueError    Traceback (most recent call last)
+...
+-->    raise ValueError('No data for any of specified places and stat_vars.')
+
+ValueError: No data for any of specified places and stat_vars.
+```
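
Note on `api/pandas/multivariate_dataframe.md`: the phrase "latest observed statistic" can be illustrated with a small standard-library sketch. This is not the library's implementation. The helpers `latest_observation` and `build_table` are hypothetical, the table is a nested dict rather than a `pandas.DataFrame`, and `None` stands in for `NaN`. The sample values are copied from the `build_time_series` example elsewhere in this commit.

```python
def latest_observation(series):
    """Return the value at the most recent date, or None for an empty series."""
    if not series:
        return None
    # Same-format ISO date strings sort chronologically as plain strings.
    latest_date = max(series)
    return series[latest_date]


def build_table(places, stat_vars, observations):
    """Hypothetical helper: {place: {stat_var: latest value or None}}."""
    return {
        place: {
            sv: latest_observation(observations.get((place, sv), {}))
            for sv in stat_vars
        }
        for place in places
    }


# Arkansas male population, copied from the time series example in this commit.
observations = {
    ("geoId/05", "Count_Person_Male"): {
        "2015": 1451913, "2016": 1456694, "2017": 1461651, "2018": 1468412,
        "2011": 1421287, "2012": 1431252, "2013": 1439862, "2014": 1447235,
    },
}

table = build_table(
    ["geoId/05"], ["Count_Person_Male", "Median_Age_Person"], observations
)
```

Here the 2018 value fills the `Count_Person_Male` cell, and the cell with no observations stays empty, analogous to the `NaN` shown for `country/USA` in the first example.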
diff --git a/api/pandas/time_series.md b/api/pandas/time_series.md
@@ -0,0 +1,74 @@
+---
+layout: default
+title: Time Series as pd.Series
+nav_order: 1
+parent: Pandas
+grand_parent: API
+---
+
+# Get Time Series for a Place
+
+## `datacommons_pandas.build_time_series(place, stat_var, measurement_method=None, observation_period=None, unit=None, scaling_factor=None)`
+
+Returns a `pandas.Series` representing a time series for the [`place`](https://datacommons.org/browser/Place) and
+[`stat_var`](https://datacommons.org/browser/StatisticalVariable) satisfying any optional parameters.
+
+See the [full list of `StatisticalVariable`s](/statistical_variables.html).
+
+**Arguments**
+
+* `place (str)`: The `dcid` of the [`Place`](https://datacommons.org/browser/Place) to query for.
+
+* `stat_var (str)`: The `dcid` of the
+  [`StatisticalVariable`](https://datacommons.org/browser/StatisticalVariable).
+
+* `measurement_method (str)`: (Optional) The `dcid` of the preferred [`measurementMethod`](https://datacommons.org/browser/measurementMethod) for the `stat_var`.
+
+* `observation_period (str)`: (Optional) The preferred [`observationPeriod`](https://datacommons.org/browser/observationPeriod) for the `stat_var`. This is an [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations) such as "P1M" (one month).
+
+* `unit (str)`: (Optional) The `dcid` of the preferred [`unit`](https://datacommons.org/browser/unit) for the `stat_var`.
+
+* `scaling_factor (int)`: (Optional) The preferred [`scalingFactor`](https://datacommons.org/browser/scalingFactor) for the `stat_var`.
+
+**Returns**
+
+A `pandas.Series` indexed by date (str), holding the observed values (float) for the `stat_var` and `place`.
+
+**Raises**
+
+* `ValueError` - If no statistical value is found for the place with the given parameters.
+
+Be sure to initialize the library. Check the [datacommons_pandas library setup guide](/api/pandas/) for more details.
+
+You can find a list of `StatisticalVariable`s with human-readable names [here](/statistical_variables.html).
+
+## Examples
+
+We would like to get the [male population](https://datacommons.org/browser/Count_Person_Male) in [Arkansas](https://datacommons.org/browser/geoId/05):
+
+```python
+>>> import datacommons_pandas as dcpd
+>>> dcpd.build_time_series("geoId/05", "Count_Person_Male")
+2015    1451913
+2016    1456694
+2017    1461651
+2018    1468412
+2011    1421287
+2012    1431252
+2013    1439862
+2014    1447235
+dtype: int64
+```
+
+In the next example, the parameter `observation_period='P3Y'` overly constrains the request, so the API
+raises a `ValueError`:
+
+```python
+>>> import datacommons_pandas as dcpd
+>>> dcpd.build_time_series('geoId/06085', 'Count_Person', observation_period='P3Y')
+ValueError    Traceback (most recent call last)
+...
+-->          raise ValueError('No data in response.')
+
+ValueError: No data in response.
+```
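
Note on `api/pandas/time_series.md`: since an over-constrained request raises `ValueError`, callers may want a fallback instead of a crash. The sketch below shows one way to do that. The wrapper `build_or_none` is hypothetical, and `fake_build_time_series` is a stand-in that only imitates the documented behaviour (and returns a plain dict, not a `pandas.Series`) so that the example runs offline. In real use, `dcpd.build_time_series` would be passed instead.

```python
def build_or_none(builder, *args, **kwargs):
    """Hypothetical wrapper: call builder, returning None if it raises ValueError."""
    try:
        return builder(*args, **kwargs)
    except ValueError:
        return None


def fake_build_time_series(place, stat_var, observation_period=None):
    """Offline stand-in for dcpd.build_time_series (assumption, not the real API)."""
    if observation_period == "P3Y":
        # Imitates the error shown in the docs for an over-constrained request.
        raise ValueError("No data in response.")
    return {"2018": 1468412}


constrained = build_or_none(
    fake_build_time_series, "geoId/06085", "Count_Person", observation_period="P3Y"
)
unconstrained = build_or_none(
    fake_build_time_series, "geoId/05", "Count_Person_Male"
)
```

With this pattern, a caller can retry without the optional constraint when the first attempt returns `None`.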