forked from datacommonsorg/docsite
Add docs for pandas API. (datacommonsorg#55)
Showing 16 changed files with 357 additions and 28 deletions.
@@ -0,0 +1,84 @@
---
layout: default
title: Pandas
nav_order: 3
parent: API
has_children: true
---
# Data Commons Pandas API

The **Data Commons Pandas API** is a superset of the Data Commons Python API:
all functions from the Python API are also accessible from
the Pandas API, and supplemental functions help with directly creating
[pandas](https://pandas.pydata.org/)
objects using data from the Data Commons knowledge graph for common pandas
use cases. Please see the [Data Commons API Overview](/api) for more details
on the design and structure of the API.

Before proceeding, make sure you have followed the setup instructions below.

## Getting Started

To get started using the Pandas API:

* Install the API using `pip`.
* (Optional) Create an API key and enable the **Data Commons API**.
* Begin developing with the Pandas API.

### Installing the Pandas API

First, install the `datacommons_pandas` package through `pip`.

```bash
$ pip install datacommons_pandas
```

For more information about installing `pip` and setting up other parts of
your Python development environment, please refer to the
[Python Development Environment Setup Guide](https://cloud.google.com/python/setup.html)
for Google Cloud Platform.

### Creating an API Key (Optional)

If you would like to provide an API key, follow the steps in [the API setup
guide](/api/setup.html). Data Commons *does not charge* users, but uses the
API key for understanding API usage.

With the API key created and the Data Commons API activated, you can now get
started using the Pandas API. There are two ways to provide your key
to the Pandas API package.

1. You can set the API key by calling `datacommons_pandas.set_api_key`.
   Start by importing `datacommons_pandas`, then set the API key like so.

   ```python
   import datacommons_pandas as dcpd

   dcpd.set_api_key('YOUR-API-KEY')
   ```

   This will create an environment variable in your Python runtime called
   `DC_API_KEY` holding your key. Your key will then be used whenever
   the package sends a request to the Data Commons graph.

2. You can export an environment variable in your shell like so.

   ```bash
   export DC_API_KEY='YOUR-API-KEY'
   ```

   After you've exported the variable, you can start using the Data Commons
   package.

   ```python
   import datacommons_pandas as dcpd
   ```

   This route is particularly useful if you are building applications that
   depend on this API, and are deploying them to hosting services.

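The two routes above can be combined in application code. The following is a minimal sketch, not part of the package: `resolve_api_key` is a hypothetical helper that prefers a key passed explicitly and otherwise falls back to the `DC_API_KEY` environment variable described above. It uses only the standard library, so it runs without the package installed.

```python
import os


def resolve_api_key(explicit_key=None, environ=None):
    """Return an API key, or None when no key is available.

    Hypothetical helper: an explicitly passed key wins, otherwise the
    DC_API_KEY environment variable is used. Blank values count as unset.
    """
    environ = os.environ if environ is None else environ
    key = explicit_key or environ.get("DC_API_KEY", "")
    key = key.strip()
    return key or None


# Example wiring (requires the datacommons_pandas package):
#   import datacommons_pandas as dcpd
#   key = resolve_api_key()
#   if key is not None:
#       dcpd.set_api_key(key)
```

Because the key is optional, the helper returns `None` instead of raising when nothing is configured.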
### Using the Pandas API

You are ready to go! From here you can view our [tutorials](/tutorials.html) on how to use the
API to perform certain tasks, or see a full list of functions, classes and
methods available for use in the sidebar.
@@ -0,0 +1,88 @@
---
layout: default
title: Multivariate Table as pd.DataFrame
nav_order: 3
parent: Pandas
grand_parent: API
---

# Get Multivariate DataFrame

## `datacommons_pandas.build_multivariate_dataframe(places, stat_vars)`

Returns a `pandas.DataFrame` with [`places`](https://datacommons.org/browser/Place)
as index and [`stat_vars`](https://datacommons.org/browser/StatisticalVariable)
as columns, where each cell is the latest observed statistic for
its `Place` and `StatisticalVariable`.

See the [full list of `StatisticalVariable`s](/statistical_variables.html).

**Arguments**

* `places (Iterable of str)`: A list of dcids of the
  [`Place`](https://datacommons.org/browser/Place)s to query for.

* `stat_vars (Iterable of str)`: A list of dcids of the
  [`StatisticalVariable`](https://datacommons.org/browser/StatisticalVariable)s
  to query for.

**Returns**

A `pandas.DataFrame` with [`places`](https://datacommons.org/browser/Place)
(str)
as index and [`stat_vars`](https://datacommons.org/browser/StatisticalVariable)
(str) as columns, where each cell is the latest observed statistic (float) for
its `Place` and `StatisticalVariable`.

**Raises**

* `ValueError` - If no statistical values are found for the given parameters.

Be sure to initialize the library. See the
[datacommons_pandas library setup guide](/api/pandas/) for more details.

You can find a list of `StatisticalVariable`s with human-readable names [here](/statistical_variables.html).

## Examples

We would like to get a DataFrame of

- [Count_Person](https://datacommons.org/browser/Count_Person)
- [Median_Age_Person](https://datacommons.org/browser/Median_Age_Person)
- [UnemploymentRate_Person](https://datacommons.org/browser/UnemploymentRate_Person)

for
[the United States](https://datacommons.org/browser/country/USA),
[California](https://datacommons.org/browser/geoId/06), and
[Santa Clara County](https://datacommons.org/browser/geoId/06085).

```python
>>> import datacommons_pandas as dcpd
>>> dcpd.build_multivariate_dataframe(["country/USA", "geoId/06", "geoId/06085"],
                                      ["Count_Person", "Median_Age_Person", "UnemploymentRate_Person"])
             Count_Person  Median_Age_Person  UnemploymentRate_Person
place
country/USA     328239523               37.9                      NaN
geoId/06         39512223               36.3                     15.1
geoId/06085       1927852               37.0                     10.7
```
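The result above contains a `NaN` where a place has no value for a variable. As a rough sketch of how such gaps might be handled with ordinary pandas calls, the block below rebuilds the table locally from the values shown above (no request is sent to Data Commons) and separates complete rows from incomplete ones. The variable names `incomplete_places` and `complete` are illustrative only.

```python
import pandas as pd

# Values copied from the example output above; this is local data,
# not a live call to build_multivariate_dataframe.
df = pd.DataFrame(
    {
        "Count_Person": [328239523, 39512223, 1927852],
        "Median_Age_Person": [37.9, 36.3, 37.0],
        "UnemploymentRate_Person": [float("nan"), 15.1, 10.7],
    },
    index=pd.Index(["country/USA", "geoId/06", "geoId/06085"], name="place"),
)

# Places that are missing at least one statistic.
incomplete_places = df.index[df.isna().any(axis=1)].tolist()

# Keep only the places that have a value for every variable.
complete = df.dropna()

print(incomplete_places)
print(complete)
```

Whether to drop, fill, or keep the missing cells depends on the analysis; `dropna` is only one option.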

In the next example, there is no data about
`RetailDrugDistribution_DrugDistribution_14Hydroxycodeinone` or
`RetailDrugDistribution_DrugDistribution_Amphetamine` for non-USA
places, so the API raises a `ValueError`:

```python
>>> import datacommons_pandas as dcpd
>>> dcpd.build_multivariate_dataframe(
        ["country/MEX", "nuts/AT32"],
        ["RetailDrugDistribution_DrugDistribution_14Hydroxycodeinone",
         "RetailDrugDistribution_DrugDistribution_Amphetamine"
        ]
    )
ValueError                                Traceback (most recent call last)
...
--> raise ValueError('No data for any of specified places and stat_vars.')

ValueError: No data for any of specified places and stat_vars.
```
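When a request fails like this, it can help to find out which variables are responsible. The sketch below is one possible approach, not part of the package: `split_by_availability` is a hypothetical helper that calls a builder once per variable and relies only on the documented behavior that a `ValueError` is raised when no data is found. The builder is passed in as a callable, so the helper can be tried with a stand-in function.

```python
def split_by_availability(build, places, stat_vars):
    """Split stat_vars into (available, missing) for the given places.

    `build` is any callable with the shape build(places, stat_vars) that
    raises ValueError when it finds no data, for example
    datacommons_pandas.build_multivariate_dataframe.
    Note: this issues one call per variable.
    """
    available, missing = [], []
    for stat_var in stat_vars:
        try:
            build(places, [stat_var])
        except ValueError:
            missing.append(stat_var)
        else:
            available.append(stat_var)
    return available, missing


# Example wiring (requires the datacommons_pandas package and network access):
#   import datacommons_pandas as dcpd
#   ok, no_data = split_by_availability(
#       dcpd.build_multivariate_dataframe,
#       ["country/MEX", "nuts/AT32"],
#       ["Count_Person", "RetailDrugDistribution_DrugDistribution_Amphetamine"],
#   )
```

Because each variable triggers its own request, this is better suited to debugging than to routine use.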
@@ -0,0 +1,74 @@
---
layout: default
title: Time Series as pd.Series
nav_order: 1
parent: Pandas
grand_parent: API
---

# Get Time Series for a Place

## `datacommons_pandas.build_time_series(place, stat_var, measurement_method=None, observation_period=None, unit=None, scaling_factor=None)`

Returns a `pandas.Series` representing a time series for the [`place`](https://datacommons.org/browser/Place) and
[`stat_var`](https://datacommons.org/browser/StatisticalVariable) satisfying any optional parameters.

See the [full list of `StatisticalVariable`s](/statistical_variables.html).

**Arguments**

* `place (str)`: The `dcid` of the [`Place`](https://datacommons.org/browser/Place) to query for.

* `stat_var (str)`: The `dcid` of the
  [`StatisticalVariable`](https://datacommons.org/browser/StatisticalVariable).

* `measurement_method (str)`: (Optional) The `dcid` of the preferred [`measurementMethod`](https://datacommons.org/browser/measurementMethod) for the `stat_var`.

* `observation_period (str)`: (Optional) The preferred [`observationPeriod`](https://datacommons.org/browser/observationPeriod) for the `stat_var`. This is an [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations) such as "P1M" (one month).

* `unit (str)`: (Optional) The `dcid` of the preferred [`unit`](https://datacommons.org/browser/unit) for the `stat_var`.

* `scaling_factor (int)`: (Optional) The preferred [`scalingFactor`](https://datacommons.org/browser/scalingFactor) for the `stat_var`.

**Returns**

A `pandas.Series` with dates (str) as index for observed values (float) for the `stat_var` and `place`.

**Raises**

* `ValueError` - If no statistical value is found for the place with the given parameters.

Be sure to initialize the library. Check the [datacommons_pandas library setup guide](/api/pandas/) for more details.

You can find a list of `StatisticalVariable`s with human-readable names [here](/statistical_variables.html).

## Examples

We would like to get the [male population](https://datacommons.org/browser/Count_Person_Male) in [Arkansas](https://datacommons.org/browser/geoId/05):

```python
>>> import datacommons_pandas as dcpd
>>> dcpd.build_time_series("geoId/05", "Count_Person_Male")
2015    1451913
2016    1456694
2017    1461651
2018    1468412
2011    1421287
2012    1431252
2013    1439862
2014    1447235
dtype: int64
```
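The series in the example above is not in chronological order. A minimal sketch of putting it in order with standard pandas calls follows. The data is copied from the output above into a local `pandas.Series` (no request is sent), and the names `ordered` and `yearly_change` are illustrative.

```python
import pandas as pd

# Values copied from the example output above, in the order returned.
series = pd.Series(
    [1451913, 1456694, 1461651, 1468412, 1421287, 1431252, 1439862, 1447235],
    index=["2015", "2016", "2017", "2018", "2011", "2012", "2013", "2014"],
)

# The index holds date strings; four-digit years sort correctly as text.
ordered = series.sort_index()

# Year-over-year change, dropping the first year, which has no predecessor.
yearly_change = ordered.diff().dropna()

print(ordered)
print(yearly_change)
```

Sorting as text works here because every date is a four-digit year. For mixed date granularities, converting the index with `pd.to_datetime` first may be a safer choice.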

In the next example, the parameter `observation_period='P3Y'` overly constrains the request, so no
observations match and the API raises a `ValueError`:

```python
>>> import datacommons_pandas as dcpd
>>> dcpd.build_time_series('geoId/06085', 'Count_Person', observation_period='P3Y')
ValueError                                Traceback (most recent call last)
...
--> raise ValueError('No data in response.')

ValueError: No data in response.
```
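One way to cope with an over-constrained request is to retry without the optional parameters. The sketch below assumes only the documented behavior that a `ValueError` is raised when no data matches. `build_with_fallback` is a hypothetical helper, not part of the package, and it takes the builder as a callable so it can be exercised with a stand-in.

```python
def build_with_fallback(build, place, stat_var, **constraints):
    """Call build with the optional constraints, then retry without them.

    `build` is any callable shaped like datacommons_pandas.build_time_series.
    Returns (result, constrained), where `constrained` is True when the
    constrained request succeeded and False when the fallback was used.
    """
    try:
        return build(place, stat_var, **constraints), True
    except ValueError:
        if not constraints:
            # Nothing to relax: the place/stat_var pair itself has no data.
            raise
        return build(place, stat_var), False


# Example wiring (requires the datacommons_pandas package and network access):
#   import datacommons_pandas as dcpd
#   series, constrained = build_with_fallback(
#       dcpd.build_time_series, "geoId/06085", "Count_Person",
#       observation_period="P3Y",
#   )
```

The returned flag lets the caller tell whether the data actually honors the requested constraints, which matters if the relaxed series uses a different period or unit.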