Overhaul for new API key process (#500)
* troubleshooting API

* Revert to gerund

Though the style guide says to just use imperatives, "get started" just sounds weird. Also this is more consistent with "troubleshooting"

* repoint sections on finding DCs

* add additional title

* merge

* added info about getting keys to API docs

* add custom DC info:

* update troubleshooting page

* try to fix nav order

* fix up mistakes from earlier merge

* more fixes

* fix lowercase v everywhere

* fix erroneous change to v1 file

* remove references to Apigee everywhere

* remove links to DataGemma colab tutorials

* added link to DataGemma docs

* fix devsite URL

* merge

* updated hostname and naming

* reworded about separate keys

* fix broken link

* fix broken link again
kmoscoe authored Sep 11, 2024
1 parent 0176775 commit 880ee05
Showing 17 changed files with 283 additions and 271 deletions.
42 changes: 36 additions & 6 deletions api/index.md
@@ -1,25 +1,24 @@
---
layout: default
title: API
nav_order: 20
nav_order: 0
has_children: true
---


# Overview
# API overview

[Data Commons](https://datacommons.org){: target="_blank"} aggregates data from many
different [data sources](https://datacommons.org/datasets){: target="_blank"} into a single
database. Data Commons is based on the data model used by
[schema.org](https://schema.org){: target="_blank"}; for more information, see [Key concepts](/data_model.html).

The Data Commons APIs allow developers to programmatically access the data in Data Commons.
Data Commons provides several different ways to access its resources:
The Data Commons APIs allow developers to programmatically access the data in Data Commons, using the following technologies:

* A [REST API](/api/rest/v2) that can be used on the command line as well as in any language with an HTTP library.
* [Python](/api/python) and [Pandas](/api/pandas) wrappers.

> **Note:** The Python and Pandas APIs wrap the [v1](/api/rest/v1) version of the REST APIs and have not yet been updated to v2.
> **Note:** The Python and Pandas APIs wrap the [V1](/api/rest/v1) version of the REST APIs and have not yet been updated to V2.
The endpoints can be roughly grouped into four categories:

@@ -33,4 +32,35 @@ The endpoints can be roughly grouped into four categories:
graph query language [SPARQL](https://www.w3.org/TR/rdf-sparql-query/){: target="_blank"}. This is useful for complex node connections which would require multiple API calls; for example, "hate crimes motivated by disability status in Californian cities".

- **Utilities**: These are Python notebook-specific APIs for helping with
Pandas DataFrames, etc.
Pandas DataFrames, etc.

Data Commons also provides tools for accessing its data that call the REST APIs under the hood:

- [Google Sheets](sheets/index.md): provides several custom functions that populate spreadsheets with data from the Data Commons knowledge graph
- [Web Components](web_components/index.md): provides JavaScript APIs and HTML templates that allow you to embed Data Commons data and visualizations into web pages


## API keys {: #get-key}

An API key is required to authenticate and authorize the following requests:
- All REST [V2](rest/v2/index.md) and [V1](rest/v1/index.md) APIs. These requests are served by endpoints at `api.datacommons.org`.
- All requests coming from a custom Data Commons instance. These are also served by `api.datacommons.org`.
- Data Commons NL API requests (used by the [DataGemma](https://ai.google.dev/gemma/docs/datagemma){: target="_blank"} tool). These are served by endpoints at `nl.datacommons.org`.

A key is currently not required for the following, although this may change in the future:
- Python and Pandas client libraries other than NL APIs
- V0 REST APIs
- Google Sheets
- Web Components

### Obtain an API key

Data Commons API keys are managed by a self-service portal. To obtain an API key, go to [https://apikeys.datacommons.org](https://apikeys.datacommons.org){: target="_blank"} and request a key for the hostname(s) listed above. Enable each of the APIs you want; you can share a single key for all of them.

To use the key in requests, see the relevant documentation:
- For REST V2 APIs, see the section on [Authentication](/api/rest/v2/index.html#authentication).
- For REST V1 APIs, see the section on [Authentication](/api/rest/v1/getting_started.html#authentication).
- For NL APIs in DataGemma, see the Colab notebooks in [https://github.com/datacommonsorg/llm-tools/tree/main/notebooks](https://github.com/datacommonsorg/llm-tools/tree/main/notebooks){: target="_blank"}



6 changes: 3 additions & 3 deletions api/pandas/index.md
@@ -1,7 +1,7 @@
---
layout: default
title: Pandas
nav_order: 30
nav_order: 50
parent: API
has_children: true
---
@@ -15,7 +15,7 @@ the Pandas API, and supplemental functions help with directly creating
objects using data from the Data Commons knowledge graph for common
use cases.

> **Note:** The Pandas API only supports [v1](/api/rest/v1/index.html) of the REST APIs.
> **Note:** The Pandas API only supports [V1](/api/rest/v1/index.html) of the REST APIs.
Before proceeding, make sure you have followed the setup instructions below.

@@ -27,7 +27,7 @@ Before proceeding, make sure you have followed the setup instructions below.
```bash
$ pip install datacommons_pandas
```
You are ready to go! You can view our [tutorials](tutorials.md) on how to use the
You are ready to go! You can view our [tutorials](/api/python/tutorials.html) on how to use the
API to perform certain tasks using [Google Colab](https://colab.sandbox.google.com/){: target="_blank"}, or refer to pages in the navigation bar for detailed information about all the methods available.

## Run Python interactively
4 changes: 2 additions & 2 deletions api/python/index.md
@@ -1,7 +1,7 @@
---
layout: default
title: Python
nav_order: 20
nav_order: 40
parent: API
has_children: true
---
@@ -13,7 +13,7 @@ programmatically access nodes in the Data Commons knowledge graph. This package
allows users to explore the structure of the graph, integrate statistics from
the graph into data analysis workflows and much more.

> **Note:** The Python API only supports [v1](/api/rest/v1) of the REST APIs.
> **Note:** The Python API only supports [V1](/api/rest/v1) of the REST APIs.
Before proceeding, make sure you have followed the setup instructions below.

2 changes: 1 addition & 1 deletion api/python/query.md
@@ -12,7 +12,7 @@ Returns the results of running a graph query on the Data Commons knowledge graph
using [SPARQL](https://www.w3.org/TR/rdf-sparql-query/){: target="_blank"}. Note that Data Commons is only
able to support a limited subsection of SPARQL functionality at this time: specifically only the keywords `ORDER BY`, `DISTINCT`, and `LIMIT`.

Note: The Python SPARQL library currently only supports the [v1](/api/v1/query.html) version of the API.
Note: The Python SPARQL library currently only supports the [V1](/api/v1/query.html) version of the API.

## General information about the query() method

14 changes: 7 additions & 7 deletions api/python/tutorials.md
@@ -17,16 +17,16 @@ Python:

- [Getting Started: Analyzing Census Data](https://colab.research.google.com/github/datacommonsorg/api-python/blob/master/notebooks/analyzing_census_data.ipynb){:target="_blank"}

- [Case Study: COVID-19 Feature Exploration Analysis (by Google Health)](https://colab.research.google.com/github/datacommonsorg/api-python/blob/master/notebooks/COVID_19_Feature_Exploration_Analysis_with_Data_Commons.ipynb){:target="_blank"}
- [COVID-19 Feature Exploration Analysis (by Google Health)](https://colab.research.google.com/github/datacommonsorg/api-python/blob/master/notebooks/COVID_19_Feature_Exploration_Analysis_with_Data_Commons.ipynb){:target="_blank"}

- [Case Study: Analyzing Income Distribution](https://colab.research.google.com/github/datacommonsorg/api-python/blob/master/notebooks/analyzing_income_distribution.ipynb){:target="_blank"}
- [Analyzing Income Distribution](https://colab.research.google.com/github/datacommonsorg/api-python/blob/master/notebooks/analyzing_income_distribution.ipynb){:target="_blank"}

- [Case Study: Prevalence of Obesity in 500 US Cities](https://colab.research.google.com/github/datacommonsorg/api-python/blob/master/notebooks/analyzing_obesity_prevalence.ipynb){:target="_blank"}
- [Prevalence of Obesity in 500 US Cities](https://colab.research.google.com/github/datacommonsorg/api-python/blob/master/notebooks/analyzing_obesity_prevalence.ipynb){:target="_blank"}

- [Case Study: Analyzing Genomic Data with Biomedical Data Commons](https://colab.research.google.com/github/datacommonsorg/api-python/blob/master/notebooks/analyzing_genomic_data.ipynb){:target="_blank"}
- [Analyzing Genomic Data with Biomedical Data Commons](https://colab.research.google.com/github/datacommonsorg/api-python/blob/master/notebooks/analyzing_genomic_data.ipynb){:target="_blank"}

- [Case Study: Drug Discovery with Biomedical Data Commons](https://colab.research.google.com/github/datacommonsorg/api-python/blob/master/notebooks/Drug_Discovery_With_Data_Commons.ipynb){:target="_blank"}
- [Drug Discovery with Biomedical Data Commons](https://colab.research.google.com/github/datacommonsorg/api-python/blob/master/notebooks/Drug_Discovery_With_Data_Commons.ipynb){:target="_blank"}

- [Case Study: Analyzing Superfund Sites with Data Commons](https://colab.sandbox.google.com/github/datacommonsorg/api-python/blob/master/notebooks/Analyzing_SuperfundSites_with_Data_Commons.ipynb){:target="_blank"}
- [Analyzing Superfund Sites with Data Commons](https://colab.sandbox.google.com/github/datacommonsorg/api-python/blob/master/notebooks/Analyzing_SuperfundSites_with_Data_Commons.ipynb){:target="_blank"}

- [Case Study: Estimating CMIP6 Temperature Distributions](https://colab.sandbox.google.com/github/datacommonsorg/api-python/blob/master/notebooks/Estimating_(Temperature)_Distributions_With_DataCommons_.ipynb){:target="_blank"}
- [Estimating CMIP6 Temperature Distributions](https://colab.sandbox.google.com/github/datacommonsorg/api-python/blob/master/notebooks/Estimating_(Temperature)_Distributions_With_DataCommons_.ipynb){:target="_blank"}
3 changes: 1 addition & 2 deletions api/rest/v1/getting_started.md
@@ -157,8 +157,7 @@ We've provided a trial API key for general public use. This key will let you try
`AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI`
</div>

<b>The trial key is capped with a limited quota for requests.</b> If you are planning on using our APIs more rigorously (e.g. for personal or school projects, developing applications, etc.) please request one by
[filling out this form](https://docs.google.com/forms/d/e/1FAIpQLSeVCR95YOZ56ABsPwdH1tPAjjIeVDtisLF-8oDYlOxYmNZ7LQ/viewform) and selecting "API access" to request an official key without any quota limits. We'll be happy to hear from you!
<b>The trial key is capped with a limited quota for requests.</b> If you are planning on using our APIs more rigorously (e.g. for personal or school projects, developing applications, etc.) please go to the portal at https://apikeys.datacommons.org and request a key for `api.datacommons.org`.

### Pagination
{: #pagination}
226 changes: 3 additions & 223 deletions api/rest/v2/getting_started.md
@@ -1,225 +1,5 @@
---
layout: default
title: Getting started
nav_order: 0
parent: REST (v2)
grand_parent: API
published: true
layout: redirect
redirect: /api/rest/v2/index.html
nav_exclude: true
---

{:.no_toc}
# Getting started

* TOC
{:toc}

Following HTTP, a REST API call consists of a _request_ that you provide, and a _response_ from the Data Commons servers with the data you requested, in [JSON](https://json.org){: target="_blank"} format. The following sections detail how to assemble a request.

## Service endpoints

You make requests through [API endpoints](https://en.wikipedia.org/wiki/Web_API#Endpoints){: target="_blank"}. You access each endpoint using its unique URL, which is a combination of a base URL and the endpoint's [URI](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier){: target="_blank"}.

The base URL for all REST endpoints is:

<pre>
https://api.datacommons.org/<var>VERSION</var>
</pre>

The current version is `v2`.

To access a particular endpoint, append the URI to the base URL (e.g. `https://api.datacommons.org/v2/node` ).
The URIs for the V2 API are below:

| API | URI path | Description |
| --- | --- | ----------- |
| Node | [/node](/api/rest/v2/node) | Fetches information about edges and neighboring nodes |
| Observation | [/observation](/api/rest/v2/observation) | Fetches statistical observations |
| Resolve entities | [/resolve](/api/rest/v2/resolve) | Returns a Data Commons ID ([`DCID`](/glossary.html#dcid)) for entities in the graph |
| SPARQL | [/sparql](/api/rest/v2/sparql) | Returns matches to a [SPARQL](https://www.w3.org/TR/rdf-sparql-query/){: target="_blank"} graph query |

### Endpoints for custom instances

If you are running your own Data Commons, the URL/URI endpoints are slightly different:

<pre>
<var>CUSTOM_URL</var>/core/api/v2
</pre>

## Query parameters {#query-param}

Endpoints take a set of parameters which allow you to specify the entities, variables, timescales, etc. you are interested in. The V2 APIs only use query parameters.

Query parameters are chained at the end of a URL behind a `?` symbol. Separate multiple parameter entries with an `&` symbol. For example, this would look like:

<pre>
https://api.datacommons.org/v2/node?key=<var>API_KEY</var>&nodes=<var>DCID1</var>&nodes=<var>DCID2</var>&property=<-*
</pre>
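
As a rough illustration of the chaining rules above, the following Python sketch assembles a GET URL with a repeated `nodes` parameter. The helper name `build_get_url` and the placeholder key `API_KEY` are assumptions made for this example, not part of the Data Commons documentation. Note that `urlencode` percent-encodes reserved characters such as `/`, `<` and `*`.

```python
from urllib.parse import urlencode

# Base URL as given in the "Service endpoints" section.
BASE_URL = "https://api.datacommons.org/v2"


def build_get_url(endpoint, api_key, nodes, prop):
    """Hypothetical helper: chain query parameters behind '?' separated by '&'."""
    # A list of pairs lets the same parameter name ("nodes") appear more than once.
    params = [("key", api_key)]
    params += [("nodes", dcid) for dcid in nodes]
    params.append(("property", prop))
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


# "API_KEY" is a placeholder, not a working key.
url = build_get_url("node", "API_KEY", ["geoId/06085", "geoId/06086"], "<-*")
print(url)
```

Passing a list of pairs rather than a dictionary is what allows `nodes` to be repeated once per DCID.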

Still confused? Each endpoint's documentation page has examples at the bottom tailored to the endpoint you're trying to use.

## POST requests

All V2 endpoints allow for POST requests. For POST requests, feed all parameters in JSON format. For example, in cURL, this would look like:

<pre>
curl -X POST \
-H "X-API-Key: <var>API_KEY</var>" \
--url https://api.datacommons.org/v2/node \
--data '{
"nodes": [
"geoId/06085",
"geoId/06086"
],
"property": "->[name, latitude, longitude]"
}'
</pre>
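
For readers working in Python rather than cURL, a minimal sketch of the same POST request using only the standard library might look like this. The function name `build_node_post` and the `Content-Type` header are assumptions added for illustration; the source only specifies the `X-API-Key` header and the JSON body. The request is built but not sent.

```python
import json
import urllib.request


def build_node_post(api_key, nodes, prop):
    """Hypothetical helper: build (but do not send) a V2 /node POST request."""
    # All parameters go in the JSON body, as in the cURL example.
    body = json.dumps({"nodes": nodes, "property": prop}).encode("utf-8")
    return urllib.request.Request(
        "https://api.datacommons.org/v2/node",
        data=body,
        headers={
            "X-API-Key": api_key,  # key passed as a header
            "Content-Type": "application/json",  # assumption, not stated in the source
        },
        method="POST",
    )


# "API_KEY" is a placeholder, not a working key.
req = build_node_post(
    "API_KEY", ["geoId/06085", "geoId/06086"], "->[name, latitude, longitude]"
)
# To actually send it: urllib.request.urlopen(req)
```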


{: #authentication}
## Authentication

API keys are required for every REST API request. To obtain an API key, please see [Get API key](/api/rest/v2/index.html#get-key).

> **Note:** If you are sending API requests to a custom Data Commons instance, do _not_ include any API key in the requests.

To include an API key, add your API key to the URL as a query parameter by appending <code>?key=<var>API_KEY</var></code>.

For GET requests, this looks like:

<pre>
https://api.datacommons.org/v2/<var>ENDPOINT</var>?key=<var>API_KEY</var>
</pre>

If the key is not the first query parameter, use <code>&key=<var>API_KEY</var></code> instead. This looks like:

<pre>
https://api.datacommons.org/v2/<var>ENDPOINT</var>?<var>QUERY</var>=<var>VALUE</var>&key=<var>API_KEY</var>
</pre>

For POST requests, pass the key as a header. For example, in cURL, this looks like:

<pre>
curl -X POST \
--url https://api.datacommons.org/v2/node \
--header 'X-API-Key: <var>API_KEY</var>' \
--data '{
"nodes": [
"<var>ENTITY_DCID_1</var>",
"<var>ENTITY_DCID_2</var>",
...
],
"property": "<var>RELATION_EXPRESSION</var>"
}'
</pre>

{: #pagination}
## Pagination

When the response to a request is too long, the returned payload is
_paginated_. Only a subset of the response is returned, along with a long string
of characters called a _token_. To get the next set of entries, repeat the
request with `nextToken` as a query parameter, with the token as its value.

For example, the request:

```bash
curl --request GET \
'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=geoId/06&property=<-*'
```

will return something like:

```json
{
"data": {
"geoId/06": {
"arcs": < ... output truncated for brevity ...>
}
},
"nextToken": "SoME_veRy_L0ng_S+rIng"
}
```

To get the next set of entries, repeat the previous command and append the `nextToken`:

```bash
curl --request GET \
'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=geoId/06&property=<-*&nextToken=SoME_veRy_L0ng_S+rIng'
```

Similarly for POST requests, this would look like:

```bash
curl -X POST \
-H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \
--url https://api.datacommons.org/v2/node \
--data '{
"nodes": "geoId/06",
"property": "<-*",
"nextToken": "SoME_veRy_L0ng_S+rIng"
}'
```
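
The pagination loop described above can be sketched in Python as follows. This is an illustration under stated assumptions: `fetch_page` is a hypothetical callable that sends one request body and returns the parsed JSON response, and it is replaced here by a stub (`fake_fetch`) so the example runs without network access or an API key.

```python
def fetch_all_pages(fetch_page, request):
    """Repeat a request, passing nextToken back, until no token is returned.

    fetch_page is a hypothetical callable: dict (request body) -> dict (parsed JSON).
    """
    pages = []
    body = dict(request)
    while True:
        response = fetch_page(body)
        pages.append(response.get("data", {}))
        token = response.get("nextToken")
        if not token:
            return pages
        # Same request as before, plus the token from the previous response.
        body = {**request, "nextToken": token}


def fake_fetch(body):
    """Stub standing in for a real HTTP call; returns two made-up pages."""
    if "nextToken" not in body:
        return {"data": {"geoId/06": {"arcs": {"page": 1}}}, "nextToken": "TOKEN_1"}
    return {"data": {"geoId/06": {"arcs": {"page": 2}}}}


pages = fetch_all_pages(fake_fetch, {"nodes": ["geoId/06"], "property": "<-*"})
print(len(pages))
```

Injecting the fetch function keeps the loop independent of any particular HTTP client.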

{: #relation-expressions}
## Relation expressions

Data Commons represents real world entities and data as nodes. These
nodes are connected by directed edges, or arcs, to form a knowledge graph. The
label of the arc is the name of the [property](/glossary.html#property).

Relation expressions include arrow annotation and other symbols in the syntax to
represent neighboring nodes, and to support chaining and filtering.
These new expressions allow all of the functionality of the V1 API to be
expressed with fewer API endpoints in V2. All V2 API calls require relation
expressions in the `property` or `expression` parameter.

The following table describes symbols in the V2 API relation expressions:

| Symbol | Description |
| ------ | ---------- |
| `->` | An outgoing arc |
| `<-` | An incoming arc |
| <code>{<var>PROPERTY</var>:<var>VALUE</var>}</code> | Filtering; identifies the property and associated value |
| `[]` | Multiple properties, separated by commas |
| `*` | All properties linked to this node |
| `+` | One or more expressions chained together for indirect relationships, like `containedInPlace+{typeOf:City}` |

### Incoming and outgoing arcs

Arcs in the Data Commons Graph have directions. In the example below, for the node [Argentina](https://datacommons.org/browser/country/ARG){: target="_blank"}, the property `containedInPlace` exists in both in and out directions, illustrated in the following figure:

![](/assets/images/rest/property_value_direction_example.png)

Note the directionality of the property `containedInPlace`: the incoming arc represents "Argentina contains Buenos Aires", while the outgoing arc represents "Argentina is in South America".

Nodes for outgoing arcs are represented by `->`, while nodes for incoming
arcs are represented by `<-`. To illustrate using the above example:

- Regions that include Argentina (DCID: `country/ARG`): `country/ARG->containedInPlace`
- All cities directly contained in Argentina (DCID: `country/ARG`): `country/ARG<-containedInPlace{typeOf:City}`

### Filters

You can use filters to reduce results to only match nodes with a specified property and value. Use {} to specify property:value pairs to define the filter. Using the same example, `country/ARG<-containedInPlace+{typeOf:City}` only returns nodes with the `typeOf:City`, filtering out `typeOf:AdministrativeArea1` and so on.

### Specify multiple properties

You can combine multiple properties together within `[]`. For example, to request a few outgoing arcs for a node, use
`->[name, latitude, longitude]`. See more in this [Node API example](/api/rest/v2/node.html#multiple-properties).

### Wildcard

To retrieve all properties linked to a node, use the `*` wildcard, e.g. `<-*`.
See more in this [Node API example](/api/rest/v2/node.html#wildcard).

### Chain properties

Use `+` to express a chain expression. A chain expression represents requests for information about nodes
which are connected by the same property, but are a few hops away. This is supported only for the `containedInPlace` property.

To illustrate again using the Argentina example:
- All cities directly contained in Argentina (dcid: `country/ARG`): `country/ARG<-containedInPlace{typeOf:City}`
- All cities indirectly contained in Argentina (dcid: `country/ARG`): `country/ARG<-containedInPlace+{typeOf:City}`
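
To make the syntax concrete, here is a small Python sketch that composes relation expressions from the symbols described in this section. The function `relation_expression` and its parameters are hypothetical names invented for this example; the API itself simply takes the resulting string.

```python
def relation_expression(direction, properties, chain=False, filter_=None):
    """Hypothetical helper: compose a V2 relation expression string.

    direction:  "->" (outgoing arc) or "<-" (incoming arc)
    properties: one property name, "*", or a list of property names
    chain:      append "+" to follow the property over multiple hops
    filter_:    optional (property, value) pair rendered as {property:value}
    """
    if direction not in ("->", "<-"):
        raise ValueError("direction must be '->' or '<-'")
    if isinstance(properties, str):
        expr = properties
    else:
        # Multiple properties are wrapped in [] and separated by commas.
        expr = "[" + ", ".join(properties) + "]"
    if chain:
        expr += "+"
    if filter_ is not None:
        prop, value = filter_
        expr += "{" + f"{prop}:{value}" + "}"
    return direction + expr


direct = relation_expression("<-", "containedInPlace", filter_=("typeOf", "City"))
indirect = relation_expression(
    "<-", "containedInPlace", chain=True, filter_=("typeOf", "City")
)
several = relation_expression("->", ["name", "latitude", "longitude"])
everything = relation_expression("<-", "*")
```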

## Escape codes for reserved characters in GET requests

HTTP GET requests do not allow some of the characters used by Data Commons DCIDs and relation expressions. When sending GET requests, you may need to use the [corresponding percent codes](https://en.wikipedia.org/wiki/Percent-encoding){: target="_blank"} for reserved characters.
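
In Python, percent-encoding can be done with the standard library, as in the sketch below. The helper name `encode_component` is an assumption for this example, and encoding every reserved character (including `/`) is a conservative choice made here; the source does not say which characters the servers strictly require to be escaped.

```python
from urllib.parse import quote, unquote


def encode_component(value):
    """Percent-encode a single query parameter value, leaving no reserved characters."""
    return quote(value, safe="")


# A sample string combining a DCID with a relation expression from this page.
expression = "country/ARG<-containedInPlace+{typeOf:City}"
encoded = encode_component(expression)
print(encoded)

# Decoding recovers the original string.
assert unquote(encoded) == expression
```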