Add API search doc as markdown (#172)
diversemix authored Oct 12, 2023
1 parent 1870874 commit 2b2cd9e
Showing 1 changed file with 270 additions and 0 deletions.
270 changes: 270 additions & 0 deletions docs/api/search.md
@@ -0,0 +1,270 @@
# Search API Specification

**3<sup>rd</sup> October 2023**

***


# **VERSION HISTORY**

| Version | Description | Date | Author |
| --- | --- | --- | --- |
| 1.0 | Initial version based on [v1.3.15-beta](https://github.com/climatepolicyradar/navigator-backend/releases/tag/v1.3.15-beta). | 3/10/2023 | Peter Hooper |


# **PURPOSE**

This document is intended to explain the use of our search API for external developers and integrators. 


# **BACKGROUND**

The API is a typical REST API in which requests and responses are encoded as `application/json`.


# **SEARCH ENDPOINT**

| | |
| :------: | ---------------- |
| **POST** | /api/v1/searches |

There is **_no_** authentication required for using this interface. 

❗ We ask that users be respectful of its use and remind users that data is available to download on request. 

The search endpoint behaves in two distinct ways:

1. In “Browse” mode, when an empty `query_string` is provided. This mode does not use OpenSearch; instead it queries the structured data (PostgreSQL) directly, using the other supplied filter fields.
2. In “Search” mode, when a non-empty `query_string` is provided. A query is constructed and sent to OpenSearch, and the response is augmented with the structured data before being returned in the same response schema.
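
As a minimal sketch of the rule above, a client could predict which mode a payload will trigger. The helper name `search_mode` is hypothetical and not part of the API, and the sketch assumes that only a missing, null or zero-length `query_string` counts as empty.

```python
def search_mode(payload: dict) -> str:
    """Return "browse" for an empty query_string, otherwise "search".

    Assumption: only a missing, null or zero-length string is "empty".
    """
    query = payload.get("query_string") or ""
    return "search" if query else "browse"


print(search_mode({"query_string": ""}))                     # browse
print(search_mode({"query_string": "Adaptation strategy"}))  # search
```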


## **Request Payload**

The payload is a JSON object representing the search to be performed. This can be seen in [code here](https://github.com/climatepolicyradar/navigator-backend/blob/ddebbd17f6b62cf7909e6e4c575285b8b00a41b2/app/api/api_v1/schemas/search.py#L62) and is described in further detail below.

```
{
  "query_string": "string",
  "exact_match": false,
  "max_passages_per_doc": 10,
  "keyword_filters": {
    "additionalProp1": ["string"],
    "additionalProp2": ["string"],
    "additionalProp3": ["string"]
  },
  "year_range": ["string","string"],
  "sort_field": "date",
  "sort_order": "desc",
  "include_results": ["pdfsTranslated"],
  "limit": 10,
  "offset": 0
}
```

### Properties

#### query_string

A string representation of the search to be performed, for example “Adaptation strategy”.

#### exact_match

Boolean value indicating whether the `query_string` should be treated as an exact match when the search is performed.

#### max_passages_per_doc (optional, default is 10)

The maximum number of matched passages to be returned for a single document.

#### keyword_filters (optional)

This is an object containing a map of fields and their values to filter on. The allowed fields can be found in [code here](https://github.com/climatepolicyradar/navigator-backend/blob/ddebbd17f6b62cf7909e6e4c575285b8b00a41b2/app/api/api_v1/schemas/search.py#L34).

#### year_range

This is an array containing exactly two values, each of which can be null or an integer, giving the range of years to search between. Examples:

`[2000, 2023]` - Would search between 2000 and 2023 inclusive.

`[null, 2023]` - Would search from 1947 up to 2023 inclusive.

`[2000, null]` - Would search from 2000 to the current date.

`[null, null]` - Does not filter by date.

Further information and understanding can be found by [reading the tests here](https://github.com/climatepolicyradar/navigator-backend/blob/ddebbd17f6b62cf7909e6e4c575285b8b00a41b2/tests/routes/test_search.py#L634).
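
The shape described above (exactly two entries, each null or an integer) can be checked before a request is sent. The following is a client-side sketch only; `normalise_year_range` is a hypothetical helper, and the backend's own validation may differ.

```python
import json
from typing import List, Optional, Sequence


def normalise_year_range(year_range: Sequence[Optional[int]]) -> List[Optional[int]]:
    """Check the two-value shape and return a JSON-ready list."""
    if len(year_range) != 2:
        raise ValueError("year_range must contain exactly two values")
    for value in year_range:
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            raise ValueError(f"expected an integer year or None, got {value!r}")
    return list(year_range)


# None is serialised as JSON null.
print(json.dumps({"year_range": normalise_year_range((None, 2023))}))
# {"year_range": [null, 2023]}
```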

#### sort_field (optional) & sort_order (optional, defaults to descending)

The field to sort by can be either “date” or “title” ([see related code](https://github.com/climatepolicyradar/navigator-backend/blob/ddebbd17f6b62cf7909e6e4c575285b8b00a41b2/app/api/api_v1/schemas/search.py#L20)).

The order can be chosen from ascending (use “asc”) or descending (use “desc”), [see related code](https://github.com/climatepolicyradar/navigator-backend/blob/ddebbd17f6b62cf7909e6e4c575285b8b00a41b2/app/api/api_v1/schemas/search.py#L13).

#### include_results (optional)

This is an array that lists the indexes to use when performing the search. The values can be chosen from “pdfsTranslated”, “htmlsNonTranslated” and “htmlsTranslated”.
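
The enumerated values for `sort_field`, `sort_order` and `include_results` listed above can be checked on the client before sending a request. This is a sketch under the assumption that the lists in this document are complete; the helper `check_options` is hypothetical and the linked schema code remains the authoritative source.

```python
# Allowed values exactly as listed in this document.
SORT_FIELDS = {"date", "title"}
SORT_ORDERS = {"asc", "desc"}
INDEXES = {"pdfsTranslated", "htmlsNonTranslated", "htmlsTranslated"}


def check_options(payload: dict) -> list:
    """Return a list of problems found; the list is empty when none are found."""
    problems = []
    sort_field = payload.get("sort_field")
    # sort_field is optional, so a missing or null value is accepted.
    if sort_field is not None and sort_field not in SORT_FIELDS:
        problems.append(f"unknown sort_field: {sort_field!r}")
    # sort_order defaults to descending when omitted.
    sort_order = payload.get("sort_order", "desc")
    if sort_order not in SORT_ORDERS:
        problems.append(f"unknown sort_order: {sort_order!r}")
    for index in payload.get("include_results") or []:
        if index not in INDEXES:
            problems.append(f"unknown index in include_results: {index!r}")
    return problems


print(check_options({"sort_field": "title", "sort_order": "asc"}))  # []
```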

#### limit & offset

These values control pagination, allowing a front-end application to page through the results. `limit` is the maximum number of results to return, and `offset` is the position, within the results retrieved by the backend, from which to start returning them.
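
To illustrate paging, the sketch below generates one request payload per page. It assumes that `offset` is a zero-based position counted in results and that the total comes from the `hits` value of a previous response; `page_payloads` is a hypothetical helper, not part of the API.

```python
from typing import Iterator


def page_payloads(base_payload: dict, total_hits: int, limit: int = 10) -> Iterator[dict]:
    """Yield a copy of base_payload for each page of `limit` results."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    for offset in range(0, total_hits, limit):
        # Copy the base payload so the caller's dictionary is not modified.
        yield {**base_payload, "limit": limit, "offset": offset}


pages = list(page_payloads({"query_string": "Adaptation strategy"}, total_hits=25, limit=10))
print([page["offset"] for page in pages])  # [0, 10, 20]
```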


## **Response Payload**

The response contains a list of families, each with its associated documents and their passage matches. The payload has the following schema:

```
{
  "hits": 0,
  "query_time_ms": 0,
  "total_time_ms": 0,
  "families": [ <see family below> ]
}
```

### Properties

#### hits

The total number of families that meet the search criteria.

#### query_time_ms

The time, in milliseconds, that OpenSearch spent performing the query.

#### total_time_ms

The total time, in milliseconds, spent producing the response.

#### families

A list of family objects, each following the schema below:

```
{
  "family_slug": "string",
  "family_name": "string",
  "family_description": "string",
  "family_category": "string",
  "family_date": "string",
  "family_last_updated_date": "string",
  "family_source": "string",
  "family_geography": "string",
  "family_metadata": {},
  "family_title_match": true,
  "family_description_match": true,
  "family_documents": [ < see family document below > ]
}
```

#### family_slug

The slug that forms part of the URL to navigate to the family. For example, with a slug of `climate-change-adaptation-strategy_1882`, a URL to this family of documents can be created as:

 [`https://app.climatepolicyradar.org/document/climate-change-adaptation-strategy_1882`](https://app.climatepolicyradar.org/document/climate-change-adaptation-strategy_1882)
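
A small sketch of building that URL from a slug follows. The base URL and the `/document/` path are taken from the example above; the function name `family_url` is hypothetical.

```python
APP_BASE_URL = "https://app.climatepolicyradar.org"


def family_url(family_slug: str) -> str:
    """Build the application URL for a family from its slug."""
    return f"{APP_BASE_URL}/document/{family_slug}"


print(family_url("climate-change-adaptation-strategy_1882"))
# https://app.climatepolicyradar.org/document/climate-change-adaptation-strategy_1882
```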

#### family_name

The name of the family.

#### family_description

The description of the family.

#### family_category

The family category, for example: Executive (see list in [code here](https://github.com/climatepolicyradar/navigator-backend/blob/1529e0ff85b73a8e52a94e7eb510e3882307e64e/app/db/models/law_policy/family.py#L15))

#### family_date

The date the family of documents was published; this is taken from the corresponding “Passed/Approved” event for this family.

#### family_last_updated_date

The date the family of documents was last updated; this is taken from the most recent event for this family of documents.

#### family_source

The source of the family, currently the organisation name: either “CCLW” or “UNFCCC”.

#### family_geography

The geographical location of the family, as an [ISO 3166-1 alpha-3](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3) code.

#### family_metadata

An object of metadata for the family; its schema varies with the `family_source`.

#### family_title_match

Boolean value that is true if the search is matched within the family’s title.

#### family_description_match

Boolean value that is true if the search is matched within the family’s description.

#### family_documents

A list of the family’s documents, each following the schema below:

```
{
  "document_title": "string",
  "document_slug": "string",
  "document_type": "string",
  "document_source_url": "string",
  "document_url": "string",
  "document_content_type": "string",
  "document_passage_matches": [{
    "text": "string",
    "text_block_id": "string",
    "text_block_page": 0,
    "text_block_coords": [ ["string","string"] ]
  }]
}
```

#### document_title

The title of the document.

#### document_slug

The slug that forms part of the URL to navigate to the particular document. For example, with a slug of `national-climate-change-adaptation-strategy_06f8`, a URL to the document can be created as:

https://app.climatepolicyradar.org/documents/national-climate-change-adaptation-strategy_06f8

#### document_type

The type of document, for example: “Strategy”, see the [loaded metadata here](https://github.com/climatepolicyradar/navigator-backend/blob/1529e0ff85b73a8e52a94e7eb510e3882307e64e/app/data_migrations/data/law_policy/document_type_data.json).

#### document_source_url

The source URL on the external site from which the document was ingested into the system.

#### document_url

The CDN URL where the document can be found within our system.

#### document_content_type

The content type of the document found at the above URLs. A [complete list is available at the IANA site](https://www.iana.org/assignments/media-types/media-types.xhtml). The most common values are `application/pdf` and `text/html`.

#### document_passage_matches

This is a list of passages within this document that match the search criteria. Its length is limited by `max_passages_per_doc` in the request. This data is used for passage highlighting; please contact us for further information should you wish to use it.
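
The nesting of families, documents and passage matches described above can be walked as in the sketch below. The `sample` dictionary is illustrative placeholder data shaped like the schemas in this document, not real API output, and `iter_passages` is a hypothetical helper.

```python
from typing import Iterator, Tuple


def iter_passages(response: dict) -> Iterator[Tuple[str, str, int, str]]:
    """Yield (family_slug, document_slug, page, text) for every passage match."""
    for family in response.get("families", []):
        for document in family.get("family_documents", []):
            for passage in document.get("document_passage_matches", []):
                yield (
                    family["family_slug"],
                    document["document_slug"],
                    passage["text_block_page"],
                    passage["text"],
                )


# Placeholder response following the documented schema (not real data).
sample = {
    "hits": 1,
    "query_time_ms": 0,
    "total_time_ms": 0,
    "families": [{
        "family_slug": "climate-change-adaptation-strategy_1882",
        "family_documents": [{
            "document_slug": "national-climate-change-adaptation-strategy_06f8",
            "document_passage_matches": [
                {"text": "string", "text_block_id": "string",
                 "text_block_page": 0, "text_block_coords": [["string", "string"]]},
            ],
        }],
    }],
}

for row in iter_passages(sample):
    print(row)
```

Using `.get` with an empty-list default means families without documents, or documents without passage matches, are skipped rather than raising an error.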


## **Examples**

The following example uses curl to call the API endpoint and retrieve results from the command line. Because `query_string` is empty, this request runs in “Browse” mode.

```
API_HOST=https://app.climatepolicyradar.org
curl "$API_HOST/api/v1/searches" \
     -X POST \
     -H 'Accept: application/json' \
     -H 'Content-Type: application/json' \
     --data-raw '{"query_string":"", "exact_match":true, "keyword_filters":{}, "sort_field":null, "sort_order":"desc", "limit":100, "offset":0}'
```
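
For integrators who prefer a script, the same request can be sketched with the Python standard library. The host, path, headers and payload mirror the curl call above; the function names `build_request` and `post_search` and the 30 second timeout are assumptions made for this illustration.

```python
import json
import urllib.request

API_HOST = "https://app.climatepolicyradar.org"


def build_request(payload: dict, api_host: str = API_HOST) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the search endpoint."""
    return urllib.request.Request(
        url=f"{api_host}/api/v1/searches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def post_search(payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same payload as the curl example; None is serialised as JSON null.
payload = {
    "query_string": "",
    "exact_match": True,
    "keyword_filters": {},
    "sort_field": None,
    "sort_order": "desc",
    "limit": 100,
    "offset": 0,
}
request = build_request(payload)
print(request.get_method(), request.full_url)
# POST https://app.climatepolicyradar.org/api/v1/searches
```

Calling `post_search(payload)` would perform the network request; it is left uncalled here so the sketch can be run without contacting the service.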
