Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API search doc as markdown #172

Merged
merged 1 commit into from
Oct 12, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
270 changes: 270 additions & 0 deletions docs/api/search.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,270 @@
# Search API Specification

**3<sup>**rd**</sup> October 2023**

***


# **VERSION HISTORY**

| | | | |
| --- | --------------------------------------------------------------------------------------------------------------------------- | --------- | ------------ |
| 1.0 | Initial version based on [v1.3.15-beta.](https://github.com/climatepolicyradar/navigator-backend/releases/tag/v1.3.15-beta) | 3/10/2023 | Peter Hooper |
| | | | |


# **PURPOSE**

This document is intended to explain the use of our search API for external developers and integrators. 


# **BACKGROUND**

The API  is a typical REST API where the requests and responses are encoded as `application/json`


# **SEARCH ENDPOINT**

| | |
| :------: | ---------------- |
| **POST** | /api/v1/searches |

There is **_no_** authentication required for using this interface. 

❗ We ask that users be respectful of its use and remind users that data is available to download on request. 

The search endpoint behaves in two distinct ways:

1. In “Browse” mode - this is when an empty` query_string `is provided. This mode does not use Opensearch, rather queries the structured data (postgresql) directly, using the other supplied filter fields.
2. In “Search” mode - when a `query_string `is provided. A query is constructed sent to Opensearch and the response is augmented with the structured data before being returned in the same response scheme.


## **Request Payload**

The payload is a JSON object representing the search to be performed. This can be seen in [code here](https://github.com/climatepolicyradar/navigator-backend/blob/ddebbd17f6b62cf7909e6e4c575285b8b00a41b2/app/api/api_v1/schemas/search.py#L62) and is described in further detail below.

```
{
  "query_string": "string",
  "exact_match": false,
  "max_passages_per_doc": 10,
  "keyword_filters": {
"additionalProp1": ["string"],
"additionalProp2": ["string"],
"additionalProp3": ["string"]
  },
  "year_range": ["string","string"],
  "sort_field": "date",
  "sort_order": "desc",
  "include_results": ["pdfsTranslated"],
  "limit": 10,
  "offset": 0
}
```

### Properties

#### query_string

A string representation of the search to be performed, example “Adaptation strategy”

#### exact_match

Boolean value to indicate if the `query_string `should be treated as an exact match when the search is performed. 

#### max_passages_per_doc (optional, default is 10)

The maximum number of matched passages to be returned for a single document.

#### keyword_filters (optional)

This is an object containing a map of fields and their values to filter on. The allowed fields can be found in [code here](https://github.com/climatepolicyradar/navigator-backend/blob/ddebbd17f6b62cf7909e6e4c575285b8b00a41b2/app/api/api_v1/schemas/search.py#L34).

#### year_range

This is an array containing exactly two values, which can be null or an integer representing the years to search between. Examples: 

`[2000, 2023]` - Would search between 2000 and 2023 inclusive.

`[null, 2023]` - Would search from 1947 up to 2023 inclusive.

`[2000, null]` - Would search from 2000 to the current date.

`[null, null]` - Does not filter by date.

Further information and understanding can be found by [reading the tests here](https://github.com/climatepolicyradar/navigator-backend/blob/ddebbd17f6b62cf7909e6e4c575285b8b00a41b2/tests/routes/test_search.py#L634).

#### sort_field (optional) & sort_order (optional, defaults to descending)

The field to sort by can be chosen from “date” or “title” [see related code](https://github.com/climatepolicyradar/navigator-backend/blob/ddebbd17f6b62cf7909e6e4c575285b8b00a41b2/app/api/api_v1/schemas/search.py#L20).

The order can be chosen from ascending (use “asc”) or descending (use “desc”), [see related code](https://github.com/climatepolicyradar/navigator-backend/blob/ddebbd17f6b62cf7909e6e4c575285b8b00a41b2/app/api/api_v1/schemas/search.py#L13).

#### include_results (optional)

This is an array that lists the indexes to use when performing the search. The values can be chosen from: “pdfsTranslated”, “htmlsNonTranslated”, “htmlsTranslated”

#### limit & offset

These values control pagination, allowing a front end application to page through the results. The `limit` refers to the maximum number of results to return and `offset` where to start returning the results from that were retrieved via the backend.


## **Response Payload**

The response returns a list of families and includes their associated documents along with their passage matches. The payload has the following scheme:

```
{
  "hits": 0,
  "query_time_ms": 0,
  "total_time_ms": 0,
  "families": [ <see family below> ]
}
```

### Properties

#### hits

The total number of families that meet the search criteria.

#### query_time_ms

The time Opensearch spent performing the query.

#### total_time_ms

The total time spent in getting the response. 

#### families

A list of family objects, each following the scheme below:

```
{
  "family_slug": "string",
  "family_name": "string",
  "family_description": "string",
  "family_category": "string",
  "family_date": "string",
  "family_last_updated_date": "string",
  "family_source": "string",
  "family_geography": "string",
  "family_metadata": {},
  "family_title_match": true,
  "family_description_match": true,
  "family_documents": [ < see family document below > ]
}
```

#### family_slug

The slug that forms part of the URL to navigate to the family. Example, with a slug of  `climate-change-adaptation-strategy_1882`, a URL can be created to this family of documents as:

 [`https://app.climatepolicyradar.org/document/climate-change-adaptation-strategy_1882`](https://app.climatepolicyradar.org/document/climate-change-adaptation-strategy_1882)

#### family_name

The name of the family.

#### family_description

The description of the family.

#### family_category

The family category, for example: Executive (see list in [code here](https://github.com/climatepolicyradar/navigator-backend/blob/1529e0ff85b73a8e52a94e7eb510e3882307e64e/app/db/models/law_policy/family.py#L15))

#### family_date

The date the family of documents was published, this is from the corresponding “Passed/Approved” event for this family.

#### family_last_updated_date

The date the family of documents was published, this is from the most recent event of this family of documents.

#### family_source

The source, currently organisation name. Either “CCLW” or “UNFCCC”

#### family_geography

The geographical location of the family in [ISO 3166-1 alpha-3](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3)

#### family_metadata

An object if metadata for the family, the schema will change given the `family_source.`

#### family_title_match

Boolean value that is true if the search is matched within the family’s title.

#### family_description_match

Boolean value that is true if the search is matched within the family’s description.

#### family_documents

A list of the family’s documents in the following scheme:

```
{
   "document_title": "string",
   "document_slug": "string",
   "document_type": "string",
   "document_source_url": "string",
   "document_url": "string",
   "document_content_type": "string",
   "document_passage_matches": [{
        "text": "string",
        "text_block_id": "string",
        "text_block_page": 0,
        "text_block_coords": [ ["string","string"] ]
    }]
}
```

#### document_title

The title of the document.

#### document_slug

The slug that forms part of the URL to navigate to the particular document. Example, with a slug of  \``national-climate-change-adaptation-strategy_06f8`, a` `URL can be created to the document as:

https://app.climatepolicyradar.org/documents/national-climate-change-adaptation-strategy_06f8

#### document_type

The type of document, for example: “Strategy”, see the [loaded metadata here](https://github.com/climatepolicyradar/navigator-backend/blob/1529e0ff85b73a8e52a94e7eb510e3882307e64e/app/data_migrations/data/law_policy/document_type_data.json).

#### document_source_url

The source url of the external site that was used to ingest into the system.

#### document_url

The CDN url of where the document can be found within our system.

#### document_content_type

The content\_type of the document found at the above URLs. [Complete list is available at the IANA site](https://www.iana.org/assignments/media-types/media-types.xhtml). Most common is “`application/pdf`” and “`text/html`”.

#### document_passage_matches

This is a list of passages that match the search criteria within this document. The length of which is affected by `max_passages_per_doc `in the request.` `This is used for passage highlighting, please contact us for further information should you wish to use this data.


## **Examples**

The following examples are of using curl to call the API endpoint to retrieve results via the command line.

```
API_HOST=https://app.climatepolicyradar.org

curl "$API_HOST/api/v1/searches" \
     -X POST \
     -H 'Accept: application/json' \
     -H 'Content-Type: application/json' \
     --data-raw '{"query_string":"", "exact_match":true, "keyword_filters":{}, "sort_field":null, "sort_order":"desc", "limit":100, "offset":0}'
```
Loading