Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose JSON body fields as typed parameters #1680

Closed
sethmlarson opened this issue Aug 11, 2021 · 2 comments · Fixed by #1802
Closed

Expose JSON body fields as typed parameters #1680

sethmlarson opened this issue Aug 11, 2021 · 2 comments · Fixed by #1802

Comments

@sethmlarson
Copy link
Contributor

sethmlarson commented Aug 11, 2021

Motivation

Today with the Elasticsearch client you have to write a JSON blob to the body to pass any values within the HTTP body.
Unfortunately this is where some of the most complex data structures are for the Elasticsearch API (see query DSL, aggregations)
which means we're unable to provide a good window into this opaque object via types or auto-complete.

This is what writing a search query looks like today:

from elasticsearch import Elasticsearch
client = Elasticsearch("https://localhost:9200")

client.search(
    index="test-index",
    size=10,
    body={
        "runtime_mappings": {
            "day_of_week": {
                "type": "keyword",
                "script": "emit(doc['@timestamp'].value.dayOfWeekEnum)"
            }
        },
        "aggs": {
            "day_of_week_count": {
                "value_count": {
                    "field": "day_of_week"
                }
            }
        }
    }
)

And here is what the type hints for the search API look like for today:

def search(
    index: Optional[Any] = ...,
    body: Optional[Any] = ...,
    size: Optional[Any] = ...,
    ...,
): ...

Something to note for later: the size parameter is always serialized in the query string in this example.

Proposed Solution

With the expanded JSON body fields implemented the same search API call can be written like so:

client.search(
    index="test-index",
    size=10,
    runtime_mappings={
        "day_of_week": {
            "type": "keyword",
            "script": "emit(doc['@timestamp'].value.dayOfWeekEnum)"
        }
    },
    aggs={
        "day_of_week_count": {
            "value_count": {
                "field": "day_of_week"
            }
        }
    }
)

Notice how the body fields and fields that are serialized elsewhere like index are all at the same level and you're writing actual Python code instead of wrangling with JSON at the top-level of the API.

And here is what the function signature and types for the search API would look like:

def search(
    index: Optional[Union[str, List[str]]] = ...,
    size: Optional[int] = ...,
    runtime_mappings: Optional[Mapping[str, Any]] = ...,
    aggs: Optional[Mapping[str, Any]] = ...
    ...,
    body: Optional[Any] = ...,
): ...

The differences that I want to highlight:

  • There is still a body parameter for backwards compatibility with queries written before this change. This won't be a breaking change, any queries that are using the body parameter will still function exactly as they do today. The body parameter will be deprecated on APIs that support expanded body fields.
  • Using the Elasticsearch specification we get better type hints for parameters that already exist. You can see this example best with the index and size parameters, previously they were Optional[Any] and now they are Optional[Union[str, List[str]] and Optional[int] respectively. This gives you a lot more confidence when writing API calls that you're using the right types the first time.
  • Not shown, but the size parameter would be encoded in the HTTP request body instead of the query string. This is an improvement over the current serialization strategy because space within the HTTP request target (URL path + query) is limited. In the past we've seen errors from users using Scroll IDs which can be quite verbose and currently have a work-around to serialize Scroll IDs in the HTTP body. Another motivator is that any value serialized in the query string instead of the request body isn't effected by HTTP body compression.

Nuances and Future Improvements

These improvements are a first-step towards a fully-typed Elasticsearch Python client and will let users start down the path of adding richer types their Elasticsearch code. However there are a few things to note:

When will DeprecationWarnings start?

I'm unsure if they should be emitted in the next release (7.15) or if I should wait until more APIs are supported. Tough balance between alerting users to upcoming changes and new features and being difficult to avoid DeprecationWarnings in general usage of the client.

Not all APIs can take advantage right away

For an API to have expanded body fields it must be completed defined within the Elasticsearch specification. There are a lot of APIs that aren't completely defined yet so rollout of this change may take some time while the specification is filled.

In these cases APIs will be generated using the previous process and have a simple body parameter that won't raise a DeprecationWarning for the time being.

When the body parameter is used, return to old behavior

This means that all previous code written will use the old behavior until updated to not use the body parameter. A DeprecationWarning will be raised in cases where the body parameter could be replaced by expanded fields.

If both a body field and the body parameter are defined a ValueError will be raised as this configuration isn't supported.

Some fields that were once serialized to the query are now serialized to the body

This is true! However I believe that API compatibility won't be broken by this behavior change as Elasticsearch will treat the two different serializations the same. Some examples of this are _source, size, from, etc.

Deeply nested objects are still untyped

For example the runtime_mappings object is typed as Mapping[str, Any] where instead it should be mapped as Mapping[str, TypedDict[type: Union[Mapping[str, Any], str], format: str, script: Optional[str]]].

The aim is to reduce the amount of Any types in type hints if possible but for this initial implementation using only built-in Python types as this is as far as we can define in the general case. This means only using scalars (int, float, str, etc), and Union, Optional, List, Mapping, and Any.

typing.TypedDict is a newly added feature to Python 3.8 but in terms of ability to describe the complex structures of Elasticsearch API is missing a few critical features. Specifically the ability to mark one or more keys as "optional" to include without using the total=False parameter which makes all keys optional. I'll continue to watch the Python typing space for additional improvements there.

In the future defining our own objects and types may be required to represent these complex types.

What about conflicts with per-request parameters and body fields?

Parameters like api_key are a parameter on every API in order to define different authentication config per request.
We want to continue to support code written this way so for now APIs that have a conflict between per-request parameters
and body fields will continue to use the old behavior of a single body parameter.

APIs that have these conflicts will not have expanded body fields for now, there is a future improvement in the works to solve this issue.

What about bodies that aren't JSON?

These APIs won't be changed and will continue to have a body parameter for 7.x.

The future is keyword-only

In Python 3 keyword-only arguments were added which allowed making functions automatically raise a TypeError if called without using that argument as a keyword argument. This is a fantastic feature because if makes all code written with a library much more readable and maintainable. It also makes my job as a library maintainer much easier as I no longer have to worry about breaking code wrt. the order of parameters, only that they are there.

Currently type stubs define all parameters as keyword-only (except required path parameters) but because these are stubs and not function signatures there's no enforcement of this unless you opt-in with mypy or another type-checking tool.

Starting in 8.x keyword-only arguments will be used for all parameters so users should switch over to keyword arguments (as has always been recommended) as soon as possible!

@sethmlarson sethmlarson pinned this issue Aug 11, 2021
@sethmlarson sethmlarson changed the title [7.x] Expose JSON body fields as typed parameters Expose JSON body fields as typed parameters Aug 11, 2021
@sethmlarson
Copy link
Contributor Author

Completed #1681 to remove the legacy generator and add basic types to request / response body.

@sethmlarson
Copy link
Contributor Author

Landed the first body field parameters in the search API: #1686
Now that this is merged adding this feature to more APIs will be opting in on the generator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant