Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return a 410 (Gone) status code for unavailable API endpoints #97397

Merged
merged 10 commits into from
Aug 22, 2023

Conversation

swallez
Copy link
Member

@swallez swallez commented Jul 5, 2023

Updates RestController to return a 410 (Gone) status code for known REST handlers that are unavailable in the current environment (typically serverless). Also uses ElasticsearchException to return the detailed format by default, that contains more useful information for caller-side diagnostics and logs (in particular the "type" property).

Returning 410 instead of 404 allows a better handling of the differences brought by serverless in several areas:

  • in client libraries, 404 is usually not seen, unless the version of the client is newer than that of the server, and the user calls an enpoint function that doesn't exist on the server. Having a 410 will allow to report better errors to users in versions of the client libraries meant to be used with both the statefull and serverless variants of Elasticsearch
  • in telemetry and Cloud proxy logs, it clearly distinguishes calls to unavailable endpoints from regular 404. This can give better insights on the usage of serverless, and also help support in identifying bad usage of serverless.

Updates RestController to return a 410 (Gone) status code for known REST
handlers that are unavailable in the current environment (typically
serverless). Also uses ElasticsearchException to return the detailed format
by default, that contains more useful information for caller-side
diagnostics and logs (in particular the "type" property).
@elasticsearchmachine elasticsearchmachine added external-contributor Pull request authored by a developer outside the Elasticsearch team needs:triage Requires assignment of a team area label v8.10.0 labels Jul 5, 2023
@swallez swallez added the :Core/Infra/REST API REST infrastructure and utilities label Jul 5, 2023
@elasticsearchmachine elasticsearchmachine added Team:Core/Infra Meta label for core/infra team and removed needs:triage Requires assignment of a team area label labels Jul 5, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@rjernst
Copy link
Member

rjernst commented Jul 6, 2023

The concept here is great, having a different response code for known APIs that do not exist in serverless will help avoid confusion from users. However, I think the implementation in this PR is not sufficient. This PR depends on all endpoints from non-serverless being registered. Yet serverless completely omits some bits of code (eg watcher), so the endpoints are never registered.

I think an alternative approach would be to build a (compile time) list of APIs that exist to reference. It would need to span across modules/plugins. And in the plugin case it would actually be an improvement since a user might not have a plugin installed, yet try to use an API from it. So mapping a source of the API (eg plugin name or elasticsearch) would be another good thing to have.

Again, I think the concept here is good, but that should ensure there is consistency.

@swallez
Copy link
Member Author

swallez commented Jul 11, 2023

Thanks for the review @rjernst. I actually thought about the problem you outline but first wanted to "test the waters" with this simple change.

For plugins that are not loaded, we can indeed consider having a resource file that lists the URL patterns of all controllers that do not have a ServerlessScope annotation. In serverless mode RestController.handleBadRequest will check this list to identify known URL patterns that aren't active and return 410 in that case.

Maintaining this resource file manually would be time consuming and error-prone. It can be generated by introspecting classes, looking for RestController implementations, checking the ServerlessScope and calling routes() if not present to register URL patterns in the resource file.

Since this generation has to call the code, it seems complex (if not impossible) to be part of the build system. We can use a similar approach to ElasticsearchExceptionTest and do it in a unit test.

That unit test will use an approach similar to snapshot testing:

  • introspect classes to build the non-serverless URL pattern list (check annotations and call routes())
  • compare it with the existing resource file
  • if they're different, fail the test and write the correct resource file on disk to a temporary location so that the developer can check the difference and update the existing resource file.

This approach guarantees that the list of non-serverless patterns will be exhaustive and up to date.

This resource file also brings other benefits: it will be the source of truth to identify API endpoints that do not exist in serverless. It can then be leveraged to verify that @availability annotations in the API specification are correct. This will ensure there is no discrepancy in serverless client libaries with the actual capabilities of the server.

What do you think? And of course I'm willing to contribute this 😉

@rjernst
Copy link
Member

rjernst commented Jul 11, 2023

Generated resources files are exactly what I was thinking. The generation is indeed a bit tricky. I'd like to get @mark-vieira 's thoughts on such generation. I don't think we should do it in a test, but instead have an explicit generator that the build calls. I think we would want to have this as a task in each module, and then collect these resource files in the distribution.

@mark-vieira
Copy link
Contributor

I think we would want to have this as a task in each module, and then collect these resource files in the distribution.

Something like this would probably work, we'd have to work out the details on how we'd aggregate these and where. To grab these files the distribution would still have to depend on these artifacts, and we want to make sure we make that dynamic somehow.

@rjernst this would be something we'd only include in the serverless distribution, yes?

@rjernst
Copy link
Member

rjernst commented Jul 11, 2023

this would be something we'd only include in the serverless distribution, yes?

It somewhat depends. Right now the endpoint check for endpoints missing is in server. We would need to plug this into serverless (not currently pluggable, though I may be making it pluggable very soon for other reasons).

One reason I think it might be nice to generate and collect these resources across the board is for plugins, so that we can distinguish when a plugin api is called but the plugin is not installed. It's not a hard blocker here, we can do just serverless for now, but it seems to me like the work is pretty much the same whether the collection happens in serverless or not.

@joshdover
Copy link
Contributor

After discussion, we think this will be a good solution for us on the Beats side. We'd like to help make sure this moves along to unblock us. Thanks!

@tvernum
Copy link
Contributor

tvernum commented Aug 9, 2023

My preference would be for us to merge a version of this PR, and then iterate on something that handles unavailable modules.
If we want clients to be able to rely on 410 responses, then we should get a version of that implemented now, and improve it over time.

@technige
Copy link

@tvernum Thank you, yes, I'd be in favour of getting something in place sooner rather than later, even if it's not perfect. We need to carry out regression testing of Stack clients against Serverless, and being able to distinguish endpoints in this way is closely linked to that process.

@swallez
Copy link
Member Author

swallez commented Aug 14, 2023

@tvernum agree. Can we assign a reviewer to get this merged?

As for the next steps, I feel comfortable contributing the approach using snapshot testing that I suggested. Implementing it in the build system is beyond my current knowledge of it.

@elasticsearchmachine
Copy link
Collaborator

Hi @swallez, I've created a changelog YAML for you.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@swallez swallez merged commit 33f8333 into elastic:main Aug 22, 2023
@swallez swallez deleted the gone-for-non-serverless branch August 22, 2023 17:11
elasticsearchmachine pushed a commit that referenced this pull request Aug 22, 2023
…98752)

As a follow up to #97397, we want the ability to assert on unavailable
API endpoints that return a 410 status code. This PR adds this support
to our YAML testing framework.

cc @swallez
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/REST API REST infrastructure and utilities >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Core/Infra Meta label for core/infra team v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants