Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adopt RFC 7807 error responses to allow API consumers to be reminded of the API definition #31

Closed
eric-murray opened this issue Jul 12, 2023 · 29 comments
Labels
documentation Improvements or additions to documentation EUC-question Expected opinion from End User Council

Comments

@eric-murray
Copy link
Collaborator

Problem description
CAMARA error responses include a short message to give some indication of the specific error. For example:

{
  "status": 400,
  "code": "OUT_OF_RANGE",
  "message": "Invalid port ranges specified: devicePorts"
}

But this information may not be sufficient on its own to allow the client to resolve the error, particularly if the API behaviour has changed between different versions. Now that documentation is embedded into the API definition itself, these responses would be a good opportunity to remind the API consumer of the API definition that is currently being used by the API implementation they are interacting with (which may not be the most recent).

RFC 7807 allows URLs to be included in the type field. With a couple of additional property name changes (renaming code to title and message to detail) CAMARA error responses could be made RFC 7807 compliant.

For example:

{
   "type": "https://raw.githubusercontent.com/camaraproject/QualityOnDemand/release-0.8.1/code/API_definitions/qod-api.yaml"
   "status": 400,
   "title": "OUT_OF_RANGE",
   "detail": "Invalid port ranges specified: devicePorts"
}

Using this format, an explicit link to the API definition (in this case v0.8.1) can be provided to assist troubleshooting. This URL should remain stable for an API release.

Expected action
An RFC 7807 compliant format as shown in the above example should be adopted for CAMARA error responses.

Additional context
Many API client implementations will already be able to parse RFC 7807 compliant error responses.

@eric-murray eric-murray added the documentation Improvements or additions to documentation label Jul 12, 2023
@shilpa-padgaonkar
Copy link
Collaborator

shilpa-padgaonkar commented Jul 13, 2023

Fine for me to add type. But some points to consider

  • RFC 7807 has included the type field to detail out the problem so that the error code can be understood better. Not sure if everyone would be happy to be pointed to the full doc/spec. This could be used as an option by providers to actually point to the particular problem detail (maybe even specific to their implementation).

  • I am not sure if the version issue should be clubbed with this solution together. For this, I would point more towards a comment API Versioning QualityOnDemand#179 (comment) and the followup new issue API Versioning - Aggregation ReleaseManagement#14 ( as i dont think this issue is only aggregator related issue)

  • Also, with this proposal we might have to be extra careful that we do not return older release links (unintentionally) and add more confusion.

@eric-murray
Copy link
Collaborator Author

Thanks for the comments @shilpa-padgaonkar

  • Probably returning a link to the full API specification should be considered the "default" value for the type field, as that documentation is known to exist and is publicly accessible. But I agree that an API provider should be free to return a link to a more specific description of the problem, either provided by themselves or submitted to the CAMARA API repository. We could also look at embedding hash-links in the YAML file to allow linking to the specific section of the API definition that the API consumer should be looking at.

  • Also agree that the value of the type field should not be taken as a definitive indicator of the API version that the API provider is compliant with. Really the API client needs to know this in advance rather than relying on the error description. However, I can see client and server using different API versions being a source of error, and returning a link to the specific version documentation is one way to make the API client realise their mistake.

  • And, yes, API providers need to make sure their implementations are fully updated when they migrate to a new version of an API!

Additional views on this would be welcome.

@eric-murray
Copy link
Collaborator Author

So the solution for defining URI fragments in text/plain documents is defined in RFC 5147, and is to append #line to the URL, followed by the start and end line of the fragment, separated by commas. So, amending the above example:

{
   "type": "https://raw.githubusercontent.com/camaraproject/QualityOnDemand/release-0.8.1/code/API_definitions/qod-api.yaml#line312,342"
   "status": 400,
   "title": "OUT_OF_RANGE",
   "detail": "Invalid port ranges specified: devicePorts"
}

where now the type URL should now return only the PortsSpec schema.

Unfortunately, this notation does not appear to be supported by any web browsers :-(

@RandyLevensalor
Copy link

This looks good. As discussed on the call, allowing the flexibility for the API provider to provide the content which they see as best in the type field.

@eric-murray
Copy link
Collaborator Author

So one solution would be to hash-link to the github rather than raw view of the YAML. For example, we could have:

{
   "type": "https://github.com/camaraproject/QualityOnDemand/blob/release-0.8.1/code/API_definitions/qod-api.yaml#L312-L328"
   "status": 400,
   "title": "OUT_OF_RANGE",
   "detail": "Invalid port ranges specified: devicePorts"
}

So my proposal would be to add a suitable type field of this nature to each example error message that is included in an API specification (such as that above). API providers could then either just use the example error message verbatim for a given error, or provide a link to their own error documentation instead if they prefer.

As the hash-link still links to specific lines rather than sections, it will require some effort to keep these up to date as lines move around, but really this is something that only need be done as part of the final documentation update for a release.

@Kevsy
Copy link
Collaborator

Kevsy commented Jul 31, 2023

I support this proposal: it also enables problem details that are common to multiple APIs to be shared by referencing the same type URI value.

@patrice-conil
Copy link
Collaborator

As a developer, I do not support this proposal because it complicates error handling too much. Maintaining the precision of the pointer to the documentation could be a real challenge and the risk is to always return the same link(global swagger) which increases the payload and does not provide any useful information. In addition, the detection of errors in format, type, etc. are generally delegated to the serialization/deserialization layer (GSON, Jackson, Moshi, etc.) and it is therefore difficult to customize errors, especially since microservices frameworks like spring (@ControllerAdvice) or quarkus (@ServerExceptionMapper) propose to centralize the error handling in a dedicated component which transforms the exceptions generated by the different layers of the software into well-formatted errors.
IMHO I think it would be more useful to give consumers more usage examples of the APIs rather than pointers to the documentation.

@eric-murray
Copy link
Collaborator Author

Hi @patrice-conil

When you say using RFC 7807 "complicates error handling too much", do you mean for the API client, the API implementor, or both?

For the API implementor, my proposal was that error examples including a relevant type field be included in the API OAS file. So, yes, somebody will need to check each of these for each API release, but that only needs to be done once for each release, and I doubt we will have more than one or two releases per year per year, and hopefully less. So no challenge at all for implementors (i.e. developers) if they use the default error responses.

For the API client, I have to confess that I don't understand why you think using a standard error format (RFC 7807) is more challenging than using the proprietary CAMARA format.

Client developers who do not want to pay any attention to the type field can choose to ignore it. So, again, no challenge for developers who do not want a challenge. Whether error handling is centralised or handled by individual microservices appears to me to be irrelevant to whether the error format is standardised (RFC 7807) or proprietary (CAMARA). If anything, a centralised component should find it easier to handle a standardised error format.

Payload bloat is primarily an issue for happy path scenarios (2XX responses). Payload bloat for errors is much less of an issue, because fielding large numbers of 4XX error responses should always be a temporary issue. For 5XX errors where the client can do nothing about it, the type field can be omitted.

@patrice-conil
Copy link
Collaborator

Hi @eric-murray,

Hi @patrice-conil

When you say using RFC 7807 "complicates error handling too much", do you mean for the API client, the API implementor, or both?

Only for implementors.

For the API implementor, my proposal was that error examples including a relevant type field be included in the API OAS file. So, yes, somebody will need to check each of these for each API release, but that only needs to be done once for each release, and I doubt we will have more than one or two releases per year per year, and hopefully less. So no challenge at all for implementors (i.e. developers) if they use the default error responses.

It's not a problem of format ... it's how to populate info when detection of error is delegated to the framework. If I always populate the "type" with same swagger spec ... what is the added value ?

For the API client, I have to confess that I don't understand why you think using a standard error format (RFC 7807) is more challenging than using the proprietary CAMARA format.

I agree with you, there is no challenge for API consumers

Client developers who do not want to pay any attention to the type field can choose to ignore it. So, again, no challenge for developers who do not want a challenge. Whether error handling is centralised or handled by individual microservices appears to me to be irrelevant to whether the error format is standardised (RFC 7807) or proprietary (CAMARA). If anything, a centralised component should find it easier to handle a standardised error format.

As said before ... challenge is not in the format of error.

Payload bloat is primarily an issue for happy path scenarios (2XX responses). Payload bloat for errors is much less of an issue, because fielding large numbers of 4XX error responses should always be a temporary issue. For 5XX errors where the client can do nothing about it, the type field can be omitted.

I agree with you on that point.

In fact, if you think we could always link back to global OAS, there's no challenge in using that format for errors, but I don't think it adds much value (other than adhering to a standard). If it's a question of being much more precise and pointing to a particular section of the specification as mentioned by @shilpa-padgaonkar , then there is a real challenge.

@eric-murray
Copy link
Collaborator Author

Hi @patrice-conil

OK, if I understand your concerns correctly, these are:

  • the difficulty in maintaining the link definitions for each error with the API definition YAML, because these must be based on line numbers and lines move around as the API definition evolves; and
  • the difficulty in the implementation framework identifying the correct error example to return (if it is desired just to return one of the examples defined in the API definition rather than return customised errors)

These are reasonable concern for an API implementor, but the reason we are doing all this is for the benefit of the API consumer. My view is that we should accept some implementation pain if this provides value to the API consumer, and I believe this does. When testing our implementations internally, a common complaint from API consumers is that they just do not understand what they have done wrong when constructing a request and getting a 4XX response. Some people learn by reading the documentation, whereas others try something to see what result they get.

So I am still in favour of this proposal as I think it does add value for API consumers (at least, for those "trying out" an API to see if it is of value to them), but I think a consensus needs to be reached within CAMARA on this, as it does imply some additional work for the CAMARA community.

@patrice-conil
Copy link
Collaborator

Hi @eric-murray ,

These are reasonable concern for an API implementor, but the reason we are doing all this is for the benefit of the API consumer. My view is that we should accept some implementation pain if this provides value to the API consumer, and I believe this does. When testing our implementations internally, a common complaint from API consumers is that they just do not understand what they have done wrong when constructing a request and getting a 4XX response. Some people learn by reading the documentation, whereas others try something to see what result they get.

IMHO, we need to invest more time on "giving the right error message" than "pointing on particular spec section". As an API consumer, I expect the API to tell me what my mistake is and not direct me to the document I've already read.
This kind of Error, is what I expect ... so it's what I implemented in QoD.

{
    "status": 400,
    "code": "BAD_REQUEST",
    "message": "Cannot construct instance of `com.camara.model.RateUnitEnum`, problem: Unexpected value 'Kbps'"
}

But as said before, adhering to the RFC 7807 is not a problem for us. We will just always return pointer on global API definition.

@eric-murray
Copy link
Collaborator Author

Hi @patrice-conil

Sue, but the choice is not one or the other. A RFC 8707 error response can have both a message (or detail for RFC 8707) and type for a link to more detail if the error message itself does not clarify the problem. I am all in favour of providing a rich set of example error responses in the API definition to inspire API implementors, but this issue is primarily discussing the format through which such error messages can be provided to the API consumer.

So, to summarise the proposal:

  • Error responses should be RFC 7807 compliant
  • 'type` will be optional
  • API definitions will provide model error responses for those who do not want to implement anything different

So the only mandatory requirement for API implementors is that error responses have media type application/problem+json. Standardising and mandating error message content is not being proposed.

@Kevsy
Copy link
Collaborator

Kevsy commented Aug 23, 2023

"message": "Cannot construct instance of com.camara.model.RateUnitEnum, problem: Unexpected value 'Kbps'"

Just to comment that the error detail I would expect as an API consumer would be:

"message": " Unexpected value 'Kbps'. Expected values are: 'bps' 'kbps' 'Mbps' 'Gbps' 'Tbps' "

...because that (1) tells me what to do next and (2) does not share internal implementation details (security principle of least privilege).

@patrice-conil
Copy link
Collaborator

@Kevsy,
Yes your message would be better ... but mine is free using jackson as serializer/deserializer 😄

@maxl2287
Copy link
Contributor

maxl2287 commented Sep 8, 2023

@eric-murray: I am also okay with RFC 7807 and having the type as optional.

I am also coming from the developer POV and see troubles in having links to the schema in the error-response (but when type is optional, we do not have to take care about)

I am also interested that the message would communicate to the Provider, what the exact error is rather than linking to the schema (like @Kevsy reported as well).

example

{
   "type": "https://github.com/camaraproject/QualityOnDemand/blob/release-0.8.1/code/API_definitions/qod-api.yaml#L312-L328"
   "status": 400,
   "title": "INVALID_INPUT",
   "detail": "Invalid input provided: The msisdn should adhere to the pattern '^+?[0-9]{5,15}$'"
}

this would help a lot more than:

{
   "type": "https://github.com/camaraproject/QualityOnDemand/blob/release-0.8.1/code/API_definitions/qod-api.yaml#L312-L328"
   "status": 400,
   "title": "INVALID_INPUT",
   "detail": "Schema validation failed at msisdn"
}

@RubenBG7
Copy link
Collaborator

From my prospective, errors raised by an API have not the goal of solve bugs with the integration. Errors goal are to give information of what happend with the request made to the consumer.

When a developer needs to follow what happend in the integration layers you should use correlators, surveillance methods, logs or whatever you need, but the errors shall not be used to give this kind of information.

@eric-murray
Copy link
Collaborator Author

The agreement made during a recent meeting was to refer this proposal to the End User Council, who should be able to come to a common view on whether this is useful to developers or not. Unfortunately I've been so busy with other things I've not been able to pursue this, but will try to do that in the near future.

@patrice-conil
Copy link
Collaborator

Hi @RubenBG7, @eric-murray

From my prospective, errors raised by an API have not the goal of solve bugs with the integration. Errors goal are to give information of what happend with the request made to the consumer.

The goal is not solve bugs with the integration but if it could help during integration why not?

When a developer needs to follow what happend in the integration layers you should use correlators, surveillance methods, logs or whatever you need, but the errors shall not be used to give this kind of information.

Actually, if we address few high consumption integrations, I agree with Ruben, but if we want to address a larger market with many developers... correlators and tracking would be inefficient.
I think optional type should do the trick and let each provider to implement the way it wants.

@rartych rartych added the EUC-question Expected opinion from End User Council label Oct 16, 2023
@rartych
Copy link
Collaborator

rartych commented Dec 19, 2023

It occurred that in July 2023 RFC 9457 was published and obsoleted RFC 7807.

There are pros & cons for this solution of Problem Details for HTTP APIs.

@gmuratk
Copy link
Collaborator

gmuratk commented Jan 11, 2024

@eric-murray, @rartych
I'm not sure where we are with the EUC feedback/input, but I added some suggestions in the discussion for #113 which mentions this issue. It was suggested to bring them over to this discussion.
Instead of repeating the lengthy comment and an example implementation here is the link to the comment.
#113 (comment)

@akoshunyadi
Copy link

  • I agree on using the new version RFC 9457.
  • The mapping: status => status, code => title, message => detail makes also sense for me.
  • I don't see the necessity to introduce an additional attribute 'cause'. I thought the reason for the error contained in the 'detail' attribute already.
  • The question is how type should be handled. Since it is an URI it doesn't have to be an URL, just an unique identifier is also possible. For that I would prefer a common list for all APIs.

@gmuratk
Copy link
Collaborator

gmuratk commented Jan 11, 2024

Thanks @akoshunyadi

  • I don't see the necessity to introduce an additional attribute 'cause'. I thought the reason for the error contained in the 'detail' attribute already.

RFC 9457 has the following statement: "Consumers SHOULD NOT parse the "detail" member for information". Instead, having 'cause' attribute may benefit the consumer to trigger programmable action - "to resolve an error" which is the original goal outlined in the problem statement.

  • The question is how type should be handled. Since it is an URI it doesn't have to be an URL, just an unique identifier is also possible. For that I would prefer a common list for all APIs.

I think the recent changes for moving common structures to CAMARA_common.yaml aligns with this. However we need a way for API specific [error] causes to be supported as well. That's the reason for API Specific Cause value list and definitions in the example implementation of the qod-api I provided in my comment.

@jlurien
Copy link
Contributor

jlurien commented Jan 24, 2024

The debate is split between this issue and #113, but one common point is whether API specs should normalize the main identification of an error and where, In my opinion, the equivalent for type of RFC 9457 in our current model is the code. title in the RFC is just an optional human-readable summary of the problem type that should not change between errors of same type. detail is for further details, that may change between instances (similar to our message)

Situation is that we are not using consistently the code across current CAMARA specs. Some APIs define specific codes for specific API errors, such as SIM_SWAP.UNKNOWN_PHONE_NUMBER while others rely entirely on generic errors and the same code is used always for same status, providing no extra value.

So IMO changing to RFC 7807 and renaming the object keys is not changing the situation. The real difference is to provide relevant values for the type/code, so developers can rely on known values to identify specific problems.

@palmerabollo
Copy link

The current format with code+message+status is inspired by some big industry players that have thousands of developers already using their APIs:

(I'm only aware of Twitter/X using a type field)

Since we are targeting the same kind of developers, I believe familiarity is a good argument in favor of keeping the current code+message fields.

I'm not against adding a field with a link to the error spec (it will be hard to maintain, though). I'd try to keep the interfaces as simple as possible until we have more input from devs using the APIs. Adding an new field in the future is easy because it's not breaking change, but removing or renaming fields will be painful.

@patrice-conil
Copy link
Collaborator

On Orange side we agree with TEF position and don't see enough value to change.

@gmuratk
Copy link
Collaborator

gmuratk commented Jan 30, 2024

Thank you for the comments and provided references.

Based on the review, here are my thoughts:

References from the big industry players have 'error codes' that are unique allowing the Application Developers to be able to identify specific problems. Whereas in the current API definitions, 'code' values are overloaded and 'message', which is almost like free form text, is used as the discriminator. It will be challenging to the developers to keep a map of status+error+message especially if the messages have to be localized by API providers.

If there is a concern with addition of a new key 'cause' that will be uniquely identifying the specific issue, we can possibly rework the 'code' field. But then again, I don't think it'd be wise to have 'code' values that deviate from the corresponding HTTP Response integer/string combo. (e.g. 400 Bad Request -> Invalid Argument, 401 Unauthorized -> Unauthenticated, 403 Forbidden -> Permission Denied, 500 Internal Server Error -> Internal, 503 Service Unavailable -> Unavailable)

The example I provided by forking the CAMARA_common.yaml and qod-api.yaml offer a list of common and API specific cause values. Common ones can be listed in the CAMARA_common.yaml and can serve as a starting point for any new APIs. API Specific causes (i.e. ErrorCodes) can be listed as enumarations in each API definition.

For convenience: An example of the modified QoD API (forked from v0.10.0-rc) that is making use of an example CAMARA_common.yaml can be found in the following link: https://github.com/gmuratk/QualityOnDemand-httpresponses/blob/main/code/API_definitions/qod-api.yaml

@gmuratk
Copy link
Collaborator

gmuratk commented Feb 1, 2024

#128 seems to be relevant issue to the concern below

If there is a concern with addition of a new key 'cause' that will be uniquely identifying the specific issue, we can possibly rework the 'code' field. But then again, I don't think it'd be wise to have 'code' values that deviate from the corresponding HTTP Response integer/string combo. (e.g. 400 Bad Request -> Invalid Argument, 401 Unauthorized -> Unauthenticated, 403 Forbidden -> Permission Denied, 500 Internal Server Error -> Internal, 503 Service Unavailable -> Unavailable)

cc: @lbertz02

@eric-murray
Copy link
Collaborator Author

Replaced by #133

@hdamker
Copy link
Collaborator

hdamker commented Feb 12, 2024

"type": "https://raw.githubusercontent.com/camaraproject/QualityOnDemand/release-0.8.1/code/API_definitions/qod-api.yaml"

I know that this discussion is closed and replaced, but just the remark, that the proposed use of type URLs within the issue description wouldn't work. "type" is an identifier and using the version of the API within the URL would mean to define with each version a new problem type and therefore introducing a breaking change. To ensure stable problem types, RFC 9457 is going so far to introduce a IANA registry for problem type (with "Specification required"): https://iana.org/assignments/http-problem-types

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation EUC-question Expected opinion from End User Council
Projects
None yet
Development

No branches or pull requests