Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTLP HTTP Server Response Code #3203

Open
dineshg13 opened this issue Feb 12, 2023 · 8 comments
Open

OTLP HTTP Server Response Code #3203

dineshg13 opened this issue Feb 12, 2023 · 8 comments
Assignees
Labels
spec:protocol Related to the specification/protocol directory triaged-needmoreinfo The issue is triaged - the OTel community needs more information to decide

Comments

@dineshg13
Copy link
Member

dineshg13 commented Feb 12, 2023

What are you trying to achieve?
We are trying to setup a OTLP HTTP Server, it is not clear if responding with any code other than 200 is considered error. Every SDK is handling this differently.

  1. OTLP exporter n collector treats all codes in [200, 300) as success:

  2. Java otlp http exporter treats all codes in [200, 300) as success: https://github.com/open-telemetry/opentelemetry-java/blob/3d73283a71a2a6e1d6dd5e725098f697b1edae03/exporters/common/src/main/java/io/opentelemetry/exporter/internal/okhttp/OkHttpExporter.java#L120 .

public boolean isSuccessful()
Returns true if the code is in [200..300), which means the request was successfully received, understood, and accepted.
  1. Python http exporter treats 200 and 202 as success:
  2. Go HTTP Exporter treats only 200 status code as success.

What did you expect to see?
I would expect the specification to say explicitly that any 2xx shouldn't be considered an error.

Additional context.
See discussion on PR open-telemetry/opentelemetry-go#3707

@arminru
Copy link
Member

arminru commented Feb 13, 2023

Hey @dineshg13!

The specification itself is quite clear on this:

On success, the server MUST respond with `HTTP 200 OK`. The response body MUST be

If the request is only partially accepted
(i.e. when the server accepts only parts of the data and rejects the rest), the
server MUST respond with `HTTP 200 OK`. The response body MUST be the same

If your backend follows this spec and only returns 200 OK, it should be accepted by all compliant exporters.

@arminru arminru added the triaged-needmoreinfo The issue is triaged - the OTel community needs more information to decide label Feb 13, 2023
@jack-berg
Copy link
Member

The requirement for only accepting 200 as success was part of the original PR defining OTLP. I didn't see any comments that suggest why this is the case.

@MSNev
Copy link

MSNev commented Feb 13, 2023

All 2xx codes should be treated as successful, there is a scenario with browsers where the server response MUST return a 204 to avoid browsers from creating excessive connections (which can cause requests to block navigation and never get sent -- resulting in lost telemetry)

For clarification, during page unload when the browser is sending requests via the API's used for sending requests (sendBeacon() and fetch() with keep-alive (which are currently the only reliable API's to send requests during page unload)) IF the response returns a 200 this causes browser (chromium specifically) to have to establish a new connection as it ignores the body (because it is not expecting one) and it can't expect that the connection stream is valid, while when a 204 response is returned it can and does reuse the existing connection.

To fully "support" this scenario the sender (exporter) should pass a flag to inform the server that if everything is successful and it would have normally returned a 200 (with body) that it should return a 204 and no body.

@tigrannajaryan
Copy link
Member

We have this in the spec:

All other HTTP responses that are not explicitly listed in this document should be treated according to HTTP specification.

So the spec says that the Client must be ready to receive other status codes. The spec does not say when the Server should use these codes.

We can try to refine the spec in a way that clarifies what happens when 2xx or 3xx is used. We must make this change in a way that is backward compatible (I am not sure how exactly it can be done).

@dineshg13
Copy link
Member Author

I agree with @tigrannajaryan . The spec needs to clarify how the client handles 2xx or 3xx. Considering only 200 as the non-error response seems overly restrictive.

@Aneurysm9
Copy link
Member

All other HTTP responses that are not explicitly listed in this document should be treated according to HTTP specification.

So the spec says that the Client must be ready to receive other status codes. The spec does not say when the Server should use these codes.

We can try to refine the spec in a way that clarifies what happens when 2xx or 3xx is used. We must make this change in a way that is backward compatible (I am not sure how exactly it can be done).

I think for the 2xx series the spec is fairly conclusive that 200 OK is the only status code a server should use. It is the only valid response for full success and for partial success. There are no other success scenarios that are not either full or partial success; everything else is some form of error or a redirect.

I think there's a case to be made for allowing 204 No Content in the situations @MSNev has described and where full success is to be reported. Not having a response body makes it impossible to use for partial success. That said, I'd offer my usual concern about over-indexing on the known-bad behavior of older browser implementations that may be a non-issue for much of the useful life of this specification.

For 202 Accepted I think we need to be very careful about introducing asynchronous behavior at a point at which synchronous behavior is expected. With a request that is accepted but not processed there is then no means for reporting partial success or other error states without also specifying callback mechanisms and increasing the surface area and complexity of the protocol and its implementations. Every client implementation will need to deal with a new set of scenarios and flows. Every server implementation will now have multiple different sets of expectations from clients with no clear way of signaling which is in use.

@MSNev
Copy link

MSNev commented Feb 14, 2023

@Aneurysm9 Just to clarify

I'd offer my usual concern about over-indexing on the known-bad behavior of older browser implementations that may be a non-issue for much of the useful life of this specification.

The requirement to return 204 is actually a newer browser implementation, not older known-bad behavior of older browser implementations, older browser do not have this issue because they don't enforce CORB requirements.
And this issue occurs because client (browser) kills the connection when the CORB violation is caused by the server sending back an unwanted body.

And with (really) old browsers there was no sendBeacon() and until recently you could still use synchronous (blocking) XMLHttpRequest during page unload to ensure that the request was sent (it's been marked as deprecated for years), and now with newer browsers synchronous (blocking) XMLHttpRequest during page unload is blocked and you MUST use sendBeacon() to send the request.

@Aneurysm9
Copy link
Member

Thanks for the clarification @MSNev. I indeed did have my understanding of the landscape backwards. I think the quoted part of my comment can be removed and the rest stands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:protocol Related to the specification/protocol directory triaged-needmoreinfo The issue is triaged - the OTel community needs more information to decide
Projects
None yet
Development

No branches or pull requests

7 participants