Enable remaining HTTP clients in metrics integration tests #2441

scottgerring · 2024-12-17T16:51:27Z

in #2432 and the associated PR, we added integration tests for metrics.

However, HTTP-client exporters have issues that cause test failures, so they are skipped (we are only testing gRPC currently):

opentelemetry-rust/opentelemetry-otlp/tests/integration_test/tests/metrics.rs

Lines 186 to 194 in 551760b

    
           /// 
        
           /// TODO - the HTTP metrics exporters do not seem to flush at the moment. 
        
           /// TODO - fix this asynchronously. 
        
           /// 
        
           #[cfg(test)] 
        
           #[cfg(not(feature = "hyper-client"))] 
        
           #[cfg(not(feature = "reqwest-client"))] 
        
           #[cfg(not(feature = "reqwest-blocking-client"))] 
        
           mod tests {

Remove this exclusion and fix the tests or the impl!

scottgerring · 2024-12-17T16:51:38Z

I will look at this next

scottgerring · 2024-12-18T13:30:07Z

here's the error that's breaking this:

thread 'OpenTelemetry.Metrics.PeriodicReader' panicked at opentelemetry-http/src/lib.rs:166:32:
there is no reactor running, must be called from the context of a Tokio 1.x runtime
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This is being thrown specifically by this line:

opentelemetry-rust/opentelemetry-http/src/lib.rs

Line 166 in 9011f63

    
           let mut response = time::timeout(self.timeout, self.inner.request(request)).await??;

(note that that time::timeout is a tokio function)

And here's what's happening. We create a PeriodicReader with its own thread:

opentelemetry-rust/opentelemetry-sdk/src/metrics/periodic_reader.rs

Lines 166 to 175 in 9011f63

    
           let result_thread_creation = thread::Builder::new() 
        
               .name("OpenTelemetry.Metrics.PeriodicReader".to_string()) 
        
               .spawn(move || { 
        
                   let mut interval_start = Instant::now(); 
        
                   let mut remaining_interval = interval; 
        
                   otel_info!( 
        
                       name: "PeriodReaderThreadStarted", 
        
                       interval_in_millisecs = interval.as_millis(), 
        
                       timeout_in_millisecs = timeout.as_millis() 
        
                   );

And then we do async-things inside it:

opentelemetry-rust/opentelemetry-http/src/lib.rs

Line 166 in 9011f63

    
           let mut response = time::timeout(self.timeout, self.inner.request(request)).await??;

... which should be enabled by our wrapping of the whole thing up in futures_executor:

opentelemetry-rust/opentelemetry-sdk/src/metrics/periodic_reader.rs

Lines 357 to 360 in 9011f63

    
           // Relying on futures executor to execute async call. 
        
           // TODO: Add timeout and pass it to exporter or consider alternative 
        
           // design to enforce timeout here. 
        
           let exporter_result = futures_executor::block_on(self.exporter.export(&mut rm));

scottgerring · 2024-12-18T14:06:24Z

The mystery deepens. I rolled my own futures-timeout that does not depend on tokio (yes, I know):

opentelemetry-rust/opentelemetry-http/src/lib.rs

Lines 225 to 256 in 0254bc1

    
           struct Timeout<F> { 
        
               future: F, 
        
               delay: Pin<Box<dyn Future<Output = ()> + Send>>, 
        
           } 
        
           impl<F: Future + std::marker::Unpin> Future for Timeout<F> where 
        
               F::Output: Send, 
        
           { 
        
               type Output = Result<F::Output, &'static str>; 
        
               fn poll(self: Pin<&mut Self>, cx: &mut std::task::Context<'_>) -> Poll<Self::Output> { 
        
                   let this = self.get_mut(); 
        
                   // Poll the delay future 
        
                   if this.delay.as_mut().poll(cx).is_ready() { 
        
                       return Poll::Ready(Err("timeout")); 
        
                   } 
        
                   // Poll the main future 
        
                   match Pin::new(&mut this.future).poll(cx) { 
        
                       Poll::Ready(output) => Poll::Ready(Ok(output)), 
        
                       Poll::Pending => Poll::Pending, 
        
                   } 
        
               } 
        
           } 
        
           fn timeout<F: Future + Send>(duration: Duration, future: F) -> Timeout<F> { 
        
               Timeout { 
        
                   future, 
        
                   delay: Box::pin(futures_timer::Delay::new(duration)), 
        
               } 
        
           }

This gets us further - we no longer panic in the timeout, but we panic further into hyper, where again we are expecting a tokio runtime - screenshot shows the frame that kicks back into tokio, and eventually panics for lack of a tokio runtime:

I feel like I am missing something! It's not clear to me how all this rather-tokio-dependant-looking-stuff can run within a non-tokio thread context. I can see that the path for the logging integration ends up with its own tokio context for hyper to work with, rather than sticking it on a non-tokio thread:

opentelemetry-rust/opentelemetry-http/src/lib.rs

Lines 139 to 147 in 0254bc1

    
           pub fn new(connector: C, timeout: Duration, authorization: Option<HeaderValue>) -> Self { 
        
               // TODO - support custom executor 
        
               let inner = Client::builder(hyper_util::rt::TokioExecutor::new()).build(connector); 
        
               Self { 
        
                   inner, 
        
                   timeout, 
        
                   authorization, 
        
               } 
        
           }

@cijothomas Is this expected to not be working? I know we spoke about this the other day e.g. #1437 , but I suppose I assumed that the code already in main was already good. Should we change the way we are creating HTTP exporters in the meantime in the integration tests?

lalitb · 2024-12-18T14:40:52Z

Will it make sense to document/support ONLY below scenarios with dedicated-thread periodic reader:

OTLP/gRPC - Using tonic client
OTLP/HTTP - Using reqwest-blocking client

It seems that hyper client and reqwest client will only work with runtime-based periodic reader.

scottgerring · 2024-12-18T15:13:39Z

Will it make sense to document/support ONLY below scenarios with dedicated-thread periodic reader:
* OTLP/gRPC - Using `tonic` client

* OTLP/HTTP - Using `reqwest-blocking` client
It seems that hyper client and reqwest client will only work with runtime-based periodic reader.

That would certainly make this ticket easier :) More generally, what's the thinking with the various clients - save our users a dependency?

cijothomas · 2024-12-18T15:36:47Z

For 1.0, our default is the dedicated thread and default gRPC client is Tonic, and default HTTP Client is reqwest::blocking, both of which are proven to be working with integration tests.

Reqwest/Hyper may have been designed to only work within tokio::runtimes, and we can just ship 1.0 without its support. (if anyone truly wants it, then the experimental-tokio-rt feature can be enabled to get it).

Post 1.0, we can see how to address supporting tonic/reqwest. I have some ideas on how to generally support other runtimes like tokio (and hence networking libs that depend on it like hyper), without exposing public APIs. Will post it into a new issue for discussion, but for 0.28 and 1.0 release, lets stick with @lalitb's suggestion here

cijothomas · 2024-12-18T15:46:42Z

The mystery deepens. I rolled my own futures-timeout that does not depend on tokio (yes, I know):

@ThomsonTan also explored this to enforce timeouts within the dedicated thread. The general conclusion was that
its best to let the individual exporters handle/enforce timeouts. , rather than OTel trying to do the job of an executor!

We may revisit this in the future.

scottgerring · 2024-12-18T15:47:13Z

@cijothomas @lalitb perfect. I'll close this issue as a no-op for the moment!

cijothomas · 2024-12-18T15:50:42Z

Could you check and see if this can be added to the OTLP docs somehow, so users know what to expect. This can be done after we merge the Batchprocessor changes as well.

scottgerring · 2024-12-18T15:51:55Z

Sure - that's a good idea !

scottgerring · 2024-12-18T19:07:21Z

What if we also used the feature to remove the threaded PeriodicReader when the user selects an async HTTP client? I haven't checked if it has flow on effects, but this would save others from going down this rabbit hole.

cijothomas · 2024-12-18T19:40:33Z

Users can do it themselves by opting in to the feature flags. We should not do it ourselves, as that'd need using non-stable features.

I don't really think such issues will be common once Otel picks a default. Previously we didn't have a default, forcing user to pick one. Once we have a default, that we validate will work with one http and gRPC library, then users have less reason to change anything. If they need to change something, the doc can give advice about what combinations are invalid.

Defaults would be:
PeriodicReader, BatchProcessor that spawn own thread + HTTP + ReqwestBlockingClient.
Users can switch to gRPC+Tonic.

Anything else - requires opt-in to the experimental ff for custom-runtimes. Does this sound reasonable for 1.0 timeline?

scottgerring · 2024-12-19T10:02:28Z

I think opinionated defaults are good, and supporting either gRPC/tonic as well as sync/async environments covers everything. Sounds good for 1.0!

Thinking ahead, do we need 3 different async HTTP clients in this library at all - what value is it adding? As cargo/rustc supports different-versions-of-the-same-package-in-the-tree via name mangling, it's not such a drama if we carry our own dep of a single library we like, and let the "outside world" do what it wants. E.g. - if we choose to use reqwest, I reckon that's merely an impl detail of opentelemetry-rust that the outside world shouldn't know or care about.

I suppose you could argue that if the stars line up and the outside and inside world can both use the same version of the same library we shave some bytes at runtime, but that feels like premature optimization.

The upside of cutting down to a single library would be decreased maintenance burden.

@cijothomas @lalitb what do you think? I'm probably missing some important context here :D

cijothomas · 2024-12-20T05:16:03Z

Thinking ahead, do we need 3 different async HTTP clients in this library at all

I think just one would have been sufficient (reqwest-blocking, which works well in default mode of own background thread). But in asynchronous runtimes, we need non-blocking one, so either reqwest or hyper is needed.

I am equally unsure of the need to support many clients! (Also ability to bring-your-own-http-client capability.. maybe there are use cases, I am happy to explore and write down the needs)

scottgerring · 2024-12-20T13:28:26Z

Thinking ahead, do we need 3 different async HTTP clients in this library at all

I think just one would have been sufficient (reqwest-blocking, which works well in default mode of own background thread). But in asynchronous runtimes, we need non-blocking one, so either reqwest or hyper is needed.

I am equally unsure of the need to support many clients! (Also ability to bring-your-own-http-client capability.. maybe there are use cases, I am happy to explore and write down the needs)

I suspect the use cases should focus on the customizations the user might need to provide - e.g. "I need {customHeaderX} for my use case, because of weird proxy stuff". Or, I need a callback hook to provide a custom header, because its a bit more complicated than that". Then the problem becomes - which library makes it easier for us to provide the flexibility we might need in the future? I'm struggling to imagine a situation where a user of the library should need to BYO HTTP client; I think that would smell a bit like a break down of abstraction on our part. But - perhaps again here there is something that's come up in the other otel SDKs that shines light here?
It would be cool to decrease the amount of code / feature switches in there down to the minimum sync/async.

@cijothomas - I feel like this is something that could use an RFC, and would be happy to start writing one if that fits?

cijothomas · 2024-12-20T15:17:30Z

I'm struggling to imagine a situation where a user of the library should need to BYO HTTP client; I think that would smell a bit like a break down of abstraction on our part. But - perhaps again here there is something that's come up in the other otel SDKs that shines light here?

My own experience is mostly .NET, where HttpClient is part of standard library itself, so this was not even a discussion point! (gRPC/protobuf was challenging, but ultimately Otel .NET decided to handcode a protobuf serializer, and got rid of gRPC/protobuf dependencies completely)

@lalitb you mentioned Otel C++ provides some ability to bring-own-http-client?

@cijothomas - I feel like this is something that could use an RFC, and would be happy to start writing one if that fits?

Totally! One scenario I can think of is the security patches - where companies want to provide own http/gRPC clients, since Rust std library has none. (To be clear - even today, we don't support bring-you-own-client capability. we just allow user to pick between 3 implementations maintained by this repo itself.)

For a 1.0 priority, it is sufficient that we support tonic+reqwest-blocking , so we just need to make sure users have enough flexibility to control these libraries like adding extra headers or authentication etc. Post 1.0 is a good time to revisit if we need more flexibility to bring-own-http-client.

(A similar thing for async runtimes and bring-your-own-runtime ability. Good post 1.0 candidate)

lalitb · 2024-12-20T16:23:22Z

@lalitb you mentioned Otel C++ provides some ability to bring-own-http-client?

Thats correct. C++ lacks a universally accepted standard HTTP client—libcurl is commonly used in POSIX environments, while Windows and macOS have their own native clients. Then there could be iOS and Android bindings on top of Otel C++ SDK which have their own http libraries - we can't impose the use of a specific HTTP client. Instead, need to provide flexible and pluggable setup to accommodate different implementations.

scottgerring mentioned this issue Dec 18, 2024

REQUEST: New membership for scottgerring open-telemetry/community#2496

Closed

6 tasks

scottgerring closed this as completed Dec 18, 2024

scottgerring reopened this Dec 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable remaining HTTP clients in metrics integration tests #2441

Enable remaining HTTP clients in metrics integration tests #2441

scottgerring commented Dec 17, 2024

scottgerring commented Dec 17, 2024

scottgerring commented Dec 18, 2024

scottgerring commented Dec 18, 2024 •

edited

Loading

lalitb commented Dec 18, 2024

scottgerring commented Dec 18, 2024 •

edited

Loading

cijothomas commented Dec 18, 2024

cijothomas commented Dec 18, 2024

scottgerring commented Dec 18, 2024

cijothomas commented Dec 18, 2024

scottgerring commented Dec 18, 2024

scottgerring commented Dec 18, 2024

cijothomas commented Dec 18, 2024

scottgerring commented Dec 19, 2024 •

edited

Loading

cijothomas commented Dec 20, 2024

scottgerring commented Dec 20, 2024 •

edited

Loading

cijothomas commented Dec 20, 2024

lalitb commented Dec 20, 2024

Enable remaining HTTP clients in metrics integration tests #2441

Enable remaining HTTP clients in metrics integration tests #2441

Comments

scottgerring commented Dec 17, 2024

scottgerring commented Dec 17, 2024

scottgerring commented Dec 18, 2024

scottgerring commented Dec 18, 2024 • edited Loading

lalitb commented Dec 18, 2024

scottgerring commented Dec 18, 2024 • edited Loading

cijothomas commented Dec 18, 2024

cijothomas commented Dec 18, 2024

scottgerring commented Dec 18, 2024

cijothomas commented Dec 18, 2024

scottgerring commented Dec 18, 2024

scottgerring commented Dec 18, 2024

cijothomas commented Dec 18, 2024

scottgerring commented Dec 19, 2024 • edited Loading

cijothomas commented Dec 20, 2024

scottgerring commented Dec 20, 2024 • edited Loading

cijothomas commented Dec 20, 2024

lalitb commented Dec 20, 2024

scottgerring commented Dec 18, 2024 •

edited

Loading

scottgerring commented Dec 18, 2024 •

edited

Loading

scottgerring commented Dec 19, 2024 •

edited

Loading

scottgerring commented Dec 20, 2024 •

edited

Loading