
Added batch span processor with test coverage #195

Merged: 9 commits merged into master on Aug 4, 2020

Conversation

@snehilchopra (Contributor) commented Jul 21, 2020

This PR contains the implementation of a Batch Span Processor, as mentioned in #83.
I used the implementations in Python, Java, and Go as references.

However, there are quite a few discrepancies between those implementations. For instance, Go does not have a ForceFlush method, and
Python has concurrency issues. Java, on the other hand, is well synchronized: whenever the queue is manipulated, locks guard it.

I have added comments in the code to give better context on the problems I observed.

Please let me know your thoughts on them, thanks!
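For context, a rough outline of the processor's shape as discussed in this thread; this is an illustrative sketch only, with member names taken from the conversation below rather than copied from the diff (the real interface lives in sdk/include/opentelemetry/sdk/trace/batch_span_processor.h):

    // Illustrative outline only; SpanProcessor, Recordable, SpanExporter and
    // common::CircularBuffer come from the SDK headers.
    class BatchSpanProcessor : public SpanProcessor
    {
    public:
      // Called when a span ends; the span is pushed into the circular buffer,
      // and the worker thread is nudged if the buffer is filling up.
      void OnEnd(std::unique_ptr<Recordable> &&span) noexcept override;

      // Wakes the worker thread and blocks until buffered spans are exported.
      void ForceFlush(std::chrono::microseconds timeout) noexcept override;

      // Signals the worker thread, joins it, and shuts down the exporter.
      void Shutdown(std::chrono::microseconds timeout) noexcept override;

    private:
      void DoBackgroundWork();  // worker-thread loop

      std::unique_ptr<SpanExporter> exporter_;
      std::unique_ptr<common::CircularBuffer<Recordable>> buffer_;
      std::thread worker_thread_;
      std::condition_variable cv_;
      std::mutex cv_m_;
      std::atomic<bool> is_shutdown_{false};
      std::atomic<bool> is_force_flush_{false};
    };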

@snehilchopra requested a review from a team on July 21, 2020 03:12
@linux-foundation-easycla bot commented Jul 21, 2020

CLA Check

@snehilchopra force-pushed the batch_span_processor branch from 8ffce98 to 427fff6 on July 21, 2020 05:08
@codecov bot commented Jul 22, 2020

Codecov Report

Merging #195 into master will increase coverage by 2.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #195      +/-   ##
==========================================
+ Coverage   92.19%   94.22%   +2.03%     
==========================================
  Files         117      115       -2     
  Lines        4038     3882     -156     
==========================================
- Hits         3723     3658      -65     
+ Misses        315      224      -91     
Impacted Files Coverage Δ
...clude/opentelemetry/sdk/common/atomic_unique_ptr.h 100.00% <ø> (ø)
...include/opentelemetry/sdk/common/circular_buffer.h 100.00% <ø> (ø)
...e/opentelemetry/sdk/common/circular_buffer_range.h 100.00% <ø> (ø)
sdk/test/common/atomic_unique_ptr_test.cc 100.00% <ø> (ø)
sdk/test/common/circular_buffer_range_test.cc 100.00% <ø> (ø)
sdk/test/common/circular_buffer_test.cc 100.00% <ø> (ø)
sdk/include/opentelemetry/sdk/trace/span_data.h 98.33% <100.00%> (ø)
sdk/src/metrics/meter_provider.cc 100.00% <0.00%> (ø)
api/test/metrics/noop_metrics_test.cc
api/include/opentelemetry/metrics/meter.h
... and 2 more

@snehilchopra force-pushed the batch_span_processor branch 2 times, most recently from fe416e4 to bc5f67f on July 22, 2020 17:57
@snehilchopra (Contributor, Author) commented Jul 22, 2020

/check-cla

@snehilchopra force-pushed the batch_span_processor branch 2 times, most recently from 84e9684 to 696939c on July 23, 2020 01:35
@snehilchopra force-pushed the batch_span_processor branch from 696939c to a8ac7a1 on July 23, 2020 02:12

// If the queue gets at least half full, a preemptive notification is
// sent to the worker thread to start a new export cycle.
if (static_cast<int>(buffer_->size()) >= max_queue_size_ / 2)
{

A reviewer commented:

Is it possible for the notification to be missed here? Would it make sense to make this a loop, like in other places where a notification is sent?

@snehilchopra (Contributor, Author) replied:

Since this wasn't stated explicitly anywhere in the spec, I was wondering if it was fine to leave this as a hit-or-miss notify call. All it does is attempt a preemptive notification for better performance; it is not a hard requirement.

But I think when there are many producer threads, this sort of preemption might become necessary to minimize the number of spans dropped.

Please let me know what you think about this!

@snehilchopra (Contributor, Author) added:

@reyang @pyohannes
It'd be great to get your feedback too on this!

The reviewer replied:

Ok, if it is not a hard requirement, I think it is fine to leave it as a single attempt to wake up.

A Member commented:

It seems to be a good idea to me.
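For readers following the missed-notification discussion: a notify_one issued while the worker thread is not yet blocked in a wait is simply dropped, and a predicate-based wait makes a single missed notification harmless, which is why a single attempt to wake up is acceptable here. A minimal self-contained sketch (names hypothetical, not the PR's code):

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    std::mutex m;
    std::condition_variable cv;
    bool work_pending = false;  // predicate the worker re-checks

    void producer()
    {
      {
        std::lock_guard<std::mutex> lk(m);
        work_pending = true;  // state change happens under the lock...
      }
      cv.notify_one();        // ...so the worker sees the flag even if this
                              // notify fires before the worker starts waiting
    }

    void worker()
    {
      std::unique_lock<std::mutex> lk(m);
      // With a predicate, a notify that arrived "too early" is not lost: the
      // flag is already true and wait_for returns immediately. A bare
      // wait_for would sleep for the full timeout instead.
      cv.wait_for(lk, std::chrono::milliseconds(5000), [] { return work_pending; });
      work_pending = false;
    }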

@snehilchopra force-pushed the batch_span_processor branch from 7addf7e to eab7d98 on July 23, 2020 20:23
@IlyaKobelevskiy left a comment:

I think overall this looks good!

@snehilchopra force-pushed the batch_span_processor branch from eab7d98 to 1f0bc9c on July 23, 2020 22:22
@ZiweiZhao commented:
LGTM!

/**
 * Class destructor which invokes the Shutdown() method. The Shutdown() method is supposed to be invoked
 * when the Tracer is shutdown (as per other languages), but the C++ Tracer only takes shared ownership of the processor,
 * and thus doesn't call Shutdown (as the processor might be shared with other Tracers).
 */
A Member replied:

FYI @cijothomas @rajkumar-rangaraj

I think @snehilchopra is the first one doing this with the correct consideration of ref counting.

@snehilchopra force-pushed the batch_span_processor branch from 500c646 to 381d435 on July 26, 2020 05:28
@snehilchopra force-pushed the batch_span_processor branch 9 times, most recently from 9efe9c5 to fdba096 on July 31, 2020 23:34
@snehilchopra (Contributor, Author) commented:

@pyohannes Can I please get a final stamp on this?

@snehilchopra force-pushed the batch_span_processor branch from fdba096 to 9bf95d8 on August 1, 2020 01:03
@pyohannes (Contributor) left a comment:

I have some more remarks. In addition to those, I'd like to propose a simplification for the Shutdown and DoBackgroundWork methods:

    void BatchSpanProcessor::Shutdown(std::chrono::microseconds timeout) noexcept
    {
      is_shutdown_ = true;

      cv_.notify_one();
      worker_thread_.join();

      exporter_->Shutdown();
    }

    void BatchSpanProcessor::DoBackgroundWork()
    {
      while (true)
      {
        std::unique_lock<std::mutex> lk(cv_m_);

        cv_.wait_for(lk, schedule_delay_millis_);

        if (is_shutdown_.load() == true)
        {
          DrainQueue();
          return;
        }

        bool was_force_flush_called = is_force_flush_.load();

        if (was_force_flush_called)
        {
          is_force_flush_ = false;
        }
        else if (buffer_.empty())
        {
          continue;
        }

        Export(was_force_flush_called);
      }
    }

This passes all the tests. The only drawback is that the Shutdown call can take a bit longer, but it simplifies the solution quite a bit, and I think it actually makes it easier to implement a stable timeout mechanism on top of it (maybe in another PR).
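For completeness, a hedged sketch of how ForceFlush could sit on top of this simplified loop, using the force_flush_cv_m_ mutex quoted just below; the force_flush_cv_ member and the assumption that the worker notifies it after the corresponding Export(...) call are illustrative, not the merged code:

    void BatchSpanProcessor::ForceFlush(std::chrono::microseconds timeout) noexcept
    {
      std::unique_lock<std::mutex> lk(force_flush_cv_m_);
      is_force_flush_ = true;
      cv_.notify_one();  // wake the worker out of its wait_for

      // Block until the worker has picked up the flush request and exported;
      // re-checking the flag guards against spurious wakeups.
      force_flush_cv_.wait(lk, [this] { return !is_force_flush_.load(); });
    }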

std::mutex cv_m_, force_flush_cv_m_;

/* The buffer/queue to which the ended spans are added */
std::unique_ptr<common::CircularBuffer<Recordable>> buffer_;
@pyohannes (Contributor) commented:

It doesn't need to be a pointer. If the batch span processor is allocated on the heap, the buffer will be too.
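In other words, something like the following (a sketch of the suggestion, not the merged code):

    // Held by value: the buffer lives and dies with the processor, with no
    // extra heap allocation and no possible null state.
    common::CircularBuffer<Recordable> buffer_;

    // Constructed in the constructor's initializer list, e.g.:
    //   BatchSpanProcessor(..., size_t max_queue_size)
    //       : buffer_(max_queue_size) { ... }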

{
// If we already have max_export_batch_size_ spans in the buffer, better to export them
// now
if (buffer_->size() < max_export_batch_size_)
@pyohannes (Contributor) commented:

I tried to understand the effect of this statement.

If lots of spans are constantly added to the buffer (and/or if max_export_batch_size_ is very small) this statement might always evaluate to false, and thus we'll never reach the line with the cv_.wait_for. So we won't honor the schedule_delay_millis_ under certain critical circumstances.
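To make the failure mode concrete, here is a condensed sketch of the loop shape being discussed (variable names from the thread, control flow reduced to the relevant branch):

    while (!is_shutdown_.load())
    {
      // Under sustained load, buffer_->size() can stay at or above
      // max_export_batch_size_ indefinitely, so this branch always runs...
      if (buffer_->size() >= max_export_batch_size_)
      {
        Export(false);
        continue;  // ...and the wait below is never reached,
      }            // so schedule_delay_millis_ is never honored.

      std::unique_lock<std::mutex> lk(cv_m_);
      cv_.wait_for(lk, schedule_delay_millis_);
    }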

@snehilchopra (Contributor, Author) replied:

Sure, I can change it. It was just an effort to improve performance, using Java's processor implementation as a reference.

auto end = std::chrono::steady_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

timeout = schedule_delay_millis_ - duration;
@pyohannes (Contributor) commented:

I think we can get rid of this calculation and just go with waiting for the whole schedule_delay_millis_ in each iteration. This would be more in line with the spec, which says:

the delay interval in milliseconds between two consecutive exports

@snehilchopra (Contributor, Author) replied:

I was under the impression that it meant the delay interval between when two consecutive exports begin.
I believe Python has the same logic too.

@pyohannes (Contributor) replied:

That's a good point. I had a look at Java, which doesn't have it.

I was drawn to the simpler solution, but I leave this up to you to decide.
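The two readings of the spec differ only in how the next wait is computed; a side-by-side sketch of both (member names from the thread, not exact code from either SDK):

    // Reading 1 (Java, and the simpler option): a fixed delay between the
    // end of one export and the start of the next.
    cv_.wait_for(lk, schedule_delay_millis_);

    // Reading 2 (Python, and this PR): a fixed interval between the starts
    // of two consecutive exports, subtracting the time the export took.
    auto start = std::chrono::steady_clock::now();
    Export(false);
    auto took  = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    cv_.wait_for(lk, schedule_delay_millis_ - took);  // a negative remainder
                                                      // returns immediately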

@snehilchopra (Contributor, Author) commented:

> I have some more remarks. In addition to those, I'd like to propose a simplification for the Shutdown and DoBackgroundWork methods: […] (proposed code and remarks quoted above)

Thank you so much for the simplification, Johannes!

Around two weeks ago my initial implementation was along the same lines, except that we ran into the case where the condition variable's notify calls were missed by the worker thread, causing the Shutdown() test to take schedule_delay_millis milliseconds to complete.

To avoid this, we wrapped the notify calls in while loops, with boolean flags checking whether the notify call had been received.

If it's permissible for the Shutdown() method's notify calls to be missed, I can make the change right away.
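The while-loop workaround described above can be sketched like this (the shutdown_ack_ flag is an illustrative std::atomic<bool> member, not the PR's exact code): the sender keeps re-notifying until the worker confirms it saw the shutdown flag.

    // Sender side (Shutdown): retry notify_one until the worker acknowledges.
    is_shutdown_ = true;
    while (!shutdown_ack_.load())
    {
      cv_.notify_one();
      std::this_thread::yield();  // give the worker a chance to run
    }
    worker_thread_.join();

    // Worker side (DoBackgroundWork), right after waking from wait_for:
    if (is_shutdown_.load())
    {
      shutdown_ack_ = true;  // unblocks the sender's retry loop
      DrainQueue();
      return;
    }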

@pyohannes (Contributor) replied:

> If it's permissible for the Shutdown() method's notify calls to be missed, I can make the change right away.

I'm ok with that. I think invocations of Shutdown are not performance critical. But I'm open to any other opinions on this.

@snehilchopra (Contributor, Author) replied:

> I'm ok with that. I think invocations of Shutdown are not performance critical. But I'm open to any other opinions on this.

Sure. I'll change it for implementation simplicity.

It's just that there are similar issues in Python's implementation, where many notify calls are missed, so we tried to address those concerns in the C++ implementation as much as possible.

@snehilchopra force-pushed the batch_span_processor branch from 47a9fb1 to 61516a3 on August 2, 2020 00:28
@snehilchopra force-pushed the batch_span_processor branch from 0b49229 to 6d5eb05 on August 4, 2020 00:30
@reyang merged commit f0789be into open-telemetry:master on Aug 4, 2020
@snehilchopra deleted the batch_span_processor branch on August 5, 2020 17:46
Labels: pr:please-merge (this PR is ready to be merged by a Maintainer: rebased, CI passed, has enough valid approvals, etc.)

5 participants