
implements new version of WindowTimeout #2822

Closed
wants to merge 7 commits into from

Conversation

OlegDokuka
Contributor

closes #1099

Signed-off-by: Oleh Dokuka [email protected]

@OlegDokuka OlegDokuka requested a review from a team as a code owner October 21, 2021 15:28
Contributor

@simonbasle simonbasle left a comment


high level review

}

@Override
public Object scanUnsafe(Attr key) {
	if (key == Attr.RUN_ON) return timer;
	if (key == Attr.RUN_STYLE) return Attr.RunStyle.ASYNC;
	if (key == Attr.RUN_ON) {
Contributor


nit: please revert to single lines in that method

final Scheduler timer;

FluxWindowTimeout(Flux<T> source, int maxSize, long timespan, TimeUnit unit, Scheduler timer) {
final int maxSize;
Contributor


nit: please add your name above in the @author list

@@ -41,7 +41,7 @@
public void windowWithTimeoutAccumulateOnSize() {
StepVerifier.withVirtualTime(() -> Flux.range(1, 6)
.delayElements(Duration.ofMillis(300))
.windowTimeout(5, Duration.ofMillis(2000))
.windowTimeout(5, Duration.ofMillis(2000), true)
Contributor

@simonbasle simonbasle Oct 22, 2021


the test coverage of FluxWindowTimeoutTest wasn't great in the first place (65% of lines in 3.3.x, 62% in this PR), but I think we can improve it?

Either:

  • turn these tests that use fair == true into parameterized ones (⚠️ this means forward merging will need to turn them into @ParameterizedTestWithName in main)
  • duplicate the tests, e.g. in a @Nested...

Contributor


wait, this is targeting main 🤔 so yeah, direct use of ParameterizedTestWithName then

@@ -9788,6 +9788,11 @@ public final void subscribe(Subscriber<? super T> actual) {
return windowTimeout(maxSize, maxTime , Schedulers.parallel());
}

public final Flux<Flux<T>> windowTimeout(int maxSize, Duration maxTime,
boolean fairBackpressure) {
return windowTimeout(maxSize, maxTime , Schedulers.parallel(), true);
Contributor


fairBackpressure parameter isn't used here, hardcoded to true

@@ -88,13 +856,13 @@ public Object scanUnsafe(Attr key) {
volatile boolean done;
volatile boolean cancelled;

volatile long requested;
volatile long requested;
Contributor


nit: avoid applying reformatting to the whole file, only edited lines

@@ -165,7 +165,7 @@ public void dispose() {
};

StepVerifier.create(Flux.range(1, 3).hide()
.windowTimeout(10, Duration.ofMillis(500), testScheduler))
.windowTimeout(10, Duration.ofMillis(500), testScheduler, false))
Contributor


I guess there is a reason this particular test only works with the old mode, in which case the parameterized comment above wouldn't apply.

@@ -230,7 +230,7 @@ public void dispose() {

@Test
public void scanOperator() {
FluxWindowTimeout<Integer> test = new FluxWindowTimeout<>(Flux.just(1), 123, 100, TimeUnit.MILLISECONDS, Schedulers.immediate());
FluxWindowTimeout<Integer> test = new FluxWindowTimeout<>(Flux.just(1), 123, 100, TimeUnit.MILLISECONDS, Schedulers.immediate(), true);
Contributor


scan tests of WindowTimeoutBackpressureSubscriber and InnerWindow would need to be added

}
}

static final long FINALIZED_STATE =
Contributor


let's at least put constants at the top of the class. static methods usually go there too (although here there are a lot of them, so I'm less sure)

@simonbasle
Contributor

simonbasle commented Oct 22, 2021

To try and give a little detail on the implementation and tradeoffs:

  • requesting N to main flux is interpreted as "requesting N windows", so 0 demand would mean 0 window opening
  • this version with backpressure requests maxSize to the source whenever a window is subscribed to (inner)
  • if a previous window was closed due to timeout, it means it has requested maxSize M to parent already, but might only have received M1 elements. So we're already expecting M2 = M - M1 elements to come naturally
  • in that case new window will only request M1 from the source (since we have pending M2, M2 + M1 = M = maxSize 👍)

The other big tradeoff is that this variant will actually prolong the life of the active window if it "closes" due to timeout (i.e. there is still pending demand to the source) AND there is currently 0 request for a new window. So it doesn't really pause the upstream when there is 0 demand.

There is a fundamental question here: what does it actually mean to say "there is 0 demand for a new window"? Windows are by nature tied to the timing of pushes from the source.

Possible alternative interpretations that I can think of:

  1. never request more than 1 from the source at a time (maybe making it possible to actually pause the upstream)
  2. open a new window right away which enqueues elements, and see if it will be requested (through main) and drained (through inner): doesn't pause the upstream, but correctly cuts the old window at the moment of the timeout at least
  3. ???
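The demand arithmetic in the bullets above (request maxSize M, receive only M1 before a timeout, leave M2 = M - M1 pending, then top up by M1 for the next window) could be sketched as a minimal plain-Java model. This is a hypothetical illustration with assumed names (`WindowDemandTracker`, `onWindowOpen`, `onNext`), not the operator's actual code:

```java
// Hypothetical sketch of the demand bookkeeping described above: when a new
// window is subscribed, only top the outstanding upstream request back up to
// maxSize, accounting for elements still expected from a timed-out window.
final class WindowDemandTracker {

    private final int maxSize;
    // Demand already sent upstream but not yet fulfilled (M2 in the comment).
    private long outstanding;

    WindowDemandTracker(int maxSize) {
        this.maxSize = maxSize;
    }

    // Called when a new window is subscribed: request only the difference
    // between maxSize and what is already expected to arrive.
    long onWindowOpen() {
        long toRequest = maxSize - outstanding; // M - M2 == M1
        outstanding += toRequest;
        return toRequest;
    }

    // Called for each element the source delivers into the current window.
    void onNext() {
        outstanding--;
    }

    public static void main(String[] args) {
        WindowDemandTracker demand = new WindowDemandTracker(10);
        System.out.println(demand.onWindowOpen()); // prints 10: full maxSize
        // window times out after only 3 elements arrived
        demand.onNext(); demand.onNext(); demand.onNext();
        System.out.println(demand.onWindowOpen()); // prints 3: only the diff
    }
}
```

With maxSize = 10, the first window requests 10; after 3 elements arrive and the window times out, 7 are still pending, so the next window requests only 3, keeping the total in-flight demand at maxSize.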

@joshfree

Tagging @JonathanGiles @anuchandy

@anuchandy

anuchandy commented Jan 22, 2022

Hi @simonbasle, I pulled the draft and tried using the new operator with the below code.

To give you a summary: the code has a producer, producing one event every 250 ms once the backpressure request is received, i.e. 4 events per second. The code has a consumer using the new operator to retrieve events from this producer with a max wait time of 1 second and a max size of 10.

I was expecting the consumer with the new operator to receive a window with four events every second, but it does not.

We can see the backpressure request going to the producer, and it is indeed producing events, but for some reason they never come out of the new operator.

@Test
public void windowTimeoutWithBackPressureFromCore() throws InterruptedException {
	// -- The Event Producer
	//    The producer emitting requested events to downstream but with a delay of 250ms between each emission.
	//
	final int eventProduceDelayInMillis = 250;
	Flux<String> eventProducer = Flux.create(sink -> {
		sink.onRequest(request -> {
			if (request != Long.MAX_VALUE) {
				System.out.println("Backpressure Request(" + request + ")");
				LongStream.range(0, request)
						.mapToObj(String::valueOf)
						.forEach(message -> {
							try {
								TimeUnit.MILLISECONDS.sleep(eventProduceDelayInMillis);
							} catch (InterruptedException e) {
								e.printStackTrace();
							}
							System.out.println("Producing:" + message);
							sink.next(message);
						});
			} else {
				sink.error(new RuntimeException("No_Backpressure unsupported"));
			}
		});
	});

	// -- The Event Consumer
	//    The consumer using windowTimeout that batches maximum 10 events with a max wait time of 1 second.
	//    Given the Event producer produces at most 4 events per second (due to 250 ms delay between events),
	//    the consumer should receive 3-4 events.
	//
	final int eventConsumeDelayInMillis = 0;
	final Scheduler scheduler = Schedulers.newBoundedElastic(10, 10000, "queued-tasks");
	final AtomicBoolean hasError = new AtomicBoolean(false);
	final Semaphore isCompleted = new Semaphore(1);
	isCompleted.acquire();

	Disposable subscription = eventProducer.windowTimeout(10, Duration.ofSeconds(1), true)
			.concatMap(Flux::collectList)
			.publishOn(scheduler)
			.subscribe(eventBatch -> {
				for (String event : eventBatch) {
					System.out.println("Consuming: " + event);
					try {
						TimeUnit.MILLISECONDS.sleep(eventConsumeDelayInMillis);
					} catch (InterruptedException e) {
						System.err.println("Could not sleep for delay. Error: " + e);
					}
				}
				System.out.println("Completed batch.");
			}, error -> {
				System.err.println("Error: " + error);
				hasError.set(true);
				isCompleted.release();
			}, () -> {
				System.out.println("Completed.");
				isCompleted.release();
			});

	System.out.println("Running test...");
	final Duration TIME_TO_PUBLISH_EVENTS = Duration.ofMinutes(1);
	try {
		assertFalse(isCompleted.tryAcquire(TIME_TO_PUBLISH_EVENTS.toMinutes(), TimeUnit.MINUTES),
				"Should have been false because it would not error.");

		assertFalse(hasError.get(), "Should not have received an error.");
	} finally {
		subscription.dispose();
	}

	System.out.println("Completed test.");
}
Output

Backpressure Request(10)
Producing:0
Producing:1
Producing:2
Backpressure Request(3)
Producing:3
Producing:0
Producing:4
Producing:1
Producing:5
Producing:2
Consuming: 0
Consuming: 1
Consuming: 2
Completed batch.
Producing:6
Backpressure Request(7)
Producing:7
Producing:0
Producing:8
Producing:1
Producing:9
Running test...
Producing:2
Producing:3
Producing:4
Producing:5
Producing:6
Backpressure Request(10)
Producing:0
Producing:1
Producing:2
Producing:3
Producing:4
Producing:5
Producing:6
Producing:7
Producing:8
Producing:9
Backpressure Request(10)
Producing:0
Producing:1
Producing:2
Producing:3
Producing:4
Producing:5
Producing:6
Producing:7
Producing:8
Producing:9
Backpressure Request(10)
Producing:0
Producing:1
Producing:2
Producing:3
Producing:4
Producing:5
Producing:6
Producing:7
Producing:8
Producing:9
Backpressure Request(10)
Producing:0
Producing:1
Producing:2
Producing:3
Producing:4
Producing:5
Producing:6
Producing:7
Producing:8
Producing:9
Backpressure Request(10)
Producing:0
Producing:1
Producing:2
Producing:3
Producing:4
Producing:5
Producing:6
Producing:7
Producing:8
Producing:9
Backpressure Request(10)
Producing:0
Producing:1
Producing:2

Oleh Dokuka added 2 commits February 1, 2022 01:02
@OlegDokuka OlegDokuka force-pushed the enhancement/window-timeout-new branch from c69ca37 to 2ea6bca Compare January 31, 2022 23:02
@OlegDokuka
Contributor Author

OlegDokuka commented Jan 31, 2022

@anuchandy I updated the implementation a bit and also added your tests, which demonstrate that the new impl is working as expected (probably).

Also, just FYI: operators like publishOn have a prefetch buffer which may store some amount of collected elements. I also added subscribeOn to decouple the requester thread from the producer thread. The impl is still in progress and requires some polishing/stress tests (but at least it may give more ideas about the new behaviors).

@JonathanGiles

Thank you @OlegDokuka - we will get to testing this straight away.

@anuchandy

Hi @OlegDokuka, thanks for the updated impl and for cleaning up the test. I'll continue testing the impl.

Just curious about the stress tests you mentioned: are they part of the public reactor-core repo, or are they a kind of validation done internally?

@OlegDokuka
Contributor Author

@anuchandy all existing stress tests can be found here -> https://github.com/reactor/reactor-core/tree/main/reactor-core/src/jcstress/java/reactor/core/publisher

@anuchandy

Thanks, @OlegDokuka. As part of the validations, I'm trying to understand the various kinds of backpressure involved. IIUC, considering the following code, there are two categories of backpressure:

Flux<Event> source = ..
Flux<Flux<Event>> sourceWindowed = source.windowTimeout(500, Duration.ofSeconds(3), ..);
  1. Backpressure to source for events.
  2. Backpressure to sourceWindowed for windows.

If we chain an operator with a prefetch buffer (e.g., size 50) to sourceWindowed, does that mean:

  1. It results in requesting 50 "windows" to sourceWindowed
  2. which internally requests 50 * 500 "events" to the source
  3. with the timeout constraint of 3 sec applied to each of the 50 windows

Is this understanding correct?

Also, is there a setting that caps the number of concurrent "windows" in the above case? (e.g., capped by the threads in the scheduler)

@OlegDokuka
Contributor Author

OlegDokuka commented Feb 3, 2022

@anuchandy

Backpressure to source for events.

there is backpressure equal to the max number of elements in the window.

Backpressure to sourceWindowed for windows.

it depends on the downstream. For example, using concatMap(fn, 0) with a zero-prefetch setup, the demand is going to be equal to 1. Once the downstream has prefetched a value from sourceWindowed, the upstream is going to receive a request of up to maxWindowSize (which in the example is 500). You may wonder why "up to" 500: if the previous window did not fill up completely, the next window will not request 500 again, but rather the diff between what was received and what remains to be sent from the source.

It results in requesting 50 "windows" to sourceWindowed

yes; if the downstream prefetches 50, that will be stored in the requested value.

which internally requests 50 * 500 "events" to the source

nope, only 500 can be requested at a time.

with the timeout constraint of 3 sec applied to each of the 50 windows

no, since only a single window can be opened at a time. There are no possible intersections between windows in that operator.

Hope that explains everything,
Oleh
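The two levels of demand described above can be sketched as a small, hypothetical plain-Java model (assumed names like `WindowedDemand` and `tryOpenWindow`; the real operator's state machine is far more involved): downstream demand counts windows, at most one window is open at a time, and each newly opened window requests at most maxWindowSize elements from the source.

```java
// Hypothetical sketch of the two demand levels discussed above: requests to
// sourceWindowed count windows, while element demand to the source is capped
// at maxWindowSize per open window (windows never overlap).
final class WindowedDemand {

    private final int maxWindowSize;
    private long requestedWindows; // demand for windows from the downstream
    private long pendingElements;  // element demand already sent to the source

    WindowedDemand(int maxWindowSize) {
        this.maxWindowSize = maxWindowSize;
    }

    void requestWindows(long n) {
        requestedWindows += n;
    }

    // Opens the next window if downstream demand allows; returns how many
    // elements to request from the source. This is never more than
    // maxWindowSize, no matter how many windows were requested downstream.
    long tryOpenWindow() {
        if (requestedWindows == 0) {
            return 0; // zero demand for windows: no new window opens
        }
        requestedWindows--;
        long toRequest = maxWindowSize - pendingElements;
        pendingElements = maxWindowSize;
        return toRequest;
    }

    void onElement() {
        pendingElements--;
    }

    public static void main(String[] args) {
        WindowedDemand demand = new WindowedDemand(500);
        demand.requestWindows(50);                  // e.g. a prefetch-50 operator downstream
        System.out.println(demand.tryOpenWindow()); // prints 500, not 50 * 500
    }
}
```

So a downstream prefetch of 50 windows is buffered as window demand, but upstream element demand is bounded by a single window's maxWindowSize at any moment.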

@anuchandy

Thank you, @OlegDokuka, for the answers. Using the diff for the demand makes sense; we don't want to over-read from the main source. Combining that with the prolonged window (when there is no demand for a new window) collecting any events after the timeout seems to ensure no event loss.

I am continuing the testing/profiling. Please tag me when you add new commit(s) that are ready for validation.

@OlegDokuka
Contributor Author

@anuchandy working on more fixes... will ping you

@anuchandy

Hi @OlegDokuka, hope you're doing great! Just wanted to check: how is the implementation going?

@JonathanGiles

Any status update here, @simonbasle and @OlegDokuka? Thanks!!

@OlegDokuka
Contributor Author

@JonathanGiles it is coming. Stay tuned.

@JonathanGiles

Thanks for the update, much appreciated. We are very eager to see this improvement!

Oleh Dokuka added 4 commits April 14, 2022 13:05
wip
Signed-off-by: Oleh Dokuka <[email protected]>
wip
Signed-off-by: Oleh Dokuka <[email protected]>
wip
Signed-off-by: Oleh Dokuka <[email protected]>
wip
Signed-off-by: Oleh Dokuka <[email protected]>
@anuchandy

thanks @OlegDokuka, just tag me when the next draft is ready so that I can pull it and test it against our use cases

@anuchandy

Hi @OlegDokuka, I hope you're doing well. Just checking, how is the work progressing? Please let me know once you have the next draft to test.

@OlegDokuka
Contributor Author

OlegDokuka commented Jun 10, 2022

@anuchandy this PR is superseded by #3054, which is about to be merged and is going to be released next Tuesday. Stay tuned

@OlegDokuka
Contributor Author

Closing since it is superseded by #3054.

@OlegDokuka OlegDokuka closed this Jun 13, 2022
@chemicL chemicL deleted the enhancement/window-timeout-new branch April 11, 2024 12:45

Successfully merging this pull request may close these issues.

WindowTimeout backpressure
5 participants