GroupBy using only one group after some time #7544
If you observe each group, similarly to the second example, you get ample requests per group and thus a mixture of processing:

```java
.groupBy(item -> item.getPartition())
.flatMap(grouped -> {
    return grouped.observeOn(scheduler).flatMap(container -> {
        return Flowable.fromAction(() -> {
            Thread.sleep(100);
            System.out.println("Partition " + container.partition + " "
                    + container.item + " " + Thread.currentThread().getName());
        });
    }, 1);
})
```
There are 2 nested flatMaps. The first flatMap is for the groups, with no constraint on concurrency, so I was thinking that all available groups would be filled sequentially. Then, for each group, its elements are processed one at a time, scheduled on a ThreadPoolExecutor to make this processing parallel. Do I understand you correctly that the concurrency limit on the inner flatMap can affect the mapping of groups? Thank you for a reply.
The concurrency limit can result in skewed request patterns, so it can end up requesting 1 from one group after all. The processing speed is slow compared to the production speed, so when the next 1 item is requested, it gets routed all the way back to the very first group, as that group will always have something ready. There is no balancing (or round-robin) happening.
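The skew described above can be simulated without RxJava. In the plain-Java sketch below (class and method names are my own), a producer refills every group before each single-item request, while the requester always takes from the first group that has an item ready, so one group absorbs all the demand:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class RequestSkewDemo {

    // Models the demand pattern described above: the (faster) producer refills
    // every group's buffer before each single-item request, and the requester,
    // with demand 1 and no round-robin, always takes from the first group that
    // has an item ready.
    static int[] run(int groups, int rounds) {
        List<Queue<Integer>> buffers = new ArrayList<>();
        for (int g = 0; g < groups; g++) {
            buffers.add(new ArrayDeque<>());
        }
        int[] processed = new int[groups];
        int produced = 0;
        for (int round = 0; round < rounds; round++) {
            for (int g = 0; g < groups; g++) {
                buffers.get(g).add(produced++); // production outpaces consumption
            }
            for (int g = 0; g < groups; g++) { // request(1), no balancing
                if (!buffers.get(g).isEmpty()) {
                    buffers.get(g).poll();
                    processed[g]++;
                    break;
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // Group 0 always has something ready, so it absorbs every request.
        System.out.println(Arrays.toString(run(3, 30))); // prints [30, 0, 0]
    }
}
```

Whether the real pipeline degenerates exactly this way depends on timing, but the mechanism is the same: with demand 1 and no balancing, the busiest group wins.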
There are 5 threads in the pool, and the processor has enough cores. How is it that while 1 item is being processed (and the given group's flatMap is blocked, as only 1 concurrent operation per group can happen) the other threads are idling?
Please try out the example I suggested. |
I have rearranged my code, but after some time the same situation occurs: only one partition is printing. I might not have understood you correctly. Was your example supposed to be a solution?
The way you generate the input to … The …
It is clearer now, thank you for your replies!
It is a combination of the groupBy() and flatMap() operators.

In the first test with groupBy(), you are creating groups based on the partition value of each Container. However, the flatMap() operator used to process each group is not respecting the grouping and allows items from different groups to be processed in parallel. This can result in some groups being blocked while others are still being processed, causing the behavior you observed where only one partition is processed at a time.

In the second test without groupBy(), you are using the observeOn() operator to ensure that each item is processed sequentially on the scheduler you created. This approach avoids the issue with groupBy() and flatMap(), and allows items to be processed in parallel while still maintaining sequential processing within each partition.

To fix the issue in the first test, you can use the concatMap() operator instead of flatMap() to ensure that items from each group are processed sequentially.
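As a plain-Java illustration of the intended semantics (items within a partition processed sequentially, different partitions running in parallel), one single-threaded executor per partition can be used. This is only a sketch with made-up names, not the RxJava concatMap solution itself:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PerGroupSequential {

    // One single-threaded executor per partition keeps items within a
    // partition in submission order, while different partitions run on
    // different threads in parallel.
    static List<String> process(int partitions, int itemsPerPartition) throws InterruptedException {
        Map<Integer, ExecutorService> perGroup = new ConcurrentHashMap<>();
        List<String> log = new CopyOnWriteArrayList<>();
        for (int i = 0; i < itemsPerPartition; i++) {
            for (int p = 0; p < partitions; p++) {
                final int part = p;
                final int item = i;
                perGroup.computeIfAbsent(part, k -> Executors.newSingleThreadExecutor())
                        .submit(() -> log.add("Partition " + part + " item " + item));
            }
        }
        for (ExecutorService ex : perGroup.values()) {
            ex.shutdown();
            ex.awaitTermination(5, TimeUnit.SECONDS);
        }
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> log = process(3, 4);
        System.out.println(log.size() + " items processed"); // prints 12 items processed
    }
}
```

This roughly corresponds to what the RxJava pipeline is meant to achieve when each group is consumed by exactly one sequential chain.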
Hello,
I have experienced a strange behaviour in code and created unit tests that demonstrate it.
I am running RxJava 3, version 3.1.5.
The tests are written in Groovy and Spock.
The first test, with groupBy, fetches values from a source simultaneously (let's assume that fetching is slow, so we do it in parallel for different values).
Fetched values must be processed sequentially, so after joining them into a common stream, we group them by "partition" and then process the groups in parallel, while still keeping the processing within each group sequential.
Unfortunately, after a few seconds only one partition is being processed. The threads are changing, but data is fetched only from partition X. I would expect each group to be processed in parallel, so the output log should show many partitions interleaving.
The other test shows how the result should look. It actually allows us to fetch data in parallel and process the streams together, each one sequentially. So I treat it as a workaround, but I still wonder what happened in the first example.
Could you help me find the cause of this behaviour?