Potential inefficiency of Table Synchronizer for ReaderGroups #462

Open
RaulGracia opened this issue Nov 29, 2023 · 0 comments

Problem description
We are doing performance benchmarks of Pravega using the Python binding. One behavior we found that differs from the Java client is that, when reading via the Python client, we observe a lot of log messages like:

Nov 29 10:32:06 ip-10-0-0-107.ec2.internal pravega-segmentstore[13373]: 2023-11-29 10:32:06,042 3419086 [epollEventLoopGroup-9-3] INFO  i.p.s.s.h.h.PravegaRequestProcessor - [requestId=689] Iterate Table Entries Delta: Segment=terasort1701253869/terasort_reader_group_kvtable/0.#epoch.0 Count=10 FromPositon=60238.
Nov 29 10:32:06 ip-10-0-0-107.ec2.internal pravega-segmentstore[13373]: 2023-11-29 10:32:06,042 3419086 [epollEventLoopGroup-9-8] INFO  i.p.s.s.h.h.PravegaRequestProcessor - [requestId=1745] Iterate Table Entries Delta: Segment=terasort1701253869/terasort_reader_group_kvtable/0.#epoch.0 Count=10 FromPositon=61610.
Nov 29 10:32:06 ip-10-0-0-107.ec2.internal pravega-segmentstore[13373]: 2023-11-29 10:32:06,042 3419086 [epollEventLoopGroup-9-11] INFO  i.p.s.s.h.h.PravegaRequestProcessor - [requestId=1651] Iterate Table Entries Delta: Segment=terasort1701253869/terasort_reader_group_kvtable/0.#epoch.0 Count=10 FromPositon=60238.

These look like queries against the Reader Group segment of the reader group we instantiate to read from Pravega. We know that in Rust the synchronization mechanism is not the same as in Java (StateSynchronizer); it is based on KV Tables instead.
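
For context, this is roughly the pattern we expect on the client side: a background task periodically issues delta-iteration requests against the Reader Group table segment and applies the returned entries to a local copy of the state. The sketch below is only illustrative; the types and method names (such as read_entries_delta) are hypothetical placeholders, not the actual pravega-client-rust API:

```rust
// Illustrative sketch only: a polling-based reader group synchronizer on top
// of a table segment. TableSegmentClient, read_entries_delta and apply_delta
// are hypothetical placeholders, not the real pravega-client-rust API.
use std::time::Duration;

struct Entry {
    key: Vec<u8>,
    value: Vec<u8>,
}

struct TableSegmentClient;

impl TableSegmentClient {
    /// One ReadTableEntriesDelta-style request: entries written after
    /// `from_position`, plus the position to resume from next time.
    fn read_entries_delta(&self, from_position: u64, _count: usize) -> (Vec<Entry>, u64) {
        (Vec::new(), from_position) // network call elided in this sketch
    }
}

struct ReaderGroupState; // reader -> assigned segments, offsets, ...

impl ReaderGroupState {
    fn apply_delta(&mut self, _entries: &[Entry]) { /* merge remote updates */ }
}

fn sync_loop(table: &TableSegmentClient, state: &mut ReaderGroupState) {
    let mut position = 0u64;
    loop {
        // Every iteration shows up on the segment store as one
        // "Iterate Table Entries Delta" request, like the log lines above.
        let (entries, next_position) = table.read_entries_delta(position, 10);
        state.apply_delta(&entries);
        position = next_position;
        std::thread::sleep(Duration::from_millis(100));
    }
}
```

With 30-100 readers each running a loop like this against the same Reader Group segment, the request rate grows linearly with the number of readers, which would be consistent with the log flood described below.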

What we have found is that these messages flood the logs of the Pravega segment stores, which prompted us to look at the metrics for the same experiments:
[Metrics screenshots: Iterate Entries, Get Info and read message counts for the segment store handling the Reader Group segment]

As you can see, for the segment store handling the Reader Group segment, the number of Iterate Entries operations (which correspond to WireCommands.ReadTableEntriesDelta messages) and, apparently as a consequence, the number of Get Info operations are as large as the number of actual "read messages". While our tests use relatively large reader groups (30-100 readers in the same reader group), the overhead apparently related to synchronizing the reader group state seems excessive in the Rust client.

cc/ @gfinol

Problem location
Probably, the update logic around TableSynchronizer in the Rust client.

Suggestions for an improvement
For scalability reasons, it would be important to think of ways to minimize the number of "update" calls to the Table Segment that keeps the ReaderGroup state, if possible.
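
As a rough sketch (not a concrete patch; the names below are hypothetical), one option would be to coalesce local state changes so that several updates within a short window result in a single write to the table segment:

```rust
// Illustrative sketch only: coalesce several local reader group state changes
// into one table segment write. UpdateBatcher and TableSegmentClient are
// hypothetical placeholders, not the real pravega-client-rust API.
use std::time::{Duration, Instant};

struct TableSegmentClient;

impl TableSegmentClient {
    fn update_entries(&self, _entries: &[(Vec<u8>, Vec<u8>)]) {
        // single table-update wire call for the whole batch
    }
}

struct UpdateBatcher {
    pending: Vec<(Vec<u8>, Vec<u8>)>,
    last_flush: Instant,
    interval: Duration,
}

impl UpdateBatcher {
    fn new(interval: Duration) -> Self {
        Self { pending: Vec::new(), last_flush: Instant::now(), interval }
    }

    /// Record a local change; only hit the table segment once per interval.
    fn queue(&mut self, table: &TableSegmentClient, key: Vec<u8>, value: Vec<u8>) {
        self.pending.push((key, value));
        if self.last_flush.elapsed() >= self.interval {
            table.update_entries(&self.pending); // one call for many changes
            self.pending.clear();
            self.last_flush = Instant::now();
        }
    }
}
```

A similar idea could apply on the read side, e.g. increasing the delta-iteration count (the Count=10 seen in the logs) or the polling interval, so that each reader issues fewer ReadTableEntriesDelta requests for the same amount of state.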
