Potential inefficiency of Table Synchronizer for ReaderGroups #462

Open
RaulGracia opened this issue Nov 29, 2023 · 0 comments

Problem description
We are doing performance benchmarks of Pravega using the Python binding. One behavior we found that differs from the Java client is that, when reading via the Python client, we observe a lot of log messages like:

Nov 29 10:32:06 ip-10-0-0-107.ec2.internal pravega-segmentstore[13373]: 2023-11-29 10:32:06,042 3419086 [epollEventLoopGroup-9-3] INFO  i.p.s.s.h.h.PravegaRequestProcessor - [requestId=689] Iterate Table Entries Delta: Segment=terasort1701253869/terasort_reader_group_kvtable/0.#epoch.0 Count=10 FromPositon=60238.
Nov 29 10:32:06 ip-10-0-0-107.ec2.internal pravega-segmentstore[13373]: 2023-11-29 10:32:06,042 3419086 [epollEventLoopGroup-9-8] INFO  i.p.s.s.h.h.PravegaRequestProcessor - [requestId=1745] Iterate Table Entries Delta: Segment=terasort1701253869/terasort_reader_group_kvtable/0.#epoch.0 Count=10 FromPositon=61610.
Nov 29 10:32:06 ip-10-0-0-107.ec2.internal pravega-segmentstore[13373]: 2023-11-29 10:32:06,042 3419086 [epollEventLoopGroup-9-11] INFO  i.p.s.s.h.h.PravegaRequestProcessor - [requestId=1651] Iterate Table Entries Delta: Segment=terasort1701253869/terasort_reader_group_kvtable/0.#epoch.0 Count=10 FromPositon=60238.

These look like queries against the Reader Group segment of the reader group we instantiate to read from Pravega. We know that in Rust the synchronization mechanism is not the same as in Java (StateSynchronizer); it is based on KV Tables instead.
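
For context, this is roughly the pattern we expect on the client side: a background task periodically issues delta-iteration requests against the Reader Group table segment and applies the returned entries to a local copy of the state. The sketch below is only illustrative; the types and method names (such as read_entries_delta) are hypothetical placeholders, not the actual pravega-client-rust API:

```rust
// Illustrative sketch only: a polling-based reader group synchronizer on top
// of a table segment. TableSegmentClient, read_entries_delta and apply_delta
// are hypothetical placeholders, not the real pravega-client-rust API.
use std::time::Duration;

struct Entry {
    key: Vec<u8>,
    value: Vec<u8>,
}

struct TableSegmentClient;

impl TableSegmentClient {
    /// One ReadTableEntriesDelta-style request: entries written after
    /// `from_position`, plus the position to resume from next time.
    fn read_entries_delta(&self, from_position: u64, _count: usize) -> (Vec<Entry>, u64) {
        (Vec::new(), from_position) // network call elided in this sketch
    }
}

struct ReaderGroupState; // reader -> assigned segments, offsets, ...

impl ReaderGroupState {
    fn apply_delta(&mut self, _entries: &[Entry]) { /* merge remote updates */ }
}

fn sync_loop(table: &TableSegmentClient, state: &mut ReaderGroupState) {
    let mut position = 0u64;
    loop {
        // Every iteration shows up on the segment store as one
        // "Iterate Table Entries Delta" request, like the log lines above.
        let (entries, next_position) = table.read_entries_delta(position, 10);
        state.apply_delta(&entries);
        position = next_position;
        std::thread::sleep(Duration::from_millis(100));
    }
}
```

With 30-100 readers each running a loop like this against the same Reader Group segment, the request rate grows linearly with the number of readers, which would be consistent with the log flood described below.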

What we have found is that these messages flood the logs of the Pravega segment stores, which prompted us to look at the metrics for the same experiments:
[Metrics screenshots: Iterate Entries, Get Info and read message counts for the segment store handling the Reader Group segment]

As you can see, for the segment store handling the Reader Group segment, the number of Iterate Entries operations (which correspond to WireCommands.ReadTableEntriesDelta messages) and, apparently as a consequence, the number of Get Info operations are as large as the number of actual "read messages". While our tests use relatively large reader groups (30-100 readers in the same reader group), the overhead apparently related to synchronizing the reader group state seems excessive in the Rust client.

cc/ @gfinol

Problem location
Probably, the update logic around TableSynchronizer in the Rust client.

Suggestions for an improvement
For scalability reasons, it would be important to think of ways to minimize the number of "update" calls to the Table Segment that keeps the ReaderGroup state, if possible.
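
As a rough sketch (not a concrete patch; the names below are hypothetical), one option would be to coalesce local state changes so that several updates within a short window result in a single write to the table segment:

```rust
// Illustrative sketch only: coalesce several local reader group state changes
// into one table segment write. UpdateBatcher and TableSegmentClient are
// hypothetical placeholders, not the real pravega-client-rust API.
use std::time::{Duration, Instant};

struct TableSegmentClient;

impl TableSegmentClient {
    fn update_entries(&self, _entries: &[(Vec<u8>, Vec<u8>)]) {
        // single table-update wire call for the whole batch
    }
}

struct UpdateBatcher {
    pending: Vec<(Vec<u8>, Vec<u8>)>,
    last_flush: Instant,
    interval: Duration,
}

impl UpdateBatcher {
    fn new(interval: Duration) -> Self {
        Self { pending: Vec::new(), last_flush: Instant::now(), interval }
    }

    /// Record a local change; only hit the table segment once per interval.
    fn queue(&mut self, table: &TableSegmentClient, key: Vec<u8>, value: Vec<u8>) {
        self.pending.push((key, value));
        if self.last_flush.elapsed() >= self.interval {
            table.update_entries(&self.pending); // one call for many changes
            self.pending.clear();
            self.last_flush = Instant::now();
        }
    }
}
```

A similar idea could apply on the read side, e.g. increasing the delta-iteration count (the Count=10 seen in the logs) or the polling interval, so that each reader issues fewer ReadTableEntriesDelta requests for the same amount of state.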
