-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] LightweightMessage sets max size incorrectly when parsing #21352
Labels
2.20 Backport Required
2024.1 Backport Required
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/medium
Medium priority issue
Comments
es1024
added
area/docdb
YugabyteDB core features
status/awaiting-triage
Issue awaiting triage
labels
Mar 7, 2024
yugabyte-ci
added
kind/bug
This issue is a bug
priority/medium
Medium priority issue
status/awaiting-triage
Issue awaiting triage
and removed
status/awaiting-triage
Issue awaiting triage
labels
Mar 7, 2024
rthallamko3
added
2.20 Backport Required
2.18 Backport Required
and removed
status/awaiting-triage
Issue awaiting triage
labels
Mar 11, 2024
1 task
es1024
added a commit
that referenced
this issue
Sep 4, 2024
Summary: LightweightMessage currently sets a maximum size for reading to `rpc_max_message_size` (default 255 MB), but Preparer batches based on `protobuf_message_total_bytes_limit` (default 511 MB). This results in cases where under default settings, we may have protobufs between 255 MB and 511 MB, which `LightweightMessage::ParseFromSlice` is unable to read. This diff changes the limit to use `protobuf_message_total_bytes_limit`, so that protobufs between `rpc_max_message_size` and `protobuf_message_total_bytes_limit` can be parsed properly. This addresses errors such as: ``` Found a corruption in a closed log segment: OK Error: Corruption (yb/consensus/log_util.cc:965): Log file corruption detected.: Failed to parse PB at offset: 26013423, length: 303149529. Cause: Corruption (yb/rpc/lightweight_message.cc:376): Failed to parse ‘entry’: Failed trying to read batch #5 at offset 26013423 for log segment /mnt/d0/yb-data/tserver/wals/table-12345678901234567890123456789012/tablet-12345678901234567890123456789012/wal-000003000: ... ``` (length larger than 255 MB) when such protobufs are written to WALs. This also fixes cause of flakiness for TabletPeerTest.MaxRaftBatchProtobufLimit in TSAN builds. Jira: DB-7654, DB-10251 Test Plan: Jenkins. Added test: ``` yb_build.sh --cxx-test rpc_lwproto-test --gtest_filter LWProtoTest.BigMessage ``` Also ran TabletPeerTest.MaxRaftBatchProtobufLimit 100x on Jenkins. Reviewers: sergei, qhu Reviewed By: qhu Subscribers: yyan, bogdan, rthallam, ybase Differential Revision: https://phorge.dev.yugabyte.com/D33041
jasonyb
pushed a commit
that referenced
this issue
Sep 6, 2024
Summary: c587efd [docs] minor edit (#23796) 31e09f3 [PLAT-15029] yba installer split data and software directory setup f5ba17d [PLAT-15039]: Fix bootstrap on bi-directional xCluster config creation 578248a [#23770] YSQL: Deterministically populate catalog cache in tests with Connection Manager enabled 7b1f22a [#23799] test: Fixed PgTableSizeTest.SharedTableSize test for pg15 788434a [#18771, #21352] docdb: Fix LightweightMessage max size when parsing 02ced43 [#22821] YSQL: Preserve local limit in a multi-page read 50ff737 [#23741] docdb: Fix cloning of colocated databases with only parent table Excluded: 9889df7 [#23706] YSQL: Add table-level catcache Prometheus metrics c770d79 [#23747] MetaCache: Callback should not be called while holding the lock Excluded: 40689bc [#22150] YSQL, QueryDiagnostics: EXPLAIN (ANALYZE, DIST) support for queryDiagnostics 1655e69 [PLAT-15148]: Set XCluster Table Status to DroppedFromTarget if table in replication is dropped from target only 16262f7 [#22519] YSQL: Simplify API of the ExplicitRowLockBuffer class 6614afb [PLAT-14958][PLAT-14959] Make ssh fields optional if skipProvisioning is true 1153b56 [PLAT-14867] Make sure restart alerts don't trigger for small time updates during NTP sync bf1c7bc [PLAT-12226] Add connection pooling status to universe health check 38d8ae8 [PLAT-14805]Support adding EAR configs a180bef [#19134] YSQL, ASH: Setting ASH circular buffer size based on the number of cores 7d8fc76 Adjust heading link (#23807) 4c6cf5a [PLAT-4899]Basic validation of certificates f24eb10 [#23787] YSQL: Avoid executing conn mgr guc variables hooks for parallel workers ee18df8 [PLAT-13921] [K8] [UI] Universe action tasks are disabled after a failed shrink rr node task Excluded: dcf1821 [#23797] YSQL: Modify some tests to run in single connection mode with Connection Manager 0ac22cd [Docs] Remove Drift chat bot (#23802) 0e91003 [#22825] DocDB: Vector Index General Read Path with DummyANN e8f09b5 [PLAT-15175] Make runtime conf for skipping cluster consistency check public ee479ee Versionwarning (#23781) a05c6a3 [DB-12681] yugabyted-ui: Add Voyager commands to different Voyager phases in the UI. cc80d59 [#23777] yugabyted: updating the pg parity testcase to reflect the new gflags enabled for the pg parity feature. Test Plan: Jenkins: rebase: pg15-cherrypicks Reviewers: jason, tfoucher Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D37822
es1024
added a commit
that referenced
this issue
Sep 6, 2024
…ize when parsing Summary: Original commit: 788434a / D33041 LightweightMessage currently sets a maximum size for reading to `rpc_max_message_size` (default 255 MB), but Preparer batches based on `protobuf_message_total_bytes_limit` (default 511 MB). This results in cases where under default settings, we may have protobufs between 255 MB and 511 MB, which `LightweightMessage::ParseFromSlice` is unable to read. This diff changes the limit to use `protobuf_message_total_bytes_limit`, so that protobufs between `rpc_max_message_size` and `protobuf_message_total_bytes_limit` can be parsed properly. This addresses errors such as: ``` Found a corruption in a closed log segment: OK Error: Corruption (yb/consensus/log_util.cc:965): Log file corruption detected.: Failed to parse PB at offset: 26013423, length: 303149529. Cause: Corruption (yb/rpc/lightweight_message.cc:376): Failed to parse ‘entry’: Failed trying to read batch #5 at offset 26013423 for log segment /mnt/d0/yb-data/tserver/wals/table-12345678901234567890123456789012/tablet-12345678901234567890123456789012/wal-000003000: ... ``` (length larger than 255 MB) when such protobufs are written to WALs. This also fixes cause of flakiness for TabletPeerTest.MaxRaftBatchProtobufLimit in TSAN builds. Jira: DB-7654, DB-10251 Test Plan: Jenkins. Added test: ``` yb_build.sh --cxx-test rpc_lwproto-test --gtest_filter LWProtoTest.BigMessage ``` Also ran TabletPeerTest.MaxRaftBatchProtobufLimit 100x on Jenkins. Reviewers: sergei, qhu Reviewed By: qhu Subscribers: yyan, bogdan, rthallam, ybase Differential Revision: https://phorge.dev.yugabyte.com/D37781
es1024
added a commit
that referenced
this issue
Sep 6, 2024
…e when parsing Summary: Original commit: 788434a / D33041 LightweightMessage currently sets a maximum size for reading to `rpc_max_message_size` (default 255 MB), but Preparer batches based on `protobuf_message_total_bytes_limit` (default 511 MB). This results in cases where under default settings, we may have protobufs between 255 MB and 511 MB, which `LightweightMessage::ParseFromSlice` is unable to read. This diff changes the limit to use `protobuf_message_total_bytes_limit`, so that protobufs between `rpc_max_message_size` and `protobuf_message_total_bytes_limit` can be parsed properly. This addresses errors such as: ``` Found a corruption in a closed log segment: OK Error: Corruption (yb/consensus/log_util.cc:965): Log file corruption detected.: Failed to parse PB at offset: 26013423, length: 303149529. Cause: Corruption (yb/rpc/lightweight_message.cc:376): Failed to parse ‘entry’: Failed trying to read batch #5 at offset 26013423 for log segment /mnt/d0/yb-data/tserver/wals/table-12345678901234567890123456789012/tablet-12345678901234567890123456789012/wal-000003000: ... ``` (length larger than 255 MB) when such protobufs are written to WALs. This also fixes cause of flakiness for TabletPeerTest.MaxRaftBatchProtobufLimit in TSAN builds. Jira: DB-7654, DB-10251 Test Plan: Jenkins. Added test: ``` yb_build.sh --cxx-test rpc_lwproto-test --gtest_filter LWProtoTest.BigMessage ``` Also ran TabletPeerTest.MaxRaftBatchProtobufLimit 100x on Jenkins. Reviewers: sergei, qhu Reviewed By: qhu Subscribers: ybase, rthallam, bogdan, yyan Differential Revision: https://phorge.dev.yugabyte.com/D37780
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
2.20 Backport Required
2024.1 Backport Required
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/medium
Medium priority issue
Jira Link: DB-10251
Description
LightweightMessage currently sets a maximum size for reading to rpc_max_message_size (default 255 MB), but Preparer batches based on protobuf_message_total_bytes_limit (default 511 MB). This results in cases where under default settings, we may have protobufs between 255 MB and 511 MB, which LightweightMessage::ParseFromSlice is unable to read.
When these protobufs are written to the WAL, errors like the following may occur afterwards:
where the length in the error is larger than rpc_max_message_size.
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: