
[DocDB][LST] Packed columns: FATAL: Failed to write a batch with 0 operations into RocksDB: Corruption (yb/docdb/docdb_compaction_context.cc:265): Unable to pack old value for 50 #13037

Closed

def- opened this issue Jun 24, 2022 · 4 comments

Labels: area/docdb (YugabyteDB core features), kind/bug (This issue is a bug), priority/high (High Priority)

Comments

@def- (Contributor) commented Jun 24, 2022

Jira Link: DB-2767

Description

With LST on my dev server I have run into this issue with packed columns enabled.
On commit 2e8c2fc, with 85ac8d8 reverted locally (an unrelated bug), I still got a corruption:

$ bin/yb-ctl --replication_factor 1 create --tserver_flags=ysql_enable_packed_row=true,ysql_packed_row_size_limit=1700 --master_flags=ysql_enable_packed_row=true,ysql_packed_row_size_limit=1700
$ python3.9 ./long_system_test.py --nodes=127.0.0.1:5433 --threads=10 --runtime=0 --complexity=full --max-columns=10 --seed=032216
2022-06-23 16:20:02,627 MainThread INFO
2022-06-23 16:20:02,627 MainThread INFO     --------------------------------------------------------------------------------
2022-06-23 16:20:02,627 MainThread INFO     Running Long System Test 0.1
2022-06-23 16:20:02,627 MainThread INFO     --------------------------------------------------------------------------------
2022-06-23 16:20:02,627 MainThread INFO
2022-06-23 16:20:02,635 MainThread INFO     Reproduce with: git checkout 55a53ee3 && ./long_system_test.py --nodes=127.0.0.1:5433 --threads=10 --runtime=0 --complexity=full --max-columns=10 --seed=032216
2022-06-23 16:20:03,273 MainThread INFO     Database version: PostgreSQL 11.2-YB-2.15.1.0-b0 on x86_64-pc-linux-gnu, compiled by clang version 12.0.1 (https://github.com/yugabyte/llvm-project.git bdb147e675d8c87cee72cc1f87c4b82855977d94), 64-bit
2022-06-23 16:20:03,275 MainThread INFO     Creating tables for database db_lst_032216
2022-06-23 16:20:15,646 MainThread INFO     Starting worker_0: RandomSelectAction, SetConfigAction
2022-06-23 16:20:15,647 MainThread INFO     Starting worker_1: CreateIndexAction, DropIndexAction, SetConfigAction, AddColumnAction
2022-06-23 16:20:15,649 MainThread INFO     Starting worker_2: CreateIndexAction, DropIndexAction, SetConfigAction, AddColumnAction
2022-06-23 16:20:15,650 MainThread INFO     Starting worker_3: CreateIndexAction, DropIndexAction, SetConfigAction, AddColumnAction
2022-06-23 16:20:15,650 MainThread INFO     Starting worker_4: RandomSelectAction, SetConfigAction
2022-06-23 16:20:15,651 MainThread INFO     Starting worker_5: SingleInsertAction, SingleUpdateAction, SingleDeleteAction, BulkInsertAction, BulkUpdateAction, SetConfigAction
2022-06-23 16:20:15,652 MainThread INFO     Starting worker_6: RandomSelectAction, SetConfigAction
2022-06-23 16:20:15,653 MainThread INFO     Starting worker_7: CreateIndexAction, DropIndexAction, SetConfigAction, AddColumnAction
2022-06-23 16:20:15,653 MainThread INFO     Starting worker_8: SingleInsertAction, SingleUpdateAction, SingleDeleteAction, BulkInsertAction, BulkUpdateAction, SetConfigAction
2022-06-23 16:20:15,655 MainThread INFO     Starting worker_9: RandomSelectAction, SetConfigAction
2022-06-23 16:20:25,665 MainThread INFO     Worker queries/s: [008.0][000.3][001.4][000.4][001.4][004.5][001.3][000.8][016.8][001.5]
[...]
2022-06-23 20:20:30,963 MainThread INFO     Worker queries/s: [002.6][000.1][000.1][000.1][000.2][001.4][000.5][000.1][002.6][000.5]
2022-06-23 20:20:34,040 worker_0   ERROR    Unexpected query failure: InternalError_
Query: SELECT CAST((-39.559406183517076) AS FLOAT8), count(*), 81, '-77', -80.66163256840065, '(,36)'::INT8RANGE, |/ abs(27), CAST((-86.12554451199608) AS FLOAT4), '[-71,-65]'::INT8RANGE, '{"a": 10, "b": ["0", "1", "2"], "c": true}'::jsonb, CAST((abs(-58)) AS NUMERIC), '-64', ('(30.972789861457898,39.594115396209816]'::NUMRANGE) && (range_merge(('(-9.315384351458732,10.245909454978474]'::NUMRANGE), ('[-50.93873269859868,40.04602474469601]'::NUMRANGE))), (int4range(((coalesce(upper(((int4range((-59), NULL)) * ('[50,85]'::INT4RANGE))), (abs(-2)))) / ((35) + (CAST((|/ abs((-58) + (-29))) AS INT)))), NULL)) * ('[-94,-5]'::INT4RANGE), '(-98,)'::INT2RANGE, -29, (CAST((27.02674840336678) AS TEXT)) < ('19'), TRUE, -43.514529451231354, '(-7.803418864505218,66.9163594139032]'::NUMRANGE, '1909-11-06', TRUE, -6, (1.529571636187427) - (-51.05411590382136), -83.00294329335051, (CAST((-52) AS SMALLINT)) || ('-32'), 54, '{"a": 2, "b": ["0", "1", "2", "3"], "c": true}'::json, count(tg2_1.c28_boolean), ('[-68.47975199451739,84.5785714187827]'::NUMRANGE) -|- ('[-35.94270579807552,56.858769412661786]'::NUMRANGE), coalesce(upper((('(,51]'::INT2RANGE) * ('(-66,-48)'::INT2RANGE))), (-49)), -24.41302961643727, count(tg2_1.c32_int8range), '{"a": 9, "b": ["0", "1", "2", "3", "4", "5", "6", "7", "8"], "c": true}'::jsonb, int8range((-8), NULL), '{"a": 6, "b": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"], "c": false}'::jsonb, range_merge(('(1932-05-10,1993-03-10]'::DATERANGE), ('[1903-03-08,1993-05-12)'::DATERANGE)), FALSE, count(tg2_1.c38_int8range), -50.60868045243727, '{"a": 10, "b": ["0", "1", "2", "3", "4", "5", "6"], "c": false}'::json, '(-2,12)'::INT2RANGE, 68, -28.024061251161925, CAST((99.70564298752697) AS NUMERIC), '[26.04043511618754,61.92764269891225)'::NUMRANGE, (CAST((CAST((-41.086224034656915) AS INT)) AS BIGINT)) - (-2147483648), numrange((-82.44810574353845), NULL), 70.12127290915907, -53.45517717877688, '{"a": 3, "b": ["0"], "c": true}'::jsonb, '(-55,26]'::INT8RANGE, '{"a": 3, "b": ["0", "1", "2", "3", "4", "5", "6"], "c": true}'::json, count(tg2_1.c53_numeric), CAST((48.133305107465134) AS FLOAT4), count(*), 6, ('(,-53]'::INT2RANGE) @> (CAST((34) AS SMALLINT))::SMALLINT, 89.09327899208458, '(-65.275754629371,11.22474979887427)'::NUMRANGE, count(tg2_1.c60_float8), '[-93,14]'::INT2RANGE, coalesce(upper(('[1910-08-20,1979-10-04]'::DATERANGE)), (coalesce(lower(('[1976-07-23,2010-06-15]'::DATERANGE)), (coalesce(lower(('(1942-04-02,2007-07-04)'::DATERANGE)), ('1992-05-22')))))), ('[1909-02-17,1957-10-31]'::DATERANGE) << (daterange(NULL, (coalesce(upper(('(2011-01-17,2018-05-15)'::DATERANGE)), ('1994-01-30'))))), '(1909-10-26,1985-09-30)'::DATERANGE, '(8,60)'::INT2RANGE, 16, (-62.8276749317461) * (||/ (33.8675410489754)), coalesce(upper((daterange((coalesce(upper(((daterange(NULL, (coalesce(upper(('(1935-02-12,1944-09-11)'::DATERANGE)), (coalesce(lower((daterange((coalesce(lower((range_merge(('[1942-07-15,2007-07-22]'::DATERANGE), ('[1929-05-08,1973-11-30]'::DATERANGE)))), (coalesce(upper(('[1939-02-19,2005-01-31]'::DATERANGE)), (coalesce(lower(('(1972-06-19,1978-06-26]'::DATERANGE)), ('1977-03-11'))))))), NULL))), (coalesce(upper(('(1901-12-05,1968-02-17]'::DATERANGE)), ('1997-11-08'))))))))) * (daterange(NULL, ('1993-08-22'))))), ('1948-11-13'))), NULL))), (coalesce(lower((('[1904-02-29,1920-02-18]'::DATERANGE) * ('[1907-05-12,1950-09-19]'::DATERANGE))), ('1900-07-05')))), ('[-59.888721599664166,71.83257930761849]'::NUMRANGE) * ('[-55.983497858798216,-49.90332895619132)'::NUMRANGE), 
-1.8327616728567335, TRUE, 6.04414524323235, count(tg2_1.c73_bigint), int4range(NULL, (-5)), coalesce(upper(('(1900-07-25,1969-02-06]'::DATERANGE)), ('1999-11-08')), -6, '1984-03-20', ('[-9.11834117867123,22.574148406995647]'::NUMRANGE) * (numrange(NULL, (51.65591557754877))), 86, '(-30,)'::INT4RANGE, 60, 4, bit_length('63'), TRUE, range_merge(('[-45,-31]'::INT2RANGE), ('(,57]'::INT2RANGE)), '(1933-07-28,1983-08-26]'::DATERANGE, 13, range_merge((range_merge(('[-96,97]'::INT8RANGE), (int8range((CAST((82) AS BIGINT)), NULL)))), (range_merge((int8range(NULL, (60))), ('(-89,-56)'::INT8RANGE)))), 59, -68, count(tg2_1.c91_int2range), count(tg2_1.c92_int4range), (-26) > ((24) - (@ (80))), '{"a": 6, "b": ["0", "1", "2", "3", "4"], "c": true}'::json, '59', coalesce(upper(('(1907-06-13,1950-03-21)'::DATERANGE)), ('2007-07-18')), '[-66.8870893640009,35.400392102789056)'::NUMRANGE, range_merge((range_merge(('(-85,34)'::INT2RANGE), ('(-29,49)'::INT2RANGE))), ('[-42,93]'::INT2RANGE)), '(-97,-67]'::INT8RANGE, '(19.157885277220288,32.85009657185344)'::NUMRANGE, '{"a": 10, "b": ["0", "1", "2", "3", "4", "5", "6", "7"], "c": true}'::json, -59.062491648158, range_merge(('[-61,93]'::INT8RANGE), ('[-4,)'::INT8RANGE)), -42.81125153716696, 65.09611284887555, '55', 46.221172676971065, '{"a": 7, "b": ["0", "1"], "c": true}'::json, count(tg2_1.c109_json), abs(ceil(95)), count(tg2_1.c111_int8range) FROM tg2_1 ORDER BY 86 ASC LIMIT 3;
  values: None
  runtime: 2022-06-23 20:20:32.860 - 2022-06-23 20:20:34.035
  supports explain: True
  supports rollback: True
  affected rows: None
Action: RandomSelectAction
Error class: InternalError_
Error code: XX000
Error message: ERROR:  Network error: recvmsg error: Connection refused
Transaction isolation level: serializable
DB Node: host: 127.0.0.1, port: 5433
DB Backend PID: 1123021

The tserver FATAL log file contains:

F20220623 20:20:33 ../../src/yb/tablet/tablet.cc:1337] T e4dcea3137a943979c11a909709242a4 P 197cb13f32cf40349305d9449b2639e9: Failed to write a batch with 0 operations into RocksDB: Corruption (yb/docdb/docdb_compaction_context.cc:265): Unable to pack old value for 50
    @     0x7f4ee6c4734c  google::LogDestination::LogToSinks()
    @     0x7f4ee6c4101f  google::LogMessage::SendToLog()
    @     0x7f4ee6c41928  google::LogMessage::Flush()
    @     0x7f4ee6c44aaf  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f4ee9bd9fa7  yb::tablet::Tablet::WriteToRocksDB()
    @     0x7f4ee9bdd2eb  yb::tablet::Tablet::ApplyIntents()
    @     0x7f4ee9bdd652  yb::tablet::Tablet::ApplyIntents()
    @     0x7f4ee9ba53ec  yb::tablet::ApplyIntentsTask::Run()
    @     0x7f4ee70d5119  yb::rpc::Strand::Done()
    @     0x7f4ee70dc3e6  yb::rpc::(anonymous namespace)::Worker::Execute()
    @     0x7f4ee6f1eb41  yb::Thread::SuperviseThread()
    @     0x7f4ee5333694  start_thread
    @     0x7f4ee507541d  __clone

Since @spolitov indicated that this is a separate issue, I have opened a new bug for it. Initially I thought it was related to #12813.
I have shared the full yugabyte-data directory from this corruption for analysis.

@def- def- added area/docdb YugabyteDB core features priority/high High Priority status/awaiting-triage Issue awaiting triage labels Jun 24, 2022
@yugabyte-ci yugabyte-ci added the kind/bug This issue is a bug label Jun 24, 2022
@def- def- changed the title [DocDB][LST] Failed to write a batch with 0 operations into RocksDB: Corruption (yb/docdb/docdb_compaction_context.cc:265): Unable to pack old value for 50 [DocDB][LST] Packed columns? Failed to write a batch with 0 operations into RocksDB: Corruption (yb/docdb/docdb_compaction_context.cc:265): Unable to pack old value for 50 Jun 24, 2022
@def- def- changed the title [DocDB][LST] Packed columns? Failed to write a batch with 0 operations into RocksDB: Corruption (yb/docdb/docdb_compaction_context.cc:265): Unable to pack old value for 50 [DocDB][LST] Packed columns: Failed to write a batch with 0 operations into RocksDB: Corruption (yb/docdb/docdb_compaction_context.cc:265): Unable to pack old value for 50 Jun 24, 2022
spolitov added a commit that referenced this issue Jun 28, 2022
Summary:
It could happen that a packed row is already near the size limit, and then the user adds new columns to the table.
Since each column uses 4 bytes in a packed row, such a row could grow over the limit after repacking.
Previously we assumed that repacking a row could not make it larger than the limit, and there is a check for that. But clearly that assumption does not hold in the scenario above.

Changed the code to force-repack a row even if the repacked row overflows the specified limit, so rows larger than the limit no longer crash the server. This should not be an issue, since we expect it to happen quite rarely in an actual DB; large packed rows are merely less efficient, but they still work fine.

Test Plan: PgPackedRowTest.PackOverflow

Reviewers: mbautin

Reviewed By: mbautin

Subscribers: bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D17904
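
To make the overflow arithmetic concrete, here is a minimal, self-contained C++ sketch of the repacking decision described in the commit above. All names (PackRow, PackResult, kPerColumnOverheadBytes, the 165-byte payloads) are hypothetical illustrations, not the actual yb::docdb code; only the ~4 bytes of per-column overhead and the 1700-byte ysql_packed_row_size_limit come from this issue.

// Simplified model of the packed-row repacking decision described in the
// commit message above. Hypothetical names, not the real yb::docdb API.
#include <cstddef>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

constexpr size_t kPerColumnOverheadBytes = 4;  // each column adds ~4 header bytes

struct PackResult {
  std::string buffer;
  bool over_limit;
};

// Packs column values into one row buffer. With force == false (the old
// behavior), packing fails as soon as the row exceeds size_limit -- the
// failure that surfaced as "Unable to pack old value" during compaction.
// With force == true (the fix), the row is packed anyway and may simply
// end up larger than the limit.
std::optional<PackResult> PackRow(const std::vector<std::string>& values,
                                  size_t size_limit, bool force) {
  std::string buffer;
  for (const auto& v : values) {
    buffer.append(kPerColumnOverheadBytes, '\0');  // per-column header
    buffer += v;
    if (!force && buffer.size() > size_limit) {
      return std::nullopt;  // old code path: treated as Corruption
    }
  }
  const bool over = buffer.size() > size_limit;
  return PackResult{std::move(buffer), over};
}

int main() {
  const size_t limit = 1700;  // matches ysql_packed_row_size_limit above

  // A row already near the limit: 10 columns of 165 bytes each,
  // 10 * (165 + 4) = 1690 bytes packed.
  std::vector<std::string> values(10, std::string(165, 'x'));

  // ALTER TABLE adds three columns; even NULL values cost 4 header bytes
  // each when the row is repacked: 1690 + 3 * 4 = 1702 > 1700.
  values.resize(values.size() + 3);

  std::cout << "strict repack succeeded: "
            << PackRow(values, limit, /*force=*/false).has_value() << "\n";  // 0
  auto forced = PackRow(values, limit, /*force=*/true);
  std::cout << "forced repack size: " << forced->buffer.size()
            << ", over limit: " << forced->over_limit << "\n";  // 1702, 1
}

Under the strict check, a legitimate ALTER TABLE pushes repacking into the failure path during compaction (the Corruption FATAL above); with forced repacking, the row merely ends up a few bytes over the limit, which per the commit message is rare and only slightly less efficient.
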
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Jun 28, 2022
spolitov added a commit that referenced this issue Jun 30, 2022
Summary:
It could happen that a packed row is already near the size limit, and then the user adds new columns to the table.
Since each column uses 4 bytes in a packed row, such a row could grow over the limit after repacking.
Previously we assumed that repacking a row could not make it larger than the limit, and there is a check for that. But clearly that assumption does not hold in the scenario above.

Changed the code to force-repack a row even if the repacked row overflows the specified limit, so rows larger than the limit no longer crash the server. This should not be an issue, since we expect it to happen quite rarely in an actual DB; large packed rows are merely less efficient, but they still work fine.

Original diff: b608dda/D17904

Test Plan: PgPackedRowTest.PackOverflow

Reviewers: mbautin, rthallam

Reviewed By: mbautin, rthallam

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D18012
@bmatican (Contributor) commented:
@def- Can you re-run this and confirm if Sergei's diff above fixes this?

@def- (Contributor, Author) commented Jun 30, 2022

Yes, running. Will report if I see this again.

@bmatican (Contributor) commented Jul 6, 2022

@def- Is this good to close?

@def- (Contributor, Author) commented Jul 6, 2022

Not seen in 6 days; good enough to close.

@def- def- closed this as completed Jul 6, 2022
@def- def- changed the title [DocDB][LST] Packed columns: Failed to write a batch with 0 operations into RocksDB: Corruption (yb/docdb/docdb_compaction_context.cc:265): Unable to pack old value for 50 [DocDB][LST] Packed columns: FATAL: Failed to write a batch with 0 operations into RocksDB: Corruption (yb/docdb/docdb_compaction_context.cc:265): Unable to pack old value for 50 Jul 12, 2022