Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a server crash #235

Merged
merged 1 commit into from
Feb 1, 2024
Merged

Fix a server crash #235

merged 1 commit into from
Feb 1, 2024

Conversation

japinli
Copy link
Contributor

@japinli japinli commented Jan 29, 2024

When chunk_group_row_limit is bigger than 110000, there is a crash caused by ReadStripeNextVector().

For example:

CREATE TABLE t1 (id int, info text) USING columnar;
CREATE TABLE t2 (id int, info text) USING columnar;
SELECT columnar.alter_columnar_table_set('t1', chunk_group_row_limit => '10000');
SELECT columnar.alter_columnar_table_set('t2', chunk_group_row_limit => '11000');
INSERT INTO t1 SELECT id, md5(id::text) FROM generate_series(1, 10000000) id;
INSERT INTO t2 SELECT id, md5(id::text) FROM generate_series(1, 10000000) id;
[local]:345317 postgres=# SELECT count(*) FROM t1;
  count
----------
 10000000
(1 row)

[local]:345317 postgres=# SELECT count(*) FROM t2;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
: !?>

PostgreSQL version 16.1, hydra columnar dd0ef07.

		if (!ReadChunkGroupNextVector(stripeReadState->chunkGroupReadState,
									  columnValues, columnNulls, 
									  stripeReadState->tupleDescriptor,
									  columnValueOffset, 
									  newVectorSize,
									  rowNumber,
									  chunkFirstRowNumber))
		{
			/* if this chunk group is exhausted, fetch the next one and loop */
			EndChunkGroupRead(stripeReadState->chunkGroupReadState);
			stripeReadState->chunkGroupReadState = NULL;
			stripeReadState->chunkGroupIndex++;

			if (*newVectorSize == 0)
				continue;
		}
		^----------> here the stripeReadState->chunkGroupReadState might be freed, but used next.

		stripeReadState->currentRow += stripeReadState->chunkGroupReadState->rowCount;

@JerrySievert
Copy link
Contributor

awesome, thanks for the PR!

@JerrySievert JerrySievert self-assigned this Jan 29, 2024
Copy link
Contributor

@JerrySievert JerrySievert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add a test (in columnar/src/test/regress/sql) for this? I am happy to help you if you need.

@japinli
Copy link
Contributor Author

japinli commented Jan 31, 2024 via email

@japinli
Copy link
Contributor Author

japinli commented Jan 31, 2024

Here is the regressions test out.

dd0ef0720-regression.diffs.txt
9cf098a26-regression.diffs.txt

When chunk_group_row_limit is bigger than 110000, there is a crash
caused by ReadStripeNextVector().
@japinli
Copy link
Contributor Author

japinli commented Jan 31, 2024

@JerrySievert I try to add a test for this crash, however, I cannot run make check locally. Please take a look! Thanks in advance!

@japinli
Copy link
Contributor Author

japinli commented Jan 31, 2024

Here is the regressions test out.

dd0ef0720-regression.diffs.txt 9cf098a26-regression.diffs.txt

Aha, there is an OOM error on my dev. :(

@mkaruza
Copy link
Contributor

mkaruza commented Feb 1, 2024

Hi @japinli, thank you for report and PR - it looks correct to me.

I don't see crash with regression on PG15/PG16.

@japinli
Copy link
Contributor Author

japinli commented Feb 1, 2024

Hi @japinli, thank you for report and PR - it looks correct to me.

I don't see crash with regression on PG15/PG16.

Hi, @mkaruza, thanks for the review.

Yeah, it doesn't crash. The regression fails because there is an out-of-memory, which leads Postgres process being killed.

It seems 8GB RAM isn't enough for running a regression test. I'm not sure about this.

@JerrySievert JerrySievert merged commit 810c2b1 into hydradatabase:main Feb 1, 2024
@japinli japinli deleted the crash branch February 2, 2024 01:34
@wuputah wuputah mentioned this pull request Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants