
[YSQL] Fix OOM for COPY FROM Query #5453

Closed
emhna opened this issue Aug 20, 2020 · 0 comments
emhna commented Aug 20, 2020

Today, the COPY FROM query runs out of memory while copying a large file into a table.
The memory hyper-growth is observed in the Postgres backend query process.

For example, a 500 MB file consumes around 40 GB of memory (including virtual memory).
When the file is 1 GB or larger, the process is killed automatically.

With the new fix, memory consumption dropped to 25 - 30 MB on a 1 GB file, and the copy completed successfully (run on macOS).
More experiments will be conducted to ensure larger files can be handled.

@emhna emhna added the area/ysql Yugabyte SQL (YSQL) label Aug 20, 2020
@emhna emhna added this to the v2.2.x milestone Aug 20, 2020
@emhna emhna self-assigned this Aug 20, 2020
emhna added a commit that referenced this issue Aug 31, 2020
…M with OOM fix

Summary:
Enable the COPY FROM query to run in batches. The batch size is passed in using the COPY OPTION syntax.

E.g. `COPY tab FROM 'absoluteFilePath' WITH (FORMAT CSV, HEADER, ROWS_PER_TRANSACTION 100);`

1. Freed memory at the batch level using a local memory context.
2. Split the transaction commit to the batch level, which allows copying larger files (say, 15 GB) without failure and improves latency.

Fix #1 prevents the Postgres backend process from being killed at 1 GB.
Postgres backend memory consumption dropped to under 30 MB, and latency improved by 1.5 - 2x.
However, when processing 2 GB or more, the query aborted with a "missing metadata" error during the tablet write (#2855).

Fix #2 splits the single transaction into multiple transactions using the YB transaction API.
This allows larger files to be copied successfully without the above exception.
The largest file tested was 15 GB, 320 million rows, with a replication factor of 1.
Copying into a table of 4 columns (all integer types) with a primary key took 10 hours and 33 minutes to complete.

Note: batched transaction copy is not supported on non-YB relations such as temporary tables, within nested transactions (e.g. inside BEGIN/END blocks), or on tables with statement-level BEFORE/AFTER triggers.
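The per-batch flow in points 1 and 2 can be sketched roughly as follows. This is a minimal Python stand-in, not YugabyteDB's actual C implementation; all function names here (`copy_rows`, `insert_row`, `begin`, `commit`) are hypothetical:

```python
def copy_rows(rows, rows_per_transaction, insert_row, begin, commit):
    """Insert rows in batches: one transaction per batch, and per-batch
    state is dropped after each commit (standing in for resetting the
    local memory context)."""
    batch = []
    batches_committed = 0
    for row in rows:
        batch.append(row)
        if len(batch) == rows_per_transaction:
            begin()
            for r in batch:
                insert_row(r)
            commit()
            batches_committed += 1
            batch = []  # free per-batch memory, as the local context reset does
    if batch:  # flush the final partial batch
        begin()
        for r in batch:
            insert_row(r)
        commit()
        batches_committed += 1
    return batches_committed
```

The key property is that memory held per batch is released before the next batch starts, so peak memory is bounded by the batch size rather than the file size.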

For detailed findings, please check GitHub issues:
#2855
#5453

Test Plan:
./yb_build.sh --java-test org.yb.pgsql.TestBatchCopyFrom
./bin/yb-build.sh release
Ran manual queries in the ysqlsh terminal.

Reviewers: neha, mihnea, dmitry, neil, jason, alex

Reviewed By: neil, jason, alex

Subscribers: jenkins-bot, karthik, neil, kannan, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D9069
emhna added a commit that referenced this issue Sep 2, 2020
…size for COPY FROM with OOM fix

Summary:
Enable the COPY FROM query to run in batches. The batch size is passed in using the COPY OPTION syntax.

E.g. `COPY tab FROM 'absoluteFilePath' WITH (FORMAT CSV, HEADER, ROWS_PER_TRANSACTION 100);`

1. Freed memory at the batch level using a local memory context.
2. Split the transaction commit to the batch level, which allows copying larger files (say, 15 GB) without failure and improves latency.

Fix #1 prevents the Postgres backend process from being killed at 1 GB.
The Postgres backend memory consumption dropped to under 30 MB, and latency improved by 1.5 - 2x.
However, when processing 2 GB or more, the query aborted with a "missing metadata" error during the tablet write (#2855).

Fix #2 splits the single transaction into multiple transactions using the YB transaction API.
This allows larger files to be copied successfully without the above exception.
The largest file tested was 15 GB, 320 million rows, with a replication factor of 1.
Copying into a table of 4 columns (all integer types) with a primary key took 10 hours and 33 minutes to complete.

Note: batched transaction copy is not supported on non-YB relations such as temporary tables, within nested transactions (e.g. inside BEGIN/END blocks), or on tables with statement-level BEFORE/AFTER triggers.

For detailed findings, please check GitHub issues:
#2855
#5453

Test Plan: Jenkins: rebase: 2.2

Reviewers: mihnea, alex, jason

Reviewed By: alex, jason

Subscribers: yql, kannan

Differential Revision: https://phabricator.dev.yugabyte.com/D9286
emhna added a commit that referenced this issue Sep 5, 2020
Summary:
This change resets the memory context regularly for all YB relations that are copied from files.
Instead of resetting the memory context per batch via the `rows_per_transaction` query option, the memory context is reset per row when certain conditions hold. See below for details:

With this change:
1. The memory context is reset per row instead of per batch when memory is resettable.
  - e.g. `COPY tab FROM 'absoluteFilePath'` resets memory at every row when the conditions are met.

2. The per-row memory context reset applies to YB relations (which excludes temporary tables), and only when rows are read from a file, not stdin.
  - if `cstate->filename != NULL`, the file is consumed either directly or passed in using the PROGRAM option
  - if `cstate->filename == NULL`, the query is consumed from stdin.

Note: large files read through stdin will still hit OOM, since the old context path must hold all rows in memory before inserting into the table. That issue is tracked in #5603.
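The gating condition in point 2 can be sketched as follows. This is a hypothetical Python stand-in for the C-side check, not the actual source:

```python
def can_reset_per_row(is_yb_relation, filename):
    """Per-row memory reset applies only to YB relations whose input
    comes from a file (or PROGRAM), i.e. cstate->filename is set;
    stdin input (filename is None) keeps the old path that holds all
    rows in memory before insert."""
    return is_yb_relation and filename is not None
```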

Test Plan:
Rebuilt and reran Java tests.
Manually verified performance on large queries in ysqlsh.

Reviewers: alex, mihnea, jason

Reviewed By: jason

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D9313
emhna added a commit that referenced this issue Sep 9, 2020
…in COPY FROM Query

Summary:
This change resets the memory context regularly for all YB relations that are copied from files.
Instead of resetting the memory context per batch via the `rows_per_transaction` query option, the memory context is reset per row when certain conditions hold. See below for details:

With this change:
1. The memory context is reset per row instead of per batch when memory is resettable.
  - e.g. `COPY tab FROM 'absoluteFilePath'` resets memory at every row when the conditions are met.

2. The per-row memory context reset applies to YB relations (which excludes temporary tables), and only when rows are read from a file, not stdin.
  - if `cstate->filename != NULL`, the file is consumed either directly or passed in using the PROGRAM option
  - if `cstate->filename == NULL`, the query is consumed from stdin.

Note: large files read through stdin will still hit OOM, since the old context path must hold all rows in memory before inserting into the table. That issue is tracked in #5603.

Test Plan: Jenkins: rebase: 2.2

Reviewers: alex, mihnea, jason

Reviewed By: jason

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D9328
@emhna emhna closed this as completed Sep 10, 2020