
copy: implement buffered copy that can retry #99327

Open · May be fixed by #102805

cucaroach opened this issue Mar 23, 2023 · 0 comments

Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) · T-sql-queries SQL Queries Team

Comments

cucaroach commented Mar 23, 2023

We currently have limited retry support, but it is disabled by default and only works for non-atomic copies. We should enable it by default in 23.1 and implement a bounded buffer of the entire copy stream so that retry support can be extended to atomic copies.

Jira issue: CRDB-25818
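
For illustration only (this is not CockroachDB's code; replayableReader and its fields are invented names), the core idea is to tee every byte consumed from the wire into a bounded buffer, so the stream can be replayed if the transaction must restart. A minimal Go sketch:

```go
package copyretry

import (
	"bytes"
	"io"
)

// replayableReader tees everything read from the wire into a bounded
// buffer so the whole COPY stream can be replayed after a retryable
// error. Hypothetical sketch, not CockroachDB's implementation.
type replayableReader struct {
	src        io.Reader    // the pgwire stream
	buf        bytes.Buffer // retained copy of consumed bytes
	max        int          // cap, e.g. sql.copy.retry.max_size
	overflowed bool         // set once the cap is exceeded
}

func (r *replayableReader) Read(p []byte) (int, error) {
	n, err := r.src.Read(p)
	if n > 0 && !r.overflowed {
		if r.buf.Len()+n > r.max {
			// Past the cap we keep streaming, but retry is no longer possible.
			r.overflowed = true
			r.buf.Reset()
		} else {
			r.buf.Write(p[:n])
		}
	}
	return n, err
}

// replay returns a reader that re-delivers every byte consumed so far,
// or false if the copy outgrew the buffer.
func (r *replayableReader) replay() (io.Reader, bool) {
	if r.overflowed {
		return nil, false
	}
	return bytes.NewReader(r.buf.Bytes()), true
}
```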

@cucaroach cucaroach added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Mar 23, 2023
@cucaroach cucaroach self-assigned this Mar 23, 2023
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Mar 23, 2023
cucaroach added a commit to cucaroach/cockroach that referenced this issue Mar 30, 2023
craig bot pushed a commit that referenced this issue Mar 30, 2023
99847: copy: enable copy_from_retries_enabled by default r=cucaroach a=cucaroach

This turned out to be necessary to get customers over the line with DMS
in 22.2. Now that it's had time to prove itself, enable it by default.

Epic: none
Informs: #99327
Fixes: #99464


99918: server: ignore spanconfig limit for shared-process tenants r=knz a=stevendanna

This change installs a no-op spanconfig limiter when starting shared-process tenants, since shared-process tenants are currently used for the "unified architecture" mode, in which the tenant is expected to behave as closely as possible to the system tenant.

When we have a tenant-side capabilities reader, we should tie this to a capability instead.

This has the unfortunate side effect of making any tenant-level override for the span limit a lie for shared-process tenants.

Fixes #93562. 
Fixes #93561.

Release note: None

Co-authored-by: Tommy Reilly <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
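
The no-op limiter mentioned in the #99918 description above is a standard pattern. As a hedged sketch against an assumed interface (the real spanconfig limiter interface differs in its details):

```go
package spanlimit

import "context"

// Limiter is an assumed stand-in for the spanconfig limiter interface;
// it is not CockroachDB's actual API.
type Limiter interface {
	// ShouldLimit reports whether adding delta more spans would exceed
	// the tenant's configured span limit.
	ShouldLimit(ctx context.Context, delta int) (bool, error)
}

// noopLimiter never limits, so a shared-process tenant behaves like the
// system tenant, which has no span-config limit.
type noopLimiter struct{}

func (noopLimiter) ShouldLimit(context.Context, int) (bool, error) {
	return false, nil
}

var _ Limiter = noopLimiter{}
```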
blathers-crl bot pushed a commit that referenced this issue Mar 30, 2023
@cucaroach cucaroach linked a pull request May 24, 2023 that will close this issue
cucaroach added a commit to cucaroach/cockroach that referenced this issue Jun 2, 2023
Previously, any errors encountered during an atomic copy were simply
passed back to the client, which could be very disruptive to large
migrations. Now we buffer up to sql.copy.retry.max_size bytes, and
if an error is encountered we replay the COPY from the beginning in
a new transaction.

Release note (copy change): Atomic COPY commands can now be retried,
increasing the success rate of DMS operations, especially when atomic
COPY behavior is required.
Fixes: cockroachdb#99327
Epic: CRDB-25321
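
As a sketch of the replay loop this commit message describes, reusing the hypothetical replayableReader from earlier (runCopyInNewTxn and isRetryable are assumed stubs, not real CockroachDB functions):

```go
package copyretry

import (
	"context"
	"io"
)

// Assumed stand-ins for the sketch; both are elided in real code.
func runCopyInNewTxn(ctx context.Context, data io.Reader) error { return nil }
func isRetryable(err error) bool                                { return false }

// runCopyWithRetry drives one COPY. On a retryable error it starts a
// new transaction and feeds it the buffered prefix followed by the rest
// of the wire stream.
func runCopyWithRetry(ctx context.Context, r *replayableReader, maxRetries int) error {
	var src io.Reader = r
	for attempt := 0; ; attempt++ {
		err := runCopyInNewTxn(ctx, src)
		if err == nil || !isRetryable(err) || attempt >= maxRetries {
			return err
		}
		replayed, ok := r.replay()
		if !ok {
			return err // copy outgrew sql.copy.retry.max_size; give up
		}
		// Replay everything consumed so far, then continue from the wire.
		src = io.MultiReader(replayed, r)
	}
}
```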
cucaroach added a commit to cucaroach/cockroach that referenced this issue Jun 14, 2023
Add a 3-node config for more retry coverage, and turn off backups since
it's a perf test.

Release note: none
Epic: CRDB-25321
Informs: cockroachdb#99327
cucaroach added a commit to cucaroach/cockroach that referenced this issue Jul 15, 2023
Purely a test change to get 100% line coverage in readTextData.

Informs: cockroachdb#99327
Epic: None
Release note: None
cucaroach added a commit to cucaroach/cockroach that referenced this issue Jul 18, 2023
Refactor COPY so that all buffer reading takes place in a separate
implementation of the io.Reader interface. This does two things: it
enables the COPY implementation to efficiently handle small CopyData
frames by eliminating extra buffering, and it exposes the COPY bytes as
a pure stream of bytes, which makes retry easier. It also cleans up the
COPY code that handles CopyData segments straddling line boundaries;
now we can just let the text/CSV readers do their thing and not have to
do any writeback.

The old implementation would read from a pgwire BufferedReader (copy 1)
into a pgwire "ReadBuffer" (copy 2) and then push those segments into a
bytes.Buffer "buf" (copy 3). The text and binary readers would read
right from buf; the CSV reader has its own buffer, so we would read
lines from buf and write them into the CSV reader's buffer (copy 4).

The new approach does away with all this. The text format reads
directly from a bufio.Reader (copy 1) stacked on the copy.Reader
(no buffering), which is stacked on the pgwire BufferedReader (copy 2).
For CSV, the CSVReader reads directly from the copy.Reader since it has
its own buffer, so again only two copies off the wire. Binary reads
directly from the copy.Reader since it requires no ReadLine (but the
I/O is still buffered at the pgwire level).

This doesn't seem to affect performance much, but it gives the GC a
nice break and sets up a clean solution for cockroachdb#99327.

When encountering a memory usage error, we used to try to let the
encoder finish the row, but with the more efficient buffering this
started succeeding where it had always failed before. Now we skip the
hail mary: if we hit the limit we bail and return immediately, which is
simpler and more OOM-safe.

```
$ BENCHES=BenchmarkCopyCSVEndToEnd PKG=./pkg/sql/copy scripts/bench master
Executed 1 out of 1 test: 1 test passes.
name                old time/op    new time/op    delta
CopyCSVEndToEnd-10     3.94s ± 4%     3.76s ± 4%   -4.59%  (p=0.000 n=10+9)

name                old alloc/op   new alloc/op   delta
CopyCSVEndToEnd-10    8.92GB ± 1%    8.61GB ± 2%   -3.46%  (p=0.000 n=10+9)

name                old allocs/op  new allocs/op  delta
CopyCSVEndToEnd-10     13.8M ± 0%     11.6M ± 0%  -16.08%  (p=0.000 n=7+8)
```

Fixes: cockroachdb#93156
Informs: cockroachdb#99327
Release note: none
Epic: CRDB-25321
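
To make the stacking in the commit message above concrete, here is a hedged sketch (copyDataReader is an invented name; CockroachDB's actual copy.Reader differs): an io.Reader that strips pgwire CopyData ('d') frame headers and exposes the payloads as one pure byte stream, adding no buffering of its own.

```go
package copyread

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
)

// copyDataReader unwraps pgwire CopyData frames into a pure byte stream.
type copyDataReader struct {
	src       *bufio.Reader // pgwire-level buffered reader (copy 1 off the wire)
	remaining int           // payload bytes left in the current frame
}

func (r *copyDataReader) Read(p []byte) (int, error) {
	// Skip ahead to the next frame that carries payload.
	for r.remaining == 0 {
		typ, err := r.src.ReadByte()
		if err != nil {
			return 0, err
		}
		var hdr [4]byte
		if _, err := io.ReadFull(r.src, hdr[:]); err != nil {
			return 0, err
		}
		n := int(binary.BigEndian.Uint32(hdr[:])) - 4 // length includes itself
		switch typ {
		case 'd': // CopyData: n payload bytes follow
			r.remaining = n
		case 'c': // CopyDone: end of the stream
			return 0, io.EOF
		default:
			return 0, fmt.Errorf("unexpected pgwire message %q during COPY", typ)
		}
	}
	if len(p) > r.remaining {
		p = p[:r.remaining]
	}
	n, err := r.src.Read(p)
	r.remaining -= n
	return n, err
}
```

A text reader would stack its own bufio.Reader on top of this for line splitting, while a CSV reader (which buffers internally) and the binary format would read from it directly, matching the two-copies-off-the-wire layering described above.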
@mgartner mgartner moved this to Active in SQL Queries Jul 24, 2023
@cucaroach cucaroach moved this from Active to 24.1 Release in SQL Queries Sep 12, 2023
@mgartner mgartner moved this from 24.1 Release to New Backlog in SQL Queries Nov 22, 2023