sql: support COPY protocol #8756

Merged
merged 1 commit into from
Aug 30, 2016

Commits on Aug 30, 2016

  1. sql: support COPY protocol

    Implement COPY by maintaining data and row buffers in the SQL Session. When
    the row buffer is large enough it is executed as an insertNode.
    
    The COPY protocol is difficult to benchmark with the current state of
    lib/pq, which only supports COPY within transactions. We would like to
    benchmark non-trivial (100k+ rows) datasets, but doing a single transaction
    with 100k rows performs poorly in cockroach. I therefore ran some ad-hoc
    benchmarks on a single node, with Postgres as a comparison.
    
    I generated a random dataset of 300k rows in Postgres. Then I ran `pg_dump`
    and `pg_dump --inserts` to fetch backups of that table in COPY and INSERT
    modes. I inserted that data into cockroach and then used `cockroach dump`
    to extract it again, because `pg_dump --inserts` writes INSERT statements
    with one VALUES row per INSERT, which is inefficient for cockroach.
    `cockroach dump` groups rows 100 per INSERT, which is also the batch size
    at which COPY rows are grouped.
    
    The COPY pg_dump file and the cockroach dump file were each loaded into an
    empty cockroach node and timed. Both ran in about 25s: there was no
    significant performance difference between COPY and INSERT. In Postgres,
    the same data took 2s to load with COPY and 8s with the cockroach dump file.
    
    The conclusion here is that cockroach write speed is far and away the
    bottleneck, and speeding up network and parse operations is not going to
    produce any noticeable speedup.
    
    This change is still useful, however, because this is a common protocol for
    postgres backups.
    
    Our CLI tool does not support COPY syntax yet. lib/pq would need a large
    refactor and enhancement to support non-transactional COPY, because its
    COPY support is cleverly implemented using the Go database/sql statement
    API. Adding this support is a TODO.
    
    Fixes #8585
    maddyblue committed Aug 30, 2016