sql: support COPY protocol #8756

maddyblue · 2016-08-23T06:12:26Z

Implement COPY by maintaining data and row buffers in the SQL Session. When
the row buffer is large enough it is executed as an insertNode.

The COPY protocol is difficult to benchmark with the current state of
lib/pq, which only supports COPY within transactions. We would like to
benchmark non-trivial (100k+ rows) datasets, but doing a single transaction
with 100k rows performs poorly in cockroach. I thus performed some ad-hoc
benchmarking using single node with comparisons also to Postgres.

I generated a random dataset of 300k rows in Postgres. Then I ran pg_dump
and pg_dump --inserts to fetch backups of that table in COPY and INSERT
modes. I inserted that data into cockroach and then used cockroach dump
to again extract it. This is because pg_dump --inserts writes INSERT
statements with one VALUE row per INSERT, which is inefficient for
cockroach. cockroach dump groups them by 100 rows per INSERT, which is
also the rate at which COPY rows are grouped.

The COPY pg_dump file and cockroach dump file were timed and inserted each
into an empty cockroach node. Both ran in about 25s: there was no
significant performance difference between COPY and INSERT. The same file
in Postgres took 2s to COPY and 8s with the cockroach dump file.

The conclusion here is that cockroach write speed is far and away the
bottleneck, and speeding up network and parse operations is not going to
produce any noticeable speedup.

This change is still useful, however, because this is a common protocol for
postgres backups.

Our CLI tool does not support COPY syntax yet. lib/pq would need a large
refactor and enhancement to support non-transactional COPY as it is
cleverly implemented using the Go database/sql statement API. Adding this
support is TODO.

Fixes #8585

This change is

knz · 2016-08-29T10:06:49Z

(This will need a rebase.)

Here you are, overall this looks nice.
I have a couple general concerns which could probably be addressed simply via additional documentation:

the COPY FROM statement behaves very differently with regards to the general executor logic and the planNode protocol. While this is now fresh in your mind, future you (and future us) will need some general help to understand how COPY FROM is different from the rest and how the different pieces connect to each other. I suggest a comment at the start of sql/copy.go saying something along the lines of "COPY FROM is not the usual planNode, its payload is provisioned by pgwire after the Executor has finished executing it via the special methods X and Y. From the moment a COPY FROM was executed as a planNode and until the point where its work is effectively finished, the planner carries a special reference to a copyNode to channel the destination of copy data payloads."
the behavior of COPY FROM when inside a transaction relative to other statements is relatively unclear in this implementation. As I now read it, your code allows for other statements to be executed via pgwire after a COPY FROM but before the COPY data is provided by the client and effectively inserted. What of subsequent statements that assume the COPY has completed? (as in: BEGIN; COPY ...; UPDATE; COMMIT). It is not so clear that we are doing the ordering right here. Assuming this works as intended it should be explained too, and a corresponding test added.
Please add a reference in the source to where you found specs or original code to implement the protocol, in particular the data encoding, so that we can trace back the origin of this protocol in the future.

Reviewed 15 of 15 files at r1.
Review status: all files reviewed at latest revision, 18 unresolved discussions, some commit checks failed.

sql/copy.go, line 15 [r1] (raw file):

// permissions and limitations under the License.
//
// Author: Matt Jibson

Perhaps add an e-mail address here for consistency with other source files.

sql/copy.go, line 28 [r1] (raw file):

)

type copyNode struct {

For consistency with other planNodes you could add a comment here to explain what this node does.

In particular a part of this comment must explain that this node does not use the expandPlan/Start/Next protocol, and explain instead what it does use.

sql/copy.go, line 84 [r1] (raw file):

  }

  p.copyFrom = cn

Needs a check earlier on that another COPY FROM is not already in progress. (if p.copyFrom != nil ...)

Also I am 99% certain you want to assign this pointer in the Start() method, not here. Otherwise a Prepare of a COPY will probably not do what you want.

sql/copy.go, line 116 [r1] (raw file):

  switch msg {
  case copyMsgData:
      // ignore

comment should be // Fall through. not "ignore"

sql/copy.go, line 138 [r1] (raw file):

      }
      // Remove lineDelim from end.
      line = line[:len(line)-1]

Does this protocol specify what happens if the last line before EOF is not NL-terminated? Right now it looks like this code would chomp the last useful char from the data. If the spec says anything please document it here. If not this needs to be tested with pg and then a comment added with your test results.

sql/copy.go, line 143 [r1] (raw file):

          line = line[:len(line)-1]
      }
      parts := bytes.Split(line, []byte{fieldDelim})

Perhaps you could initialize fieldDelim as var fieldDelim := []byte{'\t'} directly.

sql/copy.go, line 149 [r1] (raw file):

      exprs := make(parser.Exprs, len(parts))
      for i, part := range parts {
          s := string(part)

Extract this decode logic to its own function.

sql/copy.go, line 211 [r1] (raw file):

  start := 0
  for i, n := 0, len(in); i < n; i++ {
      if in[i] != '\\' {

I am not happy about the code duplication between this loop and Scanner.scanString() in parser/scan.go. Is it possible to extract and factor just the escape handling code from both functions?

sql/copy.go, line 224 [r1] (raw file):

          buf.WriteByte(decodedChar)
      } else if ch == 'x' {
          // \x can be followed by 1 or 2 hex digits.

How so "1 or 2"? Can you give an example? Where does this spec come from? This looks like am ambiguous syntax definition.

sql/copy.go, line 230 [r1] (raw file):

          }
          ch = in[i]
          digit, ok := hexDigit[ch]

Don't use a map for this. hex/octal conversion should be done arithmetically, CPU cycles are cheaper than memory. For example:

func decodeDigit(c byte, onlyOctal bool) (int, bool) {
   switch {
   case c >= '0' && c <= '7':
     return int(c - '0'), true
   case !onlyOctal && c >= '8' && c <= '9':
     return int(c - '0'), true
   case !onlyOctal && c >= 'a' && c <= 'f':
     return int(c - 'a' + 10), true
   case !onlyOctal && c >= 'A' && c <= 'F':
     return int(c - 'A' + 10), true
   default:
     return 0, false
   }
}

func decodeOctDigit(c byte) (int, bool) { return decodeDigit(c, true) }
func decodeHexDigit(c byte) (int, bool) { return decodeDigit(c, false) }

sql/copy.go, line 244 [r1] (raw file):

      } else if ch >= '0' && ch <= '7' {
          digit := octalDigit[ch]
          // 1 to 2 more octal digits follow.

Same as above.

sql/copy.go, line 269 [r1] (raw file):

}

var decodeMap = map[byte]byte{

ditto re. duplication with scan.go.

sql/copy.go, line 316 [r1] (raw file):

func (p *planner) CopyData(n CopyDataBlock, autoCommit bool) (planNode, error) {
  const copyRowSize = 100

Move this earlier at the start of the file and document it.

sql/copy_in_test.go, line 15 [r1] (raw file):

// permissions and limitations under the License.
//
// Author: Matt Jibson

ditto re. email address.

sql/copy_test.go, line 15 [r1] (raw file):

// permissions and limitations under the License.
//
// Author: Matt Jibson

ditto

sql/pgwire/v3.go, line 309 [r1] (raw file):

      case clientMsgCopyData, clientMsgCopyDone, clientMsgCopyFail:
          // ignore

I don't get it; why are they ignored here? I think I understand that these messages can only be received after a COPY statement was executed (which causes sendResponse to start processing copyIn(), but then that means that if this code is reached here there was a protocol error. I'd expect that error to be reported.

sql/pgwire/v3.go, line 842 [r1] (raw file):

  c.writeBuf.initMsg(serverMsgCopyInResponse)
  c.writeBuf.writeByte(0) // formatText

I'd suggest creating a constant formatText for this.
Also here a link to the protocol docs would be welcome.

sql/pgwire/v3.go, line 874 [r1] (raw file):

      case clientMsgFlush, clientMsgSync:
          // ignore

Are we sure we want to ignore this? I'd think that at least Flush indicates to process what's already in the buffer, regardless of what comes afterwards.

Comments from Reviewable

maddyblue · 2016-08-29T20:17:51Z

Done.

Review status: 8 of 15 files reviewed at latest revision, 19 unresolved discussions, some commit checks pending.

sql/copy.go, line 15 [r1] (raw file):