INSERT ON CONFLICT causes huge latency when executed in parallel #28487

Closed
cgebe opened this issue Aug 11, 2018 · 11 comments
Assignees
Labels
A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@cgebe

cgebe commented Aug 11, 2018

Is this a bug report or a feature request?

BUG REPORT

I have a CockroachDB cluster running inside Kubernetes with 3 nodes. They are running in the same data center on 3 different machines.

Now, I have 3 pods (distributed across the machines), each running 3 workers. Each worker inserts data points via batch INSERTs into 1 specific database (3 databases in total) every second (usually about 50 data points per INSERT). The data points inserted by 1 worker into 1 database can conflict with data points (the same points) inserted by a worker of another pod. In addition, one worker should be able to insert duplicate data points (I solved that already by discarding duplicates upfront, due to #21360); however, there are still conflicts across workers from different pods, which immensely impact the insert time (~10 seconds).

This issue only occurs when I start the pods at the very same time via the deployment. When I manually scale up the deployment, the insert time windows of the pods are shifted relative to each other, so they do not conflict and latency stays within acceptable bounds.

  • Which version of CockroachDB are you using?

v2.0.4

  • What did you do?

Started INSERT ON CONFLICT batch queries with ~50 data points from different pods.

  • What did you expect to see?

The latency staying normal, with conflicts not having that much of an impact.

  • What did you see instead?

[screenshot: crdbbug]

Until 10:00 I had the three pods running that I had scaled up manually. At 10:00 I restarted the whole deployment, leading to all workers inserting at nearly the same point in time every second.

  • What was the impact?

Queries taking too long

@cgebe
Author

cgebe commented Aug 11, 2018

Just saw #28461; related?

@petermattis
Collaborator

@cgebe #28461 is possibly related, but more likely you're hitting problems with conflicting operations in 2.0. Significant performance improvements for conflict operations have been made in 2.1 (not yet released). If it is possible, can you retest using the latest 2.1 alpha?

@cgebe
Author

cgebe commented Aug 11, 2018

Hey @petermattis, I updated and scaled the pods up again manually:

[screenshot: crdbbug2]

The first part is one pod; the second pod did not have that much of an impact. However, when adding the third one, the latency went over 2 seconds again. Finally I removed one pod again, and latency went back down to an acceptable level.

While running 3 pods I got the following transaction error:

pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)

A performance increase is definitely noticeable.

@tbg tbg added the A-coreperf label Aug 13, 2018
@tbg tbg added this to the 2.1 milestone Aug 13, 2018
@tbg
Member

tbg commented Aug 13, 2018

@nvanbenschoten for triage. @andreimatei, you might be interested in the TransactionStatusError.

@cgebe
Author

cgebe commented Aug 13, 2018

Log excerpt; the number is the record count to be inserted. (If you need more details, let me know.)

{"level":"error","time":"2018-08-13T13:21:05Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:05Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:08Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:08Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:12Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:18Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:21Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:21Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:23Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:23Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:24Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:24Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:25Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:25Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:25Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:25Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:28Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:28Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:30Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:30Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:31Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:31Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:32Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:32Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:34Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:34Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:35Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:35Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:36Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:36Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:38Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:38Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:38Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:38Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:46Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:46Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:47Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:47Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:52Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:52Z","message":"12"}
{"level":"error","time":"2018-08-13T13:22:04Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:22:04Z","message":"6"}
{"level":"error","time":"2018-08-13T13:22:09Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:22:09Z","message":"6"}
{"level":"error","time":"2018-08-13T13:22:23Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:22:23Z","message":"12"}
{"level":"error","time":"2018-08-13T13:22:27Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:22:27Z","message":"6"}

@nvanbenschoten
Member

Hi @cgebe, thanks for the report. First off, do you mind sharing an example of what the queries you're issuing look like? You mentioned that you're running INSERT ON CONFLICT statements, but I'd like to get a better idea of what the conflict resolution expression is so that I can try to reproduce what you're seeing.

One way that we can try to isolate where the issues lie is to perform the exact same experiment but without any conflicts. Peter mentioned that we have made large improvements in 2.1 to address contention, but I still suspect that it's causing issues here.

Another thing we should try is dropping each statement down to only operating on a single data point. Batching is a recommended optimization to improve throughput in cases where there are no conflicts, but it's a less clear win when there are. Why don't we try reducing all the statements to operate on a single data point at first, and see if we still observe large latency spikes when adding in the third pod.

The TransactionStatusError errors are not expected. Are you only seeing them on 2.1?
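
To make the "without any conflicts" control concrete, here is a minimal sketch of one possible shape, assuming a per-pod integer index (for example from a hypothetical POD_INDEX environment variable), integer trade IDs, and the Store/model.Trade types that appear later in the thread. Offsetting each pod's IDs keeps the (tid, market) primary keys disjoint, so the workload stays the same except that no two pods ever write the same row:

// insertWithoutConflicts is a hypothetical control experiment: shift this
// pod's trade IDs into a disjoint range before inserting, so no two pods
// write the same (tid, market) primary key and contention drops out.
func insertWithoutConflicts(s *Store, trades map[string]model.Trade, podIndex int64) error {
	disjoint := make(map[string]model.Trade, len(trades))
	for k, t := range trades {
		t.TID += podIndex * 1000000000 // hypothetical offset; pick one larger than any real tid
		disjoint[k] = t
	}
	return s.InsertTradesMap(disjoint)
}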

@cgebe
Author

cgebe commented Aug 14, 2018

Hey @nvanbenschoten, the query looks like the following, with a composite primary key:

INSERT INTO trades (tid, market, price, amount, order_type, timestamp) VALUES %s ON CONFLICT (tid, market) DO NOTHING

I am building the batch queries this way (using a map because of #21360):

func (s *Store) InsertTradesMap(trades map[string]model.Trade) error {
	// Build one multi-row statement: a "($1, ..., $6)" placeholder group per
	// trade plus a flat argument slice. For two trades the statement becomes
	// INSERT INTO trades (...) VALUES ($1,...,$6), ($7,...,$12) ON CONFLICT (tid, market) DO NOTHING.
	valueStrings := make([]string, 0, len(trades))
	valueArgs := make([]interface{}, 0, len(trades)*6)
	i := 0
	for _, trade := range trades {
		valueStrings = append(valueStrings, fmt.Sprintf("($%d, $%d, $%d, $%d, $%d, $%d)", 1+i*6, 2+i*6, 3+i*6, 4+i*6, 5+i*6, 6+i*6))
		valueArgs = append(valueArgs, trade.TID)
		valueArgs = append(valueArgs, trade.Market)
		valueArgs = append(valueArgs, trade.Price)
		valueArgs = append(valueArgs, trade.Amount)
		valueArgs = append(valueArgs, trade.OrderType)
		valueArgs = append(valueArgs, trade.Timestamp)
		i++
	}
	// sqlInsertTrades is the template shown above, with %s replaced by the
	// comma-joined placeholder groups.
	stmt := fmt.Sprintf(sqlInsertTrades, strings.Join(valueStrings, ","))
	_, err := s.db.Exec(stmt, valueArgs...)
	return err
}

Data usually looks like (12345, 'btc-usd', 0.01, 0.01, 'buy', 1534237232981000000)


Currently, I work around the error with the following func in a deferred call. Interestingly, the deferred func is executed directly after the error with exactly the same insert and does not fail at all (it has not reached the single-insert fallback yet). It seems the error can only occur in a very small time window.

defer func() {
    sub.Unsubscribe()
    log.Info().Msgf("worker: tear down: %s trades left: %d", w.Exchange, len(trades))
    // Retry the whole batch once more; fall back to single-row inserts if it still fails.
    err = w.Store.InsertTradesMap(trades)
    if err != nil {
        log.Error().Msgf("worker %s: insert batch error: %s", w.Exchange, err.Error())
        for _, t := range trades {
            err = w.Store.InsertTrade(t)
            if err != nil {
                log.Error().Msgf("worker %s: insert single error: %s", w.Exchange, err.Error())
            }
        }
    }
}()
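
(The InsertTrade fallback used above is not shown in the thread; a plausible single-row reconstruction, assuming the same schema and statement as the batch version, would be:)

// Hypothetical reconstruction of the single-row fallback referenced above.
const sqlInsertTrade = `INSERT INTO trades (tid, market, price, amount, order_type, timestamp)
	VALUES ($1, $2, $3, $4, $5, $6)
	ON CONFLICT (tid, market) DO NOTHING`

func (s *Store) InsertTrade(t model.Trade) error {
	_, err := s.db.Exec(sqlInsertTrade, t.TID, t.Market, t.Price, t.Amount, t.OrderType, t.Timestamp)
	return err
}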

I observed the error being thrown later and less often when fewer pods are running concurrently and inserting the same data (so instead of 3 workers inserting 6 data points each, 2 workers inserting 9 data points each). If I start more pods, it is likely to occur earlier and more often. The error first appeared after the update to v2.1.0-alpha.20180730.


Concerning the actual latency issue:

Since the latency got better with the update, I am not quite sure whether this error is somehow related.
Still, adding more pods increases the latency drastically. Before the update, two pods were already causing problems; now they run fine, but as soon as I add a third one the latency spikes briefly (2-3 seconds) and comes down again (staying at P99 > 1 second). Adding even more pods does not make a difference (the query count does not go up, because the amount of incoming data points is steady). The number of data points lies between 500 and 1000 every second (distributed to workers across pods). I went directly with the batch approach since I am still in the process of scaling up to more data sources, and I thought a single-insert approach wouldn't be able to keep up. I am especially curious how CockroachDB behaves here under time-series usage.

Do you suggest doing single inserts instead, despite the rising number of data points? I would like to leave conflict resolution to the database, but I might be better off doing it upfront. Right now I am fine, but I fear it will get worse.
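
(One option for the retryable errors, not discussed in the thread: wrap the batch in the cockroach-go retry helper, which re-runs the closure when CockroachDB reports a retryable transaction error. A minimal sketch, assuming the github.com/cockroachdb/cockroach-go/crdb package; whether this particular TransactionStatusError surfaces as a retryable error is something the CockroachDB folks would have to confirm.)

import (
	"context"
	"database/sql"

	"github.com/cockroachdb/cockroach-go/crdb"
)

// insertTradesWithRetry re-runs the whole batch when the driver reports a
// retryable transaction error, instead of surfacing it to the worker.
func insertTradesWithRetry(db *sql.DB, stmt string, args []interface{}) error {
	return crdb.ExecuteTx(context.Background(), db, nil, func(tx *sql.Tx) error {
		_, err := tx.Exec(stmt, args...)
		return err
	})
}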

@cgebe
Author

cgebe commented Aug 15, 2018

I experienced a sudden drop in latency; due to the growing number of ranges? With two pods, I currently have P99 < 150ms and P50 at 50ms.

@knz knz added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. labels Aug 31, 2018
@asubiotto
Contributor

From #30164. The latest beta version is v2.1.0-beta.20180910.

Yes, I updated to the latest beta version. However, I am still seeing spikes and high latencies with my batch insert strategy. When switching to single inserts the problem gets worse, with some inserts not going through at all (no error and no records inserted). I can only imagine that the number of databases I am inserting into (~20) is the real issue here. I am still investigating the impact of different insert strategies, but no luck so far.

@cgebe
Author

cgebe commented Sep 15, 2018

I tried out the following:

  1. INSERT ON CONFLICT DO NOTHING (Batch)
  2. INSERT ON CONFLICT DO NOTHING (Single)
  3. INSERT with duplicate key errors ignored (Batch)
  4. INSERT with duplicate key errors ignored (Single) (error handling sketched after this list)
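
(A sketch of what "duplicate key errors ignored" in strategies 3 and 4 can look like with database/sql and github.com/lib/pq, for anyone reproducing this; SQLSTATE 23505 is unique_violation. This is an illustration, not necessarily the exact code used here.)

import (
	"database/sql"

	"github.com/lib/pq"
)

// insertIgnoringDuplicates treats a unique-constraint violation (SQLSTATE
// 23505, unique_violation) from lib/pq as a successful no-op.
func insertIgnoringDuplicates(db *sql.DB, stmt string, args []interface{}) error {
	_, err := db.Exec(stmt, args...)
	if pqErr, ok := err.(*pq.Error); ok && pqErr.Code == "23505" {
		return nil // duplicate key: the row already exists, nothing to do
	}
	return err
}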

All strategies lead to query times between 3s and 15s for a single query. I stopped all my parallel insert jobs and tried to execute a single insert with a single client, and still got 4s as the query time. This can only mean that my cluster is in a corrupted state. Is there a way to repair my cluster state? My debug zip: https://transfer.sh/bzm6Q/log.zip. Looking at the problem ranges page, I see a few thousand ranges in "no_raft_leader_range_ids", out of a total of 4.5k ranges. This is probably induced by the parallel batch inserts. Can I solve this quickly somehow?

@cgebe
Author

cgebe commented Sep 15, 2018

Setting kv.raft_log.synchronize to false apparently solves the high-latency issue for me. Otherwise, my use case of continuously writing time-series data is not realizable with CockroachDB. Might be interesting in the context of #19784.
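
(A minimal sketch of applying that setting through the same database/sql handle; note that disabling kv.raft_log.synchronize is understood to skip synchronous Raft log writes, trading crash durability for write latency.)

// disableRaftLogSync applies the cluster setting mentioned above. This trades
// crash durability for write latency, so use it with care.
func disableRaftLogSync(db *sql.DB) error {
	_, err := db.Exec(`SET CLUSTER SETTING kv.raft_log.synchronize = false`)
	return err
}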
