INSERT ON CONFLICT causes huge latency when executed in parallel #28487

Closed
cgebe opened this issue Aug 11, 2018 · 11 comments
Assignees
Labels
A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@cgebe

cgebe commented Aug 11, 2018

Is this a bug report or a feature request?

BUG REPORT

I have a CockroachDB cluster running inside Kubernetes with 3 nodes. They are running in the same data center on 3 different machines.

Now, I have 3 pods (distributed across the machines), each running 3 workers. Each worker inserts data points via batch INSERTs into 1 specific database (3 databases in total) every second (usually about 50 data points per INSERT). The data points inserted by 1 worker into 1 database can conflict with data points (the same points) inserted by a worker of another pod. In addition, one worker should be able to insert duplicate data points (I solved that already by discarding duplicates upfront, due to #21360); however, there are still conflicts across workers from different pods, which immensely impact the insert time (~10 seconds).

This issue only occurs when I start the pods at the very same time via the deployment. When I manually scale up the deployment, the insert time windows of the pods are shifted relative to each other, so they do not conflict and latency stays within acceptable bounds.

  • Which version of CockroachDB are you using?

v2.0.4

  • What did you do?

Started INSERT ON CONFLICT batch queries with ~50 data points from different pods.

  • What did you expect to see?

The latency staying normal, with conflicts not having that much of an impact.

  • What did you see instead?

[screenshot: crdbbug]

Until 10:00 I had the three pods running that I had scaled up manually. At 10:00 I restarted the whole deployment, leading to all workers inserting at nearly the same point in time every second.

  • What was the impact?

Queries taking too long

@cgebe
Author

cgebe commented Aug 11, 2018

Just saw #28461; related?

@petermattis
Collaborator

@cgebe #28461 is possibly related, but more likely you're hitting problems with conflicting operations in 2.0. Significant performance improvements for conflict operations have been made in 2.1 (not yet released). If it is possible, can you retest using the latest 2.1 alpha?

@cgebe
Author

cgebe commented Aug 11, 2018

Hey @petermattis, I updated and scaled the pods up again manually:

[screenshot: crdbbug2]

The first part is one pod; the second pod did not have that much of an impact. However, when adding the third one, the latency went over 2 seconds again. Finally I removed one pod again, and latency went back down to an acceptable level.

While running 3 pods I got the following transaction error:

pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)

A performance increase is definitely noticeable.

@tbg tbg added the A-coreperf label Aug 13, 2018
@tbg tbg added this to the 2.1 milestone Aug 13, 2018
@tbg
Member

tbg commented Aug 13, 2018

@nvanbenschoten for triage. @andreimatei, you might be interested in the TransactionStatusError.

@cgebe
Author

cgebe commented Aug 13, 2018

Log excerpt; the number is the record count to be inserted. (If you need more details, let me know.)

{"level":"error","time":"2018-08-13T13:21:05Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:05Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:08Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:08Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:12Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:18Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:21Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:21Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:23Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:23Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:24Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:24Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:25Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:25Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:25Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:25Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:28Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:28Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:30Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:30Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:31Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:31Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:32Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:32Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:34Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:34Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:35Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:35Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:36Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:36Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:38Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:38Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:38Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:38Z","message":"12"}
{"level":"error","time":"2018-08-13T13:21:46Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:46Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:47Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:47Z","message":"6"}
{"level":"error","time":"2018-08-13T13:21:52Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:21:52Z","message":"12"}
{"level":"error","time":"2018-08-13T13:22:04Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:22:04Z","message":"6"}
{"level":"error","time":"2018-08-13T13:22:09Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:22:09Z","message":"6"}
{"level":"error","time":"2018-08-13T13:22:23Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:22:23Z","message":"12"}
{"level":"error","time":"2018-08-13T13:22:27Z","message":"pq: TransactionStatusError: txn record not found (REASON_TXN_NOT_FOUND)"}
{"level":"error","time":"2018-08-13T13:22:27Z","message":"6"}

@nvanbenschoten
Member

Hi @cgebe, thanks for the report. First off, do you mind sharing an example of what the queries you're issuing look like? You mentioned that you're running INSERT ON CONFLICT statements, but I'd like to get a better idea of what the conflict resolution expression is so that I can try to reproduce what you're seeing.

One way that we can try to isolate where the issues lie is to perform the exact same experiment but without any conflicts. Peter mentioned that we have made large improvements in 2.1 to address contention, but I still suspect that it's causing issues here.

Another thing we should try is dropping each statement down to only operating on a single data point. Batching is a recommended optimization to improve throughput in cases where there are no conflicts, but it's a less clear win when there are. Why don't we try reducing all the statements to operate on a single data point at first, and see if we still observe large latency spikes when adding in the third pod.

The TransactionStatusError errors are not expected. Are you only seeing them on 2.1?
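
To make the "without any conflicts" control concrete, here is a minimal sketch of one possible shape, assuming a per-pod integer index (for example from a hypothetical POD_INDEX environment variable), integer trade IDs, and the Store/model.Trade types that appear later in the thread. Offsetting each pod's IDs keeps the (tid, market) primary keys disjoint, so the workload stays the same except that no two pods ever write the same row:

// insertWithoutConflicts is a hypothetical control experiment: shift this
// pod's trade IDs into a disjoint range before inserting, so no two pods
// write the same (tid, market) primary key and contention drops out.
func insertWithoutConflicts(s *Store, trades map[string]model.Trade, podIndex int64) error {
	disjoint := make(map[string]model.Trade, len(trades))
	for k, t := range trades {
		t.TID += podIndex * 1000000000 // hypothetical offset; pick one larger than any real tid
		disjoint[k] = t
	}
	return s.InsertTradesMap(disjoint)
}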

@cgebe
Author

cgebe commented Aug 14, 2018

Hey @nvanbenschoten, the query looks like the following, with a composite primary key:

INSERT INTO trades (tid, market, price, amount, order_type, timestamp) VALUES %s ON CONFLICT (tid, market) DO NOTHING

I am building the batch queries this way (using a map because of #21360):

func (s *Store) InsertTradesMap(trades map[string]model.Trade) error {
	// Build one multi-row statement: a "($1, ..., $6)" placeholder group per
	// trade plus a flat argument slice. For two trades the statement becomes
	// INSERT INTO trades (...) VALUES ($1,...,$6), ($7,...,$12) ON CONFLICT (tid, market) DO NOTHING.
	valueStrings := make([]string, 0, len(trades))
	valueArgs := make([]interface{}, 0, len(trades)*6)
	i := 0
	for _, trade := range trades {
		valueStrings = append(valueStrings, fmt.Sprintf("($%d, $%d, $%d, $%d, $%d, $%d)", 1+i*6, 2+i*6, 3+i*6, 4+i*6, 5+i*6, 6+i*6))
		valueArgs = append(valueArgs, trade.TID)
		valueArgs = append(valueArgs, trade.Market)
		valueArgs = append(valueArgs, trade.Price)
		valueArgs = append(valueArgs, trade.Amount)
		valueArgs = append(valueArgs, trade.OrderType)
		valueArgs = append(valueArgs, trade.Timestamp)
		i++
	}
	// sqlInsertTrades is the template shown above, with %s replaced by the
	// comma-joined placeholder groups.
	stmt := fmt.Sprintf(sqlInsertTrades, strings.Join(valueStrings, ","))
	_, err := s.db.Exec(stmt, valueArgs...)
	return err
}

Data usually looks like (12345, 'btc-usd', 0.01, 0.01, 'buy', 1534237232981000000)


Currently, I work around the error with the following func in a deferred call. Interestingly, the deferred func is executed directly after the error with exactly the same insert and does not fail at all (it has not reached the single-insert fallback yet). It seems the error can only occur in a very small time window.

defer func() {
    sub.Unsubscribe()
    log.Info().Msgf("worker: tear down: %s trades left: %d", w.Exchange, len(trades))
    // Retry the whole batch once more; fall back to single-row inserts if it still fails.
    err = w.Store.InsertTradesMap(trades)
    if err != nil {
        log.Error().Msgf("worker %s: insert batch error: %s", w.Exchange, err.Error())
        for _, t := range trades {
            err = w.Store.InsertTrade(t)
            if err != nil {
                log.Error().Msgf("worker %s: insert single error: %s", w.Exchange, err.Error())
            }
        }
    }
}()
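
(The InsertTrade fallback used above is not shown in the thread; a plausible single-row reconstruction, assuming the same schema and statement as the batch version, would be:)

// Hypothetical reconstruction of the single-row fallback referenced above.
const sqlInsertTrade = `INSERT INTO trades (tid, market, price, amount, order_type, timestamp)
	VALUES ($1, $2, $3, $4, $5, $6)
	ON CONFLICT (tid, market) DO NOTHING`

func (s *Store) InsertTrade(t model.Trade) error {
	_, err := s.db.Exec(sqlInsertTrade, t.TID, t.Market, t.Price, t.Amount, t.OrderType, t.Timestamp)
	return err
}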

I observed the error being thrown later and less often when fewer pods are running concurrently and inserting the same data (so instead of 3 workers inserting 6 data points each, 2 workers inserting 9 data points each). If I start more pods, it is likely to occur earlier and more often. The error first appeared after the update to v2.1.0-alpha.20180730.


Concerning the actual latency issue:

Since the latency got better with the update, I am not quite sure whether this error is somehow related.
Still, adding more pods increases the latency drastically. Before the update, two pods were already causing problems; now they run fine, but as soon as I add a third one the latency spikes briefly (2-3 seconds) and comes down again (staying at P99 > 1 second). Adding even more pods does not make a difference (the query count does not go up, because the amount of incoming data points is steady). The number of data points lies between 500 and 1000 every second (distributed to workers across pods). I went directly with the batch approach since I am still in the process of scaling up to more data sources, and I thought a single-insert approach wouldn't be able to keep up. I am especially curious how CockroachDB behaves here under time-series usage.

Do you suggest doing single inserts instead, despite the rising number of data points? I would like to leave conflict resolution to the database, but I might be better off doing it upfront. Right now I am fine, but I fear it will get worse.
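
(One option for the retryable errors, not discussed in the thread: wrap the batch in the cockroach-go retry helper, which re-runs the closure when CockroachDB reports a retryable transaction error. A minimal sketch, assuming the github.com/cockroachdb/cockroach-go/crdb package; whether this particular TransactionStatusError surfaces as a retryable error is something the CockroachDB folks would have to confirm.)

import (
	"context"
	"database/sql"

	"github.com/cockroachdb/cockroach-go/crdb"
)

// insertTradesWithRetry re-runs the whole batch when the driver reports a
// retryable transaction error, instead of surfacing it to the worker.
func insertTradesWithRetry(db *sql.DB, stmt string, args []interface{}) error {
	return crdb.ExecuteTx(context.Background(), db, nil, func(tx *sql.Tx) error {
		_, err := tx.Exec(stmt, args...)
		return err
	})
}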

@cgebe
Author

cgebe commented Aug 15, 2018

I experienced a sudden drop in latency; due to the growing number of ranges? With two pods, I currently have P99 < 150ms and P50 at 50ms.

@knz knz added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. labels Aug 31, 2018
@asubiotto
Contributor

From #30164. The latest beta version is v2.1.0-beta.20180910.

Yes, I updated to the latest beta version. However, I am still seeing spikes and high latencies with my batch insert strategy. When switching to single inserts the problem gets worse, with some inserts not going through at all (no error and no records inserted). I can only imagine that the number of databases I am inserting into (~20) is the real issue here. I am still investigating the impact of different insert strategies, but no luck so far.

@cgebe
Author

cgebe commented Sep 15, 2018

I tried out the following:

  1. INSERT ON CONFLICT DO NOTHING (Batch)
  2. INSERT ON CONFLICT DO NOTHING (Single)
  3. INSERT with duplicate key errors ignored (Batch)
  4. INSERT with duplicate key errors ignored (Single) (error handling sketched after this list)
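
(A sketch of what "duplicate key errors ignored" in strategies 3 and 4 can look like with database/sql and github.com/lib/pq, for anyone reproducing this; SQLSTATE 23505 is unique_violation. This is an illustration, not necessarily the exact code used here.)

import (
	"database/sql"

	"github.com/lib/pq"
)

// insertIgnoringDuplicates treats a unique-constraint violation (SQLSTATE
// 23505, unique_violation) from lib/pq as a successful no-op.
func insertIgnoringDuplicates(db *sql.DB, stmt string, args []interface{}) error {
	_, err := db.Exec(stmt, args...)
	if pqErr, ok := err.(*pq.Error); ok && pqErr.Code == "23505" {
		return nil // duplicate key: the row already exists, nothing to do
	}
	return err
}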

All strategies lead to query times between 3s and 15s for a single query. I stopped all my parallel insert jobs and tried to execute a single insert with a single client, and still got 4s as the query time. This can only mean that my cluster is in a corrupted state. Is there a way to repair my cluster state? My debug zip: https://transfer.sh/bzm6Q/log.zip. Looking at the problem ranges page, I see a few thousand ranges in "no_raft_leader_range_ids", out of a total of 4.5k ranges. This is probably induced by the parallel batch inserts. Can I solve this quickly somehow?

@cgebe
Author

cgebe commented Sep 15, 2018

Setting kv.raft_log.synchronize to false apparently solves the high-latency issue for me. Otherwise, my use case of continuously writing time-series data is not realizable with CockroachDB. Might be interesting in the context of #19784.
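
(A minimal sketch of applying that setting through the same database/sql handle; note that disabling kv.raft_log.synchronize is understood to skip synchronous Raft log writes, trading crash durability for write latency.)

// disableRaftLogSync applies the cluster setting mentioned above. This trades
// crash durability for write latency, so use it with care.
func disableRaftLogSync(db *sql.DB) error {
	_, err := db.Exec(`SET CLUSTER SETTING kv.raft_log.synchronize = false`)
	return err
}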
