This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

backup: add retry when tikv is down #508

Merged: 11 commits, Sep 22, 2020


3pointer
Collaborator

What problem does this PR solve?

Reset the connection when TiKV goes down during a backup.

What is changed and how it works?

Add a reset function to client.Mgr.

Check List

Tests

  • Manual test (add detailed scripts or steps below)

Related changes

  • Need to cherry-pick to the release branch

Release Note

  • No release note

@IANTHEREAL
Collaborator

Good Job! LGTM

pkg/conn/conn.go (review thread resolved)
pkg/backup/client.go (review thread resolved)
@3pointer
Collaborator Author

/run-integration-test

@3pointer
Collaborator Author

/run-all-tests

@YuJuncen
Collaborator

LGTM

@ti-srebot ti-srebot added the status/LGT1 LGTM1 label Sep 21, 2020
Member

@overvenus overvenus left a comment

Rest LGTM

Comment on lines 68 to 69
log.Warn("reset the connection in push", zap.Uint64("storeID", storeID))
time.Sleep(3 * time.Second)
Member

They should not be placed in the reset function. You can place them outside.

@ti-srebot ti-srebot removed the status/LGT1 LGTM1 label Sep 22, 2020
@ti-srebot ti-srebot added the status/LGT2 LGTM2 label Sep 22, 2020
@3pointer 3pointer merged commit cde9f30 into pingcap:master Sep 22, 2020
@ti-srebot
Contributor

cherry pick to release-4.0 failed

3pointer added a commit to 3pointer/br that referenced this pull request Sep 22, 2020
* backup: add retry when tikv is down

* fix build

* address comment && refine some log code

* refine reset code

* add cancel error to retry

* remove cancel error retry

* add integration test

* add failpoint to test

* fix build

* fix failpoint test

* address comment
3pointer added a commit that referenced this pull request Sep 23, 2020
* backup: add retry when tikv is down

* fix build

* address comment && refine some log code

* refine reset code

* add cancel error to retry

* remove cancel error retry

* add integration test

* add failpoint to test

* fix build

* fix failpoint test

* address comment
overvenus pushed a commit to overvenus/br-1 that referenced this pull request Dec 29, 2020
…tream (pingcap#508)

* backend,restore: duplicate more important system variables from downstream

* backend: fix test failure

* tests: enhance the gencol test case to include sysvar-dep exprs

* tests/auto_random_default: relax the check

* tests/generated_columns: add a retry loop to ensure sys vars are changed

* backend: fix invalid gencol sort algorithm

* tests/generated_columns: disable the week test since tidb is buggy

Co-authored-by: glorv <[email protected]>
overvenus added a commit that referenced this pull request Mar 3, 2021
* restore: write index kv pairs and data kv pairs to separate engine (#132)

* restore: write index kvs and data kvs to separate engine

* restore: use single engine file to store index

* restore: make index engine limited by table concurrency

* restore: modify checkpoint proto

* restore: implement checkpoint for index engine file

* tests: add failpoint for CheckpointStatusIndexImported

* Support CSV (#111)

* loader: recognize CSV files

* *: generalize the parser interface

* mydump: added a CSV parser

* mydump,restore: enable CSV parser

* tests: added integration test for CSV

* mydumper: added test case for empty CSV

* config: improved description

* restore: fixed a compile error on Go 1.12

* update version of tidb to latest release-2.1 (#138)

update tidb version to latest release2.1, and add a test case for issue https://github.com/pingcap/tidb/issues/9532

* tidb-lightning-ctl: added --import-engine and --cleanup-engine commands (#125)

* *: parameter type fix (#141)

* remove changelog and get change logs from release (#146)

* test: fix cleanup script to enable re-run integration test locally (#142)

Currently, the cleanup script in the integration test does not work. If developers want to re-run the integration test in a local environment, they have to clean /tmp/lightning_test_result manually.

* restore: fix #140 lightning breaks on tidb-master (#147)

* fix deadlock and goroutine leak in chunk restore (#149)

* restore: eager release index engine worker after imported (#150)

* checkpoint: fix checkpoint not updated in some scenario (#151)

The integration test `checkpoint_error_destroy` fails randomly, both in the local environment and on Jenkins CI.
Lightning could exit before all checkpoints are saved, because the `WaitGroup` is not always incremented before this line returns: https://github.com/pingcap/tidb-lightning/blob/master/lightning/restore/restore.go#L204

1. After `RestoreController.Run` returns, no more messages are sent to `rc.saveCpCh`, so we can close it safely and ensure all `saveCp` messages are consumed
2. remove some unused code
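
The shutdown ordering described above can be illustrated with a minimal sketch: register the consumer on the `WaitGroup` before starting it, close the channel once no more messages will be sent, and wait for the consumer to drain it. The names `saveAll` and `saveCh` are hypothetical and only mirror the idea; this is not the Lightning code.

```go
package main

import (
	"fmt"
	"sync"
)

// saveAll sends every checkpoint through a channel and returns only after
// the consumer goroutine has processed all of them.
func saveAll(checkpoints []string) []string {
	saveCh := make(chan string)
	var wg sync.WaitGroup
	var saved []string

	wg.Add(1) // register the consumer before it starts, not inside it
	go func() {
		defer wg.Done()
		for cp := range saveCh { // ends once saveCh is closed and drained
			saved = append(saved, cp)
		}
	}()

	for _, cp := range checkpoints {
		saveCh <- cp
	}
	close(saveCh) // safe: nothing is sent after this point
	wg.Wait()     // all pending saves are consumed before returning
	return saved
}

func main() {
	fmt.Println(saveAll([]string{"table-a", "table-b", "table-c"}))
}
```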

* Revert "fix deadlock and goroutine leak in chunk restore (#149)" (#152)

This reverts commit 852c5e954877692035efe5c3e44ddcf8f915d90b.

* restore: add local checksum log (#153)

* restore: add local checksum log

* Update lightning/restore/restore.go

Co-Authored-By: lonng <[email protected]>

* tests: add kill lightning test in checkpoint_chunks case (#158)

use gofail to control lightning to kill itself after one chunk is imported

* *: parse the data source directly into data and skip the KV encoder (#145)

* *: parse the data source directly into data and skip the KV encoder

This skips the more complex pingcap/parser, and speeds up parsing speed
by 50%.

We have also refactored the KV delivery mechanism to use channels
directly, and revamped metrics:

- Make the metrics about engines into its own `engines` counter. The
  `tables` counter is exclusively about tables now.
- Removed `block_read_seconds`, `block_read_bytes`, `block_encode_seconds`
  since the concept of "block" no longer applies. Replaced by the
  equivalents named `row_***`.
- Removed `chunk_parser_read_row_seconds` for being overlapping with
  `row_read_seconds`.
- Changed `block_deliver_bytes` into a histogram vec, with kind=index
  or kind=data. Introduced `block_deliver_kv_pairs`.

* tests,restore: prevent spurious error in checkpoint_chunks test

Only kill Lightning if the whole chunk is imported exactly. The chunk
checkpoint may be recorded before a chunk is fully written, and this will
hit the failpoint more than 5 times.

* kv: use composed interface to simplify some types

* kv: properly handle the SQL mode

* common: disable IsContextCanceledError() when log level = debug

This helps debugging some mysterious cancellation where the log is
inhibited.

Added IsReallyContextCanceledError() for code logic affected by error
type.

* restore: made some log more detailed

* restore: made the SlowDownImport failpoint apply to index engines too

* restore: do not open a write stream when there are no KV pairs to send

* tests: ensure we drop the checkpoints DB before re-run

* mydump: fixed various off-by-one errors in the CSV parser

* *: rename `!IsContextCanceledError` to `ShouldLogError`

* *: addressed comments

* restore: zero the checksums and column permutations on initialization

* *: addressed comments

* tests: add back a missing license header

* tests: improve a comment.

* tests: fix a test failure due to conflict between #145 and #158 (#159)

* tests: fix a test failure due to conflict between #145 and #158

* restore: apply the row count limit to failpoint KillIfImportedChunk too

* restore: give priority to small tables for importing (#156)

Put large tables at the front of the slice, so that a large table does not take a long time to import and block small tables from releasing their index workers.

* config: Allow overriding some config from command line (#157)

* config: remove deprecated -compact, -switch-mode flags from tidb-lightning

They can still be called from tidb-lightning-ctl.

* config: search also conf/tidb-lightning.toml for default -config path

Fallback to standard default if -config is not supplied

* config: provide command line arguments for some common options

* config: stop searching for tidb-lightning.toml if --config is unspecified

* go.mod: upgrade dependencies, esp TiDB -> v3.0.0-beta.1 (#160)

* Fix interpretation of integers in a BIT column (#161)

* tests: fix existing test failure

* mydump: fixed conversion of integers into bits

We need to create a special branch for integers, since casting 123 and
'123' into BIT type behave differently.

Also fixed handling of 0x/0b bit strings since Ragel doesn't recognize
'+' in a regex -_-.

* mydump: store description of `token` in an array instead of switch cases

* tests: test behavior of integers for ENUM and SET types as well

* *: replace gofail with new failpoint implementation (#165)

* *: use pingcap/log (zap) for logging  (#162)

* *: use pingcap/log (zap) for logging

Some redundant logs (e.g. logging about the same thing inside and outside
a function) are removed.

The {QueryRow,Transact,Exec}WithRetry functions are revamped to include
the logger.

* common,config: addressed comments

* *: addressed comments

* restore,verification: addressed comments

* main: sync log before exit + update failpoint dep (#168)

* kv: fix handling of column default values (#170)

* kv: fix handling of column default values

* if the column is AUTO_INCREMENT, fill in with row_id (assume it is
  missing for the entire table instead of just a few values)
* if the column has DEFAULT, fill in that value

* tests: ensure DEFAULT CURRENT_TIMESTAMP works

* tests,restore: re-enable the exotic_filenames test (#172)

* config: reduce default table-concurrency from 8 to 6 (#175)

Ensures table + index <= the default max-open-engine which is 8.

* Support table routing rules (merging sharded tables) (#95)

* config: added [[routes]] config

* mydump, restore: support table routing

* tests: added test case for table routing

* mydump: ensure rerouted schemas will not be created

* config: allows routes to be case-sensitive

* restore: replace CREATE TABLE -> IF NOT EXISTS using tidb/parser

* mydump/loader_test: add unit test for route() and refactor

* tests: TiDB doesn't support `DECIMAL(20, 0)`.

* tests: workaround pingcap/parser#310

* tests: removes the emoji from a test database name (#179)

The character is too exotic and breaks TiDB and some old git.

* *: fix failpoint-ctl path, unify failpoint runtime and ctl to same version (#180)

* restore: fix the potential null pointer exception when logging progress (#178)

If no files are completely imported within the first 5 minutes, we get
`finished == 0` and the logger will try to log a nil field and crash.

* kv,restore: log which value caused conversion failure (#154)

* kv,restore: log row content and column info on failure

* restore: log a warning if a column is missing

* restore: retry if deliver KVs to importer (#176)

* config: automatically discover tidb.pd-addr and tidb.port if not provided (#173)

* config: automatically discover tidb.pd-addr and tidb.port if not provided

* config: error if port/pd-addr is still wrong after adjust

* config: use tidb/config.Config instead of a custom struct

* config: remove recognition of AdvertiseAddress

TiDB team says it is useless and TiDB won't work with port-forwarding.

* *: added linters (#183)

* Makefile: added `make check` to perform static linting

Moved the failpoint tool into `tools/bin` for consistency.

Renamed the phony targets `failpoint-{enable,disable}` to
`failpoint_{enable,disable}` for consistency.

Renamed the `install-failpoint` target to the real path name.

* *: fix gofmt suggestions

* go.mod,Makefile: unify the two failpoint versions

Added a script to ensure they never differ.

* Makefile: fixed the scope of golangci-lint

* common: improve unit test coverage of 'common' package (#186)

* common: improve unit test coverage of 'kv' package (#187)

* Make the parsers stricter, and improve unit test coverage of `mydump` package (#185)

* lightning: improve unit test coverage of 'lightning' package (#188)

* README: update coverage status badge (#189)

* config: improve unit test coverage of 'config' package (#192)

Signed-off-by: Lonng <[email protected]>

* Add unit tests for kv/importer and restore/checkpoints, plus some bug fixes (#191)

* kv: add unit test for importer (based on a mocked gRPC client)

* restore: remove the unnecessary SHOW CREATE TABLE calls

We can now reconstruct the table info directly from the HTTP reply, so the
SHOW CREATE TABLE results are now useless. Better drop them.

* restore: add unit test for tidb.go

* common: ensure sqlmock errors are not retryable

* restore: fix error where checkpoint status of index engine is not updated

Also, made the WholeTableEngineID constant public.

* restore: rename a confusing variable

* restore: fix bug where --checkpoint-error-destroy=all skips index engine

* restore: prevent NPE when getting missing table from file checkpoint

* restore: add unit tests for checkpoints

* kv: address importer comments

* kv: also exposes the mock Importer constructor to other tests

* Add unit tests for 'restore.go' (TableRestore and chunkRestore) (#193)

* go.mod: update dependencies (#197)

* config,lightning: Implements server mode  (#198)

* test: speed up TestGetJSON

Force a shorter timeout on the HTTP client, so that accessing
`http://not-exists` won't take 30 seconds.

* config,lightning: implement "server mode"

In "Server Mode" Lightning will wait for tasks submitted via the HTTP API
`POST /tasks`, and will keep running until Ctrl+C. Multiple tasks are
executed sequentially.

The config is split into "Global config" and "Task config", which shares
the same structure for compatibility and simplicity.

The pprof-port setting has been deprecated in favor of status-addr, for
compatibility with other tools.

* lightning,config: cover some of the new code

* lightning: added `GET /tasks` API to get number of queued tasks

* *: addressed comments

* config,lightning: use a linked hash map to store queued configs

Changed /task to return JSON.

This is to prepare for an API removing a queued task, and also to remove
the artificial task queue size limit.

* config: change TaskID to record the current timestamp

* go.mod: update dependencies (#200)

* Post restore config fix (#202)

* fix the mistake in lightning config file

* fix typo

* remove the "also" in description

* Introduce a basic web interface (#199)

* restore,checkpoints: move checkpoints into its own package

This allows both the "restore" package to import the "web" package, and
allow the "web" package to use "checkpoints", without leading to circular
dependency.

* verification: implemented json.Marshaler for KVChecksum

* *: expose the current import progress to HTTP interface

* common: added "Pauser" synchronization primitive

* lightning: allows status address to reliably use port 0 for testing

* config: ensure AllIDs() return a deterministic order

* lightning,restore: support pausing, moving and deleting tasks through HTTP

Also fixed some goroutine leaks and crashes after canceling.

* common: fixed the bug where checksum is not cancelable

* config: added configlist.{MoveToFront, MoveToBack}

* web,lightning: added a web interface

* web: explain the web interface

* web: added OpenAPI (Swagger) spec of the HTTP API

* common: avoid double-close a channel

The channel may be double-closed given this sequence:

0. [B] p.Pause()
1. [A] p.Wait(ctx), run until the select
2. [B] p.Resume(), run until the for loop
3. [C] cancel the ctx
4. [A] continue from select, and close the channel
5. [B] continue the for loop, using the old copy of waiters, it will close
       the channel again, causing double-close error.

We just avoid closing the waiter when ctx expired.

* common: added a test to check for contended pause/resume flip

* common: fixed a potential race condition

* verification: change JSON field of checksum from cksum to checksum

* web: document the OpenAPI def and why we don't support webpack-dev-server

Fixed a potential typing error (see TypeStrong/atom-typescript#1053).

* config: prevents task ID conflict which may happen with a coarse clock

* restore: prevent encodeLoop panicking if deliverResult is closed?

* checkpoints,lightning: address comments

* Update README.md (#207)

* Improve errors and logs on syntax error / conversion failure (#201)

* mydump: ensure syntax error won't log more than 256 bytes of content

* kv: log only the affected column on cast failure

Also annotate the returned error by the failed value

* restore: annotate encode failure with the current file being processed

* checkpoints: reduce log level of missing checkpoint file from warn to info

* tests,web: exclude these directories from the Go module (#209)

* lightning/restore: fix ColumnPermutation calculation (#210)

Signed-off-by: Lonng <[email protected]>

* build(deps): bump lodash from 4.17.11 to 4.17.14 in /web (#213)

* build(deps): bump lodash from 4.17.11 to 4.17.14 in /web

Bumps [lodash](https://github.com/lodash/lodash) from 4.17.11 to 4.17.14.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.11...4.17.14)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <[email protected]>

* web: execute `npm audit fix`

* web: update corresponding Go code

* *: adjust solution for TOOL-1420 and add a test case (#214)

Move the ToLower operation from (*TableRestore).initializeColumns() to
(*ChunkParser).ReadRow(). In fact this is the same as the CSV parser.

* tests: add test case for simple partitioned tables (#206)

* checkpoints: remove node_id field and rename the schema on keep-after-success (#208)

* lightning: reduce the chance of spurious error in web server

* checkpoints: remove node_id

We intend to separate checkpoints from multiple nodes into different
schemas instead.

Since one node now owns the entire schema, when removing all checkpoints
we just drop the entire schema. We also changed the file driver to delete
the file instead of just emptying the content.

* lightning: set the task ID even outside server mode

* checkpoints,restore: move the checkpoints database on keep-after-success

The schema is renamed as `*.{taskID}.bak`.

* checkpoints: addressed comments

* config: attempt to solve TOOL-1405 and modify old test cases (#217)

* remove outdated target of Makefile

* check unused toml keys

* remove outdated toml config keys that block

* config: improve readability from review suggestions

* config: improve readability

* Update lightning/config/config.go

Co-Authored-By: kennytm <[email protected]>

* restore: fix gc life time not recovered after table restore (#218)

* restore: fix gc life time not recovered after table restore

* empty commit to refresh cla

* address comment

* address comment

* address comment

* fix ci

* *: abstract the Importer communication into an interface (#215)

* common: allow retry on ErrWriteConflictInTiDB (mysql error 8005)

* *: perform SwitchMode and Compact directly through ImportSSTService

This allows us to perform these actions without relying on Importer.

* *: abstracted *kv.Importer into kv.Backend

* common,kv: fix comments

* kv,restore: addressed comments

* restore: fix comments

* restore: update and restore GCLifeTime once when parallel (#220)

* restore: update and restore GCLifeTime once when parallel

* Update lightning/restore/restore.go

Co-Authored-By: amyangfei <[email protected]>

* Update lightning/restore/restore.go

Co-Authored-By: amyangfei <[email protected]>

* Update lightning/restore/restore.go

Co-Authored-By: amyangfei <[email protected]>

* restore: fix bug in reviewing

* restore: call ObtainGCLifeTime closer to use

* restore: improve readability

* restore: reduce struct copy and pointer deref

* restore: fix bug in reviewing

* Update lightning/restore/restore.go

Co-Authored-By: kennytm <[email protected]>

* restore: improve readability

* restore: refine mock expectation order

* restore: adjust detail

* *: support MySQL backend (#221)

* *: support MySQL backend

* config: use a constant for backend and checkpoint.driver

* kv: address comments

* backend: rename `kv` package to `backend` package

* config: always skip the system databases (#225)

* backend: update mysql backend to tidb backend (#228)

* update mysql backend to tidb backend

* fix test

* lightning/common: add unit test (#229)

* add unit test

* add unit test

* Update lightning/common/util_test.go

Co-Authored-By: kennytm <[email protected]>

* Update lightning/common/util_test.go

Co-Authored-By: kennytm <[email protected]>

* mock: update rpc endpoint (#226)

* cmd: do not exit(1) if failed to sync log (#230)

* mock: update rpc endpoint (#234)

* backend: dynamically calculate the maximum auto-inc ID (#227)

This makes sure the final AUTO_INCREMENT value is exactly the maximum value
of the auto-inc column. Fix #222.

* checkpoint: fix empty map become nil after unmarshall (#237)

* checkpoint: fix empty map become nil after unmarshall

* checkpoint: unify code style

* checkpoint: remove hard-coded path

* config: increase default concurrency (#244)

* backend/tidb: use REPLACE INTO or INSERT IGNORE INTO to provide idempotent insertion (#243)

* *: use fixed timestamp to ensure import stability wrt CURRENT_TIMESTAMP (#235)

* *: update dependencies (tidb -> 3.0.4) (#246)

* improve the log when encountering invalid checkpoint (#247)

* restore: improve the log when encountering invalid checkpoint

* config: fix typo in CLI

* config: document the new `[tikv-importer] on-duplicate` setting

* config: adding `[tidb] max-allowed-packet` config (#248)

this allows sending rows > 4 MiB when using TiDB backend

* Fix +incompatible suffix not allowed error reported by go mod (#249)

* allow use a separate pauser instead of global pauser (#251)

* Add password as command line argument (#253)

* Add password as command line argument

* Adds -tidb-pwd to config tests

* Changes tidb-pwd to tidb-password cli flag

* metrics: copy the grafana board into this repo (#256)

* metrics: copy the grafana board into this repo

* metrics: add the alert rule too

* *: Update dependencies and fix unit test on Windows (#254)

* go.mod: update Go dependencies

* *: changes code so unit test can be run on Windows

mainly replaces `path.Join` with `filepath.Join`

* web: update web interface dependencies

* tests/exotic_filenames: generate the directory content at runtime

this should make git behave better on windows

* config: synchronize actual default value with the toml file (#255)

* config: synchronize actual default value with the toml file

* config: expose --check-requirements in command line

* lightning: ensure the web interface still works outside server mode (#259)

* Upgrade TiDB to 4.0.0-beta, and recognize @@tidb_row_format_version = 2 (#268)

* *: remove the deprecated kvencoder.KvPair

Create our own struct at `common` package and use those.

* go.mod: update dependencies (TiDB -> 4.0.0-beta)

* backend: support using row format v2

* backend: still needs to implement (*transaction).Get()

* Support TLS; Reduce the need of config.toml in integration tests (#270)

* *: go fmt

* *: support TLS

* tests: enable TLS for all components in the integration test

* tests: specify TLS and most default arguments via command line

refactored the tests so only essential settings remained in config.toml

* config: the default csv.null should be a capital \N not small \n

* security: clone the http.DefaultTransport rather than shallow copy

* tests: break PD retry loop

* backend: fix unit test failure

* tests: replace curl by wget

The `curl` on CI is too old to handle ECC keys. But `wget` somehow works.

* tests: fix test failure

* *: fix comments

* backend: define a reusable BufStore when creating a new session (#274)

* Update Chinese doc url (#276)

* Update chinese doc url

Update chinese doc url

* update English doc link

Co-Authored-By: kennytm <[email protected]>

Co-authored-by: kennytm <[email protected]>

* Set session var for every new conn (#280)

Note that the previous `setSessionConcurrencyVars` sets the variables on only one
connection from the db connection pool, so we can't be sure they take effect on
sessions obtained later through the `sql.DB`.

* lightning: split large csv file if possible (#272)

* lightning: split large csv file if possible

* gofmt

* gofmt

* unit test

* add unit test

* tiny change

* tiny refine

* fix ci

* remove useless code

* fix ci

* fix ci

* address comments

* go fmt for all

* address comment

* correct the estimateChunkCount

Co-authored-by: kennytm <[email protected]>

* send a batch of kv in encodeLoop (#279)

* send a batch of kv in encodeLoop

* refine test

* address comment

* Fix wrongly pass nil columnNames.

The column names should be fetched after `parser.ReadRow()`, because they are
set while the parser reads the row.

* Update lightning/restore/restore.go

Co-Authored-By: kennytm <[email protected]>

Co-authored-by: kennytm <[email protected]>
Co-authored-by: Jiahao Huang <[email protected]>

* Replace CSV Ragel parser by a hand-written parser copied from encoding/csv (#275)

The Ragel-based CSV parser is much slower than the standard encoding/csv parser.

As explained in #111, the parser used Ragel to reuse the existing framework for simplicity. However, as more and more customers start to use Lightning with CSV input, it is time to optimize it.

The new parser was inspired by encoding/csv but almost nothing remained except the recordBuffer/fieldIndexes members to reduce the amount of allocations. We cannot reuse encoding/csv because:

- encoding/csv does not allow us to track the current read position
- encoding/csv does not recognize backslash-escaped fields (required for MySQL-generated CSV)
- encoding/csv does not support disabling quoting

Implementing these makes the new parser still slower than encoding/csv, but much faster than the original one.

| Parser | TPC-C "CUSTOMER" Row |
|---|---|
| encoding/csv | 1898 ns/op |
| Original parser | 6930 ns/op |
| New parser | 2426 ns/op |

So the new parser takes 35% of the time of the original parser, and encoding/csv takes about 80% of the time of the new parser. We can investigate how to squeeze out the remaining 20% later.
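
The two percentages can be rechecked from the benchmark figures quoted above. The snippet below is only a worked calculation; `percentOf` is a hypothetical helper. It gives roughly 35% for new versus original, and roughly 78% for encoding/csv versus new, which the text rounds to 80%.

```go
package main

import "fmt"

// percentOf returns a as a percentage of b.
func percentOf(a, b float64) float64 {
	return a / b * 100
}

func main() {
	// ns/op figures taken from the benchmark listed in the commit message.
	const encodingCSV, original, newParser = 1898.0, 6930.0, 2426.0

	fmt.Printf("new parser vs original: %.1f%%\n", percentOf(newParser, original))
	fmt.Printf("encoding/csv vs new parser: %.1f%%\n", percentOf(encodingCSV, newParser))
}
```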

* backend: fix issue 282 (#283)

* backend: fix all wrong escape of '\Z' as '\x26' (which is '&')

* tests: try to workaround spurious failure of check_requirements

* tests: make the lightning exit detection more precise

* tests: fix test

* optimize the performance of lightning (#281)

* lightning: split large csv file if possible

* gofmt

* gofmt

* unit test

* add unit test

* tiny change

* tiny refine

* fix ci

* remove useless code

* fix ci

* fix ci

* address comments

* go fmt for all

* Replace CSV Ragel parser by a hand-written parser copied from encoding/csv

Conflicts:
	lightning/mydump/csv_parser_generated.go

* fix conflict

* update

* update again

* send a batch of kv in encodeLoop

* use sync.Pool

* Close channel instead of push one entry.

* Use copy instead append

* Fix test and failpoint version

* Reuse slice of record

This is expected to avoid about 3.5% of alloc_objects
alloc_objects:
  Total:   773496750  773873722 (flat, cum)  7.18%
    177            .          .           	parser.fieldIndexes = parser.fieldIndexes[:0]
    178            .          .
    179            .          .           	isEmptyLine := true
...
    225    386621314  386621314           	str := string(parser.recordBuffer) // Convert to string once to batch allocations
    226    386875436  386875436           	dst := make([]string, len(parser.fieldIndexes))

* Use pool for mutation

This takes most of the allocations in WriteRows:
    ROUTINE ======================== github.com/pingcap/tidb-lightning/lightning/backend.(*importer).WriteRows in /Users/huangjiahao/go/src/github.com/pingcap/tidb-lightning/lightning/backend/importer.go
     797370418  980241246 (flat, cum)  9.09% of Total
             .          .    155:   kvs := rows.(kvPairs)
    ...
    ...
             .          .    192:   for i, pair := range kvs {
     772641868  772641868    193:           mutations[i] = &kv.Mutation{
             .          .    194:                   Op:    kv.Mutation_Put,
             .          .    195:                   Key:   pair.Key,
             .          .    196:                   Value: pair.Val,
             .          .    197:           }
             .          .    198:   }
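
The profile above shows one heap allocation per `kv.Mutation`. A pooling approach can be sketched with `sync.Pool` as below. This is a minimal sketch under assumptions: `mutation` is a hypothetical stand-in for `kv.Mutation`, and the helper names are not taken from the Lightning source.

```go
package main

import (
	"fmt"
	"sync"
)

// mutation is a hypothetical stand-in for kv.Mutation.
type mutation struct {
	Key   []byte
	Value []byte
}

// mutationPool hands out reusable mutation objects instead of allocating
// a fresh one for every key-value pair.
var mutationPool = sync.Pool{
	New: func() interface{} { return new(mutation) },
}

// getMutation takes an object from the pool and fills it in.
func getMutation(key, value []byte) *mutation {
	m := mutationPool.Get().(*mutation)
	m.Key, m.Value = key, value
	return m
}

// putMutation clears the references so the pool does not pin the row data,
// then returns the object for reuse.
func putMutation(m *mutation) {
	m.Key, m.Value = nil, nil
	mutationPool.Put(m)
}

func main() {
	m := getMutation([]byte("k"), []byte("v"))
	fmt.Printf("%s=%s\n", m.Key, m.Value)
	putMutation(m) // only after the write request has been sent
}
```

Objects must be returned to the pool only once nothing else refers to them, for example after the batch containing them has been written.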

* Set GC percent as 500 default

Lightning allocates too many transient objects and heap size is small,
so garbage collections happen too frequently and lots of time is spent in GC component.

In a test loading the table `order_line.csv` of 14k TPCC, the time needed for the
`encode kv data and write` step dropped from 52m4s to 37m30s when GOGC was changed
from 100 to 500, and the total time dropped by nearly 15m too.
The cost is that Lightning's memory use at runtime grows from about 200M to 700M, which is acceptable.

So we set the GC percentage to 500 by default, instead of 100, to reduce the GC frequency.

* Remove MaxKVPairs in Mydump

It has been moved to the Importer part

* Remove outdated code

* Update tidb version

For https://github.com/pingcap/tidb/commit/495f8b74382fb31924b4948374c0dbba6f2d87cd
disable UpdateDeltaForTable if TxnCtx is nil

* Address comment

* Remain append

Co-authored-by: xuhuaiyu <[email protected]>

* Some SwitchMode improvements (#287)

* backend: ignore Unimplemented error in SwitchMode and Compact

* tidb-lightning-ctl: added --fetch-mode subcommand

* go.mod,web: update dependencies (#289)

* go.mod,web: update dependencies

* *: fix unit test failure

* Support store version format generated by `git describe --tags` (#295)

* Support store version format: git describe --tags

Signed-off-by: Tong Zhigao <[email protected]>

* add tests

Signed-off-by: Tong Zhigao <[email protected]>

* Warn for single large file, change switch mode log level to info (#315)

* warn single large file, change switch mode log level to info

* address comment

* restore: fix typo (#304)

* print lightning log to local file (#313)

* save logs in local file, print only necessary info

* split error stack info and err info in two lines

* *: avoid accessing internal ports when backend=tidb (#312)

* config: remove strict mode from default SQL mode (#316)

* config: remove strict mode from default SQL mode

* tests: fixed typo

* check table id when loading checkpoint (#317)

* add tableID checkpoint check

* update tidb-tools to latest (#319)

* support alter random && update tidb dependency to latest (#324)

* update tidb

* rebase auto random column

* backend: add local kv storage backend to get rid of importer (#326)

* backend: add local backend, try to move importer's sort kv and split & scatter into lightning

* address comment

* use sync.Map

* write to every peer

* add retry on write and ingest

* fix split region

* update leveldb config

* use badger

* use pebble

* update pebble config

* restrict concurrency

* use workerPool to restrict concurrency for all engines

* make range concurrency as config

* flush db at close engine

* wip: use sstable to split range

* fast split ranges by encode key and file size

* fix split bug

* use bigEndian to split range

* fix split region bug & ingest lost write meta bug

* add send-kv-pairs config

* fix duplicate write error

* fix checksum in small table

* use go routine to write tikv and ingest

* fix

* update sort

* update sort

* fix

* fix

* fix retry

* update

* fix concurrency bug

* fix retry bug

* fix iter.Next

* try fix checksum mismatch

* fix deadlock

* update

* fix

* optimize memory usage

* add checkpoint for local mode

* fix checkpoint

* do not write chunk checkpoint in local mode

* fix remote checkpoint

* fix checkpoint

* manually destroy checkpoint

* only flush index if checkpoint is on

* remove some useless code

* format code

* fix unit test

* fix test

* fix

* fix tls

* fix review comment

* add c comment for local.Close

* add some comment

* fix local backend checkpoint

* fix unit test

* refine some test with local backend

* address comment

* fix close engine

* checkpoint integration_test for local backend

* try fix

* return nil if engine not exist in CloseEngine and ImportEngine

* address comment

* test localbackend checkpoint

* adjust config to save coverage

* fix review comments

* change ParseIndexKey method

* remove saveCpChan  channel buf

* add test to save coverage

* test ingest failed

* inject failpoint before dataengine importer

* fix review comments

* fix test format

Co-authored-by: luancheng <[email protected]>

* make lightning compatible with allow_auto_random_explicit_insert (#328)

* fix system error when tidb not support allow_auto_random_explicit_insert

* address comment

* config: update example config file (#331)

* update example config

* fix comment

Co-authored-by: kennytm <[email protected]>

* Fix test cases on release-3.0 (#330)

* tests: allow running a single test

* tests: record the cluster version

* tests: skip 'local' backend tests if cluster is below v4.0.0

* Jenkinsfile: move the CI script into git

* Jenkinsfile: run integration tests in parallel

* tests: allow-auto-random is no longer experimental

* tests: skip local backend better

* Jenkinsfile: moved the file elsewhere

* optimize parse csv and local backend write tikv (#334)

* optimize parse csv and local backend write tikv

* fix

* remove IndexAnyUtf8 because the implementation is buggy for csv parser

* update pebble and options

* fix checkpoint cleanup (#336)

* lightning: fix web page not showing when not using server mode (#337)

Co-authored-by: Ian <[email protected]>

* config,mydumper: replace black-white-list by table-filter (#332)

* optimize encoder and adjust some config (#338)

* optimize local backend

* update some config

* update batch size

* fix type in config.toml and tidy go.mod

* update tools failpoint

* fix session

* update test config

* reset default batch size for not-local backend

* reset batch-size for example toml

* log: fix log file path (#345)

* fix log path

* support special log path '-' for stdout

* fix local backend index split range (#347)

* add log for environment http proxy setting (#340)

* add log for environment http proxy setting

* fix comments

Co-authored-by: kennytm <[email protected]>

* do not always change auto increment id (#348)

* server: check open file ulimit for local backend (#343)

* check open file ulimit for local backend

* fix comment and add a test

* fix tests

* remove useless comments

* fix

Co-authored-by: Neil Shen <[email protected]>

* restore: do not rebase auto-id or generate auto row id for table with common handle (#349)

* update for common handle

* update

* remove parentless

* Fix verbose log message for shell (#352)

* local: fix batch split retry always failed error (#356)

* fix split

* only retry failed keys

Co-authored-by: 3pointer <[email protected]>

* backend: fix handling of empty binary literals (#357)

Co-authored-by: glorv <[email protected]>
Co-authored-by: Ian <[email protected]>

* add log when execute statement failed (#359)

* check checkpoint schema (#354)

Co-authored-by: kennytm <[email protected]>

* parser: fix csv parse header with empty line (#364)

* fix csv header

* fix infinite loop

* add a test

* restore: fix missing column infos when restore from checkpoint (#362)

* fix missing column infos when restore from checkpoint

* add a test

* fix

* fix

* add failpoint for minDeliveryBytes

* fix type

* fix

* update

* save column permutation when update chunk checkpoint

* fix test

* fix review comments

* update

* local: fix import with common handle (#367)

* fix common handle

* update

* update tidb

* fix test

* fix test

* add new line

* enable cluster index for test common handle

* reset global variable after common handle test

* add check version for common handle test

* restore: don't switch mode in tidb backend (#368)

* avoid run switch mode in tidb backend

* fix comments

* restore: support split csv source file with header (#363)

* restore: support file level routing (#366)

* support file level routings

* deprecate table route

* add router

* set default config

* update default regex

* set default rule enable if no rule is set and update example config file

* remote compression test

* fix parse compression

* update example config

* save file meta to checkpoint

* remove size from checkpoint

* sort source data files by sort key

* fix checkpoint

* use fileInfo instead of SourceFileMeta

* replace file routing in chunk restore by file meta

* remove useless fileRegions

* revert some test package

* update import

* revert changes in loader_test

* update imports

* add some comments and change type name

* fix field extractor to match go regex definition and add some tests

* fix test router pattern

* simplify pattern check and support files.path

* resolve comments

* return error if applied error is invalid

* remove useless code and update import

* tidb_tools: update dependency (#371)

Co-authored-by: glorv <[email protected]>

* restore: check header columns (#372)

* check csv header columns

* resolve comments

* fix test

* fix unit test

* web: update dependencies (#374)

* backend: update committs from unix timestamp to pd tso (#379)

* update pd dependencies (#380)

* local: return error if write to tikv returns no leader info (#381)

Co-authored-by: kennytm <[email protected]>

* encoder: check string value for tidb encoder (#378)

* valid string value for tidb encoder

* don't check valid utf8 for blob types

* check string according to charset

* fix test and remove useless print

Co-authored-by: kennytm <[email protected]>

* Fix running unit tests on Windows (#375)

* backend: make the rlimit check unix-only

* mydump: always use '/' as file path separator

Co-authored-by: 3pointer <[email protected]>

* checkpoint: verify checkpoint when resume from checkpoint (#376)

* test: change integration test script to allow run tests in parallel (#382)

* backend: split and ingest region size more precise (#369)

* wait checkpoint finished if exit before success (#386)

* restore: support restore from s3  (#361)

* support s3 storage

* make filepath compatible with file url

* update br

* update transaction

* update

* update br

* upda

* fix tests

* fix tests

* fix unit tests

* update

* adjust test config

* revert some redundant changes

* fix

* add a s3 integration test

* fix tests

* resolve comments

* fix unit test

* use c.Mkdir instead of os.TempDir for unit test

* update br

* backend: fix sample when split region size is small (#387)

* fix sample for small value

* fix local sample

* disable cluster index

* add a sleep after set global variables

* decrease SlowDownImport sleep time

* fix test

* backend: use peer address as grpc addr for tiflash store (#392)

* fix tiflash and add a test

* update

* use store.PeerAddress for tiflash

* wait for tiflash longer

* fix test

* update tests README and set longer wait time for tiflash replica

* loader: fix store.WalkDir return inaccurate file size for soft link source files (#394)

* update br and fix move checkpoints

* add integration test for soft link source file

* fix test

* fix test

* use default config

* fix unit test

* fix chunk checkpoint may reset offset and row id (#395)

* make tiflash test more stable (#397)

* test: fix integration test for 3.x version (#390)

* skip run with local backend for v3.x

* add cluster index test for all 3 backends

* fix

* add check kv pairs count for common handle test

* fix test checkpoint_error_destroy

* fix common handle test

* remove useless comments

* fix common handle test

* fix test

* test: make start tiflash optional in integration test (#398)

* make start tiflash optional for v3.0.0 cluster

* fix local backend

* restore: support restore apache parquet format source files (#373)

* backend: fix load partition table with local backend (#402)

* fix load partition table with local backend

* fix log and typo

* fix comment

* fix integration test

* fix parenthesis

* lightning: support dynamically modifying the log level (#393)

change the log level through the HTTP API

Co-authored-by: 山岚 <[email protected]>

* Check Lightning version when reusing checkpoint (#383)

* checkpoints: check lightning version too

* mydump: decrease the log level of file-route related logs

* support new collation for kv encoder (#407)

* support new collation

* add unit test

* remove useless code

* fix multi task

* add a comment

* add an integration test for new collation

* fix test

* resolve comments

* mydump: support multi bytes csv delimiter and separator (#406)

* more flexible csv

* fix config and add unit test

* remove useless code

* fix unit test

* use empty string for default quote

* update comments in tidb-lightning.toml for separator and delimiter

Co-authored-by: kennytm <[email protected]>

* backend: always retry ingest and get region if it's retryable (#405)

* retry get region if no region leader available

* always retry if get region return nil

* always retry if get region returns nil

Co-authored-by: 山岚 <[email protected]>
Co-authored-by: kennytm <[email protected]>

* Add license scan report and status (#399)

Signed off by: fossabot <[email protected]>

Co-authored-by: glorv <[email protected]>
Co-authored-by: kennytm <[email protected]>

* mydump: fix infinite loop in ExportStatement when Read() returns non-EOF (#414)

* lightning: start the HTTP server when receiving SIGUSR1 (#415)

Co-authored-by: glorv <[email protected]>

* backend/tidb: fix issue 410 (#412)

* config: fix error on `-d 'C:\Windows\Path'` (#411)

* local: fix infinite loop in retry get region (#418)

* fix infinite loop in retry get region

* update br and parquet-go to fix #416

* update failpoint

* update mock ExternalStorage

* backend: speed up uploading by open multi TCP connections (#400)

* backend: use uncached gRPC channels

* backend: use connect pool of gRPCs.

* backend: add a conns pool to local backend

* backend: remove some unused logs

* local: make connpool private

* local: make init mutex...

* local: address comments

* post-restore: add optional level for post-restore operations (#421)

* add optional level for post-restore operations

* trim leading and suffix '"

* use UnmarshalTOML to unmarshal post restore op level

* resolve comments and fix unit test

* backend/local: do not retry epochNotMatch error when ingest sst (#419)

* do not retry epochNotMatch error when ingest sst

* add retry ingest for 'Raft raft: proposal dropped' error in ingest

* change some retryable error log level from Error to Warn

* fix nextKey

* add a comment for nextKey

* fix comment and add a unit test

* wrap time.Sleep in select

Co-authored-by: kennytm <[email protected]>

* restore: disable some pd scheduler during restore (#408)

* disable some pd scheduler during restore

* fix br interface

* update failpoint for tools

* resolve comments

* update br

* fix context cancel cause restore scheduler failed error

* update br

* log: simplify some warn log and do retry write for epoch not match error (#425)

* simplify some warn log

* fix retry

* fix comment

* remove useless log

* fix test (#426)

* backend: fix a bug about wrong column info (#420)

* fix a bug about wrong column info

* add test

Co-authored-by: kennytm <[email protected]>

* restore: better estimate task remain time progress log (#377)

* optimize progress

* update

Co-authored-by: 3pointer <[email protected]>
Co-authored-by: kennytm <[email protected]>

* checksum: use gc ttl api for checksum gc safepoint in v4.0 cluster (#396)

* use gc ttl for checksum

* fix snapshot ts

* resolve a comment

* fix unit test

* resolve comments

* add a unit test

* split checksum into a separated file and fix comment

* update

* backend/tidb: add rebase auto id for tidb backend (#428)

* add rebase autoid for tidb backend

* add fetch auto id and a unit test

* avoiding create checksum manager for tidb backend

* fix unit test

* reset the change auto id code since we can depend on the logic in tidb side

* also rebase auto random id

* fix auto random

* fix sql

* fix sql

* don't disable pd schedulers for import backend

* simplify the codes

* fix test

* fix test

* update mock

* fix autoid for v4.0.0 (#430)

* make: hide go.mod to resolve cyclic dependency with tidb (#439)

* config: support encode PostOpLevel and Duration as input (#441)

* config: support unmarshall number as PostOpLevel

* implement MarshalText instead

* Update lightning/config/config_test.go

Co-authored-by: kennytm <[email protected]>

* address comment

Co-authored-by: kennytm <[email protected]>

* restore: fix several bugs related to column permutations (#437)

* fix multi error in column permutation

* add unit test

* add an integration test

* rename test db

* change log

* dep: update uuid dependency to latest google/uuid (#452)

* dep: update satori/go.uuid to latest

* fix tests

* change to google/uuid

* fix build

* try fix test

* get familiar with google/uuid

* address comment

* tidb-lightning-ctl: change default of -d to 'noop://' (#453)

also add noop:// to supported storage types (to represent an empty store)

* restore: fix the bug that gc life time ttl does not take effect (#448)

* fix gc ttl loop

* resolve comment and add tests

* config: filter out all system schemas by default (#459)

* backend: fix auto random default value for primary key (#457)

* fix auto generate auto random primary key column

* fix default for auto random primary key

* fix test

* use prev row id for auto random and add a test

* replace chunk with session opt

* fix

* fix

* mydumper: fix parquet data parser (#435)

* fix parquet

* reorder imports

* fix test

* use empty collation

* fix an error and add more test cases

* add pointer type tests

* resolve comments

Co-authored-by: kennytm <[email protected]>

* backend/local: use range properties to optimize region range estimate (#422)

* use range properties to estimate region range

* post-restore: add optional level for post-restore operations (#421)

* add optional level for post-restore operations

* trim leading and suffix '"

* use UnmarshalTOML to unmarshal post restore op level

* resolve comments and fix unit test

* backend/local: do not retry epochNotMatch error when ingest sst (#419)

* do not retry epochNotMatch error when ingest sst

* add retry ingest for 'Raft raft: proposal dropped' error in ingest

* change some retryable error log level from Error to Warn

* fix nextKey

* add a comment for nextKey

* fix comment and add a unit test

* wrap time.Sleep in select

Co-authored-by: kennytm <[email protected]>

* update

* use range properties to optimize region range estimate

* update pebble

* change the default value for batch-size

* add unit tests and resolve comments

* add a comment to range properties test

* add a comment

* add a test for range property with pebble

* rename const variable

Co-authored-by: kennytm <[email protected]>

* fix pd service id is empty (#460)

* fix s3 parquet reader (#461)

Co-authored-by: Neil Shen <[email protected]>

* fix service gc ttl again (#465)

* mydumper: verify file routing config (#470)

* fix file routing

* remove useless line

* remove redundant if check

* config: allow four byte-size config to be specified using human-readable units ("100 GiB") (#471)

* Makefile: add `make finish-prepare` action

* config: accept human-readable size for most byte-related config

e.g. allow `region-split-size = '96M'` in addition to `= 100663296`

(known issue: these values' precisions will be truncated to 53 bits
instead of supporting all 63 bits)
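
To make the two notes above concrete, here is a minimal Go sketch. `parseByteSize` and `viaFloat64` are hypothetical helpers written for illustration, not Lightning's implementation; the parser only handles single-letter binary suffixes (K, M, G, T), not forms such as "100 GiB". The 53-bit figure matches the integer precision of a `float64`, so the second helper shows what a round trip through that type does; that this is the cause of the known issue is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseByteSize is a hypothetical helper. It accepts a plain integer
// ("100663296") or an integer followed by a binary suffix K, M, G or T ("96M").
func parseByteSize(s string) (int64, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return 0, fmt.Errorf("empty size")
	}
	shift := uint(0)
	switch s[len(s)-1] {
	case 'K':
		shift = 10
	case 'M':
		shift = 20
	case 'G':
		shift = 30
	case 'T':
		shift = 40
	}
	if shift != 0 {
		s = strings.TrimSpace(s[:len(s)-1]) // drop the unit suffix
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, err
	}
	limit := int64(1<<63-1) >> shift // largest n that does not overflow
	if n < 0 || n > limit {
		return 0, fmt.Errorf("size out of range: %d", n)
	}
	return n << shift, nil
}

// viaFloat64 round-trips n through a float64, which keeps only 53
// significant bits of an integer.
func viaFloat64(n int64) int64 {
	return int64(float64(n))
}

func main() {
	a, _ := parseByteSize("96M")
	b, _ := parseByteSize("100663296")
	fmt.Println(a == b) // true: 96 * 1024 * 1024 = 100663296

	big := int64(1)<<53 + 1
	fmt.Println(viaFloat64(big) == big) // false: the low bit is lost
}
```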

* restore: reduce chance of spurious errors from TestGcTTLManagerSingle

Co-authored-by: glorv <[email protected]>

* test: change double type syntax (#474)

* restore: add `glue.Glue` interface and other function (#456)

* save my work

* add notes

* save work

* save work

* fix unit test

* remove tidbMgr in RestoreController

* remove some comments

* remove some comments

* change logger in SQLWithRetry

* revert replace log.Logger to *zap.Logger

* replace tab to space

* try another port to fix CI

* remove some comment

* *: more glue

* report info to host TiDB

* fix CI

* address comment

* address comment

* rename a method in interface

* save work

* try fix CI

* could work

* change ctx usage

* try fix CI

* try fix CI

* refine function interface

* refine some function interface

* debug CI

* address comment

* remove debug log

* address comment

* glue: add GlueCheckpointDB and remove external TiDB usage (#478)

* save my work

add notes

save work

save work

fix unit test

remove tidbMgr in RestoreController

remove some comments

remove some comments

change logger in SQLWithRetry

revert replace log.Logger to *zap.Logger

dep: update uuid dependency to latest google/uuid (#452)

* dep: update satori/go.uuid to latest

* fix tests

* change to google/uuid

* fix build

* try fix test

* get familiar with google/uuid

* address comment

tidb-lightning-ctl: change default of -d to 'noop://' (#453)

also add noop:// to supported storage types (to represent an empty store)

replace tab to space

try another port to fix CI

remove some comment

*: more glue

restore: fix the bug that gc life time ttl does not take effect (#448)

* fix gc ttl loop

* resolve comment and add tests

fix CI

report info to host TiDB

config: filter out all system schemas by default (#459)

backend: fix auto random default value for primary key (#457)

* fix auto generate auto random primary key column

* fix default for auto random primary key

* fix test

* use prev row id for auto random and add a test

* replace chunk with session opt

* fix

* fix

mydumper: fix parquet data parser (#435)

* fix parquet

* reorder imports

* fix test

* use empty collation

* fix an error and add more test cases

* add pointer type tests

* resolve comments

Co-authored-by: kennytm <[email protected]>

address comment

backend/local: use range properties to optimize region range estimate (#422)

* use range properties to estimate region range

* post-restore: add optional level for post-restore operations (#421)

* add optional level for post-restore operations

* trim leading and suffix '"

* use UnmarshalTOML to unmarshal post restore op level

* resolve comments and fix unit test

* backend/local: do not retry epochNotMatch error when ingest sst (#419)

* do not retry epochNotMatch error when ingest sst

* add retry ingest for 'Raft raft: proposal dropped' error in ingest

* change some retryable error log level from Error to Warn

* fix nextKey

* add a comment for nextKey

* fix comment and add a unit test

* wrap time.Sleep in select

Co-authored-by: kennytm <[email protected]>

* update

* use range properties to optimize region range estimate

* update pebble

* change the default value for batch-size

* add unit tests and resolve comments

* add a comment to range properties test

* add a comment

* add a test for range property with pebble

* rename const variable

Co-authored-by: kennytm <[email protected]>

fix pd service id is empty (#460)

fix s3 parquet reader (#461)

Co-authored-by: Neil Shen <[email protected]>

fix service gc ttl again (#465)

address comment

mydumper: verify file routing config (#470)

* fix file routing

* remove useless line

* remove redundant if check

rename a method in interface

save work

try fix CI

could work

change ctx usage

try fix CI

try fix CI

refine function interface

refine some function interface

debug CI

address comment

config: allow four byte-size config to be specified using human-readable units ("100 GiB") (#471)

* Makefile: add `make finish-prepare` action

* config: accept human-readable size for most byte-related config

e.g. allow `region-split-size = '96M'` in addition to `= 100663296`

(known issue: these values' precisions will be truncated to 53 bits
instead of supporting all 63 bits)

* restore: reduce chance of spurious errors from TestGcTTLManagerSingle

Co-authored-by: glorv <[email protected]>

remove debug log

test: change double type syntax (#474)

address comment

checkpoint: add glue checkpoint

resolve cycle import

expose Retry

refine

change interface to cope with TiDB

fix SQL string

fix SQL

adjust interface to embedded in TiDB

could import now

reduce TLS

restore: add `glue.Glue` interface and other function (#456)

* save my work

* add notes

* save work

* save work

* fix unit test

* remove tidbMgr in RestoreController

* remove some comments

* remove some comments

* change logger in SQLWithRetry

* revert replace log.Logger to *zap.Logger

* replace tab to space

* try another port to fix CI

* remove some comment

* *: more glue

* report info to host TiDB

* fix CI

* address comment

* address comment

* rename a method in interface

* save work

* try fix CI

* could work

* change ctx usage

* try fix CI

* try fix CI

* refine function interface

* refine some function interface

* debug CI

* address comment

* remove debug log

* address comment

modify code

add comment

refine some code

* address comment

* add some comments

* fix CI and change CREATE TABLE

* *: replace context.Backend with app context (#468)

* replace context.Backend with app context

* remove tls.GetJSON

* rename function name

* fix test

Co-authored-by: lance6716 <[email protected]>

* restore: wait sub task finish before exit (#485)

* wait sub task finish before exit

* add a comment

* mydumper: convert parquet columns to lower case (#479)

* convert parquet columns to lower case

* simplify the for loop

Co-authored-by: lance6716 <[email protected]>

* test: fix an unstable integration test (#492)

* mydumper: optimize parquet reader performance (#482)

* get parquet row count faster

* add a log

* convert parquet columns to lower case

* make calculate file regions in parallel

* load full content to memory for small files

* fix

* fix

* add file size in file meta

* update comment

* update

* fix dead lock when met error

* rename all Size in chunk checkpoint to FileSize

* reset

* restore: let the tikv checksum manager respect the DistSQLScanConcurrency (#483)

Co-authored-by: 3pointer <[email protected]>

* support restore view (#417)

* support restore view

* make router compatible

* fix

* don't generate test files in run.sh

* execute multi create table stmt in serial

* fix unit test

* fix test

* resolve comments

Co-authored-by: 3pointer <[email protected]>

* backend/local: more robust range retry strategy (#476)

* retry write&ingest range

* more robust retry ranges

* fix

* simplify code and add a log

Co-authored-by: lance6716 <[email protected]>
Co-authored-by: kennytm <[email protected]>

* backend/local: batch split region with batch limit (#487)

* kill tiflash by name (#499)

* .github: let challenge-bot recognize the default SIG (#498)

Co-authored-by: lance6716 <[email protected]>

* backend: support stored generated columns in local/importer backends (#505)

* tests: fixed an invalid failpoint

* backend: support stored generated expressions

* tests: add generated columns test case

* backend/local: check and return iter.Error when pebble is not valid (#497)

* check and return iter.Error when pebble is not valid

* replace errors.Annotatef with errors.Annotate

* fix

* add error check for iter.Last()

Co-authored-by: lance6716 <[email protected]>

* backend,restore: duplicate more important system variables from downstream (#508)

* backend,restore: duplicate more important system variables from downstream

* backend: fix test failure

* tests: enhance the gencol test case to include sysvar-dep exprs

* tests/auto_random_default: relax the check

* tests/generated_columns: add a retry loop to ensure sys vars are changed

* backend: fix invalid gencol sort algorithm

* tests/generated_columns: disable the week test since tidb is buggy

Co-authored-by: glorv <[email protected]>

* backend/local: set pebble db max file limit (#501)

* set pebble db max file limit

* fix test

* fix imports

* resolve review comments

* add GetSystemRLimit for windows

* mydump: fix issue 519 (#521)

* backend/local: remove useless and buggy truncate key (#516)

* restore: apply adjust max-pending-peer-count when stop pd schedulers (#517)

* backend/local: fix next key (#523)

* fix next key

* fix integration test

* backend: import planner/core package to initialize expression.RewriteAstExpr (#526)

* *: add some error description (#527)

* test: fix unstable integration test auto_random_default (#529)

* post-process: support run table analyze after all tables are finished  (#509)

* batch split with limit

* batch split with limit

* update

* add log with split region failed

* set batch split size to 2048

* add delay to retry split region

* set outer loop retry split regions to a bigger value

* update

* add retry for region scatter

* update br

* wait some time before retry scatter region

* add start/end key to log if scan region failed

* update br

* fix session

* work around a panic

* fix unit test

* support analyze at last

* fix

* fix

* fix

* better naming and add some comments

Co-authored-by: lance6716 <[email protected]>

* encode retry split key (#531)

Signed-off-by: glorv <[email protected]>

* restore: fix error lost in create schema (#530)

* mydumper: update br to apply auto retry s3 read error (#533)

* Sort index rather than insert it into skiplist (#520)

* use large writebatch for index engine

Signed-off-by: Little-Wallace <[email protected]>

* pass more data

Signed-off-by: Little-Wallace <[email protected]>

* fix mock

Signed-off-by: Little-Wallace <[email protected]>

* fix data race

Signed-off-by: Little-Wallace <[email protected]>

* fix test

Signed-off-by: Little-Wallace <[email protected]>

* fix max file opens

Signed-off-by: Little-Wallace <[email protected]>

* fix sst path bug

Signed-off-by: Little-Wallace <[email protected]>

* close datawriter

Signed-off-by: Little-Wallace <[email protected]>

* fix err get

Signed-off-by: Little-Wallace <[email protected]>

* flush after index writer close

Signed-off-by: Little-Wallace <[email protected]>

* less memory

Signed-off-by: Little-Wallace <[email protected]>

* fix property

Signed-off-by: Little-Wallace <[email protected]>

* revert deliverLoop encoding

Signed-off-by: Little-Wallace <[email protected]>

* add test

Signed-off-by: Little-Wallace <[email protected]>

* fix test

Signed-off-by: Little-Wallace <[email protected]>

* refactor to reduce copy memory

Signed-off-by: Little-Wallace <[email protected]>

* move function position

Signed-off-by: Little-Wallace <[email protected]>

* fix test

Signed-off-by: Little-Wallace <[email protected]>

* fix sstDir

Signed-off-by: Little-Wallace <[email protected]>

* fix fmt

Signed-off-by: Little-Wallace <[email protected]>

* clear writebatch

Signed-off-by: Little-Wallace <[email protected]>

* do not create sst writer if keys is too small

Signed-off-by: Little-Wallace <[email protected]>

* revert indexEngineID

Signed-off-by: Little-Wallace <[email protected]>

* revert irrelevant changes

Signed-off-by: Little-Wallace <[email protected]>

* fix comment

Signed-off-by: Little-Wallace <[email protected]>

* close when err occurs

Signed-off-by: Little-Wallace <[email protected]>

* add mock method call

Signed-off-by: Little-Wallace <[email protected]>

* *: redact log and error messages, add log-redact parameter (#538)

* add --redact-log parameter and redact sensitive log
* remove sensitive info in error

* mydumper: do not remove more than 1 sep if trim last sep is true (#535)

* restore: add error retry for checksum by tikv (#537)

* add error retry for checksum by tikv

* resolve comments

* add retryable error check

* post-process: allow run checksum at last and restrict the number of checksum jobs (#540)

* restore: don't change TiDB config to support lightning via SQL (#545)

* restore: check row value count to avoid unexpected encode result (#528)

* check row value count to avoid unexpected encode result

* check the '_tidb_row_id' field

* resolve comments

* fix issue related to '_tidb_rowid' and move column count to tidb encoder

* add tidb_opt_write_row_id session var

* fix test

* resolve comments

* update tidb to apply tidb#22062

* fix test

* restore: Try to create tables in parallel (#502)

* restore: try to create tables in parallel

* glue: fix error condition test for db close

* restore, glue: remove duplicate db pool implementation

* restore: try to prevent db connection too early

* restore: try to prevent db connection too early

* restore: try to make restore schema run parallel totally

* restore: remove impl&test of tidb#InitSchema

* restore: make restore schema run parallelly

* restore: remove db connection control

* restore: a little change of restore schema schedule

* wip

* restore: keep restore schema job hold the same session(DB connection)

* restore: fix log message error

* restore: remove purpose array for `restoreSchemaWorker`

more details https://github.com/pingcap/tidb-lightning/pull/502#discussion_r542213517

* restore: remove useless sql mode set code

for more https://github.com/pingcap/tidb-lightning/pull/502#discussion_r542207095

* restore: restore view statements run after database|table created

for more: https://github.com/pingcap/tidb-lightning/pull/502#discussion_r542218427

* restore: interrupt job producing when error happens

* util: add SQLDriver interface

* restore: make sure single restore schema job vs. single db session

* restore: run restore view schema statements in txn

* glue: add checkpoints.Session implementation(sqlConnSession)

* restore: close whole database connections after restore schema done

* restore: revert remove of `InitSchema`

for more: https://github.com/pingcap/tidb-lightning/pull/502#discussion_r542208980

* glue: return a new error when sqlConnSession.CommitTxn called

* Revert "util: add SQLDriver interface"

This reverts commit 3e2cc16b1037ea4dafdfc2cf0156ed4951f7a64f.

* glue: update GetSession(context.Context) for Glue interface

* glue: disable more methods of sqlConnSession

* restore: disable implicit initiation of `sync.WaitGroup`

* restore: cancel nil error throw when restore schema done

* restore: replace session map to pool

* restore: keep restore table statements ordered

* restore: assign single session to `restoreSchemaWorker#doJob`'s goroutine

* restore: add log for `restoreSchema`

* restore: add quit case when error thrown blocked

* restore: Improve the robustness of concurrency pattern

* restore: fix channel send/recv logic to avoid blocking forever.

* restore: `sync.WaitGroup#Add` first when `restoreSchemaWorker#appendJob`

* restore: add impl of `schemaStmtType#String`

* restore: avoid waiting forever for all jobs to finish when the `doJob` goroutine exits abnormally

* restore: call cancel function when `makeJobs` exit

* restore: a few improvement

* test: add unit tests of `RestoreController#restoreSchema()`

Co-authored-by: lance6716 <[email protected]>
Co-authored-by: glorv <[email protected]>

* Compatible for disk quota (#543)

* compatible for disk quota

Signed-off-by: Little-Wallace <[email protected]>

* unlock when err occurs

Signed-off-by: Little-Wallace <[email protected]>

* fix delete

Signed-off-by: Little-Wallace <[email protected]>

* fix method name

Signed-off-by: Little-Wallace <[email protected]>

Co-authored-by: kennytm <[email protected]>
Co-authored-by: glorv <[email protected]>

* *: add method to check whether need local SST of table (#491)

* backend/local: skip split regions if engine total size is smaller than region size (#524)

* config: change redact log parameter name (#547)

* change redact log parameter name

* address comment

* update lightning.toml

* backend/tidb: temporarily disable the strict-mode value check in tidb backend  (#551)

* temporarily disable the strict-mode value check in tidb backend since it's buggy

* add link to related issue

* test: fix invalid failpoint and integration test (#510)

* test: fix s3 integration test (#555)

* grafana dashboards support multiple cluster (#556)

* metrics: use tidb_cluster label get variable values (#559)

Signed-off-by: zhengjiajin <[email protected]>

* restore: add importing progress and optimize the accuracy of restore progress (#506)

* backend/local: fallback retryIngest to retryWrite (#554)

* backend: implement disk quota (#493)

* common: copied the GetStorageSize function from DM

* common: recognize multierr in IsRetryableError()

* restore: refactor runPeriodicActions

* config: …