This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

Lightning: support downgrading to row-by-row insert when batch insert meets an error #1366

Closed
JmPotato opened this issue Jul 19, 2021 · 4 comments · Fixed by pingcap/tidb#27008

@JmPotato
Member

JmPotato commented Jul 19, 2021

Feature Request

Describe your feature request related problem:

Part of #1365. Lightning usually inserts multiple rows in a single statement. In this batch mode, however, it is hard to record and skip individual row errors. We need to support downgrading to row-by-row inserts when a batch insert hits an error.

Describe the feature you'd like:

Normally, Lightning will import multiple rows like this:

start transaction;
insert into t1 values (111), (222), (333), (444);
commit;

If we want to record and skip any row error that occurs during this insert without interrupting the other, valid rows, the insert needs to look like this:

start transaction;
insert into t1 values (111);
insert into t1 values (222);
insert into t1 values (333);
insert into t1 values (444);
commit;

Although this may degrade performance, we will only downgrade to row-by-row inserts when a batch insert hits an error.
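The fallback described above could be sketched roughly as follows. This is a minimal illustration, not Lightning's actual code: `rowWriter`, `rowError`, `writeWithFallback`, and `rejectValue` are hypothetical names, rows are plain integers, and the sketch ignores transactions and retryable errors.

```go
package main

import (
	"errors"
	"fmt"
)

// rowWriter is a hypothetical stand-in for the database layer.
// It is NOT part of Lightning; it only lets the sketch run on its own.
type rowWriter func(rows []int) error

// rowError records which row failed and why.
type rowError struct {
	Index int
	Value int
	Err   error
}

// writeWithFallback tries a single batch insert first. Only if the batch
// fails does it write each row on its own, collecting failures instead of
// aborting on the first one.
func writeWithFallback(write rowWriter, rows []int) []rowError {
	if err := write(rows); err == nil {
		return nil // fast path: the whole batch succeeded
	}
	var failed []rowError
	for i, v := range rows {
		if err := write([]int{v}); err != nil {
			failed = append(failed, rowError{Index: i, Value: v, Err: err})
		}
	}
	return failed
}

var errBadRow = errors.New("bad row")

// rejectValue returns a writer that fails any batch containing the bad
// value, mimicking a statement that fails because of one bad row.
func rejectValue(bad int) rowWriter {
	return func(rows []int) error {
		for _, v := range rows {
			if v == bad {
				return errBadRow
			}
		}
		return nil
	}
}

func main() {
	failed := writeWithFallback(rejectValue(333), []int{111, 222, 333, 444})
	for _, f := range failed {
		fmt.Printf("skipped row %d (value %d): %v\n", f.Index, f.Value, f.Err)
	}
}
```

With the simulated writer above, only the row holding 333 is reported as skipped; the other three rows go through on the row-by-row pass. A real implementation would also have to decide how the failed batch is rolled back before the rows are retried individually.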

Implementation

func (be *tidbBackend) WriteRows(ctx context.Context, _ uuid.UUID, tableName string, columnNames []string, rows kv.Rows) error {
	var err error
outside:
	for _, r := range rows.SplitIntoChunks(be.MaxChunkSize()) {
		for i := 0; i < writeRowsMaxRetryTimes; i++ {
			err = be.WriteRowsToDB(ctx, tableName, columnNames, r)
			switch {
			case err == nil:
				continue outside
			case common.IsRetryableError(err):
				// retry next loop
			default:
				return err
			}
		}
		return errors.Annotatef(err, "[%s] write rows reach max retry %d and still failed", tableName, writeRowsMaxRetryTimes)
	}
	return nil
}

(*tidbBackend).WriteRows splits the data into chunks and checks whether an error is retryable. Retryable errors are often the result of, e.g., network problems, in which case retrying is feasible. However, the errors that we need to record and skip are usually caused by more fundamental problems in the data, such as a mismatched column type, where retrying does not help. These need further handling, which we can implement here.

  • For recording, we need to know the position and content of the failing data in the source files.
  • For skipping, we need to skip only the row with the error and make sure the others are inserted successfully.

Furthermore, metrics and tracking information are needed, such as error counts.

@JmPotato JmPotato added the type/feature-request New feature or request label Jul 19, 2021
@JmPotato
Member Author

/label component/import

@ti-chi-bot
Member

@JmPotato: The label(s) component/import cannot be applied. These labels are supported: Hacktoberfest, duplicate, good first issue, invalid, needs-cherry-pick-release-3.1, needs-cherry-pick-release-4.0, needs-cherry-pick-release-5.0, needs-cherry-pick-release-5.1, question, release-blocker, wontfix.

In response to this:

/label component/import

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@JmPotato
Member Author

@kennytm PTAL, thx!

@kennytm
Collaborator

kennytm commented Jul 19, 2021

I mean we should insert the downgrading logic between lines 380 and 381. The rest LGTM.
