Speed up the DDL execution speed of TiDB backup data restore #27036

Closed
3 of 4 tasks
IANTHEREAL opened this issue Aug 9, 2021 · 2 comments · Fixed by #33026
Labels
type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@IANTHEREAL
Contributor

IANTHEREAL commented Aug 9, 2021

Feature Request

Is your feature request related to a problem? Please describe:

Describe the feature you'd like:

The cluster has 6 TiB of data, 30k tables, and 11 TiKV nodes. When I use BR to back up and restore the cluster, the restore is particularly slow. After investigation, BR can only create about 2 tables per second: the whole restore takes nearly 4h, and almost all of that time is spent creating tables (30,000 tables at ~2 tables/s is roughly 15,000 s, about 4.2 h). The execution speed of DDL is clearly the bottleneck in this scenario.

Describe alternatives you've considered:

To speed up the restore, I hope TiDB can speed up DDL execution so that, in this scenario, BR can create tables at around 200 tables/s.
For compatibility, Binlog/CDC must still be able to replicate the table schemas of these tables; the restored data itself does not need to be replicated for now.

@IANTHEREAL added the type/feature-request label Aug 9, 2021
@YuJuncen
Contributor

YuJuncen commented Sep 15, 2021

The bottleneck of creating tables is waiting for the schema version change.
Each time we create a table, we have to wait at least CheckVersFirstWaitTime.

tidb/ddl/util/syncer.go, lines 362 to 367 at e262e59:

func (s *schemaVersionSyncer) OwnerCheckAllVersions(ctx context.Context, latestVer int64) error {
startTime := time.Now()
time.Sleep(CheckVersFirstWaitTime)
notMatchVerCnt := 0
intervalCnt := int(time.Second / checkVersInterval)
updatedMap := make(map[string]struct{})
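
As a rough back-of-envelope sketch (not from this issue; it assumes CheckVersFirstWaitTime keeps its 50ms default from ddl/util/syncer.go), the fixed per-table wait alone already puts a hard floor on the restore time for 30k tables:

// Rough lower-bound estimate of the time spent only in the fixed sleep.
// Assumption: CheckVersFirstWaitTime is 50ms; the real restore additionally
// pays for running the DDL job and polling the other nodes' versions.
package main

import (
	"fmt"
	"time"
)

func main() {
	const checkVersFirstWaitTime = 50 * time.Millisecond // assumed default
	const tableCount = 30_000

	minWait := time.Duration(tableCount) * checkVersFirstWaitTime
	fmt.Printf("minimum time spent sleeping for schema sync: %v\n", minWait) // prints 25m0s
}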

Currently, BR uses an internal interface named CreateTableWithInfo to create tables. It creates a table and waits for the schema change one table at a time, omitting the sync of the DDL job between BR and the leader, so the procedure of creating one table looks roughly like this:

for _, t := range tables {
  RunInTxn(func(txn) {
    m := meta.New(txn)
    schemaVersion := m.CreateTable(t)
    m.UpdateSchema(schemaVersion)
  })
  waitSchemaToSync() // <- This notifies and then waits until
  // all other TiDB nodes have synced the latest schema version.
}

If possible, we can move that I/O-bound slow operation out of the for loop, like this:

RunInTxn(func(txn) {
  m := meta.New(txn)
  for _, t := range tables {
    schemaVersion := m.CreateTable(t)
    m.UpdateSchema(schemaVersion)
  }
})
waitSchemaToSync() // <- only one round of waiting.

@YuJuncen
Contributor

YuJuncen commented Sep 15, 2021

This needs TiDB to provide a batch version of CreateTableWithInfo, which creates tables in bulk with a single DDL job; it might look like this:

// CreateTablesWithInfo creates many tables from the given table infos.
// If AutoID / AutoRandomID is set in a table info, the corresponding IDs
// of the newly created table will be rebased to it.
// (optional to implement) When tryRetainID is `true`, try to use the original
// table ID filled in the table info and only allocate a new ID on conflict;
// otherwise report an error.
func (d *DDL) CreateTablesWithInfo(ctx sessionctx.Context,
		schema model.CIStr,
		infos []*model.TableInfo,
		onExist OnExist,
		tryRetainID bool) error
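
If such an API existed, a caller-side sketch could look like the following (purely illustrative: the batchTableCreator interface, the batch size, the import paths, and the error handling are my assumptions, not part of the proposal):

// Hypothetical sketch of how BR might drive the proposed batch API so that
// each batch costs one DDL job and therefore one schema-version sync.
package restore

import (
	"github.com/pingcap/parser/model"
	"github.com/pingcap/tidb/ddl"
	"github.com/pingcap/tidb/sessionctx"
)

// batchTableCreator is the subset of the proposed API this sketch relies on.
type batchTableCreator interface {
	CreateTablesWithInfo(ctx sessionctx.Context, schema model.CIStr,
		infos []*model.TableInfo, onExist ddl.OnExist, tryRetainID bool) error
}

func batchCreateTables(ctx sessionctx.Context, d batchTableCreator,
	schema model.CIStr, infos []*model.TableInfo) error {
	const batchSize = 512 // assumed; should be tuned against DDL job size limits

	for start := 0; start < len(infos); start += batchSize {
		end := start + batchSize
		if end > len(infos) {
			end = len(infos)
		}
		// One DDL job (and one round of waiting for the schema version)
		// per batch, instead of one per table.
		if err := d.CreateTablesWithInfo(ctx, schema, infos[start:end],
			ddl.OnExistIgnore, true /* tryRetainID */); err != nil {
			return err
		}
	}
	return nil
}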
