
Commit

ti-chi-bot authored Jul 4, 2024
1 parent 6cebb40 commit ea316bf
Showing 8 changed files with 16 additions and 16 deletions.
10 changes: 5 additions & 5 deletions best-practices/high-concurrency-best-practices.md
@@ -9,11 +9,11 @@ This document describes best practices for handling highly-concurrent write-heav

## Target audience

-This document assumes that you have a basic understanding of TiDB. It is recommended that you first read the following three blog articles that explain TiDB fundamentals, and [TiDB Best Practices](https://en.pingcap.com/blog/tidb-best-practice/):
+This document assumes that you have a basic understanding of TiDB. It is recommended that you first read the following three blog articles that explain TiDB fundamentals, and [TiDB Best Practices](https://www.pingcap.com/blog/tidb-best-practice/):

-+ [Data Storage](https://en.pingcap.com/blog/tidb-internal-data-storage/)
-+ [Computing](https://en.pingcap.com/blog/tidb-internal-computing/)
-+ [Scheduling](https://en.pingcap.com/blog/tidb-internal-scheduling/)
++ [Data Storage](https://www.pingcap.com/blog/tidb-internal-data-storage/)
++ [Computing](https://www.pingcap.com/blog/tidb-internal-computing/)
++ [Scheduling](https://www.pingcap.com/blog/tidb-internal-scheduling/)

## Highly-concurrent write-intensive scenario

@@ -32,7 +32,7 @@ For a distributed database, it is important to make full use of the capacity of

## Data distribution principles in TiDB

-To address the above challenges, it is necessary to start with the data segmentation and scheduling principle of TiDB. Refer to [Scheduling](https://en.pingcap.com/blog/tidb-internal-scheduling/) for more details.
+To address the above challenges, it is necessary to start with the data segmentation and scheduling principle of TiDB. Refer to [Scheduling](https://www.pingcap.com/blog/tidb-internal-scheduling/) for more details.

TiDB splits data into Regions, each representing a range of data with a size limit of 96M by default. Each Region has multiple replicas, and each group of replicas is called a Raft Group. In a Raft Group, the Region Leader executes the read and write tasks (TiDB supports [Follower-Read](/follower-read.md)) within the data range. The Region Leader is automatically scheduled by the Placement Driver (PD) component to different physical nodes evenly to distribute the read and write pressure.
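
For example, assuming an existing table `t` with an integer primary key (the table name and key range below are hypothetical, used only for illustration), the Region layout described above can be inspected and pre-split with TiDB SQL:

```sql
-- List the Regions of table `t`, including each Region's key range and
-- the TiKV store that currently holds its Leader.
SHOW TABLE t REGIONS;

-- Pre-split `t` into 16 Regions over the key range [0, 1000000) so that
-- write pressure is spread across TiKV nodes from the start.
SPLIT TABLE t BETWEEN (0) AND (1000000) REGIONS 16;
```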

8 changes: 4 additions & 4 deletions best-practices/tidb-best-practices.md
@@ -9,9 +9,9 @@ This document summarizes the best practices of using TiDB, including the use of

Before you read this document, it is recommended that you read three blog posts that introduce the technical principles of TiDB:

-* [TiDB Internal (I) - Data Storage](https://en.pingcap.com/blog/tidb-internal-data-storage/)
-* [TiDB Internal (II) - Computing](https://en.pingcap.com/blog/tidb-internal-computing/)
-* [TiDB Internal (III) - Scheduling](https://en.pingcap.com/blog/tidb-internal-scheduling/)
+* [TiDB Internal (I) - Data Storage](https://www.pingcap.com/blog/tidb-internal-data-storage/)
+* [TiDB Internal (II) - Computing](https://www.pingcap.com/blog/tidb-internal-computing/)
+* [TiDB Internal (III) - Scheduling](https://www.pingcap.com/blog/tidb-internal-scheduling/)

## Preface

@@ -67,7 +67,7 @@ Placement Driver (PD) balances the load of the cluster according to the status o

### SQL on KV

-TiDB automatically maps the SQL structure into Key-Value structure. For details, see [TiDB Internal (II) - Computing](https://en.pingcap.com/blog/tidb-internal-computing/).
+TiDB automatically maps the SQL structure into Key-Value structure. For details, see [TiDB Internal (II) - Computing](https://www.pingcap.com/blog/tidb-internal-computing/).

Simply put, TiDB performs the following operations:

2 changes: 1 addition & 1 deletion dashboard/dashboard-key-visualizer.md
@@ -37,7 +37,7 @@ This section introduces the basic concepts that relate to Key Visualizer.

In a TiDB cluster, the stored data is distributed among TiKV instances. Logically, TiKV is a huge and orderly key-value map. The whole key-value space is divided into many segments and each segment consists of a series of adjacent keys. Such segment is called a `Region`.

-For detailed introduction of Region, refer to [TiDB Internal (I) - Data Storage](https://en.pingcap.com/blog/tidb-internal-data-storage/).
+For detailed introduction of Region, refer to [TiDB Internal (I) - Data Storage](https://www.pingcap.com/blog/tidb-internal-data-storage/).

### Hotspot

2 changes: 1 addition & 1 deletion explore-htap.md
@@ -29,7 +29,7 @@ The following are the typical use cases of HTAP:

When using TiDB as a data hub, TiDB can meet specific business needs by seamlessly connecting the data for the application and the data warehouse.

-For more information about use cases of TiDB HTAP, see [blogs about HTAP on the PingCAP website](https://en.pingcap.com/blog/?tag=htap).
+For more information about use cases of TiDB HTAP, see [blogs about HTAP on the PingCAP website](https://www.pingcap.com/blog/?tag=htap).

## Architecture

4 changes: 2 additions & 2 deletions faq/migration-tidb-faq.md
@@ -170,13 +170,13 @@ Yes. But the `load data` does not support the `replace into` syntax.
### Why does the query speed getting slow after deleting data?
-Deleting a large amount of data leaves a lot of useless keys, affecting the query efficiency. Currently the Region Merge feature is in development, which is expected to solve this problem. For details, see the [deleting data section in TiDB Best Practices](https://en.pingcap.com/blog/tidb-best-practice/#write).
+Deleting a large amount of data leaves a lot of useless keys, affecting the query efficiency. Currently the Region Merge feature is in development, which is expected to solve this problem. For details, see the [deleting data section in TiDB Best Practices](https://www.pingcap.com/blog/tidb-best-practice/#write).
### What is the most efficient way of deleting data?
When deleting a large amount of data, it is recommended to use `Delete from t where xx limit 5000;`. It deletes through the loop and uses `Affected Rows == 0` as a condition to end the loop, so as not to exceed the limit of transaction size. With the prerequisite of meeting business filtering logic, it is recommended to add a strong filter index column or directly use the primary key to select the range, such as `id >= 5000*n+m and id < 5000*(n+1)+m`.
-If the amount of data that needs to be deleted at a time is very large, this loop method will get slower and slower because each deletion traverses backward. After deleting the previous data, lots of deleted flags remain for a short period (then all will be processed by Garbage Collection) and influence the following Delete statement. If possible, it is recommended to refine the Where condition. See [details in TiDB Best Practices](https://en.pingcap.com/blog/tidb-best-practice/#write).
+If the amount of data that needs to be deleted at a time is very large, this loop method will get slower and slower because each deletion traverses backward. After deleting the previous data, lots of deleted flags remain for a short period (then all will be processed by Garbage Collection) and influence the following Delete statement. If possible, it is recommended to refine the Where condition. See [details in TiDB Best Practices](https://www.pingcap.com/blog/tidb-best-practice/#write).
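
A minimal sketch of the batched delete described above, assuming a hypothetical table `t` with an integer primary key `id` and a hypothetical filter column `created_at`:

```sql
-- Delete in fixed-size primary-key ranges so that each statement stays
-- small and does not exceed the transaction size limit.
DELETE FROM t WHERE id >= 0    AND id < 5000;
DELETE FROM t WHERE id >= 5000 AND id < 10000;
-- ...continue range by range from a client-side loop.

-- LIMIT-based variant: rerun this statement in a loop and stop once it
-- reports `Affected Rows == 0`.
DELETE FROM t WHERE created_at < '2024-01-01' LIMIT 5000;
```
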
### How to improve the data loading speed in TiDB?
2 changes: 1 addition & 1 deletion faq/sql-faq.md
@@ -130,7 +130,7 @@ Yes. The exception being that `LOAD DATA` does not currently support the `REPLAC

## Why does the query speed get slow after data is deleted?

-Deleting a large amount of data leaves a lot of useless keys, affecting the query efficiency. Currently the [Region Merge](/best-practices/massive-regions-best-practices.md) feature is in development, which is expected to solve this problem. For details, see the [deleting data section in TiDB Best Practices](https://en.pingcap.com/blog/tidb-best-practice/#write).
+Deleting a large amount of data leaves a lot of useless keys, affecting the query efficiency. Currently the [Region Merge](/best-practices/massive-regions-best-practices.md) feature is in development, which is expected to solve this problem. For details, see the [deleting data section in TiDB Best Practices](https://www.pingcap.com/blog/tidb-best-practice/#write).

## What should I do if it is slow to reclaim storage space after deleting data?

2 changes: 1 addition & 1 deletion telemetry.md
@@ -261,4 +261,4 @@ To meet compliance requirements in different countries or regions, the usage inf
- For IP addresses from the Chinese mainland, usage information is sent to and stored on cloud servers in the Chinese mainland.
- For IP addresses from outside of the Chinese mainland, usage information is sent to and stored on cloud servers in the US.

-See [PingCAP Privacy Policy](https://en.pingcap.com/privacy-policy/) for details.
+See [PingCAP Privacy Policy](https://www.pingcap.com/privacy-policy/) for details.
2 changes: 1 addition & 1 deletion tiflash/tiflash-overview.md
@@ -15,7 +15,7 @@ In TiFlash, the columnar replicas are asynchronously replicated according to the

The above figure is the architecture of TiDB in its HTAP form, including TiFlash nodes.

-TiFlash provides the columnar storage, with a layer of coprocessors efficiently implemented by ClickHouse. Similar to TiKV, TiFlash also has a Multi-Raft system, which supports replicating and distributing data in the unit of Region (see [Data Storage](https://en.pingcap.com/blog/tidb-internal-data-storage/) for details).
+TiFlash provides the columnar storage, with a layer of coprocessors efficiently implemented by ClickHouse. Similar to TiKV, TiFlash also has a Multi-Raft system, which supports replicating and distributing data in the unit of Region (see [Data Storage](https://www.pingcap.com/blog/tidb-internal-data-storage/) for details).

TiFlash conducts real-time replication of data in the TiKV nodes at a low cost that does not block writes in TiKV. Meanwhile, it provides the same read consistency as in TiKV and ensures that the latest data is read. The Region replica in TiFlash is logically identical to those in TiKV, and is split and merged along with the Leader replica in TiKV at the same time.
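
As a minimal sketch of how such a columnar replica is created and checked, assuming a TiFlash node is already deployed and using a hypothetical table `t`:

```sql
-- Ask TiFlash to maintain 2 columnar replicas of table `t`.
ALTER TABLE t SET TIFLASH REPLICA 2;

-- Check replication progress; AVAILABLE = 1 means the replica is ready to serve queries.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_NAME = 't';
```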

