glossary: Add more abbreviations #19213

Open
wants to merge 22 commits into
base: master
Changes from 6 commits
140 changes: 139 additions & 1 deletion glossary.md
@@ -30,6 +30,10 @@

Baseline Capturing captures queries that meet capturing conditions and creates bindings for them. It is used for [preventing regression of execution plans during an upgrade](/sql-plan-management.md#prevent-regression-of-execution-plans-during-an-upgrade).

### BR

BR is the Backup and Restore tool for TiDB. For more information, see [BR Overview](/br/backup-and-restore-overview.md).

### Bucket

A [Region](#regionpeerraft-group) is logically divided into several small ranges called buckets. TiKV collects query statistics by buckets and reports the bucket status to PD. For details, see the [Bucket design doc](https://github.com/tikv/rfcs/blob/master/text/0082-dynamic-size-region.md#bucket).
@@ -40,6 +44,10 @@

With the cached table feature, TiDB loads the data of an entire table into the memory of the TiDB server, and TiDB directly gets the table data from the memory without accessing TiKV, which improves the read performance.

### CF

In RocksDB and TiKV, a Column Family (CF) represents a logical grouping of key-value pairs within a database.

### Coalesce Partition

Coalesce Partition is a way of decreasing the number of partitions in a Hash or Key partitioned table. For more information, see [Manage Hash and Key partitions](/partitioned-table.md#manage-hash-and-key-partitions).
@@ -48,14 +56,72 @@

Introduced in TiDB 5.3.0, Continuous Profiling is a way to observe resource overhead at the system call level. With the support of Continuous Profiling, TiDB provides performance insight as clear as directly looking into the database source code, and helps R&D and operation and maintenance personnel to locate the root cause of performance problems using a flame graph. For details, see [TiDB Dashboard Instance Profiling - Continuous Profiling](/dashboard/continuous-profiling.md).

### CTE

A Common Table Expression (CTE) enables you to define a temporary result set that can be referred to multiple times within a SQL statement using the [`WITH`](/sql-statements/sql-statement-with.md) clause. For more information, see [Common Table Expression](/develop/dev-guide-use-common-table-expression.md).
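
As a rough illustration of the "referred to multiple times" part, the sketch below runs a `WITH` query in which the CTE `totals` is referenced twice. It uses Python's built-in `sqlite3` module purely as an in-process stand-in for a SQL engine (an assumption made for convenience, not how you would connect to TiDB), and the `orders` table and its columns are hypothetical.

```python
import sqlite3

# SQLite is only a stand-in here; the WITH syntax itself is standard SQL.
# The table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('alice', 10), ('alice', 30), ('bob', 5);
    """
)

query = """
WITH totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer, total
FROM totals                                     -- first reference to the CTE
WHERE total > (SELECT AVG(total) FROM totals)   -- second reference to the CTE
ORDER BY customer
"""

# Customers whose total spend is above the average total spend.
rows = conn.execute(query).fetchall()
print(rows)
```

Without the CTE, the aggregation subquery would have to be written out twice in the statement.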

## D

### DDL

Data Definition Language (DDL) statements enable you to create, modify, and drop tables, indexes, columns, and other database objects.

### DM

Data Migration (DM) is a tool for migrating data from MySQL-compatible databases into TiDB. It reads data from an instance of a MySQL-compatible database and applies it to a TiDB target instance. For more information, see [DM Overview](/dm/dm-overview.md).

### DML

Data Manipulation Language (DML) statements enable you to insert, update, and delete rows in tables.

### DMR

Development Milestone Release (DMR) is a TiDB version that introduces the latest features but does not offer long-term support. For more information, see [TiDB Versioning](/releases/versioning.md).

### DR

Disaster Recovery (DR) includes solutions that can be used to recover data after a disaster. These solutions typically involve backups and standby clusters. For more information, see [Overview of TiDB Disaster Recovery Solutions](/dr-solution-introduction.md).

### DXF

Distributed eXecution Framework (DXF) is the framework used by TiDB for accelerating index creation and data import by distributing tasks over all available resources. For more information, see [DXF Introduction](/tidb-distributed-execution-framework.md).

### Dynamic Pruning

Dynamic pruning mode is one of the modes in which TiDB accesses partitioned tables. In dynamic pruning mode, each operator supports direct access to multiple partitions. Therefore, TiDB no longer uses Union. Omitting the Union operation can improve the execution efficiency and avoid the problem of Union concurrent execution.

## E

### EC2
Collaborator

Do we want to define EC2 in our glossary? Practically speaking, will people be looking to us to define EC2 for them in this doc? I worry that this would expand to defining a bunch of other third party terms if we go down this path.

Contributor Author

"EC2" is used three times in the 8.4.0 release notes and at least 76 times elsewhere in our docs.

$ git grep EC2 | wc -l
76


[Elastic Compute Cloud (EC2)](https://aws.amazon.com/pm/ec2/) is an AWS service that provides scalable compute resources. It can be used with TiUP to deploy and manage a TiDB cluster.

## G

### GA

If a feature is Generally Available (GA), it is fully tested and can be used in production environments. Note that even if a feature is GA in a [DMR](#dmr) version, it is recommended to use the feature in production environments in a later [LTS](#lts) version.

### GC

Garbage Collection (GC) is a process that clears obsolete data to free up resources. For information on the TiKV GC process, see [Garbage Collection overview](/garbage-collection-overview.md).

### GTID

Global Transaction Identifiers (GTIDs) are unique transaction IDs used in MySQL binary logs to track which transactions have been replicated. [Data Migration (DM)](/dm/dm-overview.md) uses these IDs to ensure consistent replication.

## H

### HTAP

Hybrid Transactional and Analytical Processing (HTAP) is a database feature that enables both OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) workloads within the same database. For TiDB, the HTAP feature is provided by using TiKV for row storage and TiFlash for columnar storage. For more information, see [the definition of HTAP on the Gartner website](https://www.gartner.com/en/information-technology/glossary/htap-enabling-memory-computing-technologies).

## I

### IMDS
Collaborator

Not sure why we want to include this third party abbreviation here

Contributor Author

This is used in br/backup-and-restore-storages.md and in the TiDB v8.4.0 release notes.


Instance Metadata Service (IMDS) is an AWS service designed to manage and retrieve metadata for [EC2](#ec2) instances. For more information, see [Instance metadata](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html).

### Index Merge

Index Merge is a method introduced in TiDB v4.0 to access tables. Using this method, the TiDB optimizer can use multiple indexes per table and merge the results returned by each index. In some scenarios, this method makes the query more efficient by avoiding full table scans. Since v5.4, Index Merge has become a GA feature.
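
The idea of merging results from multiple indexes can be sketched in a few lines. The example below is a toy model, not TiDB's optimizer or executor: it answers a hypothetical `WHERE a = 1 OR b = 2` predicate by looking up row IDs in two single-column indexes and taking their union, so the table itself is never scanned in full. The table contents and the `build_index` helper are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical table: row ID -> column values.
table = {
    1: {"a": 1, "b": 9},
    2: {"a": 5, "b": 2},
    3: {"a": 1, "b": 2},
    4: {"a": 7, "b": 7},
}


def build_index(column):
    """Build a toy secondary index: column value -> set of row IDs."""
    index = defaultdict(set)
    for row_id, row in table.items():
        index[row[column]].add(row_id)
    return index


idx_a = build_index("a")
idx_b = build_index("b")

# WHERE a = 1 OR b = 2: probe each index separately, then merge (union)
# the row IDs so that a row matching both conditions is returned once.
matched_ids = idx_a.get(1, set()) | idx_b.get(2, set())

# Fetch only the matched rows instead of scanning the whole table.
result = [table[row_id] for row_id in sorted(matched_ids)]
print(sorted(matched_ids))
```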
@@ -64,8 +130,26 @@

The in-memory pessimistic lock is a new feature introduced in TiDB v6.0.0. When this feature is enabled, pessimistic locks are usually stored in the memory of the Region leader only, and are not persisted to disk or replicated through Raft to other replicas. This feature can greatly reduce the overhead of acquiring pessimistic locks and improve the throughput of pessimistic transactions.

## K

### KMS

Key Management Service (KMS) enables the storage and retrieval of secret keys in a secure way. Examples include AWS KMS, GCP KMS, and HashiCorp Vault. Various TiDB components can use KMS to manage keys for storage encryption and related services.

Check failure on line 137 in glossary.md (GitHub Actions / vale, reported by reviewdog): [Vale.Avoid] Avoid using 'GCP'. Severity: ERROR; location: line 137, column 122.

### KV

Key-Value (KV) is a way of storing information by associating values with unique keys, allowing quick data retrieval. TiDB uses TiKV to map tables and indexes into key-value pairs, enabling efficient data storage and access across the database.
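
To make the mapping of tables and indexes into key-value pairs more concrete, here is a simplified sketch. It assumes a readable `t{table_id}_r{row_id}` layout for row keys and `t{table_id}_i{index_id}_{value}` for unique index keys; the real on-disk encoding is binary and differs in detail, and the table ID, index ID, and row contents below are hypothetical.

```python
def row_key(table_id: int, row_id: int) -> str:
    """Readable stand-in for a row key (the real encoding is binary)."""
    return f"t{table_id}_r{row_id}"


def unique_index_key(table_id: int, index_id: int, value: str) -> str:
    """Readable stand-in for a unique index key pointing at a row ID."""
    return f"t{table_id}_i{index_id}_{value}"


kv = {}  # a plain dict stands in for the key-value store
table_id, index_id = 10, 1
rows = {1: ("TiDB", "SQL Layer"), 2: ("TiKV", "KV Engine")}

for row_id, columns in rows.items():
    kv[row_key(table_id, row_id)] = columns                        # row data
    kv[unique_index_key(table_id, index_id, columns[0])] = row_id  # index entry

# Index lookup: index key -> row ID -> row key -> column values.
rid = kv[unique_index_key(table_id, index_id, "TiKV")]
print(kv[row_key(table_id, rid)])
```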

## L

### LDAP

Lightweight Directory Access Protocol (LDAP) is a standardized way of accessing a directory with information. It is commonly used for account and user data management. TiDB supports LDAP via [LDAP authentication plugins](/security-compatibility-with-mysql.md#authentication-plugin-status).

### LTS

Long Term Support (LTS) refers to software versions that are extensively tested and maintained for extended periods. For more information, see [TiDB Versioning](/releases/versioning.md).

### leader/follower/learner

Leader/Follower/Learner each corresponds to a role in a Raft group of [peers](#regionpeerraft-group). The leader services all client requests and replicates data to the followers. If the group leader fails, one of the followers will be elected as the new leader. Learners are non-voting followers that only serve in the process of replica addition.
@@ -82,10 +166,22 @@

## O

### OLAP

Online Analytical Processing (OLAP) refers to database workloads focused on analytical tasks, such as data reporting and complex queries. OLAP is characterized by read-heavy queries that process large volumes of data across many rows.

Check warning on line 171 in glossary.md (GitHub Actions / vale, reported by reviewdog): [PingCAP.Ambiguous] Consider using a clearer word than 'many' because it may cause confusion. Severity: INFO; location: line 171, column 225.

### Old value

The "original value" in the incremental change log output by TiCDC. You can specify whether the incremental change log output by TiCDC contains the "original value".

### OLTP

Online Transaction Processing (OLTP) refers to database workloads focused on transactional tasks, such as selecting, inserting, updating, and deleting small sets of records.

### OOM

Out of Memory (OOM) is a situation where a system fails due to insufficient memory. For more information, see [Troubleshoot TiDB OOM Issues](/troubleshoot-tidb-oom.md).

### Operator

An operator is a collection of actions that applies to a Region for scheduling purposes. Operators perform scheduling tasks such as "migrate the leader of Region 2 to Store 5" and "migrate replicas of Region 2 to Store 1, 4, 5".
@@ -111,10 +207,18 @@

[Partitioning](/partitioned-table.md) refers to physically dividing a table into smaller table partitions, which can be done by partition methods such as RANGE, LIST, HASH, and KEY partitioning.

### PD

Placement Driver (PD) is a core component in the [TiDB Architecture](/tidb-architecture.md#placement-driver-pd-server) responsible for storing metadata, assigning [Timestamp Oracle (TSO)](/tso.md) for transaction timestamps, orchestrating data placement on TiKV, and running [TiDB Dashboard](/dashboard/dashboard-overview.md). For more information, see [TiDB Scheduling](/tidb-scheduling.md).

### pending/down

"Pending" and "down" are two special states of a peer. Pending indicates that the Raft log of followers or learners is vastly different from that of leader. Followers in pending cannot be elected as leader. "Down" refers to a state that a peer ceases to respond to leader for a long time, which usually means the corresponding node is down or isolated from the network.

### PITR

Point in Time Recovery (PITR) enables you to restore data to a specific point in time (for example, just before an unintended `DELETE` statement). For more information, see [TiDB Log Backup and PITR Architecture](/br/br-log-architecture.md).

### Point Get

Point get means reading a single row of data by a unique index or primary index. The returned result set contains at most one row.
@@ -125,6 +229,10 @@

## Q

### QPS

Queries Per Second (QPS) is the number of queries a database service handles per second, serving as a key performance metric for database throughput.

### Quota Limiter

Quota Limiter is an experimental feature introduced in TiDB v6.0.0. If the machine on which TiKV is deployed has limited resources (for example, only 4 vCPUs and 16 GiB of memory) and the TiKV foreground processes too many read and write requests, the CPU resources used by the background are taken over to help process those requests, which affects the performance stability of TiKV. To avoid this situation, you can set the [quota-related configuration items](/tikv-configuration-file.md#quota) to limit the CPU resources used by the foreground.
@@ -135,6 +243,10 @@

Raft Engine is an embedded persistent storage engine with a log-structured design. It is built for TiKV to store multi-Raft logs. Since v5.4, TiDB supports using Raft Engine as the log storage engine. For details, see [Raft Engine](/tikv-configuration-file.md#raft-engine).

### RAG

Retrieval-Augmented Generation (RAG) is an architecture designed to optimize the output of Large Language Models (LLMs). For more information, see [Vector Search Overview](/vector-search-overview.md#use-cases).

### Region/peer/Raft group

Region is the minimal piece of data storage in TiKV, each representing a range of data (256 MiB by default). Each Region has three replicas by default. A replica of a Region is called a peer. Multiple peers of the same Region replicate data via the Raft consensus algorithm, so peers are also members of a Raft instance. TiKV uses Multi-Raft to manage data. That is, for each Region, there is a corresponding, isolated Raft group.
@@ -145,10 +257,18 @@

The mechanism of Region split is to use one initial Region to cover the entire key space, and generate new Regions through splitting existing ones every time the size of the Region or the number of keys has reached a threshold.
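
The split mechanism described above can be modeled with a small sketch. This is a toy model, not TiKV code: it starts with one Region covering the whole key space and splits a Region in half whenever its key count reaches a threshold. The `Region` class, the `MAX_KEYS` threshold of 4, and the split-at-the-middle-key policy are all assumptions chosen for illustration.

```python
from bisect import insort
from dataclasses import dataclass, field

MAX_KEYS = 4  # hypothetical threshold; real thresholds are size- and key-count-based


@dataclass
class Region:
    start: str  # inclusive; "" means the beginning of the key space
    end: str    # exclusive; "" means the end of the key space
    keys: list = field(default_factory=list)  # kept sorted


def find_region(regions, key):
    """Return the position of the Region whose range contains the key."""
    for pos, region in enumerate(regions):
        if key >= region.start and (region.end == "" or key < region.end):
            return pos
    raise KeyError(key)


def put(regions, key):
    pos = find_region(regions, key)
    region = regions[pos]
    if key not in region.keys:
        insort(region.keys, key)
    # Split once the threshold is reached: the middle key becomes the boundary.
    if len(region.keys) >= MAX_KEYS:
        mid = len(region.keys) // 2
        split_key = region.keys[mid]
        right = Region(split_key, region.end, region.keys[mid:])
        region.end, region.keys = split_key, region.keys[:mid]
        regions.insert(pos + 1, right)


regions = [Region("", "")]  # one initial Region covers the entire key space
for k in ["a", "b", "c", "d", "e", "f"]:
    put(regions, k)

ranges = [(r.start, r.end) for r in regions]
print(ranges)
```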

### restore
### Restore

Restore is the reverse of the backup operation. It is the process of bringing back the system to an earlier state by retrieving data from a prepared backup.

### RPC

Remote Procedure Call (RPC) is a way for software components to communicate. In a TiDB cluster, the gRPC standard is used for communication between different components such as TiDB, TiKV, and TiFlash.

### RU

Request Unit (RU) is a unified abstraction unit for the resource usage in TiDB. It is used with [Resource Control](/tidb-resource-control.md) to manage resource usage.

## S

### scheduler
@@ -160,6 +280,10 @@
- `hot-region-scheduler`: Balances the distribution of hot Regions
- `evict-leader-{store-id}`: Evicts all leaders of a node (often used for rolling upgrades)

### SST

Static Sorted Table, Sorted String Table, or Sorted Sequence Table (SST) is a file storage format used in RocksDB.

### Store

A store refers to the storage node in the TiKV cluster (an instance of `tikv-server`). Each store has a corresponding TiKV instance.
Collaborator

Since we are adding the PD component to this glossary, should we also add the TiDB Server, TiFlash Server, and TiKV Server components for completeness?

Suggested additions:

TiDB Server

The TiDB server is a stateless SQL layer that exposes the connection endpoint of the MySQL protocol to the outside. The TiDB server receives SQL requests, performs SQL parsing and optimization, and ultimately generates a distributed execution plan.

TiFlash Server

The TiFlash server is a special type of storage server. Unlike ordinary TiKV nodes, TiFlash stores data by column, mainly designed to accelerate analytical processing.

TiKV Server

The TiKV server is responsible for storing data. TiKV is a distributed transactional key-value storage engine.

@@ -170,6 +294,20 @@

Top SQL helps locate SQL queries that contribute to a high load of a TiDB or TiKV node in a specified time range. For details, see [Top SQL user document](/dashboard/top-sql.md).

### TPS

Transactions Per Second (TPS) is the number of transactions a database processes per second, serving as a key metric for measuring database performance and throughput.

### TSO

Because TiKV is a distributed storage system, it requires a global timing service, Timestamp Oracle (TSO), to assign a monotonically increasing timestamp. In TiKV, such a feature is provided by PD, and in Google [Spanner](http://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf), this feature is provided by multiple atomic clocks and GPS. For details, see [TSO](/tso.md).
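
A minimal sketch of how a single service can hand out monotonically increasing timestamps is shown below. It is a toy, single-process model and not PD's implementation: it assumes a timestamp made of a physical part in milliseconds shifted left by 18 bits plus a logical counter, and the `ToyTimestampOracle` class and its injectable `clock` parameter are hypothetical names introduced for illustration.

```python
import threading
import time

LOGICAL_BITS = 18  # assumed width of the logical counter


class ToyTimestampOracle:
    """Toy oracle: timestamps never repeat and never go backwards."""

    def __init__(self, clock=lambda: int(time.time() * 1000)):
        self._clock = clock          # returns the current time in milliseconds
        self._lock = threading.Lock()
        self._physical = 0
        self._logical = 0

    def next_ts(self) -> int:
        with self._lock:
            now = self._clock()
            if now > self._physical:
                # The clock moved forward: adopt it and reset the counter.
                self._physical, self._logical = now, 0
            else:
                # Same millisecond (or the clock went backwards): bump the counter.
                self._logical += 1
                if self._logical >= (1 << LOGICAL_BITS):
                    # Counter exhausted: advance the physical part instead.
                    self._physical += 1
                    self._logical = 0
            return (self._physical << LOGICAL_BITS) | self._logical


oracle = ToyTimestampOracle(clock=lambda: 1000)  # frozen clock for the demo
timestamps = [oracle.next_ts() for _ in range(5)]
print(timestamps)
```

Even with a frozen clock, each call returns a larger value than the previous one because the logical counter increases.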

## U

### URI

Uniform Resource Identifier (URI) is a standardized format for identifying a resource. For more information, see [Uniform Resource Identifier](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier) on Wikipedia.
Collaborator

Why do we want to include this definition in the glossary?

Contributor Author

This is because it is mentioned in our docs quite often. I would assume URI/URL could be considered common knowledge and left out. I'm fine with it either way.

$ git grep -cw URI | sed "s/:/\t/" | sort -k2 -r | head
ticdc/ticdc-sink-to-mysql.md	7
tidb-cloud/config-s3-and-gcs-access.md	6
tidb-cloud/changefeed-sink-to-cloud-storage.md	6
ticdc/ticdc-sink-to-kafka.md	6
dumpling-overview.md	6
br/br-pitr-manual.md	5
tiproxy/tiproxy-api.md	4
tidb-cloud/tidb-cloud-auditing.md	4
tidb-cloud/migrate-sql-shards.md	4
tidb-cloud/migrate-from-op-tidb.md	4


### UUID

Universally Unique Identifier (UUID) is a 128-bit (16-byte) generated ID used to uniquely identify records in a database. For more information, see [UUID](/best-practices/uuid.md).
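
The 128-bit (16-byte) size can be checked with a short sketch using Python's standard `uuid` module. A random (version 4) UUID is used here only as an example; which UUID version suits a given workload is outside the scope of this snippet.

```python
import uuid

# Generate a random (version 4) UUID.
u = uuid.uuid4()

binary_form = u.bytes  # compact binary form: 16 bytes = 128 bits
text_form = str(u)     # canonical text form: 32 hex digits plus 4 hyphens

# The text form can be parsed back into the same identifier.
restored = uuid.UUID(text_form)

print(len(binary_form), len(text_form), restored == u)
```

The binary form is less than half the size of the 36-character text form, which matters when the value is stored in every row and index entry.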