Understanding Database Sharding (理解数据库分片) #5217

actini · 2019-02-24T12:38:19Z

Translation complete, resolve #5135

lsvih · 2019-02-24T12:58:25Z

Please remove the English source text.

actini · 2019-02-24T15:01:13Z

Please remove the English source text.

@lsvih Once the proofreaders have finished, I will remove the English portion myself.

HearFishle · 2019-02-25T01:21:38Z

Claiming proofreading.

fanyijihua · 2019-02-25T01:21:39Z

@HearFishle OK, got it 🍺

Fengziyin1234 · 2019-02-25T03:04:06Z

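Please remove the English source text.

@lsvih Once the proofreaders have finished, I will remove the English portion myself.

@Romeo0906 You can delete the English first; leaving it in actually makes proofreading less convenient.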

Fengziyin1234 · 2019-02-25T03:04:27Z

Claiming proofreading.

fanyijihua · 2019-02-25T03:04:29Z

@Fengziyin1234 All set 🍻

actini · 2019-02-25T03:39:08Z

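Please remove the English source text.

@lsvih Once the proofreaders have finished, I will remove the English portion myself.

@Romeo0906 You can delete the English first; leaving it in actually makes proofreading less convenient.

@Fengziyin1234 Surely proofreaders also need to read the English source while proofreading? Whether they proofread in an editor or on the web, how would keeping the English make proofreading inconvenient?

Yes, of course we read it. When you proofread, git lines the source and the translation up for you:

https://github.com/xitu/gold-miner/pull/5217/files

So there is no need to keep the English source. You are welcome to help with proofreading too; the community needs you!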

HearFishle · 2019-02-25T05:08:29Z

Would the translator please check whether the proofreading results were submitted?

HearFishle · 2019-02-25T05:10:49Z

TODO1/understanding-database-sharding.md

 Sharding can also help to make an application more reliable by mitigating the impact of outages. If your application or website relies on an unsharded database, an outage has the potential to make the entire application unavailable. With a sharded database, though, an outage is likely to affect only a single shard. Even though this might make some parts of the application or website unavailable to some users, the overall impact would still be less than if the entire database crashed.

+数据库分片还能使应用更加稳定，因为它降低了宕机的影响。如果你的应用或网站依赖于一个未经分片的数据库，宕机有可能会使得整个应用都不可用。但假如是分片的数据库，宕机也只会影响其中的一个分片。即使这会使得一些用户无法使用我们的应用或网站，但总的影响仍然会比整个数据库都挂掉要小很多。
+


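'即使这会使得一些用户无法使用我们的应用或网站' => '尽管我们的应用或网站的某些部分可能无法被一些用户访问'
("even though this makes some users unable to use our application or website" => "although some parts of our application or website may be unavailable to some users")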

HearFishle · 2019-02-25T05:11:56Z

TODO1/understanding-database-sharding.md

 *   **Setting up a remote database**. If you're working with a monolithic application in which all of its components reside on the same server, you can improve your database's performance by moving it over to its own machine. This doesn't add as much complexity as sharding since the database's tables remain intact. However, it still allows you to vertically scale your database apart from the rest of your infrastructure.
+* **使用远程数据库**。如果你有一个庞大的应用，其所有的组建都依赖于同一个数据库服务器，你可以考虑将数据库迁移到一台单独的机器上来提高其性能。这不会像数据库分片那样复杂，因为所有的数据库表都还是完整的。并且，这种方式还允许你能够抛开其他基础设施，单独地对数据库做纵向扩展。


'组建' => '组件' (typo: "components" is 组件)

HearFishle · 2019-02-25T05:13:10Z

TODO1/understanding-database-sharding.md

 It's relatively simple to have a relational database running on a single machine and scale it up as necessary by upgrading its computing resources. Ultimately, though, any non-distributed database will be limited in terms of storage and compute power, so having the freedom to scale horizontally makes your setup far more flexible.

+我们可以很容易地在某一台机器上运行一个关系型数据库实例，并在必要时对其扩容，只要升级机器的计算资源即可。然而，所有非分布式的数据库最终都会受限于机器的存储容量和计算力，因此能够自由地横向扩展将使你的应用变得更加灵活。


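'并在必要时对其扩容，只要升级机器的计算资源即可' => '只用通过升级机器的计算资源即可对其扩容'
("and scale it up when necessary, simply by upgrading the machine's computing resources" => "it can be scaled up simply by upgrading the machine's computing resources")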

HearFishle · 2019-02-25T05:14:55Z

TODO1/understanding-database-sharding.md

 The main appeal of directory based sharding is its flexibility. Range based sharding architectures limit you to specifying ranges of values, while key based ones limit you to using a fixed hash function which, as mentioned previously, can be exceedingly difficult to change later on. Directory based sharding, on the other hand, allows you to use whatever system or algorithm you want to assign data entries to shards, and it's relatively easy dynamically add shards using this approach.

+基于目录的数据库分片架构最吸引人的地方就是灵活性。基于范围的数据库分片架构要求你必须制定值的区间，而基于键的数据库分片架构则要求你必须使用一个固定的哈希函数，正如前文所说，要修改该函数是非常困难的一件事。然而，基于目录的数据库分片架构允许你使用任何系统、任何算法来指定数据保存的分片位置，而且添加分片也相对简单一些。


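'然而，基于目录的数据库分片架构允许你使用任何系统' => '而基于目录的数据库分片架构则允许你使用任何系统'
("However, directory based sharding allows you to use any system" => "whereas directory based sharding allows you to use any system")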
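
For readers following the directory based sharding passage under review, here is a minimal sketch of the idea, assuming a simple in-memory lookup table. `ShardDirectory`, the region keys and the shard names are all hypothetical and do not come from the article; a real directory would live in a durable store.

```python
class ShardDirectory:
    """Hypothetical lookup table mapping a shard key to a shard name."""

    def __init__(self):
        self._lookup = {}

    def assign(self, key, shard):
        # Any rule may decide `shard`; the directory only records the result.
        self._lookup[key] = shard

    def shard_for(self, key):
        # Raises KeyError for keys that were never assigned.
        return self._lookup[key]


directory = ShardDirectory()
directory.assign("EU", "shard-a")
directory.assign("US", "shard-b")
# Adding a shard later is just another directory entry; no hash function changes.
directory.assign("APAC", "shard-c")
```

Because placement is only a table entry, moving a key to a different shard means updating one row (plus migrating that key's data), which is the flexibility the passage describes.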

HearFishle · 2019-02-25T13:23:28Z

Proofreading complete.

actini · 2019-02-25T15:44:07Z

Proofreading complete.

@HearFishle Great work, I've made the changes. Thank you very much!

Fengziyin1234

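Proofread half so far.

The translation is really excellent! I had to try hard to find a few issues. Corrections are welcome if I got anything wrong.

I've never been that familiar with databases, and I feel I can learn a lot from reading this article!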

Fengziyin1234 · 2019-02-26T04:22:40Z

TODO1/understanding-database-sharding.md

 Sharding involves breaking up one's data into two or more smaller chunks, called _logical shards_. The logical shards are then distributed across separate database nodes, referred to as _physical shards_, which can hold multiple logical shards. Despite this, the data held within all the shards collectively represent an entire logical dataset.

+数据库分片包括将数据分成若干个子集，称为**逻辑分片**，然后逻辑分片分布在不同的数据库节点上，称为**物理分片**。每个数据库节点能够承载多个逻辑分片。尽管如此，所有这些分片集合中的数据一起表现出来就如同一整个逻辑数据集。


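称为逻辑分片，
=》
称为逻辑分片。
("called logical shards" should be followed by a full stop, not a comma.)
The translation has been excellent so far! Issues like this are all I can find.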

Fengziyin1234 · 2019-02-26T04:37:28Z

TODO1/understanding-database-sharding.md

 Sharding involves breaking up one's data into two or more smaller chunks, called _logical shards_. The logical shards are then distributed across separate database nodes, referred to as _physical shards_, which can hold multiple logical shards. Despite this, the data held within all the shards collectively represent an entire logical dataset.

+数据库分片包括将数据分成若干个子集，称为**逻辑分片**，然后逻辑分片分布在不同的数据库节点上，称为**物理分片**。每个数据库节点能够承载多个逻辑分片。尽管如此，所有这些分片集合中的数据一起表现出来就如同一整个逻辑数据集。


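每个数据库节点能够承载多个逻辑分片
=》
每个物理分片能够承载多个逻辑分片
("each database node can hold multiple logical shards" => "each physical shard can hold multiple logical shards")

True, a physical shard is a database node. I raise this because the point the author wants to emphasize seems closer to:
A physical shard contains multiple logical shards.
They mean exactly the same thing... I'm not sure why I brought it up.

Also, my reading of the English is that the data subsets are the logical shards and the database nodes are the physical shards.

The translation reads to me more as if the act of splitting the data into subsets is called logical sharding, and the act of distributing logical shards across database nodes is called physical sharding. (This can be ignored entirely; the translation is fine, I'm only describing my impression.)

Also: 数据分成若干个子集 =》 数据分成若干个小数据集 ("split the data into several subsets" => "split the data into several small datasets")
For "small chunks" I think 小数据集 fits better than 子集 (no strong opinion; your call).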
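
To make the distinction discussed above concrete, here is a small sketch, assuming a hypothetical placement of four logical shards on two database nodes. The node names and the mapping are illustrative only.

```python
# Hypothetical placement: logical shard ID -> physical shard (database node).
LOGICAL_TO_PHYSICAL = {
    0: "db-node-1",
    1: "db-node-1",
    2: "db-node-2",
    3: "db-node-2",
}


def logical_shards_on(node):
    """Return the logical shards hosted by one physical shard."""
    return sorted(
        logical for logical, physical in LOGICAL_TO_PHYSICAL.items()
        if physical == node
    )
```

Here each physical shard holds two logical shards, and the union of all logical shards is the complete logical dataset.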

Fengziyin1234 · 2019-02-26T04:46:02Z

TODO1/understanding-database-sharding.md

 Database shards exemplify a [_shared-nothing architecture_](https://en.wikipedia.org/wiki/Shared-nothing_architecture). This means that the shards are autonomous; they don't share any of the same data or computing resources. In some cases, though, it may make sense to replicate certain tables into each shard to serve as reference tables. For example, let's say there's a database for an application that depends on fixed conversion rates for weight measurements. By replicating a table containing the necessary conversion rate data into each shard, it would help to ensure that all of the data required for queries is held in every shard.

+数据库分片是[**无共享架构**](https://en.wikipedia.org/wiki/Shared-nothing_architecture)的一个例证。这意味着分片行为是自治行为；各个分片之间不会共享任何数据或者计算资源。然而在某些情况下，将某些特定的数据表作为引用表复制到各个分片中也非常有用。比如，某个应用依赖固定的换算率在不同的重量单位之间换算。对于它的数据库，我们需要把保存换算率数据的表复制到每个分片中，才能够确保在每个分片中都保存着换算查询所需要的数据。


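某个应用依赖固定的换算率在不同的重量单位之间换算。
=》
有一个基于固定转换率的重量测量应用。
("an application relies on fixed conversion rates to convert between units of weight" => "there is a weight measurement application based on fixed conversion rates")

Here "measurements" is better translated as 单位 (units); rendering "weight measurements" as 重量单位 reads more smoothly.
The application converts kilograms to pounds; that's roughly the idea.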

Fengziyin1234 · 2019-02-26T04:52:30Z

TODO1/understanding-database-sharding.md

 Database shards exemplify a [_shared-nothing architecture_](https://en.wikipedia.org/wiki/Shared-nothing_architecture). This means that the shards are autonomous; they don't share any of the same data or computing resources. In some cases, though, it may make sense to replicate certain tables into each shard to serve as reference tables. For example, let's say there's a database for an application that depends on fixed conversion rates for weight measurements. By replicating a table containing the necessary conversion rate data into each shard, it would help to ensure that all of the data required for queries is held in every shard.

+数据库分片是[**无共享架构**](https://en.wikipedia.org/wiki/Shared-nothing_architecture)的一个例证。这意味着分片行为是自治行为；各个分片之间不会共享任何数据或者计算资源。然而在某些情况下，将某些特定的数据表作为引用表复制到各个分片中也非常有用。比如，某个应用依赖固定的换算率在不同的重量单位之间换算。对于它的数据库，我们需要把保存换算率数据的表复制到每个分片中，才能够确保在每个分片中都保存着换算查询所需要的数据。


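我们需要把保存换算率数据的表复制到每个分片中，才能够确保在每个分片中都保存着换算查询所需要的数据
=》
把保存所需的换算率数据的表复制到每个分片中，将会帮助确保在每个分片中都保存着换算查询所需要的数据
("we need to replicate the table holding the conversion rate data into each shard; only then can we ensure that every shard holds the data the conversion queries need" => "replicating the table holding the necessary conversion rate data into each shard will help ensure that every shard holds the data the conversion queries need")

Again, I'm digging hard into what's between the lines (this too is purely personal opinion; I hope it helps).
The translation's 才能够确保 ("only then can we ensure") tells me that this is the only way to ensure it (it won't work otherwise). The sense here should be: you don't have to replicate the table, but doing so helps you ensure it (there are other ways without it, probably more cumbersome).

Also, "necessary" modifies "conversion rate", so 我们需要 ("we need") has no basis; I translated it as 所需的 ("the necessary") conversion rate.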

Fengziyin1234 · 2019-02-26T05:07:12Z

TODO1/understanding-database-sharding.md

 Sharding can also help to make an application more reliable by mitigating the impact of outages. If your application or website relies on an unsharded database, an outage has the potential to make the entire application unavailable. With a sharded database, though, an outage is likely to affect only a single shard. Even though this might make some parts of the application or website unavailable to some users, the overall impact would still be less than if the entire database crashed.

+数据库分片还能使应用更加稳定，因为它降低了宕机的影响。如果你的应用或网站依赖于一个未经分片的数据库，宕机有可能会使得整个应用都不可用。但假如是分片的数据库，宕机也只会影响其中的一个分片。尽管我们的应用或网站的某些功能可能无法被一些用户访问，但总的影响仍然会比整个数据库都挂掉要小很多。


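数据库分片还能使应用更加稳定，因为它降低了宕机的影响
=》
数据库分片还能通过降低了宕机的影响，来使应用更加稳定
("sharding also makes the application more stable, because it reduces the impact of outages" => "sharding can also make the application more stable by reducing the impact of outages")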

Fengziyin1234 · 2019-02-26T05:09:21Z

TODO1/understanding-database-sharding.md

 While sharding a database can make scaling easier and improve performance, it can also impose certain limitations. Here, we'll discuss some of these and why they might be reasons to avoid sharding altogether.

+尽管对数据库分片能够易于扩展、提高性能，但是它也有一些局限性。我们将在这部分讨论其部分局限性，并解释为什么不应该一股脑儿地都对数据库做分片处理。


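并解释为什么不应该一股脑儿地都对数据库做分片处理。
=》
并解释为什么不应该一股脑儿地对所有的数据库做分片处理。
("and explain why you shouldn't indiscriminately shard databases" => "and explain why you shouldn't indiscriminately shard every database")
I like this translation; 一股脑儿 is adorable!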

Fengziyin1234 · 2019-02-26T05:15:26Z

TODO1/understanding-database-sharding.md

 Another major drawback is that once a database has been sharded, it can be very difficult to return it to its unsharded architecture. Any backups of the database made before it was sharded won't include data written since the partitioning. Consequently, rebuilding the original unsharded architecture would require merging the new partitioned data with the old backups or, alternatively, transforming the partitioned DB back into a single DB, both of which would be costly and time consuming endeavors.

+另一个重要的缺点就是，一旦数据库分片完成了，就很难恢复到分片之前的架构了。分片操作之前所做的任何备份都不包含分区之后写入的数据。因此，要重建原始的未分片的数据库架构就必须将新分区的数据和旧的备份数据融合在一起，或者将各个分区的数据库导入到单个数据库中。这两种方式都费事费力，代价不菲。


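费事费力
=》
费时费力
("troublesome and laborious" => "time-consuming and laborious")

费事费力 is also correct; I just figured it might be a typo, haha.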

Fengziyin1234 · 2019-02-26T05:18:47Z

TODO1/understanding-database-sharding.md

 A final disadvantage to consider is that sharding isn't natively supported by every database engine. For instance, PostgreSQL does not include automatic sharding as a feature, although it is possible to manually shard a PostgreSQL database. There are a number of Postgres forks that do include automatic sharding, but these often trail behind the latest PostgreSQL release and lack certain other features. Some specialized database technologies — like MySQL Cluster or certain database-as-a-service products like MongoDB Atlas — do include auto-sharding as a feature, but vanilla versions of these database management systems do not. Because of this, sharding often requires a "roll your own" approach. This means that documentation for sharding or tips for troubleshooting problems are often difficult to find.

+最后一个需要考量的缺点，并非所有的数据库引擎都原生地支持分片操作。比如，PostgreSQL 没有自动分片的功能，因此我们只能手动地给 PostgreSQL 数据库分片。虽然有很多 Postgres 的分支版本具备自动分片的功能，但是这些版本的发布通常都晚于最新的 PostgreSQL 版本，并且会缺少某些功能。一些专业的数据库技术 - 比如 MySQL 集群或者某些数据库服务产品（比如 MongoDB Atlas）- 具备自动分片的功能，但是普通版本的数据库关系系统中并不具备这样的功能。正因如此，数据库分片操作通常需要你自己动手，这也意味着很难找到关于数据库分片操作的文档或者排查问题的建议。


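Would 原生 ("native") be a better translation of "vanilla"?

原生版本 ("native version")?

The point here is the contrast with the specialized editions, so I feel 普通版本 ("ordinary version") works better. Hmm... 原生版本... sounds a bit awkward.

My understanding is:
"vanilla" is the database management system itself.
Anything built on top of those database management systems is non-vanilla.

For example, vanilla JS is plain JS; jQuery is also JS, but it is a library built on JS, and you wouldn't really call it a professional edition of JS.

My suggestion is 原生 (I find it awkward too, but that's how everyone translates it) or 数据库系统自身 ("the database system itself").

普通版本的数据库关系系统
=》
数据库系统自身/本身

Also, 数据库管理系统 ("database management system")? 关系 ("relational") should be 管理 ("management"), right?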

Fengziyin1234 · 2019-02-26T05:20:46Z

TODO1/understanding-database-sharding.md

 These are, of course, only some general issues to consider before sharding. There may be many more potential drawbacks to sharding a database depending on its use case.

+当然，这些仅仅是数据库分片操作之前需要考虑的一些常见问题。可能还有很多问题，大都依赖于数据库分片的使用场景。


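可能还有很多问题，大都依赖于数据库分片的使用场景。
=》
面对不同的使用场景，数据库分片操作可能会有更多的问题。
("there may be many more problems, mostly depending on the use case of sharding" => "depending on the use case, sharding may bring further problems")

This sentence felt a bit awkward to me as well when I read it. And this translator really is impressive.

Agreed, hats off!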

leviding · 2019-02-27T11:33:30Z

@Romeo0906 You can make the changes now.

Fengziyin1234

Proofreading complete. No further issues on my side. The translator clearly knows the subject and translated with care. A big thumbs up!

Fengziyin1234 · 2019-02-27T18:42:19Z

TODO1/understanding-database-sharding.md

 Once you've decided to shard your database, the next thing you need to figure out is how you'll go about doing so. When running queries or distributing incoming data to sharded tables or databases, it's crucial that it goes to the correct shard. Otherwise, it could result in lost data or painfully slow queries. In this section, we'll go over a few common sharding architectures, each of which uses a slightly different process to distribute data across shards.

+一旦你决定要对数据库进行分片，接下来你要做的就是弄清楚如何去实现它。查询数据的时候或者保存数据到分片的数据库或数据表中的时候，选用正确的分区至关重要。否则，将会造成数据丢失或者令人痛苦的慢查询。在这部分，我们将要介绍一些常用的数据库分片架构，每种架构在不同分片之前保存数据的方式都略有不同。


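或者令人痛苦的慢查询
=》
或者查询令人发指的慢
("or painfully slow queries" => "or queries that are outrageously slow")

在不同分片之前保存数据
=》
在不同分片之间分布数据
(之前, "before", is a typo for 之间, "across": "distributes data across different shards")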

Fengziyin1234 · 2019-02-27T18:46:38Z

TODO1/understanding-database-sharding.md

 _Key based sharding_, also known as _hash based sharding_, involves using a value taken from newly written data — such as a customer's ID number, a client application's IP address, a ZIP code, etc. — and plugging it into a _hash function_ to determine which shard the data should go to. A hash function is a function that takes as input a piece of data (for example, a customer email) and outputs a discrete value, known as a _hash value_. In the case of sharding, the hash value is a shard ID used to determine which shard the incoming data will be stored on. Altogether, the process looks like this:

+**基于键的分片**也叫**基于哈希的分片**，使用新写入数据的值 —— 比如客户 ID、客户端 IP、ZIP 码等等 —— 通过**哈希函数**判断保存的分片位置。哈希函数将输入的数据（如用户的邮件地址）转换成离散数据（也叫做哈希值）输出。在数据库分片中，哈希值将作为数据库分片 ID 将数据保存到对应的分片中。总的来说，整个过程是这样的：


判断保存的分片位置
=》
决定数据保存的分片位置
("judge the shard location for storage" => "determine which shard the data is stored on")

Fengziyin1234 · 2019-02-27T18:52:58Z

TODO1/understanding-database-sharding.md

 _Key based sharding_, also known as _hash based sharding_, involves using a value taken from newly written data — such as a customer's ID number, a client application's IP address, a ZIP code, etc. — and plugging it into a _hash function_ to determine which shard the data should go to. A hash function is a function that takes as input a piece of data (for example, a customer email) and outputs a discrete value, known as a _hash value_. In the case of sharding, the hash value is a shard ID used to determine which shard the incoming data will be stored on. Altogether, the process looks like this:

+**基于键的分片**也叫**基于哈希的分片**，使用新写入数据的值 —— 比如客户 ID、客户端 IP、ZIP 码等等 —— 通过**哈希函数**判断保存的分片位置。哈希函数将输入的数据（如用户的邮件地址）转换成离散数据（也叫做哈希值）输出。在数据库分片中，哈希值将作为数据库分片 ID 将数据保存到对应的分片中。总的来说，整个过程是这样的：


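在数据库分片中，哈希值将作为数据库分片 ID 将数据保存到对应的分片中
=》 (personal suggestion)
在具体的例子中，输出的哈希值将是数据库的分片 ID，决定数据保存到哪个分片中
("in sharding, the hash value will serve as the shard ID to store the data in the corresponding shard" => "in this specific case, the output hash value is the shard ID, which determines which shard the data is stored on")

Reason: in 哈希值将作为数据库分片 ID,
this is clearly not 将作为 ("will serve as");
rather, the hash value is (是) an existing shard ID.

I removed the 将, haha.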
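
The point debated above, that the hash value is itself the shard ID, can be sketched as follows. This is a minimal illustration, assuming SHA-256 reduced modulo a fixed shard count; `NUM_SHARDS`, `shard_id_for` and the sample key are hypothetical and not taken from the article.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical, fixed number of shards


def shard_id_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash a shard key (e.g. a customer ID) to a shard ID in [0, num_shards)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then reduce to a shard ID.
    return int.from_bytes(digest[:8], "big") % num_shards


target_shard = shard_id_for("customer-42")
```

`hashlib` is used instead of the built-in `hash()` because Python randomizes string hashes per process, which would send the same key to different shards across restarts.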

Fengziyin1234 · 2019-02-27T18:57:37Z

TODO1/understanding-database-sharding.md

 While key based sharding is a fairly common sharding architecture, it can make things tricky when trying to dynamically add or remove additional servers to a database. As you add servers, each one will need a corresponding hash value and many of your existing entries, if not all of them, will need to be remapped to their new, correct hash value and then migrated to the appropriate server. As you begin rebalancing the data, neither the new nor the old hashing functions will be valid. Consequently, your server won't be able to write any new data during the migration and your application could be subject to downtime.

+尽管基于键的数据库分片是一种非常常见的数据库分片架构方式，在这种架构上动态地增加或移除数据库服务器却是一件困难的事情。当你增加服务器的时候，每个数据库分片的哈希值都需要调整，因此，即使不是全部数据，也会有很多数据需要重新映射到正确的哈希值，然后迁移到正确的服务器。当你对数据进行均衡性调整的时候，新旧哈希函数都将失效。因此，在数据迁移期间，数据库服务器将不能写入任何数据，应用也将暂停服务。


fairly
=》
蛮/相当
(translate "fairly" as 蛮 or 相当, "quite", rather than 非常, "very")
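
The remapping cost described in the passage under review can be illustrated with a small experiment. This is a sketch under the assumption of a plain modulo scheme over a SHA-256 hash; the helper names and the sample keys are hypothetical.

```python
import hashlib


def shard_id_for(key: str, num_shards: int) -> int:
    """Hash a key to a shard ID in [0, num_shards)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_keys(keys, old_shards: int, new_shards: int):
    """Return the keys whose shard ID changes when the shard count changes."""
    return [
        key for key in keys
        if shard_id_for(key, old_shards) != shard_id_for(key, new_shards)
    ]


keys = [f"customer-{i}" for i in range(1000)]
moved = moved_keys(keys, old_shards=4, new_shards=5)
```

With a uniform hash, a key keeps its shard ID only when its hash leaves the same remainder modulo 4 and modulo 5, which happens for about one key in five, so growing from four to five shards relocates most existing entries. Schemes such as consistent hashing are commonly used to reduce this movement, though the article excerpt here does not cover them.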

actini · 2019-02-28T13:47:35Z

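@leviding The changes are all done; please go ahead and merge.
Thanks to @HearFishle and @Fengziyin1234 for such careful proofreading and such pertinent suggestions; you are both excellent. @leviding please give the two proofreaders a bonus.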

leviding · 2019-03-01T05:38:12Z

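@Romeo0906 Merged! Please publish it to Juejin soon and send me the link so I can add your points promptly.

The Juejin Translation Project has its own Zhihu column, and you are welcome to contribute there too. We recommend using a handy plugin.
Column: https://zhuanlan.zhihu.com/juejinfanyi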

actini · 2019-03-01T11:26:54Z

@leviding Shared it; remember to add the points! https://juejin.im/entry/5c791672f265da2da67c46eb/detail

Translation complete; proofreaders, please review

c927dac

fanyijihua added the 校对认领 label Feb 24, 2019

lsvih added the 后端 label Feb 24, 2019

fanyijihua added the 正在校对 label Feb 25, 2019

fanyijihua removed the 校对认领 label Feb 25, 2019

HearFishle reviewed Feb 25, 2019

First round of proofreading complete

68aeca2

Fengziyin1234 reviewed Feb 26, 2019

leviding added the enhancement 等待译者修改 label Feb 27, 2019

understanding-database-sharding-translated-finished

5b2eb4f

Fengziyin1234 previously approved these changes Feb 28, 2019

understanding-database-sharding-translated-finished

dc56d3d

actini dismissed Fengziyin1234’s stale review via dc56d3d February 28, 2019 13:44

Fengziyin1234 added 翻译完成标注待管理员 Review and removed enhancement 等待译者修改正在校对 labels Mar 1, 2019

Update understanding-database-sharding.md

5083e6e

leviding approved these changes Mar 1, 2019

leviding merged commit 3b13bb5 into xitu:master Mar 1, 2019

leviding removed the 标注待管理员 Review label Mar 1, 2019

leviding mentioned this pull request Mar 1, 2019

Understanding Database Sharding (理解数据库分片) #5135

Closed

actini deleted the translate/understanding-database-sharding branch July 27, 2020 13:58
