This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

replica_server: reimplement uniq_timestamp generator #8

Merged
merged 2 commits into from
Apr 13, 2018

Conversation

shengofsun
Contributor

Reimplement this for 2 reasons:

  1. All threads shared a single lock in the old implementation, which
    hurt performance. In fact, different replicas do not need to share a
    globally increasing timestamp, so we can drop the shared lock as an
    optimization.

  2. Although the timestamp was replicated from the primary to the secondaries,
    the secondaries never updated their own timestamp value accordingly. As a
    result, a newer mutation could get a smaller timestamp than an older one
    after a primary switch.
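
The two points above can be sketched as a small per-replica generator. This is a minimal illustration and not the code merged in this PR: the class name `replica_timestamp_us`, the injectable `now_us` clock (standing in for `dsn_now_us()`), and the `next()` and `last()` methods are assumptions made for the example. Only `try_update` is named in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for the PR's uniq_timestamp_us. One instance is owned
// by one replica and is assumed to be touched only by that replica's own
// thread, so no lock is needed.
class replica_timestamp_us
{
public:
    using clock_fn = std::function<uint64_t()>;

    // `now_us` is an injectable microsecond clock (assumption: it plays the
    // role of dsn_now_us() so the behaviour can be exercised deterministically).
    explicit replica_timestamp_us(clock_fn now_us)
        : _now_us(std::move(now_us)), _last_ts(0)
    {
        assert(_now_us); // the clock callback must be callable
        _last_ts = _now_us();
    }

    // Secondary path: absorb a timestamp replicated from the primary,
    // never moving the local value backwards.
    void try_update(uint64_t new_ts) { _last_ts = std::max(_last_ts, new_ts); }

    // Primary path: return a value strictly larger than anything issued or
    // observed so far, and no smaller than the current wall clock.
    uint64_t next()
    {
        _last_ts = std::max(_now_us(), _last_ts + 1);
        return _last_ts;
    }

    uint64_t last() const { return _last_ts; }

private:
    clock_fn _now_us;
    uint64_t _last_ts;
};
```

In this sketch a secondary that has called `try_update` with every replicated timestamp will, once promoted, hand out values larger than any it has seen, even if its own clock lags behind the old primary's.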

//
class uniq_timestamp_us {
private:
    uint64_t last_ts;
Contributor

Member variable names should start with an underscore.

public:
    uniq_timestamp_us() { last_ts = dsn_now_us(); }

    void try_update(uint64_t new_ts)
Contributor

Adding a comment here would make this clearer:

// Update the local timestamp (being a Secondary) to ensure 
// when it's elected as primary, the timestamp is monotonically
// increasing.

@shengofsun
Contributor Author

shengofsun commented Apr 12, 2018

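I thought carefully about how this increasing timestamp relates to the real (wall-clock) timestamp:

  1. It is always a timestamp no smaller than the machine's real time, which implies two things:
    (1) If two mutations arrive very close together, their timestamps are still guaranteed to differ.
    (2) If the machine clock jumps, especially backwards, our timestamp does not go backwards.

  2. Repeatedly comparing against dsn_now_us and taking the larger value keeps the timestamp reasonably close to the real time at which the write happened. This is fundamentally different from a logical clock such as "decree", and it is also what makes keys comparable across clusters.

Also, while discussing this with @neverchanje, the question came up whether a secondary could blindly overwrite its timestamp in on_prepare with the one sent by the primary. On reflection that does not seem like a good idea: the secondary may be a former primary that was demoted, so its local timestamp may be larger than the one it receives.

With the current implementation, the main way the timestamp can still jump backwards is when a new primary is elected whose mutation log has been entirely deleted and whose physical clock has also jumped backwards.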
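
To make the on_prepare concern concrete, the sketch below contrasts a blind overwrite with a max-merge for a demoted primary, and shows how taking the maximum with the wall clock absorbs a backward clock jump. All function names are hypothetical helpers written for this illustration, and the numbers are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical: what "blindly update on on_prepare" would amount to.
// The local value is simply replaced by whatever the primary sent.
inline uint64_t blind_update(uint64_t /*local_ts*/, uint64_t remote_ts)
{
    return remote_ts;
}

// Hypothetical: keep the larger of the local and the replicated timestamp,
// so the local value never decreases.
inline uint64_t max_update(uint64_t local_ts, uint64_t remote_ts)
{
    return std::max(local_ts, remote_ts);
}

// Hypothetical: pick the next timestamp from the last issued value and the
// current wall clock. The result is always greater than last_ts.
inline uint64_t next_ts(uint64_t last_ts, uint64_t now_us)
{
    return std::max(now_us, last_ts + 1);
}

// A former primary that already issued 2000 is demoted and then receives
// 1500 from the new primary.
inline void demoted_primary_example()
{
    const uint64_t local_ts = 2000;
    const uint64_t remote_ts = 1500;
    assert(blind_update(local_ts, remote_ts) < local_ts); // goes backwards
    assert(max_update(local_ts, remote_ts) == local_ts);  // stays put
}
```

With `next_ts`, a clock that reads 1200 after the replica has already issued 2000 still yields 2001, while a clock that has moved ahead to 5000 pulls the timestamp back to real time.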

@neverchanje
Contributor

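More precisely, the three replicas need to maintain a common increasing order of timestamps, so the secondary's timestamp must also increase monotonically and never decrease.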

@qinzuoyan
Member

qinzuoyan commented Apr 13, 2018

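In essence, under the replicated state machine model, the timestamps of the operations in the operation sequence must form a strictly increasing partial order (the same partial-order property that decree has). That way, for any given key, a later operation's timestamp is always greater than that of an earlier operation, which avoids the case where data written later fails to take effect.
However, the current approach cannot fully guarantee this, so it is not a perfect solution. Since it removes the lock, it does improve performance, so we can adopt it for now and improve it later.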
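
One way to see why a smaller timestamp on a later write matters: assuming writes to the same key are resolved by keeping the value with the larger timestamp (an assumption for this sketch, using a hypothetical `lww_store` class rather than Pegasus code), a later write that carries an older timestamp is silently dropped.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical value record: payload plus the timestamp of the write.
struct versioned_value
{
    std::string data;
    uint64_t timestamp_us;
};

// Hypothetical last-write-wins store keyed by string.
class lww_store
{
public:
    // Applies the write only if its timestamp is strictly newer than the
    // stored one. Returns true when the write took effect.
    bool apply(const std::string &key, const std::string &data, uint64_t timestamp_us)
    {
        auto it = _kv.find(key);
        if (it != _kv.end() && timestamp_us <= it->second.timestamp_us) {
            return false; // treated as stale, even if it was issued later
        }
        _kv[key] = versioned_value{data, timestamp_us};
        return true;
    }

    // Precondition: the key has been written at least once.
    const std::string &get(const std::string &key) const
    {
        auto it = _kv.find(key);
        assert(it != _kv.end());
        return it->second.data;
    }

private:
    std::map<std::string, versioned_value> _kv;
};
```

For example, if an old primary writes a key at timestamp 100 and a new primary later writes the same key at timestamp 90, the second `apply` returns false and the store keeps the first value.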

Member

@qinzuoyan qinzuoyan left a comment

LGTM

3 participants