Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: alloc timestamp after recover #1173

Merged
merged 9 commits into from
Aug 9, 2018
Merged

Conversation

rleungx
Copy link
Member

@rleungx rleungx commented Aug 1, 2018

What have you changed? (required)

This PR allows allocating timestamp after recovering from the incorrect system time.

What are the type of the changes (required)?

  • Improvement (non-breaking change which is an improvement to an existing feature)

How has this PR been tested (required)?

  • Manual tests

Refer to a related PR or issue link (optional)

#499, #940, pingcap/tidb#5709.

Benchmark result if necessary (optional)

Add a few positive/negative examples (optional)

@rleungx rleungx changed the title alloc timestamp after recover simulator: alloc timestamp after recover Aug 1, 2018
@rleungx rleungx added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 1, 2018
@rleungx rleungx changed the title simulator: alloc timestamp after recover server: alloc timestamp after recover Aug 1, 2018
server/tso.go Outdated
if subTimeByWallClock(last, now)+updateTimestampGuard > 0 {
log.Errorf("system time may be incorrect, last: %v, now: %v", last, now)
s.ts.Store(&atomicObject{physical: last})
return nil
Copy link
Contributor

@disksing disksing Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want to update now and save it to etcd.

server/tso.go Outdated
continue
// If the time window is enough, it will increase the physical time,
// otherwise, it will wait for updating the time window.
if subTimeByWallClock(s.lastSavedTime, current.physical) > updateTimestampGuard {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think lastSavedTime is accessed concurrently without any mutex.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree.

server/tso.go Outdated
log.Debugf("save timestamp ok: prev %v last %v save %v", prev, last, save)
// If the time window is no more than updateTimestampGuard,
// it will adjust the time window.
if subTimeByWallClock(saved, now) <= updateTimestampGuard {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better also validate the TsoSaveInterval large than updateTimestampGuard in config.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not in the same order of magnitude, and thus it doesn't make much sense. What do you think?

Copy link
Contributor

@nolouch nolouch Aug 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, It's ok. I before thought the updateTimestampGuard was 1s.

server/tso.go Outdated
}
// Since the physical timestamp is greater than the current system time,
// the physical timestamp will be used to alloc timestamp temporarily.
current.physical = prev
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logical part will clean to 0, but you do not increase the physical part.

@nolouch
Copy link
Contributor

nolouch commented Aug 2, 2018

PTAL @liukun4515

@rleungx rleungx removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 2, 2018
server/tso.go Outdated
physical: current.physical.Add(time.Millisecond),
logical: 0,
}
s.ts.Store(last)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

miss a continue?

@iamxy
Copy link
Contributor

iamxy commented Aug 2, 2018

/rebuild

@rleungx rleungx added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 2, 2018
@disksing
Copy link
Contributor

disksing commented Aug 2, 2018

I think we can refactor updateTimestamp() like this:

func updateTimestamp() {
    var next int64                             // step1. determine next physical timestamp
    if now > prev.physical + 1ms {            // a. if system is greater, sync with system time
        next = now
    }
    else if prev.logical > maxLogical/2 {          // b. if logical is about to use up, increase physical by 1ms
        next = prev.physical+1ms
    } else {                                      // otherwise the physical does not need update
        return
    }

    if saved - next <= 1ms {                  // step2. determine if we need to save ts to etcd
        saveTimestamp(next + 3s)          // when it is unsafe to increase physical to `next`, we save etcd so the saved will increase by 3s
    }

    s.ts.Store({next, 0})        // step3. update physical
    return nil
}

@rleungx rleungx force-pushed the tso-no-wait branch 4 times, most recently from 154689e to 2daa77f Compare August 2, 2018 13:45
@rleungx rleungx removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 2, 2018
@disksing
Copy link
Contributor

disksing commented Aug 3, 2018

I think we need more detailed comments.

server/tso.go Outdated
next = now
} else if atomic.LoadInt64(&prev.logical) > maxLogical/2 {
next = prev.physical.Add(time.Millisecond)
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tsoCounter.WithLabelValues("skip_save").Inc() before return nil

server/tso.go Outdated
if jetLag > updateTimestampGuard {
next = now
} else if atomic.LoadInt64(&prev.logical) > maxLogical/2 {
next = prev.physical.Add(time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add warning log here.

server/tso.go Outdated
// the timestamp allocation will start from the saved etcd timestamp temporarily.
if subTimeByWallClock(next, last) < updateTimestampGuard {
log.Errorf("system time may be incorrect: last: %v next %v", last, next)
next = next.Add(updateTimestampGuard)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next = last.Add(updateTimestampGuard)?

@disksing
Copy link
Contributor

disksing commented Aug 3, 2018

Please review this PR carefully. /cc @nolouch @siddontang @liukun4515

@rleungx rleungx force-pushed the tso-no-wait branch 5 times, most recently from 14ff8f2 to 4e9fd5b Compare August 3, 2018 02:52
@rleungx
Copy link
Member Author

rleungx commented Aug 3, 2018

@disksing PTAL.

//
// Here is some constraints that this function must satisfy:
// 1. The physical time is monotonically increasing.
// 2. The saved time is monotonically increasing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the physical should always less than saved timestamp.

Copy link
Contributor

@nolouch nolouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@siddontang
Copy link
Contributor

please construct a test for the time fallback.

@rleungx rleungx force-pushed the tso-no-wait branch 3 times, most recently from 3c15dc1 to 8ae950c Compare August 6, 2018 06:18
Copy link
Contributor

@siddontang siddontang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@siddontang
Copy link
Contributor

why add DNM label again? @rleungx

@rleungx
Copy link
Member Author

rleungx commented Aug 8, 2018

There is a concurrency problem in the test. @siddontang

@rleungx
Copy link
Member Author

rleungx commented Aug 9, 2018

The test will be added in another PR. I will create it once this PR is merged.

@rleungx rleungx removed the DNM label Aug 9, 2018
@disksing disksing merged commit aea8e1c into tikv:master Aug 9, 2018
nolouch pushed a commit to nolouch/pd that referenced this pull request Aug 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants