*: support auto-compaction with finer granularity #8563

Merged on Sep 29, 2017 (3 commits)

Conversation

fanminshi
Member

fixes #8503

@fanminshi fanminshi added the WIP label Sep 14, 2017
@fanminshi
Member Author

Some manual test result:

test 5s interval:

$ bin/etcd --auto-compaction-retention 5s
2017-09-14 16:11:32.726296 N | compactor: Starting auto-compaction at revision 1 (retention: 5s )
2017-09-14 16:11:32.729183 I | mvcc: store.index: compact 1
2017-09-14 16:11:32.729237 N | compactor: Finished auto-compaction at revision 1
2017-09-14 16:11:32.729302 I | mvcc: finished scheduled compaction at 1 (took 41.163µs)
2017-09-14 16:11:37.726362 N | compactor: Starting auto-compaction at revision 1 (retention: 5s )
2017-09-14 16:11:37.726519 N | compactor: Finished auto-compaction at revision 1
2017-09-14 16:11:42.726478 N | compactor: Starting auto-compaction at revision 1 (retention: 5s )
2017-09-14 16:11:42.726642 N | compactor: Finished auto-compaction at revision 1

test no compaction:

$ bin/etcd --auto-compaction-retention 0
2017-09-14 16:12:08.019460 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!

test that auto-compaction doesn't compact every time.Duration(1) (i.e. every nanosecond) when given a bare 1:

$ bin/etcd --auto-compaction-retention 1
2017-09-14 16:12:30.743382 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!

@fanminshi
Member Author

if the approach seems fine, I'll refactor the tests for compactor, as they are currently tightly coupled with the compactor's implementation.

if err == nil || err == mvcc.ErrCompacted {
plog.Noticef("Finished auto-compaction at revision %d", rev)
} else {
plog.Noticef("Failed auto-compaction at revision %d (%v)", err, rev)
Contributor

, rev, err?

Member Author

good catch

continue
}

rev, remaining := t.getRev(t.periodInHour)
Contributor

it's not necessary to remove this to support the feature and removing it completely changes the behavior of the periodic compactor

plog.Noticef("Failed auto-compaction at revision %d (%v)", err, rev)
plog.Noticef("Retry after %v", checkCompactionInterval)
t.mu.RUnlock()
rev := t.rg.Rev()
Contributor

this is wrong. we should keep the previous logic, which compacts at the revision from x duration ago.

@fanminshi
Member Author

@xiang90 I changed the compactor logic to the following:

  1. get store rev now.
  2. wait for retention period to pass.
  3. compact at rev from step 1.
  4. go to step 1.

@xiang90
Contributor

xiang90 commented Sep 26, 2017

why do we change the previous logic? i feel all we need to do there is to change the wait time from a fixed x hours to some user-given timeout, no?

@fanminshi
Member Author

fanminshi commented Sep 26, 2017

@xiang90 the previous logic has a checkCompactionInterval that defaults to 5 minutes, so the loop runs every 5 minutes to check whether it should compact. That works for any wait time > 5 minutes but not for < 5 minutes, so I got rid of it. Having a loop check every 5 minutes seems unnecessary. All the compactor really cares about is the rev at time.Now() - retentionPeriod, so I don't see the need to keep track of revs every 5 minutes and then derive the right rev through some math when it needs to compact.

@xiang90
Contributor

xiang90 commented Sep 26, 2017

@fanminshi

There are a few issues with your approach, which the old approach addressed.

  1. short retry time when error occurs (5 minutes instead of 1hr)
  2. not updating compaction rev when error occurs

I would suggest you understand how the previous code handles failures before simplifying it. The rev recording part can be simplified though.

@fanminshi
Member Author

@xiang90 i see. let me rethink the error handling part.

@fanminshi
Member Author

@xiang90 i added fast-retry behavior to the code.

@xiang90
Contributor

xiang90 commented Sep 26, 2017

this is getting more complicated. i am wondering why not keep the exact previous logic and change the checkInterval to min(user-specified-timeout/10, 1*second) or something.

@fanminshi
Member Author

@xiang90 checking every checkCompactionInterval to see if I should compact seems a bit heavy to me. I think there can be a more efficient way. Anyways, I can keep the original approach by changing checkCompactionInterval to an appropriate value.

@xiang90
Contributor

xiang90 commented Sep 26, 2017

checking every checkCompactionInterval to see if I should compact seems a bit heavy to me. I think there can be a more efficient way.

99% people will compact > 1 minute interval. that is max 0.16 read request / second, which is almost nothing.

@fanminshi fanminshi force-pushed the make_auto_compaction_granular branch 2 times, most recently from ff350b5 to da7f6d7 Compare September 26, 2017 23:27
@fanminshi
Member Author

fanminshi commented Sep 26, 2017

@xiang90 redid the PR with the previous approach. PTAL

edit: One thing to note: the original code compacts every hour after T > retentionPeriod. That must change for <1 hour compaction retention. So I change the compaction frequency to every checkCompactionInterval when T > retentionPeriod.

@fanminshi fanminshi removed the WIP label Sep 26, 2017
}
}

// N divides Periodic.period into checkCompactionInterval durations
var N = 10
Contributor

why not a const? why make this public?

Contributor

probably name this to something more descriptive too.

Member Author

yeah, i'll come up with a better name.

@@ -66,31 +70,27 @@ func (t *Periodic) Run() {
case <-t.ctx.Done():
return
case <-clock.After(checkCompactionInterval):
-t.mu.Lock()
+t.mu.RLock()
Contributor

why change to use rlock here?

Member Author

i noticed that we are just reading t.paused. so I used rlock.

Contributor

the critical area is super tight and not accessed frequently. it's probably not worth changing.

if you want to make it more efficient, do it in another PR. Let us keep one PR for one thing.

Member Author

alright then

@@ -113,8 +113,8 @@ func (t *Periodic) Resume() {
t.paused = false
}

-func (t *Periodic) getRev(h int) (int64, []int64) {
-i := len(t.revs) - int(time.Duration(h)*time.Hour/checkCompactionInterval)
+func (t *Periodic) getRev() (int64, []int64) {
Contributor

pass in n? probably we do not need the global defined const n.

Member Author

the n can be used in testing. that's why I had a const n.

Contributor

ah. ok. makes sense.
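
For readers following the getRev discussion, here is a standalone sketch of the lookup with n passed in. It assumes revisions are recorded once per check interval and that the check interval is period/n, so the revision from one retention period ago sits n entries from the end. The -1 return and the trimmed remainder are assumptions inferred from the `rev, remaining := ...` and `if rev < 0` lines quoted in this review, not a copy of the merged code:

```go
package main

import "fmt"

// getRev returns the revision recorded n check intervals ago, plus the
// newer revisions that should be kept. It returns -1 while fewer than n
// revisions have been recorded. Hypothetical free-function form.
func getRev(revs []int64, n int) (int64, []int64) {
	i := len(revs) - n
	if i < 0 {
		return -1, revs
	}
	return revs[i], revs[i+1:]
}

func main() {
	revs := []int64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
	rev, remaining := getRev(revs, 10)
	fmt.Println(rev, remaining)
}
```

Passing n explicitly also makes the function easy to test with small values.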

@xiang90
Contributor

xiang90 commented Sep 27, 2017

left some comments about minor issues. defer to @gyuho or @jpbetz for a final look.

@gyuho
Contributor

gyuho commented Sep 27, 2017

@fanminshi Fix Travis CI failures?

Contributor

@jpbetz jpbetz left a comment

Looks good in general. Thanks for handling the backward compatibility of the flags carefully.

Just a couple comments on code structure and documentation.

-fs.IntVar(&cfg.AutoCompactionRetention, "auto-compaction-retention", 0, "Auto compaction retention for mvcc key value store. 0 means disable auto compaction.")
-fs.StringVar(&cfg.AutoCompactionMode, "auto-compaction-mode", "periodic", "Interpret 'auto-compaction-retention' as hours when 'periodic', as revision numbers when 'revision'.")
+fs.StringVar(&cfg.AutoCompactionRetention, "auto-compaction-retention", "0", "Auto compaction retention for mvcc key value store. 0 means disable auto compaction.")
+fs.StringVar(&cfg.AutoCompactionMode, "auto-compaction-mode", "periodic", "'periodic' means hours if an integer or a duration string otherwise, 'revision' means revision numbers to retain by auto compaction")
Contributor

I found the Interpret 'auto-compaction-retention' as.. preamble helpful. It directly relates the two flags. Keep it? Maybe:

Interpret 'auto-compaction-retention' one of: periodic|revision. 'periodic' for duration based retention, defaulting to hours if no time unit is provided (e.g. '5m'). 'revision' for revision number based retention. ?

Member Author

your suggestion seems clearer. I'll change to that.

@@ -73,24 +77,20 @@ func (t *Periodic) Run() {
continue
}
}

-if clock.Now().Sub(last) < executeCompactionInterval {
+if clock.Now().Sub(last) < t.period {
Contributor

Can we simplify the timer logic a bit here? I found the logic a bit difficult to follow with the case <-clock.After(checkCompactionInterval): select case and the periodDivisor combined together.

Could we instead do something like

select {
...
case <-clock.After(t.period - clock.Now().Sub(last)):
  ...

and then eliminate this if clock.Now().Sub(last) < t.period { check and the periodDivisor?

(let me know if I'm missing an important subtlety in how the timers are being used here...)

Member Author

@jpbetz I am trying to minimize any change. we can refactor the code in a future pr if you want.

Contributor

Sounds good.

Contributor

Sounds good.

@@ -73,24 +77,20 @@ func (t *Periodic) Run() {
continue
}
}

if clock.Now().Sub(last) < executeCompactionInterval {
Contributor

While we're improving this function: can we move the remaining code in the for loop into the case <-clock.After for readability? It's only ever run as part of that case...

Member Author

can we refactor the code in a future pr?

@fanminshi
Member Author

made the auto-compaction-mode description clearer.

@xiang90
Contributor

xiang90 commented Sep 28, 2017

lgtm. defer to @gyuho

@jpbetz let's improve the code in the next pr.

@@ -29,8 +29,7 @@ var (
)

const (
-checkCompactionInterval = 5 * time.Minute
-executeCompactionInterval = time.Hour
+checkCompactionInterval = 5 * time.Minute
Contributor

this const can also be removed.

Member Author

checkCompactionInterval is used by the Revision compactor.
https://github.com/coreos/etcd/blob/master/compactor/revision.go#L64
I don't think it should be removed.

Contributor

ack

if rev < 0 {
continue
}

-plog.Noticef("Starting auto-compaction at revision %d (retention: %d hours)", rev, t.periodInHour)
+plog.Noticef("Starting auto-compaction at revision %d (retention: %v )", rev, t.period)
Contributor

s/(retention: %v )/(retention: %v)/

Member Author

done.

etcdmain/help.go Outdated
@@ -99,7 +99,7 @@ clustering flags:
--auto-compaction-retention '0'
auto compaction retention length. 0 means disable auto compaction.
--auto-compaction-mode 'periodic'
-'periodic' means hours, 'revision' means revision numbers to retain by auto compaction
+Interpret 'auto-compaction-retention' one of: periodic|revision. 'periodic' for duration based retention, defaulting to hours if no time unit is provided (e.g. '5m'). 'revision' for revision number based retention.
Contributor

interpret to make casing consistent?

Member Author

sgtm

h int
)
// AutoCompactionRetention defaults to "0" if not set.
if len(cfg.AutoCompactionRetention) == 0 {
Member Author

added a check to see whether AutoCompactionRetention is set.

@fanminshi
Member Author

merge when green.

@fanminshi fanminshi merged commit bcef78c into etcd-io:master Sep 29, 2017
_, err := t.c.Compact(t.ctx, &pb.CompactionRequest{Revision: rev})
if err == nil || err == mvcc.ErrCompacted {
t.revs = remaining
-last = clock.Now()
Contributor

why did we delete this line?

Member Author

good catch. I don't think this was meant to be deleted.

Contributor

yeah, it actually causes a bug; see #9443

Member Author

@fanminshi fanminshi Mar 16, 2018

Actually, I dug into why last = clock.Now() was deleted. It is explained here:

I change the compaction frequency to every checkCompactionInterval when T > retentionPeriod.
#8563 (comment)

I think that's the cause.

Successfully merging this pull request may close these issues.

support <1hr automatic compaction retention
6 participants