
Faster shard loading #5372

Merged: 9 commits merged into master on Mar 29, 2016

Conversation

@jwilder (Contributor) commented Jan 15, 2016

This PR improves startup times for databases with many shards. The main changes are:

  • Load shards concurrently (see the sketch below)
  • Avoid querying the TSM index multiple times to get keys and then types for those keys
  • Avoid some unnecessary allocations
  • Move the database in-memory index locking into the index type to allow shards to be loaded concurrently

Should help #5311.
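
For illustration, here is a minimal sketch of the first bullet: loading shards concurrently with a sync.WaitGroup and collecting errors. Shard, loadShard, and loadAll are hypothetical stand-ins, not the PR's actual types.

package main

import (
	"fmt"
	"sync"
)

// Shard and loadShard are placeholders for a real shard type and the work
// of opening its TSM files and building its index.
type Shard struct{ Path string }

func loadShard(sh *Shard) error {
	fmt.Println("loading", sh.Path)
	return nil
}

// loadAll opens every shard in its own goroutine and returns the first error.
func loadAll(shards []*Shard) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, sh := range shards {
		wg.Add(1)
		go func(sh *Shard) {
			defer wg.Done()
			if err := loadShard(sh); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(sh)
	}
	wg.Wait()
	if len(errs) > 0 {
		return errs[0]
	}
	return nil
}

func main() {
	_ = loadAll([]*Shard{{Path: "db/rp/1"}, {Path: "db/rp/2"}})
}

The commit notes further down add a cap on how many loads run at once, since each shard load can allocate a lot of memory.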

@jwilder (Contributor, Author) commented Jan 15, 2016

@dswarbrick @toddboom Would be interested to see if this improves startup times of some larger DBs.

@toddboom (Contributor) commented:

@jwilder tested this on a smaller database this morning with inconclusive results (there's one large shard that takes all of the time, so parallelization didn't matter), but will test it on the larger datasets this afternoon.

@thedrow commented Feb 3, 2016

This needs to be rebased FYI.

@jwilder (Contributor, Author) commented Feb 3, 2016

Yes. I may need to revert some changes in here as well.

@@ -203,23 +209,35 @@ func (f *FileStore) Remove(paths ...string) {
 	sort.Sort(tsmReaders(f.files))
 }

-func (f *FileStore) Keys() []string {
+func (f *FileStore) WalkKeys(fn func(key string, typ byte) error) error {
A reviewer (Contributor) commented:

Exported method needs a comment.
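
The diff above swaps a Keys() method that returned every key for a callback-style walk. As a rough illustration of that pattern (the index layout and types below are invented for this sketch, not the PR's actual FileStore internals):

package main

import "fmt"

// fileStore is a stand-in with a toy key -> block-type index.
type fileStore struct {
	index map[string]byte
}

// WalkKeys visits each key once, passing the key and its type to fn, so
// callers no longer materialize a slice of all keys and then query the
// index a second time for each key's type.
func (f *fileStore) WalkKeys(fn func(key string, typ byte) error) error {
	for key, typ := range f.index {
		if err := fn(key, typ); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fs := &fileStore{index: map[string]byte{"cpu,host=a#!~#value": 0}}
	_ = fs.WalkKeys(func(key string, typ byte) error {
		fmt.Println(key, typ)
		return nil
	})
}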

@e-dard (Contributor) commented Mar 23, 2016

LGTM

		values = getFloat64Values(nvals)
		for i := 0; i < nvals; i++ {
			values[i] = &FloatValue{}
		}
	case integerEntryType:
		values = getIntegerValues(nvals)
A reviewer (Contributor) commented:

delete line:571?

@jwilder (Contributor, Author) replied:

fixed.

Commit messages pushed to the branch:

When loading many shards concurrently, they block trying to acquire a write lock in the sync pool, adding a new source of contention. Since this code flow always needs to allocate a buffer, the pool isn't really buying us much.
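
A rough sketch of the problem that commit describes, assuming a pool guarded by a single mutex as the message suggests (bufPool and getBuf are invented names, not InfluxDB's actual pool code): every goroutine loading a shard serializes on the lock, while a plain allocation does not.

package main

import (
	"fmt"
	"sync"
)

// bufPool is a hypothetical pool guarded by one mutex; concurrent shard
// loads all contend on mu just to obtain a buffer.
type bufPool struct {
	mu   sync.Mutex
	bufs [][]byte
}

func (p *bufPool) get(n int) []byte {
	p.mu.Lock()
	defer p.mu.Unlock()
	if l := len(p.bufs); l > 0 {
		b := p.bufs[l-1]
		p.bufs = p.bufs[:l-1]
		if cap(b) >= n {
			return b[:n]
		}
	}
	return make([]byte, n)
}

// If the call site always needs a fresh buffer anyway, allocating directly
// removes the shared lock from the hot path.
func getBuf(n int) []byte {
	return make([]byte, n)
}

func main() {
	p := &bufPool{}
	fmt.Println(len(p.get(1024)), len(getBuf(1024)))
}
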
Since loading a shard can allocate a lot of memory, running them all
at once could OOM the process.  This limits the number of shards
loaded to 4.  This will be changed to a config option provided the
approach helps.
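
One common way to enforce such a cap in Go is a buffered channel used as a counting semaphore. The sketch below extends the earlier loading sketch with a limit of 4 in-flight loads; the names are placeholders, and the PR notes the value may later become a config option.

package main

import "sync"

type Shard struct{ Path string }

// loadShard stands in for opening a shard's files and index.
func loadShard(sh *Shard) error { return nil }

// loadAllLimited loads shards concurrently but keeps at most 4 in flight,
// using a buffered channel as a counting semaphore.
func loadAllLimited(shards []*Shard) {
	const maxConcurrent = 4
	sem := make(chan struct{}, maxConcurrent)
	var wg sync.WaitGroup
	for _, sh := range shards {
		wg.Add(1)
		sem <- struct{}{} // blocks while 4 loads are already running
		go func(sh *Shard) {
			defer wg.Done()
			defer func() { <-sem }()
			_ = loadShard(sh)
		}(sh)
	}
	wg.Wait()
}

func main() {
	loadAllLimited([]*Shard{{Path: "db/rp/1"}, {Path: "db/rp/2"}})
}
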
Avoids allocating a big map or all keys.
@jwilder merged commit b3c1320 into master on Mar 29, 2016
@jwilder deleted the jw-shard-load branch on March 29, 2016 at 19:11
@jwilder mentioned this pull request on Mar 31, 2016