
Move caching of the size of a directory to StoreDirectory. #30581

Merged
merged 7 commits into from
Jun 5, 2018

Conversation

jpountz
Contributor

@jpountz jpountz commented May 14, 2018

In spite of the existing caching, I have seen a number of nodes' hot threads
where one thread had been spending all its CPU on computing the size of a
directory. I am proposing to move the computation of the size of the directory
to `StoreDirectory` in order to skip recomputing the size of the directory if
no changes have been made. This should help users that have read-only
indices, which is very common for time-based indices.
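The idea in the description can be sketched as follows. This is a minimal illustration with hypothetical names (`SizeCachingDirectory`, `onFilesChanged`, `estimateSize`), not the merged Elasticsearch code: a modification counter is bumped whenever the directory's contents change, and the expensive size computation is skipped while the counter is unchanged.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

class SizeCachingDirectory {
    private final AtomicLong modCount = new AtomicLong(); // bumped on every file change
    private long cachedSize = -1;
    private long sizeModCount = -1;

    // A real directory wrapper would call this from createOutput/deleteFile.
    void onFilesChanged() {
        modCount.incrementAndGet();
    }

    // Returns the cached size unless the directory changed since it was computed.
    long estimateSize(LongSupplier computeSize) {
        long current = modCount.get();
        if (cachedSize < 0 || sizeModCount != current) {
            cachedSize = computeSize.getAsLong(); // the expensive walk over file lengths
            sizeModCount = current;
        }
        return cachedSize; // read-only directories never pay the recomputation cost again
    }
}
```

Under this scheme a shard that stops receiving writes answers every subsequent stats request from the cached value.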
@jpountz jpountz added the >enhancement and :Distributed Indexing/Store labels May 14, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@jpountz jpountz requested a review from s1monw May 18, 2018 13:28
Contributor

@s1monw s1monw left a comment


I have some issues with this change:

  • we introduced a time-based cache since some stats-endpoint hammered this which caused slowdowns
  • the stats didn't need to be accurate

This change will bring back the problems we tried to solve, since it will re-calculate once stuff has changed. I agree that it is only on live shards, but still. I think we should stick with the time-based model but also have some indicator to skip recalculation once we have an accurate number, i.e. can we have some mod-count on createOutput and friends (as well as once the stream is closed) and on deleteFile? That way we can skip the re-calculation on non-live shards and are still protected from hammering this endpoint.

The other option would be to move the cache out of Store entirely and cache at the IndexShard level. That way we can skip recalculation if the shard is idle. That would make the most sense to me. WDYT?

@jpountz
Contributor Author

jpountz commented May 22, 2018

@s1monw This PR actually implements your first suggestion already: it has kept the time-based model via SingleObjectCache, and it additionally skips store size computations on read-only shards by checking the mod count. So basically, the store size is only recomputed if the content of the directory has changed AND more than refresh_interval millis have elapsed.

I think the store is a better place for this than IndexShard since we care about inactivity in terms of written bytes as opposed to indexed documents, and background merges could cause variations of the store size on a shard that hasn't indexed any new documents?
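The combined policy jpountz describes can be sketched like this. The names (`StoreSizeCache`, `size`) are hypothetical stand-ins, not the actual SingleObjectCache API; the point is that the expensive walk runs only when the interval has elapsed AND the mod count moved:

```java
import java.util.function.LongSupplier;

class StoreSizeCache {
    private final long refreshIntervalMillis;
    private long lastRefreshMillis = -1;
    private long lastSeenModCount = -1;
    private long cachedSize;

    StoreSizeCache(long refreshIntervalMillis) {
        this.refreshIntervalMillis = refreshIntervalMillis;
    }

    long size(long nowMillis, long modCount, LongSupplier compute) {
        boolean intervalElapsed = lastRefreshMillis < 0
                || nowMillis - lastRefreshMillis >= refreshIntervalMillis;
        boolean directoryChanged = modCount != lastSeenModCount;
        if (intervalElapsed && directoryChanged) { // both conditions must hold
            cachedSize = compute.getAsLong();      // expensive directory walk
            lastRefreshMillis = nowMillis;
            lastSeenModCount = modCount;
        }
        return cachedSize;
    }
}
```

A read-only shard keeps the same mod count forever, so after the first computation it never recomputes, while a live shard is still rate-limited by the refresh interval.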

@s1monw
Contributor

s1monw commented May 22, 2018

> I think the store is a better place for this than IndexShard since we care about inactivity in terms of written bytes as opposed to indexed documents, and background merges could cause variations of the store size on a shard that hasn't indexed any new documents?

I still think the complexity here is very high for a minimal impact. Can we try to make this simpler?

@jpountz
Contributor Author

jpountz commented May 23, 2018

Why do you think the impact will be minimal? My reasoning was: suppose you have daily indices with a 30-day retention policy, and 5 shards for new indices, which are shrunk to 1 shard when they become read-only. Then you would have 5 read-write shards and 29 read-only shards, so this change would reduce the number of reads of file attributes by about 7x, since we would only need to compute the store size of 5 shards every 10 seconds instead of 34.
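The arithmetic behind the ~7x figure, under exactly those assumptions (daily indices, 30-day retention, 5 shards while read-write, shrunk to 1 shard when read-only):

```java
public class ShardMath {
    public static void main(String[] args) {
        int readWriteShards = 5;   // today's daily index
        int readOnlyShards = 29;   // 29 older indices, shrunk to 1 shard each
        int totalShards = readWriteShards + readOnlyShards;        // 34 shards polled before
        double reduction = (double) totalShards / readWriteShards; // 34 / 5 = 6.8, i.e. ~7x
        System.out.println(reduction); // prints 6.8
    }
}
```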

@ywelsch
Contributor

ywelsch commented May 23, 2018

I also believe the impact to be non-negligible here, especially for cases where there are many (inactive) shards on a node. Regarding code complexity, I think it's possible to detach the modification counting directory from the cache, see https://github.com/elastic/elasticsearch/compare/master...ywelsch:mod-count-directory?expand=1 as an alternative. Maybe you like this better, @s1monw?

@s1monw
Contributor

s1monw commented May 23, 2018

> Why do you think the impact will be minimal? My reasoning was: suppose you have daily indices with a 30-day retention policy, and 5 shards for new indices, which are shrunk to 1 shard when they become read-only. Then you would have 5 read-write shards and 29 read-only shards, so this change would reduce the number of reads of file attributes by about 7x, since we would only need to compute the store size of 5 shards every 10 seconds instead of 34.

Sorry, I meant code impact. I totally see the point of your change; I should have been more clear. What I had in mind is something that doesn't increment a counter on every little byte write. The granularity is too much for the benefit. I would rather implement this at the shard level and ask the engine if there are any pending or in-flight merges, i.e.:

if (isActive.get() || engine.hasRunningOrPendingMerges()) {
  doRefresh();
}

Also, these stats are shard-level stats; I think that is the right place for it?
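Fleshed out slightly, the gating s1monw suggests could look roughly like this. `Engine`, `isActive`, `doRefresh`, and `hasRunningOrPendingMerges` are stand-ins for IndexShard internals here, not the actual Elasticsearch API:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ShardStatsRefresher {
    interface Engine {
        boolean hasRunningOrPendingMerges();
    }

    private final AtomicBoolean isActive = new AtomicBoolean();
    private final Engine engine;
    int refreshCount = 0; // exposed only so the example is observable

    ShardStatsRefresher(Engine engine) {
        this.engine = engine;
    }

    void setActive(boolean active) {
        isActive.set(active);
    }

    void maybeRefresh() {
        // An idle shard with no merges serves the stale cached stats instead.
        if (isActive.get() || engine.hasRunningOrPendingMerges()) {
            doRefresh();
        }
    }

    private void doRefresh() {
        refreshCount++; // would recompute the store size here
    }
}
```

This avoids any per-byte bookkeeping: merges are the only way an idle shard's store can change size, so asking the engine about them covers that case.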

@jpountz
Contributor Author

jpountz commented May 23, 2018

@s1monw Would you rather like something along the lines of #30817?

@jpountz
Contributor Author

jpountz commented May 24, 2018

After further discussion with @s1monw I tried to move this option forward, but removed the write to an AtomicLong in every writeByte(s) call. @s1monw Can you take another look?
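The refinement discussed here can be sketched as follows. This is a hypothetical illustration using `ByteArrayOutputStream` in place of Lucene's IndexOutput, not the merged code: the mod count is bumped once when an output is closed or a file is deleted, so the hot `writeByte(s)` path never touches the shared counter.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicLong;

class ModCountingDirectory {
    final AtomicLong modCount = new AtomicLong();

    // Writes do not touch the counter at all; the bump happens once, on close.
    ByteArrayOutputStream createOutput() {
        return new ByteArrayOutputStream() {
            @Override
            public void close() {
                try {
                    super.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                modCount.incrementAndGet(); // one increment per file, not per byte
            }
        };
    }

    void deleteFile(String name) {
        modCount.incrementAndGet(); // deletions also invalidate the cached size
    }
}
```

The cached size only goes stale at file granularity, which is exactly when the directory size can actually change on disk.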

@jpountz jpountz requested a review from s1monw May 29, 2018 16:42
Contributor

@s1monw s1monw left a comment


LGTM

@s1monw
Contributor

s1monw commented Jun 4, 2018

@jpountz wanna push this?

@jpountz jpountz merged commit 03dcf22 into elastic:master Jun 5, 2018
@jpountz jpountz deleted the enhancement/cache_file_size branch June 5, 2018 07:01
martijnvg added a commit that referenced this pull request Jun 5, 2018
* es/master:
  Take into account the return value of TcpTransport.readMessageLength(...) in Netty4SizeHeaderFrameDecoder
  Move caching of the size of a directory to `StoreDirectory`. (#30581)
  Clarify docs about boolean operator precedence. (#30808)
  Docs: remove notes on sparsity. (#30905)
  Fix MatchPhrasePrefixQueryBuilderTests#testPhraseOnFieldWithNoTerms
  run overflow forecast a 2nd time as regression test for elastic/ml-cpp#110 (#30969)
  Improve documentation of dynamic mappings. (#30952)
  Decouple MultiValueMode. (#31075)
  Docs: Clarify constraints on scripted similarities. (#31076)
  Update get.asciidoc (#31084)
jpountz added a commit that referenced this pull request Jun 5, 2018
In spite of the existing caching, I have seen a number of nodes hot threads
where one thread had been spending all its cpu on computing the size of a
directory. I am proposing to move the computation of the size of the directory
to `StoreDirectory` in order to skip recomputing the size of the directory if
no changes have been made. This should help with users that have read-only
indices, which is very common for time-based indices.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jun 5, 2018
* elastic/master:
  [Tests] Muting RatedRequestsTests#testXContentParsingIsNotLenient
  TEST:  Retry synced-flush if ongoing ops on primary (elastic#30978)
  Fix docs build.
  Only auto-update license signature if all nodes ready (elastic#30859)
  Add BlobContainer.writeBlobAtomic() (elastic#30902)
  Add a doc value format to binary fields. (elastic#30860)
  Take into account the return value of TcpTransport.readMessageLength(...) in Netty4SizeHeaderFrameDecoder
  Move caching of the size of a directory to `StoreDirectory`. (elastic#30581)
  Clarify docs about boolean operator precedence. (elastic#30808)
  Docs: remove notes on sparsity. (elastic#30905)
  Fix MatchPhrasePrefixQueryBuilderTests#testPhraseOnFieldWithNoTerms
  run overflow forecast a 2nd time as regression test for elastic/ml-cpp#110 (elastic#30969)
  Improve documentation of dynamic mappings. (elastic#30952)
  Decouple MultiValueMode. (elastic#31075)
  Docs: Clarify constraints on scripted similarities. (elastic#31076)
dnhatn added a commit that referenced this pull request Jun 5, 2018
* 6.x:
  Share common readFrom/writeTo code in AcknowledgeResponse (#30983)
  [Tests] Muting RatedRequestsTests#testXContentParsingIsNotLenient
  Fix rest test skip version
  Fix docs build.
  Add a doc value format to binary fields. (#30860)
  Only auto-update license signature if all nodes ready (#30859)
  Add BlobContainer.writeBlobAtomic() (#30902)
  Move caching of the size of a directory to `StoreDirectory`. (#30581)
  Clarify docs about boolean operator precedence. (#30808)
  Docs: remove notes on sparsity. (#30905)
  Improve documentation of dynamic mappings. (#30952)
  Decouple MultiValueMode. (#31075)
  Docs: Clarify constraints on scripted similarities. (#31076)
@jpountz
Contributor Author

jpountz commented Jun 11, 2018

For the record, this change doesn't look like it made indexing slower: https://elasticsearch-benchmarks.elastic.co/index.html#tracks/nyc-taxis/nightly/30d.

Labels
:Distributed Indexing/Store >enhancement v6.4.0 v7.0.0-beta1
5 participants