
Page-based cache recycling #4559

Closed · wants to merge 1 commit

Conversation

@jpountz (Contributor) commented Dec 27, 2013

Here is an attempt to refactor cache recycling so that it only caches large
arrays (pages) that can later be used to build more complex data structures
such as hash tables.

  • QueueRecycler now takes a limit like other non-trivial recyclers.
  • A new PageCacheRecycler (inspired by CacheRecycler) can cache
    byte[], int[], long[], double[] or Object[] arrays using a fixed amount of
    memory (either globally or per-thread depending on the Recycler impl, e.g.
    queue is global while thread_local is per-thread).
  • Paged arrays in o.e.common.util can now optionally take a PageCacheRecycler
    to reuse existing pages (see the sketch below).
  • All aggregators' data structures now use PageCacheRecycler:
    • all arrays
    • LongHash can now take a PageCacheRecycler
    • there is a new BytesRefHash (inspired by Lucene's but still quite
      different; for instance, it cheats on BytesRef comparisons by using Unsafe)
      that also takes a PageCacheRecycler

Close #4557
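
As a rough illustration of the approach (not the actual Elasticsearch classes; PageRecyclerSketch and PagedLongArraySketch are made-up names for this sketch), a paged structure grows by borrowing fixed-size pages from a bounded recycler and hands every page back when it is released:

// Illustrative sketch only, not the PR's code: a bounded recycler of fixed-size
// pages and a paged long array that borrows pages from it and returns them on close.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class PageRecyclerSketch {
    static final int PAGE_SIZE = 16384;                 // entries per page
    private final Deque<long[]> freePages = new ArrayDeque<>();
    private final int maxCachedPages;                   // fixed amount of cached memory

    PageRecyclerSketch(int maxCachedPages) {
        this.maxCachedPages = maxCachedPages;
    }

    long[] obtainPage() {
        long[] page = freePages.poll();                 // reuse a cached page when possible
        return page != null ? page : new long[PAGE_SIZE];
    }

    void releasePage(long[] page) {
        if (freePages.size() < maxCachedPages) {        // keep the cache bounded
            freePages.push(page);                       // a real recycler would also reset page contents
        }
    }
}

final class PagedLongArraySketch implements AutoCloseable {
    private final PageRecyclerSketch recycler;
    private final List<long[]> pages = new ArrayList<>();

    PagedLongArraySketch(PageRecyclerSketch recycler, long capacity) {
        this.recycler = recycler;
        while ((long) pages.size() * PageRecyclerSketch.PAGE_SIZE < capacity) {
            pages.add(recycler.obtainPage());           // grow by whole pages
        }
    }

    long get(long index) {
        return pages.get((int) (index / PageRecyclerSketch.PAGE_SIZE))
                [(int) (index % PageRecyclerSketch.PAGE_SIZE)];
    }

    void set(long index, long value) {
        pages.get((int) (index / PageRecyclerSketch.PAGE_SIZE))
                [(int) (index % PageRecyclerSketch.PAGE_SIZE)] = value;
    }

    @Override
    public void close() {                               // hand every page back for reuse
        for (long[] page : pages) {
            recycler.releasePage(page);
        }
        pages.clear();
    }
}

The point of recycling pages rather than whole data structures is that the same bounded pool of arrays can back hash tables, counters and any other paged structure the aggregators build.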

        }
        context.queryResult().aggregations(new InternalAggregations(aggregations));
    } finally {
        for (Aggregator aggregator : aggregators) {
A Contributor commented on this change:

do we have some utils for Releasable like IOUtils.close(Closeable) that ensures that all of them are released?
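
For what it's worth, a helper in that spirit could look like the sketch below. The Releasable interface and ReleasableUtils class here are stand-ins for illustration, not the project's actual types; the idea mirrors IOUtils.close(Closeable...): attempt to release everything, then rethrow the first failure.

// Stand-in types for illustration; mirrors the IOUtils.close(Closeable...) pattern:
// release every resource even if some fail, then rethrow the first failure.
interface Releasable {
    void release();
}

final class ReleasableUtils {
    private ReleasableUtils() {}

    static void releaseAll(Releasable... releasables) {
        RuntimeException first = null;
        for (Releasable releasable : releasables) {
            if (releasable == null) {
                continue;                       // tolerate nulls, like IOUtils does
            }
            try {
                releasable.release();
            } catch (RuntimeException e) {
                if (first == null) {
                    first = e;                  // remember the first failure
                } else {
                    first.addSuppressed(e);     // keep the rest as suppressed
                }
            }
        }
        if (first != null) {
            throw first;                        // everything was attempted before rethrowing
        }
    }
}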

@ghost assigned jpountz on Dec 31, 2013
@jpountz (Contributor, Author) commented Jan 2, 2014

@kimchy @s1monw I pushed a new commit that addresses your concerns:

  • uses componentSettings
  • use of Releasable and Releasable.release(Releasable...)
  • try/finally for release
  • better configuration of the cache size, by default 10% of the heap size

public static ByteSizeValue parseBytesSizeValueOrHeapRatio(String sValue) {
    if (sValue.endsWith("%")) {
        double percent = Double.parseDouble(sValue.substring(0, sValue.length() - 1));
        return new ByteSizeValue((long) ((percent / 100) * JvmInfo.jvmInfo().getMem().getHeapMax().bytes()), ByteSizeUnit.BYTES);
A Member commented on this change:

I think putting the heap knowledge in here is the wrong place, encapsulation-wise. Maybe add a getAsMemory to Settings that will do this?
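
To make the suggestion concrete, a settings-level helper could resolve a value either as an absolute size or as a ratio of the maximum heap, roughly as sketched below. The name getAsMemoryBytes and the parsing shown here are illustrative assumptions, not the method that was eventually added.

// Hypothetical helper: resolve "512mb" as an absolute size and "10%" as a ratio
// of the maximum heap. Real code would delegate to the existing ByteSizeValue parsing.
final class MemorySettingSketch {
    static long getAsMemoryBytes(String value) {
        if (value.endsWith("%")) {
            double percent = Double.parseDouble(value.substring(0, value.length() - 1));
            long heapMaxBytes = Runtime.getRuntime().maxMemory();   // stand-in for JvmInfo's heap max
            return (long) ((percent / 100) * heapMaxBytes);
        }
        if (value.endsWith("mb")) {                                  // crude absolute-size handling for the sketch
            return Long.parseLong(value.substring(0, value.length() - 2)) * 1024 * 1024;
        }
        return Long.parseLong(value);                                // plain byte count
    }
}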

@jpountz (Contributor, Author) commented Jan 3, 2014

New commit pushed:

  • added double-release detection (see the sketch below)
  • better separation of concerns: parseBytesSizeValueOrHeapRatio moved to MemorySizeValue
  • added comments about the weighting per data type
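
A minimal sketch of what double-release detection can look like (illustrative, not the committed code): guard the release with an atomic flag so the second call fails loudly instead of silently corrupting the recycler's free list.

import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative guard: release() may only succeed once; a second call is a bug.
final class ReleaseOnceSketch {
    private final AtomicBoolean released = new AtomicBoolean(false);
    private final Runnable doRelease;

    ReleaseOnceSketch(Runnable doRelease) {
        this.doRelease = doRelease;
    }

    void release() {
        if (!released.compareAndSet(false, true)) {
            throw new IllegalStateException("already released");   // double-release detected
        }
        doRelease.run();
    }
}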


private static int maximumSearchThreadPoolSize(ThreadPool threadPool) {
    final Executor executor = threadPool.executor(ThreadPool.Names.SEARCH);
    return ((ThreadPoolExecutor) executor).getMaximumPoolSize();
}
A Member commented on this change:

Can we use the ThreadPool#info API here, so we don't have to cast to ThreadPoolExecutor, and get the max back through the info class?
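
The gist of the suggestion, sketched with stand-in classes (the real ThreadPool#info API's accessors may differ): keep the ThreadPoolExecutor cast in one place and let callers read the maximum pool size from an info object.

import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

// Stand-in sketch: callers read pool metadata from an info object instead of
// casting the Executor themselves. Names are illustrative, not the actual API.
final class SearchPoolInfoSketch {
    static final class Info {
        private final int max;
        Info(int max) { this.max = max; }
        int getMax() { return max; }
    }

    private final ThreadPoolExecutor searchPool =
            (ThreadPoolExecutor) Executors.newFixedThreadPool(8);

    Info info() {
        return new Info(searchPool.getMaximumPoolSize());   // the only place that knows the concrete type
    }

    int maximumSearchThreadPoolSize() {
        return info().getMax();                              // no ThreadPoolExecutor cast at the call site
    }
}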

@Inject
public MockPageCacheRecycler(Settings settings, ThreadPool threadPool) {
    super(settings, threadPool);
    random = new Random(0);
A Contributor commented on this change:

The settings should get a random seed per index. Maybe I should pass on the node seed from the test cluster as well, so you can get it here?

@jpountz (Author) replied:

This would be nice.

A Contributor replied:

I added this here 602c63d
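
As an aside, the idea boils down to something like the sketch below: the mock derives its Random from a seed carried in the node settings rather than a constant. The setting key used here is made up; the actual wiring is in the referenced commit.

import java.util.Map;
import java.util.Random;

// Illustrative only: take the seed from per-node settings instead of new Random(0).
// The key "test.node.seed" is hypothetical.
final class SeededMockRecyclerSketch {
    private final Random random;

    SeededMockRecyclerSketch(Map<String, String> settings) {
        long seed = Long.parseLong(settings.getOrDefault("test.node.seed", "0"));
        this.random = new Random(seed);
    }

    Random random() {
        return random;
    }
}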

@s1monw (Contributor) commented Jan 3, 2014

Added some smallish comments - looks really good though. Let's get this in soon!

Refactor cache recycling so that it only caches large arrays (pages) that can
later be used to build more complex data structures such as hash tables.

 - QueueRecycler now takes a limit like other non-trivial recyclers.
 - A new PageCacheRecycler (inspired by CacheRecycler) can cache
   byte[], int[], long[], double[] or Object[] arrays using a fixed amount of
   memory (either globally or per-thread depending on the Recycler impl, e.g.
   queue is global while thread_local is per-thread).
 - Paged arrays in o.e.common.util can now optionally take a PageCacheRecycler
   to reuse existing pages.
 - All aggregators' data structures now use PageCacheRecycler:
   - for all arrays (counts, mins, maxes, ...)
   - LongHash can now take a PageCacheRecycler
   - there is a new BytesRefHash (inspired by Lucene's but still quite
     different; for instance, it cheats on BytesRef comparisons by using Unsafe)
     that also takes a PageCacheRecycler

Close elastic#4557
@jpountz (Contributor, Author) commented Jan 6, 2014

Thanks for the comments, @s1monw. I rebased and did the changes you suggested:

  • new Releasable.release(boolean success, Releasable... releasables) that forwards to release when success is true and to releaseWhileHandlingException otherwise (see the sketch below)
  • fixed style
  • MockPageCacheRecycler gets the seed from TestCluster
  • s/AssertionError/ElasticsearchIllegalStateException/
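
A sketch of that helper's contract, with a stand-in interface (the method names release and releaseWhileHandlingException come from the discussion above; everything else is illustrative): on the happy path resources are released normally, while on the error path release failures must not mask the exception that is already propagating.

// Stand-in sketch of release(boolean success, Releasable... releasables).
final class ReleasablesSketch {
    interface Releasable {
        void release();                          // may throw on failure
        void releaseWhileHandlingException();    // must never throw: an exception is already in flight
    }

    static void release(boolean success, Releasable... releasables) {
        for (Releasable releasable : releasables) {
            if (releasable == null) {
                continue;
            }
            if (success) {
                releasable.release();
            } else {
                releasable.releaseWhileHandlingException();
            }
        }
    }
}

Typical usage is a call to release(success, resources) in a finally block, where success is only set to true once the try block has completed.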

@s1monw (Contributor) commented Jan 6, 2014

LGTM +1 to push

@jpountz closed this Jan 6, 2014
@jpountz deleted the fix/cache_recycling branch on January 6, 2014 at 18:08
Successfully merging this pull request may close: Improving cache recycling (#4557)