Make BytesStreamOutput more efficient #5318

hhoffstaette · 2014-03-03T14:58:03Z

Makes BytesStreamOutput more efficient by introducing internal paging via a (currently private) non-recycling PageCacheRecycler. This change brought to light a bug in FsTranslog, which used seek() incorrectly/ambiguously; this is fixed so that it works with both the old and new implementation.

Fix for bug #5159

jpountz · 2014-03-03T16:41:56Z

src/main/java/org/elasticsearch/common/io/stream/BytesStreamOutput.java

+        // acquire all other requested pages up front if expectedSize > pageSize
+        if (expectedSize > pageSize) {
+            ensureCapacity(expectedSize);
+        }


I'm wondering if there is a benefit in pre-allocating those pages compared to requesting them when necessary?

We only preallocate (pre-request) pages when the client supplies an initial capacity. By default we acquire a single page to get started. Strictly speaking, preallocation is not necessary, but clients that announce a capacity goal are expected to use it. Also, once we actually start using a real recycler, this will reduce contention, since all pages are obtained in a single acquire.
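
For illustration, the preallocation behaviour described above could be sketched roughly as follows. This is a minimal stand-alone sketch, not the code from the PR: the class `PreallocatingPagedBuffer`, its constructor, and `pageCount()` are hypothetical names, and plain `byte[]` pages stand in for recycler-provided pages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the PR's BytesStreamOutput: pages are plain
// byte[] arrays rather than pages handed out by a recycler.
final class PreallocatingPagedBuffer {
    private final int pageSize;
    private final List<byte[]> pages = new ArrayList<>();

    PreallocatingPagedBuffer(int expectedSize, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSize = pageSize;
        // By default, start with a single page.
        pages.add(new byte[pageSize]);
        // Acquire all other requested pages up front if expectedSize > pageSize.
        if (expectedSize > pageSize) {
            ensureCapacity(expectedSize);
        }
    }

    // Adds pages until the total capacity is at least minCapacity bytes.
    void ensureCapacity(long minCapacity) {
        while ((long) pages.size() * pageSize < minCapacity) {
            pages.add(new byte[pageSize]);
        }
    }

    int pageCount() {
        return pages.size();
    }
}
```

With a page size of 16, a buffer created without a capacity hint holds one page, while one created with an expected size of 40 holds three pages from the start.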

jpountz · 2014-03-03T16:57:34Z

Part of the code looks very similar to what we have in BigByteArray so I'm wondering if we could somehow share some code between these classes?

jpountz · 2014-03-03T16:59:15Z

src/main/java/org/elasticsearch/common/io/stream/BytesStreamOutput.java

+        byte[] page = pages.get(count / pageSize).v();
+        int offset = count % pageSize;
+        page[offset] = b;
+        count++;
    }

    @Override
    public void writeBytes(byte[] b, int offset, int length) throws IOException {


When doing many small writes, it may help to keep a reference on the current page to not have to recompute it every time?

That's a good point - for now I was more concerned about getting it right every time, but adding a currentPage pointer should not be too difficult. Since we will likely revisit this code soon I'd like to keep this as it is for now.
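
A `currentPage` pointer along the lines suggested above might look something like this. It is only a sketch under assumptions: `CachedPageWriter` and its members are hypothetical names, pages are plain `byte[]` arrays, and the PR itself deliberately left this optimisation out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keeps a reference to the page being written so that
// small writes avoid a division, a modulo and a list lookup per byte.
final class CachedPageWriter {
    private final int pageSize;
    private final List<byte[]> pages = new ArrayList<>();
    private byte[] currentPage;   // page currently being filled
    private int offsetInPage;     // next free slot in currentPage
    private long count;           // total number of bytes written

    CachedPageWriter(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSize = pageSize;
        this.currentPage = new byte[pageSize];
        pages.add(currentPage);
    }

    void writeByte(byte b) {
        // Only touch the page list when the current page is full.
        if (offsetInPage == pageSize) {
            currentPage = new byte[pageSize];
            pages.add(currentPage);
            offsetInPage = 0;
        }
        currentPage[offsetInPage++] = b;
        count++;
    }

    // Random access still computes the page index and offset from the position.
    byte get(long index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return pages.get((int) (index / pageSize))[(int) (index % pageSize)];
    }

    long size() {
        return count;
    }
}
```

The trade-off is an extra pair of fields that must stay consistent with `count`, which is presumably why getting the simple version right came first.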

jpountz · 2014-03-03T17:25:42Z

I just discussed this change with @kimchy, and it might make sense to use BigArrays internally to implement paging. This should make things easier to implement, as it already has the logic for writes that span multiple pages, and there is a BigArrays.NON_RECYCLING_INSTANCE static variable that would remove the need for NonePageCacheRecyclerService. @hhoffstaette what do you think?
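
To make "writes that may span across pages" concrete, here is a small stand-alone sketch of the chunked copy such a write needs. It does not use the BigArrays API; `SpanningPagedWriter` and its methods are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a bulk write that crosses page boundaries.
// It illustrates the kind of logic a paged byte array must provide;
// it is not the BigArrays/ByteArray implementation.
final class SpanningPagedWriter {
    private final int pageSize;
    private final List<byte[]> pages = new ArrayList<>();
    private int count; // total number of bytes written

    SpanningPagedWriter(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSize = pageSize;
    }

    void writeBytes(byte[] b, int offset, int length) {
        if (offset < 0 || length < 0 || length > b.length - offset) {
            throw new IndexOutOfBoundsException("offset/length out of range");
        }
        int src = offset;
        int remaining = length;
        while (remaining > 0) {
            int pageIndex = count / pageSize;
            int pageOffset = count % pageSize;
            if (pageIndex == pages.size()) {
                pages.add(new byte[pageSize]); // grow on demand
            }
            // Copy as much as fits into the rest of the current page.
            int chunk = Math.min(remaining, pageSize - pageOffset);
            System.arraycopy(b, src, pages.get(pageIndex), pageOffset, chunk);
            src += chunk;
            count += chunk;
            remaining -= chunk;
        }
    }

    // Copies the written bytes out page by page into one contiguous array.
    byte[] toByteArray() {
        byte[] out = new byte[count];
        int copied = 0;
        for (int i = 0; copied < count; i++) {
            int chunk = Math.min(pageSize, count - copied);
            System.arraycopy(pages.get(i), 0, out, copied, chunk);
            copied += chunk;
        }
        return out;
    }
}
```

Reusing an existing paged array that already handles this loop would avoid maintaining a second copy of it in the stream class.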

This commit ensures that for external builds (e.g. plugin development) the copied REST tests are properly filtered to include only the API by default. Prior to this change, both the API and the tests were included, because copy.include resolved to an empty list by default: the stream is empty unless explicitly configured. Related #52114, fixes #53183

hhoffstaette added 2 commits March 3, 2014 15:55

Minor style fixes, update BlobStoreRepository similar to FsTranslog.

d5278df

jpountz reviewed Mar 3, 2014

polyfractal and others added 4 commits March 4, 2014 12:03

@polyfractal

Percentiles aggregation.

8bf1cad

A new metric aggregation that can compute approximate values of arbitrary percentiles. Close #5323

Fix test bug: a too low compression level can make accuracy terrible.

a0935bc

[DOCS] Java API JSON typo

778ddb2

Rewrite on top of BigArrays/ByteArray. Delete NonePageCacheRecycler.

4c8b076

hhoffstaette closed this Mar 4, 2014

hhoffstaette deleted the bytestream branch March 4, 2014 11:41
