
Use segmented Slice in SliceDictionaryWriter #15956

Merged

highker merged 1 commit into prestodb:master from arunthirupathi:segmented_slice_builder_2

Apr 26, 2021

arunthirupathi commented Apr 17, 2021

Store dictionary elements in Segmented Slices instead of
one contiguous segment. When the dictionary has fewer than
100,000 elements, there is no noticeable performance
degradation. When the dictionary reaches 10,000,000 elements,
sorting and comparing elements has to compute the segment and
offset on each access, which makes it about 10% slower.
That case is unlikely, though.
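
To illustrate where the extra segment/offset computation comes from, here is a minimal sketch of segmented storage. It is not the code in this PR: the class `SegmentedStringStore`, the tiny segment size, and the use of `String` elements instead of Slices are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the PR's SegmentedSliceBlockBuilder.
final class SegmentedStringStore
{
    // Assumed segment size; a power of two lets segment and offset
    // be derived from a position with a shift and a mask.
    static final int SEGMENT_SHIFT = 4;
    static final int SEGMENT_SIZE = 1 << SEGMENT_SHIFT;
    static final int SEGMENT_MASK = SEGMENT_SIZE - 1;

    private final List<String[]> segments = new ArrayList<>();
    private int size;

    static int segmentOf(int position)
    {
        return position >>> SEGMENT_SHIFT;
    }

    static int offsetOf(int position)
    {
        return position & SEGMENT_MASK;
    }

    void add(String value)
    {
        // Allocate a new fixed-size segment instead of growing one large array.
        if (offsetOf(size) == 0) {
            segments.add(new String[SEGMENT_SIZE]);
        }
        segments.get(segmentOf(size))[offsetOf(size)] = value;
        size++;
    }

    String get(int position)
    {
        if (position < 0 || position >= size) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        return segments.get(segmentOf(position))[offsetOf(position)];
    }

    // Every comparison resolves two positions to (segment, offset) pairs.
    // With contiguous storage this extra step would not be needed.
    int compare(int leftPosition, int rightPosition)
    {
        return get(leftPosition).compareTo(get(rightPosition));
    }

    int size()
    {
        return size;
    }
}
```

In this sketch the benefit is that no single allocation grows with the dictionary, and the cost is the per-access shift and mask, which becomes visible when sorting very large dictionaries.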

Test plan -
Added new test cases for the SegmentedSlices.
Dictionary is covered by existing tests.

== RELEASE NOTES ==

General Changes
* Store dictionary elements in Segmented Slice.

arunthirupathi added the draft label

arunthirupathi marked this pull request as draft

April 17, 2021 07:22

arunthirupathi removed the draft label

arunthirupathi force-pushed the segmented_slice_builder_2 branch from dac8e5c to 668c781 Compare

April 17, 2021 18:23

arunthirupathi changed the title ~~Segmented Slice builder~~ Use segmented Slice in SliceDictionaryWriter

arunthirupathi force-pushed the segmented_slice_builder_2 branch from 668c781 to 1bbc8e0 Compare

April 17, 2021 18:41

arunthirupathi requested review from sdruzkin, highker and bhhari

April 17, 2021 18:43

arunthirupathi marked this pull request as ready for review

April 17, 2021 18:43

arunthirupathi force-pushed the segmented_slice_builder_2 branch from 1bbc8e0 to 4762596 Compare

April 17, 2021 22:10

Author

arunthirupathi commented Apr 18, 2021

Here is the performance comparison before and after this change. "Base" is before this change; "Segmented" is with this change. The Direct path is unchanged between the two, so both Direct columns should be 100%, and the measured results are close enough.

MIN of Score, by benchmark

| typeSignature | uniqueValuesPercentage | Direct.Segmented | Direct.Base | DictionaryToDirect.Segmented | DictionaryToDirect.Base | Dictionary.Segmented | Dictionary.Base |
|---|---|---|---|---|---|---|---|
| varchar | 1 | 100.50% | 100.00% | 316.04% | 314.30% | 288.96% | 280.23% |
| varchar | 5 | 100.00% | 113.50% | 424.47% | 371.08% | 432.08% | 428.40% |
| varchar | 10 | 100.00% | 105.99% | 537.32% | 518.47% | 702.22% | 711.15% |
| varchar | 100 | 100.00% | 106.36% | 1169.77% | 1293.29% | 5512.29% | 4659.38% |

arunthirupathi force-pushed the segmented_slice_builder_2 branch from 4762596 to 942e3ca Compare

April 20, 2021 00:30

highker reviewed


Contributor

highker left a comment

mostly nits


presto-orc/src/main/java/com/facebook/presto/orc/writer/SegmentedSliceBlockBuilder.java Outdated

Comment on lines 294 to 291

    
    public BlockBuilder newBlockBuilderLike(BlockBuilderStatus blockBuilderStatus, int expectedEntries)
    {
        if (blockBuilderStatus != null) {
            throw new UnsupportedOperationException("Not yet implemented");
        }

Contributor

highker Apr 22, 2021

blockBuilderStatus is actually fairly important. QQ: Is newBlockBuilderLike used anywhere in orc package? If not, just throw?

Author

arunthirupathi Apr 23, 2021

This blockBuilder is only used by the SliceDictionaryBuilder and it passes in null for the blockBuilderStatus.

https://github.com/prestodb/presto/blob/master/presto-orc/src/main/java/com/facebook/presto/orc/writer/SliceDictionaryBuilder.java#L51


presto-orc/src/main/java/com/facebook/presto/orc/writer/SegmentedSliceBlockBuilder.java Outdated

    
    private final DynamicSliceOutput openSliceOutput;
    private int openSegment;

Contributor

highker Apr 22, 2021

s/openSegment/openSegmentIndex, if I read the code correctly

Author

arunthirupathi Apr 23, 2021

You are right, renamed it.


presto-orc/src/main/java/com/facebook/presto/orc/writer/SegmentedSliceBlockBuilder.java

    
        offsets[openSegment][openSegmentOffset] = openSliceOutput.size();
        if (openSegmentOffset == SegmentHelper.SEGMENT_SIZE) {
            // Add the current finalized slice to closedSlices
            Slice slice = openSliceOutput.copySlice();

Contributor

highker Apr 22, 2021

We do a copy to save space? We usually don't call this method to avoid heavy GC. Maybe slice() is good enough.

Author

arunthirupathi Apr 23, 2021

slice() only returns a view of the underlying buffer. Once the segment is full, the bytes are copied out with copySlice(), and the DynamicSliceOutput is reset and reused for the new segment, so a view would end up pointing at the next segment's data.
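
The difference between a view and a copy of a reused buffer can be sketched with plain JDK types. This is an illustration only: `ReusableBuffer` is a hypothetical stand-in for the reused output buffer, and it uses `ByteBuffer` and `byte[]` rather than the Slice classes discussed above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a reusable output buffer (fixed capacity, no growth).
final class ReusableBuffer
{
    private final byte[] bytes = new byte[64];
    private int size;

    void write(String text)
    {
        byte[] encoded = text.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(encoded, 0, bytes, size, encoded.length);
        size += encoded.length;
    }

    // A view shares memory with the buffer; no bytes are copied.
    ByteBuffer view()
    {
        return ByteBuffer.wrap(bytes, 0, size).slice();
    }

    // A copy owns its bytes and is unaffected by later writes.
    byte[] copy()
    {
        return Arrays.copyOf(bytes, size);
    }

    // Reuse the same backing array for the next segment.
    void reset()
    {
        size = 0;
    }

    // Decode a view without disturbing its position.
    static String text(ByteBuffer buffer)
    {
        return StandardCharsets.US_ASCII.decode(buffer.duplicate()).toString();
    }
}
```

If a segment is closed by taking `view()` and the buffer is then reset and written again, the closed segment reads back the new bytes. Taking `copy()` costs one allocation per closed segment but keeps the closed data stable.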


arunthirupathi force-pushed the segmented_slice_builder_2 branch from 942e3ca to 8c884ea Compare

April 23, 2021 05:47

arunthirupathi requested a review from highker

April 23, 2021 05:52

arunthirupathi force-pushed the segmented_slice_builder_2 branch 2 times, most recently from 21cd7b1 to 9bf70ba Compare

April 23, 2021 23:17

highker approved these changes


Contributor

highker left a comment

nits only


highker self-assigned this


          Use segmented Slice in SliceDictionaryWriter

16e9b84


arunthirupathi force-pushed the segmented_slice_builder_2 branch from 9bf70ba to 16e9b84 Compare

April 26, 2021 01:43

arunthirupathi requested a review from highker

April 26, 2021 01:44

highker merged commit 53ebcda into prestodb:master

vaishnavibatni mentioned this pull request

Add release notes for 0.252 #16013

Merged

3 tasks

arhimondr mentioned this pull request

Revert "Use segmented Slice in SliceDictionaryWriter" #16173

Merged
