
error on value counter overflow instead of writing sad segments #9559

Merged · 1 commit · Mar 26, 2020

Conversation

clintropolis (Member) commented Mar 25, 2020

Description

This PR fixes an integer overflow issue that occurs if too many values are written to a column within a single segment, preferring to fail at segment creation time instead of writing a sad segment with negative values.

This issue was first noticed with multi-value columns, exploding at query time with an error of the form:

org.apache.druid.java.util.common.IAE: Index[-131072] < 0
	at org.apache.druid.segment.data.GenericIndexed.checkIndex(GenericIndexed.java:269)
	at org.apache.druid.segment.data.GenericIndexed.access$300(GenericIndexed.java:79)
	at org.apache.druid.segment.data.GenericIndexed$3.get(GenericIndexed.java:696)
	at org.apache.druid.segment.data.CompressedVSizeColumnarIntsSupplier$CompressedVSizeColumnarInts.loadBuffer(CompressedVSizeColumnarIntsSupplier.java:437)
	at org.apache.druid.segment.data.CompressedVSizeColumnarIntsSupplier$CompressedVSizeColumnarInts.get(CompressedVSizeColumnarIntsSupplier.java:350)
	at org.apache.druid.segment.data.SliceIndexedInts.get(SliceIndexedInts.java:60)

but would also occur for any serializer given more than Integer.MAX_VALUE rows as input. tl;dr: too many values were written to a single segment, so the 'offsets' portion of the multi-value column overflowed into negative numbers.

To fix, primitive column serializers now check the number of values (the row count in most cases, the total number of values for multi-value strings) to ensure that it does not exceed what can be expressed in the column header and will not cause issues at query time. A new exception, ColumnCapacityExceededException, has been added; it produces an error message of the form

Too many values to store for %s column, try reducing maxRowsPerSegment

where %s is the column name (which all the serializers now know).
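
For a sense of the shape of the change, here is a minimal sketch of the idea (names, the limit, and the exception type are simplified here and are not the exact Druid code; the PR itself throws ColumnCapacityExceededException):

import java.io.IOException;

/**
 * Simplified sketch only: count values as they are added and fail fast at
 * segment creation time once the count can no longer be represented, instead
 * of letting column offsets silently wrap negative.
 */
class CapacityCheckedSerializerSketch
{
  // illustrative cap; the real serializers derive their limit from what the
  // column header / offset encoding can actually express
  private static final int MAX_VALUES = Integer.MAX_VALUE;

  private final String columnName;
  private int numValues = 0;

  CapacityCheckedSerializerSketch(String columnName)
  {
    this.columnName = columnName;
  }

  void add(double value) throws IOException
  {
    if (numValues == MAX_VALUES) {
      // the PR throws ColumnCapacityExceededException carrying this message
      throw new IOException(
          "Too many values to store for " + columnName + " column, try reducing maxRowsPerSegment"
      );
    }
    numValues++;
    // ... delegate the value to the underlying column writer ...
  }
}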

I added a bunch of tests to confirm that this works, and also marked them all @Ignore because they take forever to run. The same IAE error can be reproduced by running V3CompressedVSizeColumnarMultiIntsSerializerTest.testTooManyValues without this PR's overflow checks.

I also added a CompressedDoublesSerdeTest that copies CompressedFloatsSerdeTest, since I noticed there wasn't a test for double columns.

Finally, I ran into an issue with IntermediateColumnarLongsSerializer that prevented me from testing the case where too many values are written to the column: it must store the entire column on heap while it determines the best encoding, so my attempts to run the test hit an OOM exception. This should probably be fixed, or we should advise against using 'auto' encoding for larger segments, but I did neither in this PR.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • Primitive column serializers and related tests

public static String formatMessage(String columnName)
{
  return StringUtils.format(
      "Too many values to store for %s column, try reducing maxRowsPerSegment",
      columnName
  );
}
Contributor:

+1 for this operator friendly error message

suneet-s (Contributor):

> but would also occur for any serializer given more than Integer.MAX_VALUE rows as input.

Can this happen? maxRowsPerSegment is an Integer in the partitionSpec

@@ -29,5 +29,6 @@
public interface ColumnarDoublesSerializer extends Serializer
{
void open() throws IOException;
int size();
Contributor:

Javadocs please. What does the size indicate: the number of rows in the column, or the space it's taking up? Reading the code, it looks like it's the former.
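
For reference, the kind of Javadoc being asked for might look something like this sketch (assuming size() means the number of values written so far, as inferred above; the Serializer stub below just stands in for Druid's interface so the snippet is self-contained):

import java.io.IOException;

// stand-in for Druid's Serializer interface, only so this sketch compiles on its own
interface Serializer
{
}

public interface ColumnarDoublesSerializer extends Serializer
{
  void open() throws IOException;

  /**
   * Returns the number of values (the row count, for a single-value column)
   * written to this serializer so far; this is not the serialized size in bytes.
   */
  int size();
}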

clintropolis (Member, Author):

aw, I was just making it symmetrical with ColumnarFloatsSerializer and ColumnarLongsSerializer, neither of which have any docs either.

This area of the code in general is sort of barren of javadocs, so how about a bargain: unless there is something else to change on this PR, I'll just do a follow-up that adds javadocs to a bunch of this stuff?

Contributor:

sounds like a good deal to me. I'm almost done reviewing, pressed the wrong button when I made that comment. I don't have anything else yet.

clintropolis (Member, Author):

> Can this happen? maxRowsPerSegment is an Integer in the partitionSpec

It shouldn't happen, but nothing in the serializer was ensuring this, so it is a defensive mechanism since this part of the code is rather far away from where maxRowsPerSegment is supposed to be enforced.

suneet-s (Contributor) left a comment:

VSizeColumnarInts#get does this calculation

@Override
public int get(int index)
{
  return buffer.getInt(buffer.position() + (index * numBytes)) >>> bitsToShift;
}

^ for a large index, index * numBytes can overflow. I haven't looked through all types of ColumnarInts yet, but is there a way to test that they can read from segments with very large offsets?

Maybe generate some with the tests you've already written, and use those to read large offsets?
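
For reference, a quick standalone illustration (with assumed numbers) of the int overflow being described here:

public class IndexOverflowExample
{
  public static void main(String[] args)
  {
    int numBytes = 4;                   // 4-byte encoded ints
    int index = 600_000_000;            // a very large value index
    int byteOffset = index * numBytes;  // 2,400,000,000 does not fit in a signed int
    System.out.println(byteOffset);     // prints -1894967296: the computed offset has wrapped negative
  }
}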

clintropolis (Member, Author):

> VSizeColumnarInts#get does this calculation

Ah, I forgot to mention in the description that I didn't update VSizeColumnarIntsSerializer on purpose: it doesn't follow the same pattern as the other implementations, and it stores the size in bytes in the header via a checked cast from a long, so it would already fail at indexing time if it were too big. It also shouldn't really be used in practice for real segments, since it isn't compressed (except maybe for intermediary segments if indexSpecForIntermediatePersists is configured to be uncompressed, but those will be smallish segments). Since it reads directly from a ByteBuffer, it is limited to the maximum size of a ByteBuffer.
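
As an aside, this is roughly what a "checked cast from a long" buys you, illustrated with the JDK's Math.toIntExact as a stand-in (the exact helper Druid uses may differ):

public class CheckedCastExample
{
  public static void main(String[] args)
  {
    long sizeInBytes = 3_000_000_000L;              // larger than Integer.MAX_VALUE
    // throws ArithmeticException("integer overflow") instead of silently wrapping,
    // so an oversized column of this type already fails at indexing time
    int headerValue = Math.toIntExact(sizeInBytes);
    System.out.println(headerValue);
  }
}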

suneet-s (Contributor):

@clintropolis Makes sense. I'm just going to read through all the implementations now...

suneet-s (Contributor) left a comment:

LGTM thanks for the explanations!

jihoonson (Contributor) left a comment:

LGTM

jihoonson merged commit 2c49f6d into apache:master on Mar 26, 2020
clintropolis deleted the too-many-values-in-column branch on Mar 27, 2020
clintropolis added a commit to clintropolis/druid that referenced this pull request on Mar 27, 2020
clintropolis added this to the 0.18.0 milestone on Mar 27, 2020
clintropolis added a commit to implydata/druid-public that referenced this pull request on Mar 27, 2020:
[Backport] error on value counter overflow instead of writing sad segments (apache#9559)
4 participants