
error on value counter overflow instead of writing sad segments #9559

Merged · 1 commit · Mar 26, 2020

Conversation

clintropolis (Member) commented Mar 25, 2020

Description

This PR fixes an integer overflow issue that occurs if too many values are written to a column within a single segment, preferring to fail at segment creation time instead of writing a sad segment with negative values.

This issue was first noticed with multi-value columns, exploding at query time with an error of the form:

org.apache.druid.java.util.common.IAE: Index[-131072] < 0
	at org.apache.druid.segment.data.GenericIndexed.checkIndex(GenericIndexed.java:269)
	at org.apache.druid.segment.data.GenericIndexed.access$300(GenericIndexed.java:79)
	at org.apache.druid.segment.data.GenericIndexed$3.get(GenericIndexed.java:696)
	at org.apache.druid.segment.data.CompressedVSizeColumnarIntsSupplier$CompressedVSizeColumnarInts.loadBuffer(CompressedVSizeColumnarIntsSupplier.java:437)
	at org.apache.druid.segment.data.CompressedVSizeColumnarIntsSupplier$CompressedVSizeColumnarInts.get(CompressedVSizeColumnarIntsSupplier.java:350)
	at org.apache.druid.segment.data.SliceIndexedInts.get(SliceIndexedInts.java:60)

but would also occur for any serializer given more than Integer.MAX_VALUE rows as input. tl;dr: too many values were written to a single segment, so the 'offsets' portion of the multi-value column overflowed into negative numbers.

To fix, primitive column serializers now check the number of values (the row count in most cases, the total number of values for multi-value strings) to ensure that it does not exceed what can be expressed in the column header and will not cause issues at query time. A new exception, ColumnCapacityExceededException, has been added; it produces an error message of the form

Too many values to store for %s column, try reducing maxRowsPerSegment

where %s is the column name (which all the serializers now know).
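
For a sense of the shape of the change, here is a minimal sketch of the idea (names, the limit, and the exception type are simplified here and are not the exact Druid code; the PR itself throws ColumnCapacityExceededException):

import java.io.IOException;

/**
 * Simplified sketch only: count values as they are added and fail fast at
 * segment creation time once the count can no longer be represented, instead
 * of letting column offsets silently wrap negative.
 */
class CapacityCheckedSerializerSketch
{
  // illustrative cap; the real serializers derive their limit from what the
  // column header / offset encoding can actually express
  private static final int MAX_VALUES = Integer.MAX_VALUE;

  private final String columnName;
  private int numValues = 0;

  CapacityCheckedSerializerSketch(String columnName)
  {
    this.columnName = columnName;
  }

  void add(double value) throws IOException
  {
    if (numValues == MAX_VALUES) {
      // the PR throws ColumnCapacityExceededException carrying this message
      throw new IOException(
          "Too many values to store for " + columnName + " column, try reducing maxRowsPerSegment"
      );
    }
    numValues++;
    // ... delegate the value to the underlying column writer ...
  }
}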

I added a bunch of tests to confirm that this works, and also marked them all @Ignore because they take forever to run. The same IAE error can be reproduced by running V3CompressedVSizeColumnarMultiIntsSerializerTest.testTooManyValues without this PR's overflow checks.

I also added a CompressedDoublesSerdeTest that copies CompressedFloatsSerdeTest, since I noticed there wasn't a test for double columns.

Finally, I ran into an issue with IntermediateColumnarLongsSerializer that prevented me from testing the case where too many values are written to the column: it must store the entire column on heap while it determines the best encoding, so my attempts to run the test hit an OOM exception. This should probably be fixed, or we should advise against using 'auto' encoding for larger segments, but I did neither in this PR.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • Primitive column serializers and related tests

public static String formatMessage(String columnName)
{
  return StringUtils.format(
      "Too many values to store for %s column, try reducing maxRowsPerSegment",
      columnName
  );
}
Contributor:

+1 for this operator friendly error message

suneet-s (Contributor):

> but would also occur for any serializer given more than Integer.MAX_VALUE rows as input.

Can this happen? maxRowsPerSegment is an Integer in the partitionSpec

@@ -29,5 +29,6 @@
public interface ColumnarDoublesSerializer extends Serializer
{
void open() throws IOException;
int size();
Contributor:

Javadocs please. What does the size indicate: the number of rows in the column, or the space it's taking up? Reading the code, it looks like it's the former.
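
For reference, the kind of Javadoc being asked for might look something like this sketch (assuming size() means the number of values written so far, as inferred above; the Serializer stub below just stands in for Druid's interface so the snippet is self-contained):

import java.io.IOException;

// stand-in for Druid's Serializer interface, only so this sketch compiles on its own
interface Serializer
{
}

public interface ColumnarDoublesSerializer extends Serializer
{
  void open() throws IOException;

  /**
   * Returns the number of values (the row count, for a single-value column)
   * written to this serializer so far; this is not the serialized size in bytes.
   */
  int size();
}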

clintropolis (Member, Author):

aw, I was just making it symmetrical with ColumnarFloatsSerializer and ColumnarLongsSerializer, neither of which have any docs either.

This area of the code in general is sort of barren of javadocs, so how about a bargain: unless there is something else to change on this PR, I'll just do a follow-up that adds javadocs to a bunch of this stuff?

Contributor:

sounds like a good deal to me. I'm almost done reviewing, pressed the wrong button when I made that comment. I don't have anything else yet.

clintropolis (Member, Author):

> Can this happen? maxRowsPerSegment is an Integer in the partitionSpec

It shouldn't happen, but nothing in the serializer was ensuring this, so it is a defensive mechanism since this part of the code is rather far away from where maxRowsPerSegment is supposed to be enforced.

suneet-s (Contributor) left a comment:

VSizeColumnarInts#get does this calculation

@Override
public int get(int index)
{
  return buffer.getInt(buffer.position() + (index * numBytes)) >>> bitsToShift;
}

^ for a large index, index * numBytes can overflow. I haven't looked through all types of ColumnarInts yet, but is there a way to test that they can read from segments with very large offsets?

Maybe generate some with the tests you've already written, and use those to read large offsets?
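
For reference, a quick standalone illustration (with assumed numbers) of the int overflow being described here:

public class IndexOverflowExample
{
  public static void main(String[] args)
  {
    int numBytes = 4;                   // 4-byte encoded ints
    int index = 600_000_000;            // a very large value index
    int byteOffset = index * numBytes;  // 2,400,000,000 does not fit in a signed int
    System.out.println(byteOffset);     // prints -1894967296: the computed offset has wrapped negative
  }
}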

clintropolis (Member, Author):

> VSizeColumnarInts#get does this calculation

Ah, I forgot to mention in the description that I didn't update VSizeColumnarIntsSerializer on purpose: it doesn't follow the same pattern as the other implementations, and it stores the size in bytes in the header via a checked cast from a long, so it would already fail at indexing time if it were too big. It also shouldn't really be used in practice for real segments, since it isn't compressed (except maybe for intermediary segments if indexSpecForIntermediatePersists is configured to be uncompressed, but those will be smallish segments). Since it reads directly from a ByteBuffer, it is limited to the maximum size of a ByteBuffer.
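
As an aside, this is roughly what a "checked cast from a long" buys you, illustrated with the JDK's Math.toIntExact as a stand-in (the exact helper Druid uses may differ):

public class CheckedCastExample
{
  public static void main(String[] args)
  {
    long sizeInBytes = 3_000_000_000L;              // larger than Integer.MAX_VALUE
    // throws ArithmeticException("integer overflow") instead of silently wrapping,
    // so an oversized column of this type already fails at indexing time
    int headerValue = Math.toIntExact(sizeInBytes);
    System.out.println(headerValue);
  }
}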

suneet-s (Contributor):

@clintropolis Makes sense. I'm just going to read through all the implementations now...

suneet-s (Contributor) left a comment:

LGTM thanks for the explanations!

jihoonson (Contributor) left a comment:

LGTM

jihoonson merged commit 2c49f6d into apache:master on Mar 26, 2020
clintropolis deleted the too-many-values-in-column branch on Mar 27, 2020
clintropolis added a commit to clintropolis/druid that referenced this pull request on Mar 27, 2020
clintropolis added this to the 0.18.0 milestone on Mar 27, 2020
clintropolis added a commit to implydata/druid-public that referenced this pull request on Mar 27, 2020:
[Backport] error on value counter overflow instead of writing sad segments (apache#9559)
4 participants