
When removeNullBytes is set, length calculations did not take into account null bytes. #17232

Merged: 4 commits merged on Oct 7, 2024

@cryptoe commented Oct 3, 2024

Fixes errors like:

java.lang.RuntimeException: org.apache.druid.java.util.common.ISE: Invalid value start byte [28]
	at org.apache.druid.java.util.common.Either.valueOrThrow(Either.java:95)
	at org.apache.druid.frame.processor.FrameProcessorExecutor$1ExecutorRunnable.runProcessorNow(FrameProcessorExecutor.java:259)
	at org.apache.druid.frame.processor.FrameProcessorExecutor$1ExecutorRunnable.run(FrameProcessorExecutor.java:138)
	at org.apache.druid.msq.exec.WorkerImpl$1$2.run(WorkerImpl.java:838)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:259)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.druid.java.util.common.ISE: Invalid value start byte [28]
	at org.apache.druid.frame.field.StringFieldReader$Selector.updateCurrentUtf8Strings(StringFieldReader.java:357)
	at org.apache.druid.frame.field.StringFieldReader$Selector.computeCurrentUtf8Strings(StringFieldReader.java:284)
	at org.apache.druid.frame.field.StringFieldReader$Selector.getObject(StringFieldReader.java:169)
	at org.apache.druid.msq.indexing.processor.SegmentGeneratorFrameProcessor.lambda$addFrame
	

The patch was prepared with help from @LakshSingla.

The test cases added here do not fully exercise the new code path.
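The title describes the bug: with removeNullBytes enabled, null bytes are dropped while the value is written, but the length calculation did not account for them. Below is a minimal, hypothetical sketch of the invariant the fix restores, in plain Java. NullByteStrippingCopy, writtenLength, and write are illustrative names and not Druid's StringFieldWriter API. The length recorded for a value must equal the number of bytes actually written.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Druid's StringFieldWriter: copies UTF-8 bytes into a
// buffer, optionally dropping 0x00 bytes, and reports the length actually written.
final class NullByteStrippingCopy
{
  // Number of bytes that will be written, accounting for dropped null bytes.
  static int writtenLength(byte[] utf8, boolean removeNullBytes)
  {
    if (!removeNullBytes) {
      return utf8.length;
    }
    int n = 0;
    for (byte b : utf8) {
      if (b != 0) {
        n++;
      }
    }
    return n;
  }

  // Writes the bytes (skipping 0x00 when removeNullBytes is set) and returns how many were written.
  static int write(ByteBuffer out, byte[] utf8, boolean removeNullBytes)
  {
    int written = 0;
    for (byte b : utf8) {
      if (removeNullBytes && b == 0) {
        continue; // drop the null byte entirely
      }
      out.put(b);
      written++;
    }
    return written;
  }

  public static void main(String[] args)
  {
    byte[] utf8 = "abc\u0000".getBytes(StandardCharsets.UTF_8);
    ByteBuffer out = ByteBuffer.allocate(16);
    int written = write(out, utf8, true);
    // Recording utf8.length (4) instead of written (3) would overstate the value's size by one byte.
    System.out.println(utf8.length + " input bytes, " + written + " written, expected " + writtenLength(utf8, true));
  }
}
```

A reader that trusts a recorded length of 4 here would be misaligned by one byte. It is plausible, though not confirmed by this sketch, that this kind of misalignment produced the "Invalid value start byte" error above.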

@cryptoe added this to the 31.0.0 milestone on Oct 3, 2024

@LakshSingla left a comment

I think it would be cleaner to add the tests as follows:

  1. Remove the inherited test file StringFieldWritersWithNullHandlingTest.java, since it does not test "null handling". We are testing null bytes, which is entirely different from what Druid means by "null handling".
  2. The setUp method of StringFieldWriterTest creates a couple of field writers. Add two more: fieldWriterWithRemoveNullBytes and fieldWriterUtf8WithRemoveNullBytes.
  3. Replicate the doTest method as doTestWithRemoveNullBytes, and call it alongside doTest in the existing test cases.
  4. Add a new test case for a string like "abc\u0000". I am surprised you were able to reproduce the behavior with the added test case, since that is not what the patch addresses. doTest should fail on this test case because of the null byte in the string (which is not happening in this PR), and we should assert on that error. doTestWithRemoveNullBytes will pass.
  5. Moreover, I think the memory is initialized with null bytes, so you'd have to fill it with junk bytes that are not null. I am not sure this is strictly required, and we can check, but I see no downside to placing randomly generated, null-free garbage in the memory the field writer writes into. That ensures the test fails regardless of the initialization conditions.
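Point 5 could look like the helper below. This is a sketch under stated assumptions: a plain byte[] stands in for the frame memory the field writer targets (Druid's tests use their own memory types, not shown here), and fillWithNonNullJunk is a hypothetical name. A fixed seed keeps any failure reproducible.

```java
import java.util.Random;

// Hypothetical test helper; a byte[] stands in for the memory a field writer writes into.
final class JunkFill
{
  // Fills buf with random bytes whose unsigned values lie in 1..255, so no byte is 0x00.
  // A test writing into this buffer then cannot pass merely because untouched memory was zeroed.
  static void fillWithNonNullJunk(byte[] buf, long seed)
  {
    Random random = new Random(seed); // fixed seed keeps failures reproducible
    for (int i = 0; i < buf.length; i++) {
      buf[i] = (byte) (1 + random.nextInt(255)); // nextInt(255) is 0..254, so this is 1..255, never 0
    }
  }

  public static void main(String[] args)
  {
    byte[] memory = new byte[64];
    fillWithNonNullJunk(memory, 42L);
    boolean anyNull = false;
    for (byte b : memory) {
      anyNull |= (b == 0);
    }
    System.out.println("contains null byte: " + anyNull);
  }
}
```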

What do you think of the above? Also, I have a couple of questions:

  1. Is there a reason I have missed for creating a new class that overrides the before method for the test cases and makes the variables protected?
  2. Does the added test case fail without the changes? I expect it to pass, because that is not what the patch is intended for; if it does fail without these changes, that is certainly worth looking into.

@@ -106,6 +106,12 @@ public void testMultiValueStringContainingNulls()
doTest(Arrays.asList("foo", NullHandling.emptyToNullIfNeeded(""), "bar", null));
}

@Test

@LakshSingla commented Oct 3, 2024

This name is misleading because it tests nulls rather than null bytes. To contain a null byte, the string should look like "foo\u0000". Does this test fail without the changes to StringFieldWriter?
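The reviewer's distinction can be checked in plain Java: a null value produces no bytes at all, while "foo\u0000" encodes to four UTF-8 bytes, the last of which is 0x00. The helper below is an illustrative sketch; NullByteDemo and containsNullByte are hypothetical names, not part of Druid.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper: distinguishes a null value from a string containing a null byte.
final class NullByteDemo
{
  // Returns true if the UTF-8 encoding of s contains a 0x00 byte; a null String has no bytes.
  static boolean containsNullByte(String s)
  {
    if (s == null) {
      return false; // a null is a missing value, not a null byte
    }
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      if (b == 0) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args)
  {
    System.out.println(containsNullByte(null));        // false
    System.out.println(containsNullByte("foo"));       // false
    System.out.println(containsNullByte("foo\u0000")); // true
  }
}
```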

The PR author replied:

Just pushed up a new patch.


@LakshSingla commented Oct 3, 2024

I do not think a new file is needed for this test case. It should be a helper method like doTest in StringFieldWriterTest.

This lets readers see the different test cases together. Both helper methods would behave identically except for strings containing null bytes, where doTest would throw and the new helper would silently drop the null bytes.
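The contract described here can be sketched with a toy writer. Everything below is hypothetical: WriterContractSketch, its write method, and the IllegalArgumentException stand in for Druid's field writers and whatever error they actually raise. Only the helper names doTest and doTestWithRemoveNullBytes come from the discussion.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a string field writer, used only to illustrate the two helpers' contract.
final class WriterContractSketch
{
  // Rejects null bytes unless removeNullBytes is set, in which case it drops them.
  static byte[] write(String s, boolean removeNullBytes)
  {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      if (b == 0) {
        if (!removeNullBytes) {
          throw new IllegalArgumentException("null byte in value");
        }
        continue; // silently drop the null byte
      }
      out.write(b);
    }
    return out.toByteArray();
  }

  // Mirrors doTest: the value must round-trip exactly; null bytes make write() throw.
  static void doTest(String s)
  {
    String back = new String(write(s, false), StandardCharsets.UTF_8);
    if (!back.equals(s)) {
      throw new AssertionError("round-trip mismatch for: " + s);
    }
  }

  // Mirrors doTestWithRemoveNullBytes: the value must round-trip with null bytes removed.
  static void doTestWithRemoveNullBytes(String s)
  {
    String expected = s.replace("\u0000", "");
    String back = new String(write(s, true), StandardCharsets.UTF_8);
    if (!back.equals(expected)) {
      throw new AssertionError("round-trip mismatch for: " + expected);
    }
  }
}
```

With "foo", both helpers pass. With "abc\u0000", doTest throws while doTestWithRemoveNullBytes passes, which is the asymmetry point 4 of the review asks the tests to assert.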

@cryptoe commented Oct 7, 2024

I have added a new test case and the machinery to test individual field writers. The new test case should now exercise the new code changes.
Thanks for the tip, @LakshSingla.

@cryptoe merged commit 6a4352f into apache:master on Oct 7, 2024
90 checks passed
kfaraz pushed a commit to kfaraz/druid that referenced this pull request Oct 7, 2024
…count null bytes. (apache#17232)

* When replaceNullBytes is set, length calculations did not take into account null bytes.
abhishekagarwal87 pushed a commit that referenced this pull request Oct 7, 2024
…count null bytes. (#17232) (#17266)

* When replaceNullBytes is set, length calculations did not take into account null bytes.

Co-authored-by: Karan Kumar <[email protected]>