SOLR-10255 Add support for docValues to solr.BinaryField #2536

serba · 2024-06-26T05:53:47Z

Description

Add support for docValues to solr.BinaryField

Solution

Lucene has supported BinaryDocValuesField for a long time. This is a fairly straightforward PR that exposes that Lucene field in Solr.

Tests

Added asserts to the existing TestBinaryField test to cover solr.BinaryField with docValues="true".

Checklist

Please review the following and check all that apply:

I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
I have created a Jira issue and added the issue ID to my pull request title.
I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
I have developed this patch against the main branch.
I have run ./gradlew check.
I have added tests for my changes.
I have added documentation for the Reference Guide

serba · 2024-06-26T05:58:47Z

solr/core/src/java/org/apache/solr/schema/BinaryField.java

+    return new org.apache.lucene.document.StoredField(field.getName(), getBytesRef(val));
+  }
+
+  private static BytesRef getBytesRef(Object val) {


Moved the code that converts Object val to a BytesRef into its own helper method, for reuse (and readability).
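
For illustration, a conversion helper of this kind might look like the sketch below. It uses only the JDK and returns a plain byte[] instead of Lucene's BytesRef. The class name BinaryValues, the method toBytes, and the three accepted input shapes (byte[], ByteBuffer, Base64 string) are assumptions made for the example, not the code from this PR.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

/** Hypothetical stand-in for the conversion helper; not the PR's code. */
final class BinaryValues {

  /** Converts a raw field value to bytes, assuming three accepted input shapes. */
  static byte[] toBytes(Object val) {
    if (val instanceof byte[]) {
      return (byte[]) val; // already binary, use as-is
    }
    if (val instanceof ByteBuffer) {
      // Work on a duplicate so the caller's position and limit stay untouched.
      ByteBuffer buf = ((ByteBuffer) val).duplicate();
      byte[] out = new byte[buf.remaining()];
      buf.get(out);
      return out;
    }
    // Assumption: anything else is treated as a Base64-encoded string.
    return Base64.getDecoder().decode(val.toString());
  }
}
```

Keeping the conversion in one place means the stored field and the docValues field are guaranteed to see the same bytes for a given input.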

serba · 2024-06-26T06:01:16Z

solr/core/src/java/org/apache/solr/schema/BinaryField.java

+  }
+
+  @Override
+  public List<IndexableField> createFields(SchemaField field, Object val) {


This is largely borrowed from the StrField type, which supports creating stored and/or docValues Lucene fields for a single Solr field.

serba · 2024-06-26T06:08:39Z

solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java

@@ -609,7 +609,7 @@ private Object decodeDVField(
      case BINARY:
        BinaryDocValues bdv = e.getBinaryDocValues(localId, leafReader, readerOrd);
        if (bdv != null) {
-          return BytesRef.deepCopyOf(bdv.binaryValue());
+          return BytesRef.deepCopyOf(bdv.binaryValue()).bytes;


Previously this code returned BytesRef objects, which were serialized up the stack with a plain .toString() into a BytesRef@<hash-code> string. You can revert this change and see the resulting test failure for yourself.

Now we return a byte[], which Solr and the SolrJ client support correctly and serialize as base64.

Note that there is no additional performance overhead: we were already doing a deepCopyOf into a new byte array, and .bytes simply returns a reference to that array.
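
Why returning .bytes is safe after the deep copy, and would not be safe on an arbitrary view, can be sketched with a JDK-only stand-in for BytesRef. The Slice class below is hypothetical and only mimics the bytes/offset/length triple; its deepCopyOf follows the documented contract of BytesRef.deepCopyOf (the result has an offset of zero and a length equal to the source length).

```java
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical stand-in for a (bytes, offset, length) view such as BytesRef. */
final class Slice {
  final byte[] bytes;
  final int offset;
  final int length;

  Slice(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  /** Copies only the viewed range, so the result has offset 0 and an exact-size array. */
  static Slice deepCopyOf(Slice other) {
    byte[] copy = Arrays.copyOfRange(other.bytes, other.offset, other.offset + other.length);
    return new Slice(copy, 0, copy.length);
  }

  /** Encodes the exact-size backing array of a deep copy as base64. */
  static String toBase64(Slice view) {
    return Base64.getEncoder().encodeToString(deepCopyOf(view).bytes);
  }
}
```

A view over a shared buffer can expose more bytes than the value itself, so taking .bytes directly is only appropriate once the copy has trimmed the array to the value.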

serba · 2024-06-26T06:11:48Z

solr/core/src/test/org/apache/solr/schema/TestBinaryField.java

-          for (int i = 0; i < data.length; i++) {
-            byte b = data[i];
-            assertEquals((byte) i, b);
+        for (String field : new String[] {"data", "data_dv"}) {


Please review these code changes with the "Hide whitespace" toggle on.

serba · 2024-06-26T21:00:23Z

solr/core/src/java/org/apache/solr/schema/BinaryField.java

+    IndexableField fval = createField(field, val);
+
+    if (field.hasDocValues() && !field.multiValued()) {
+      IndexableField docval = new BinaryDocValuesField(field.getName(), getBytesRef(val));


This is the crux of the change: we use Lucene's BinaryDocValuesField when the user specifies docValues="true" in the schema for a single-valued field.

mkhludnev · 2024-07-09T14:32:08Z

solr/core/src/java/org/apache/solr/schema/BinaryField.java

+
+      fval = docval;
+    }
+    return Collections.singletonList(fval);


Could this be simplified as

    Stream.of(
            createField(field, val),
            (field.hasDocValues() && !field.multiValued())
                ? new BinaryDocValuesField(field.getName(), getBytesRef(val))
                : null)
        .filter(obj -> obj != null)
        .collect(Collectors.toList());

?

UPD: However, Streams may add too much overhead.

serba · 2024-07-11

Yeah, my understanding is that this code runs every time a field of this type is indexed.

I borrowed the code from the StrField type, which supports creating stored and/or docValues Lucene fields for a single Solr field. It already had this optimization of creating an ArrayList only when both the stored and docValues options are enabled.

I guess it makes sense to keep this code consistent across field types until there is some appetite for rewriting it in a Streams or List.of style across the board.
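
The allocation pattern discussed in this thread can be sketched independently of Lucene. FieldLists.of below is a hypothetical generic helper, not Solr code: it returns a singleton list when only one of the two fields exists and allocates an ArrayList only when both do. Returning an empty list when neither exists is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper illustrating the "allocate only when needed" pattern. */
final class FieldLists {

  /**
   * Combines an optional stored field and an optional docValues field.
   * Either argument may be null, meaning that representation is disabled.
   */
  static <T> List<T> of(T stored, T docValues) {
    if (stored != null && docValues != null) {
      List<T> both = new ArrayList<>(2); // the only case that needs a multi-element list
      both.add(stored);
      both.add(docValues);
      return both;
    }
    if (stored != null) {
      return Collections.singletonList(stored);
    }
    if (docValues != null) {
      return Collections.singletonList(docValues);
    }
    return Collections.emptyList(); // sketch-only choice for "neither enabled"
  }
}
```

Compared with the Stream-based one-liner, this avoids building a stream pipeline on every indexed value, which matters on a path that runs once per field per document.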

mkhludnev · 2024-07-09T14:39:43Z

solr/core/src/test-files/solr/collection1/conf/bad-schema-unsupported-docValues.xml

I'm not sure why we should bother with a random field. Shouldn't we just remove this file and its test, since we now have binary DV?

serba · 2024-07-11

I think this test is not testing binary doc values per se, but the exception handling for any field type that does not support docValues. I went through the different field types to find the best candidate and concluded that RandomSortField is the best one, as it will never get docValues support.

dsmiley · 2024-07-16

+1 nice straight-forward change!

serba · 2024-07-17T16:16:44Z

@dsmiley @mkhludnev Could you please merge this PR if you think it is ready? I don't have write access. Thank you!

SOLR-10255 Add support for docValues to solr.BinaryField

d4e8907

github-actions bot added the tests, cat:search, and cat:schema labels on Jun 26, 2024

serba commented Jun 26, 2024

Alexey Serba added 2 commits June 26, 2024 09:49

Merge remote-tracking branch 'upstream/main' into SOLR-10255

6eba352

Remove support for multivalued binary doc values

90ed36b

We can add it later if it is needed. I don't see a big value in storing multiple binary payloads where you simply can store a single binary payload with multiple values.

serba commented Jun 26, 2024

mkhludnev approved these changes Jul 9, 2024

Alexey Serba added 2 commits July 10, 2024 23:07

Merge remote-tracking branch 'upstream/main' into SOLR-10255

66692af

Add CHANGES.txt entry

2aa8b24

dsmiley approved these changes Jul 16, 2024

dsmiley merged commit 6ab6c4a into apache:main Jul 17, 2024
3 checks passed

dsmiley pushed a commit that referenced this pull request Jul 17, 2024

SOLR-10255 Add support for docValues to solr.BinaryField (#2536)

1a36df8

Co-authored-by: Alexey Serba <[email protected]> (cherry picked from commit 6ab6c4a)
