SOLR-10255 Add support for docValues to solr.BinaryField #2536
return new org.apache.lucene.document.StoredField(field.getName(), getBytesRef(val));
}

private static BytesRef getBytesRef(Object val) {
Moved the code for converting Object val to BytesRef into a helper for code reuse (and readability).
}

@Override
public List<IndexableField> createFields(SchemaField field, Object val) {
This is pretty much borrowed from the StrField type, which supports creating stored and/or docValued Lucene fields for a single Solr field.
@@ -609,7 +609,7 @@ private Object decodeDVField(
       case BINARY:
         BinaryDocValues bdv = e.getBinaryDocValues(localId, leafReader, readerOrd);
         if (bdv != null) {
-          return BytesRef.deepCopyOf(bdv.binaryValue());
+          return BytesRef.deepCopyOf(bdv.binaryValue()).bytes;
Previously this code returned BytesRef objects, which were serialized up the stack with a simple .toString into a BytesRef@<hash-code> string. You can revert this change and see it yourself in a test failure.
Now we return a byte[], which is correctly supported by the Solr/SolrJ client and serialized as base64.
Note that there is no additional performance overhead: we were already doing a deepCopy into a new byte array, and .bytes simply returns a reference to that array.
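To illustrate the serialization difference described above, here is a minimal sketch using only the JDK. The `OpaqueBytes` class and the `serialize` method are hypothetical stand-ins (they are not Solr or Lucene APIs): the first models a wrapper type that a response writer does not recognize, the second models a writer that base64-encodes `byte[]` and falls back to `toString()` for everything else.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BinaryValueSerializationSketch {

  /** Hypothetical wrapper type that does not override toString(). */
  static final class OpaqueBytes {
    final byte[] bytes;

    OpaqueBytes(byte[] bytes) {
      this.bytes = bytes;
    }
  }

  /**
   * Assumed, simplified response writer: byte[] is base64-encoded,
   * any other type falls back to its toString() representation.
   */
  static String serialize(Object value) {
    if (value instanceof byte[]) {
      return Base64.getEncoder().encodeToString((byte[]) value);
    }
    return String.valueOf(value);
  }

  public static void main(String[] args) {
    byte[] raw = "hello".getBytes(StandardCharsets.UTF_8);
    // The wrapper is printed as ClassName@hash; the payload itself is lost.
    System.out.println(serialize(new OpaqueBytes(raw)));
    // The raw array is encoded, so a client can decode the payload again.
    System.out.println(serialize(raw));
  }
}
```

Handing the writer the raw array rather than the wrapper is what lets the client recover the original payload.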
for (int i = 0; i < data.length; i++) {
  byte b = data[i];
  assertEquals((byte) i, b);
for (String field : new String[] {"data", "data_dv"}) {
We can add it later if it is needed. I don't see much value in storing multiple binary payloads when you can simply store a single binary payload that contains multiple values.
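One way to read "a single binary payload with multiple values" is a length-prefixed encoding done by the application before indexing. The sketch below is an assumption about how a client might do this with plain JDK classes; the `pack`/`unpack` names and the layout (a 4-byte count, then a 4-byte length before each value) are hypothetical and not part of Solr.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class PackedBinaryPayloadSketch {

  /** Packs values as: int count, then for each value: int length, raw bytes. */
  static byte[] pack(List<byte[]> values) {
    int size = Integer.BYTES; // room for the count
    for (byte[] v : values) {
      size += Integer.BYTES + v.length;
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    buf.putInt(values.size());
    for (byte[] v : values) {
      buf.putInt(v.length);
      buf.put(v);
    }
    return buf.array();
  }

  /** Reverses pack(): reads the count, then each length-prefixed value. */
  static List<byte[]> unpack(byte[] payload) {
    ByteBuffer buf = ByteBuffer.wrap(payload);
    int count = buf.getInt();
    List<byte[]> values = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      byte[] v = new byte[buf.getInt()];
      buf.get(v);
      values.add(v);
    }
    return values;
  }
}
```

The packed array can then be sent as the value of a single-valued binary field, and the reader applies `unpack` to the returned bytes.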
IndexableField fval = createField(field, val);

if (field.hasDocValues() && !field.multiValued()) {
  IndexableField docval = new BinaryDocValuesField(field.getName(), getBytesRef(val));
This is the crux of the changes - we want to use Lucene BinaryDocValuesField if the user specified docValues="true" in the schema.
  fval = docval;
}
return Collections.singletonList(fval);
Could it be simplified as
Stream.of(createField(field, val),
    (field.hasDocValues() && !field.multiValued()) ? new BinaryDocValuesField(field.getName(), getBytesRef(val)) : null)
  .filter(obj -> obj != null)
  .collect(Collectors.toList());
?
UPD: However, Streams may add too much overhead.
Yeah, my understanding is that this code is executed every time a field of this type is indexed.
I borrowed the code from the StrField type, which supports creating stored and/or docValued Lucene fields for a single Solr field, and it has this optimization: an ArrayList is created only when both the stored and docValues options are enabled.
I guess it makes sense to keep this code consistent across different field types until there's some appetite for rewriting it in streams or List.of style across the board.
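The allocation pattern discussed in this thread can be sketched in isolation as follows. This is not the Solr code: strings stand in for Lucene `IndexableField` instances, the class and method names are hypothetical, and returning an empty list when neither option is enabled is a choice made for this sketch only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CreateFieldsPatternSketch {

  /**
   * Returns the "fields" to index for one value. A growable list is
   * allocated only when both a stored and a docValues field are needed.
   */
  static List<String> createFields(String name, boolean stored, boolean docValues) {
    // Stand-in for createField(field, val), which may produce no field.
    String field = stored ? "stored:" + name : null;

    if (docValues) {
      String dvField = "docValues:" + name;
      if (field != null) {
        // Both options enabled: the only path that allocates an ArrayList.
        List<String> fields = new ArrayList<>(2);
        fields.add(field);
        fields.add(dvField);
        return fields;
      }
      field = dvField;
    }

    // Sketch-only choice: no field at all yields an empty list.
    if (field == null) {
      return Collections.emptyList();
    }
    // Common case: exactly one field, returned in an immutable singleton list.
    return Collections.singletonList(field);
  }
}
```

Compared with the Stream-based variant suggested above, this shape avoids building a stream pipeline on every indexed value, which is the overhead concern raised in the thread.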
I'm not sure why we should bother with a random field. Shouldn't we just remove this file and test, since we now have binary DV?
I think this test is not testing binary doc values per se, but exception handling for any field type that does not support docValues. I went through the different field types to find the best candidate and figured RandomSortField is the best one, as it will never get DocValues support.
+1 nice straight-forward change!
@dsmiley @mkhludnev Could you guys please merge this MR if you think it is ready? I don't have write access. Thank you!
Co-authored-by: Alexey Serba <[email protected]> (cherry picked from commit 6ab6c4a)
Description
Add support for docValues to solr.BinaryField
Solution
Lucene has supported BinaryDocValuesField for a long time. This is a pretty straightforward PR that exposes that Lucene field in Solr.
Tests
Added additional asserts to test solr.BinaryField with docValues="true" in the existing TestBinaryField test.