SOLR-12697 Add pure DocValues support to FieldValueFeature #123

Merged on May 28, 2021 (30 commits).

The diffs below show the changes from 1 commit.

Commits:
2ee8779 [SOLR-12697] add DocValuesFieldValueFeatureScorer to read docValues f… (May 12, 2021)
bdce029 [SOLR-12697] formatting changes (May 12, 2021)
e6601ee [SOLR-12697] only apply new scorer to fields that are not stored (May 12, 2021)
d6e1477 [SOLR-12697] remove BINARY case because it is not supported (May 12, 2021)
5bc995c [SOLR-12697] only pass fieldType to constructor; determine numberType… (May 13, 2021)
4559415 [SOLR-12697] remove - from fieldnames; randomize indexing order for d… (May 13, 2021)
ec4cbfb [SOLR-12697] determine docValuesType before creating DocValuesFieldVa… (May 13, 2021)
e6f20f1 split dual-purpose DocValuesFieldValueFeatureScorer into two (cpoerschke, May 14, 2021)
f16ce3d add TestFieldValueFeature test coverage (with caveat) (cpoerschke, May 14, 2021)
e5954eb [SOLR-12697] remove method to read sorted values from Scorer for nume… (May 16, 2021)
da6a635 [SOLR-12697] add fallback feature scorer that always returns the defa… (May 17, 2021)
b105627 [SOLR-12697] test that exception is thrown for unsupported dv type, t… (May 19, 2021)
e07c432 [SOLR-12697] add tests for parsing different sortedDocValues, add ent… (May 19, 2021)
443a396 solr/CHANGES.txt edit (cpoerschke, May 20, 2021)
2dbd94e in TestFieldValueFeature reduce potential test interaction (cpoerschke, May 20, 2021)
9b77154 in FieldValueFeature clarify 'searcher instanceof SolrIndexSearcher' use (cpoerschke, May 20, 2021)
c1f3a8e TestFieldValueFeature: replace dvBoolPopularity with dvIsTrendy (form… (cpoerschke, May 20, 2021)
3c38e91 out-scope TestLTRReRankingPipeline changes (cpoerschke, May 20, 2021)
53cd2fb FieldValueFeature: mention stored=true or docValues=true in javadocs (cpoerschke, May 20, 2021)
e854f50 FieldValueFeature polishes: (cpoerschke, May 20, 2021)
b9d3cd0 [SOLR-12697] add javadoc to explain which type of FieldValueFeatureSc… (May 20, 2021)
da57e9c Merge remote-tracking branch 'github_tomglk/jira/SOLR-12697' into jir… (cpoerschke, May 20, 2021)
abb3632 Revert "out-scope TestLTRReRankingPipeline changes" (cpoerschke, May 20, 2021)
a789b12 fix for SOLR-11134 (cpoerschke, May 20, 2021)
c42be54 [SOLR-12697] out-scope TestLTRReRankingPipeline (May 21, 2021)
83bc1ee apologies, multiple TestFieldValueFeature polishes in one commit, app… (cpoerschke, May 24, 2021)
385d8b2 add TestFieldValueFeature.testThatDateValuesAreCorrectlyParsed() (cpoerschke, May 25, 2021)
4348d04 Merge remote-tracking branch 'origin/main' into jira/SOLR-12697 (cpoerschke, May 25, 2021)
ad489d0 small TestLTROnSolrCloud polish: (cpoerschke, May 26, 2021)
2c3a368 Merge branch 'main' into jira/SOLR-12697 (cpoerschke, May 28, 2021)
@@ -95,7 +95,7 @@ public class FieldValueFeatureWeight extends FeatureWeight {
private final SchemaField schemaField;

public FieldValueFeatureWeight(IndexSearcher searcher,
SolrQueryRequest request, Query originalQuery, Map<String,String[]> efi) {
SolrQueryRequest request, Query originalQuery, Map<String, String[]> efi) {
super(FieldValueFeature.this, searcher, request, originalQuery, efi);
if (searcher instanceof SolrIndexSearcher) {
schemaField = ((SolrIndexSearcher) searcher).getSchema().getFieldOrNull(field);
@@ -106,18 +106,29 @@ public FieldValueFeatureWeight(IndexSearcher searcher,

/**
* Return a FeatureScorer that uses docValues or storedFields if no docValues are present
*
* @param context the segment this FeatureScorer is working with
* @return FeatureScorer for the current segment and field
* @throws IOException as defined by abstract class Feature
*/
@Override
public FeatureScorer scorer(LeafReaderContext context) throws IOException {
if (schemaField != null && !schemaField.stored() && schemaField.hasDocValues()) {
return new DocValuesFieldValueFeatureScorer(this, context,
DocIdSetIterator.all(DocIdSetIterator.NO_MORE_DOCS), schemaField.getType());

FieldInfo fieldInfo = context.reader().getFieldInfos().fieldInfo(field);
DocValuesType docValuesType = fieldInfo != null ? fieldInfo.getDocValuesType() : DocValuesType.NONE;

if (DocValuesType.NUMERIC.equals(docValuesType) || DocValuesType.SORTED.equals(docValuesType)) {
return new DocValuesFieldValueFeatureScorer(this, context,
DocIdSetIterator.all(DocIdSetIterator.NO_MORE_DOCS), schemaField.getType(), docValuesType);
// If type is NONE, this segment has no docs with this field. That's not a problem, because we won't call score() anyway
tomglk marked this conversation as resolved.
} else if (!DocValuesType.NONE.equals(docValuesType)) {
throw new IllegalArgumentException("Doc values type " + docValuesType.name() + " of field " + field
+ " is not supported!");
}
}
return new FieldValueFeatureScorer(this, context,
DocIdSetIterator.all(DocIdSetIterator.NO_MORE_DOCS));
}

/**
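The branching that the updated `scorer()` method performs can be illustrated without Lucene on the classpath. This is a minimal sketch, assuming a local enum that mirrors the constant names of Lucene's `DocValuesType`; `ScorerSelectionSketch`, `ScorerKind` and `select` are hypothetical names that do not exist in Solr, and the boolean parameters stand in for the `schemaField != null`, `stored()` and `hasDocValues()` checks.

```java
public class ScorerSelectionSketch {
  // Hypothetical stand-in for org.apache.lucene.index.DocValuesType (same constant names),
  // so that the sketch compiles without Lucene.
  enum DocValuesType { NONE, NUMERIC, BINARY, SORTED, SORTED_NUMERIC, SORTED_SET }

  // Hypothetical labels for the two scorers that scorer() can return.
  enum ScorerKind { DOC_VALUES, STORED_FIELDS }

  /**
   * Mirrors the branching in scorer(): a docValues-only field uses the docValues scorer when
   * the segment reports NUMERIC or SORTED, other concrete types fail fast, and NONE (no doc in
   * this segment has the field) or a stored field falls through to the stored-fields scorer.
   */
  static ScorerKind select(boolean fieldKnown, boolean stored, boolean hasDocValues,
                           DocValuesType segmentType) {
    if (fieldKnown && !stored && hasDocValues) {
      if (segmentType == DocValuesType.NUMERIC || segmentType == DocValuesType.SORTED) {
        return ScorerKind.DOC_VALUES;
      } else if (segmentType != DocValuesType.NONE) {
        throw new IllegalArgumentException(
            "Doc values type " + segmentType.name() + " is not supported!");
      }
    }
    return ScorerKind.STORED_FIELDS;
  }

  public static void main(String[] args) {
    System.out.println(select(true, false, true, DocValuesType.NUMERIC)); // DOC_VALUES
    System.out.println(select(true, false, true, DocValuesType.NONE));    // STORED_FIELDS
    System.out.println(select(true, true, true, DocValuesType.NUMERIC));  // STORED_FIELDS
    try {
      select(true, false, true, DocValuesType.SORTED_SET);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Doc values type SORTED_SET is not supported!
    }
  }
}
```

Resolving the type once per segment, before the scorer is constructed, lets an unsupported type (for example SORTED_SET, which a multiValued string field produces) surface as soon as the segment is opened rather than on the first `score()` call.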
@@ -135,8 +146,7 @@ public FieldValueFeatureScorer(FeatureWeight weight, LeafReaderContext context,
public float score() throws IOException {

try {
final Document document = context.reader().document(itr.docID(),
fieldAsSet);
final Document document = context.reader().document(itr.docID(), fieldAsSet);
final IndexableField indexableField = document.getField(field);
if (indexableField == null) {
return getDefaultValue();
@@ -158,10 +168,7 @@ public float score() throws IOException {
}
}
} catch (final IOException e) {
throw new FeatureException(
e.toString() + ": " +
"Unable to extract feature for "
+ name, e);
throw new FeatureException(e.toString() + ": " + "Unable to extract feature for " + name, e);
}
return getDefaultValue();
}
@@ -177,76 +184,73 @@ public float getMaxScore(int upTo) throws IOException {
*/
public class DocValuesFieldValueFeatureScorer extends FeatureWeight.FeatureScorer {
final LeafReaderContext context;
final DocIdSetIterator docValues;
final FieldType fieldType;
final DocValuesType docValuesType;
DocIdSetIterator docValues;
NumberType fieldNumberType;
DocValuesType docValuesType = DocValuesType.NONE;

public DocValuesFieldValueFeatureScorer(final FeatureWeight weight, final LeafReaderContext context,
final DocIdSetIterator itr, final FieldType fieldType) {
final DocIdSetIterator itr, final FieldType fieldType,
final DocValuesType docValuesType) {
super(weight, itr);
this.context = context;
this.fieldType = fieldType;
this.docValuesType = docValuesType;

try {
FieldInfo fieldInfo = context.reader().getFieldInfos().fieldInfo(field);
// if fieldInfo is null, just use NONE-Type. This causes no problems, because we won't call score() anyway
docValuesType = fieldInfo != null ? fieldInfo.getDocValuesType() : DocValuesType.NONE;
switch (docValuesType) {
case NUMERIC:
docValues = DocValues.getNumeric(context.reader(), field);
fieldNumberType = fieldType.getNumberType();
break;
case SORTED:
docValues = DocValues.getSorted(context.reader(), field);
break;
case BINARY:
case SORTED_NUMERIC:
case SORTED_SET:
case NONE:
default:
docValues = null;
if (DocValuesType.NUMERIC.equals(docValuesType)) {
docValues = DocValues.getNumeric(context.reader(), field);
fieldNumberType = fieldType.getNumberType();
} else if (DocValuesType.SORTED.equals(docValuesType)) {
docValues = DocValues.getSorted(context.reader(), field);
}
} catch (IOException e) {
throw new IllegalArgumentException("Could not read docValues for field " + field + " with docValuesType "
+ docValuesType.name());
}
}

@Override
public float score() throws IOException {
if (docValues != null && docValues.advance(itr.docID()) < DocIdSetIterator.NO_MORE_DOCS) {
switch (docValuesType) {
case NUMERIC:
if (NumberType.FLOAT.equals(fieldNumberType)) {
// convert float value that was stored as long back to float
return Float.intBitsToFloat((int) ((NumericDocValues) docValues).longValue());
} else if (NumberType.DOUBLE.equals(fieldNumberType)) {
// handle double value conversion
return (float) Double.longBitsToDouble(((NumericDocValues) docValues).longValue());
}
// just take the long value
return ((NumericDocValues) docValues).longValue();
case SORTED:
int ord = ((SortedDocValues) docValues).ordValue();
// try to interpret bytesRef either as number string or as true / false token
return handleBytesRef(((SortedDocValues) docValues).lookupOrd(ord));
case BINARY:
case SORTED_SET:
case SORTED_NUMERIC:
case NONE:
default:
throw new IllegalArgumentException("Doc values type " + docValuesType.name() + " of field " + field
+ " is not supported!");
}
if (DocValuesType.NUMERIC.equals(docValuesType) &&
((NumericDocValues) docValues).advanceExact(itr.docID())) {
return readNumericDocValues();
} else if (DocValuesType.SORTED.equals(docValuesType) &&
((SortedDocValues) docValues).advanceExact(itr.docID())) {
int ord = ((SortedDocValues) docValues).ordValue();
return readSortedDocValues(((SortedDocValues) docValues).lookupOrd(ord));
}
return FieldValueFeature.this.getDefaultValue();
}

private float handleBytesRef(BytesRef bytesRef) {
/**
* Read the numeric value for a field and convert the different number types to float.
*
* @return The numeric value that the docValues contain for the current document
* @throws IOException if docValues cannot be read
*/
private float readNumericDocValues() throws IOException {
if (NumberType.FLOAT.equals(fieldNumberType)) {
// convert float value that was stored as long back to float
return Float.intBitsToFloat((int) ((NumericDocValues) docValues).longValue());
} else if (NumberType.DOUBLE.equals(fieldNumberType)) {
// handle double value conversion
return (float) Double.longBitsToDouble(((NumericDocValues) docValues).longValue());
}
// just take the long value
return ((NumericDocValues) docValues).longValue();
}

/**
* Interprets the bytesRef either as true / false token or tries to read it as number string
*
* @param bytesRef the value of the field that should be used as score
* @return the input converted to a number
*/
private float readSortedDocValues(BytesRef bytesRef) {
String string = bytesRef.utf8ToString();
if (string.length() == 1
&& (string.charAt(0) == BoolField.TRUE_TOKEN[0] || string.charAt(0) == BoolField.FALSE_TOKEN[0])) {
// boolean values in the index are encoded with a single char contained in TRUE_TOKEN or FALSE_TOKEN
// (see BoolField)
if (string.charAt(0) == BoolField.TRUE_TOKEN[0]) {
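The new `score()` path combines an exact per-document lookup with a type-dependent conversion. The following is a minimal, Lucene-free sketch of that flow. `SparseNumericColumn` is a hypothetical stand-in for `NumericDocValues`, `NumberType` is a local mirror of Solr's enum, and encoding floats with `Float.floatToIntBits` and doubles with `Double.doubleToLongBits` is an assumption consistent with the decoding in `readNumericDocValues()`. The sketch also contrasts the old `advance(...) < NO_MORE_DOCS` check: `advance` moves to the next document that has a value, so a document without one would be scored with the next document's value, whereas `advanceExact` reports whether the target itself has a value. Lucene's real iterators additionally require non-decreasing targets, which this map-based sketch does not enforce.

```java
import java.util.TreeMap;

public class DocValuesScoreSketch {
  // Local mirror of Lucene's DocIdSetIterator.NO_MORE_DOCS.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Hypothetical mirror of the relevant constants of Solr's NumberType.
  enum NumberType { INTEGER, LONG, FLOAT, DOUBLE }

  // Hypothetical sparse column: docID -> raw long; some docs deliberately have no value.
  static final class SparseNumericColumn {
    private final TreeMap<Integer, Long> values = new TreeMap<>();
    private int current = -1;

    void put(int doc, long raw) {
      values.put(doc, raw);
    }

    // Old-style positioning: move to the first doc >= target that has a value.
    int advance(int target) {
      Integer next = values.ceilingKey(target);
      current = next == null ? NO_MORE_DOCS : next;
      return current;
    }

    // New-style positioning: stay on target and report whether it has a value.
    boolean advanceExact(int target) {
      current = target;
      return values.containsKey(target);
    }

    long longValue() {
      return values.get(current);
    }
  }

  // Same conversions as readNumericDocValues() in the diff.
  static float decode(long raw, NumberType type) {
    if (type == NumberType.FLOAT) {
      return Float.intBitsToFloat((int) raw);
    } else if (type == NumberType.DOUBLE) {
      return (float) Double.longBitsToDouble(raw);
    }
    return raw; // integral types: widen the long to float
  }

  static float newScore(SparseNumericColumn col, int doc, NumberType type, float defaultValue) {
    return col.advanceExact(doc) ? decode(col.longValue(), type) : defaultValue;
  }

  static float oldScore(SparseNumericColumn col, int doc, NumberType type, float defaultValue) {
    return col.advance(doc) < NO_MORE_DOCS ? decode(col.longValue(), type) : defaultValue;
  }

  public static void main(String[] args) {
    SparseNumericColumn floats = new SparseNumericColumn();
    floats.put(0, Float.floatToIntBits(-3.5f)); // assumed encoding: raw int bits widened to long
    floats.put(2, Float.floatToIntBits(1.25f)); // doc 1 has no value

    System.out.println(newScore(floats, 0, NumberType.FLOAT, 0f)); // -3.5
    System.out.println(newScore(floats, 1, NumberType.FLOAT, 0f)); // 0.0 (default value)
    System.out.println(oldScore(floats, 1, NumberType.FLOAT, 0f)); // 1.25 (doc 2's value)

    SparseNumericColumn doubles = new SparseNumericColumn();
    doubles.put(0, Double.doubleToLongBits(2.5));
    System.out.println(newScore(doubles, 0, NumberType.DOUBLE, 0f)); // 2.5
  }
}
```

The `(int)` cast before `Float.intBitsToFloat` matters for negative floats: their int bits are negative, get sign-extended when widened to a long, and the cast recovers the original 32 bits.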
@@ -30,7 +30,7 @@
<field name="isTrendy" type="boolean" indexed="true" stored="true" />
<field name="dvIntField" type="int" indexed="false" docValues="true" stored="false" default="-1" multiValued="false"/>
<field name="dvLongField" type="long" indexed="false" docValues="true" stored="false" default="-2" multiValued="false"/>
<field name="dvFloatField" type="float" indexed="false" docValues="false" stored="true" default="-3" multiValued="false"/>
<field name="dvFloatField" type="float" indexed="false" docValues="true" stored="false" default="-3" multiValued="false"/>
<field name="dvDoubleField" type="double" indexed="false" docValues="true" stored="false" multiValued="false"/>
<field name="dvStrNumField" type="string" indexed="false" docValues="true" stored="false" multiValued="false"/>
<field name="dvStrBoolField" type="boolean" indexed="false" docValues="true" stored="false" multiValued="false"/>
@@ -272,7 +272,7 @@ private void indexDocuments(final String collection) throws Exception {
final int collectionSize = 8;
// put documents in random order to check that advanceExact is working correctly
List<Integer> docIds = IntStream.rangeClosed(1, collectionSize).boxed().collect(toList());
Contributor review comment: Nice, I haven't come across this IntStream.rangeClosed thing before!
Collections.shuffle(docIds);
Collections.shuffle(docIds, random());

int docCounter = 1;
for (int docId : docIds) {
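The test change swaps `Collections.shuffle(docIds)` for `Collections.shuffle(docIds, random())`. In Lucene/Solr tests, `random()` (from `LuceneTestCase`) is derived from the test seed, so a document order that triggers a failure can be replayed by rerunning with the same seed. The sketch below imitates that with a plain `java.util.Random` built from a fixed seed; `SeededShuffleSketch` and `shuffledDocIds` are hypothetical names, and the fixed seed is only a stand-in for the framework-provided one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SeededShuffleSketch {
  // Returns the ids 1..collectionSize in an order determined only by the seed.
  static List<Integer> shuffledDocIds(int collectionSize, long seed) {
    // toCollection(ArrayList::new) guarantees a mutable list; Collectors.toList() makes no such promise.
    List<Integer> docIds = IntStream.rangeClosed(1, collectionSize)
        .boxed()
        .collect(Collectors.toCollection(ArrayList::new));
    // Stand-in for random(): a Random built from a fixed seed yields the same permutation every run.
    Collections.shuffle(docIds, new Random(seed));
    return docIds;
  }

  public static void main(String[] args) {
    List<Integer> first = shuffledDocIds(8, 42L);
    List<Integer> second = shuffledDocIds(8, 42L);
    System.out.println(first);
    System.out.println(first.equals(second)); // true: same seed, same order
  }
}
```

The unseeded `Collections.shuffle(docIds)` overload uses its own internal source of randomness, so a failing order could not be reproduced from the test seed.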