SOLR-17195: Add 'minPrefixLength' soft limit #2499
Changes from 4 commits
cfb9831
c565a85
3902a78
b518f4b
c860533
c03afc9
1ad8ce0
df18b2b
b72be9d
4a29dd2
85d1546
36b88f1
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,6 +26,7 @@ | |
import org.apache.lucene.index.Term; | ||
import org.apache.lucene.queries.function.ValueSource; | ||
import org.apache.lucene.queries.function.valuesource.SortedSetFieldSource; | ||
import org.apache.lucene.search.PrefixQuery; | ||
import org.apache.lucene.search.Query; | ||
import org.apache.lucene.search.SortField; | ||
import org.apache.lucene.search.SortedSetSelector; | ||
|
@@ -38,6 +39,7 @@ | |
import org.apache.solr.query.SolrRangeQuery; | ||
import org.apache.solr.response.TextResponseWriter; | ||
import org.apache.solr.search.QParser; | ||
import org.apache.solr.search.QueryUtils; | ||
import org.apache.solr.uninverting.UninvertingReader.Type; | ||
|
||
/** | ||
|
@@ -165,6 +167,20 @@ public Query getFieldTermQuery(QParser parser, SchemaField field, String externa | |
return new TermQuery(new Term(field.getName(), br)); | ||
} | ||
|
||
@Override | ||
public Query getPrefixQuery(QParser parser, SchemaField sf, String termStr) { | ||
final var query = super.getPrefixQuery(parser, sf, termStr); | ||
Why override these methods in TextField & StrField with duplicative code when you could modify

Moving the logic into For now I've acquiesced and moved the logic to

I see that the limit doesn't make much sense for those two yet it's also harmless (I think). I prefer a consistent approach with no duplication. Thanks for doing this.

Well there is some harm, albeit small. A prefix query on a "known safe" field type like Enum might trigger the limit, and cause the "Admin aggravation/annoyance" we've discussed and tried to minimize throughout this PR. That'd require a user to submit a query like
||
|
||
// Some internal usage (e.g. faceting) creates PrefixQueries without a surrounding QParser, so | ||
// check for null here before using QParser to access the limit value | ||
if (query instanceof PrefixQuery && parser != null) { | ||
final var minPrefixLength = | ||
parser.getReq().getCore().getSolrConfig().prefixQueryMinPrefixLength; | ||
QueryUtils.ensurePrefixQueryObeysMinimumPrefixLength(query, termStr, minPrefixLength); | ||
} | ||
return query; | ||
} | ||
|
||
@Override | ||
public Object toObject(SchemaField sf, BytesRef term) { | ||
return term.utf8ToString(); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,7 @@ | |
import org.apache.lucene.queryparser.classic.ParseException; | ||
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser; | ||
import org.apache.lucene.search.MultiTermQuery; | ||
import org.apache.lucene.search.PrefixQuery; | ||
import org.apache.lucene.search.Query; | ||
import org.apache.solr.common.params.CommonParams; | ||
import org.apache.solr.common.params.SolrParams; | ||
|
@@ -134,6 +135,19 @@ protected Query newWildcardQuery(org.apache.lucene.index.Term t) { | |
} | ||
} | ||
|
||
@Override | ||
protected Query getPrefixQuery(String field, String termStr) throws ParseException { | ||
// TODO check the field type and call QueryUtils.ensureBlah here | ||
It's not clear why this parser needs modifications like this

I noticed this in testing initially and was surprised as well. It looks like ComplexPhraseQParserPlugin assumes "text" and doesn't rely on FieldType (or its subclasses) to do query-construction in a schema-aware manner. I'd be curious to see what other QParsers act similarly, but I'm not familiar enough with our query-parsing code generally to say whether it makes sense or might be problematic. But definitely orthogonal to this PR either way.

Hopefully a simple fix to this parser to consult the field type will work. That's the right thing to do. Perhaps whoever wrote this didn't know better or it didn't exist at the time.

I did find a way to make this change, but ended up backing off of it. (I pushed that change to this PR but then reverted it, if you're curious to what was involved.) A few reasons for that back off:
|
||
final var query = super.getPrefixQuery(field, termStr); | ||
if (query instanceof PrefixQuery) { | ||
final var minPrefixLength = | ||
getReq().getCore().getSolrConfig().prefixQueryMinPrefixLength; | ||
QueryUtils.ensurePrefixQueryObeysMinimumPrefixLength( | ||
query, termStr, minPrefixLength); | ||
} | ||
return query; | ||
} | ||
|
||
private Query setRewriteMethod(org.apache.lucene.search.Query query) { | ||
if (query instanceof MultiTermQuery) { | ||
((MultiTermQuery) query) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,6 +21,7 @@ | |
import java.util.Collections; | ||
import java.util.IdentityHashMap; | ||
import java.util.List; | ||
import java.util.Locale; | ||
import java.util.Map; | ||
import java.util.Set; | ||
import org.apache.lucene.search.BooleanClause; | ||
|
@@ -34,6 +35,7 @@ | |
import org.apache.lucene.search.Query; | ||
import org.apache.solr.common.SolrException; | ||
import org.apache.solr.common.params.CommonParams; | ||
import org.apache.solr.core.SolrConfig; | ||
import org.apache.solr.request.SolrQueryRequest; | ||
|
||
/** */ | ||
|
@@ -83,6 +85,25 @@ public static boolean isConstantScoreQuery(Query q) { | |
} | ||
} | ||
|
||
public static void ensurePrefixQueryObeysMinimumPrefixLength( | ||
Query query, String prefix, int minPrefixLength) { | ||
// TODO Should we provide a query-param to disable the limit on a request-by-request basis? I | ||
Yes indeed; in fact the QParser default algorithm could look up a local-param (not request level) like prefixQueryMinimumLength. This will be straightforward once you create the QParser default method as I suggested.

Done!
||
// can imagine scenarios where advanced users may want to enforce the limit on most fields, | ||
// but ignore it for a few fields that they know to be low-cardinality and therefore "less | ||
dsmiley marked this conversation as resolved.
|
||
// risky" | ||
if (prefix.length() < minPrefixLength) { | ||
final var message = | ||
String.format( | ||
Locale.ROOT, | ||
"Query [%s] does not meet the minimum prefix length [%d] (actual=[%d]). Please try with a larger prefix, or adjust %s in your solrconfig.xml", | ||
query, | ||
minPrefixLength, | ||
prefix.length(), | ||
SolrConfig.MIN_PREFIX_LENGTH); | ||
throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, message); | ||
} | ||
} | ||
|
||
/** | ||
* Returns the original query if it was already a positive query, otherwise return the negative of | ||
* the query (i.e., a positive query). | ||
|
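The local-param override suggested in the review thread on `ensurePrefixQueryObeysMinimumPrefixLength` might be resolved roughly as follows. This is a minimal standalone sketch, not the PR's code: the class `MinPrefixLengthSketch`, both method names, the plain `Map` standing in for Solr's local params, and the use of `IllegalArgumentException` instead of `SolrException` are assumptions made so the example runs without Solr on the classpath. Only the param name `prefixQueryMinimumLength` comes from the review comment.

```java
import java.util.Locale;
import java.util.Map;

final class MinPrefixLengthSketch {

  // Param name taken from the review comment; everything else here is hypothetical.
  static final String LOCAL_PARAM = "prefixQueryMinimumLength";

  /** Returns the per-query override if one is present, otherwise the configured default. */
  static int effectiveMinPrefixLength(Map<String, String> localParams, int configuredDefault) {
    final String override = localParams.get(LOCAL_PARAM);
    return override == null ? configuredDefault : Integer.parseInt(override);
  }

  /** Throws if the prefix is shorter than the given minimum length. */
  static void ensureMinPrefixLength(String prefix, int minPrefixLength) {
    if (prefix.length() < minPrefixLength) {
      throw new IllegalArgumentException(
          String.format(
              Locale.ROOT,
              "Prefix [%s] does not meet the minimum prefix length [%d] (actual=[%d])",
              prefix,
              minPrefixLength,
              prefix.length()));
    }
  }

  public static void main(String[] args) {
    // With a configured default of 2, a one-character prefix is rejected.
    try {
      ensureMinPrefixLength("a", effectiveMinPrefixLength(Map.of(), 2));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
    // A local-param override of 1 lets the same prefix through.
    ensureMinPrefixLength("a", effectiveMinPrefixLength(Map.of(LOCAL_PARAM, "1"), 2));
    System.out.println("override accepted");
  }
}
```

Resolving the effective limit separately from the check keeps the check itself unaware of where the value came from, which is one way to let advanced users relax the limit for fields they know to be low-cardinality.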
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -89,6 +89,18 @@ | |
--> | ||
<maxBooleanClauses>${solr.max.booleanClauses:1024}</maxBooleanClauses> | ||
|
||
<!-- Minimum acceptable prefix-size for prefix-based queries on string fields. | ||
|
||
Prefix-based queries consume memory in proportion to the number of terms in the index | ||
that start with that prefix. Short prefixes tend to match many many more indexed-terms | ||
and consume more memory as a result, sometimes causing stability issues on the node. | ||
|
||
This setting allows administrators to require that prefixes meet or exceed a specified | ||
minimum length requirement. Prefix queries that don't meet this requirement return an | ||
error to users. | ||
--> | ||
<minPrefixLength>${solr.min.prefixLength:2}</minPrefixLength> | ||
again; name is confusing and I disagree with the choice of

To avoid fragmentation, let's continue XML-tag naming discussion in this thread here. For the governing sys prop, I've changed that to 'solr.query.minPrefixLength' as suggested.
||
|
||
<!-- Cache specification for Filters or DocSets - unordered set of *all* documents | ||
that match a particular query. | ||
--> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to You under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
package org.apache.solr.search; | ||
|
||
import org.apache.solr.SolrTestCaseJ4; | ||
import org.apache.solr.common.SolrException; | ||
import org.junit.BeforeClass; | ||
import org.junit.Test; | ||
|
||
/** | ||
* Unit tests for prefix-query functionality - mostly testing the 'minPrefixLength' setting | ||
* available in solrconfig.xml | ||
*/ | ||
public class PrefixQueryTest extends SolrTestCaseJ4 { | ||
|
||
private static final String[] MIN_PREFIX_SUPPORTING_FIELDS = new String[] {"val_s", "t_val"}; | ||
|
||
@BeforeClass | ||
public static void beforeTests() throws Exception { | ||
initCore("solrconfig.xml", "schema.xml"); | ||
|
||
assertU(createDocWithFieldVal("1", "aaa")); | ||
assertU(createDocWithFieldVal("2", "aab")); | ||
assertU(createDocWithFieldVal("3", "aac")); | ||
assertU(createDocWithFieldVal("4", "abc")); | ||
|
||
assertU(createDocWithFieldVal("5", "bbb")); | ||
assertU(createDocWithFieldVal("6", "bbc")); | ||
|
||
assertU("<commit/>"); | ||
} | ||
|
||
// Sanity-check of a few queries we'll use in other tests | ||
@Test | ||
public void testPrefixQueryMatchesExpectedDocuments() { | ||
for (String fieldName : MIN_PREFIX_SUPPORTING_FIELDS) { | ||
assertQ(req(fieldName + ":*"), "//*[@numFound='6']"); | ||
assertQ(req(fieldName + ":aa*"), "//*[@numFound='3']"); | ||
assertQ(req(fieldName + ":bb*"), "//*[@numFound='2']"); | ||
} | ||
} | ||
|
||
@Test | ||
public void testPrefixQueryObeysMinPrefixLimit() { | ||
for (String fieldName : MIN_PREFIX_SUPPORTING_FIELDS) { | ||
assertQEx( | ||
"Prefix query didn't obey limit", | ||
"does not meet the minimum prefix length [2] (actual=[1])", | ||
req(fieldName + ":a*"), | ||
SolrException.ErrorCode.BAD_REQUEST); | ||
} | ||
} | ||
|
||
@Test | ||
public void testPrefixQParserObeysMinPrefixLimit() { | ||
for (String fieldName : MIN_PREFIX_SUPPORTING_FIELDS) { | ||
assertQEx( | ||
"Prefix query didn't obey limit", | ||
"does not meet the minimum prefix length [2] (actual=[1])", | ||
req("q", "{!prefix f=" + fieldName + "}a"), | ||
SolrException.ErrorCode.BAD_REQUEST); | ||
} | ||
} | ||
|
||
@Test | ||
public void testComplexPhraseQParserObeysMinPrefixLimit() { | ||
for (String fieldName : MIN_PREFIX_SUPPORTING_FIELDS) { | ||
assertQEx( | ||
"{!complex} query didn't obey min-prefix limit", | ||
"does not meet the minimum prefix length [2] (actual=[1])", | ||
req("q", "{!complexphrase inOrder=true}" + fieldName + ":\"a*\""), | ||
SolrException.ErrorCode.BAD_REQUEST); | ||
} | ||
} | ||
|
||
@Test | ||
public void testQuestionMarkWildcardsCountTowardsMinimumPrefix() { | ||
// Both of these queries succeed since the '?' wildcard is counted as a part of the prefix | ||
assertQ(req("val_s:a?c*"), "//*[@numFound='2']"); // Matches 'aac' and 'abc' | ||
assertQ(req("val_s:a??*"), "//*[@numFound='4']"); // Matches all documents starting with 'a' | ||
} | ||
|
||
private static String createDocWithFieldVal(String id, String fieldVal) { | ||
return "<add><doc><field name=\"id\">" | ||
+ id | ||
+ "</field><field name=\"val_s\">" | ||
+ fieldVal | ||
+ "</field><field name=\"t_val\">" | ||
+ fieldVal | ||
+ "</field></doc></add>"; | ||
} | ||
} |
From a naming standpoint, this is very ambiguous. Prefix of what? And should this be here at all vs. a request parameter, interpreted by some (not all) query parsers?

Naming-wise, I was very much following the model of `maxBooleanClauses`. But in hindsight, I think you're right that this is too ambiguous. For now I'll change this to `minPrefixQueryTermLength`, but if there's another option there you like better, lmk.

Does the "Term" in that add anything? I suggest removing that component. Not a strong opinion though.