Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delegate wildcard query creation to MappedFieldType. #34062

Merged
merged 3 commits into from
Sep 26, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,6 @@
import com.ibm.icu.text.RawCollationKey;
import com.ibm.icu.text.RuleBasedCollator;
import com.ibm.icu.util.ULocale;

import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.index.IndexOptions;
Expand Down Expand Up @@ -158,18 +157,23 @@ protected BytesRef indexedValueForSearch(Object value) {
@Override
public Query fuzzyQuery(Object value, Fuzziness fuzziness, int prefixLength, int maxExpansions,
boolean transpositions) {
throw new UnsupportedOperationException();
throw new UnsupportedOperationException("[fuzzy] queries are not supported on [" + CONTENT_TYPE + "] fields.");
}

@Override
public Query prefixQuery(String value, MultiTermQuery.RewriteMethod method, QueryShardContext context) {
throw new UnsupportedOperationException();
throw new UnsupportedOperationException("[prefix] queries are not supported on [" + CONTENT_TYPE + "] fields.");
}

@Override
public Query wildcardQuery(String value, QueryShardContext context) {
throw new UnsupportedOperationException("[wildcard] queries are not supported on [" + CONTENT_TYPE + "] fields.");
}

@Override
public Query regexpQuery(String value, int flags, int maxDeterminizedStates,
MultiTermQuery.RewriteMethod method, QueryShardContext context) {
throw new UnsupportedOperationException();
throw new UnsupportedOperationException("[regexp] queries are not supported on [" + CONTENT_TYPE + "] fields.");
}

public static DocValueFormat COLLATE_FORMAT = new DocValueFormat() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,14 @@ public void testPrefixQuery() {
() -> ft.prefixQuery("prefix", null, null));
}

public void testWildcardQuery() {
MappedFieldType ft = createDefaultFieldType();
ft.setName("field");
ft.setIndexOptions(IndexOptions.DOCS);
expectThrows(UnsupportedOperationException.class,
() -> ft.wildcardQuery("foo*", null));
}

public void testRangeQuery() {
MappedFieldType ft = createDefaultFieldType();
ft.setName("field");
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -150,8 +150,17 @@ public Query termsQuery(List values, QueryShardContext context) {
+ " vs. " + values);
}

@Override
public Query wildcardQuery(String value, QueryShardContext context) {
if (isSameIndex(value, context.getFullyQualifiedIndex().getName())) {
return Queries.newMatchAllQuery();
} else {
return Queries.newMatchNoDocsQuery("Index didn't match. Index queried: " + context.index().getName() + " vs. " + value);
}
}

private boolean isSameIndex(Object value, String indexName) {
String pattern = value instanceof BytesRef ? pattern = ((BytesRef) value).utf8ToString() : value.toString();
String pattern = value instanceof BytesRef ? ((BytesRef) value).utf8ToString() : value.toString();
return Regex.simpleMatch(pattern, indexName);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -345,6 +345,10 @@ public Query prefixQuery(String value, @Nullable MultiTermQuery.RewriteMethod me
throw new QueryShardException(context, "Can only use prefix queries on keyword and text fields - not on [" + name + "] which is of type [" + typeName() + "]");
}

public Query wildcardQuery(String value, QueryShardContext context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same reasoning can apply for wildcard, prefix and regex queries so the default impl should throw a QueryShardException ? Only StringFieldType fields should be able to build a wildcard query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keyword field applies the normalizer on termQuery. Depending on the normalizer the wildcard and escaped characters could be removed/replaced so I wonder if we should apply the same logic than QueryParserBase#analyzeWildcard for keyword fields. This is out of scope for this pr but it made me realize that we might have a bug here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't go further and disallow wildcard queries for all non-keyword or text fields, as some other field types like _index explicitly support wildcard queries.

I missed this part sorry. I think we should explicitly add the support in the _index field type rather than supporting this query on all fields. Currently the support for prefix queries is also broken so we don't really use this ability.

Copy link
Contributor Author

@jtibshirani jtibshirani Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll make sure only string fields support wildcards by default. Maybe I'll add an upgrade note too in case this breaks some types we don't have test coverage for (will make it easier for users to debug + file issues)?

This is out of scope for this pr but it made me realize that we might have a bug here.

Makes sense, I'll make a note to follow-up on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through the non-string field types, what do you think should be done with metadata types like IdFieldType, IgnoredFieldType, and RoutingFieldType? My intuition is we should switch them to being string fields to avoid breaking any queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change would only break wildcard query on these fields, right ? +1 to make them string fields, prefix and regex query do not work currently because of this so it would be a bug fix. I am also ok to do that in a follow up, the changes in this pr have a different scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

throw new QueryShardException(context, "Can only use wildcard queries on keyword and text fields - not on [" + name + "] which is of type [" + typeName() + "]");
}

public Query regexpQuery(String value, int flags, int maxDeterminizedStates, @Nullable MultiTermQuery.RewriteMethod method, QueryShardContext context) {
throw new QueryShardException(context, "Can only use regexp queries on keyword and text fields - not on [" + name + "] which is of type [" + typeName() + "]");
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,13 +22,16 @@
import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.TermInSetQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RegexpQuery;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.common.lucene.BytesRefs;
import org.elasticsearch.common.unit.Fuzziness;
Expand Down Expand Up @@ -74,6 +77,16 @@ public Query prefixQuery(String value, MultiTermQuery.RewriteMethod method, Quer
return query;
}

@Override
public Query wildcardQuery(String value, QueryShardContext context) {
Query termQuery = termQuery(value, context);
if (termQuery instanceof MatchNoDocsQuery || termQuery instanceof MatchAllDocsQuery) {
return termQuery;
}
Term term = MappedFieldType.extractTerm(termQuery);
return new WildcardQuery(term);
}

@Override
public Query regexpQuery(String value, int flags, int maxDeterminizedStates,
MultiTermQuery.RewriteMethod method, QueryShardContext context) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,6 @@
package org.elasticsearch.index.query;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;
Expand Down Expand Up @@ -185,20 +183,20 @@ public static WildcardQueryBuilder fromXContent(XContentParser parser) throws IO
@Override
protected Query doToQuery(QueryShardContext context) throws IOException {
MappedFieldType fieldType = context.fieldMapper(fieldName);
Term term;

Query query;
if (fieldType == null) {
term = new Term(fieldName, BytesRefs.toBytesRef(value));
Term term = new Term(fieldName, BytesRefs.toBytesRef(value));
query = new WildcardQuery(term);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if the field does not exist we could return a MatchNoDocsQuery ?

Copy link
Contributor Author

@jtibshirani jtibshirani Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one thing that has confused me in the past: if a field type doesn't exist, we typically still create a query of the same form (see TermsQueryBuilder#doToQuery amongst other examples).

In any case, maybe I could make this change in a follow-up PR, as I was just hoping for a straight refactor here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one thing that has confused me in the past: if a field type doesn't exist, we typically still create a query of the same form (see TermsQueryBuilder#doToQuery amongst other examples).

Yes we don't have a clear policy for this. The reason I prefer the MatchNoDocsQuery is that we can fold the reason in the message and if users check the Lucene query through the _validate API they can see that this field is not present in the mapping. If we build the same form there is no easy way for the user to understand why a specific query matches no document. Anyway we can discuss this in a follow up, no need to make that change in this pr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

} else {
Query termQuery = fieldType.termQuery(value, context);
if (termQuery instanceof MatchNoDocsQuery || termQuery instanceof MatchAllDocsQuery) {
return termQuery;
}
term = MappedFieldType.extractTerm(termQuery);
query = fieldType.wildcardQuery(value, context);
}

WildcardQuery query = new WildcardQuery(term);
MultiTermQuery.RewriteMethod rewriteMethod = QueryParsers.parseRewriteMethod(rewrite, null, LoggingDeprecationHandler.INSTANCE);
QueryParsers.setRewriteMethod(query, rewriteMethod);
if (query instanceof MultiTermQuery) {
MultiTermQuery.RewriteMethod rewriteMethod = QueryParsers.parseRewriteMethod(
rewrite, null, LoggingDeprecationHandler.INSTANCE);
QueryParsers.setRewriteMethod((MultiTermQuery) query, rewriteMethod);
}
return query;
}

Expand Down