Phrases in span_near #16796
Comments
Hi @cknv. All of the span queries are term-level queries, i.e. the query string needs to be analyzed before the resulting terms are used with span queries. A phrase query would behave completely differently, as it would include analysis. Given that you already need to deal directly with terms, I'm not sure what a
I know that it would include analysis of the submitted text. That is the whole point. The tl;dr of it is that I have a DSL where I want to add the ability to search for phrases in near clauses, something like:
These two examples are not the hardest to break down, but it is not hard to imagine worse cases. Now, the problem is that I am reluctant to implement the logic to break down the phrase into terms in my own (Python) code, as I can currently spot two options, neither of which is very good:
My problem boils down to this: I can build a DSL on top of yours that supports phrases, terms, wildcards, etc. in the normal clauses that do not care much about positioning, but it becomes difficult when I want to add phrases to clauses that translate into span clauses. Inside Elasticsearch, however, you know what analyzer to use for a given field and can just use that directly; you can perhaps even figure out what to do if that analyzer differs across multiple indices (or maybe that is something Lucene would do better).
Hi @cknv The bit I'm missing is this: you're already using span clauses, which are term based, so you already need to do the analysis to convert text to terms. e.g. to take your example:
Imagine you're using the So you already have to deal with analysis for the words outside the phrase. Why would the words inside the phrase be any different? I wonder if you shouldn't be looking at creating a plugin based on the surround query parser available in Lucene: https://lucene.apache.org/core/5_4_0/queryparser/index.html?org/apache/lucene/queryparser/surround/parser/package-summary.html
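To make the term-level point concrete, here is a minimal Python sketch of a `span_near` request body built by hand. The field name `body`, the sample text, and the `naive_analyze` helper (lowercase plus whitespace split) are assumptions made up for illustration. A real analyzer may well produce different tokens, which is the difficulty being discussed.

```python
def naive_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def span_term(field, term):
    """Build one span_term clause for an already-analyzed term."""
    return {"span_term": {field: term}}


def span_near(clauses, slop, in_order=False):
    """Wrap span clauses in a span_near clause."""
    return {"span_near": {"clauses": clauses, "slop": slop, "in_order": in_order}}


# The client has to turn text into terms before building the query.
terms = naive_analyze("Quick Fox")
query = span_near([span_term("body", t) for t in terms], slop=5)
```

If the field were indexed with a stemming or otherwise non-trivial analyzer, `naive_analyze` would have to be replaced with something that mirrors it exactly.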
I have a similar request. This can be implemented with a http://grokbase.com/t/gg/elasticsearch/12bv1ee7ah/forcing-analysis-of-terms-and-span-terms describes pretty much the same.
It's true that I can deal with terms, but now that I think of it, that will break down if I ever decide to use stemming or a similar modification of the words that go into the index. I still think that part of the problem is having to replicate exactly what the different analyzers are doing, not to mention custom ones. As @mcuelenaere pointed out, I think this can provide a lot of help to provide the correct analysis of text into tokens. I am not sure how hard this is in Elasticsearch, but I hope that it could give the span queries a little more ease of use, allowing developers to focus on whatever product we build on Elasticsearch instead of having to figure out how to do text analysis.
The span queries are low-level, term-oriented queries. They are building blocks that can be used to implement a custom query syntax, similar to the Really exposing them via the query DSL is a bit of an anomaly. Normally they'd be used by a query parser written in Java and living on the server. Analysis is a vital part of the construction of queries which use span queries. I think the solution here is to look for (or write) a custom query parser that supports operators like
@clintongormley I agree that this is likely the solution. I use ES and did exactly this, see here for an example.
Related to #11328
@elastic/es-search-aggs
We don't have plans to add a query parser for the span queries at the moment.
How does the interval query handle the issue described here? Don't we still need to break up our search text into phrases before passing them into the intervals query?
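For context, here is a rough Python sketch of how a proximity clause containing phrases might be written for the `intervals` query. It assumes the `match` rule, which takes unanalyzed text in `query`, combined under an `all_of` rule. The field name `body`, the sample phrases, and the helper names are hypothetical. In this sketch the caller still decides which parts of the input are phrases, but passes them as plain text rather than as terms.

```python
def phrase_rule(text):
    """A match rule with no gaps and fixed order, i.e. phrase-like."""
    return {"match": {"query": text, "max_gaps": 0, "ordered": True}}


def near(field, rules, max_gaps, ordered=False):
    """Combine rules with all_of, allowing up to max_gaps positions between them."""
    return {
        "intervals": {
            field: {
                "all_of": {
                    "intervals": rules,
                    "max_gaps": max_gaps,
                    "ordered": ordered,
                }
            }
        }
    }


# Two phrases within three positions of each other, in either order.
query = near("body", [phrase_rule("quick brown"), phrase_rule("lazy dog")], max_gaps=3)
```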
I am using `span_near` queries, and while I can put in `span_terms` and even `span_or`, I am missing a `span_phrase` clause in the query DSL.
As I understand it, a regular `phrase` clause is very similar (at least conceptually) to a `span_near` where the slop is 0 and the terms are the phrase that has been run through the same analyzer as the field uses.
However, while I could construct that `span_near` and its `span_terms` by hand, I would have to be very careful making the terms (getting it close is easy; getting it right is hard), and I would only be able to cover one analyzer at a time, which is hardly ideal.
Alternatively, I could also ask Elasticsearch to analyze the phrase for me (according to the field's analyzer), and then use that in the query, but that would cost me an extra round trip. Not to mention that the query construction is actually in a library that is currently blissfully unaware of the actual Elasticsearch nodes; it just builds the query.
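The alternative described above (asking Elasticsearch to analyze the phrase, then building the query from the result) might look roughly like the following Python sketch. It assumes the `_analyze` API response shape `{"tokens": [{"token": ..., "position": ...}, ...]}`. The sample response, the field name `body`, and the helper names are made up for illustration, and the HTTP round trip itself is left out. It ignores position gaps (for example from removed stopwords) and multiple tokens at one position (for example synonyms), so it is not a full equivalent of a phrase query.

```python
def terms_from_analyze(response):
    """Extract tokens from an _analyze-style response, ordered by position."""
    tokens = sorted(response["tokens"], key=lambda t: t["position"])
    return [t["token"] for t in tokens]


def span_phrase(field, terms):
    """Phrase as a span clause: span_near with slop 0 and in_order true."""
    if not terms:
        raise ValueError("phrase produced no terms")
    if len(terms) == 1:
        return {"span_term": {field: terms[0]}}
    clauses = [{"span_term": {field: t}} for t in terms]
    return {"span_near": {"clauses": clauses, "slop": 0, "in_order": True}}


# Trimmed, hand-written sample of what the analysis step might return.
sample_response = {
    "tokens": [
        {"token": "quick", "position": 0},
        {"token": "brown", "position": 1},
    ]
}

phrase = span_phrase("body", terms_from_analyze(sample_response))

# The phrase clause nests inside an outer span_near like any other span clause.
outer = {
    "span_near": {
        "clauses": [phrase, {"span_term": {"body": "fox"}}],
        "slop": 3,
        "in_order": False,
    }
}
```

This keeps query construction as plain data, but it does need the extra analysis call, which is the cost mentioned above.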
So having Elasticsearch take the phrase and construct a Lucene `SpanNearQuery`, or something akin to that, would be very nice and save me a lot of trouble.
Maybe the DSL could look something like:
Which I could then embed in my `span_near`, like any other span clause.