Merge pull request #2559 from WheresMyStapler/text-predicate-negations
Add negations to all text predicates
porunov authored Apr 24, 2021
2 parents 1bf9fa3 + bc65403 commit 4228d4b
Showing 16 changed files with 1,315 additions and 148 deletions.
26 changes: 26 additions & 0 deletions docs/changelog.md
@@ -277,6 +277,32 @@ Following, classes are removed and have to be replaced by tinkerpop equivalent:
| `org.janusgraph.channelizers.JanusGraphNioChannelizer` | `org.apache.tinkerpop.gremlin.server.channel.NioChannelizer` |
| `org.janusgraph.channelizers.JanusGraphWsAndHttpChannelizer` | `org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer` |

##### Breaking change: Lucene and Solr fuzzy predicates

The text predicates `text.textFuzzy` and `text.textContainsFuzzy` have been updated in both the Lucene and Solr indexing
backends to align with JanusGraph and Elasticsearch. These predicates now derive the maximum Levenshtein distance from
the length of the query string; previously they used the backend's default maximum distance of 2:

- 0 for strings of one or two characters (exact match)
- 1 for strings of three, four or five characters
- 2 for strings of more than five characters
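
The thresholds listed above can be sketched in plain Java. This is only an illustration: the class and method names (`FuzzyDistanceSketch`, `maxEditDistance`) are hypothetical and are not taken from this commit, and the sketch covers only the mapping from query length to maximum distance, not the matching itself.

```java
// Hypothetical helper for illustration only; not part of the JanusGraph API.
final class FuzzyDistanceSketch {

    // Maps the length of the query string to the maximum allowed Levenshtein distance.
    static int maxEditDistance(String query) {
        int length = query.length();
        if (length <= 2) {
            return 0; // one or two characters: exact match only
        }
        if (length <= 5) {
            return 1; // three, four or five characters
        }
        return 2;     // more than five characters
    }

    public static void main(String[] args) {
        // Print the threshold that applies to a few sample query strings.
        for (String query : new String[] {"ai", "hap", "hooop", "surprizes"}) {
            System.out.println(query + " -> " + maxEditDistance(query));
        }
    }
}
```

Under these assumptions, a five-character query such as `hooop` is allowed a distance of at most 1, where the backend default previously allowed 2.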

**Change Matrix:**

| text | query | previous result | new result |
| --- | --- | --- | --- |
| ah | ah | true | true |
| ah | ai | true | **false** |
| hop | hop | true | true |
| hop | hap | true | true |
| hop | hoop | true | true |
| hop | hooop | true | **false** |
| surprises | surprises | true | true |
| surprises | surprizes | true | true |
| surprises | surpprises | true | true |
| surprises | surpprisess | false | false |


### Version 0.5.3 (Release Date: December 24, 2020)

=== "Maven"
16 changes: 16 additions & 0 deletions docs/index-backend/text-search.md
@@ -68,6 +68,15 @@ g.V().has('booksummary', textContainsRegex('.*corn.*'))
g.V().has('booksummary', textContainsFuzzy('unicorn'))
```

The Elasticsearch backend extends this functionality and includes support for negations
of the above predicates, as well as phrase matching:

- `textNotContains`: is true if no words inside the text string match the query string
- `textNotContainsPrefix`: is true if no words inside the text string begin with the query string
- `textNotContainsRegex`: is true if no words inside the text string match the given regular expression
- `textNotContainsFuzzy`: is true if no words inside the text string are similar to the query string (based on Levenshtein edit distance)
- `textNotContainsPhrase`: is true if the text string does not contain the sequence of words in the query string
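
The word-level semantics of two of these negations can be sketched in plain Java. This is an illustration under stated assumptions, not the backend's implementation: the names `TokenNegationSketch`, `tokenize`, `notContains` and `notContainsPrefix` are hypothetical, the tokenizer simply lowercases and splits on non-alphanumeric characters (real analyzers differ), and the query is assumed to be a single word.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration of the word-level negations; not part of the JanusGraph API.
final class TokenNegationSketch {

    // Assumed tokenizer: lowercase, then split on anything that is not a letter or digit.
    static String[] tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(word -> !word.isEmpty())
                .toArray(String[]::new);
    }

    // True if no word inside the text equals the (single-word) query.
    static boolean notContains(String text, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        return Arrays.stream(tokenize(text)).noneMatch(word -> word.equals(q));
    }

    // True if no word inside the text begins with the query.
    static boolean notContainsPrefix(String text, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        return Arrays.stream(tokenize(text)).noneMatch(word -> word.startsWith(q));
    }
}
```

With this sketch, `notContains("The unicorn sleeps", "corn")` holds because no whole word equals `corn`, while `notContainsPrefix("The unicorn sleeps", "uni")` does not hold because `unicorn` begins with `uni`.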

String search predicates (see below) may be used in queries, but those
require filtering in memory, which can be very costly.

@@ -111,6 +120,13 @@ g.V().has('bookname', textRegex('.*corn.*'))
g.V().has('bookname', textFuzzy('unicorn'))
```

The Elasticsearch backend extends this functionality and includes support for negations
of the above text predicates:

- `textNotPrefix`: is true if the string value does not start with the given query string
- `textNotRegex`: is true if the string value does not match the given regular expression in its entirety
- `textNotFuzzy`: is true if the string value is not similar to the given query string (based on Levenshtein edit distance)
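
The whole-string semantics of the prefix and regex negations can be sketched in plain Java. The names `StringNegationSketch`, `notPrefix` and `notRegex` are hypothetical and only illustrate the definitions above; they are not the backend's implementation. The key point is that the regular expression must match the entire value, not just a part of it.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the string-level negations; not part of the JanusGraph API.
final class StringNegationSketch {

    // True if the whole string value does not start with the query string.
    static boolean notPrefix(String value, String query) {
        return !value.startsWith(query);
    }

    // True if the regular expression does not match the string value in its entirety.
    static boolean notRegex(String value, String regex) {
        return !Pattern.matches(regex, value);
    }
}
```

For example, `notRegex("unicorn", "corn")` holds because `corn` does not match the entire value, whereas `notRegex("unicorn", ".*corn.*")` does not hold.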

Full-text search predicates may be used in queries, but those require
filtering in memory, which can be very costly.

11 changes: 10 additions & 1 deletion docs/interactions/search-predicates.md
@@ -27,13 +27,22 @@ The `Text` enum specifies the [Text Search](../index-backend/text-search.md) use

* Text search predicates which match against the individual words inside a text string after it has been tokenized. These predicates are not case sensitive.
    - `textContains`: is true if (at least) one word inside the text string matches the query string
    - `textNotContains`: is true if no words inside the text string match the query string
    - `textContainsPrefix`: is true if (at least) one word inside the text string begins with the query string
    - `textNotContainsPrefix`: is true if no words inside the text string begin with the query string
    - `textContainsRegex`: is true if (at least) one word inside the text string matches the given regular expression
    - `textNotContainsRegex`: is true if no words inside the text string match the given regular expression
    - `textContainsFuzzy`: is true if (at least) one word inside the text string is similar to the query string (based on Levenshtein edit distance)
    - `textNotContainsFuzzy`: is true if no words inside the text string are similar to the query string (based on Levenshtein edit distance)
    - `textContainsPhrase`: is true if the text string contains the exact sequence of words in the query string
    - `textNotContainsPhrase`: is true if the text string does not contain the sequence of words in the query string
* String search predicates which match against the entire string value
    - `textPrefix`: is true if the string value starts with the given query string
    - `textNotPrefix`: is true if the string value does not start with the given query string
    - `textRegex`: is true if the string value matches the given regular expression in its entirety
    - `textNotRegex`: is true if the string value does not match the given regular expression in its entirety
    - `textFuzzy`: is true if the string value is similar to the given query string (based on Levenshtein edit distance)
    - `textNotFuzzy`: is true if the string value is not similar to the given query string (based on Levenshtein edit distance)

See [Text Search](../index-backend/text-search.md) for more information about full-text and string search.
