Update full text index (#2131)
cooper-lzy authored Jun 25, 2023
1 parent f9a7f4c commit e52f822
Showing 4 changed files with 163 additions and 119 deletions.
@@ -12,18 +12,21 @@ Before using the full-text index, make sure that you have deployed a Elasticsear

Before using the full-text index, make sure that you know the [restrictions](../../4.deployment-and-installation/6.deploy-text-based-index/1.text-based-index-restrictions.md).

## Natural language full-text search
## Full Text Queries

A natural language search interprets the search string as a phrase in natural human language. The search is case-sensitive and, by default, matches the string as a prefix. For example, there are three vertices with the tag `player`. The tag `player` contains the property `name`. The `name` values of these three vertices are `Kevin Durant`, `Tim Duncan`, and `David Beckham`. Once the full-text index on `player.name` is created, the prefix search statement `LOOKUP ON player WHERE PREFIX(player.name,"D");` returns only `David Beckham`.
Full-text queries let you search analyzed text fields. The query string you provide is parsed with a strict syntax, and the matching content is returned. For details, see [Query string query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-dsl-query-string-query).

## Syntax

### Create full-text indexes

```ngql
CREATE FULLTEXT {TAG | EDGE} INDEX <index_name> ON {<tag_name> | <edge_name>} ([<prop_name>]);
CREATE FULLTEXT {TAG | EDGE} INDEX <index_name> ON {<tag_name> | <edge_name>} (<prop_name> [,<prop_name>]...) [ANALYZER="<analyzer_name>"];
```

- Composite indexes with multiple properties are supported when creating full-text indexes.
- `<analyzer_name>` is the name of the analyzer. The default value is `standard`. To use other analyzers (e.g. [IK Analysis](https://github.com/medcl/elasticsearch-analysis-ik)), you need to make sure that the corresponding analyzer is installed in Elasticsearch in advance.
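To make the composite-index and analyzer notes above concrete, here is a minimal Python sketch of which terms end up indexed per property. It assumes a simplified stand-in for Elasticsearch's `standard` analyzer (split on non-alphanumeric characters, then lowercase); the real analyzer uses Unicode text segmentation, and the helpers `analyze` and `index_terms` are hypothetical, not part of NebulaGraph or Elasticsearch. Skipping `None` values mirrors the documented restriction that full-text indexes cannot search `NULL` values.

```python
import re

# Hypothetical approximation of the `standard` analyzer: runs of ASCII letters
# and digits become tokens, which are then lowercased. The real analyzer follows
# Unicode text segmentation (UAX #29), so punctuation handling can differ.
TOKEN_RE = re.compile(r"[0-9A-Za-z]+")


def analyze(text: str) -> list:
    """Return the lowercase terms a standard-like analyzer would index."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def index_terms(doc: dict, fields: list) -> dict:
    """Collect indexed terms per property for a single or composite index.

    Properties whose value is None (NULL) are skipped, since they cannot be searched.
    """
    return {f: analyze(doc[f]) for f in fields if doc.get(f) is not None}


doc = {"name": "Russell Westbrook", "city": "Los Angeles"}
print(index_terms(doc, ["name"]))          # {'name': ['russell', 'westbrook']}
print(index_terms(doc, ["name", "city"]))  # {'name': ['russell', 'westbrook'], 'city': ['los', 'angeles']}
```

With a composite index such as `(name, city)`, a query can match terms from either property, which is why a single pattern can hit both names and cities.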

### Show full-text indexes

```ngql
@@ -48,30 +51,17 @@ DROP FULLTEXT INDEX <index_name>;
### Use query options

```ngql
LOOKUP ON {<tag> | <edge_type>} WHERE <expression> [YIELD <return_list>];
<expression> ::=
PREFIX | WILDCARD | REGEXP | FUZZY
LOOKUP ON {<tag> | <edge_type>} WHERE ES_QUERY(<index_name>, "<text>") YIELD <return_list> [| LIMIT [<offset>,] <number_rows>];
<return_list>
<prop_name> [AS <prop_alias>] [, <prop_name> [AS <prop_alias>] ...]
<prop_name> [AS <prop_alias>] [, <prop_name> [AS <prop_alias>] ...] [, id(vertex) [AS <prop_alias>]] [, score() AS <score_alias>]
```
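The `| LIMIT [<offset>,] <number_rows>` pipe in the syntax above skips `offset` rows and then returns at most `number_rows` rows. The Python sketch below models that behavior with a hypothetical `limit` helper, applied to the five vertex IDs returned by the `"*b*"` query in the examples, assuming the rows arrive in that order.

```python
def limit(rows: list, *args: int) -> list:
    """Mimic `| LIMIT [<offset>,] <number_rows>`: skip `offset` rows, keep `number_rows`."""
    if len(args) == 1:
        offset, count = 0, args[0]
    elif len(args) == 2:
        offset, count = args
    else:
        raise TypeError("expected LIMIT <number_rows> or LIMIT <offset>, <number_rows>")
    if offset < 0 or count < 0:
        raise ValueError("offset and number_rows must be non-negative")
    return rows[offset:offset + count]


# Row order taken from the ES_QUERY(fulltext_index_1, "*b*") example output.
matches = ["Russell Westbrook", "Boris Diaw", "Aron Baynes", "Ben Simmons", "Blake Griffin"]
print(limit(matches, 2, 3))  # ['Aron Baynes', 'Ben Simmons', 'Blake Griffin']
print(limit(matches, 2))     # ['Russell Westbrook', 'Boris Diaw']
```

So `LIMIT 2,3` on that result set should yield three rows, starting from the third match.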

- PREFIX(schema_name.prop_name, prefix_string, row_limit, timeout)

- WILDCARD(schema_name.prop_name, wildcard_string, row_limit, timeout)

- REGEXP(schema_name.prop_name, regexp_string, row_limit, timeout)

- FUZZY(schema_name.prop_name, fuzzy_string, fuzziness, operator, row_limit, timeout)

- `fuzziness` (optional): Maximum edit distance allowed for matching. The default value is `AUTO`. For other valid values and more information, see the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/6.8/common-options.html#fuzziness).

- `operator` (optional): Boolean logic used to interpret the text. Valid values are `OR` (default) and `AND`.
- `index_name`: The name of the full-text index.

- `row_limit` (optional): Specifies the number of rows to return. The default value is `100`.
- `text`: Search conditions. For supported syntax, see [Query string syntax](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax).

- `timeout` (optional): Specifies the timeout. The default value is `200ms`.
- `score()`: The score calculated by doing N degree expansion for the eligible vertices. The default value is `1.0`. The higher the score, the better the match. By default, results are sorted by score in descending order. For details, see [Search and Scoring in Lucene](https://lucene.apache.org/core/9_6_0/core/org/apache/lucene/search/package-summary.html#package.description).
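The `fuzziness` option is based on edit distance. The sketch below is a hypothetical approximation, not Elasticsearch's implementation: it computes plain Levenshtein distance and applies the `AUTO` rule described in the Elasticsearch documentation (exact match for terms of 1-2 characters, 1 edit for 3-5 characters, 2 edits for longer terms). It assumes lowercase, whitespace-separated terms and ignores transpositions, which Elasticsearch counts as a single edit by default. The `operator` argument models `OR` (any query term matches) versus `AND` (every query term matches).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """`AUTO` fuzziness: 0 edits for 1-2 chars, 1 for 3-5 chars, 2 for 6+ chars."""
    n = len(term)
    if n < 3:
        return 0
    if n < 6:
        return 1
    return 2


def fuzzy_match(query: str, value: str, operator: str = "OR") -> bool:
    """Approximate FUZZY(prop, query, AUTO, operator) against one property value."""
    doc_terms = value.lower().split()
    hits = [
        any(levenshtein(q, t) <= auto_fuzziness(q) for t in doc_terms)
        for q in query.lower().split()
    ]
    if not hits:
        return False
    return all(hits) if operator.upper() == "AND" else any(hits)


print(levenshtein("dunncan", "duncan"))                # 1
print(fuzzy_match("Tim Dunncan", "Tim Duncan"))        # True
print(fuzzy_match("Jim Harden", "Tim Duncan", "AND"))  # False
```

Here `"dunncan"` has 7 characters, so `AUTO` allows 2 edits and the single extra `n` is tolerated.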

## Examples

@@ -80,99 +70,166 @@ LOOKUP ON {<tag> | <edge_type>} WHERE <expression> [YIELD <return_list>];
nebula> CREATE SPACE IF NOT EXISTS basketballplayer (partition_num=3,replica_factor=1, vid_type=fixed_string(30));
// This example signs in to the text service.
nebula> SIGN IN TEXT SERVICE (127.0.0.1:9200, HTTP);
nebula> SIGN IN TEXT SERVICE (192.168.8.100:9200, HTTP);
// This example checks the text service status.
nebula> SHOW TEXT SEARCH CLIENTS;
+-----------------+-----------------+------+
| Type | Host | Port |
+-----------------+-----------------+------+
| "ELASTICSEARCH" | "192.168.8.100" | 9200 |
+-----------------+-----------------+------+
// This example switches the graph space.
nebula> USE basketballplayer;
// This example adds the listener to the NebulaGraph cluster.
nebula> ADD LISTENER ELASTICSEARCH 192.168.8.5:9789;
nebula> ADD LISTENER ELASTICSEARCH 192.168.8.100:9789;
// This example checks the listener status. When the status is `Online`, the listener is ready.
nebula> SHOW LISTENER;
+--------+-----------------+------------------------+-------------+
| PartId | Type | Host | Host Status |
+--------+-----------------+------------------------+-------------+
| 1 | "ELASTICSEARCH" | ""192.168.8.100":9789" | "ONLINE" |
| 2 | "ELASTICSEARCH" | ""192.168.8.100":9789" | "ONLINE" |
| 3 | "ELASTICSEARCH" | ""192.168.8.100":9789" | "ONLINE" |
+--------+-----------------+------------------------+-------------+
// This example creates the tag.
nebula> CREATE TAG IF NOT EXISTS player(name string, age int);
nebula> CREATE TAG IF NOT EXISTS player(name string, city string);
// This example creates the full-text index. The index name starts with "nebula_".
nebula> CREATE FULLTEXT TAG INDEX nebula_index_1 ON player(name);
// This example creates a single-attribute full-text index.
nebula> CREATE FULLTEXT TAG INDEX fulltext_index_1 ON player(name) ANALYZER="standard";
// This example creates a multi-attribute full-text index.
nebula> CREATE FULLTEXT TAG INDEX fulltext_index_2 ON player(name,city) ANALYZER="standard";
// This example rebuilds the full-text index.
nebula> REBUILD FULLTEXT INDEX;
// This example shows the full-text index.
nebula> SHOW FULLTEXT INDEXES;
+------------------+-------------+-------------+--------+
| Name | Schema Type | Schema Name | Fields |
+------------------+-------------+-------------+--------+
| "nebula_index_1" | "Tag" | "player" | "name" |
+------------------+-------------+-------------+--------+
+--------------------+-------------+-------------+--------------+------------+
| Name | Schema Type | Schema Name | Fields | Analyzer |
+--------------------+-------------+-------------+--------------+------------+
| "fulltext_index_1" | "Tag" | "player" | "name" | "standard" |
| "fulltext_index_2" | "Tag" | "player" | "name, city" | "standard" |
+--------------------+-------------+-------------+--------------+------------+
// This example inserts the test data.
nebula> INSERT VERTEX player(name, age) VALUES \
"Russell Westbrook": ("Russell Westbrook", 30), \
"Chris Paul": ("Chris Paul", 33),\
"Boris Diaw": ("Boris Diaw", 36),\
"David West": ("David West", 38),\
"Danny Green": ("Danny Green", 31),\
"Tim Duncan": ("Tim Duncan", 42),\
"James Harden": ("James Harden", 29),\
"Tony Parker": ("Tony Parker", 36),\
"Aron Baynes": ("Aron Baynes", 32),\
"Ben Simmons": ("Ben Simmons", 22),\
"Blake Griffin": ("Blake Griffin", 30);
nebula> INSERT VERTEX player(name, city) VALUES \
"Russell Westbrook": ("Russell Westbrook", "Los Angeles"), \
"Chris Paul": ("Chris Paul", "Houston"),\
"Boris Diaw": ("Boris Diaw", "Houston"),\
"David West": ("David West", "Philadelphia"),\
"Danny Green": ("Danny Green", "Philadelphia"),\
"Tim Duncan": ("Tim Duncan", "New York"),\
"James Harden": ("James Harden", "New York"),\
"Tony Parker": ("Tony Parker", "Chicago"),\
"Aron Baynes": ("Aron Baynes", "Chicago"),\
"Ben Simmons": ("Ben Simmons", "Phoenix"),\
"Blake Griffin": ("Blake Griffin", "Phoenix");
// These examples run test queries.
nebula> LOOKUP ON player WHERE PREFIX(player.name, "B") YIELD id(vertex);
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"Chris") YIELD id(vertex);
+--------------+
| id(VERTEX) |
+--------------+
| "Chris Paul" |
+--------------+
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"Harden") YIELD properties(vertex);
+----------------------------------------------------------------+
| properties(VERTEX) |
+----------------------------------------------------------------+
| {_vid: "James Harden", city: "New York", name: "James Harden"} |
+----------------------------------------------------------------+
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"Da*") YIELD properties(vertex);
+------------------------------------------------------------------+
| properties(VERTEX) |
+------------------------------------------------------------------+
| {_vid: "David West", city: "Philadelphia", name: "David West"} |
| {_vid: "Danny Green", city: "Philadelphia", name: "Danny Green"} |
+------------------------------------------------------------------+
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*") YIELD id(vertex);
+---------------------+
| id(VERTEX) |
+---------------------+
| "Russell Westbrook" |
| "Boris Diaw" |
| "Aron Baynes" |
| "Ben Simmons" |
| "Blake Griffin" |
+---------------------+
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*") YIELD id(vertex) | LIMIT 2,3;
+-----------------+
| id(VERTEX) |
+-----------------+
| "Boris Diaw" |
| "Aron Baynes" |
| "Ben Simmons" |
| "Blake Griffin" |
+-----------------+
nebula> LOOKUP ON player WHERE WILDCARD(player.name, "*ri*") YIELD player.name, player.age;
+-----------------+-----+
| name | age |
+-----------------+-----+
| "Chris Paul" | 33 |
| "Boris Diaw" | 36 |
| "Blake Griffin" | 30 |
+-----------------+-----+
nebula> LOOKUP ON player WHERE WILDCARD(player.name, "*ri*") YIELD player.name, player.age | YIELD count(*);
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*") YIELD id(vertex) | YIELD count(*);
+----------+
| count(*) |
+----------+
| 3 |
| 5 |
+----------+
nebula> LOOKUP ON player WHERE REGEXP(player.name, "R.*") YIELD player.name, player.age;
+---------------------+-----+
| name | age |
+---------------------+-----+
| "Russell Westbrook" | 30 |
+---------------------+-----+
nebula> LOOKUP ON player WHERE REGEXP(player.name, ".*") YIELD id(vertex);
+---------------------+
| id(VERTEX) |
+---------------------+
| "Danny Green" |
| "David West" |
...
nebula> LOOKUP ON player WHERE FUZZY(player.name, "Tim Dunncan", AUTO, OR) YIELD player.name;
+--------------+
| name |
+--------------+
| "Tim Duncan" |
+--------------+
// This example drops the full-text index.
nebula> DROP FULLTEXT INDEX nebula_index_1;
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*") YIELD id(vertex), score() AS score;
+---------------------+-------+
| id(VERTEX) | score |
+---------------------+-------+
| "Russell Westbrook" | 1.0 |
| "Boris Diaw" | 1.0 |
| "Aron Baynes" | 1.0 |
| "Ben Simmons" | 1.0 |
| "Blake Griffin" | 1.0 |
+---------------------+-------+
// Documents with a term matching `*b*` have their score multiplied by a boost factor of 4, while documents with a term matching `*c*` use the default boost factor of 1.
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*^4 OR *c*") YIELD id(vertex), score() AS score;
+---------------------+-------+
| id(VERTEX) | score |
+---------------------+-------+
| "Russell Westbrook" | 4.0 |
| "Boris Diaw" | 4.0 |
| "Aron Baynes" | 4.0 |
| "Ben Simmons" | 4.0 |
| "Blake Griffin" | 4.0 |
| "Chris Paul" | 1.0 |
| "Tim Duncan" | 1.0 |
+---------------------+-------+
// When using a multi-attribute full-text index query, the conditions are matched within all properties of the index.
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_2,"*h*") YIELD properties(vertex);
+------------------------------------------------------------------+
| properties(VERTEX) |
+------------------------------------------------------------------+
| {_vid: "Chris Paul", city: "Houston", name: "Chris Paul"} |
| {_vid: "Boris Diaw", city: "Houston", name: "Boris Diaw"} |
| {_vid: "David West", city: "Philadelphia", name: "David West"} |
| {_vid: "James Harden", city: "New York", name: "James Harden"} |
| {_vid: "Tony Parker", city: "Chicago", name: "Tony Parker"} |
| {_vid: "Aron Baynes", city: "Chicago", name: "Aron Baynes"} |
| {_vid: "Ben Simmons", city: "Phoenix", name: "Ben Simmons"} |
| {_vid: "Blake Griffin", city: "Phoenix", name: "Blake Griffin"} |
| {_vid: "Danny Green", city: "Philadelphia", name: "Danny Green"} |
+------------------------------------------------------------------+
// When using multi-attribute full-text index queries, you can specify different text for different properties for the query.
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_2,"name:*b* AND city:Houston") YIELD properties(vertex);
+-----------------------------------------------------------+
| properties(VERTEX) |
+-----------------------------------------------------------+
| {_vid: "Boris Diaw", city: "Houston", name: "Boris Diaw"} |
+-----------------------------------------------------------+
// This example drops the single-attribute full-text index.
nebula> DROP FULLTEXT INDEX fulltext_index_1;
```
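The boosted and multi-property examples above can be reproduced with a small model. This Python sketch assumes that each wildcard clause contributes a constant score equal to its boost and that matching `OR` clauses are summed. That assumption is consistent with the 4.0 and 1.0 values shown above, but it is not a description of Lucene's full scoring. Terms are lowercased and split on whitespace, and all helper names are hypothetical.

```python
from fnmatch import fnmatchcase

# Test data copied from the INSERT VERTEX player(name, city) example.
PLAYERS = {
    "Russell Westbrook": {"name": "Russell Westbrook", "city": "Los Angeles"},
    "Chris Paul": {"name": "Chris Paul", "city": "Houston"},
    "Boris Diaw": {"name": "Boris Diaw", "city": "Houston"},
    "David West": {"name": "David West", "city": "Philadelphia"},
    "Danny Green": {"name": "Danny Green", "city": "Philadelphia"},
    "Tim Duncan": {"name": "Tim Duncan", "city": "New York"},
    "James Harden": {"name": "James Harden", "city": "New York"},
    "Tony Parker": {"name": "Tony Parker", "city": "Chicago"},
    "Aron Baynes": {"name": "Aron Baynes", "city": "Chicago"},
    "Ben Simmons": {"name": "Ben Simmons", "city": "Phoenix"},
    "Blake Griffin": {"name": "Blake Griffin", "city": "Phoenix"},
}


def wildcard_hit(pattern: str, doc: dict, fields: list) -> bool:
    """True if any lowercase, whitespace-split term of the given properties matches."""
    pat = pattern.lower()
    return any(fnmatchcase(term, pat) for f in fields for term in doc[f].lower().split())


def boosted_or_score(clauses: list, doc: dict, fields: list) -> float:
    """Assumed model: each matching (pattern, boost) clause adds its boost."""
    return sum(boost for pat, boost in clauses if wildcard_hit(pat, doc, fields))


# Approximates ES_QUERY(fulltext_index_1, "*b*^4 OR *c*") on `name`.
clauses = [("*b*", 4.0), ("*c*", 1.0)]
scored = [(vid, boosted_or_score(clauses, doc, ["name"])) for vid, doc in PLAYERS.items()]
# sorted() is stable, so ties keep insertion order.
ranked = sorted([(vid, s) for vid, s in scored if s > 0], key=lambda pair: -pair[1])

# Approximates ES_QUERY(fulltext_index_2, "*h*"): the pattern is tried on name and city.
h_hits = [vid for vid, doc in PLAYERS.items() if wildcard_hit("*h*", doc, ["name", "city"])]

# Approximates ES_QUERY(fulltext_index_2, "name:*b* AND city:Houston").
scoped = {"name": "*b*", "city": "Houston"}
houston_b = [vid for vid, doc in PLAYERS.items()
             if all(wildcard_hit(pat, doc, [field]) for field, pat in scoped.items())]

print(ranked)
print(len(h_hits), houston_b)  # 9 ['Boris Diaw']
```

Field-scoped clauses such as `name:*b*` restrict a pattern to one property of a composite index, while an unscoped pattern is tried against every indexed property.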
@@ -2,32 +2,31 @@

!!! caution

This topic introduces the restrictions for full-text indexes. Please read the restrictions very carefully before using the full-text indexes.
- This topic introduces the restrictions for full-text indexes. Please read the restrictions very carefully before using the full-text indexes.
- Version 3.5.0 reworks the full-text index feature. It is not compatible with previous versions, so you must delete the existing index data and rebuild the indexes.

For now, full-text search has the following limitations:

- Currently, full-text search supports `LOOKUP` statements only.

- The full-text index name must start with `nebula_` and can contain only numbers, lowercase letters, and underscores.
- The full-text index name can contain only numbers, lowercase letters, and underscores.

- If there is a full-text index on the tag/edge type, the tag/edge type cannot be deleted or modified.

- The type of properties must be `STRING` or `FIXED_STRING`.

- A full-text index cannot be used to search multiple tags/edge types.

- Sorting the results returned by a full-text search is not supported. Data is returned in the order in which it was inserted.

- A full-text index cannot search properties whose value is `NULL`.

- Altering Elasticsearch indexes is not supported at this time.

- The pipe operator is not supported.
- Modifying the analyzer is not supported. You have to delete the index data and then specify the analyzer when you rebuild the index.

- The `WHERE` clause supports full-text search only on single terms.

- Make sure that you start the Elasticsearch cluster and Nebula Graph at the same time. Otherwise, the data written to the Elasticsearch cluster may be incomplete.

- It may take a while for Elasticsearch to create indexes. If Nebula Graph warns that no index is found, wait for the index to take effect (however, the waiting time is unknown and there is no statement to check it).
- It may take a while for Elasticsearch to create indexes. If Nebula Graph warns that no index is found, you can check the status of the indexing task.

- NebulaGraph clusters deployed with K8s do not have native support for the full-text search feature. However, you can deploy the feature manually.
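If you generate index names programmatically, a client-side pre-check against the naming rule above can catch mistakes before a `CREATE FULLTEXT ... INDEX` statement reaches the server. The helper below is hypothetical, and it assumes "numbers, lowercase letters, and underscores" means ASCII `0-9`, `a-z`, and `_` with at least one character; the server remains the authority on which names it accepts.

```python
import re

# Assumed reading of the rule: one or more ASCII digits, lowercase letters, or underscores.
INDEX_NAME_RE = re.compile(r"[0-9a-z_]+")


def is_valid_fulltext_index_name(name: str) -> bool:
    """Hypothetical pre-check for full-text index names (not a NebulaGraph API)."""
    return INDEX_NAME_RE.fullmatch(name) is not None


for candidate in ["fulltext_index_1", "FullText_Index", "idx-1"]:
    print(candidate, is_valid_fulltext_index_name(candidate))
```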
@@ -10,13 +10,9 @@ Before you start using the full-text index, please make sure that you know the [

To deploy an Elasticsearch cluster, see [Kubernetes Elasticsearch deployment](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-deploy-elasticsearch.html) or [Elasticsearch installation](https://www.elastic.co/guide/en/elasticsearch/reference/7.15/targz.html).

!!! compatibility
!!! note

For NebulaGraph 3.4 and later versions, no additional templates need to be created.

!!! caution

The full-text index name must start with `nebula_`.
To support external network access to Elasticsearch, set `network.host` to `0.0.0.0` in `config/elasticsearch.yml`.

You can configure Elasticsearch to meet your business needs. To customize Elasticsearch, see the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/settings.html).

@@ -33,7 +29,7 @@ SIGN IN TEXT SERVICE (<elastic_ip:port>, {HTTP | HTTPS} [,"<username>", "<passwo
### Example

```ngql
nebula> SIGN IN TEXT SERVICE (127.0.0.1:9200, HTTP);
nebula> SIGN IN TEXT SERVICE (192.168.8.100:9200, HTTP);
```

!!! Note
@@ -58,13 +54,11 @@ SHOW TEXT SEARCH CLIENTS;

```ngql
nebula> SHOW TEXT SEARCH CLIENTS;
+-------------+------+
| Host | Port |
+-------------+------+
| "127.0.0.1" | 9200 |
| "127.0.0.1" | 9200 |
| "127.0.0.1" | 9200 |
+-------------+------+
+-----------------+-----------------+------+
| Type | Host | Port |
+-----------------+-----------------+------+
| "ELASTICSEARCH" | "192.168.8.100" | 9200 |
+-----------------+-----------------+------+
```

## Sign out of the text search clients