Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lookup logic refactor(indexScan bug) #2745

Closed
czpmango opened this issue Aug 30, 2021 · 13 comments · Fixed by #3285
Closed

lookup logic refactor(indexScan bug) #2745

czpmango opened this issue Aug 30, 2021 · 13 comments · Fixed by #3285
Assignees
Labels
type/enhancement Type: make the code neat or more efficient
Milestone

Comments

@czpmango
Copy link
Contributor

czpmango commented Aug 30, 2021

Lookup for string type leads to wrong result.

This is my test:

(czp@nebula) [nba]> create space t2(vid_type=Fixed_string(30))
Execution succeeded (time spent 1812/2076 us)

(czp@nebula) [nba]> use t2
[ERROR (-1005)]: SpaceNotFound: 

(czp@nebula) [nba]> use t2
Execution succeeded (time spent 1900/2158 us)

(czp@nebula) [t2]> create tag person(name string)
Execution succeeded (time spent 1286/1526 us)

(czp@nebula) [t2]> create tag index p1 on person(name(3))
Execution succeeded (time spent 4182/4453 us)

(czp@nebula) [t2]> insert vertex person(name) values "1":("abc1")
Execution succeeded (time spent 1880/2109 us)

(czp@nebula) [t2]> insert vertex person(name) values "2":("abc2")
Execution succeeded (time spent 2007/2265 us)

(czp@nebula) [t2]> rebuild tag index p1
+------------+
| New Job Id |
+------------+
| 141        |
+------------+
Got 1 rows (time spent 1961/2292 us)

(czp@nebula) [t2]> show jobs
+--------+---------------------+------------+----------------------------+----------------------------+
| Job Id | Command             | Status     | Start Time                 | Stop Time                  |
+--------+---------------------+------------+----------------------------+----------------------------+
| 141    | "REBUILD_TAG_INDEX" | "FINISHED" | 2021-08-30T03:23:26.000000 | 2021-08-30T03:23:26.000000 |
+--------+---------------------+------------+----------------------------+----------------------------+
Got 1 rows (time spent 2046/2462 us)

(czp@nebula) [t2]> lookup on person where person.name=="abc"
+----------+
| VertexID |
+----------+
| "1"      |
+----------+
| "2"      |
+----------+
Got 2 rows (time spent 3309/3619 us)

(czp@nebula) [t2]> match (v:person) where v.name == "abc" return v
+---+
| v |
+---+
+---+
Empty set (time spent 7572/7896 us)

(czp@nebula) [t2]> lookup on person where person.name=="abc" yield person.name
+----------+-------------+
| VertexID | person.name |
+----------+-------------+
| "1"      | "abc"       |
+----------+-------------+
| "2"      | "abc"       |
+----------+-------------+
Got 2 rows (time spent 3279/3648 us)

(czp@nebula) [t2]> match (v:person) where v.name >= "abc" return v
+-----------------------------+
| v                           |
+-----------------------------+
| ("1" :person{name: "abc1"}) |
+-----------------------------+
| ("2" :person{name: "abc2"}) |
+-----------------------------+
Got 2 rows (time spent 7432/7804 us)

(czp@nebula) [t2]> match (v:person{name:"abc1"}) return v
+---+
| v |
+---+
+---+
Empty set (time spent 6614/6916 us)

(czp@nebula) [t2]> match (v:person) where v.name>"abc" return v
+---+
| v |
+---+
+---+
Empty set (time spent 6338/6665 us)

(czp@nebula) [t2]>match (v:person) where v.name=="abc" return v
+---+
| v |
+---+
+---+
Empty set (time spent 7427/7935 us)
(czp@nebula) [t2]> match (v:person) where v.name<="abc2" return v
+---+
| v |
+---+
+---+
Empty set (time spent 7728/8050 us)

@czpmango czpmango added priority/hi-pri Priority: high type/bug Type: something is unexpected labels Aug 30, 2021
@CPWstatic
Copy link
Contributor

match (v:person) where v.name == "abc" return v

Seems like the index has not been fully built up when this query applied.

@CPWstatic CPWstatic added this to the v2.6.0 milestone Aug 30, 2021
@czpmango czpmango changed the title lookup bug IndexScan bug Aug 30, 2021
@czpmango
Copy link
Contributor Author

czpmango commented Aug 30, 2021

match (v:person) where v.name == "abc" return v

Seems like the index has not been fully built up when this query applied.

(czp@nebula) [t2]> rebuild tag index p1
+------------+
| New Job Id |
+------------+
| 141        |
+------------+
Got 1 rows (time spent 1961/2292 us)

(czp@nebula) [t2]> show jobs
+--------+---------------------+------------+----------------------------+----------------------------+
| Job Id | Command             | Status     | Start Time                 | Stop Time                  |
+--------+---------------------+------------+----------------------------+----------------------------+
| 141    | "REBUILD_TAG_INDEX" | "FINISHED" | 2021-08-30T03:23:26.000000 | 2021-08-30T03:23:26.000000 |
+--------+---------------------+------------+----------------------------+----------------------------+
Got 1 rows (time spent 2046/2462 us)

@critical27
Copy link
Contributor

critical27 commented Aug 30, 2021

This is a known bug because of data exceed index length. Forum has report before. This involves a recheck if index length is fully occupied.

@czpmango
Copy link
Contributor Author

This is a known bug because of data exceed index length. Forum has report before. This involves a recheck if index length is fully occupied.

ACK.


Take a look this test:

(czp@nebula) [t2]> match (v:person) where v.name>"abc" return v
+---+
| v |
+---+
+---+
Empty set (time spent 6338/6665 us)

Truncation to build an index causes data that meets the criteria to be excluded.

@critical27
Copy link
Contributor

(czp@nebula) [t2]> match (v:person) where v.name>"abc" return v
+---+
| v |
+---+
+---+
Empty set (time spent 6338/6665 us)


Truncation to build an index causes data that meets the criteria to be excluded.

OK

@cangfengzhs
Copy link
Contributor

Take a look this test:

(czp@nebula) [t2]> match (v:person) where v.name>"abc" return v
+---+
| v |
+---+
+---+
Empty set (time spent 6338/6665 us)

Truncation to build an index causes data that meets the criteria to be excluded.

    case Value::Type::STRING: {
      if (!col.type.type_length_ref().has_value()) {
        return Value::kNullBadType;
      }
      std::vector<unsigned char> bytes(v.getStr().begin(), v.getStr().end());
      bytes.resize(*col.get_type().type_length_ref());
      for (size_t i = bytes.size();; i--) {
        if (i > 0) {
          if (bytes[i - 1]++ != 255) break;
        } else {
          return Value(std::string(*col.get_type().type_length_ref(), '\377'));
        }
      }
      return Value(std::string(bytes.begin(), bytes.end()));
    }

It's a logical error. String A gt "abc" is not mean ge "abd"

@Sophie-Xie Sophie-Xie added the need to discuss Solution: issue or PR without a clear conclusion on whether to handle it label Sep 1, 2021
@cangfengzhs
Copy link
Contributor

a complex case

(root@nebula) [t2]> fetch prop on t "1","2","3","4","5";
+-----------------------------------------------------------------------+
| vertices_                                                             |
+-----------------------------------------------------------------------+
| ("2" :t{address: "adds", age: -5, height: 10.0, id: 2, name: "namf"}) |
+-----------------------------------------------------------------------+
| ("1" :t{address: "addr", age: 10, height: -1.0, id: 1, name: "name"}) |
+-----------------------------------------------------------------------+
| ("3" :t{address: "adc", age: 100, height: 0.0, id: 3, name: "nam"})   |
+-----------------------------------------------------------------------+
| ("4" :t{address: "add", age: 10, height: 100.0, id: 4, name: "namd"}) |
+-----------------------------------------------------------------------+
| ("5" :t{address: "addr", age: 10, height: -1.0, id: 5, name: "name"}) |
+-----------------------------------------------------------------------+
Got 5 rows (time spent 4474/5446 us)

Thu, 02 Sep 2021 15:33:52 CST

(root@nebula) [t2]>  lookup on t where t.name=="nam" and t.address>="add";
+----------+
| VertexID |
+----------+
| "1"      |
+----------+
| "2"      |
+----------+
| "4"      |
+----------+
| "5"      |
+----------+
Got 4 rows (time spent 7843/8484 us)

Thu, 02 Sep 2021 15:33:57 CST

(root@nebula) [t2]> show create tag index t_name_address_index;
+------------------------+---------------------------------------------------+
| Tag Index Name         | Create Tag Index                                  |
+------------------------+---------------------------------------------------+
| "t_name_address_index" | "CREATE TAG INDEX `t_name_address_index` ON `t` ( |
|                        |  `name`(3),                                       |
|                        |  `address`(3)                                     |
|                        | )"                                                |
+------------------------+---------------------------------------------------+
Got 1 rows (time spent 1804/2399 us)

Thu, 02 Sep 2021 15:34:00 CST

@cangfengzhs
Copy link
Contributor

a float case

(root@nebula) [t2]> fetch prop on t "6","7","8","9";
+------------------------------------------------------------------------------+
| vertices_                                                                    |
+------------------------------------------------------------------------------+
| ("7" :t{address: "addr", age: 10, height: 1.2, id: 7, name: "name"})         |
+------------------------------------------------------------------------------+
| ("6" :t{address: "addr", age: 10, height: 1.1, id: 6, name: "name"})         |
+------------------------------------------------------------------------------+
| ("8" :t{address: "addr", age: 10, height: 1.11, id: 8, name: "name"})        |
+------------------------------------------------------------------------------+
| ("9" :t{address: "addr", age: 10, height: 1.100000001, id: 9, name: "name"}) |
+------------------------------------------------------------------------------+
Got 4 rows (time spent 3892/4784 us)

Thu, 02 Sep 2021 16:02:02 CST

(root@nebula) [t2]>  lookup on t where t.height>=1.1 and t.id>=6;
+----------+
| VertexID |
+----------+
| "6"      |
+----------+
| "7"      |
+----------+
| "8"      |
+----------+
| "9"      |
+----------+
Got 4 rows (time spent 6918/7565 us)

Thu, 02 Sep 2021 16:02:07 CST

(root@nebula) [t2]>  lookup on t where t.height>1.1 and t.id>=6;
+----------+
| VertexID |
+----------+
| "7"      |
+----------+
| "8"      |
+----------+
Got 2 rows (time spent 7009/7613 us)

Thu, 02 Sep 2021 16:02:12 CST

@czpmango
Copy link
Contributor Author

czpmango commented Sep 2, 2021

Take a look this test:

(czp@nebula) [t2]> match (v:person) where v.name>"abc" return v
+---+
| v |
+---+
+---+
Empty set (time spent 6338/6665 us)

Truncation to build an index causes data that meets the criteria to be excluded.

    case Value::Type::STRING: {
      if (!col.type.type_length_ref().has_value()) {
        return Value::kNullBadType;
      }
      std::vector<unsigned char> bytes(v.getStr().begin(), v.getStr().end());
      bytes.resize(*col.get_type().type_length_ref());
      for (size_t i = bytes.size();; i--) {
        if (i > 0) {
          if (bytes[i - 1]++ != 255) break;
        } else {
          return Value(std::string(*col.get_type().type_length_ref(), '\377'));
        }
      }
      return Value(std::string(bytes.begin(), bytes.end()));
    }

It's a logical error. String A gt "abc" is not mean ge "abd"

Truncation to build an index causes data that meets the criteria to be excluded. The design of the implementation layer needs to be reconsidered.

@czpmango
Copy link
Contributor Author

czpmango commented Sep 2, 2021

a complex case

(root@nebula) [t2]> fetch prop on t "1","2","3","4","5";
+-----------------------------------------------------------------------+
| vertices_                                                             |
+-----------------------------------------------------------------------+
| ("2" :t{address: "adds", age: -5, height: 10.0, id: 2, name: "namf"}) |
+-----------------------------------------------------------------------+
| ("1" :t{address: "addr", age: 10, height: -1.0, id: 1, name: "name"}) |
+-----------------------------------------------------------------------+
| ("3" :t{address: "adc", age: 100, height: 0.0, id: 3, name: "nam"})   |
+-----------------------------------------------------------------------+
| ("4" :t{address: "add", age: 10, height: 100.0, id: 4, name: "namd"}) |
+-----------------------------------------------------------------------+
| ("5" :t{address: "addr", age: 10, height: -1.0, id: 5, name: "name"}) |
+-----------------------------------------------------------------------+
Got 5 rows (time spent 4474/5446 us)

Thu, 02 Sep 2021 15:33:52 CST

(root@nebula) [t2]>  lookup on t where t.name=="nam" and t.address>="add";
+----------+
| VertexID |
+----------+
| "1"      |
+----------+
| "2"      |
+----------+
| "4"      |
+----------+
| "5"      |
+----------+
Got 4 rows (time spent 7843/8484 us)

Thu, 02 Sep 2021 15:33:57 CST

(root@nebula) [t2]> show create tag index t_name_address_index;
+------------------------+---------------------------------------------------+
| Tag Index Name         | Create Tag Index                                  |
+------------------------+---------------------------------------------------+
| "t_name_address_index" | "CREATE TAG INDEX `t_name_address_index` ON `t` ( |
|                        |  `name`(3),                                       |
|                        |  `address`(3)                                     |
|                        | )"                                                |
+------------------------+---------------------------------------------------+
Got 1 rows (time spent 1804/2399 us)

Thu, 02 Sep 2021 15:34:00 CST

Just the same thing.

@czpmango
Copy link
Contributor Author

czpmango commented Sep 2, 2021

a float case

(root@nebula) [t2]> fetch prop on t "6","7","8","9";
+------------------------------------------------------------------------------+
| vertices_                                                                    |
+------------------------------------------------------------------------------+
| ("7" :t{address: "addr", age: 10, height: 1.2, id: 7, name: "name"})         |
+------------------------------------------------------------------------------+
| ("6" :t{address: "addr", age: 10, height: 1.1, id: 6, name: "name"})         |
+------------------------------------------------------------------------------+
| ("8" :t{address: "addr", age: 10, height: 1.11, id: 8, name: "name"})        |
+------------------------------------------------------------------------------+
| ("9" :t{address: "addr", age: 10, height: 1.100000001, id: 9, name: "name"}) |
+------------------------------------------------------------------------------+
Got 4 rows (time spent 3892/4784 us)

Thu, 02 Sep 2021 16:02:02 CST

(root@nebula) [t2]>  lookup on t where t.height>=1.1 and t.id>=6;
+----------+
| VertexID |
+----------+
| "6"      |
+----------+
| "7"      |
+----------+
| "8"      |
+----------+
| "9"      |
+----------+
Got 4 rows (time spent 6918/7565 us)

Thu, 02 Sep 2021 16:02:07 CST

(root@nebula) [t2]>  lookup on t where t.height>1.1 and t.id>=6;
+----------+
| VertexID |
+----------+
| "7"      |
+----------+
| "8"      |
+----------+
Got 2 rows (time spent 7009/7613 us)

Thu, 02 Sep 2021 16:02:12 CST

This issue has nothing to do with float.

@cangfengzhs
Copy link
Contributor

a float case

(root@nebula) [t2]> fetch prop on t "6","7","8","9";
+------------------------------------------------------------------------------+
| vertices_                                                                    |
+------------------------------------------------------------------------------+
| ("7" :t{address: "addr", age: 10, height: 1.2, id: 7, name: "name"})         |
+------------------------------------------------------------------------------+
| ("6" :t{address: "addr", age: 10, height: 1.1, id: 6, name: "name"})         |
+------------------------------------------------------------------------------+
| ("8" :t{address: "addr", age: 10, height: 1.11, id: 8, name: "name"})        |
+------------------------------------------------------------------------------+
| ("9" :t{address: "addr", age: 10, height: 1.100000001, id: 9, name: "name"}) |
+------------------------------------------------------------------------------+
Got 4 rows (time spent 3892/4784 us)

Thu, 02 Sep 2021 16:02:02 CST

(root@nebula) [t2]>  lookup on t where t.height>=1.1 and t.id>=6;
+----------+
| VertexID |
+----------+
| "6"      |
+----------+
| "7"      |
+----------+
| "8"      |
+----------+
| "9"      |
+----------+
Got 4 rows (time spent 6918/7565 us)

Thu, 02 Sep 2021 16:02:07 CST

(root@nebula) [t2]>  lookup on t where t.height>1.1 and t.id>=6;
+----------+
| VertexID |
+----------+
| "7"      |
+----------+
| "8"      |
+----------+
Got 2 rows (time spent 7009/7613 us)

Thu, 02 Sep 2021 16:02:12 CST

This issue has nothing to do with float.

I don't quite agree.In fact, there are two bugs:

  1. When analyzing the expression, the condition of the string type will be truncated according to the index schema
  2. If the condition contains an open interval, IndexScanRule will convert the open interval into an incorrect closed interval. This also appeared in your comment.It’s just that you only mentioned the string type, but the float type has the same problem and needs to be fixed together.
(czp@nebula) [t2]> match (v:person) where v.name>"abc" return v
+---+
| v |
+---+
+---+
Empty set (time spent 6338/6665 us)

This query gets empty set because `v.name>"abc" ` will be converted to `v.name>="abd"`.

@Sophie-Xie Sophie-Xie removed the need to discuss Solution: issue or PR without a clear conclusion on whether to handle it label Sep 8, 2021
@Sophie-Xie Sophie-Xie modified the milestones: v2.6.0, v2.7.0 Sep 15, 2021
@Sophie-Xie
Copy link
Contributor

Sophie-Xie commented Sep 15, 2021

Technical Committee discuss it. It‘s modification is very complicated, so it will be fixed in the version at the end of the year.

@Sophie-Xie Sophie-Xie modified the milestones: v2.7.0, v3.0.0 Oct 15, 2021
@Sophie-Xie Sophie-Xie changed the title IndexScan bug lookup logic refactor(indexScan bug) Oct 18, 2021
@Sophie-Xie Sophie-Xie added type/enhancement Type: make the code neat or more efficient and removed type/bug Type: something is unexpected priority/hi-pri Priority: high labels Oct 18, 2021
@Sophie-Xie Sophie-Xie moved this to Backlog in Nebula Graph v3.0.0 Oct 28, 2021
@cangfengzhs cangfengzhs linked a pull request Nov 8, 2021 that will close this issue
7 tasks
@Sophie-Xie Sophie-Xie moved this from Backlog to Reviewing in Nebula Graph v3.0.0 Nov 10, 2021
Repository owner moved this from Reviewing to Done in Nebula Graph v3.0.0 Nov 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement Type: make the code neat or more efficient
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

6 participants