[Bitnine Tech] - When it executes VLE(Variable Length Edge) query, why it occurs an error just the first try? #365

Closed
Hyundong-Seo opened this issue Nov 23, 2022 · 36 comments
Labels
question Further information is requested

Comments

@Hyundong-Seo

vle_execute_error
The first time I run the VLE query, I always get an insert_vertex_edge error.
A second attempt at the same query succeeds.

ldbc=# SELECT * from cypher('test', $$
  EXPLAIN ANALYZE
  MATCH (v:comment)-[e:hascreatorcomment*1..3]->(v2:person)
  WHERE v.id = '962072674361'
  RETURN v, v2
$$) as (v agtype, v2 agtype);
ERROR:  insert_vertex_edge: failed to insert
@Hyundong-Seo Hyundong-Seo added the question Further information is requested label Nov 23, 2022
@jrgemignani
Contributor

jrgemignani commented Nov 23, 2022

Thank you for your question!

For any issues like this, please provide the following information to allow us to look into it -

  1. The version of PostgreSQL you are using.
  2. The version of AGE you are using.
  3. The OS that you are using.
  4. A simple explanation of the dataset being used.
  5. A small sample of the dataset that causes the issue so that we can reproduce it.
  6. An example showing the issue.
  7. What you are trying to do.

Unfortunately, until I know more, I cannot give you an answer.

john

@Hyundong-Seo Hyundong-Seo changed the title When it executes VLE(Variable Length Edge) query, why it occurs an error just the first try? [Bitnine Tech] - When it executes VLE(Variable Length Edge) query, why it occurs an error just the first try? Nov 24, 2022
@Hyundong-Seo
Author

The information for my test environment is as follows.

  1. PG11.17
  2. AGE - 1.1.0 / commit version - 4e9110d
  3. Ubuntu 20.04.5 LTS
  4. The start vertex is a comment, the end vertex is a person, and the edge is a simple relation between the comment and the person.
  5. I am not sure whether this is the kind of sample you want to see.
    Paths more than 2 hops deep may or may not exist.
    [start vertex - comment]
    {
    "id": 845661880713217,
    "label": "comment",
    "properties": {
    "id": "1236950581249",
    "id": 1236950581249,
    "length": "3",
    "content": "yes",
    "locationIP": "92.39.58.88",
    "browserUsed": "Chrome",
    "creationDate": "2011-08-17T14:26:59.961+0000"
    }
    }
    [edge - hascreatorcomment]
    {
    "id": 3940649673949185,
    "label": "hascreatorcomment",
    "end_id": 1981319953259400,
    "start_id": 845661880713217,
    "properties": {}
    }
    [end vertex - person]
    {
    "id": 1981319953259400,
    "label": "person",
    "properties": {
    "id": "10995116284808",
    "id": 10995116284808,
    "gender": "male",
    "birthday": "1982-02-04",
    "lastName": "Condariuc",
    "firstName": "Andrei",
    "locationIP": "92.39.58.88",
    "browserUsed": "Chrome",
    "creationDate": "2010-12-26T14:40:36.649+0000"
    }
    }
  6. I executed the query only because I wanted to see the plan of the VLE lookup, but an error occurred.
    If I immediately execute the same query again, it works.
    [the first try]
    ldbc=# SELECT * from cypher('test', $$
    ldbc$# EXPLAIN ANALYZE
    ldbc$# MATCH (v:comment)-[e:hascreatorcomment*1..3]->(v2:person)
    ldbc$# WHERE v.id = '962072674361'
    ldbc$# RETURN v, v2
    ldbc$# $$) as (v agtype, v2 agtype);
    ERROR: insert_vertex_edge: failed to insert
    [The second try]
    ldbc=# SELECT * from cypher('test', $$
    EXPLAIN ANALYZE
    MATCH (v:comment)-[e:hascreatorcomment*1..3]->(v2:person)
    WHERE v.id = '962072674361'
    RETURN v, v2
    $$) as (v agtype, v2 agtype);
    QUERY PLAN
    Nested Loop (cost=0.01..1860999881.03 rows=33830636617 width=64) (actual time=3.128..9990.038 rows=1 loops=1)
      Join Filter: age_match_vle_terminal_edge(v.id, v2.id, _age_default_alias_0.edges)
      Rows Removed by Join Filter: 9891
      -> Nested Loop (cost=0.01..314225.21 rows=10260000 width=267) (actual time=1.212..9976.277 rows=1 loops=1)
           -> Seq Scan on comment v (cost=0.00..109025.20 rows=10260 width=235) (actual time=0.420..9975.479 rows=1 loops=1)
                Filter: (agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(id, _label_name('41527'::oid, id), properties), '"id"'::agtype]) = '"962072674361"'::agtype)
                Rows Removed by Filter: 2052168
           -> Function Scan on age_vle _age_default_alias_0 (cost=0.01..10.01 rows=1000 width=32) (actual time=0.785..0.787 rows=1 loops=1)
      -> Materialize (cost=0.00..514.38 rows=9892 width=263) (actual time=0.014..10.410 rows=9892 loops=1)
           -> Seq Scan on person v2 (cost=0.00..464.92 rows=9892 width=263) (actual time=0.009..1.965 rows=9892 loops=1)
    Planning Time: 0.594 ms
    Execution Time: 9991.038 ms
    (12 rows)
  7. I have already confirmed that the query returns what I need; I only wonder why the error occurs on the first try and not afterwards.

I hope this answer has been sufficient.

Hyundong

@jrgemignani
Contributor

jrgemignani commented Nov 29, 2022

Does the command error if you don't use EXPLAIN at all?

@Hyundong-Seo
Author

@jrgemignani I get the same error, even if I don't use EXPLAIN

@jrgemignani
Contributor

@Hyundong-Seo, I am unable to reproduce your issue. So, I will need more information -

  1. What is the dataset that you are using? How is it loaded? Where is it from?
  2. Where is the v.id that you are looking for from?

For that id - WHERE v.id = '962072674361' - is that from the vertex's internal id or is it from some property that is stored and named id?

I should note that if it is from the vertex's id, you need to use the id() function -

psql-11.5-5432-pgsql=# SELECT * from cypher('graph1', $$ MATCH (v:comment)-[e:hascreatorcomment]->(v2:person)
  WHERE v.id = 844424930131969 RETURN v, v2 $$) as (v agtype, v2 agtype);
 v | v2
---+----
(0 rows)

psql-11.5-5432-pgsql=# SELECT * from cypher('graph1', $$ MATCH (v:comment)-[e:hascreatorcomment]->(v2:person)
  WHERE id(v) = 844424930131969 RETURN v, v2 $$) as (v agtype, v2 agtype);
                                   v                                   |                                  v2

-----------------------------------------------------------------------+------------------------------------------------------------
-----------
 {"id": 844424930131969, "label": "comment", "properties": {}}::vertex | {"id": 1407374883553281, "label": "person", "properties": {
}}::vertex
(1 row)

psql-11.5-5432-pgsql=#

@jrgemignani
Contributor

@Hyundong-Seo ?

@Hyundong-Seo
Author

@jrgemignani
I'm sorry for the late response.

  1. What is the dataset that you are using? How is it loaded? Where is it from?
    -> It is the CSV file that was used with AgensGraph.
  2. Where is the v.id that you are looking for from?
    -> It comes from the properties, not from the vertex's internal id, like this:
    {"id": 845387002806329, "label": "comment", "properties": {"id": "962072674361", "id": 962072674361, "length": "83", "content": "About Bertolt Brecht rn; 10 FebruarAbout France Pierre-Louis About Dancing on My", "locationIP": "213.55.127.9", "browserUsed": "Internet Explorer", "creationDate": "2011-04-14T05:20:46.802+0000"}}

When I tried it again today, the query executed without the error.
Thank you for your reply.

@jrgemignani
Contributor

That happens sometimes and it is hard to figure out why. If you happen to find something like this again, that is repeatable with a specific dataset, please let us know.

@avowkind

avowkind commented Apr 9, 2024

I know this is closed, but I am getting the same issue.

After a long time with no problems with VLE requests, just today, on one instance of my database but not on another, a request with * on the edge started causing ERROR: "insert_vertex_edge: failed to insert".

This occurs on a variety of requests that usually work just fine.

I have found that if I establish an ongoing session, e.g. using psql, I get this error the first time the request is run. Subsequent runs give the correct result without an error.

Requests going in through my App API always fail because each is run in a new session.

Why would running the query twice fix the problem?

@jrgemignani
Contributor

jrgemignani commented Apr 9, 2024

@avowkind What version of AGE are you using? What PG version?

@jrgemignani
Contributor

From the code it looks to only give that specific error if the start or end vertex for the edge don't exist at the time of the vle command execution.

@avowkind

avowkind commented Apr 9, 2024

Is it worth creating a new bug report?

Describe the bug

It is possible to get the database into a state where a request of the form:

SELECT * FROM cypher('fishpond', $$
MATCH (p:PropertyType)<-[edge:typeOf*]-(node:PropertyType)
WHERE p.id = 'fish_weight_g'
RETURN node, edge
$$) as ( node agtype, edge agtype);

returns the error

ERROR,XX000,"insert_vertex_edge: failed to insert"

while the same request with [edge:typeOf], i.e. no variable-length path on the edge, succeeds.

However, a second attempt in the same session will succeed.

It is an instance of #365 (which was closed with insufficient information).

How are you accessing AGE (Command line, driver, etc.)?

  • Can be reproduced in psql, PGAdmin and nodejs driver.

What data setup do we need to do?

  • I'm not sure I can reproduce this issue in a small example as it only occurred after some time of use with a large dataset.
  • However, the same issue does appear in a db set up from a backup of the production db, i.e. I can provide an example if required although this is

What is the necessary configuration info needed?

  • docker image: apache/age:PG15_latest

What is the command that caused the error?

SELECT * FROM cypher('fishpond', $$
MATCH (p:PropertyType)<-[edge:typeOf*]-(node:PropertyType)
WHERE p.id = 'fish_weight_g'
RETURN node, edge
$$) as ( node agtype, edge agtype);
ERROR,XX000,"insert_vertex_edge: failed to insert"

Expected behavior
on repeating the request we get back

"{""id"": 844424930131972, ""label"": ""PropertyType"", ""properties"": {""id"": ""average_fish_weight_g"", ""name"": ""Average fish weight"", ""published"": true, ""relations"": [{""id"": ""fish_weight_g"", ""relation"": ""typeOf""}], ""resultType"": ""measurement"", ""description"": ""The average weight of the fish in the tank""}}::vertex"	"{""id"": 1125899906842627, ""label"": ""typeOf"", ""end_id"": 844424930131971, ""start_id"": 844424930131972, ""properties"": {}}::edge"
"{""id"": 844424930131974, ""label"": ""PropertyType"", ""properties"": {""id"": ""culled_fish_weight_g"", ""name"": ""Fish Weight (culled)"", ""units"": ""g"", ""published"": true, ""relations"": [{""id"": ""fish_weight_g"", ""relation"": ""typeOf""}], ""resultType"": ""measurement"", ""description"": ""Weight of fish after culling""}}::vertex"	"{""id"": 1125899906842628, ""label"": ""typeOf"", ""end_id"": 844424930131971, ""start_id"": 844424930131974, ""properties"": {}}::edge"
"{""id"": 844424930131975, ""label"": ""PropertyType"", ""properties"": {""id"": ""mortality_fish_weight_g"", ""name"": ""Fish Weight (mortality)"", ""units"": ""g"", ""published"": true, ""relations"": [{""id"": ""fish_weight_g"", ""relation"": ""typeOf""}], ""resultType"": ""measurement"", ""description"": ""Weight of dead fish""}}::vertex"	"{""id"": 1125899906842629, ""label"": ""typeOf"", ""end_id"": 844424930131971, ""start_id"": 844424930131975, ""properties"": {}}::edge"
"{""id"": 844424930131976, ""label"": ""PropertyType"", ""properties"": {""id"": ""sampled_live_fish_weight_g"", ""name"": ""Fish Weight (sampled)"", ""units"": ""g"", ""published"": true, ""relations"": [{""id"": ""fish_weight_g"", ""relation"": ""typeOf""}], ""resultType"": ""measurement"", ""description"": ""Weight of live sampled fish""}}::vertex"	"{""id"": 1125899906842630, ""label"": ""typeOf"", ""end_id"": 844424930131971, ""start_id"": 844424930131976, ""properties"": {}}::edge"
"{""id"": 844424930131977, ""label"": ""PropertyType"", ""properties"": {""id"": ""morphometric_fish_weight_g"", ""name"": ""Fish Weight (morphometric)"", ""units"": ""g"", ""published"": true, ""relations"": [{""id"": ""fish_weight_g"", ""relation"": ""typeOf""}], ""resultType"": ""measurement"", ""description"": ""Weight of live fish in tank calculated from morphometric imaging""}}::vertex"	"{""id"": 1125899906842631, ""label"": ""typeOf"", ""end_id"": 844424930131971, ""start_id"": 844424930131977, ""properties"": {}}::edge"

@jrgemignani
Contributor

@avowkind If you can provide a relatively simple dataset with a repeatable example, that would be really helpful in debugging this :)

@avowkind

avowkind commented Apr 9, 2024

I can provide the database backup (it's 18 MB), but it's not an especially complicated model.

@jrgemignani
Contributor

Anything that is repeatable is helpful. I wrote the VLE code and have thoroughly tested it and never seen that error message. So, I'm a bit blind to what might be going on.

@jrgemignani jrgemignani reopened this Apr 9, 2024
@jrgemignani
Contributor

@avowkind I reopened this issue. Hopefully, we'll get to the bottom of it this time :)

@jrgemignani
Contributor

@avowkind Are there updates to the vertices and edges going on in the background or in another process when this happens?

@avowkind

are there updates to the vertices and edges running?
No - I'm able to reproduce on a back up copy with no other users or sessions running.

Short video showing the issue.

Apache.Age.bug.mp4

However I do have the gut feeling that this is something to do with a query or activity that left the db in an unusual state. Is it possible to have an edge without a node for example?

But why would this work on the second try?

@jrgemignani
Contributor

jrgemignani commented Apr 10, 2024

@avowkind The second try makes sense. Try the following right before executing the command -

-- delete all graph contexts
-- should return true
SELECT * FROM cypher('ag_graph_1', $$ RETURN delete_global_graphs(NULL) $$) AS (result agtype);

For you that would be just -

RETURN delete_global_graphs(NULL);

Oh, and paste in the output.

@avowkind

When run for the first time in the session, I get a result of 'false', and the following query still fails.

In fact, it ensures that the script fails every time instead of just the first time.

@avowkind

Further to the second-try behaviour:

In the following code, delete_global_graphs ensures that the PropertyType query fails, but the Feature query that follows it succeeds, although it is on a different entity and edge type.

LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- delete all graph contexts
-- should return true
SELECT * FROM cypher('fishpond', $$ RETURN delete_global_graphs(NULL) $$) AS (result agtype);

SELECT * FROM cypher('fishpond', $$ 
MATCH (p:PropertyType)<-[edge:typeOf*0..]-(node:PropertyType)
WHERE p.id = 'fish_weight_g'
RETURN node, edge 
$$) as ( node agtype, edge agtype);

SELECT * FROM cypher('fishpond', $$ 
MATCH (p:Feature)<-[edge:partOf*0..]-(node:Feature)
WHERE p.id = 'pfr:nelson:tank:A01'
RETURN node, edge 
$$) as ( node agtype, edge agtype);

@jrgemignani
Contributor

@avowkind That it fails consistently when using that function actually helps narrow it down.

@jrgemignani
Contributor

@avowkind I know why it fails and then works -

Technically speaking, it shouldn't work, as that error is a fatal error; that's why it fails on the first run. However, the context isn't cleaned up, so it works on the second run because a context already exists and it doesn't need to build a new one. As to why it is getting an edge that doesn't have a valid start or end id, and why the edge it got doesn't exist, I'm still looking into that.
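
To illustrate the fail-then-succeed pattern, here is a minimal Python sketch. It is a toy model, not AGE's C code: `Session`, `get_context`, `LoadError`, and `run_twice` are all hypothetical names, and the only behaviour taken from this thread is that a context left behind by a failed load is reused on the next run.

```python
class LoadError(Exception):
    """Stands in for the fatal insert_vertex_edge error."""


class Session:
    """Toy model of one database session holding per-graph contexts."""

    def __init__(self):
        self.contexts = {}  # graph name -> partially or fully loaded context

    def get_context(self, graph, vertices, edges):
        # A context left behind by an earlier (failed) run is reused as-is.
        if graph in self.contexts:
            return self.contexts[graph]

        # Assumption of this sketch: the context is registered before the
        # load finishes, so an error raised below leaves it in place.
        ctx = {"vertices": set(vertices), "edges": []}
        self.contexts[graph] = ctx
        for edge_id, start_id, end_id in edges:
            if start_id not in ctx["vertices"] or end_id not in ctx["vertices"]:
                raise LoadError("insert_vertex_edge: failed to insert")
            ctx["edges"].append((edge_id, start_id, end_id))
        return ctx


def run_twice(vertices, edges):
    """Return the outcome of two identical runs in one session."""
    session = Session()
    outcomes = []
    for _ in range(2):
        try:
            session.get_context("g", vertices, edges)
            outcomes.append("ok")
        except LoadError:
            outcomes.append("error")
    return outcomes
```

With one dangling edge, `run_twice({1, 2}, [(10, 1, 2), (11, 3, 2)])` gives `["error", "ok"]`. A client that opens a new session per request would start from an empty `contexts` dict every time and so would hit the load error on every request, which matches the API behaviour reported earlier in this thread.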

For me, it is failing on an edge found in "featureOfInterest" -

(gdb) p {edge_id, edge_vertex_start_id, edge_vertex_end_id}
$4 = {3096224744137124, 2533274790715812, 1407374883553780}
fishpond=# SELECT * FROM cypher('fishpond', $$MATCH ()-[edge:featureOfInterest]->() where id(edge) > 3096224744137122 and id(edge) < 3096224744137126 RETURN edge ORDER BY id(edge) $$) as (edge agtype);
                                                                   edge

------------------------------------------------------------------------------------------------------------------------------------
------
 {"id": 3096224744137123, "label": "featureOfInterest", "end_id": 1407374883553780, "start_id": 2533274790715811, "properties": {}}:
:edge
 {"id": 3096224744137125, "label": "featureOfInterest", "end_id": 1407374883555256, "start_id": 2533274790715813, "properties": {}}:
:edge
(2 rows)

fishpond=#

@avowkind

avowkind commented Apr 10, 2024 via email

@avowkind

avowkind commented Apr 10, 2024 via email

@jrgemignani
Contributor

@avowkind The odd thing is that, at least for me, the edge does not exist when I do a select. Yet, the load logic clearly gets that edge, which doesn't seem to exist. This is a weird case for sure.

@jrgemignani
Contributor

Only the end vertex for the edge exists.

fishpond=# SELECT * FROM cypher('fishpond', $$ MATCH (u) WHERE id(u) = 2533274790715812 RETURN u $$) as (u agtype);
 u
---
(0 rows)

fishpond=# SELECT * FROM cypher('fishpond', $$ MATCH (u) WHERE id(u) = 1407374883553780 RETURN u $$) as (u agtype);
                                                                                                                       u

------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
 {"id": 1407374883553780, "label": "Feature", "properties": {"id": "pfr:nelson:tank:C09", "name": "C09", "relations": [{"id": "pfr:n
elson:tank_group:C", "relation": "partOf"}], "description": "C09", "featureGroup": "pfr:fg:tank"}}::vertex
(1 row)

fishpond=#

@avowkind

avowkind commented Apr 10, 2024 via email

@jrgemignani
Contributor

@avowkind Yeah, the VLE goes on the assumption that everything is set up correctly, meaning every edge that exists has vertices that exist. If it comes across issues, it just quits because it might be a data corruption issue.

I'm going to put a PR together to have the VLE highlight these cases when the VLE loads. This way a user can find them. I will also fix it so that it just gives warning messages and continues.

Thoughts?

@avowkind

avowkind commented Apr 11, 2024 via email

@jrgemignani
Contributor

jrgemignani commented Apr 11, 2024

Thanks - this was a good debug experience.

Yes, it was :)

I wonder that we went anywhere near edges that link to observations.

This has to do with how the VLE load and context system works. In the beginning, I tried to limit what it loaded. However, that ended up requiring more contexts, more reloads, and more complexity. So, I opted for just loading everything which meant a bigger context, but only one per graph and fewer loads.

The big difference with the VLE is that it has a graph representation, not tables. This allows graph algorithms to work more directly with it.
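
As a rough illustration of why an in-memory graph representation suits variable-length matching, the Python sketch below builds adjacency lists and enumerates paths with a bounded number of hops. It is not the VLE implementation: `build_adjacency` and `variable_length_paths` are hypothetical names, edges are plain `(edge_id, start_id, end_id)` tuples, only outgoing edges are followed, and the sketch assumes that no edge is reused within a path.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each start vertex id to its outgoing (edge_id, end_id) pairs."""
    adjacency = defaultdict(list)
    for edge_id, start_id, end_id in edges:
        adjacency[start_id].append((edge_id, end_id))
    return adjacency


def variable_length_paths(adjacency, start, min_hops, max_hops):
    """Yield edge-id paths of min_hops..max_hops edges leaving start.

    An edge is never used twice within one path.
    """
    stack = [(start, [])]
    while stack:
        vertex, path = stack.pop()
        if min_hops <= len(path) <= max_hops:
            yield list(path)
        if len(path) == max_hops:
            continue  # do not extend beyond the upper bound
        for edge_id, end_id in adjacency.get(vertex, []):
            if edge_id not in path:
                stack.append((end_id, path + [edge_id]))
```

For a chain of edges `[(100, 1, 2), (101, 2, 3), (102, 3, 4)]`, asking for 1 to 2 hops from vertex `1` yields the paths `[100]` and `[100, 101]`. Once the adjacency lists exist, each hop is a dictionary lookup rather than another table scan, which is the trade-off described above for one large context per graph.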

@jrgemignani
Contributor

@avowkind I have modified the VLE to help identify these "dangling" edges -

fishpond=# load 'age'; set search_path TO ag_catalog;
LOAD
SET
fishpond=# SELECT * FROM cypher('fishpond', $$
MATCH (p:PropertyType)<-[edge:typeOf*0..]-(node:PropertyType)
WHERE p.id = 'fish_weight_g'
RETURN node, edge
$$) as ( node agtype, edge agtype);
WARNING:  edge: [id: 3096224744137124, start: 2533274790715812, end: 1407374883553780, label: featureOfInterest] missing start vertex
WARNING:  ignored dangling edge
WARNING:  edge: [id: 3377699720847780, start: 2533274790715812, end: 2251799813685250, label: in] missing start vertex
WARNING:  ignored dangling edge

The remainder of the output was left out. But, the query will run with this modification.

I may change some of the wording before it goes to a PR.
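
The warn-and-continue behaviour shown in the output above can be modelled roughly as follows. This is a Python sketch under stated assumptions, not the code from the PR: `load_edges` is a hypothetical name, vertices are represented only by a set of ids, and the warning text merely imitates the messages printed above.

```python
import warnings


def load_edges(vertex_ids, edges):
    """Return edges whose endpoints both exist; warn about the rest."""
    loaded = []
    for edge_id, start_id, end_id, label in edges:
        missing = []
        if start_id not in vertex_ids:
            missing.append("start")
        if end_id not in vertex_ids:
            missing.append("end")
        if missing:
            # Report the dangling edge and skip it instead of aborting.
            warnings.warn(
                f"edge: [id: {edge_id}, start: {start_id}, end: {end_id}, "
                f"label: {label}] missing {' and '.join(missing)} vertex; "
                "ignored dangling edge"
            )
            continue
        loaded.append((edge_id, start_id, end_id, label))
    return loaded
```

Called with the vertex set `{1407374883553780}` and the edge `(3096224744137124, 2533274790715812, 1407374883553780, "featureOfInterest")` from this thread, the sketch returns an empty list and emits one warning naming the missing start vertex, so the load completes and the query can proceed.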

@jrgemignani
Contributor

@avowkind I have a PR for the master branch #1742 that will add in the new messaging and ignore these errors in the VLE. I will try to get it to the other branches by later today so that you can try it out.

@jrgemignani
Contributor

jrgemignani commented Apr 12, 2024

@avowkind There is also a PR for PG15 #1744 that should be merged and available shortly. Then you will be able to test it out.

@jrgemignani
Contributor

@avowkind PG15 should have a docker build that is updated now.

@jrgemignani
Contributor

@avowkind All supported branches are now updated. As this is now understood and resolved, I will close the ticket.


3 participants