Skip to content
This repository has been archived by the owner on Sep 3, 2021. It is now read-only.

ORDER BY Performance #239

Closed
appinteractive opened this issue May 2, 2019 · 9 comments · Fixed by #247
Closed

ORDER BY Performance #239

appinteractive opened this issue May 2, 2019 · 9 comments · Fixed by #247

Comments

@appinteractive
Copy link

Hey @johnymontana we have real issues with performace when using order by in combination with dynamic fields as they are applied to ALL results BEFORE the order by! That leads to an query execution time of 30s in our case.

Without the order by it works fine! @johnymontana Is there any way we can work around that issue without throwing two queries at the api? (one to get the id's in sorted and paginated way and a second to get the data by id's)

Here is the generated query:

MATCH (`post`:`Post` {deleted:false, disabled:false}) 
RETURN `post` { 
  .deleted, 
  .disabled, 
  .createdAt, 
  .id, 
  .title, 
  .contentExcerpt, 
  .createdAt, 
  .disabled, 
  .deleted, 
  .slug, 
  .image,
  author: head([(`post`)<-[:`WROTE`]-(`post_author`:`User`) | post_author { 
    .deleted , 
    .disabled , 
    .createdAt , 
    .id , 
    .avatar , 
    .slug , 
    .name , 
    .disabled , 
    .deleted , 
    contributionsCount: apoc.cypher.runFirstColumn("MATCH (this)-[:WROTE]->(r:Post)\nWHERE (NOT exists(r.deleted) OR r.deleted = false)\nAND (NOT exists(r.disabled) OR r.disabled = false)\nRETURN COUNT(r)", {this: post_author, cypherParams: {currentUserId: null}}, false),
    shoutedCount: apoc.cypher.runFirstColumn("MATCH (this)-[:SHOUTED]->(r:Post) WHERE NOT r.deleted = true AND NOT r.disabled = true RETURN COUNT(DISTINCT r)", {this: post_author, cypherParams: {currentUserId: null}}, false),
    commentsCount: apoc.cypher.runFirstColumn("MATCH (this)-[:WROTE]->(r:Comment) WHERE NOT r.deleted = true AND NOT r.disabled = true RETURN COUNT(r)", {this: post_author, cypherParams: {currentUserId: null}}, false),
    followedByCount: apoc.cypher.runFirstColumn("MATCH (this)<-[:FOLLOWS]-(r:User) RETURN COUNT(DISTINCT r)", {this: post_author, cypherParams: {currentUserId: null}}, false),
    followedByCurrentUser: apoc.cypher.runFirstColumn("MATCH (this)<-[:FOLLOWS]-(u:User {id: {currentUserId: null}.currentUserId})\nRETURN COUNT(u) >= 1", {this: post_author, cypherParams: {currentUserId: null}}, false),
    location: head([ post_author_location IN apoc.cypher.runFirstColumn("MATCH (this)-[:IS_IN]->(l:Location) RETURN l", {this: post_author, cypherParams: {currentUserId: null}}, true) | post_author_location { .id }]) ,
    badges: [(`post_author`)<-[:`REWARDED`]-(`post_author_badges`:`Badge`) | post_author_badges { .id, .key, .icon }] }]) ,
  commentsCount: apoc.cypher.runFirstColumn("MATCH (this)<-[:COMMENTS]-(r:Comment) WHERE NOT r.deleted = true AND NOT r.disabled = true  RETURN COUNT(r)", {this: post, cypherParams: {currentUserId: null}}, false),
  categories: [(`post`)-[:`CATEGORIZED`]->(`post_categories`:`Category`) | post_categories { .id , .id , .name , .icon }] ,
  shoutedCount: apoc.cypher.runFirstColumn("MATCH (this)<-[:SHOUTED]-(r:User) WHERE NOT r.deleted = true AND NOT r.disabled = true RETURN COUNT(DISTINCT r)", {this: post, cypherParams: {currentUserId: null}}, false)
} AS `post` 
ORDER BY post.createdAt DESC 
SKIP 0 
LIMIT 10
@jexp
Copy link
Contributor

jexp commented May 3, 2019

If you run this query manually in Neo4j browser.

Which of the 7 fields causes the main slowdown?

Can you also share your graphql query?

Would it be possible to combine the queries into one field that has multiple results (returns a map).

Do you have indexes on deleted/disabled?

I would also recommend to move deleted/disabled to labels as those are much faster to fetch.
Especially if you often query for negations, then using something like :Active (non deleted), :Enabled (non disabled) would make the query much more efficient.

then you could use queries like:

MATCH (this)<-[:SHOUTED]-(r:User:Active:Enabled) RETURN COUNT(DISTINCT r)

If you're on 3.5 having an index on created at can help too for the ordering if you add a filter that determines the type.

@jexp
Copy link
Contributor

jexp commented May 3, 2019

@johnymontana to enable index-backed-order-by in 3.5 we could add a sort hint, aka. WHERE post.createdAt > 0 (or where post.createdAt > date(0)` to the generated query.

@roschaefer
Copy link
Contributor

Hey folks, I investigated the issue on our side and was able to track down the performance issue:

Generated query:

MATCH (`post`:`Post` {deleted:false, disabled:false}) RETURN `post` { .deleted , .disabled , .createdAt , .id , .id , .title , .contentExcerpt , .createdAt , .disabled , .deleted , .slug , .image ,author: head([(`post`)<-[:`WROTE`]-(`post_author`:`User`) | post_author { .deleted , .disabled , .createdAt , .id , .id , .avatar , .slug , .name , .disabled , .deleted ,contributionsCount: apoc.cypher.runFirstColumn("MATCH (this)-[:WROTE]->(r:Post)\nWHERE (NOT exists(r.deleted) OR r.deleted = false)\nAND (NOT exists(r.disabled) OR r.disabled = false)\nRETURN COUNT(r)", {this: post_author, cypherParams: {currentUserId: null}}, false),shoutedCount: apoc.cypher.runFirstColumn("MATCH (this)-[:SHOUTED]->(r:Post) WHERE NOT r.deleted = true AND NOT r.disabled = true RETURN COUNT(DISTINCT r)", {this: post_author, cypherParams: {currentUserId: null}}, false),commentsCount: apoc.cypher.runFirstColumn("MATCH (this)-[:WROTE]->(r:Comment) WHERE NOT r.deleted = true AND NOT r.disabled = true RETURN COUNT(r)", {this: post_author, cypherParams: {currentUserId: null}}, false),followedByCount: apoc.cypher.runFirstColumn("MATCH (this)<-[:FOLLOWS]-(r:User) RETURN COUNT(DISTINCT r)", {this: post_author, cypherParams: {currentUserId: null}}, false),followedByCurrentUser: apoc.cypher.runFirstColumn("MATCH (this)<-[:FOLLOWS]-(u:User {id: {currentUserId: null}.currentUserId})\nRETURN COUNT(u) >= 1", {this: post_author, cypherParams: {currentUserId: null}}, false),location: head([ post_author_location IN apoc.cypher.runFirstColumn("MATCH (this)-[:IS_IN]->(l:Location) RETURN l", {this: post_author, cypherParams: {currentUserId: null}}, true) | post_author_location { .id }]) ,badges: [(`post_author`)<-[:`REWARDED`]-(`post_author_badges`:`Badge`) | post_author_badges { .id , .id , .key , .icon }] }]) ,commentsCount: apoc.cypher.runFirstColumn("MATCH (this)<-[:COMMENTS]-(r:Comment) WHERE NOT r.deleted = true AND NOT r.disabled = true  RETURN COUNT(r)", {this: post, cypherParams: {currentUserId: null}}, false),categories: [(`post`)-[:`CATEGORIZED`]->(`post_categories`:`Category`) | post_categories { .id , .id , .name , .icon }] ,shoutedCount: apoc.cypher.runFirstColumn("MATCH (this)<-[:SHOUTED]-(r:User) WHERE NOT r.deleted = true AND NOT r.disabled = true RETURN COUNT(DISTINCT r)", {this: post, cypherParams: {currentUserId: null}}, false)} AS `post` ORDER BY `post`.createdAt DESC SKIP 0 LIMIT 10

Query without any performance issue:

MATCH (`post`:`Post` {deleted:false, disabled:false}) WITH `post` ORDER BY `post`.createdAt DESC  RETURN `post` { .deleted , .disabled , .createdAt , .id , .id , .title , .contentExcerpt , .createdAt , .disabled , .deleted , .slug , .image ,author: head([(`post`)<-[:`WROTE`]-(`post_author`:`User`) | post_author { .deleted , .disabled , .createdAt , .id , .id , .avatar , .slug , .name , .disabled , .deleted ,contributionsCount: apoc.cypher.runFirstColumn("MATCH (this)-[:WROTE]->(r:Post)\nWHERE (NOT exists(r.deleted) OR r.deleted = false)\nAND (NOT exists(r.disabled) OR r.disabled = false)\nRETURN COUNT(r)", {this: post_author, cypherParams: {currentUserId: null}}, false),shoutedCount: apoc.cypher.runFirstColumn("MATCH (this)-[:SHOUTED]->(r:Post) WHERE NOT r.deleted = true AND NOT r.disabled = true RETURN COUNT(DISTINCT r)", {this: post_author, cypherParams: {currentUserId: null}}, false),commentsCount: apoc.cypher.runFirstColumn("MATCH (this)-[:WROTE]->(r:Comment) WHERE NOT r.deleted = true AND NOT r.disabled = true RETURN COUNT(r)", {this: post_author, cypherParams: {currentUserId: null}}, false),followedByCount: apoc.cypher.runFirstColumn("MATCH (this)<-[:FOLLOWS]-(r:User) RETURN COUNT(DISTINCT r)", {this: post_author, cypherParams: {currentUserId: null}}, false),followedByCurrentUser: apoc.cypher.runFirstColumn("MATCH (this)<-[:FOLLOWS]-(u:User {id: {currentUserId: null}.currentUserId})\nRETURN COUNT(u) >= 1", {this: post_author, cypherParams: {currentUserId: null}}, false),location: head([ post_author_location IN apoc.cypher.runFirstColumn("MATCH (this)-[:IS_IN]->(l:Location) RETURN l", {this: post_author, cypherParams: {currentUserId: null}}, true) | post_author_location { .id }]) ,badges: [(`post_author`)<-[:`REWARDED`]-(`post_author_badges`:`Badge`) | post_author_badges { .id , .id , .key , .icon }] }]) ,commentsCount: apoc.cypher.runFirstColumn("MATCH (this)<-[:COMMENTS]-(r:Comment) WHERE NOT r.deleted = true AND NOT r.disabled = true  RETURN COUNT(r)", {this: post, cypherParams: {currentUserId: null}}, false),categories: [(`post`)-[:`CATEGORIZED`]->(`post_categories`:`Category`) | post_categories { .id , .id , .name , .icon }] ,shoutedCount: apoc.cypher.runFirstColumn("MATCH (this)<-[:SHOUTED]-(r:User) WHERE NOT r.deleted = true AND NOT r.disabled = true RETURN COUNT(DISTINCT r)", {this: post, cypherParams: {currentUserId: null}}, false)} AS `post` SKIP 0 LIMIT 10

Here's the diff:

Screenshot (89)

Query plan before (terrible performance):

plan

Query plan after:

plan

Suggested Fix

I had a look into the code and the naive fix would be a re-ordering of this line of code. Does it have any unwanted side effects to move the ORDER BY before all the dynamic fields?

I would be happy to provide a PR if it's that easy.

@jexp
Copy link
Contributor

jexp commented May 3, 2019

I think it makes sense in general as the sort order is kept.

What version are you running on? Please provide a PR that would be great!

@jexp
Copy link
Contributor

jexp commented May 3, 2019

I think it makes sense in general as the sort order is kept.

What version are you running on? Please provide a PR that would be great!

We could even move the skip limit as the filtering is happening before in predicate.

We should do that in the other transpilers too.

@johnymontana
Copy link
Contributor

Thanks @roschaefer and @appinteractive for digging into this! The only possible issue I see there is when we are ordering on a @cypher directive field. The difference is ordering on the node object (that won't contain the @cypher computed fields) vs ordering on the projected object that will contain the @cypher computed fields.

In general ordering on the computed fields is not a good idea anyway for performance reasons, but would be good to make sure we don't break that use case since it is supported now.

@appinteractive
Copy link
Author

Any idea how to make both possible? Is there a way to decide based on the field type (computed/not computed) ?

@johnymontana
Copy link
Contributor

I think we could accomplish that by adding some logic in computeOrderBy to check if any of the fields in orderBy have a @cypher directive on them and then return a boolean orderOnComputedField as well as the ORDER BY ... cypher string.

Then in translate.js it would be something like this:

const query =
    `MATCH (${safeVariableName}:${safeLabelName}${
      argString ? ` ${argString}` : ''
    }) ${predicate}` ${orderOnComputedField ? 'WITH * ' + orderByValue : ""}+
    `RETURN ${safeVariableName} {${subQuery}} AS ${safeVariableName}${orderOnComputedField ? "" : orderByValue}${outerSkipLimit}`;

@johnymontana
Copy link
Contributor

Also, this brings up the need of having some basic performance tests as part of the test suite so we can at least track performance regressions/improvements. I'll create a separate issue for that.

roschaefer added a commit to roschaefer/neo4j-graphql-js that referenced this issue May 25, 2019
roschaefer added a commit to Human-Connection/Human-Connection that referenced this issue Oct 22, 2019
@mattwr18 @aonomike
You must never `ORDER BY` a property with a `@cypher` directive. Reason:
The order by performance will be terribly poor.

See my issue:
neo4j-graphql/neo4j-graphql-js#239
And my PR:
neo4j-graphql/neo4j-graphql-js#247
roschaefer added a commit to Human-Connection/Human-Connection that referenced this issue Oct 22, 2019
@mattwr18 @aonomike
You must never `ORDER BY` a property with a `@cypher` directive. Reason:
The order by performance will be terribly poor.

See my issue:
neo4j-graphql/neo4j-graphql-js#239
And my PR:
neo4j-graphql/neo4j-graphql-js#247
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants