Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index on path, workspace_name is not used in Client::getNode #151

Open
ACSI-IT opened this issue Oct 23, 2013 · 17 comments
Open

index on path, workspace_name is not used in Client::getNode #151

ACSI-IT opened this issue Oct 23, 2013 · 17 comments

Comments

@ACSI-IT
Copy link

ACSI-IT commented Oct 23, 2013

Jackalope\Transport\DoctrineDBAL\Client::getNode is doing a LIKE query on phpcr_nodes.path, but EXPLAIN shows it is not using the index on (path, workspace_name).

I traced this to the forward slashes in :pathd, adding (escaping) '\' before these forward slashes makes MySQL use its index. significantly improving performance on path queries.

see:
https://github.com/ACSI-IT/jackalope-doctrine-dbal/commit/95f73f01ab0cdc94121a94c4b9425621a5479ade

Not sure if this is the cleanest solution, could you provide some feedback.
Let me know if you want me to create a PR for this commit.

Axel

@ACSI-IT
Copy link
Author

ACSI-IT commented Oct 23, 2013

We're on
PHP 5.4.19
mysql Ver 14.14 Distrib 5.1.69

@dbu
Copy link
Member

dbu commented Oct 28, 2013

interesting find, thanks for reporting. i lack the knowhow of mysql details on this. is the forward slash problem a bug or some sort of feature? could we fix that with a schema change maybe?

the escaping thing feels hacky to me unless this is not persisted by mysql... what exactly is stored in the db this way? and do the = queries still work with the \ or do you add \ also to the query?

the "like" part is an optimization to support fetchDepth - one quickfix could be to avoid the like alltogether if fetchDepth is 0. does the depth check happen before the like or after? that might be more efficient too.

@ACSI-IT
Copy link
Author

ACSI-IT commented Nov 5, 2013

@dbu
I agree those slashes are hacky, I have a feeling it might be a quirk of MySQL, not using that index if the first character is not in [a-zA-z0-9] or something.
I haven't had the time to thoroughly try that though.

This is the query I was playing with:


SELECT * FROM phpcr_nodes
WHERE (path LIKE :pathd OR path = :path)
AND workspace_name = :workspace
AND depth <= ((SELECT depth FROM phpcr_nodes WHERE path = :path AND workspace_name = :workspace) + :fetchDepth)
ORDER BY sort_order ASC
array (
':path' => '/routes',
':pathd' => '/routes/%',
':workspace' => 'default',
':fetchDepth' => 0,
)

EXPLAIN SELECT * FROM phpcr_nodes
WHERE (path LIKE '/routes/%' OR path = '/routes')
AND workspace_name = 'default'
AND depth <= ((SELECT depth FROM phpcr_nodes WHERE path = '/routes' AND workspace_name = 'default') + 0)
ORDER BY sort_order ASC

EXPLAIN SELECT * FROM phpcr_nodes
WHERE (path LIKE 'phpcr:///routes/%' OR path = 'phpcr:///routes')
AND workspace_name = 'default'
AND depth <= ((SELECT depth FROM phpcr_nodes WHERE path = 'phpcr:///routes' AND workspace_name = 'default') + 0)
ORDER BY sort_order ASC


Explain shows it wont use an index unless I add those slashes to :pathd ('\/routes')
or
add 'phpcr://' in front of :pathd

I was thinking, how much effort would it be if paths we're stored as URI, something like
'phpcr:///my-node/childnode'

thanks,
Axel

@dbu
Copy link
Member

dbu commented Nov 12, 2013

the drawback of having the URI is that it adds some overhead and we would need to update all existing database entries. it feals unelegant, but if we can't do anything else it could be a solution.

i wonder if its mysql specific or could have to do with accidentally triggering some regular expression logic. also unsure whether it might just be a bug affect mysql up to some version... i asked around on twitter now, maybe we get some more inputs.

@beberlei
Copy link
Member

if every field starts with / mysql will probably determine that the indexability sucks and use full table scan instead.

@dbu
Copy link
Member

dbu commented Nov 12, 2013

@beberlei indeed every field of the path column will start with /. but why can that not be indexed? and could the same be indexed if we use a phpcr:// prefix? we don't talk full text search indexing or anything, just an index that would help with the equality and like queries.

@dbu
Copy link
Member

dbu commented Dec 28, 2013

tried to get some input with a tweet.

@lsmith77
Copy link
Member

@ACSI-IT can you try using FORCE INDEX instead?
http://dev.mysql.com/doc/refman/5.1/en/index-hints.html

check if this is faster than without (where its not using the index) to see if mysql is indeed wrong for using a table scan

@dbu
Copy link
Member

dbu commented Dec 28, 2013

@ACSI-IT i assume that a performance difference is only visible if there are a lot of nodes, on less than 100s of nodes, connection time and php bootstrap is probably larger than mysql not using the index.

@zibok
Copy link

zibok commented Dec 28, 2013

What is the type of index uses on path column? Maybe adding a length can help?

@cryptocompress
Copy link
Contributor

WHERE (path LIKE :pathd OR path = :path)

try to remove OR...

@marcj
Copy link

marcj commented Dec 28, 2013

Forcing the usage of a index is really not a good idea as it can slow down the query. I'm not a jackalope developer and I've never used it, thus I don't know the exact schema or the content.

The thing is, when the index over path and workspace_name has a low cardinality it would not make sense to use a b-tree index, as b-tree is not made for low cardinality. When MySQL detects that this index has one of those low cardinalities then it won't use the index.

How much data do you have in your database? I don't think the performance increases that much when you change the data value of path column that way. What cardinality and how much data do you have?

@dbu
Copy link
Member

dbu commented Dec 28, 2013

Ah maybe its the OR. We remove that in #159 when its not needed. Is that better?

@marcj
Copy link

marcj commented Dec 28, 2013

@dbu, there's no benefit with omitting that OR since those conditions in the path group use the same column.

I can't see anything that might break the usage of that index in this query, except the cardinality issue. However, that issue should be resolved itself when data amount (and therefore probably cardinality) increases.

@ACSI-IT, how much data is in this table? With which data did you test that scenario?

@dbu
Copy link
Member

dbu commented Mar 2, 2014

ping @ACSI-IT

@lsmith77
Copy link
Member

lsmith77 commented Mar 2, 2014

@marcj forcing the index may still make sense if we know that the MySQL query optimiser is wrong. then again of course this may change over time, meaning we may need to make this configurable somehow and we could then automatically set the option when generating the DIC.

@dbu
Copy link
Member

dbu commented May 2, 2015

@dantleech another one about performance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants