I'm using this library to batch parse a large number of HTML documents in parallel (via `antchfx/htmlquery`), and it always ends in either a nil pointer dereference or a wrong result.

The issue is reuse of query objects due to shallow cloning of `functionQuery` objects. `htmlquery` calls `xpath.Expr.Select`, which creates a new node iterator containing a cloned query. `functionQuery` is cloned, but the clone is shallow, because the function is actually a closure that captured references to the original query objects when it was constructed. At this point, all cloned queries reference the same original child query nodes, which are mutated concurrently when `Select` is called (for example, `attributeQuery.iterator` in `attributeQuery.Select`).

Here is a self-contained example with `htmlquery`, which uses a global cache of `xpath.Expr` objects, that hits the problem fairly quickly on my machine.

Note that it runs correctly if you remove the `contains` clause and just use `=`, for example.

My current workaround is to disable the query cache in `htmlquery`, but what really needs to be done is to refactor `functionQuery` so it can be cloned correctly.
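To make the cloning problem concrete, here is a minimal standalone sketch of the pattern described above. It is not the library's code and not the reproduction mentioned earlier: `query`, `attrQuery`, `shallowFuncQuery`, `deepFuncQuery`, and `drain` are hypothetical names invented for illustration. It runs sequentially, so it only shows that clones share child state; in a parallel program the same sharing would become a data race.

```go
package main

import "fmt"

// query is a hypothetical, simplified stand-in for an internal query interface.
type query interface {
	Select() (string, bool) // next value, or false when exhausted
	Clone() query
}

// attrQuery models a stateful child query; pos plays the role of the
// iterator state that Select mutates.
type attrQuery struct {
	values []string
	pos    int
}

func (a *attrQuery) Select() (string, bool) {
	if a.pos >= len(a.values) {
		return "", false
	}
	v := a.values[a.pos]
	a.pos++
	return v, true
}

func (a *attrQuery) Clone() query { return &attrQuery{values: a.values} }

// shallowFuncQuery models the problematic shape: fn is a closure that
// captured the child at construction time, and Clone copies only the closure.
type shallowFuncQuery struct {
	fn func() (string, bool)
}

func newShallow(child query) *shallowFuncQuery {
	return &shallowFuncQuery{fn: func() (string, bool) { return child.Select() }}
}

func (f *shallowFuncQuery) Select() (string, bool) { return f.fn() }

// Clone returns a new struct, but every clone still calls into the same child.
func (f *shallowFuncQuery) Clone() query { return &shallowFuncQuery{fn: f.fn} }

// deepFuncQuery keeps the child as a field so Clone can clone it as well.
type deepFuncQuery struct {
	child query
}

func (f *deepFuncQuery) Select() (string, bool) { return f.child.Select() }
func (f *deepFuncQuery) Clone() query           { return &deepFuncQuery{child: f.child.Clone()} }

// drain collects every value a query yields.
func drain(q query) []string {
	var out []string
	for {
		v, ok := q.Select()
		if !ok {
			return out
		}
		out = append(out, v)
	}
}

func main() {
	vals := []string{"a", "b", "c"}

	shallow := newShallow(&attrQuery{values: vals})
	s1, s2 := shallow.Clone(), shallow.Clone()
	fmt.Println(len(drain(s1)), len(drain(s2))) // prints "3 0": s2 sees the child s1 already exhausted

	deep := &deepFuncQuery{child: &attrQuery{values: vals}}
	d1, d2 := deep.Clone(), deep.Clone()
	fmt.Println(len(drain(d1)), len(drain(d2))) // prints "3 3": each clone owns its child
}
```

The design point of the sketch is that a closure hides its captured children from `Clone`; holding the children as explicit fields (or rebuilding the closure from cloned children) is one way a refactor could make the clone deep.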