the library is not thread-safe #37

Closed
dportabella opened this issue Nov 17, 2017 · 5 comments

Comments

@dportabella

This code fails because of a thread-safety problem.
It works if I remove the .par call (which turns the range into a parallel collection).
It also works if I remove the import kantan.xpath.nekohtml._.

How can kantan.xpath be used from multiple threads?

  import kantan.xpath.implicits._
  import kantan.xpath.nekohtml._

  (1 to 10).par.foreach { n =>
    val node = "<html><body>text</body></html>".asNode
    println(node)
  }

fails with:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 12
	at java.util.ArrayList.add(ArrayList.java:459)
	at org.apache.xerces.util.ParserConfigurationSettings.addRecognizedProperties(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.<init>(Unknown Source)
	at org.apache.xerces.parsers.AbstractXMLDocumentParser.<init>(Unknown Source)
	at org.apache.xerces.parsers.AbstractDOMParser.<init>(Unknown Source)
	at org.apache.xerces.parsers.DOMParser.<init>(Unknown Source)
	at kantan.xpath.nekohtml.NekoParser.parse(NekoParser.scala:25)
	at kantan.xpath.XmlSource$$anonfun$inputSource$1.apply(XmlSource.scala:99)
	at kantan.xpath.XmlSource$$anonfun$inputSource$1.apply(XmlSource.scala:99)
	at kantan.xpath.XmlSource$$anon$1.asNode(XmlSource.scala:91)
	at kantan.xpath.XmlSource$$anonfun$contramapResult$1$$anonfun$apply$2.apply(XmlSource.scala:73)
	at kantan.xpath.XmlSource$$anonfun$contramapResult$1$$anonfun$apply$2.apply(XmlSource.scala:73)
	at kantan.codecs.Result$Success.flatMap(Result.scala:248)
	at kantan.xpath.XmlSource$$anonfun$contramapResult$1.apply(XmlSource.scala:73)
	at kantan.xpath.XmlSource$$anonfun$contramapResult$1.apply(XmlSource.scala:73)
	at kantan.xpath.XmlSource$$anon$1.asNode(XmlSource.scala:91)
	at kantan.xpath.ops.XmlSourceOps.asNode(XmlSourceOps.scala:31)
	at playground.Test106$$anonfun$1.apply$mcVI$sp(Test106.scala:8)
	at scala.collection.parallel.immutable.ParRange$ParRangeIterator.foreach(ParRange.scala:91)
	at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
	at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
	at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
	at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:159)
	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:443)
	at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:149)
	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
	at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
@nrinaudo
Owner

Confirmed, thanks for reporting!

I wasn't aware that nekohtml (or xerces, maybe?) was not thread-safe. What a nightmare. Let me look into this. There's a trivial but very inefficient fix, but I'd like to try to do something clean.

nrinaudo added a commit that referenced this issue Nov 17, 2017
Yes, in Java world, it's ok for XML parser configuration to be changed in a non-thread safe way during parsing.

(╯°□°)╯︵ ┻━┻
@nrinaudo
Owner

@dportabella how urgently do you need the fix? PR #38 fixes the issue. If you need it as soon as possible, I can merge and cut a release when I get home tonight. Otherwise, I'd like to take a bit of time to write a non-regression test for this before merging.

@dportabella
Author

Thanks! :)
It's not urgent, I have my own temporary workaround.

nrinaudo added a commit that referenced this issue Nov 17, 2017
[#37] Synchronize parsing on the HTML configuration.
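
The commit title above names the general approach: serialize access to state that several parsers share. The following is only a minimal sketch of that idea, not the actual kantan.xpath, NekoHTML, or Xerces code. `SharedConfig`, `SafeParser`, and `SynchronizedParsingSketch` are hypothetical names made up for illustration, and the "parsing" is a stand-in that just records one entry in a shared, non-thread-safe `java.util.ArrayList`.

```scala
import java.util.{ArrayList => JArrayList}

// Hypothetical stand-in for a parser configuration that gets mutated
// whenever a parser is built on top of it. Not a real NekoHTML/Xerces class.
final class SharedConfig {
  // java.util.ArrayList is not thread-safe on its own.
  private val recognized = new JArrayList[String]()
  def addRecognizedProperty(name: String): Unit = recognized.add(name)
  def size: Int = recognized.size
}

// Hypothetical parser wrapper: every parse touches the shared configuration,
// so the whole step runs while holding the configuration's monitor.
final class SafeParser(config: SharedConfig) {
  def parse(input: String): String = config.synchronized {
    config.addRecognizedProperty("property-for-" + input)
    input.trim // placeholder for the real parsing work
  }
}

object SynchronizedParsingSketch {
  // Runs `threads` workers, each parsing `perThread` inputs, and returns
  // how many entries ended up in the shared configuration.
  def runAll(threads: Int, perThread: Int): Int = {
    val config = new SharedConfig
    val parser = new SafeParser(config)
    val workers = (1 to threads).map { t =>
      new Thread(new Runnable {
        def run(): Unit =
          (1 to perThread).foreach(i => parser.parse(s" doc-$t-$i "))
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    config.synchronized(config.size)
  }

  def main(args: Array[String]): Unit =
    println(runAll(threads = 8, perThread = 1000)) // prints 8000
}
```

Because every mutation happens under the same lock, no update is lost and the list is never resized by two threads at once. The cost is that the guarded step runs on one thread at a time, so this trades throughput for safety.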
@nrinaudo
Owner

The fix has been released in v0.3.1. It is available on Sonatype now, and will be on Maven Central as soon as it syncs. Could you confirm that it works for you and close the issue if it does?

@dportabella
Author

cool, it works for me, thx!
