"instance" linked in "For instance" #43

holtzermann17 · 2013-05-20T18:11:07Z

Target of the link is: http://planetmath.org/substitutionsinpropositionallogic

Source article (place where the link lives) is: http://planetmath.org/topicentryoncomplexanalysis

For instance, putting imaginary numbers into the power series for the exponential function, we find...

dginev · 2013-07-11T09:48:32Z

Moving to 3.0 milestone.

dginev · 2014-01-01T14:17:40Z

I am revisiting the accuracy issues at the moment. This report reduces to a linguistic deficiency - NNexus does not currently recognize prepositional phrases. It is easy to imagine a document where "instance" is used as a term, and additionally "for instance" is used separately to provide an example. So this is a legitimate bug that requires enhancing NNexus with more linguistic capabilities.

With the exception of phrases containing pronouns, most prepositional phrases form a closed set in English and are relatively well captured by Wiktionary (they have 701 of them here).

I was reading recently that the mantra which works for a lot of startups is "do the simplest approach first", so introducing a hardcoded list of phrases to avoid (ignoring pronoun variation for the moment) could be the easiest solution here. The "correct" solution of course is to have part-of-speech information and only treat regular Noun Phrases (NPs) as concept candidates. But we don't have a reliable part-of-speech tagger for mathematical texts yet.
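A minimal sketch of the hardcoded-list idea, to make it concrete. This is not NNexus code (NNexus is written in Perl; Python is used here only for illustration), and the names `STOP_PHRASES` and `find_link_candidates` as well as the four sample phrases are assumptions. A real list could be seeded from the Wiktionary collection mentioned above.

```python
import re

# Hypothetical stoplist of fixed phrases whose words should never be linked.
STOP_PHRASES = ["for instance", "for example", "in general", "in particular"]


def find_link_candidates(text, concept, stop_phrases=STOP_PHRASES):
    """Return (start, end) spans of `concept` in `text` that do not
    fall inside an occurrence of any stop phrase."""
    lowered = text.lower()

    # Collect the character spans covered by stop phrases.
    blocked = []
    for phrase in stop_phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        blocked.extend(m.span() for m in re.finditer(pattern, lowered))

    # Keep only concept matches that are not contained in a blocked span.
    spans = []
    concept_pattern = r"\b" + re.escape(concept.lower()) + r"\b"
    for m in re.finditer(concept_pattern, lowered):
        start, end = m.span()
        if not any(b0 <= start and end <= b1 for b0, b1 in blocked):
            spans.append((start, end))
    return spans


text = "For instance, each instance of the variable is replaced."
print(find_link_candidates(text, "instance"))
```

In the sample sentence only the second "instance" is returned as a link candidate; the one inside "For instance" is skipped. Pronoun variation and inflected forms are ignored here, as suggested above.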

dginev · 2014-01-01T16:12:45Z

So, on the POS tagger front, the conventionally accepted "best" free tool is the Stanford tagger. An important discovery I just made is that someone has gone through the effort of creating a self-contained Perl wrapper around the Stanford Core NLP tools (133 MB in size!) and published it on CPAN. So that makes it easy to acquire a tagger as a dependency. Currently trying that out.

dginev · 2014-01-01T16:17:57Z

But it also requires a Java SDK, so NNexus gets a total of ~200 MB heavier in size. Interesting to see if we gain anything as a result.

holtzermann17 · 2014-01-02T13:48:06Z

@dginev - if we ever get around to integrating the "recommender system" that I worked on in my Day Job (2013 edition), https://github.com/kmi/decipher, we would also have a Java dependency there. I can imagine having a dedicated (virtual) server for running web services.

dginev · 2014-04-20T23:15:33Z

I have found a possibly perfect match for augmenting NNexus with POS tags, namely the SENNA toolkit. It is both efficient and has state-of-the-art precision and recall, which makes it a perfect fit. Using native C I could process a large arXiv document (6500 words) in 3 seconds, including the parsing overhead.

So I have the feeling that for regular NNexus jobs the POS parsing might be only an insignificant hit to the overall runtime. I am currently writing a Perl wrapper for the library, in order to easily leverage SENNA in NNexus. My other experiments were performed in the context of LLaMaPUn and my general PhD work.

ghost assigned dginev May 20, 2013
