Note: This SyntaxNet build incorporates The Great Models Move change.
When Google announced that The World's Most Accurate Parser, i.e., SyntaxNet, was going open-source, it grabbed widespread attention from machine-learning developers and researchers interested in core NLU applications like automatic information extraction, translation, etc. The following gif shows how SyntaxNet internally builds the dependency tree:
Predominantly, one will find two approaches to using SyntaxNet:
- Using the demo.sh script provided by SyntaxNet
- Invoking it from Python as a subprocess, as shown below. This approach is obviously inefficient, non-scalable and overkill, as it internally spawns a shell and calls other Python scripts.
import subprocess
import os

# demo.sh must be run from inside the syntaxnet model directory
os.chdir(r"../models/syntaxnet")

# shell=True expects a single command string; pipe the sentence into demo.sh
subprocess.call(
    "echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh",
    shell=True)
I wanted a proper, scalable Python application where one can do `import syntaxnet` and use it as shown below:
import syntaxnet
from syntaxnet import gen_parser_ops...
I managed to get this done and hence am sharing my project here. Please find below how I got there!!
- After The Great Models Move, TensorFlow categorized SyntaxNet as a RESEARCH MODEL.
- As mentioned here, the TensorFlow team will no longer provide guaranteed support for SyntaxNet; they encourage individual researchers to maintain research models.
Between the painful installation, the steep learning curve, the lack of official support and the absence of clear documentation, forums are full of myriad SyntaxNet issues without proper solutions. Some of them are as basic as:
- A lot of trouble understanding the documentation around SyntaxNet and its related tools
- How to use the Parsey McParseface model in a Python application
- Confusing I/O handling in SyntaxNet because of the uncommon .conll file format it uses for input and output (see the short sample after this list)
- How to use/export the output (ASCII tree or CoNLL) in a format that is easy to parse
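For reference, the .conll format is plain text with one token per line and tab-separated columns (token id, word form, lemma, coarse POS, fine POS, features, head id, dependency label, ...). An illustrative, hand-written parse of "Bob brought the pizza to Alice." (not actual model output) looks roughly like this:

```
1	Bob	_	NOUN	NNP	_	2	nsubj	_	_
2	brought	_	VERB	VBD	_	0	ROOT	_	_
3	the	_	DET	DT	_	4	det	_	_
4	pizza	_	NOUN	NN	_	2	dobj	_	_
5	to	_	ADP	IN	_	2	prep	_	_
6	Alice	_	NOUN	NNP	_	5	pobj	_	_
7	.	_	.	.	_	2	punct	_	_
```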
This endeavour aims to make the life of SyntaxNet enthusiasts easier. It primarily saves all those hours needed to get Google's SyntaxNet Parsey McParseface up and running the way it should be. For this, I am providing two things as part of this project:
- One line (~5mins) SyntaxNet 0.2 installation
- Syntaxnet Parsey McParseface wrapper for POS tagging and dependency parsing
I am sharing the macOS SyntaxNet package distribution, i.e., the syntaxnet-0.2-cp27-cp27m-macosx_10_6_intel.whl file, in this git repo. I built it successfully with the bazel build tool, with all tests passing, after pulling the latest code from the syntaxnet git repository. This will set up syntaxnet version 0.2 with a few simple commands in barely 5 minutes, as shown below:
git clone https://github.com/spoddutur/syntaxnet.git
cd <CLONED_SYNTAXNET_PROJ_DIR>
sudo pip install syntaxnet-0.2-cp27-cp27m-macosx_10_6_intel.whl
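Once the wheel is installed, a quick smoke test is to confirm that the imports shown earlier succeed:

```python
# Smoke test: these imports should succeed once the wheel is installed
import syntaxnet
from syntaxnet import gen_parser_ops

print("syntaxnet imported OK")
```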
Here comes the most interesting (a.k.a. challenging) part: how to use SyntaxNet in a Python application. It should no longer be any trouble after this point :)
my_parser_eval.py
is the file that contains the Python wrapper I implemented around SyntaxNet. The APIs exposed in this wrapper are listed below, followed by a short end-to-end sketch:
1. API to initialise the parser:
`tagger = my_parser_eval.SyntaxNetProcess("brain_tagger")`
("brain_tagger" initialises the POS tagger; change it to "brain_parser" for dependency parsing)
2. API to feed input to the parser:
`my_parser_eval._write_input("<YOUR_ENGLISH_SENTENCE_INPUT>")`
3. API to invoke the parser:
`tagger.eval()`
4. API to read the parser's output in CoNLL format:
`my_parser_eval._read_output()`
5. API to pretty-print the parser's output as a tree:
`my_parser_eval.pretty_print()`
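Putting those calls together, here is a minimal end-to-end sketch (the sentence is just an example; the calls are exactly the ones listed above):

```python
import my_parser_eval

# Initialise the POS tagger ("brain_parser" would give the dependency parser instead)
tagger = my_parser_eval.SyntaxNetProcess("brain_tagger")

# Feed an English sentence to the parser
my_parser_eval._write_input("Bob brought the pizza to Alice.")

# Run the model over the written input
tagger.eval()

# Read the parser's output back in CoNLL format
conll_output = my_parser_eval._read_output()
print(conll_output)

# Pretty-print the output as an ASCII tree
my_parser_eval.pretty_print()
```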
- I wrote main.py (a sample Python script) to demo this wrapper. It performs SyntaxNet's dependency parsing.
- Input to main.py: an English sentence
- Output from main.py: its dependency graph tree
1. git clone https://github.com/spoddutur/syntaxnet.git
2. cd <syntaxnet-git-clone-directory>
3. python main.py
4. That's it!! It prints the SyntaxNet dependency parser output for the given input English sentence.
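For example, with the input sentence "Bob brought the pizza to Alice.", the printed dependency tree looks along these lines (this is the classic SyntaxNet demo output; exact formatting may vary):

```
Input: Bob brought the pizza to Alice.
Parse:
brought VBD ROOT
 +-- Bob NNP nsubj
 +-- pizza NN dobj
 |   +-- the DT det
 +-- to IN prep
 |   +-- Alice NNP pobj
 +-- . . punct
```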
- /models: Originally cloned from the syntaxnet git repository https://github.com/tensorflow/models . This folder additionally contains the bazel-built "bazel-bin" folder with the needed runfiles.
- custom_context.pbtxt: Custom context file used to set the context for the parser.
- my_parser_eval.py: Python wrapper for the "brain_tagger" POS tagger and "brain_parser" dependency parser. This file is heavily inspired by the original parser_eval.py that SyntaxNet provides, with quite a few modifications and enhancements.
- main.py: Demo sample usage
- /data: Folder where the parser's intermediate inputs and outputs are dumped.
- .whl: macOS package distribution of the final successful syntaxnet build, using which you can set up syntaxnet version 0.2 in barely 5 minutes.