One good example beginning to end #24

Open
bitfinity opened this issue Aug 23, 2014 · 2 comments

@bitfinity

Your tool looks like what I'm looking for, but the documentation is so limited that I can't use it. Just one screencast or example would do the trick.
All I want to know is how to train a model to use for NER. You suggest using WebAnnotator, and you provide code to load trees from the files saved by WebAnnotator, but you stop there. Why not follow through with a complete example that shows how to extract the content based on that model?
Thanks,
-jim

@kmike
Member

kmike commented Aug 23, 2014

Hi,

A good tutorial is definitely missing. I have a complete example (an IPython notebook) in the works, but I haven't finished it yet.

But I'd love to hear more feedback about the existing tutorial: http://webstruct.readthedocs.org/en/latest/tutorial.html. It is a bit outdated (it is easier to use CRFsuite instead of Wapiti), but it lists all the required steps in order: you load trees, convert them to HTML tokens, extract features, feed them into a sequence labelling toolkit, train the model, and then use it to extract entities from a webpage. There is a chapter for each step in the tutorial. But as it turns out, this is not clear at all.
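
To make the order of those steps concrete, here is a rough, standard-library-only sketch. It is not webstruct's API and it does not use CRFsuite or Wapiti: the annotation tags (`<org>`, `<city>`), the helper names, the three features and the tiny "most frequent label per feature" model are all assumptions made up for illustration. A real setup would replace the model with a proper sequence labelling toolkit.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Hypothetical inline annotation tags; real annotation formats differ.
LABEL_TAGS = {"org", "city"}


class TokenCollector(HTMLParser):
    """Steps 1-2: parse a page and turn text into (word, parent_tag, label) tokens."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open structural tags
        self.label = "O"   # label of the annotation we are inside, "O" = outside
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag in LABEL_TAGS:
            self.label = tag.upper()
        else:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in LABEL_TAGS:
            self.label = "O"
        elif self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        parent = self.stack[-1] if self.stack else ""
        for word in data.split():
            self.tokens.append((word, parent, self.label))


def html_tokens(html):
    parser = TokenCollector()
    parser.feed(html)
    parser.close()
    return parser.tokens


def token_features(tokens):
    """Step 3: one feature tuple per token (parent tag, title case, previous title case)."""
    feats, prev_title = [], False
    for word, parent, _ in tokens:
        feats.append((parent, word.istitle(), prev_title))
        prev_title = word.istitle()
    return feats


def train(pages):
    """Steps 4-5: toy stand-in for a sequence labeller, most frequent label per feature tuple."""
    counts = defaultdict(Counter)
    for page in pages:
        tokens = html_tokens(page)
        for (_, _, label), feat in zip(tokens, token_features(tokens)):
            counts[feat][label] += 1
    return {feat: c.most_common(1)[0][0] for feat, c in counts.items()}


def extract(model, page):
    """Step 6: label tokens of an unannotated page and merge adjacent same-label tokens."""
    tokens = html_tokens(page)
    entities, prev = [], "O"
    for (word, _, _), feat in zip(tokens, token_features(tokens)):
        label = model.get(feat, "O")
        if label != "O":
            if label == prev:
                entities[-1] = (entities[-1][0] + " " + word, label)
            else:
                entities.append((word, label))
        prev = label
    return entities


# Made-up training pages with inline annotations, and a made-up unannotated page.
TRAIN = [
    "<html><body><h1><org>Acme Widgets</org></h1>"
    "<p>Based in <city>Berlin</city> since 1999</p></body></html>",
    "<html><body><h1><org>Globex</org></h1>"
    "<p>Offices in <city>Paris</city></p></body></html>",
]
NEW_PAGE = "<html><body><h1>Initech Labs</h1><p>Located in Madrid</p></body></html>"

model = train(TRAIN)
found = extract(model, NEW_PAGE)
print(found)  # [('Initech Labs', 'ORG'), ('Madrid', 'CITY')]
```

The point is only the shape of the pipeline: annotated pages go in, tokens and features come out, a model is fitted on the (features, label) pairs, and the same tokenizer and feature function are reused on new pages at extraction time.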

Could you please share your experience with this tutorial? What is not clear? How can I improve it? Don't try too hard: if something is unclear, please ask here; this will help make the tutorial better.

@kmike
Member

kmike commented Aug 23, 2014

There are a couple of "shortcuts" available:
