The first error I get is a TypeError when trying to extract something with ner:
Traceback (most recent call last):
  File "/home/dex/projects/project/project/spiders/test.py", line 54, in <module>
    main()
  File "/home/dex/projects/project/project/spiders/test.py", line 51, in main
    print(ner.extract_from_url('http://scrapinghub.com/contact'))
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/webstruct/model.py", line 58, in extract_from_url
    return self.extract(data)
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/webstruct/model.py", line 46, in extract
    groups = IobEncoder.group(zip(html_tokens, tags))
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/webstruct/sequence_encoding.py", line 128, in group
    return list(cls.iter_group(data, strict))
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/webstruct/sequence_encoding.py", line 136, in iter_group
    if iob_tag.startswith('I-') and tag != iob_tag[2:]:
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
It seems like a Python 3 support issue, as it expects bytes but gets a string?
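For reference, the same TypeError can be reproduced outside webstruct. This is only a minimal sketch, assuming the tags come back from the model as bytes; the `to_text` helper and the sample tags are hypothetical and not part of webstruct:

```python
# Reproduce the error: on Python 3, bytes.startswith() rejects a str prefix.
iob_tag = b'I-ORG'  # assumption: the model returned its tags as bytes

try:
    iob_tag.startswith('I-')
except TypeError as exc:
    error_message = str(exc)
    print(error_message)


def to_text(tag, encoding='utf-8'):
    """Return the tag as str, decoding it first if it is bytes (hypothetical helper)."""
    return tag.decode(encoding) if isinstance(tag, bytes) else tag


# Hypothetical tag sequence; decoding it makes the str-based check work.
tags = [b'B-ORG', b'I-ORG', b'O']
decoded = [to_text(t) for t in tags]
inside_flags = [t.startswith('I-') for t in decoded]
print(decoded, inside_flags)
```

If this is really what happens, decoding the predicted tags before they are grouped would be one possible workaround, though I haven't verified where the bytes originate.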
The second error occurs when trying to build a ner straight from the model without fitting it first:
Traceback (most recent call last):
  File "/home/dex/projects/project/project/spiders/test.py", line 53, in <module>
    main()
  File "/home/dex/projects/project/project/spiders/test.py", line 50, in main
    print(ner.extract_from_url('http://scrapinghub.com/contact'))
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/webstruct/model.py", line 58, in extract_from_url
    return self.extract(data)
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/webstruct/model.py", line 45, in extract
    html_tokens, tags = self.extract_raw(bytes_data)
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/webstruct/model.py", line 67, in extract_raw
    tags = self.model.predict([html_tokens])[0]
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/sklearn/utils/metaestimators.py", line 54, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/sklearn/pipeline.py", line 327, in predict
    return self.steps[-1][-1].predict(Xt)
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/webstruct/wapiti.py", line 211, in predict
    sequences = self._to_wapiti_sequences(X)
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/webstruct/wapiti.py", line 230, in _to_wapiti_sequences
    X = self.feature_encoder.transform(X)
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/webstruct/wapiti.py", line 313, in transform
    return [self.transform_single(feature_dicts) for feature_dicts in X]
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/webstruct/wapiti.py", line 313, in <listcomp>
    return [self.transform_single(feature_dicts) for feature_dicts in X]
  File "/home/dex/.virtualenvs/people/lib/python3.6/site-packages/webstruct/wapiti.py", line 308, in transform_single
    line = ' '.join(_tostr(dct.get(key)) for key in self.feature_names_)
TypeError: 'NoneType' object is not iterable
The errors seem to be very vague and I don't even know where to start debugging this. Am I missing something?
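My best guess for the second traceback is that `feature_names_` is still `None` because nothing was fitted. The sketch below shows that pattern with a toy class; `ToyFeatureEncoder` and the sample feature dicts are hypothetical stand-ins, not webstruct's actual encoder:

```python
class ToyFeatureEncoder:
    """Hypothetical stand-in for an encoder that learns feature names in fit()."""

    def __init__(self):
        self.feature_names_ = None  # only set once fit() has been called

    def fit(self, X):
        names = set()
        for feature_dicts in X:
            for dct in feature_dicts:
                names.update(dct)
        self.feature_names_ = sorted(names)
        return self

    def transform_single(self, feature_dicts):
        # Iterating over feature_names_ fails while it is still None.
        return [
            ' '.join(str(dct.get(key)) for key in self.feature_names_)
            for dct in feature_dicts
        ]


# Made-up feature dicts for a single two-token sequence.
X = [[{'token': 'Scrapinghub', 'isupper': False},
      {'token': 'Ltd', 'isupper': False}]]

encoder = ToyFeatureEncoder()
try:
    encoder.transform_single(X[0])
except TypeError as exc:
    unfitted_error = str(exc)
    print(unfitted_error)

# After fitting, the same call succeeds.
fitted_lines = encoder.fit(X).transform_single(X[0])
print(fitted_lines)
```

So the message is vague mostly because nothing checks for the unfitted state before iterating.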
The tutorial is outdated and incomplete, though; the recommended way is to use crfsuite, not wapiti, and the tutorial should have shown how to use Pattern features, as they are important for getting good quality.
Hey @kmike, thanks for letting me know. I did manage to get it working with crfsuite and it's doing pretty well for my use case! I'd update the docs, but I feel that my knowledge of this subject is a bit limited for the time being.
Could you elaborate more on Pattern features? Or point me to some material?
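As far as I understand it, a pattern feature combines the features of tokens at fixed relative offsets (for example the previous and the current token) into one new feature, so the CRF can see some local context. The sketch below only illustrates that idea in plain Python; `apply_pattern`, its naming scheme, and the sample tokens are assumptions of mine, not webstruct's API:

```python
def apply_pattern(feature_dicts, pattern):
    """Return copies of the dicts with one extra combined feature.

    pattern is a sequence of (offset, feature_name) pairs, e.g.
    ((-1, 'lower'), (0, 'lower')) for "previous token + current token".
    """
    name = '/'.join('%s[%+d]' % (feat, offset) for offset, feat in pattern)
    result = []
    for i, dct in enumerate(feature_dicts):
        new_dct = dict(dct)
        values = []
        for offset, feat in pattern:
            j = i + offset
            # Skip positions where the pattern falls outside the sequence.
            if not 0 <= j < len(feature_dicts) or feat not in feature_dicts[j]:
                break
            values.append(str(feature_dicts[j][feat]))
        else:
            new_dct[name] = '/'.join(values)
        result.append(new_dct)
    return result


# Made-up per-token features for a three-token sequence.
tokens = [{'lower': 'new'}, {'lower': 'york'}, {'lower': 'city'}]
combined = apply_pattern(tokens, ((-1, 'lower'), (0, 'lower')))
for dct in combined:
    print(dct)
```

The first token gets no combined feature because there is no previous token to pair it with.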
I've been following the webstruct tutorial and I'm getting a few peculiar errors.
From the tutorial I end up with code along the lines of this:
I'm running:
- webstruct 0.5
- scikit-learn 0.18.2
- scipy 0.19
- libwapiti 0.2.1