Bug: TypeError when trying to use TextCategorizer #1728

Closed
shgidi opened this issue Dec 15, 2017 · 4 comments
Labels: docs (Documentation and website), usage (General spaCy usage)


shgidi commented Dec 15, 2017

When running the following code (from the code example at https://spacy.io/api/textcategorizer):

import spacy
from spacy.pipeline import TextCategorizer
nlp = spacy.load('en')

textcat = TextCategorizer(nlp.vocab)
doc = nlp(u"This is a sentence.")
processed = textcat(doc)

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-42-f23a44f17862> in <module>()
      5 textcat = TextCategorizer(nlp.vocab)
      6 doc = nlp(u"This is a sentence.")
----> 7 processed = textcat(doc)

pipeline.pyx in spacy.pipeline.TextCategorizer.__call__()

pipeline.pyx in spacy.pipeline.TextCategorizer.predict()

TypeError: 'bool' object is not callable

Info about spaCy

  • spaCy version: 2.0.1
  • Platform: Linux-4.10.0-42-generic-x86_64-with-debian-stretch-sid
  • Python version: 3.6.3
  • Models: en
ines added the docs (Documentation and website) and usage (General spaCy usage) labels Dec 16, 2017
Member

ines commented Dec 16, 2017

Thanks – I think this is actually an error in the docs. Will fix this! The cause of the error is the same as described in #1702:

The model for the parser is not initialized until you either load the weights (with the .from_disk() or .from_bytes() methods) or initialize it with .begin_training(). You can also create a model with the parser.Model() class method.
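
To illustrate why the message mentions a 'bool', here is a minimal, self-contained sketch of the lazy-initialization pattern described above. It assumes the component holds a boolean placeholder in place of its model until weights are loaded or training begins, which is what the error message suggests. LazyComponent, its dummy scorer and the EXAMPLE_LABEL key are hypothetical stand-ins for illustration, not spaCy's actual code.

```python
class LazyComponent:
    """Hypothetical stand-in for a pipeline component; not spaCy's implementation."""

    def __init__(self, model=True):
        # Assumption: True is a placeholder meaning "no model created yet".
        self.model = model

    def begin_training(self):
        # Create a model only if the placeholder is still in place.
        if self.model is True:
            # Dummy scorer returning a fixed score for a made-up label.
            self.model = lambda text: {"EXAMPLE_LABEL": 0.5}
        return self

    def __call__(self, text):
        # If self.model is still the boolean placeholder, this call fails.
        return self.model(text)


component = LazyComponent()

# Calling the component before it has a model raises a TypeError.
try:
    component("This is a sentence.")
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)

# After initialization, the same call succeeds.
component.begin_training()
scores = component("This is a sentence.")
print(scores)
```

The first call fails because the placeholder itself is being called as if it were a model. Once a real callable replaces it, the same call goes through.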

Since the TextCategorizer has no model loaded in, calling it directly will result in an error. So in a real-world use case, you would either load in the weights, or add it to the pipeline:

textcat.from_disk('/path/to/model')
nlp.add_pipe(textcat)
doc = nlp(u"This is a sentence.")
print(doc.cats)

The example in the API docs should probably show an example with weights loaded in, since it's supposed to show the more abstract and standalone use of the class.


safwank commented Jan 2, 2018

@ines Is there a pre-trained model that I can download and use, just like the language models?

I know I can train one myself as per https://github.com/explosion/spacy/blob/master/examples/training/train_textcat.py, but I'm wondering if there's one that I can readily use.

Member

ines commented Jan 3, 2018

@safwank Not yet! Text classification is pretty specific, though, which makes it much harder to provide general-purpose models like the language models. However, it might be nice to offer an example model people can try out.

If you're interested in an end-to-end workflow of training a text classifier, check out this video tutorial we've recorded for our annotation tool Prodigy:
https://prodi.gy/docs/video-insults-classifier

The workflow focuses on collecting the annotations to train the classifier – but under the hood, it updates spaCy's TextCategorizer with the collected annotations, and saves out a spaCy model with the new category available via doc.cats.

ines closed this as completed in 4963535 Jan 3, 2018

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 8, 2018