Changes:
- When translation fails, raise a ``TranslationError`` (:issue:`76`). Thanks :user:`jschnurr`.
Bug fixes:
- ``Translator.translate`` will detect language of input text by default (:issue:`85`). Thanks again :user:`jschnurr`.
- Fix matching of tagged phrases with CFG in ``ConllExtractor``. Thanks :user:`lragnarsson`.
- Fix inflection of a few irregular English nouns. Thanks :user:`jonmcoe`.
Bug fixes:
- Fix ``DecisionTreeClassifier.pprint`` for compatibility with nltk>=3.0.2.
- Translation no longer adds erroneous whitespace around punctuation characters (:issue:`83`). Thanks :user:`AdrianLC` for reporting and thanks :user:`jschnurr` for the patch.
- TextBlob now depends on NLTK 3. The vendorized version of NLTK has been removed.
- Fix bug that raised a SyntaxError when translating text with non-ascii characters on Python 3.
- Fix bug that showed "double-escaped" unicode characters in translator output (issue #56). Thanks Evan Dempsey.
- Backwards-incompatible: Completely remove ``import text.blob``. You should ``import textblob`` instead.
- Backwards-incompatible: Completely remove ``PerceptronTagger``. Install ``textblob-aptagger`` instead.
- Backwards-incompatible: Rename ``TextBlobException`` to ``TextBlobError`` and ``MissingCorpusException`` to ``MissingCorpusError``.
- Backwards-incompatible: ``Format`` classes are passed a file object rather than a file path.
- Backwards-incompatible: If training a classifier with data from a file, you must pass a file object (rather than a file path).
- Updated English sentiment corpus.
- Add ``feature_extractor`` parameter to ``NaiveBayesAnalyzer``.
- Add ``textblob.formats.get_registry()`` and ``textblob.formats.register()``, which allow users to register custom data source formats.
- Change ``BaseClassifier.detect`` from a ``staticmethod`` to a ``classmethod``.
- Improved docs.
- Tested on Python 3.4.
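
The file-object requirement noted above can be illustrated with a minimal, standard-library-only sketch. The helper ``read_labeled_rows`` is hypothetical and is not part of TextBlob; it only shows why accepting a file object (rather than a path) lets callers pass any file-like source, including an in-memory buffer.

```python
import csv
import io


def read_labeled_rows(fp):
    """Hypothetical helper: read (text, label) pairs from an open CSV file object."""
    return [(row[0], row[1]) for row in csv.reader(fp) if row]


# Any file-like object works, for example an in-memory buffer.
buffer = io.StringIO("I love this library,pos\nThis is terrible,neg\n")
rows = read_labeled_rows(buffer)

# With a file on disk, the caller opens it and passes the handle:
#     with open("train.csv") as fp:
#         rows = read_labeled_rows(fp)
```

Leaving the ``open()`` call to the caller also leaves encoding and cleanup decisions to the caller.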
- Fix display (``__repr__``) of WordList slices on Python 3.
- Add download_corpora module. Corpora must now be downloaded using ``python -m textblob.download_corpora``.
- Sentiment analyzers return namedtuples, e.g. ``Sentiment(polarity=0.12, subjectivity=0.34)``.
- Memory usage improvements to NaiveBayesAnalyzer and basic_extractor (default feature extractor for classifiers module).
- Add ``textblob.tokenizers.sent_tokenize`` and ``textblob.tokenizers.word_tokenize`` convenience functions.
- Add ``textblob.classifiers.MaxEntClassifier``.
- Improved NLTKTagger.
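
To illustrate what a namedtuple return value means for callers, here is a minimal sketch using only the standard library. The ``Sentiment`` type below is a stand-in built with ``collections.namedtuple`` (an assumption for illustration, not imported from TextBlob), and the values are the ones quoted in the entry above.

```python
from collections import namedtuple

# Stand-in for the result type; an assumption for illustration only.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

result = Sentiment(polarity=0.12, subjectivity=0.34)

# Fields can be read by name...
polarity = result.polarity

# ...or unpacked positionally, like a plain tuple.
pol, subj = result
```

Because a namedtuple is still a tuple, code that unpacked or indexed a plain tuple result should keep working.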
- Fix bug in spelling correction that stripped some punctuation (Issue #48).
- Various improvements to spelling correction: preserves whitespace characters (Issue #12); handle contractions and punctuation between words. Thanks @davidnk.
- Make ``TextBlob.words`` more memory-efficient.
- Translator now sends POST instead of GET requests. This allows for larger bodies of text to be translated (Issue #49).
- Update pattern tagger for better accuracy.
- Fix bug that caused ``ValueError`` upon sentence tokenization. This removes modifications made to the NLTK sentence tokenizer.
- Add ``Word.lemmatize()`` method that allows passing in a part-of-speech argument.
- ``Word.lemma`` returns correct part of speech for Word objects that have their ``pos`` attribute set. Thanks @RomanYankovsky.
- Backwards-incompatible: Renamed package to ``textblob``. This avoids clashes with other namespaces called text. TextBlob should now be imported with ``from textblob import TextBlob``.
- Update pattern resources for improved parser accuracy.
- Update NLTK.
- Allow Translator to connect to proxy server.
- PerceptronTagger completely deprecated. Install the ``textblob-aptagger`` extension instead.
- Bugfix updates.
- Fix bug in feature extraction for ``NaiveBayesClassifier``.
- ``basic_extractor`` is now case-sensitive, e.g. ``contains(I) != contains(i)``.
- Fix ``repr`` output when a TextBlob contains non-ascii characters.
- Fix part-of-speech tagging with ``PatternTagger`` on Windows.
- Suppress warning about not having scikit-learn installed.
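
The case-sensitivity note for ``basic_extractor`` above can be sketched as follows. The function ``contains_features`` is a hypothetical, simplified stand-in written for this example (it is not TextBlob's implementation); it builds ``contains(word)`` features without lowercasing, so ``I`` and ``i`` are treated as different words.

```python
def contains_features(document, vocabulary):
    """Hypothetical, simplified extractor: one boolean feature per vocabulary word."""
    tokens = set(document.split())  # no lowercasing, so matching is case-sensitive
    return {"contains({0})".format(word): (word in tokens) for word in vocabulary}


features = contains_features("I think so", ["I", "i"])
```

Here ``features["contains(I)"]`` is true while ``features["contains(i)"]`` is false, which is the distinction the entry describes.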
- Wordnet integration. ``Word`` objects have ``synsets`` and ``definitions`` properties. The ``text.wordnet`` module allows you to create ``Synset`` and ``Lemma`` objects directly.
- Move all English-specific code to its own module, ``text.en``.
- Basic extensions framework in place. TextBlob has been refactored to make it easier to develop extensions.
- Add ``text.classifiers.PositiveNaiveBayesClassifier``.
- Update NLTK.
- ``NLTKTagger`` now working on Python 3.
- Fix ``__str__`` behavior. ``print(blob)`` should now print non-ascii text correctly in both Python 2 and 3.
- Backwards-incompatible: All abstract base classes have been moved to the ``text.base`` module.
- Backwards-incompatible: ``PerceptronTagger`` will now be maintained as an extension, ``textblob-aptagger``. Instantiating a ``text.taggers.PerceptronTagger()`` will raise a ``DeprecationWarning``.
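
As a rough illustration of the deprecation behaviour described above, the sketch below shows one common way a class can emit a ``DeprecationWarning`` on instantiation. ``LegacyTagger`` and its message are hypothetical names made up for this example; the sketch does not reproduce TextBlob's own code.

```python
import warnings


class LegacyTagger:
    """Hypothetical class that warns whenever it is instantiated."""

    def __init__(self):
        warnings.warn(
            "LegacyTagger is deprecated; install the separate extension instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )


# DeprecationWarning is often hidden by default, so record it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LegacyTagger()

categories = [w.category for w in caught]
```

Running Python with ``-W error::DeprecationWarning`` turns such warnings into exceptions, which can help locate remaining uses before an upgrade.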
- Word tokenization fix: Words that stem from a contraction will still have an apostrophe, e.g. ``"Let's" => ["Let", "'s"]``.
- Fix bug with comparing blobs to strings.
- Add ``text.taggers.PerceptronTagger``, a fast and accurate POS tagger. Thanks @syllog1sm.
- Note for Python 3 users: You may need to update your corpora, since NLTK master has reorganized its corpus system. Just run ``curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python`` again.
- Add ``download_corpora_lite.py`` script for getting the minimum corpora requirements for TextBlob's basic features.
- Fix bug that resulted in a ``UnicodeEncodeError`` when tagging text with non-ascii characters.
- Add ``DecisionTreeClassifier``.
- Add ``labels()`` and ``train()`` methods to classifiers.
- Classifiers can be trained and tested on CSV, JSON, or TSV data.
- Add basic WordNet lemmatization via the ``Word.lemma`` property.
- ``WordList.pluralize()`` and ``WordList.singularize()`` methods return ``WordList`` objects.
- Add Naive Bayes classification. New ``text.classifiers`` module, ``TextBlob.classify()``, and ``Sentence.classify()`` methods.
- Add parsing functionality via the ``TextBlob.parse()`` method. The ``text.parsers`` module currently has one implementation (``PatternParser``).
- Add spelling correction. This includes the ``TextBlob.correct()`` and ``Word.spellcheck()`` methods.
- Update NLTK.
- Backwards incompatible: ``clean_html`` has been deprecated, just as it has in NLTK. Use Beautiful Soup's ``soup.get_text()`` method for HTML-cleaning instead.
- Slight API change to language translation: if ``from_lang`` isn't specified, attempts to detect the language.
- Add ``itokenize()`` method to tokenizers that returns a generator instead of a list of tokens.
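
The difference between a list-returning and a generator-returning tokenizer method, as in the ``itokenize()`` entry above, can be sketched like this. ``WhitespaceTokenizer`` is a hypothetical class written for illustration; it splits on whitespace only and is not one of TextBlob's tokenizers.

```python
import re


class WhitespaceTokenizer:
    """Hypothetical tokenizer with both a lazy and an eager interface."""

    def itokenize(self, text):
        # Yield tokens one at a time instead of building the whole list.
        for match in re.finditer(r"\S+", text):
            yield match.group()

    def tokenize(self, text):
        # The eager variant simply materialises the generator.
        return list(self.itokenize(text))


tokenizer = WhitespaceTokenizer()
lazy = tokenizer.itokenize("Simple is better than complex")
first = next(lazy)   # only the first token has been produced so far
rest = list(lazy)    # consume the remaining tokens
```

The generator form avoids holding every token in memory at once, which matters mostly for long inputs.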
- Unicode fixes: This fixes a bug that sometimes raised a ``UnicodeEncodeError`` upon accessing ``sentences`` for TextBlobs with non-ascii characters.
- Update NLTK.
- Important patch update for NLTK users: Fix bug with importing TextBlob if local NLTK is installed.
- Fix bug with computing start and end indices of sentences.
- Fix bug that disallowed display of non-ascii characters in the Python REPL.
- Backwards incompatible: Restore ``blob.json`` property for backwards compatibility with textblob<=0.3.10. Add a ``to_json()`` method that takes the same arguments as ``json.dumps``.
- Add ``WordList.append`` and ``WordList.extend`` methods that append Word objects.
- Language translation and detection API!
- Add ``text.sentiments`` module. Contains the ``PatternAnalyzer`` (default implementation) as well as a ``NaiveBayesAnalyzer``.
- Part-of-speech tags can be accessed via ``TextBlob.tags`` or ``TextBlob.pos_tags``.
- Add ``polarity`` and ``subjectivity`` helper properties.
- New ``text.tokenizers`` module with ``WordTokenizer`` and ``SentenceTokenizer``. Tokenizer instances (from either textblob itself or NLTK) can be passed to TextBlob's constructor. Tokens are accessed through the new ``tokens`` property.
- New ``Blobber`` class for creating TextBlobs that share the same tagger, tokenizer, and np_extractor.
- Add ``ngrams`` method.
- Backwards-incompatible: ``TextBlob.json()`` is now a method, not a property. This allows you to pass arguments (the same that you would pass to ``json.dumps()``).
- New home for documentation: https://textblob.readthedocs.org/
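
For readers unfamiliar with n-grams, the idea behind the ``ngrams`` method mentioned above can be sketched with a small standalone function. This ``ngrams`` helper, including its default of ``n=3``, is an assumption written for this example; it operates on a plain list of words and does not reproduce TextBlob's method.

```python
def ngrams(words, n=3):
    """Return every run of n consecutive words as a list (illustrative helper)."""
    if n <= 0:
        return []
    # A sequence of length L has L - n + 1 windows of size n (none if L < n).
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = "Now is better than never".split()
trigrams = ngrams(words, n=3)
```

For the five-word input this gives three trigrams, starting with ``['Now', 'is', 'better']``.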
- Add parameter for cleaning HTML markup from text.
- Minor improvement to word tokenization.
- Updated NLTK.
- Fix bug with adding blobs to bytestrings.
- Bundled NLTK no longer overrides local installation.
- Fix sentiment analysis of text with non-ascii characters.
- Updated NLTK.
- ConllExtractor is now Python 3-compatible.
- Improved sentiment analysis.
- Blobs are equal (with ==) to their string counterparts.
- Added instructions to install textblob without nltk bundled.
- Dropping official 3.1 and 3.2 support.
- Importing TextBlob is now much faster. This is because the noun phrase parsers are trained only on the first call to ``noun_phrases`` (instead of training them every time you import TextBlob).
- Add text.taggers module which allows user to change which POS tagger implementation to use. Currently supports PatternTagger and NLTKTagger (NLTKTagger only works with Python 2).
- NPExtractor and Tagger objects can be passed to TextBlob's constructor.
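
The import-speed improvement described above relies on deferring expensive training until it is first needed. A minimal sketch of that lazy-initialisation pattern follows; ``LazyExtractor`` and its toy capitalised-word "extraction" are hypothetical and only stand in for a real, costly training step.

```python
class LazyExtractor:
    """Hypothetical extractor that trains on first use rather than at creation."""

    def __init__(self):
        self._trained = False
        self.train_calls = 0  # counter kept only to make the behaviour visible

    def train(self):
        # Placeholder for an expensive training step.
        self.train_calls += 1
        self._trained = True

    def extract(self, text):
        if not self._trained:
            self.train()  # pay the cost once, on the first call
        # Toy logic: treat capitalised words as the "phrases".
        return [word for word in text.split() if word[:1].isupper()]


extractor = LazyExtractor()        # cheap: nothing is trained yet
calls_before = extractor.train_calls
first = extractor.extract("Python is a language")
second = extractor.extract("Guido created Python")
calls_after = extractor.train_calls
```

Creating the object costs nothing up front, and the training step runs exactly once no matter how many times ``extract`` is called.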
- Fix bug with POS-tagger not tagging one-letter words.
- Rename text/np_extractor.py -> text/np_extractors.py
- Add run_tests.py script.
- Every word in a ``Blob`` or ``Sentence`` is a ``Word`` instance which has methods for inflection, e.g. ``word.pluralize()`` and ``word.singularize()``.
- Updated the ``np_extractor`` module. Now has a new implementation, ``ConllExtractor``, that uses the Conll2000 chunking corpus. Only works on Py2.