Releases: bab2min/tomotopy
0.13.0
- New features
  - Major features of the Topic Model Viewer tomotopy.viewer.open_viewer() are ready now.
  - tomotopy.LDAModel.get_hash() is added. You can get a 128-bit hash value of the model.
  - Added an argument ngram_list to tomotopy.utils.SimpleTokenizer.
- Bug fixes
  - Fixed the inconsistent spans bug that appeared after Corpus.concat_ngrams is called.
  - Optimized the bottleneck of tomotopy.LDAModel.load() and tomotopy.LDAModel.save() and improved their speed by more than 10 times.
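
To see how save() and load() perform on your own model, a round trip can be timed roughly as in the sketch below. It assumes save(path) and the class-level load(path) keep the signatures of earlier releases. The helper timed, the toy corpus and the file name are made up for illustration. The notes do not describe the return type of get_hash(), so the value is only printed.

```python
import os
import tempfile
import time

try:
    import tomotopy as tp
except ImportError:  # the timing helper below works without the library
    tp = None


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if tp is not None:
    # Toy corpus, only large enough to produce a model file.
    mdl = tp.LDAModel(k=2, seed=0)
    for words in (["apple", "banana", "apple"],
                  ["cat", "dog", "cat"],
                  ["apple", "dog", "bird"]):
        mdl.add_doc(words)
    mdl.train(10)

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "lda.bin")
        _, save_s = timed(mdl.save, path)
        loaded, load_s = timed(tp.LDAModel.load, path)
    print(f"save: {save_s:.4f}s, load: {load_s:.4f}s, k={loaded.k}")

    # get_hash() is only available from 0.13.0 on, hence the guard.
    if hasattr(mdl, "get_hash"):
        print("model hash:", mdl.get_hash())
```

Timings on a toy model are dominated by overhead, so a realistic comparison between versions needs a model of production size.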
0.12.7
0.12.6
- New features
  - Added some convenience features to tomotopy.LDAModel.train and tomotopy.LDAModel.set_word_prior. LDAModel.train now has new arguments callback, callback_interval and show_progress to monitor the training progress. LDAModel.set_word_prior can now accept a Dict[int, float] type as its argument prior.
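
The new training arguments could be used roughly as follows. This is a minimal sketch that assumes the callback receives the model, the current iteration and the total number of iterations, in that order. The helper make_progress_logger and the toy corpus are illustrative names, not part of tomotopy.

```python
try:
    import tomotopy as tp
except ImportError:  # the callback factory below works without the library
    tp = None


def make_progress_logger(history):
    """Return a callback that appends (iteration, total, ll_per_word)."""
    def on_progress(model, current_iter, total_iter):
        # Assumed callback signature: (model, current iteration, total).
        history.append((current_iter, total_iter,
                        getattr(model, "ll_per_word", None)))
    return on_progress


if tp is not None:
    mdl = tp.LDAModel(k=2, seed=42)
    for words in (["apple", "banana", "apple"],
                  ["cat", "dog", "cat"],
                  ["apple", "dog", "bird"]):
        mdl.add_doc(words)

    history = []
    # Ask for a callback every 20 iterations; the progress bar stays off.
    mdl.train(100, callback=make_progress_logger(history),
              callback_interval=20, show_progress=False)
    for current_iter, total_iter, ll in history:
        print(current_iter, total_iter, ll)
```

Collecting the values in a list instead of printing inside the callback keeps the training loop quiet and makes the log-likelihood curve easy to plot afterwards.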
0.12.5
0.12.4
0.12.3
New features
- Now, inserting an empty document using tomotopy.LDAModel.add_doc() just ignores it instead of raising an exception. If the newly added argument ignore_empty_words is set to False, an exception is raised as before. (#161)
- tomotopy.HDPModel.purge_dead_topics() method is added to remove non-live topics from the model. (#152)
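
The changed add_doc() behaviour can be illustrated with a small sketch. It assumes that add_doc() returns None for an ignored empty document and an index otherwise. The notes do not name the exception raised in strict mode, so a generic Exception is caught. The helper add_docs_counting_skips is hypothetical.

```python
try:
    import tomotopy as tp
except ImportError:  # the helper below works with any add_doc()-like object
    tp = None


def add_docs_counting_skips(model, docs):
    """Add each document; return (indices of added docs, number skipped)."""
    added, skipped = [], 0
    for words in docs:
        idx = model.add_doc(words)  # empty documents are ignored by default
        if idx is None:             # assumption: None marks an ignored doc
            skipped += 1
        else:
            added.append(idx)
    return added, skipped


if tp is not None:
    mdl = tp.LDAModel(k=2, seed=0)
    added, skipped = add_docs_counting_skips(
        mdl, [["apple", "banana"], [], ["cat", "dog"]])
    print("added:", added, "skipped:", skipped)

    # Restore the old strict behaviour for a single call.
    try:
        mdl.add_doc([], ignore_empty_words=False)
    except Exception as exc:
        print("strict mode raised:", type(exc).__name__)
```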
Bug fixes
- Fixed an issue that prevents setting user defined values for nuSq in tomotopy.SLDAModel (by @jucendrero). (#174)
- Fixed an issue where tomotopy.utils.Coherence did not work for tomotopy.DTModel. (#164)
- Fixed an issue that often crashed when calling make_doc() before calling train(). (#166)
- Resolved the problem that the results of tomotopy.DMRModel and tomotopy.GDMRModel are different even when the seed is fixed. (#63)
- The parameter optimization process of tomotopy.DMRModel and tomotopy.GDMRModel has been improved.
- Fixed an issue that sometimes crashed when calling tomotopy.PTModel.copy().
0.12.2
- An issue where calling convert_to_lda of tomotopy.HDPModel with min_cf > 0, min_df > 0 or rm_top > 0 causes a crash has been fixed.
- A new argument from_pseudo_doc is added to tomotopy.Document.get_topics and tomotopy.Document.get_topic_dist. This argument is only valid for documents of PTModel; it lets you control the source used for computing the topic distribution.
- The default value for argument p of tomotopy.PTModel has been changed. The new default value is k * 10.
- Using documents generated by make_doc without calling infer doesn't cause a crash anymore, but just prints warning messages.
- An issue where the internal C++ code failed to compile in a clang C++17 environment has been fixed.
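
Since a document from make_doc has no topic assignment until infer is called, the intended order of calls can be sketched as below. The sketch assumes infer returns a pair (topic distribution, log-likelihood) for a single document. The wrapper infer_topic_dist and the toy corpus are illustrative only.

```python
try:
    import tomotopy as tp
except ImportError:  # the wrapper below works with any compatible object
    tp = None


def infer_topic_dist(model, words, iterations=100):
    """Build an unseen document and infer its topic distribution."""
    doc = model.make_doc(words)                         # not yet inferred
    topic_dist, ll = model.infer(doc, iter=iterations)  # assign topics
    return topic_dist, ll


if tp is not None:
    mdl = tp.LDAModel(k=2, seed=0)
    for words in (["apple", "banana", "apple"],
                  ["cat", "dog", "cat"],
                  ["apple", "dog", "bird"]):
        mdl.add_doc(words)
    mdl.train(50)  # train first, then build and infer unseen documents

    topic_dist, ll = infer_topic_dist(mdl, ["apple", "dog"])
    print("topic distribution:", topic_dist, "log-likelihood:", ll)
```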
0.12.1
- An issue where tomotopy.LDAModel.set_word_prior() causes a crash has been fixed.
- Now tomotopy.LDAModel.perplexity and tomotopy.LDAModel.ll_per_word return the accurate value when TermWeight is not ONE.
- tomotopy.LDAModel.used_vocab_weighted_freq was added, which returns term-weighted frequencies of words.
- Now tomotopy.LDAModel.summary() shows not only the entropy of words, but also the entropy of term-weighted words.
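
To make the two entropy figures concrete, the sketch below computes a Shannon entropy from raw and from term-weighted frequencies. It also relates perplexity to ll_per_word under the common assumption that perplexity = exp(-ll_per_word). Both formulas, including the natural logarithm, are assumptions about what the library reports rather than something these notes state. The helper names are hypothetical.

```python
import math

try:
    import tomotopy as tp
except ImportError:  # the pure-Python helpers below work without the library
    tp = None


def entropy(freqs):
    """Shannon entropy (natural log) of a list of non-negative weights."""
    total = float(sum(freqs))
    if total <= 0:
        return 0.0
    return -sum((f / total) * math.log(f / total) for f in freqs if f > 0)


def perplexity_from_ll(ll_per_word):
    """Assumed relation between per-word log-likelihood and perplexity."""
    return math.exp(-ll_per_word)


if tp is not None:
    # IDF weighting makes the weighted frequencies differ from raw counts.
    mdl = tp.LDAModel(tw=tp.TermWeight.IDF, k=2, seed=0)
    for words in (["apple", "banana", "apple"],
                  ["cat", "dog", "cat"],
                  ["apple", "dog", "bird"]):
        mdl.add_doc(words)
    mdl.train(50)

    print("entropy of words:", entropy(mdl.used_vocab_freq))
    print("entropy of weighted words:",
          entropy(mdl.used_vocab_weighted_freq))
    print("perplexity:", mdl.perplexity,
          "exp(-ll_per_word):", perplexity_from_ll(mdl.ll_per_word))
```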
0.12.0
- Now tomotopy.DMRModel and tomotopy.GDMRModel support multiple values of metadata (see https://github.com/bab2min/tomotopy/blob/main/examples/dmr_multi_label.py )
- The performance of tomotopy.GDMRModel was improved.
- A copy() method has been added for all topic models to do a deep copy.
- An issue was fixed where words that are excluded from training (by min_cf, min_df) have an incorrect topic id. Now all excluded words have -1 as topic id.
- Now all exceptions and warnings generated by tomotopy follow standard Python types.
- Compiler requirements have been raised to C++14.
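
Because excluded words now carry the topic id -1, they can be filtered out when reading per-token assignments. The sketch below also shows copy() being used to continue training on a clone. It assumes Document.words and Document.topics are aligned sequences of word ids and topic ids. The helper assigned_tokens is an illustrative name.

```python
try:
    import tomotopy as tp
except ImportError:  # the filter below works on plain lists as well
    tp = None


def assigned_tokens(word_ids, topic_ids):
    """Keep only (word id, topic id) pairs that took part in training."""
    return [(w, z) for w, z in zip(word_ids, topic_ids) if z != -1]


if tp is not None:
    # min_cf=2 excludes words that occur only once in the corpus.
    mdl = tp.LDAModel(k=2, min_cf=2, seed=0)
    for words in (["apple", "banana", "apple"],
                  ["cat", "dog", "cat"],
                  ["apple", "dog", "bird"]):
        mdl.add_doc(words)
    mdl.train(10)

    doc = mdl.docs[0]
    print("all topic ids:", list(doc.topics))
    print("trained tokens:", assigned_tokens(doc.words, doc.topics))

    # A deep copy can be trained further without touching the original.
    clone = mdl.copy()
    clone.train(10)
    print("original:", mdl.ll_per_word, "clone:", clone.ll_per_word)
```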