Releases: bab2min/tomotopy

0.11.0

26 Mar 09:22
  • A new topic model tomotopy.PTModel for short texts was added into the package.
  • An issue was fixed where tomotopy.HDPModel.infer sometimes caused a segmentation fault.
  • A mismatch of numpy API version was fixed.
  • Now asymmetric document-topic priors are supported.
  • Serializing topic models to bytes in memory is supported.
  • An argument normalize was added to get_topic_dist(), get_topic_word_dist() and get_sub_topic_dist() for controlling normalization of results.
  • Now tomotopy.DMRModel.lambdas and tomotopy.DMRModel.alpha give correct values.
  • Support for categorical metadata was added to tomotopy.GDMRModel (see https://github.com/bab2min/tomotopy/blob/main/examples/gdmr_both_categorical_and_numerical.py ).
  • Python3.5 support was dropped.
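
The effect of the normalize argument mentioned above can be pictured with a small standalone sketch. It does not call tomotopy; the function name topic_dist and the sample weights are hypothetical, and the sketch only assumes that normalization means scaling the raw values so that they sum to 1.

```python
def topic_dist(raw_weights, normalize=True):
    """Return the raw weights as floats, or scale them so that they sum to 1."""
    weights = [float(w) for w in raw_weights]
    if not normalize:
        return weights
    total = sum(weights)
    if total == 0:
        # Nothing to scale; an all-zero input stays all-zero.
        return weights
    return [w / total for w in weights]


# Hypothetical raw per-topic weights for one document (not real model output).
raw = [6, 3, 1]
unnormalized = topic_dist(raw, normalize=False)
normalized = topic_dist(raw, normalize=True)
print(unnormalized)
print(normalized)
```

With normalize=False the caller sees the raw magnitudes, which is useful when results from several documents or topics are to be aggregated before converting them to probabilities.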

0.10.2

16 Feb 06:59
  • An issue was fixed where tomotopy.CTModel.train fails with large K.
  • An issue was fixed where tomotopy.utils.Corpus lost the uid values of its documents.

0.10.1

14 Feb 13:56
f2c1310
  • An issue was fixed where tomotopy.utils.Corpus.extract_ngrams crashed with empty input.
  • An issue was fixed where tomotopy.LDAModel.infer raised an exception with valid input.
  • An issue was fixed where tomotopy.HLDAModel.infer generated a wrong tomotopy.Document.path.
  • A new parameter freeze_topics was added to tomotopy.HLDAModel.train, so you can control whether new topics are created during training.

0.10.0

19 Dec 15:54
  • The interfaces of tomotopy.utils.Corpus and tomotopy.LDAModel.docs were unified. Now you can access the documents in a corpus in the same manner.
  • __getitem__ of tomotopy.utils.Corpus was improved. Besides indexing by int, indexing by Iterable[int] and slicing are supported. Indexing by uid is also supported.
  • New methods tomotopy.utils.Corpus.extract_ngrams and tomotopy.utils.Corpus.concat_ngrams were added. They extract n-gram collocations using PMI and concatenate them into single words.
  • A new method tomotopy.LDAModel.add_corpus was added, and tomotopy.LDAModel.infer can receive corpus as input.
  • A new module tomotopy.coherence was added. It provides a way to calculate the coherence of a model.
  • A parameter window_size was added to tomotopy.label.FoRelevance.
  • An issue was fixed where NaN often occurs when training tomotopy.HDPModel.
  • Now Python3.9 is supported.
  • The dependency on py-cpuinfo was removed and the initialization of the module was improved.
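
The PMI-based collocation extraction and concatenation described above can be illustrated with a minimal standalone sketch. It is limited to bigrams and does not use tomotopy; the helper names extract_bigrams and concat_bigrams, the min_cf threshold, and the toy documents are assumptions made for illustration, not the library's implementation.

```python
import math
from collections import Counter


def extract_bigrams(docs, min_cf=2):
    """Score adjacent word pairs by PMI = log(p(a, b) / (p(a) * p(b)))."""
    unigrams = Counter(w for doc in docs for w in doc)
    bigrams = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scored = []
    for (a, b), cf in bigrams.items():
        if cf < min_cf:
            continue  # skip pairs that are too rare to be reliable
        p_ab = cf / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scored.append(((a, b), math.log(p_ab / (p_a * p_b))))
    # Highest-scoring candidates first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def concat_bigrams(doc, pairs, delimiter="_"):
    """Merge each selected adjacent pair into a single token."""
    out, i = [], 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) in pairs:
            out.append(doc[i] + delimiter + doc[i + 1])
            i += 2
        else:
            out.append(doc[i])
            i += 1
    return out


# Toy corpus, already tokenized.
docs = [
    ["topic", "model", "for", "short", "text"],
    ["a", "topic", "model", "with", "priors"],
    ["short", "text", "is", "hard"],
]
candidates = extract_bigrams(docs, min_cf=2)
selected = {pair for pair, _ in candidates}
merged = [concat_bigrams(doc, selected) for doc in docs]
print(merged[0])
```

Merging collocations before training lets a model treat a phrase such as "topic model" as one vocabulary item instead of two unrelated words.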

0.9.1

08 Aug 09:32
0ef8c6f
  • Memory leaks in version 0.9.0 were fixed.
  • tomotopy.CTModel.summary() was fixed.

0.9.0

04 Aug 15:25
  • The tomotopy.LDAModel.summary() method, which prints a human-readable summary of the model, has been added.
  • The random number generator of the package has been replaced with EigenRand. It speeds up random number generation and resolves the differences in results between platforms.
  • Due to the above change, even if the seed is the same, training results may differ from those of versions before 0.9.0.
  • Fixed a training error in tomotopy.HDPModel.
  • tomotopy.DMRModel.alpha now shows Dirichlet prior of per-document topic distribution by metadata.
  • tomotopy.DTModel.get_count_by_topics() has been modified to return a 2-dimensional ndarray.
  • tomotopy.DTModel.alpha has been modified to return the same value as tomotopy.DTModel.get_alpha().
  • Fixed an issue where the metadata value could not be obtained for the document of tomotopy.GDMRModel.
  • tomotopy.HLDAModel.alpha now shows Dirichlet prior of per-document depth distribution.
  • tomotopy.LDAModel.global_step has been added.
  • tomotopy.MGLDAModel.get_count_by_topics() now returns the word count for both global and local topics.
  • tomotopy.PAModel.alpha, tomotopy.PAModel.subalpha, and tomotopy.PAModel.get_count_by_super_topic() have been added.
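
Why a replaced random number generator changes results even for an identical seed can be seen in a small standalone sketch. It uses Python's random.Random and a toy linear congruential generator as stand-ins for "the old generator" and "the new generator"; neither is related to EigenRand or to tomotopy's actual sampler, and the names Lcg and sample_topics are hypothetical.

```python
import random


class Lcg:
    """A toy linear congruential generator, used only as 'a different RNG'."""

    def __init__(self, seed):
        self.state = seed

    def random(self):
        # Classic LCG update; returns a float in [0, 1).
        self.state = (1103515245 * self.state + 12345) % 2**31
        return self.state / 2**31


def sample_topics(rng, n, k):
    """Draw n topic ids in range(k) from the given generator."""
    return [int(rng.random() * k) for _ in range(n)]


same_a = sample_topics(random.Random(42), 20, 5)
same_b = sample_topics(random.Random(42), 20, 5)
other = sample_topics(Lcg(42), 20, 5)

# Same generator and same seed: the draws are identical.
print(same_a == same_b)
# Different generator, same seed: the draws are expected to differ.
print(same_a == other)
```

A seed only fixes the sequence produced by one particular generator, so results are reproducible across runs of the same version but not across versions that use different generators.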

0.8.2

15 Jul 00:24
0c22d7f
  • New properties tomotopy.DTModel.num_timepoints and tomotopy.DTModel.num_docs_by_timepoint have been added.
  • A bug that caused different results on different platforms even when the seeds were the same was partially fixed.
    As a result of this fix, 32-bit builds of tomotopy now yield training results that differ from those of earlier versions.

0.8.1

09 Jun 00:01
  • A bug where tomotopy.LDAModel.used_vocabs returned an incorrect value was fixed.
  • Now tomotopy.CTModel.prior_cov returns a covariance matrix with shape [k, k].
  • Now tomotopy.CTModel.get_correlations with empty arguments returns a correlation matrix with shape [k, k].
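
The relation between a [k, k] covariance matrix and a [k, k] correlation matrix can be sketched as follows. This is a standalone illustration that assumes the standard definition corr[i][j] = cov[i][j] / sqrt(cov[i][i] * cov[j][j]); whether tomotopy derives its correlations in exactly this way is an assumption, and the function name and sample matrix are hypothetical.

```python
import math


def correlations_from_cov(cov):
    """Convert a square covariance matrix into a correlation matrix."""
    k = len(cov)
    std = [math.sqrt(cov[i][i]) for i in range(k)]
    return [
        [cov[i][j] / (std[i] * std[j]) for j in range(k)]
        for i in range(k)
    ]


# Hypothetical covariance between k = 3 topics (symmetric, positive definite).
cov = [
    [4.0, 2.0, 0.0],
    [2.0, 9.0, -1.5],
    [0.0, -1.5, 1.0],
]
corr = correlations_from_cov(cov)
for row in corr:
    print(row)
```

Each diagonal entry of the result is 1, and each off-diagonal entry lies between -1 and 1, which makes topic pairs comparable regardless of the variance of each topic.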

0.8.0

06 Jun 07:52
be0107c
  • Since NumPy was introduced into tomotopy, many of its methods and properties now return numpy.ndarray rather than just list.
  • Tomotopy has a new dependency NumPy >= 1.10.0.
  • A wrong estimation of tomotopy.HDPModel.infer was fixed.
  • A new method for converting an HDPModel to an LDAModel was added.
  • New properties including tomotopy.LDAModel.used_vocabs, tomotopy.LDAModel.used_vocab_freq and tomotopy.LDAModel.used_vocab_df were added into topic models.
  • A new g-DMR topic model (tomotopy.GDMRModel) was added.
  • An error at initializing tomotopy.label.FoRelevance in macOS was fixed.
  • An error that occurred when using a tomotopy.utils.Corpus created without raw parameters was fixed.

0.7.1

08 May 07:11
  • tomotopy.Document.path was added for tomotopy.HLDAModel.
  • A memory corruption bug in tomotopy.label.PMIExtractor was fixed.
  • A compile error in gcc 7 was fixed.