Commit

Merge pull request #37 from bab2min/develop
bug #36 fixed
bab2min authored Mar 28, 2020
2 parents 084b190 + ef510ca commit 3830c3d
Showing 34 changed files with 340 additions and 287 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pull_request_test.yml
@@ -72,7 +72,7 @@ jobs:
strategy:
max-parallel: 4
matrix:
python-version: [3.6, 3.7, 3.8]
python-version: [3.5, 3.6, 3.7, 3.8]
architecture: [x86, x64]

steps:
17 changes: 13 additions & 4 deletions README.kr.rst
@@ -30,7 +30,7 @@ What is tomotopy?

Please visit https://bab2min.github.io/tomotopy/index.kr.html to see more information.

The most recent version of tomotopy is 0.6.1.
The most recent version of tomotopy is 0.6.2.

Getting Started
---------------
@@ -39,7 +39,13 @@ The most recent version of tomotopy is 0.6.1.

$ pip install tomotopy

On Linux, gcc 5 or later must be installed to compile the C++11 code.
The supported operating systems and Python versions are:

* Linux (x86-64) with Python 3.5 or later
* macOS 10.13 or later with Python 3.5 or later
* Windows 7 or later (x86, x86-64) with Python 3.5 or later
* Other operating systems with Python 3.5 or later: in this case, the source code must be compiled with a C++11-compatible compiler.

After installation, you can use tomotopy right away by importing it in Python3, as follows.
::

@@ -210,10 +216,13 @@ Python3 example code for tomotopy is at https://github.com/bab2min/tomotopy/blob/ma

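History
-------
* 0.6.1 (2020-03-22)
* 0.6.2 (2020-03-28)
* A critical bug related to `save` and `load` was fixed. Because of this bug, versions 0.6.0 and 0.6.1 have been removed from releases.

* 0.6.1 (2020-03-22) (removed)
* A bug related to module loading was fixed.

* 0.6.0 (2020-03-22)
* 0.6.0 (2020-03-22) (removed)
* `tomotopy.utils.Corpus`, which manages a large number of documents, was added.
* The `tomotopy.LDAModel.set_word_prior` method, which controls the prior of the word-topic distribution, was added.
* `min_df` was added to the constructors of the topic models so that words can be filtered by document frequency.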
17 changes: 13 additions & 4 deletions README.rst
@@ -31,7 +31,7 @@ The current version of `tomoto` supports several major topic models including

Please visit https://bab2min.github.io/tomotopy to see more information.

The most recent version of tomotopy is 0.6.1.
The most recent version of tomotopy is 0.6.2.

Getting Started
---------------
@@ -40,7 +40,13 @@ You can install tomotopy easily using pip. (https://pypi.org/project/tomotopy/)

$ pip install tomotopy

For Linux, it is neccesary to have gcc 5 or more for compiling C++11 codes.
The supported OS and Python versions are:

* Linux (x86-64) with Python >= 3.5
* macOS >= 10.13 with Python >= 3.5
* Windows 7 or later (x86, x86-64) with Python >= 3.5
* Other OS with Python >= 3.5: Compilation from source code required (with c++11 compatible compiler)

After installing, you can start tomotopy by just importing.
::

@@ -215,10 +221,13 @@ meaning you can use it for any reasonable purpose and remain in complete ownersh

History
-------
* 0.6.1 (2020-03-22)
* 0.6.2 (2020-03-28)
* A critical bug related to `save` and `load` was fixed. Version 0.6.0 and 0.6.1 have been removed from releases.

* 0.6.1 (2020-03-22) (removed)
* A bug related to module loading was fixed.

* 0.6.0 (2020-03-22)
* 0.6.0 (2020-03-22) (removed)
* `tomotopy.utils.Corpus` class that manages multiple documents easily was added.
* `tomotopy.LDAModel.set_word_prior` method that controls word-topic priors of topic models was added.
* A new argument `min_df` that filters words based on document frequency was added into every topic model's __init__.
2 changes: 1 addition & 1 deletion setup.py
@@ -52,7 +52,7 @@
setup(
name='tomotopy',

version='0.6.1',
version='0.6.2',

description='Tomoto, The Topic Modeling Tool for Python',
long_description=long_description,
8 changes: 4 additions & 4 deletions src/TopicModel/CT.h
@@ -3,11 +3,11 @@

namespace tomoto
{
template<TermWeight _TW, size_t _Flags = 0>
struct DocumentCTM : public DocumentLDA<_TW, _Flags>
template<TermWeight _tw, size_t _Flags = 0>
struct DocumentCTM : public DocumentLDA<_tw, _Flags>
{
using BaseDocument = DocumentLDA<_TW, _Flags>;
using DocumentLDA<_TW, _Flags>::DocumentLDA;
using BaseDocument = DocumentLDA<_tw, _Flags>;
using DocumentLDA<_tw, _Flags>::DocumentLDA;
Eigen::Matrix<Float, -1, -1> beta; // Dim: (K, betaSample)
Eigen::Matrix<Float, -1, 1> smBeta; // Dim: K
DEFINE_SERIALIZER_AFTER_BASE_WITH_VERSION(BaseDocument, 0, smBeta);
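
Review note: the two `using` lines in `DocumentCTM` do different jobs. The first is a plain type alias for the base, while the second inherits the base constructors. A minimal sketch of that pattern follows; `BaseDoc`, `DerivedDoc`, `makeDoc` and their members are hypothetical stand-ins, not tomotopy types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a templated base document.
template<std::size_t Flags = 0>
struct BaseDoc
{
    float weight;
    explicit BaseDoc(float w) : weight(w) {}
};

// Mirrors the shape of the derived document structs in the diff.
template<std::size_t Flags = 0>
struct DerivedDoc : public BaseDoc<Flags>
{
    using BaseDocument = BaseDoc<Flags>;  // alias naming the base type
    using BaseDoc<Flags>::BaseDoc;        // inherit the base constructors
    int extra = 0;                        // added member keeps its default value
};

// Builds a derived document through the inherited constructor.
inline DerivedDoc<> makeDoc(float w)
{
    return DerivedDoc<>(w);
}
```

Because the template parameter appears in both `using` lines, renaming it (as this commit does) has to touch each of them.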
20 changes: 10 additions & 10 deletions src/TopicModel/CTModel.hpp
@@ -11,28 +11,28 @@ Implementation of CTM using Gibbs sampling by bab2min

namespace tomoto
{
template<TermWeight _TW>
struct ModelStateCTM : public ModelStateLDA<_TW>
template<TermWeight _tw>
struct ModelStateCTM : public ModelStateLDA<_tw>
{
};

template<TermWeight _TW, size_t _Flags = flags::partitioned_multisampling,
template<TermWeight _tw, size_t _Flags = flags::partitioned_multisampling,
typename _Interface = ICTModel,
typename _Derived = void,
typename _DocType = DocumentCTM<_TW>,
typename _ModelState = ModelStateCTM<_TW>>
class CTModel : public LDAModel<_TW, _Flags, _Interface,
typename std::conditional<std::is_same<_Derived, void>::value, CTModel<_TW, _Flags>, _Derived>::type,
typename _DocType = DocumentCTM<_tw>,
typename _ModelState = ModelStateCTM<_tw>>
class CTModel : public LDAModel<_tw, _Flags, _Interface,
typename std::conditional<std::is_same<_Derived, void>::value, CTModel<_tw, _Flags>, _Derived>::type,
_DocType, _ModelState>
{
protected:
using DerivedClass = typename std::conditional<std::is_same<_Derived, void>::value, CTModel<_TW>, _Derived>::type;
using BaseClass = LDAModel<_TW, _Flags, _Interface, DerivedClass, _DocType, _ModelState>;
using DerivedClass = typename std::conditional<std::is_same<_Derived, void>::value, CTModel<_tw>, _Derived>::type;
using BaseClass = LDAModel<_tw, _Flags, _Interface, DerivedClass, _DocType, _ModelState>;
friend BaseClass;
friend typename BaseClass::BaseClass;
using WeightType = typename BaseClass::WeightType;

const char* TMID = "CTM\0";
static constexpr char TMID[] = "CTM\0";

size_t numBetaSample = 10;
size_t numTMNSample = 5;
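
Review note: the `std::conditional<std::is_same<_Derived, void>::value, ...>` expression in the hunk above selects the class itself as the CRTP "derived" type when no further subclass is supplied. The sketch below reproduces only that selection logic; `BaseModel`, `MidModel`, `LeafModel` and `nameImpl` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical CRTP base: dispatches to the most-derived type without virtual calls.
template<typename D>
struct BaseModel
{
    std::string name() const
    {
        return static_cast<const D*>(this)->nameImpl();
    }
};

// If Derived is void, MidModel passes itself to the base; otherwise it
// forwards the user-supplied subclass.
template<typename Derived = void>
struct MidModel : public BaseModel<
    typename std::conditional<std::is_same<Derived, void>::value,
                              MidModel<>, Derived>::type>
{
    std::string nameImpl() const { return "mid"; }
};

// A further subclass plugs itself in as the Derived argument.
struct LeafModel : public MidModel<LeafModel>
{
    std::string nameImpl() const { return "leaf"; }
};
```

With this shape, `MidModel<>().name()` resolves to `MidModel::nameImpl`, while `LeafModel().name()` resolves to `LeafModel::nameImpl`.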
8 changes: 4 additions & 4 deletions src/TopicModel/DMR.h
@@ -3,11 +3,11 @@

namespace tomoto
{
template<TermWeight _TW, size_t _Flags = 0>
struct DocumentDMR : public DocumentLDA<_TW, _Flags>
template<TermWeight _tw, size_t _Flags = 0>
struct DocumentDMR : public DocumentLDA<_tw, _Flags>
{
using BaseDocument = DocumentLDA<_TW, _Flags>;
using DocumentLDA<_TW, _Flags>::DocumentLDA;
using BaseDocument = DocumentLDA<_tw, _Flags>;
using DocumentLDA<_tw, _Flags>::DocumentLDA;
size_t metadata = 0;

DEFINE_SERIALIZER_AFTER_BASE_WITH_VERSION(BaseDocument, 0, metadata);
28 changes: 14 additions & 14 deletions src/TopicModel/DMRModel.hpp
@@ -10,29 +10,29 @@ Implementation of DMR using Gibbs sampling by bab2min

namespace tomoto
{
template<TermWeight _TW>
struct ModelStateDMR : public ModelStateLDA<_TW>
template<TermWeight _tw>
struct ModelStateDMR : public ModelStateLDA<_tw>
{
Eigen::Matrix<Float, -1, 1> tmpK;
};

template<TermWeight _TW, size_t _Flags = flags::partitioned_multisampling,
template<TermWeight _tw, size_t _Flags = flags::partitioned_multisampling,
typename _Interface = IDMRModel,
typename _Derived = void,
typename _DocType = DocumentDMR<_TW>,
typename _ModelState = ModelStateDMR<_TW>>
class DMRModel : public LDAModel<_TW, _Flags, _Interface,
typename std::conditional<std::is_same<_Derived, void>::value, DMRModel<_TW, _Flags>, _Derived>::type,
typename _DocType = DocumentDMR<_tw>,
typename _ModelState = ModelStateDMR<_tw>>
class DMRModel : public LDAModel<_tw, _Flags, _Interface,
typename std::conditional<std::is_same<_Derived, void>::value, DMRModel<_tw, _Flags>, _Derived>::type,
_DocType, _ModelState>
{
protected:
using DerivedClass = typename std::conditional<std::is_same<_Derived, void>::value, DMRModel<_TW>, _Derived>::type;
using BaseClass = LDAModel<_TW, _Flags, _Interface, DerivedClass, _DocType, _ModelState>;
using DerivedClass = typename std::conditional<std::is_same<_Derived, void>::value, DMRModel<_tw>, _Derived>::type;
using BaseClass = LDAModel<_tw, _Flags, _Interface, DerivedClass, _DocType, _ModelState>;
friend BaseClass;
friend typename BaseClass::BaseClass;
using WeightType = typename BaseClass::WeightType;

const char* TMID = "DMR\0";
static constexpr char TMID[] = "DMR\0";

Eigen::Matrix<Float, -1, -1> lambda;
Eigen::Matrix<Float, -1, -1> expLambda;
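
Review note: the `TMID` change replaces a per-object pointer member with a class-level compile-time array. The commit does not state the motivation, so the sketch below only contrasts the two declarations; `OldStyleModel`, `NewStyleModel` and `tagLength` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Old form: every object stores its own pointer, set at construction time.
struct OldStyleModel
{
    const char* TMID = "CTM\0";
};

// New form: one array shared by all objects, usable in constant expressions.
struct NewStyleModel
{
    static constexpr char TMID[] = "CTM\0";
};

// Out-of-class definition: needed before C++17 when TMID is odr-used
// (e.g. passed to std::strlen); redundant but still accepted in C++17.
constexpr char NewStyleModel::TMID[];

// 3 letters + the explicit '\0' + the implicit terminator = 5 bytes.
static_assert(sizeof(NewStyleModel::TMID) == 5, "array size is a compile-time constant");
// The static member adds no per-object storage.
static_assert(std::is_empty<NewStyleModel>::value, "no non-static data members");

// Length of the tag as a C string (stops at the first '\0').
inline std::size_t tagLength()
{
    return std::strlen(NewStyleModel::TMID);
}
```

One consequence worth checking in review: with the array form, `sizeof(TMID)` yields the array size rather than the size of a pointer, which matters if the tag is written with a fixed byte count.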
@@ -362,11 +362,11 @@ namespace tomoto
};

/* This is for preventing 'undefined symbol' problem in compiling by clang. */
template<TermWeight _TW, size_t _Flags,
template<TermWeight _tw, size_t _Flags,
typename _Interface, typename _Derived, typename _DocType, typename _ModelState>
constexpr Float DMRModel<_TW, _Flags, _Interface, _Derived, _DocType, _ModelState>::maxLambda;
constexpr Float DMRModel<_tw, _Flags, _Interface, _Derived, _DocType, _ModelState>::maxLambda;

template<TermWeight _TW, size_t _Flags,
template<TermWeight _tw, size_t _Flags,
typename _Interface, typename _Derived, typename _DocType, typename _ModelState>
constexpr size_t DMRModel<_TW, _Flags, _Interface, _Derived, _DocType, _ModelState>::maxBFGSIteration;
constexpr size_t DMRModel<_tw, _Flags, _Interface, _Derived, _DocType, _ModelState>::maxBFGSIteration;
}
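
Review note: the two definitions at the end of `DMRModel.hpp` follow the pre-C++17 rule that a `static constexpr` data member needs an out-of-class definition once it is odr-used, for example when bound to a reference. A minimal sketch of the same idiom for a class template, using the hypothetical names `Bounded`, `maxValue` and `clampToMax`:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical class template with a compile-time constant member.
template<typename T>
struct Bounded
{
    static constexpr T maxValue = 10;
};

// Out-of-class definition, repeating the full template header.
// Without it, C++11/14 builds can fail at link time with an
// "undefined symbol" error when maxValue is odr-used.
template<typename T>
constexpr T Bounded<T>::maxValue;

// std::min takes its arguments by const reference, which odr-uses maxValue.
inline float clampToMax(float x)
{
    return std::min(x, Bounded<float>::maxValue);
}
```

The template header of such a definition must match the class template exactly, which is why the `_TW` to `_tw` rename also appears in these lines.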
10 changes: 5 additions & 5 deletions src/TopicModel/DTM.h
@@ -4,12 +4,12 @@

namespace tomoto
{
template<TermWeight _TW, size_t _Flags = 0>
struct DocumentDTM : public DocumentLDA<_TW, _Flags>
template<TermWeight _tw, size_t _Flags = 0>
struct DocumentDTM : public DocumentLDA<_tw, _Flags>
{
using BaseDocument = DocumentLDA<_TW, _Flags>;
using DocumentLDA<_TW, _Flags>::DocumentLDA;
using WeightType = typename std::conditional<_TW == TermWeight::one, int32_t, float>::type;
using BaseDocument = DocumentLDA<_tw, _Flags>;
using DocumentLDA<_tw, _Flags>::DocumentLDA;
using WeightType = typename std::conditional<_tw == TermWeight::one, int32_t, float>::type;
};

class IDTModel : public ILDAModel
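
Review note: the `WeightType` alias picks an integer counter when the term weight is `TermWeight::one` and a `float` otherwise. The sketch below shows that selection in isolation. Only the enumerator `one` appears in this diff; `idf`, `WeightOf` and `halfOfOne` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Only `one` is visible in the diff; `idf` is an assumed second enumerator.
enum class TermWeight { one, idf };

// Compile-time choice of the counter type, as in the WeightType alias.
template<TermWeight tw>
struct WeightOf
{
    using type = typename std::conditional<tw == TermWeight::one,
                                           std::int32_t, float>::type;
};

static_assert(std::is_same<WeightOf<TermWeight::one>::type, std::int32_t>::value,
              "unit weights are counted with integers");
static_assert(std::is_same<WeightOf<TermWeight::idf>::type, float>::value,
              "other weighting schemes use floating point");

// Divides 1 by 2 in the selected type to make the difference observable.
template<TermWeight tw>
typename WeightOf<tw>::type halfOfOne()
{
    using W = typename WeightOf<tw>::type;
    return W(1) / W(2);
}
```

Integer division truncates, so `halfOfOne<TermWeight::one>()` yields `0`, while `halfOfOne<TermWeight::idf>()` yields `0.5f`.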
18 changes: 9 additions & 9 deletions src/TopicModel/DTModel.hpp
@@ -12,29 +12,29 @@ Implementation of Dynamic Topic Model using Gibbs sampling by bab2min

namespace tomoto
{
template<TermWeight _TW>
template<TermWeight _tw>
struct ModelStateDTM
{
using WeightType = typename std::conditional<_TW == TermWeight::one, int32_t, float>::type;
using WeightType = typename std::conditional<_tw == TermWeight::one, int32_t, float>::type;

Eigen::Matrix<Float, -1, 1> zLikelihood;
Eigen::Matrix<WeightType, -1, -1> numByTopic; // Dim: (Topic, T)
Eigen::Matrix<WeightType, -1, -1> numByTopicWord; // Dim: (Topic, Vocabs * T)
DEFINE_SERIALIZER(numByTopic, numByTopicWord);
};

template<TermWeight _TW, size_t _Flags = flags::partitioned_multisampling,
template<TermWeight _tw, size_t _Flags = flags::partitioned_multisampling,
typename _Interface = IDTModel,
typename _Derived = void,
typename _DocType = DocumentDTM<_TW>,
typename _ModelState = ModelStateDTM<_TW>>
class DTModel : public LDAModel<_TW, _Flags, _Interface,
typename std::conditional<std::is_same<_Derived, void>::value, DTModel<_TW, _Flags>, _Derived>::type,
typename _DocType = DocumentDTM<_tw>,
typename _ModelState = ModelStateDTM<_tw>>
class DTModel : public LDAModel<_tw, _Flags, _Interface,
typename std::conditional<std::is_same<_Derived, void>::value, DTModel<_tw, _Flags>, _Derived>::type,
_DocType, _ModelState>
{
protected:
using DerivedClass = typename std::conditional<std::is_same<_Derived, void>::value, DTModel<_TW>, _Derived>::type;
using BaseClass = LDAModel<_TW, _Flags, _Interface, DerivedClass, _DocType, _ModelState>;
using DerivedClass = typename std::conditional<std::is_same<_Derived, void>::value, DTModel<_tw>, _Derived>::type;
using BaseClass = LDAModel<_tw, _Flags, _Interface, DerivedClass, _DocType, _ModelState>;
friend BaseClass;
friend typename BaseClass::BaseClass;
using WeightType = typename BaseClass::WeightType;
8 changes: 4 additions & 4 deletions src/TopicModel/GDMR.h
@@ -3,11 +3,11 @@

namespace tomoto
{
template<TermWeight _TW, size_t _Flags = 0>
struct DocumentGDMR : public DocumentDMR<_TW, _Flags>
template<TermWeight _tw, size_t _Flags = 0>
struct DocumentGDMR : public DocumentDMR<_tw, _Flags>
{
using BaseDocument = DocumentDMR<_TW, _Flags>;
using DocumentDMR<_TW, _Flags>::DocumentDMR;
using BaseDocument = DocumentDMR<_tw, _Flags>;
using DocumentDMR<_tw, _Flags>::DocumentDMR;
std::vector<Float> metadataC;

DEFINE_SERIALIZER_AFTER_BASE_WITH_VERSION(BaseDocument, 0, metadataC);
18 changes: 9 additions & 9 deletions src/TopicModel/GDMRModel.hpp
@@ -5,27 +5,27 @@

namespace tomoto
{
template<TermWeight _TW>
struct ModelStateGDMR : public ModelStateDMR<_TW>
template<TermWeight _tw>
struct ModelStateGDMR : public ModelStateDMR<_tw>
{
/*Eigen::Matrix<Float, -1, 1> alphas;
Eigen::Matrix<Float, -1, 1> terms;
std::vector<std::vector<Float>> slpCache;
std::vector<size_t> ndimCnt;*/
};

template<TermWeight _TW, size_t _Flags = flags::partitioned_multisampling,
template<TermWeight _tw, size_t _Flags = flags::partitioned_multisampling,
typename _Interface = IGDMRModel,
typename _Derived = void,
typename _DocType = DocumentGDMR<_TW, _Flags>,
typename _ModelState = ModelStateGDMR<_TW>>
class GDMRModel : public DMRModel<_TW, _Flags, _Interface,
typename std::conditional<std::is_same<_Derived, void>::value, GDMRModel<_TW>, _Derived>::type,
typename _DocType = DocumentGDMR<_tw, _Flags>,
typename _ModelState = ModelStateGDMR<_tw>>
class GDMRModel : public DMRModel<_tw, _Flags, _Interface,
typename std::conditional<std::is_same<_Derived, void>::value, GDMRModel<_tw>, _Derived>::type,
_DocType, _ModelState>
{
protected:
using DerivedClass = typename std::conditional<std::is_same<_Derived, void>::value, GDMRModel<_TW>, _Derived>::type;
using BaseClass = DMRModel<_TW, _Flags, _Interface, DerivedClass, _DocType, _ModelState>;
using DerivedClass = typename std::conditional<std::is_same<_Derived, void>::value, GDMRModel<_tw>, _Derived>::type;
using BaseClass = DMRModel<_tw, _Flags, _Interface, DerivedClass, _DocType, _ModelState>;
friend BaseClass;
friend typename BaseClass::BaseClass;
friend typename BaseClass::BaseClass::BaseClass;
10 changes: 5 additions & 5 deletions src/TopicModel/HDP.h
@@ -3,16 +3,16 @@

namespace tomoto
{
template<TermWeight _TW>
struct DocumentHDP : public DocumentLDA<_TW>
template<TermWeight _tw>
struct DocumentHDP : public DocumentLDA<_tw>
{
/*
For DocumentHDP, the topic in numByTopic, Zs indicates 'table id', not 'topic id'.
To get real 'topic id', check the topic field of numTopicByTable.
*/
using BaseDocument = DocumentLDA<_TW>;
using DocumentLDA<_TW>::DocumentLDA;
using WeightType = typename DocumentLDA<_TW>::WeightType;
using BaseDocument = DocumentLDA<_tw>;
using DocumentLDA<_tw>::DocumentLDA;
using WeightType = typename DocumentLDA<_tw>::WeightType;
struct TableTopicInfo
{
WeightType num;