[MRG] Lda training visualization in visdom #1399

Merged: 36 commits, Aug 30, 2017
Changes from 1 commit
Commits
bb65439
save log params in a dict
parulsethi Jun 7, 2017
9d2e78d
remove redundant line
parulsethi Jun 7, 2017
33818ec
add diff log
parulsethi Jun 7, 2017
281222c
remove diff log
parulsethi Jun 8, 2017
c507bbb
write params to log directory
parulsethi Jun 8, 2017
6f75ccc
add convergence, remove alpha
parulsethi Jun 9, 2017
d9db4e2
calculate perplexity/diff instead of using log function
parulsethi Jun 9, 2017
cd5f822
add docstrings and comments
parulsethi Jun 9, 2017
f4728e0
add coherence/diff labels in graphs
parulsethi Jun 12, 2017
40cf092
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
parulsethi Jun 16, 2017
d4f69f5
optional measures for viz
parulsethi Jun 16, 2017
fde7d4d
add coherence params to lda init
parulsethi Jun 16, 2017
3f18076
added Lda Visom viz notebook
parulsethi Jun 26, 2017
546908e
add option to specify env
parulsethi Jun 26, 2017
651a61a
made requested changes
parulsethi Jun 28, 2017
13dfddc
Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim…
parulsethi Jul 8, 2017
1376d90
add generic callback API
parulsethi Jul 8, 2017
44c8e58
modified Notebook for new API
parulsethi Jul 8, 2017
92949a3
fix flake8
parulsethi Jul 8, 2017
5b22e4d
correct lee corpus division
parulsethi Jul 12, 2017
c369fc5
added docstrings
parulsethi Jul 17, 2017
a32960d
fix flake8
parulsethi Jul 18, 2017
48526d9
add shell example
parulsethi Jul 18, 2017
adf2a60
fix queue import for both py2/py3
parulsethi Jul 19, 2017
a272090
store metrics in model instance
parulsethi Aug 2, 2017
d3389bb
add nb example for getting metrics after train
parulsethi Aug 3, 2017
96949f7
merge develop
parulsethi Aug 8, 2017
7d0f0ec
made rquested changes
parulsethi Aug 8, 2017
dcc64a1
use dict for saving metrics
parulsethi Aug 9, 2017
47434f9
use str method for metric classes
parulsethi Aug 10, 2017
30c9b64
correct a notebook description
parulsethi Aug 10, 2017
e55af47
remove child-classes str method
parulsethi Aug 10, 2017
df5e01f
made requested changes
parulsethi Aug 23, 2017
b334c50
Merge branch 'develop' into tensorboard_logs
parulsethi Aug 24, 2017
c54e6bf
add visdom screenshot
parulsethi Aug 24, 2017
5f3d902
Merge branch 'tensorboard_logs' of https://github.com/parulsethi/gens…
parulsethi Aug 24, 2017
17 changes: 4 additions & 13 deletions docs/notebooks/Training_visualizations.ipynb
@@ -131,7 +131,7 @@
"\n",
"**Coherence**\n",
"\n",
-"Coherence is a measure used to evaluate topic models. A good model will generate coherent topics, i.e., topics with high topic coherence scores. Good topics are topics that can be described by a short label based on the topic terms they spit out. \n",
+"Coherence measures are generally based on the idea of computing the sum of pairwise scores of top *n* top words w<sub>1</sub>, ...,w<sub>n</sub> used to describe the topic. There are four coherence measure available in gensim: `u_mass, c_v, c_uci, c_npmi`. A good model will generate coherent topics, i.e., topics with high topic coherence scores. Good topics can be described by a short label based on the topic terms they spit out. \n",
"\n",
"<img src=\"Coherence.gif\">\n",
"\n",
@@ -140,11 +140,11 @@
"\n",
"**Perplexity**\n",
"\n",
-"Perplexity is a measurement of how well a probability distribution or probability model predicts a sample. In LDA, topics are described by a probability distribution over vocabulary words. So, perplexity can be used to compare probabilistic models like LDA.\n",
+"Perplexity is a measurement of how well a probability distribution or probability model predicts a sample. In LDA, topics are described by a probability distribution over vocabulary words. So, perplexity can be used to evaluate the topic-term distribution output by LDA.\n",
"\n",
"<img src=\"Perplexity.gif\">\n",
"\n",
-"For a good model, perplexity should be as low as possible.\n",
+"For a good model, perplexity should be low.\n",
"\n",
"\n",
"**Topic Difference**\n",
@@ -153,7 +153,7 @@
"\n",
"<img src=\"Diff.gif\">\n",
"\n",
-"In the heatmap, X-axis define the Epoch no. and Y-axis define the distance between the identical topic from consecutive epochs. For ex. a particular cell in the heatmap with values (x=3, y=5, z=0.4) represent the distance(=0.4) between the topic 5 from 3rd epoch and topic 5 from 2nd epoch. With increasing epochs, the distance between the identical topics should decrease.\n",
+"In the heatmap, X-axis define the Epoch no. and Y-axis define the distance between identical topics from consecutive epochs. For ex. a particular cell in the heatmap with values (x=3, y=5, z=0.4) represent the distance(=0.4) between the topic 5 from 3rd epoch and topic 5 from 2nd epoch. With increasing epochs, the distance between the identical topics should decrease.\n",
" \n",
" \n",
"**Convergence**\n",
@@ -293,15 +293,6 @@
"source": [
"model.metrics"
]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": true
-},
-"outputs": [],
-"source": []
}
],
"metadata": {
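Review note: the updated Coherence cell in this notebook describes coherence measures as a sum of pairwise scores over a topic's top *n* words. A minimal sketch of that idea follows. It is not gensim's implementation: the helper names (`doc_frequency`, `pairwise_score`, `topic_coherence`), the add-one smoothing, and the toy documents are assumptions made for illustration, with the score loosely modelled on a document co-occurrence (UMass-style) measure.

```python
from itertools import combinations
from math import log


def doc_frequency(docs, *words):
    """Count the documents that contain every word in `words`."""
    return sum(1 for doc in docs if all(word in doc for word in words))


def pairwise_score(docs, earlier, later):
    """Hypothetical pairwise score: log((D(later, earlier) + 1) / D(earlier)).

    Assumes `earlier` occurs in at least one document, so the denominator is non-zero.
    """
    return log((doc_frequency(docs, later, earlier) + 1) / doc_frequency(docs, earlier))


def topic_coherence(docs, top_words):
    """Sum the pairwise score over all pairs of a topic's top words."""
    return sum(
        pairwise_score(docs, earlier, later)
        for earlier, later in combinations(top_words, 2)
    )


# Toy corpus: each document is the set of words it contains.
docs = [
    {"cat", "dog", "pet"},
    {"cat", "pet"},
    {"dog", "pet"},
    {"stock", "market"},
]
coherent_topic = ["pet", "cat", "dog"]
mixed_topic = ["pet", "stock", "market"]

print(topic_coherence(docs, coherent_topic), topic_coherence(docs, mixed_topic))
```

On this toy corpus the first word list scores higher than the second, because its words appear together in the same documents. The four measures named in the notebook differ in how the pairwise score and the co-occurrence counts are defined, which this sketch does not attempt to reproduce.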
28 changes: 21 additions & 7 deletions gensim/models/callbacks.py
@@ -21,8 +21,14 @@ class Metric(object):
     """
     Base Metric class for topic model evaluation metrics
     """
-    def __init__(self):
-        pass
+    def __str__(self):
+        """
+        Return a string representation of Metric class
+        """
+        if self.title is not None:
+            return self.title
+        else:
+            return type(self).__name__[:-6]

     def set_parameters(self, **parameters):
         """
@@ -87,6 +93,9 @@ def __init__(self, corpus=None, texts=None, dictionary=None, coherence=None, win
         self.viz_env = viz_env
         self.title = title

+    def __str__(self):

Review comment (Contributor): By default, a child class that does not define this method inherits it from the parent class, so there is no need to define `__str__` in each callback explicitly.

+        return super(CoherenceMetric, self).__str__()
+
     def get_value(self, **kwargs):
         """
         Args:
@@ -125,6 +134,9 @@ def __init__(self, corpus=None, logger=None, viz_env=None, title=None):
         self.viz_env = viz_env
         self.title = title

+    def __str__(self):
+        return super(PerplexityMetric, self).__str__()
+
     def get_value(self, **kwargs):
         """
         Args:
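Review note: the notebook text in this PR describes perplexity as a measure of how well a probability model predicts a sample, with lower values being better. The formula image is not visible in this diff, so the sketch below uses the standard textbook definition (the exponentiated negative mean log-probability of the observed tokens). The function name `perplexity` and the probability lists are hypothetical, and this is not the code inside `PerplexityMetric.get_value`, which works from the model's own likelihood estimate.

```python
import math


def perplexity(token_probs):
    """Exponentiated negative mean log-probability of the observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)


# Hypothetical probabilities two models assign to the same four held-out tokens.
confident_model = [0.5, 0.4, 0.5, 0.4]
uniform_model = [0.25, 0.25, 0.25, 0.25]

print(perplexity(confident_model), perplexity(uniform_model))
```

A model that spreads its probability uniformly over four outcomes has a perplexity of about 4, while the model that assigns higher probability to the observed tokens gets a lower value, which is the sense in which "perplexity should be low".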
@@ -168,6 +180,9 @@ def __init__(self, distance="jaccard", num_words=100, n_ann_terms=10, diagonal=T
         self.viz_env = viz_env
         self.title = title

+    def __str__(self):
+        return super(DiffMetric, self).__str__()
+
     def get_value(self, **kwargs):
         """
         Args:
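Review note: the Topic Difference text in the notebook says each heatmap cell holds the distance between the same topic in two consecutive epochs, and `DiffMetric` defaults to `distance="jaccard"`. A rough sketch of that per-topic comparison is below. The helper names (`jaccard_distance`, `consecutive_topic_distances`) and the toy word lists are assumptions; the real metric works on the trained model's topics rather than hand-written lists.

```python
import math


def jaccard_distance(words_a, words_b):
    """1 - |A & B| / |A | B| for two collections of top words."""
    a, b = set(words_a), set(words_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def consecutive_topic_distances(previous_topics, current_topics):
    """Distance between topic k of the previous epoch and topic k of the current one."""
    return [
        jaccard_distance(prev, curr)
        for prev, curr in zip(previous_topics, current_topics)
    ]


# Hypothetical top words of two topics after epochs 1, 2 and 3.
epoch_1 = [["cat", "dog", "pet", "car"], ["stock", "market", "bank", "cat"]]
epoch_2 = [["cat", "dog", "pet", "vet"], ["stock", "market", "bank", "trade"]]
epoch_3 = [["cat", "dog", "pet", "vet"], ["stock", "market", "bank", "trade"]]

# One heatmap column per epoch transition, one row per topic.
column_2 = consecutive_topic_distances(epoch_1, epoch_2)
column_3 = consecutive_topic_distances(epoch_2, epoch_3)
print(column_2, column_3)
```

In this toy run each topic changes one of its four top words between epochs 1 and 2 and then stops changing, so the distances shrink to zero, matching the expectation that the distance between identical topics decreases as training proceeds.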
@@ -211,6 +226,9 @@ def __init__(self, distance="jaccard", num_words=100, n_ann_terms=10, diagonal=T
         self.viz_env = viz_env
         self.title = title

+    def __str__(self):
+        return super(ConvergenceMetric, self).__str__()
+
     def get_value(self, **kwargs):
         """
         Args:
@@ -272,11 +290,8 @@ def on_epoch_end(self, epoch, topics=None):

         # plot all metrics in current epoch
         for i, metric in enumerate(self.metrics):
+            label = str(metric)
             value = metric.get_value(topics=topics, model=self.model, other_model=self.previous)
-            if metric.title is not None:
-                label = metric.title
-            else:
-                label = type(metric).__name__[:-6]

             current_metrics[label] = value

@@ -311,4 +326,3 @@ def on_epoch_end(self, epoch, topics=None):
self.previous = copy.deepcopy(self.model)

return current_metrics
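
Review note: the effect of this commit can be checked in isolation with a small stand-in. The classes below are simplified stand-ins for the ones in `gensim/models/callbacks.py`, not the real implementations: the constructors take only `title`, the `get_value` return values are placeholders, and `collect_metrics` is a hypothetical helper that mirrors only the labelling step of `on_epoch_end`.

```python
class Metric(object):
    """Stand-in base class carrying the shared __str__ from this diff."""

    def __str__(self):
        if self.title is not None:
            return self.title
        # Drop the trailing "Metric" (6 characters) from the class name.
        return type(self).__name__[:-6]


class CoherenceMetric(Metric):
    # No __str__ here: the base class method is inherited.
    def __init__(self, title=None):
        self.title = title

    def get_value(self, **kwargs):
        return 0.5  # placeholder, not a real coherence score


class PerplexityMetric(Metric):
    def __init__(self, title=None):
        self.title = title

    def get_value(self, **kwargs):
        return 120.0  # placeholder, not a real perplexity


def collect_metrics(metrics):
    """Key each metric's value by its string label, as the loop in the diff does."""
    current_metrics = {}
    for metric in metrics:
        label = str(metric)
        current_metrics[label] = metric.get_value()
    return current_metrics


results = collect_metrics([
    CoherenceMetric(),
    PerplexityMetric(title="Held-out perplexity"),
])
print(results)
```

Because the child classes inherit `__str__`, the overrides that only call `super().__str__()` add nothing, which is the point of the review comment above. One thing to keep in mind with a dict keyed by label: two metrics of the same class without distinct titles would share a key, and the later value would overwrite the earlier one.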