added google analytics to gensim website
piskvorky committed Jun 22, 2011
1 parent c9dd9d3 commit f9560e5
Showing 46 changed files with 556 additions and 75 deletions.
8 changes: 4 additions & 4 deletions docs/_sources/dist_lsi.txt
@@ -65,7 +65,7 @@ our choice is incidental) and try::
>>> corpus = corpora.MmCorpus('/tmp/deerwester.mm') # load a corpus of nine documents, from the Tutorials
>>> id2word = corpora.Dictionary.load('/tmp/deerwester.dict')

>>> lsi = models.LsiModel(corpus, id2word=id2word, num_topics=200, chunks=1, distributed=True) # run distributed LSA on nine documents
>>> lsi = models.LsiModel(corpus, id2word=id2word, num_topics=200, chunksize=1, distributed=True) # run distributed LSA on nine documents

This uses the corpus and feature-token mapping created in the :doc:`tut1` tutorial.
If you look at the log in your Python session, you should see a line similar to::
@@ -81,15 +81,15 @@ To check the LSA results, let's print the first two latent topics::
topic #1(2.542): -0.623*"graph" + -0.490*"trees" + -0.451*"minors" + -0.274*"survey" + 0.167*"system"

Success! But a corpus of nine documents is no challenge for our powerful cluster...
In fact, we had to lower the job size (`chunks` parameter above) to a single document
In fact, we had to lower the job size (`chunksize` parameter above) to a single document
at a time, otherwise all documents would be processed by a single worker all at once.

So let's run LSA on **one million documents** instead::

>>> # inflate the corpus to 1M documents, by repeating its documents over&over
>>> corpus1m = utils.RepeatCorpus(corpus, 1000000)
>>> # run distributed LSA on 1 million documents
>>> lsi1m = models.LsiModel(corpus1m, id2word=id2word, num_topics=200, chunks=10000, distributed=True)
>>> lsi1m = models.LsiModel(corpus1m, id2word=id2word, num_topics=200, chunksize=10000, distributed=True)

>>> lsi1m.printTopics(num_topics=2, num_words=5)
topic #0(1113.628): 0.644*"system" + 0.404*"user" + 0.301*"eps" + 0.265*"time" + 0.265*"response"
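The `chunksize` parameter sets how many documents go into a single job. A minimal pure-Python sketch of that idea, assuming jobs are simply consecutive slices of the corpus stream. The helpers `repeat_corpus`, `chunk_jobs` and `num_jobs` are hypothetical and are not gensim's implementation; in particular `repeat_corpus` is single-pass, whereas a real corpus object can be iterated repeatedly:

```python
import itertools
import math


def repeat_corpus(corpus, reps):
    """Yield documents from `corpus` cyclically until `reps` documents are produced.

    Hypothetical, single-pass stand-in for the idea behind utils.RepeatCorpus.
    """
    return itertools.islice(itertools.cycle(corpus), reps)


def chunk_jobs(corpus, chunksize):
    """Group a stream of documents into lists of at most `chunksize` documents (one job each)."""
    it = iter(corpus)
    while True:
        job = list(itertools.islice(it, chunksize))
        if not job:
            return
        yield job


def num_jobs(num_docs, chunksize):
    """Number of jobs needed to cover `num_docs` documents."""
    return math.ceil(num_docs / chunksize)


# nine toy bag-of-words "documents", standing in for the Deerwester corpus
corpus = [[(i, 1.0)] for i in range(9)]

print(len(list(chunk_jobs(corpus, 1))))      # 9
print(len(list(chunk_jobs(corpus, 20000))))  # 1
print(num_jobs(1000000, 10000))              # 100
```

With nine documents, `chunksize=1` produces nine jobs that can be spread over the workers, while any larger value collapses the whole corpus into one job for a single worker; the inflated one-million-document corpus splits into 100 jobs of 10,000 documents each.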
@@ -133,7 +133,7 @@ the corpus iterator with::
Now we're ready to run distributed LSA on the English Wikipedia::

>>> # extract 400 LSI topics, using a cluster of nodes
>>> lsi = gensim.models.lsimodel.LsiModel(corpus=mm, id2word=id2word, num_topics=400, chunks=20000, distributed=True)
>>> lsi = gensim.models.lsimodel.LsiModel(corpus=mm, id2word=id2word, num_topics=400, chunksize=20000, distributed=True)

>>> # print the most contributing words (both positively and negatively) for each of the first ten topics
>>> lsi.print_topics(10)
4 changes: 2 additions & 2 deletions docs/_sources/index.txt
@@ -8,7 +8,7 @@ Gensim -- Vector Space Modelling for Humans

.. admonition:: What's new?

* 19/06/2011: version 0.8.0 is out! Faster & better: :doc:`CHANGELOG<changes_080>`
* 19/06/2011: version 0.8.0 is out! Faster & better: :doc:`changes walkthrough<changes_080>`
* 12/02/2011: faster and leaner **Latent Semantic Indexing (LSI)** and **Latent Dirichlet Allocation (LDA)**:

* :doc:`Processing the English Wikipedia <wiki>`, 3.2 million documents (`NIPS workshop paper <http://arxiv.org/abs/1102.5597>`_)
@@ -18,7 +18,7 @@ For an **overview** of what you can (or cannot) do with `gensim`, go to the :doc

For **installation** and **troubleshooting**, see the :doc:`installation <install>` page and the `gensim discussion group <http://groups.google.com/group/gensim/>`_.

For **examples** on how to use it, try the :doc:`tutorials <tutorial>`.
For **examples** on how to convert text to vectors and work with the result, try the :doc:`tutorials <tutorial>`.

When **citing** `gensim` in academic papers, please use
`this BibTeX entry <http://nlp.fi.muni.cz/projekty/gensim/bibtex_gensim.bib>`_.
2 changes: 1 addition & 1 deletion docs/_sources/wiki.txt
@@ -109,7 +109,7 @@ every 10,000 articles, this means we will have done 300 updates in one pass, qui
enough to have a very accurate topics estimate::

>>> # extract 100 LDA topics, using 1 pass and updating once every 1 chunk (10,000 documents)
>>> lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100, update_every=1, chunks=10000, passes=1)
>>> lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100, update_every=1, chunksize=10000, passes=1)
using serial LDA version on this node
running online LDA training, 100 topics, 1 passes over the supplied corpus of 3146817 documets, updating model once every 10000 documents
..
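The number of model updates per pass follows from `chunksize` and `update_every`. A back-of-the-envelope helper, assuming one update after every `update_every` chunks plus one final update for any leftover partial batch (a hypothetical function, not part of gensim):

```python
import math


def updates_per_pass(num_docs, chunksize, update_every):
    """Estimate model updates in one pass when updating after every `update_every` chunks.

    Assumption: a trailing partial batch still triggers one final update.
    """
    docs_per_update = chunksize * update_every
    return math.ceil(num_docs / docs_per_update)


# figures from the log above: 3,146,817 documents, chunksize=10000, update_every=1
print(updates_per_pass(3146817, 10000, 1))  # 315
```

That gives 315 updates for the corpus in the log, in line with the round figure of 300 quoted in the text.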
15 changes: 14 additions & 1 deletion docs/apiref.html
@@ -31,6 +31,19 @@
<meta property="og:title" content="#gensim" />
<meta property="og:description" content="Efficient topic modelling in Python" />

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-24066335-1']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
</head>
<body>
<!--
@@ -158,7 +171,7 @@ <h3>Navigation</h3>

<div class="footer">
&copy; Copyright 2011, Radim Řehůřek &lt;radimrehurek(at)seznam.cz&gt;.
Last updated on Jun 20, 2011.
Last updated on Jun 22, 2011.
</div>
</body>
</html>
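The added snippet follows the asynchronous ga.js pattern: commands are pushed onto the `_gaq` array before the library has loaded, and the script element is created with `async = true` and inserted before the first existing `<script>`, so it does not block page rendering. The only branch picks the host: `ssl.google-analytics.com` for HTTPS pages, `www.google-analytics.com` otherwise. A Python transliteration of those two ideas, for illustration only; `ga_script_url`, `CommandQueue` and `replay` are hypothetical names, not part of any Google API:

```python
def ga_script_url(protocol):
    """Mirror the snippet's ternary: choose the ga.js host from the page protocol.

    `protocol` plays the role of document.location.protocol, e.g. 'https:' or 'http:'.
    """
    prefix = 'https://ssl' if protocol == 'https:' else 'http://www'
    return prefix + '.google-analytics.com/ga.js'


class CommandQueue:
    """Hypothetical model of the `_gaq` pattern: buffer commands until the library loads."""

    def __init__(self):
        self.pending = []    # like `_gaq = _gaq || []` before ga.js arrives
        self.handler = None  # set once the "library" has loaded

    def push(self, command):
        if self.handler is None:
            self.pending.append(command)  # library not loaded yet: remember the command
        else:
            self.handler(command)         # library loaded: run the command immediately

    def replay(self, handler):
        """Called when the library loads: run queued commands in order, then go direct."""
        self.handler = handler
        for command in self.pending:
            handler(command)
        self.pending = []


executed = []
gaq = CommandQueue()
gaq.push(['_setAccount', 'UA-24066335-1'])
gaq.push(['_trackPageview'])
gaq.replay(executed.append)                  # simulate ga.js finishing its download
gaq.push(['_trackPageview', '/another-page'])  # later commands run straight away

print(ga_script_url('https:'))        # https://ssl.google-analytics.com/ga.js
print(executed[0][0], len(executed))  # _setAccount 3
```

Queuing plain arrays means the page can record tracking calls at any point without waiting for the network, and nothing is lost if the script arrives late.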
15 changes: 14 additions & 1 deletion docs/changes_080.html
@@ -29,6 +29,19 @@
<meta property="og:title" content="#gensim" />
<meta property="og:description" content="Efficient topic modelling in Python" />

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-24066335-1']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
</head>
<body>
<!--
@@ -179,7 +192,7 @@ <h3>Navigation</h3>

<div class="footer">
&copy; Copyright 2011, Radim Řehůřek &lt;radimrehurek(at)seznam.cz&gt;.
Last updated on Jun 20, 2011.
Last updated on Jun 22, 2011.
</div>
</body>
</html>
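Each generated page in this commit gains the same 13-line `<script>` block just before `</head>`. The commit does not show how the pages were rebuilt (Sphinx HTML builds usually pick such snippets up from a layout template), so purely as an illustration, here is a hypothetical helper that performs the same insertion on an HTML string and is safe to run twice:

```python
def insert_before_head_close(html, snippet):
    """Return `html` with `snippet` placed on its own line before the first '</head>'.

    Hypothetical helper: leaves `html` unchanged if the snippet is already present
    (so repeated runs do not duplicate it) or if there is no '</head>' tag.
    """
    if snippet in html:
        return html
    idx = html.find('</head>')
    if idx == -1:
        return html
    return html[:idx] + snippet + '\n' + html[idx:]


page = '<html><head><title>gensim</title>\n</head><body></body></html>'
snippet = '<script type="text/javascript">/* tracking code */</script>'

patched = insert_before_head_close(page, snippet)
print(patched.count(snippet))                                 # 1
print(insert_before_head_close(patched, snippet) == patched)  # True
```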
15 changes: 14 additions & 1 deletion docs/corpora/bleicorpus.html
@@ -32,6 +32,19 @@
<meta property="og:title" content="#gensim" />
<meta property="og:description" content="Efficient topic modelling in Python" />

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-24066335-1']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
</head>
<body>
<!--
@@ -209,7 +222,7 @@ <h3>Navigation</h3>

<div class="footer">
&copy; Copyright 2011, Radim Řehůřek &lt;radimrehurek(at)seznam.cz&gt;.
Last updated on Jun 20, 2011.
Last updated on Jun 22, 2011.
</div>
</body>
</html>
15 changes: 14 additions & 1 deletion docs/corpora/corpora.html
@@ -29,6 +29,19 @@
<meta property="og:title" content="#gensim" />
<meta property="og:description" content="Efficient topic modelling in Python" />

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-24066335-1']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
</head>
<body>
<!--
@@ -113,7 +126,7 @@ <h3>Navigation</h3>

<div class="footer">
&copy; Copyright 2011, Radim Řehůřek &lt;radimrehurek(at)seznam.cz&gt;.
Last updated on Jun 20, 2011.
Last updated on Jun 22, 2011.
</div>
</body>
</html>
15 changes: 14 additions & 1 deletion docs/corpora/dictionary.html
@@ -32,6 +32,19 @@
<meta property="og:title" content="#gensim" />
<meta property="og:description" content="Efficient topic modelling in Python" />

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-24066335-1']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
</head>
<body>
<!--
@@ -240,7 +253,7 @@ <h3>Navigation</h3>

<div class="footer">
&copy; Copyright 2011, Radim Řehůřek &lt;radimrehurek(at)seznam.cz&gt;.
Last updated on Jun 20, 2011.
Last updated on Jun 22, 2011.
</div>
</body>
</html>
15 changes: 14 additions & 1 deletion docs/corpora/indexedcorpus.html
@@ -32,6 +32,19 @@
<meta property="og:title" content="#gensim" />
<meta property="og:description" content="Efficient topic modelling in Python" />

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-24066335-1']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
</head>
<body>
<!--
@@ -221,7 +234,7 @@ <h3>Navigation</h3>

<div class="footer">
&copy; Copyright 2011, Radim Řehůřek &lt;radimrehurek(at)seznam.cz&gt;.
Last updated on Jun 20, 2011.
Last updated on Jun 22, 2011.
</div>
</body>
</html>
15 changes: 14 additions & 1 deletion docs/corpora/lowcorpus.html
@@ -32,6 +32,19 @@
<meta property="og:title" content="#gensim" />
<meta property="og:description" content="Efficient topic modelling in Python" />

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-24066335-1']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
</head>
<body>
<!--
@@ -223,7 +236,7 @@ <h3>Navigation</h3>

<div class="footer">
&copy; Copyright 2011, Radim Řehůřek &lt;radimrehurek(at)seznam.cz&gt;.
Last updated on Jun 20, 2011.
Last updated on Jun 22, 2011.
</div>
</body>
</html>
15 changes: 14 additions & 1 deletion docs/corpora/mmcorpus.html
@@ -32,6 +32,19 @@
<meta property="og:title" content="#gensim" />
<meta property="og:description" content="Efficient topic modelling in Python" />

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-24066335-1']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
</head>
<body>
<!--
@@ -197,7 +210,7 @@ <h3>Navigation</h3>

<div class="footer">
&copy; Copyright 2011, Radim Řehůřek &lt;radimrehurek(at)seznam.cz&gt;.
Last updated on Jun 20, 2011.
Last updated on Jun 22, 2011.
</div>
</body>
</html>
15 changes: 14 additions & 1 deletion docs/corpora/svmlightcorpus.html
@@ -32,6 +32,19 @@
<meta property="og:title" content="#gensim" />
<meta property="og:description" content="Efficient topic modelling in Python" />

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-24066335-1']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
</head>
<body>
<!--
@@ -214,7 +227,7 @@ <h3>Navigation</h3>

<div class="footer">
&copy; Copyright 2011, Radim Řehůřek &lt;radimrehurek(at)seznam.cz&gt;.
Last updated on Jun 20, 2011.
Last updated on Jun 22, 2011.
</div>
</body>
</html>
15 changes: 14 additions & 1 deletion docs/corpora/textcorpus.html
@@ -32,6 +32,19 @@
<meta property="og:title" content="#gensim" />
<meta property="og:description" content="Efficient topic modelling in Python" />

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-24066335-1']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
</head>
<body>
<!--
@@ -213,7 +226,7 @@ <h3>Navigation</h3>

<div class="footer">
&copy; Copyright 2011, Radim Řehůřek &lt;radimrehurek(at)seznam.cz&gt;.
Last updated on Jun 20, 2011.
Last updated on Jun 22, 2011.
</div>
</body>
</html>
15 changes: 14 additions & 1 deletion docs/corpora/wikicorpus.html
@@ -32,6 +32,19 @@
<meta property="og:title" content="#gensim" />
<meta property="og:description" content="Efficient topic modelling in Python" />

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-24066335-1']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
</head>
<body>
<!--
@@ -254,7 +267,7 @@ <h3>Navigation</h3>

<div class="footer">
&copy; Copyright 2011, Radim Řehůřek &lt;radimrehurek(at)seznam.cz&gt;.
Last updated on Jun 20, 2011.
Last updated on Jun 22, 2011.
</div>
</body>
</html>