Add paragraph describing dictionary.dfs and dictionary.compactify()

Code snippet 13 introduces two new concepts that have not been explained yet. In addition, the workflow used to create the dictionary there is completely different from the one described in code snippets 4 and 5. I've added a paragraph that explains the new workflow and concepts.
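
For reviewers, here is a rough plain-Python sketch of what the two concepts amount to. It is only an illustration and not gensim's implementation: the names `token2id`, `dfs`, `bad_ids` and `idmap`, the three sample sentences and the stop list are assumptions chosen for the example. `dfs` maps each token id to the number of documents containing that token, and the last step mimics the effect of `compactify()` by renumbering the surviving ids so they run from 0 without gaps.

```python
# Plain-Python illustration of the workflow; NOT gensim's actual code.
# The sample documents and stop list below are assumptions for the example.
documents = [
    "human machine interface for lab abc computer applications",
    "a survey of user opinion of computer system response time",
    "the eps user interface management system",
]
stoplist = set("for a of the and to in".split())

# 1. Build the dictionary from the complete corpus.
token2id = {}  # token -> integer id
dfs = {}       # token id -> number of documents that contain the token
for doc in documents:
    # sorted(set(...)) counts each token once per document, in a stable order
    for token in sorted(set(doc.lower().split())):
        token_id = token2id.setdefault(token, len(token2id))
        dfs[token_id] = dfs.get(token_id, 0) + 1

# 2. Collect the ids to remove: stop words and tokens found in only one document.
stop_ids = [token2id[word] for word in stoplist if word in token2id]
once_ids = [token_id for token_id, docfreq in dfs.items() if docfreq == 1]
bad_ids = set(stop_ids + once_ids)

# 3. Drop those ids; this leaves gaps in the id sequence.
token2id = {tok: tid for tok, tid in token2id.items() if tid not in bad_ids}
dfs = {tid: df for tid, df in dfs.items() if tid not in bad_ids}

# 4. "Compactify": renumber the remaining ids to 0..n-1.
idmap = {old: new for new, old in enumerate(sorted(token2id.values()))}
token2id = {tok: idmap[tid] for tok, tid in token2id.items()}
dfs = {idmap[tid]: df for tid, df in dfs.items()}

print(sorted(token2id.items(), key=lambda item: item[1]))
```

With these sample sentences, only the tokens that occur in two documents and are not stop words survive, and their ids are contiguous again after the last step.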
piskvorky · May 22, 2017 · 1e835e7
1 parent 0635638
commit 1e835e7
Showing 1 changed file with 2 additions and 0 deletions.
diff --git a/docs/notebooks/Corpora_and_Vector_Spaces.ipynb b/docs/notebooks/Corpora_and_Vector_Spaces.ipynb
@@ -381,6 +381,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "The dictionary is first built from the complete mycorpus.txt file. The list of token ids to remove is then generated by looking up the ids of the stop words in the dictionary, and by querying the document frequencies dictionary (dictionary.dfs) for token ids that appear in only one document. Finally, dictionary.compactify() is called to remove the resulting gaps in the token id sequence.\n",
+    "\n",
     "And that is all there is to it! At least as far as bag-of-words representation is concerned. Of course, what we do with such corpus is another question; it is not at all clear how counting the frequency of distinct words could be useful. As it turns out, it isn’t, and we will need to apply a transformation on this simple representation first, before we can use it to compute any meaningful document vs. document similarities. Transformations are covered in the [next tutorial](https://radimrehurek.com/gensim/tut2.html), but before that, let’s briefly turn our attention to *corpus persistency*.\n",
     "\n",
     "## Corpus Formats\n",