Fix typo in translation_matrix notebook (#1598)

piskvorky · Sep 26, 2017 · 0a2c05d
1 parent 33a3ef2
commit 0a2c05d
Showing 1 changed file with 42 additions and 42 deletions.
diff --git a/docs/notebooks/translation_matrix.ipynb b/docs/notebooks/translation_matrix.ipynb
@@ -27,7 +27,7 @@
     "editable": true
    },
    "source": [
-    "Suppose we are given a setofword pairs and their associated vector representaion $\\{x_{i},z_{i}\\}_{i=1}^{n}$, where $x_{i} \\in R^{d_{1}}$ is the distibuted representation of word $i$ in the source language, and ${z_{i} \\in R^{d_{2}}}$ is the vector representation of its translation. Our goal is to find a transformation matrix $W$ such that $Wx_{i}$ approximates $z_{i}$. In practice, $W$ can be learned by the following optimization prolem:\n",
+    "Suppose we are given a set of word pairs and their associated vector representations $\\{x_{i},z_{i}\\}_{i=1}^{n}$, where $x_{i} \\in R^{d_{1}}$ is the distributed representation of word $i$ in the source language, and ${z_{i} \\in R^{d_{2}}}$ is the vector representation of its translation. Our goal is to find a transformation matrix $W$ such that $Wx_{i}$ approximates $z_{i}$. In practice, $W$ can be learned by the following optimization problem:\n",
     "\n",
     "<center>$\\min \\limits_{W} \\sum \\limits_{i=1}^{n} ||Wx_{i}-z_{i}||^{2}$</center>"
    ]
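
Note on the hunk above: the objective in this cell is an ordinary least-squares problem, so it can be illustrated without gensim. The following is a minimal sketch, assuming the source vectors are stacked as rows of `X` and the target vectors as rows of `Z`; the function name `learn_translation_matrix` and the synthetic data are hypothetical and are not taken from the notebook or from gensim's `TranslationMatrix`.

```python
import numpy as np


def learn_translation_matrix(X, Z):
    """Solve min_W sum_i ||W x_i - z_i||^2 by ordinary least squares.

    X has shape (n, d1), one source vector per row.
    Z has shape (n, d2), one target vector per row.
    Returns W with shape (d2, d1), so that W @ x approximates z.
    """
    # lstsq solves X @ A ~= Z for A of shape (d1, d2); W is the transpose of A.
    A, _, _, _ = np.linalg.lstsq(X, Z, rcond=None)
    return A.T


# Synthetic, noise-free data (an assumption for illustration only).
rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 4))   # d2 = 3, d1 = 4
X = rng.normal(size=(50, 4))       # 50 source vectors
Z = X @ W_true.T                   # exact "translations"

W = learn_translation_matrix(X, Z)
```

With noise-free data and more pairs than source dimensions, the recovered `W` should match `W_true` up to floating-point error; with real embeddings the fit is only approximate.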
@@ -56,7 +56,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 2,
    "metadata": {
     "collapsed": false,
     "deletable": true,
@@ -86,16 +86,16 @@
     "editable": true
    },
    "source": [
-    "For this tutorial, we'll be training our model using the English -> Italian word pairs from the OPUS collection. This corpus contains 5000 word pairs. Each pair is a English word and corresponding Italian word.\n",
+    "For this tutorial, we'll train our model using the English -> Italian word pairs from the OPUS collection. This corpus contains 5000 word pairs. Each word pair is an English word with its corresponding Italian word.\n",
     "\n",
-    "dataset download: \n",
+    "Dataset download: \n",
     "\n",
     "[OPUS_en_it_europarl_train_5K.txt](https://pan.baidu.com/s/1nuIuQoT)"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 3,
    "metadata": {
     "collapsed": false,
     "deletable": true,
@@ -115,7 +115,7 @@
     "\n",
     "with utils.smart_open(train_file, \"r\") as f:\n",
     "    word_pair = [tuple(utils.to_unicode(line).strip().split()) for line in f]\n",
-    "print word_pair[:10]"
+    "print (word_pair[:10])"
    ]
   },
   {
@@ -125,10 +125,10 @@
     "editable": true
    },
    "source": [
-    "This tutorial uses 300-dimensional vectors of English words as source and vectors of Italian words as target.(those vector trained by the word2vec toolkit with cbow. The context window was set 5 words to either side of the target,\n",
+    "This tutorial uses 300-dimensional vectors of English words as source and vectors of Italian words as target. (Those vectors were trained by the word2vec toolkit with CBOW. The context window was set to 5 words on either side of the target,\n",
     "the sub-sampling option was set to 1e-05, and the probability of a target word was estimated with the negative sampling method, drawing 10 samples from the noise distribution.)\n",
     "\n",
-    "dataset download:\n",
+    "Download dataset:\n",
     "\n",
     "[EN.200K.cbow1_wind5_hs0_neg10_size300_smpl1e-05.txt](https://pan.baidu.com/s/1nv3bYel)\n",
     "\n",
@@ -137,7 +137,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 4,
    "metadata": {
     "collapsed": false,
     "deletable": true,
@@ -152,7 +152,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 5,
    "metadata": {
     "collapsed": true,
     "deletable": true,
@@ -172,12 +172,12 @@
     "editable": true
    },
    "source": [
-    "training the translation matrix"
+    "Train the translation matrix"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 6,
    "metadata": {
     "collapsed": false,
     "deletable": true,
@@ -188,14 +188,14 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "the shape of translation matrix is:  (300, 300)\n"
+      "('the shape of translation matrix is: ', (300, 300))\n"
      ]
     }
    ],
    "source": [
     "transmat = translation_matrix.TranslationMatrix(source_word_vec, target_word_vec, word_pair)\n",
     "transmat.train(word_pair)\n",
-    "print \"the shape of translation matrix is: \", transmat.translation_matrix.shape"
+    "print (\"the shape of translation matrix is: \", transmat.translation_matrix.shape)"
    ]
   },
   {
@@ -205,7 +205,7 @@
     "editable": true
    },
    "source": [
-    "Prediction Time: for any given new word, we can map it to the other language space by coputing $z = Wx$, then we find the word whose representation is closet to z in the target language space, using consine similarity as the distance metric."
+    "Prediction Time: For any given new word, we can map it to the other language space by computing $z = Wx$, then we find the word whose representation is closest to $z$ in the target language space, using cosine similarity as the distance metric."
    ]
   },
   {
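
Note on the hunk above: the prediction step (map with $z = Wx$, then rank target words by cosine similarity) can be sketched in plain NumPy. This is an illustration under stated assumptions, not the implementation of `transmat.translate`; the helper `nearest_by_cosine`, the identity matrix used as `W`, and the 2-dimensional toy vectors are all hypothetical.

```python
import numpy as np


def nearest_by_cosine(W, x, target_vectors, target_words, topn=1):
    """Map x into the target space and return the topn closest target words."""
    z = W @ x
    # Cosine similarity between z and every row of target_vectors.
    norms = np.linalg.norm(target_vectors, axis=1) * np.linalg.norm(z)
    sims = (target_vectors @ z) / norms
    # Sort by descending similarity and keep the best topn indices.
    best = np.argsort(-sims)[:topn]
    return [target_words[i] for i in best]


# Toy target space (assumed values, chosen only to make the ranking obvious).
target_words = ["uno", "due", "tre"]
target_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
W_toy = np.eye(2)                 # stand-in for a learned translation matrix
x_one = np.array([1.0, 0.1])      # stand-in for the source vector of "one"

top2 = nearest_by_cosine(W_toy, x_one, target_vectors, target_words, topn=2)
```

Here `top2` ranks `uno` first because the mapped vector points almost exactly along its direction, and `due` second.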
@@ -215,13 +215,13 @@
     "editable": true
    },
    "source": [
-    "#### part one:\n",
-    "Let's look at some number translation. We use English words (one, two, three, four and five) as test."
+    "#### Part one:\n",
+    "Let's look at the translation of some number words. We use the English words (one, two, three, four and five) as a test."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 7,
    "metadata": {
     "collapsed": false,
     "deletable": true,
@@ -232,23 +232,23 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "/home/robotcator/PycharmProjects/gensim/gensim/models/translation_matrix.py:224: UserWarning: The parameter source_lang_vec isn't specified, use the model's source language word vector as default.\n",
+      "/home/robotcator/PycharmProjects/gensim/gensim/models/translation_matrix.py:223: UserWarning: The parameter source_lang_vec isn't specified, use the model's source language word vector as default.\n",
       "  warnings.warn(\"The parameter source_lang_vec isn't specified, use the model's source language word vector as default.\")\n",
-      "/home/robotcator/PycharmProjects/gensim/gensim/models/translation_matrix.py:228: UserWarning: The parameter target_lang_vec isn't specified, use the model's target language word vector as default.\n",
+      "/home/robotcator/PycharmProjects/gensim/gensim/models/translation_matrix.py:227: UserWarning: The parameter target_lang_vec isn't specified, use the model's target language word vector as default.\n",
       "  warnings.warn(\"The parameter target_lang_vec isn't specified, use the model's target language word vector as default.\")\n"
      ]
     }
    ],
    "source": [
-    "# the piar is (English, Italian), we can see whether the translated word is right or not \n",
+    "# Each pair has the form (English, Italian), so we can check whether the translated word is correct\n",
     "words = [(\"one\", \"uno\"), (\"two\", \"due\"), (\"three\", \"tre\"), (\"four\", \"quattro\"), (\"five\", \"cinque\")]\n",
     "source_word, target_word = zip(*words)\n",
     "translated_word = transmat.translate(source_word, 5, )"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 8,
    "metadata": {
     "collapsed": false,
     "deletable": true,
@@ -260,17 +260,17 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "word  one  and translated word [u'solo', u'due', u'tre', u'cinque', u'quattro']\n",
-      "word  two  and translated word [u'due', u'tre', u'quattro', u'cinque', u'otto']\n",
-      "word  three  and translated word [u'tre', u'quattro', u'due', u'cinque', u'sette']\n",
-      "word  four  and translated word [u'tre', u'quattro', u'cinque', u'due', u'sette']\n",
-      "word  five  and translated word [u'cinque', u'tre', u'quattro', u'otto', u'dieci']\n"
+      "('word ', 'one', ' and translated word', [u'solo', u'due', u'tre', u'cinque', u'quattro'])\n",
+      "('word ', 'two', ' and translated word', [u'due', u'tre', u'quattro', u'cinque', u'otto'])\n",
+      "('word ', 'three', ' and translated word', [u'tre', u'quattro', u'due', u'cinque', u'sette'])\n",
+      "('word ', 'four', ' and translated word', [u'tre', u'quattro', u'cinque', u'due', u'sette'])\n",
+      "('word ', 'five', ' and translated word', [u'cinque', u'tre', u'quattro', u'otto', u'dieci'])\n"
      ]
     }
    ],
    "source": [
     "for k, v in translated_word.iteritems():\n",
-    "    print \"word \", k, \" and translated word\", v"
+    "    print (\"word \", k, \" and translated word\", v)"
    ]
   },
   {
@@ -280,8 +280,8 @@
     "editable": true
    },
    "source": [
-    "#### part two:\n",
-    "Let's look at some fruit translations. We use English words (apple, orange, grape, banana and mango) as test."
+    "#### Part two:\n",
+    "Let's look at the translation of some fruit words. We use the English words (apple, orange, grape, banana and mango) as a test."
    ]
   },
   {
@@ -310,7 +310,7 @@
     "source_word, target_word = zip(*words)\n",
     "translated_word = transmat.translate(source_word, 5)\n",
     "for k, v in translated_word.iteritems():\n",
-    "    print \"word \", k, \" and translated word\", v"
+    "    print (\"word \", k, \" and translated word\", v)"
    ]
   },
   {
@@ -320,8 +320,8 @@
     "editable": true
    },
    "source": [
-    "#### part three:\n",
-    "Let's look at some animal translations. We use English words (dog, pig, cat, horse and bird) as test."
+    "#### Part three:\n",
+    "Let's look at the translation of some animal words. We use the English words (dog, pig, cat, horse and bird) as a test."
    ]
   },
   {
@@ -350,7 +350,7 @@
     "source_word, target_word = zip(*words)\n",
     "translated_word = transmat.translate(source_word, 5)\n",
     "for k, v in translated_word.iteritems():\n",
-    "    print \"word \", k, \" and translated word\", v"
+    "    print (\"word \", k, \" and translated word\", v)"
    ]
   },
   {
@@ -370,7 +370,7 @@
     "editable": true
    },
    "source": [
-    "Testing the creation time, we extracted more word pairs from a dictionary built from Europarl([Europara, en-it](http://opus.lingfil.uu.se/)).we obtain about 20K word pairs and their coresponding word vectors.Or you can download from this.[word_dict.pkl](https://pan.baidu.com/s/1dF8HUX7)"
+    "To test the creation time, we extracted more word pairs from a dictionary built from Europarl ([Europarl, en-it](http://opus.lingfil.uu.se/)). We obtained about 20K word pairs and their corresponding word vectors, or you can download them here: [word_dict.pkl](https://pan.baidu.com/s/1dF8HUX7)"
    ]
   },
   {
@@ -395,7 +395,7 @@
     "word_dict = \"word_dict.pkl\"\n",
     "with utils.smart_open(word_dict, \"r\") as f:\n",
     "    word_pair = pickle.load(f)\n",
-    "print \"the length of word pair \", len(word_pair)"
+    "print (\"the length of word pair \", len(word_pair))"
    ]
   },
   {
@@ -717,8 +717,8 @@
     "editable": true
    },
    "source": [
-    "The figure shows that the word vectors for English number one to five and the corresponding Italian words uno to cinque have similar geometric arrangements. So the relationship between vector spaces that represent these tow languages can be captured by linear mapping. \n",
-    "If we know the translation of one and four from English to Spanish, we can learn the transformation matrix that can help us to translate five or other numbers."
+    "The figure shows that the word vectors for the English numbers one to five and the corresponding Italian words uno to cinque have similar geometric arrangements. So the relationship between the vector spaces that represent these two languages can be captured by a linear mapping. \n",
+    "If we know the translation of one to four from English to Italian, we can learn the transformation matrix that can help us translate five or other numbers into Italian."
    ]
   },
   {
@@ -744,7 +744,7 @@
     "en_words_vec = [source_word_vec[item[0]] for item in words]\n",
     "it_words_vec = [target_word_vec[item[1]] for item in words]\n",
     "\n",
-    "# translate the English word five to Spanish\n",
+    "# Translate the English word five to Italian\n",
     "translated_word = transmat.translate([en_words[4]], 3)\n",
     "print \"translation of five: \", translated_word\n",
     "\n",
@@ -958,7 +958,7 @@
     "editable": true
    },
    "source": [
-    "Let's see some animals word, the figue show that most of words are also share the similar geometric arrangements."
+    "Let's look at some animal words. The figure shows that most of the words also share similar geometric arrangements."
    ]
   },
   {
@@ -1129,7 +1129,7 @@
     "en_words_vec = [source_word_vec[item[0]] for item in words]\n",
     "it_words_vec = [target_word_vec[item[1]] for item in words]\n",
     "\n",
-    "# translate the English word birds to Spanish\n",
+    "# Translate the English word birds to Italian\n",
     "translated_word = transmat.translate([en_words[4]], 3)\n",
     "print \"translation of birds: \", translated_word\n",
     "\n",
@@ -1327,7 +1327,7 @@
     "editable": true
    },
    "source": [
-    "You probably will see that two kind of different color nodes, one for the English and the other for the Italian. For the translation of word `bird`, we return `top 3` similar words `[u'uccelli', u'garzette', u'iguane']`. We can easily see that the animals' words translation is also convincing as the numbers."
+    "You will probably see nodes in two different colors, one for English and the other for Italian. For the translation of the word `birds`, we return the `top 3` similar words `[u'uccelli', u'garzette', u'iguane']`. We can easily see that the translation of animal words is as convincing as that of the numbers."
    ]
   },
   {