Commit

Fix typo in translation_matrix notebook (#1598)
robotcator authored and menshikh-iv committed Sep 26, 2017
1 parent 33a3ef2 commit 0a2c05d
Showing 1 changed file with 42 additions and 42 deletions.
84 changes: 42 additions & 42 deletions docs/notebooks/translation_matrix.ipynb
@@ -27,7 +27,7 @@
"editable": true
},
"source": [
"Suppose we are given a setofword pairs and their associated vector representaion $\\{x_{i},z_{i}\\}_{i=1}^{n}$, where $x_{i} \\in R^{d_{1}}$ is the distibuted representation of word $i$ in the source language, and ${z_{i} \\in R^{d_{2}}}$ is the vector representation of its translation. Our goal is to find a transformation matrix $W$ such that $Wx_{i}$ approximates $z_{i}$. In practice, $W$ can be learned by the following optimization prolem:\n",
"Suppose we are given a set of word pairs and their associated vector representaion $\\{x_{i},z_{i}\\}_{i=1}^{n}$, where $x_{i} \\in R^{d_{1}}$ is the distibuted representation of word $i$ in the source language, and ${z_{i} \\in R^{d_{2}}}$ is the vector representation of its translation. Our goal is to find a transformation matrix $W$ such that $Wx_{i}$ approximates $z_{i}$. In practice, $W$ can be learned by the following optimization prolem:\n",
"\n",
"<center>$\\min \\limits_{W} \\sum \\limits_{i=1}^{n} ||Wx_{i}-z_{i}||^{2}$</center>"
]
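
Reviewer note on the hunk above: the objective $\min_{W} \sum_{i=1}^{n} ||Wx_{i}-z_{i}||^{2}$ can be illustrated on toy data. The sketch below is a minimal pure-Python version using plain gradient descent. The function names, the learning rate, and the 2-dimensional data are all assumptions made for illustration; this is not how gensim's `TranslationMatrix` is implemented.

```python
# Toy illustration of min_W sum_i ||W x_i - z_i||^2 via gradient descent.
# Hypothetical helper names and data; not gensim's implementation.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def fit_translation_matrix(xs, zs, lr=0.1, steps=500):
    """Learn W (d2 x d1) so that W x_i approximates z_i for every pair."""
    d2, d1 = len(zs[0]), len(xs[0])
    W = [[0.0] * d1 for _ in range(d2)]
    for _ in range(steps):
        # Gradient of the squared error: 2 * sum_i (W x_i - z_i) x_i^T
        grad = [[0.0] * d1 for _ in range(d2)]
        for x, z in zip(xs, zs):
            residual = [p - t for p, t in zip(matvec(W, x), z)]
            for i in range(d2):
                for j in range(d1):
                    grad[i][j] += 2.0 * residual[i] * x[j]
        for i in range(d2):
            for j in range(d1):
                W[i][j] -= lr * grad[i][j]
    return W

# Synthetic "word pairs": targets are generated from a known matrix,
# so the fit should recover that matrix.
true_W = [[2.0, 0.0], [1.0, 3.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
zs = [matvec(true_W, x) for x in xs]
W = fit_translation_matrix(xs, zs)
```

With real 300-dimensional embeddings a closed-form least-squares solver would normally be preferable to this loop; the loop is used here only to keep the sketch dependency-free.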
@@ -56,7 +56,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {
"collapsed": false,
"deletable": true,
@@ -86,16 +86,16 @@
"editable": true
},
"source": [
"For this tutorial, we'll be training our model using the English -> Italian word pairs from the OPUS collection. This corpus contains 5000 word pairs. Each pair is a English word and corresponding Italian word.\n",
"For this tutorial, we'll train our model using the English -> Italian word pairs from the OPUS collection. This corpus contains 5000 word pairs. Each word pair is English word with corresponding Italian word.\n",
"\n",
"dataset download: \n",
"Dataset download: \n",
"\n",
"[OPUS_en_it_europarl_train_5K.txt](https://pan.baidu.com/s/1nuIuQoT)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"metadata": {
"collapsed": false,
"deletable": true,
@@ -115,7 +115,7 @@
"\n",
"with utils.smart_open(train_file, \"r\") as f:\n",
" word_pair = [tuple(utils.to_unicode(line).strip().split()) for line in f]\n",
"print word_pair[:10]"
"print (word_pair[:10])"
]
},
{
@@ -125,10 +125,10 @@
"editable": true
},
"source": [
"This tutorial uses 300-dimensional vectors of English words as source and vectors of Italian words as target.(those vector trained by the word2vec toolkit with cbow. The context window was set 5 words to either side of the target,\n",
"This tutorial uses 300-dimensional vectors of English words as source and vectors of Italian words as target. (Those vector trained by the word2vec toolkit with cbow. The context window was set 5 words to either side of the target,\n",
"the sub-sampling option was set to 1e-05 and estimate the probability of a target word with the negative sampling method, drawing 10 samples from the noise distribution)\n",
"\n",
"dataset download:\n",
"Download dataset:\n",
"\n",
"[EN.200K.cbow1_wind5_hs0_neg10_size300_smpl1e-05.txt](https://pan.baidu.com/s/1nv3bYel)\n",
"\n",
@@ -137,7 +137,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"metadata": {
"collapsed": false,
"deletable": true,
@@ -152,7 +152,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"metadata": {
"collapsed": true,
"deletable": true,
@@ -172,12 +172,12 @@
"editable": true
},
"source": [
"training the translation matrix"
"Train the translation matrix"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"metadata": {
"collapsed": false,
"deletable": true,
@@ -188,14 +188,14 @@
"name": "stdout",
"output_type": "stream",
"text": [
"the shape of translation matrix is: (300, 300)\n"
"('the shape of translation matrix is: ', (300, 300))\n"
]
}
],
"source": [
"transmat = translation_matrix.TranslationMatrix(source_word_vec, target_word_vec, word_pair)\n",
"transmat.train(word_pair)\n",
"print \"the shape of translation matrix is: \", transmat.translation_matrix.shape"
"print (\"the shape of translation matrix is: \", transmat.translation_matrix.shape)"
]
},
{
@@ -205,7 +205,7 @@
"editable": true
},
"source": [
"Prediction Time: for any given new word, we can map it to the other language space by coputing $z = Wx$, then we find the word whose representation is closet to z in the target language space, using consine similarity as the distance metric."
"Prediction Time: For any given new word, we can map it to the other language space by coputing $z = Wx$, then we find the word whose representation is closet to z in the target language space, using consine similarity as the distance metric."
]
},
{
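
Reviewer note on the hunk above: the prediction step it describes (compute $z = Wx$, then pick the target word whose vector is closest to $z$ by cosine similarity) can be sketched as follows. The `translate` helper, the 2-dimensional vectors, and the tiny vocabulary are hypothetical and chosen only for illustration; they do not reproduce the signature or internals of gensim's `TranslationMatrix.translate`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def translate(W, x, target_vectors, topn=1):
    """Map source vector x into the target space and rank target words."""
    # z = W x
    z = [sum(w * x_j for w, x_j in zip(row, x)) for row in W]
    # Sort target words by similarity to z, most similar first.
    ranked = sorted(
        target_vectors,
        key=lambda word: cosine(z, target_vectors[word]),
        reverse=True,
    )
    return ranked[:topn]

# Hypothetical toy setup: W swaps the two axes.
W = [[0.0, 1.0], [1.0, 0.0]]
target_vectors = {"uno": [0.0, 1.0], "due": [1.0, 0.0], "tre": [1.0, 1.0]}
result = translate(W, [1.0, 0.1], target_vectors, topn=2)
```

Ranking by cosine similarity ignores vector length, so only the direction of the mapped vector matters when choosing candidates.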
@@ -215,13 +215,13 @@
"editable": true
},
"source": [
"#### part one:\n",
"Let's look at some number translation. We use English words (one, two, three, four and five) as test."
"#### Part one:\n",
"Let's look at some vocabulary of numbers translation. We use English words (one, two, three, four and five) as test."
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"metadata": {
"collapsed": false,
"deletable": true,
@@ -232,23 +232,23 @@
"name": "stderr",
"output_type": "stream",
"text": [
"/home/robotcator/PycharmProjects/gensim/gensim/models/translation_matrix.py:224: UserWarning: The parameter source_lang_vec isn't specified, use the model's source language word vector as default.\n",
"/home/robotcator/PycharmProjects/gensim/gensim/models/translation_matrix.py:223: UserWarning: The parameter source_lang_vec isn't specified, use the model's source language word vector as default.\n",
" warnings.warn(\"The parameter source_lang_vec isn't specified, use the model's source language word vector as default.\")\n",
"/home/robotcator/PycharmProjects/gensim/gensim/models/translation_matrix.py:228: UserWarning: The parameter target_lang_vec isn't specified, use the model's target language word vector as default.\n",
"/home/robotcator/PycharmProjects/gensim/gensim/models/translation_matrix.py:227: UserWarning: The parameter target_lang_vec isn't specified, use the model's target language word vector as default.\n",
" warnings.warn(\"The parameter target_lang_vec isn't specified, use the model's target language word vector as default.\")\n"
]
}
],
"source": [
"# the piar is (English, Italian), we can see whether the translated word is right or not \n",
"# The pair is in the form of (English, Italian), we can see whether the translated word is correct\n",
"words = [(\"one\", \"uno\"), (\"two\", \"due\"), (\"three\", \"tre\"), (\"four\", \"quattro\"), (\"five\", \"cinque\")]\n",
"source_word, target_word = zip(*words)\n",
"translated_word = transmat.translate(source_word, 5, )"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 8,
"metadata": {
"collapsed": false,
"deletable": true,
@@ -260,17 +260,17 @@
"name": "stdout",
"output_type": "stream",
"text": [
"word one and translated word [u'solo', u'due', u'tre', u'cinque', u'quattro']\n",
"word two and translated word [u'due', u'tre', u'quattro', u'cinque', u'otto']\n",
"word three and translated word [u'tre', u'quattro', u'due', u'cinque', u'sette']\n",
"word four and translated word [u'tre', u'quattro', u'cinque', u'due', u'sette']\n",
"word five and translated word [u'cinque', u'tre', u'quattro', u'otto', u'dieci']\n"
"('word ', 'one', ' and translated word', [u'solo', u'due', u'tre', u'cinque', u'quattro'])\n",
"('word ', 'two', ' and translated word', [u'due', u'tre', u'quattro', u'cinque', u'otto'])\n",
"('word ', 'three', ' and translated word', [u'tre', u'quattro', u'due', u'cinque', u'sette'])\n",
"('word ', 'four', ' and translated word', [u'tre', u'quattro', u'cinque', u'due', u'sette'])\n",
"('word ', 'five', ' and translated word', [u'cinque', u'tre', u'quattro', u'otto', u'dieci'])\n"
]
}
],
"source": [
"for k, v in translated_word.iteritems():\n",
" print \"word \", k, \" and translated word\", v"
" print (\"word \", k, \" and translated word\", v)"
]
},
{
@@ -280,8 +280,8 @@
"editable": true
},
"source": [
"#### part two:\n",
"Let's look at some fruit translations. We use English words (apple, orange, grape, banana and mango) as test."
"#### Part two:\n",
"Let's look at some vocabulary of fruits translation. We use English words (apple, orange, grape, banana and mango) as test."
]
},
{
@@ -310,7 +310,7 @@
"source_word, target_word = zip(*words)\n",
"translated_word = transmat.translate(source_word, 5)\n",
"for k, v in translated_word.iteritems():\n",
" print \"word \", k, \" and translated word\", v"
" print (\"word \", k, \" and translated word\", v)"
]
},
{
@@ -320,8 +320,8 @@
"editable": true
},
"source": [
"#### part three:\n",
"Let's look at some animal translations. We use English words (dog, pig, cat, horse and bird) as test."
"#### Part three:\n",
"Let's look at some vocabulary of animals translation. We use English words (dog, pig, cat, horse and bird) as test."
]
},
{
@@ -350,7 +350,7 @@
"source_word, target_word = zip(*words)\n",
"translated_word = transmat.translate(source_word, 5)\n",
"for k, v in translated_word.iteritems():\n",
" print \"word \", k, \" and translated word\", v"
" print (\"word \", k, \" and translated word\", v)"
]
},
{
@@ -370,7 +370,7 @@
"editable": true
},
"source": [
"Testing the creation time, we extracted more word pairs from a dictionary built from Europarl([Europara, en-it](http://opus.lingfil.uu.se/)).we obtain about 20K word pairs and their coresponding word vectors.Or you can download from this.[word_dict.pkl](https://pan.baidu.com/s/1dF8HUX7)"
"Testing the creation time, we extracted more word pairs from a dictionary built from Europarl([Europara, en-it](http://opus.lingfil.uu.se/)). We obtain about 20K word pairs and their coresponding word vectors or you can download from this.[word_dict.pkl](https://pan.baidu.com/s/1dF8HUX7)"
]
},
{
Expand All @@ -395,7 +395,7 @@
"word_dict = \"word_dict.pkl\"\n",
"with utils.smart_open(word_dict, \"r\") as f:\n",
" word_pair = pickle.load(f)\n",
"print \"the length of word pair \", len(word_pair)"
"print (\"the length of word pair \", len(word_pair))"
]
},
{
@@ -717,8 +717,8 @@
"editable": true
},
"source": [
"The figure shows that the word vectors for English number one to five and the corresponding Italian words uno to cinque have similar geometric arrangements. So the relationship between vector spaces that represent these tow languages can be captured by linear mapping. \n",
"If we know the translation of one and four from English to Spanish, we can learn the transformation matrix that can help us to translate five or other numbers."
"The figure shows that the word vectors for English number one to five and the corresponding Italian words uno to cinque have similar geometric arrangements. So the relationship between vector spaces that represent these two languages can be captured by linear mapping. \n",
"If we know the translation of one to four from English to Italian, we can learn the transformation matrix that can help us to translate five or other numbers to the Italian word."
]
},
{
@@ -744,7 +744,7 @@
"en_words_vec = [source_word_vec[item[0]] for item in words]\n",
"it_words_vec = [target_word_vec[item[1]] for item in words]\n",
"\n",
"# translate the English word five to Spanish\n",
"# Translate the English word five to Italian word\n",
"translated_word = transmat.translate([en_words[4]], 3)\n",
"print \"translation of five: \", translated_word\n",
"\n",
@@ -958,7 +958,7 @@
"editable": true
},
"source": [
"Let's see some animals word, the figue show that most of words are also share the similar geometric arrangements."
"Let's see some animal words, the figue shows that most of words are also share the similar geometric arrangements."
]
},
{
@@ -1129,7 +1129,7 @@
"en_words_vec = [source_word_vec[item[0]] for item in words]\n",
"it_words_vec = [target_word_vec[item[1]] for item in words]\n",
"\n",
"# translate the English word birds to Spanish\n",
"# Translate the English word birds to Italian word\n",
"translated_word = transmat.translate([en_words[4]], 3)\n",
"print \"translation of birds: \", translated_word\n",
"\n",
@@ -1327,7 +1327,7 @@
"editable": true
},
"source": [
"You probably will see that two kind of different color nodes, one for the English and the other for the Italian. For the translation of word `bird`, we return `top 3` similar words `[u'uccelli', u'garzette', u'iguane']`. We can easily see that the animals' words translation is also convincing as the numbers."
"You probably will see that two kind of different color nodes, one for the English and the other for the Italian. For the translation of word `birds`, we return `top 3` similar words `[u'uccelli', u'garzette', u'iguane']`. We can easily see that the animals' words translation is also convincing as the numbers."
]
},
{