This repository is a simple implementation of the paper "Poincaré Embeddings for Learning Hierarchical Representations" by Maximilian Nickel and Douwe Kiela of Facebook AI Research.
The paper introduces a model that learns vector representations of the nodes in a graph. It takes a list of relations between nodes, such as:
dataset = [['banana', 'fruit'], ['eatable_fruit', 'fruit']]
It then learns a vector for each node such that the distance between two nodes' vectors reflects how close the nodes are in the graph.
The novelty of the paper is its approach to learning hierarchical representations: it embeds the nodes into hyperbolic space, or more precisely into an n-dimensional Poincaré ball. The reason given is that hyperbolic space is better suited than the commonly used Euclidean space to capturing both the hierarchy and the similarity of nodes. For more background on what follows, please refer to the paper above.
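The model calculates the distance between two nodes' vectors with the following equation:
$$d(u, v) = \operatorname{arcosh}\left(1 + 2\,\frac{\lVert u - v \rVert^2}{(1 - \lVert u \rVert^2)(1 - \lVert v \rVert^2)}\right)$$
Where:
u, v are the multi-dimensional vectors of any two words in the dataset.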
Distances within the Poincaré ball change smoothly with respect to the location of the u and v vectors. This locality property of the Poincaré distance is key to finding continuous embeddings of hierarchies. For nodes close to the boundary of the ball, a displacement that is small in Euclidean terms corresponds to a large Poincaré distance.
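The distance described above can be sketched in plain Python. This is a minimal illustration and not the code in pytorch_scripts.py; the function name `poincare_distance` and the `eps` guard against division by zero are assumptions made for the example.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Poincaré distance between two points inside the unit ball.

    `eps` is a hypothetical numerical guard, not part of the formula.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Keep the denominator strictly positive for points very near the boundary.
    denom = max((1.0 - sq_norm_u) * (1.0 - sq_norm_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


# The same Euclidean gap (0.09) measured near the origin and near the boundary.
near_origin = poincare_distance([0.0, 0.0], [0.09, 0.0])
near_boundary = poincare_distance([0.9, 0.0], [0.99, 0.0])
```

In this sketch `near_boundary` comes out much larger than `near_origin`, which is the property that leaves room for the many leaves of a hierarchy near the edge of the ball.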
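The paper gives the following loss function:
$$\mathcal{L}(\Theta) = \sum_{(u, v) \in \mathcal{D}} \log \frac{e^{-d(u, v)}}{\sum_{v' \in \mathcal{N}(u)} e^{-d(u, v')}}$$
Where:
D is the set of observed relations (u, v) between nodes.
N(u) is a set of negative examples (nodes not related to the node u). The paper suggests sampling 10 negative examples per positive example during training. This loss function pulls connected nodes closer together and pushes unconnected nodes further apart.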
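The loss for a single positive pair can be sketched as follows. This is an illustration under stated assumptions, not the repository's training code: the helper names are hypothetical, the positive pair is included in the denominator so that the expression is a softmax over one positive and the sampled negatives, and the value returned is the negative log-probability so that smaller is better.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Poincaré distance between two points inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = max((1.0 - sq_norm_u) * (1.0 - sq_norm_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


def pair_loss(u, v, negatives):
    """Negative log-softmax of -d(u, v) over the positive and the negatives."""
    positive_score = -poincare_distance(u, v)
    scores = [positive_score] + [-poincare_distance(u, n) for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores)
    log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
    return -(positive_score - log_denominator)


u, v = [0.1, 0.0], [0.15, 0.0]
loss_far_negative = pair_loss(u, v, [[-0.8, 0.0]])
loss_near_negative = pair_loss(u, v, [[0.12, 0.0]])
```

A negative example that sits close to u contributes more to the denominator, so `loss_near_negative` is larger than `loss_far_negative`; reducing the loss therefore moves unrelated nodes away from u.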
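The paper optimizes the model embeddings with the following update rule:
$$\theta_{t+1} = \operatorname{proj}\left(\theta_t - \eta_t \frac{(1 - \lVert \theta_t \rVert^2)^2}{4} \nabla_E\right)$$
Where:
η_t is the learning rate and ∇_E is the Euclidean gradient of the loss with respect to θ_t.
proj(θ) constrains the embeddings to remain within the Poincaré ball via the following equation, in which ε is a small constant:
$$\operatorname{proj}(\theta) = \begin{cases} \theta / \lVert \theta \rVert - \varepsilon & \text{if } \lVert \theta \rVert \ge 1 \\ \theta & \text{otherwise} \end{cases}$$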
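One update step with the projection can be sketched as below. The names `project` and `rsgd_step`, the value `EPS = 1e-5`, and the choice to rescale an escaped point to norm 1 - EPS are assumptions made for this example rather than details taken from the repository.

```python
import math

EPS = 1e-5  # assumed margin that keeps points strictly inside the ball


def project(theta, eps=EPS):
    """Pull a point back inside the unit ball if its norm reached 1 or more."""
    norm = math.sqrt(sum(x * x for x in theta))
    if norm >= 1.0:
        scale = (1.0 - eps) / norm
        return [x * scale for x in theta]
    return list(theta)


def rsgd_step(theta, euclidean_grad, learning_rate):
    """Scale the Euclidean gradient by (1 - ||theta||^2)^2 / 4, step, project."""
    sq_norm = sum(x * x for x in theta)
    metric_scale = (1.0 - sq_norm) ** 2 / 4.0
    updated = [
        x - learning_rate * metric_scale * g
        for x, g in zip(theta, euclidean_grad)
    ]
    return project(updated)


small_step = rsgd_step([0.0, 0.0], [1.0, 0.0], learning_rate=0.4)
huge_step = rsgd_step([0.0, 0.0], [-100.0, 0.0], learning_rate=1.0)
```

The scaling factor shrinks towards zero as a point approaches the boundary, so steps become smaller there, and the projection catches any update that would still leave the ball (as `huge_step` would without it).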
This repository contains the following:
- The paper "Poincaré Embeddings for Learning Hierarchical Representations"
- Sample training data in data/*.tsv
- The implementation code: pytorch_scripts.py and prog.py
To run and train the model, make sure the following libraries are installed for your Python 3 version:
- PyTorch
- NLTK
- Matplotlib
The repo generates a .tsv file of related word pairs drawn from the WordNet library. You can also supply your own .tsv file of word pairs by saving it in the data/ folder, from where it is fed to the model.
Once everything is set up, open the prog.py script and change the following variable so that it matches the name of the file you created in the data/ folder. There are several other parameters in prog.py that you can adjust to your own preference.
Setting the following variable as shown below produces a result similar to the following graph:
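A custom pairs file could be written and read back as in the sketch below. The layout is an assumption (one pair per line, two tab-separated columns); check it against the sample files in data/ before relying on it. The helper names are hypothetical, and an in-memory buffer stands in for a real file in that folder.

```python
import csv
import io


def write_pairs(pairs, handle):
    """Write (word, related_word) pairs as tab-separated lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)


def read_pairs(handle):
    """Read tab-separated pairs, skipping lines with fewer than two columns."""
    reader = csv.reader(handle, delimiter="\t")
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


# In-memory stand-in for a .tsv file in the data/ folder.
buffer = io.StringIO()
write_pairs([("banana", "fruit"), ("eatable_fruit", "fruit")], buffer)
buffer.seek(0)
pairs = read_pairs(buffer)
```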
- Example (A)
word = 'brown'
- Example (B)
word = 'fruit'
GNU GENERAL PUBLIC LICENSE