- Clone the repository. Make sure to clone the submodule(s) as well by using the `--recurse-submodules` flag. If you already cloned without it, you can initialise the submodule(s) afterwards by running `git submodule update --init`.
- Install the project's dependencies. This project uses `pipenv` for dependency management: `pipenv install --dev`
  - If needed, you can install pipenv by running `pip3 install pipenv` (`pip` on Windows).
  - The `--dev` flag makes sure all development dependencies are also installed. If you only plan to run the project and not develop it, you can omit it.
  - You can also install all dependencies directly through `pip`, using the provided `requirements.txt` file (`pip install -r requirements.txt`). Using a virtual environment is still recommended.
- Ensure the necessary project resources are downloaded and extracted:
  - You can download WordNet 3.0 here. The default location to extract it to is `data/WordNet-3.0`.
  - The WSD Evaluation Framework data can be downloaded from here. The default location to extract it to is `data/wsdeval/WSD_Evaluation_Framework`.
  - The XL-WSD data can be downloaded from here ("Data"). The default location to extract it to is `data/xl-wsd`.
- Run the project's CLI. This can be done through pipenv: `pipenv run python app`
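The setup steps above can be double-checked with a small script. The sketch below is hypothetical and not shipped with the project: it assumes it is run from the repository root, that the resources were extracted to their default locations, and that a non-empty `repos/ewiser` directory means the submodule was cloned. The `check_setup` and `is_populated` names are illustrative.

```python
import sys
from pathlib import Path
from typing import List

# Default resource locations from the setup steps (assumed unchanged).
DEFAULT_RESOURCES = {
    "WordNet 3.0": Path("data/WordNet-3.0"),
    "WSD Evaluation Framework": Path("data/wsdeval/WSD_Evaluation_Framework"),
    "XL-WSD": Path("data/xl-wsd"),
}

# Location of the EWISER submodule inside the repository.
EWISER_SUBMODULE = Path("repos/ewiser")


def in_virtualenv() -> bool:
    """True when running inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


def is_populated(path: Path) -> bool:
    """True when `path` is a directory containing at least one entry."""
    return path.is_dir() and any(path.iterdir())


def check_setup(root: Path = Path(".")) -> List[str]:
    """Return a list of problems; an empty list means everything looks fine."""
    problems = []
    # An uninitialised submodule leaves an empty (or missing) directory.
    if not is_populated(root / EWISER_SUBMODULE):
        problems.append(
            f"{EWISER_SUBMODULE} is empty; run `git submodule update --init`"
        )
    for name, rel_path in DEFAULT_RESOURCES.items():
        if not is_populated(root / rel_path):
            problems.append(f"{name} not found at {rel_path}")
    if not in_virtualenv():
        problems.append("not running inside a virtual environment")
    return problems
```

Calling `print(check_setup())` through `pipenv run python` from the repository root should print an empty list once every step has been completed.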
Implementation code can be found in three places. Firstly, there is the CLI, which can be run with the `pipenv run python app` command. The following commands are available:
- `vectorise` creates CWEs from target sentences.
- `prep-ewiser` prepares the CoarseWSD-20 dataset to be interpreted by the EWISER model.
- `split-embeddings` splits the CoarseWSD-20 embeddings into synset state clouds using the ground truth.
- `lemmas` extracts all lemmata from a WSD Evaluation Framework variant and saves them as a CSV file.
- `filter-bookcorpus` filters the original BookCorpus sentences down to those that use a given set of nouns.
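When scripting several runs, the documented invocation can be wrapped from Python. This is a hypothetical helper, not part of the project: `build_command` and `run_command` are illustrative names, the helper only validates the command name against the list above, and any extra arguments are passed through unchanged because the commands' own arguments are not documented here.

```python
import subprocess
from typing import List, Sequence

# The CLI commands documented above.
COMMANDS = frozenset(
    {"vectorise", "prep-ewiser", "split-embeddings", "lemmas", "filter-bookcorpus"}
)


def build_command(command: str, args: Sequence[str] = ()) -> List[str]:
    """Build the argv for `pipenv run python app <command> [args...]`."""
    if command not in COMMANDS:
        raise ValueError(
            f"unknown command {command!r}; expected one of {sorted(COMMANDS)}"
        )
    return ["pipenv", "run", "python", "app", command, *args]


def run_command(command: str, args: Sequence[str] = ()) -> int:
    """Run a CLI command (from the repository root) and return its exit code."""
    return subprocess.run(build_command(command, args), check=False).returncode
```

Validating the name up front turns a typo such as `vectorize` into an immediate `ValueError` instead of a failed pipenv subprocess.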
Secondly, there are a number of interactive scripts in the `app/` directory. These are designed to work in your editor's scientific mode, using `# %%` comments to delimit individual code cells. This should work in at least VSCode and JetBrains PyCharm.
Thirdly, the EWISER submodule (`repos/ewiser`) has some interfacing scripts in its `bin` directory. These are used to interface with the EWISER implementation (which was not made by me; see the submodule's README).
To run the unit tests, run:
`pipenv run python -m unittest discover tests "*.py"`
The following tools are used for formatting, type checking and linting:
- Black code formatter
- Mypy static type analyser
- Flake8 linter
- (optional) Pylint linter
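These tools can be run in one go with a small wrapper. The sketch below is hypothetical: the `app` and `tests` targets are assumptions about which directories should be checked, the tools are assumed to be installed through `pipenv install --dev`, and Pylint is off by default since it is marked as optional. Run it with `pipenv run python` so that the tools from the virtual environment are found.

```python
import shutil
import subprocess
from typing import Dict, List, Sequence

# Assumed default targets; adjust to the directories you want checked.
DEFAULT_TARGETS = ("app", "tests")


def lint_commands(
    targets: Sequence[str], include_pylint: bool = False
) -> Dict[str, List[str]]:
    """Map each tool name to the command line that checks `targets`."""
    commands = {
        # --check reports files Black would reformat, without changing them.
        "black": ["black", "--check", *targets],
        "mypy": ["mypy", *targets],
        "flake8": ["flake8", *targets],
    }
    if include_pylint:
        # Older Pylint versions may need package directories (with __init__.py).
        commands["pylint"] = ["pylint", *targets]
    return commands


def run_linters(
    targets: Sequence[str] = DEFAULT_TARGETS, include_pylint: bool = False
) -> Dict[str, int]:
    """Run each installed tool and return its exit code; missing tools are skipped."""
    results = {}
    for tool, cmd in lint_commands(targets, include_pylint).items():
        if shutil.which(cmd[0]) is None:
            print(f"skipping {tool}: not installed")
            continue
        results[tool] = subprocess.run(cmd, check=False).returncode
    return results
```

Every tool runs even if an earlier one fails, so a single invocation reports all problems; a non-zero value in the returned dictionary marks the tools that need attention.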