Implementation for MSc thesis AI

Installing the project

Clone the repository. Make sure to clone the submodule(s) as well by using the --recurse-submodules flag. You can also clone the submodule(s) afterwards by using:
```
git submodule update --init
```
Install the project's dependencies. This project uses pipenv for dependency management.
```
pipenv install --dev
```
1. You can install pipenv if needed by running pip3 install pipenv (pip on Windows)
2. The --dev flag makes sure all development dependencies are also installed. If you only plan to run the project and not develop it, you can omit the flag.
3. You can also install all dependencies through pip directly, using the provided requirements.txt file. It is recommended to use a virtual environment nonetheless.
Ensure the necessary project resources are installed.
1. You can download WordNet 3.0 here. The default location to extract is data/WordNet-3.0.
2. The WSD Evaluation Framework data can be downloaded from here. The default location to extract is data/wsdeval/WSD_Evaluation_Framework.
3. The XL-WSD data can be downloaded from here ("Data"). The default location to extract is data/xl-wsd.
Run the CLI of the project. This can be done through pipenv.
```
pipenv run python app
```
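Before running the CLI, it can help to confirm that the resources were extracted to the default locations listed above. The following is a minimal sketch, not part of the project: the `DEFAULT_RESOURCES` mapping and the `missing_resources` helper are hypothetical names, and the check assumes the script is run from the repository root.

```python
from pathlib import Path
from typing import Dict, List

# Default extraction locations as listed in the installation steps,
# relative to the repository root.
DEFAULT_RESOURCES: Dict[str, str] = {
    "WordNet 3.0": "data/WordNet-3.0",
    "WSD Evaluation Framework": "data/wsdeval/WSD_Evaluation_Framework",
    "XL-WSD": "data/xl-wsd",
}


def missing_resources(root: Path) -> List[str]:
    """Return the names of resources whose default directory is absent under root."""
    return [
        name
        for name, relative_path in DEFAULT_RESOURCES.items()
        if not (root / relative_path).is_dir()
    ]


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    missing = missing_resources(Path.cwd())
    if missing:
        print("Missing resources: " + ", ".join(missing))
    else:
        print("All default resource directories are present.")
```

If a resource was extracted somewhere other than its default location, this check reports it as missing even though the project may still be configurable to find it.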

Structure

The implementation code can be found in three places. Firstly, there is the CLI, which can be run through the pipenv run python app command. The following functions are available:

vectorise is used to create CWEs from target sentences.
prep-ewiser prepares the CoarseWSD-20 dataset to be interpreted by the EWISER model.
split-embeddings splits the CoarseWSD-20 embeddings into synset state clouds using ground truth.
lemmas extracts all lemmata from a WSD Evaluation Framework variant and saves them as a csv.
filter-bookcorpus filters the original BookCorpus set of sentences into those that use a given set of nouns.
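To illustrate the kind of operation filter-bookcorpus performs, here is a minimal sketch of filtering sentences by a set of nouns. It is not the project's implementation: the `filter_sentences` function, the whole-word, case-insensitive matching, and the sample sentences are all assumptions made for illustration, and the actual command may handle inflected forms or tokenisation differently.

```python
import re
from typing import Iterable, Iterator, Set


def filter_sentences(sentences: Iterable[str], nouns: Set[str]) -> Iterator[str]:
    """Yield only the sentences that contain at least one of the given nouns."""
    if not nouns:
        # Nothing to match against, so no sentence qualifies.
        return
    # One case-insensitive pattern that matches any noun as a whole word.
    alternatives = "|".join(re.escape(noun) for noun in sorted(nouns))
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    for sentence in sentences:
        if pattern.search(sentence):
            yield sentence


if __name__ == "__main__":
    # Hypothetical sample data, not taken from BookCorpus.
    corpus = [
        "The bank raised its interest rates.",
        "She walked along the river.",
        "He sat on the banking committee.",
    ]
    for kept in filter_sentences(corpus, {"bank", "spring"}):
        print(kept)
```

Whole-word matching means that "banking" does not count as a use of "bank" in this sketch; plural forms such as "banks" are also not matched.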

Secondly, there are a number of interactive scripts in the app/ directory. These are designed to work in your editor's scientific mode, using "# %%" to mark individual code cells. This should work at least in VSCode and JetBrains PyCharm.
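As an illustration of the cell layout, the sketch below shows a script split into three cells, assuming the cell marker is the `# %%` comment that VSCode and PyCharm recognise. The variable names and data are hypothetical and do not come from the scripts in app/.

```python
# %%
# First cell: define some hypothetical input data.
sentences = ["The bank raised its rates.", "She sat on the river bank."]

# %%
# Second cell: a computation that can be re-run on its own
# without re-running the cell above.
token_counts = [len(sentence.split()) for sentence in sentences]

# %%
# Third cell: inspect the result.
print(token_counts)
```

Because the markers are ordinary comments, the file also runs top to bottom as a normal Python script.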

Thirdly, the EWISER submodule (repos/ewiser) has some interfacing scripts in its bin directory. These are used to interface with the EWISER implementation (which was not made by me; see the submodule's README).

Running unit tests

Run:

pipenv run python -m unittest discover tests "*.py"
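Because the discovery pattern is "*.py", any Python file in the tests directory that defines `unittest.TestCase` subclasses is picked up by this command. The following is a minimal sketch of such a file, for example a hypothetical tests/example.py; the `normalise_lemma` helper is invented here only so the test has something to check and is not part of the project.

```python
import unittest


def normalise_lemma(lemma: str) -> str:
    """Hypothetical helper: strip surrounding whitespace and lowercase a lemma."""
    return lemma.strip().lower()


class NormaliseLemmaTest(unittest.TestCase):
    """Test methods must start with 'test' to be collected by unittest."""

    def test_strips_and_lowercases(self) -> None:
        self.assertEqual(normalise_lemma("  Bank "), "bank")

    def test_leaves_clean_input_unchanged(self) -> None:
        self.assertEqual(normalise_lemma("river"), "river")
```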

Static analysers in use

Black code formatter
Mypy static type analyser
Flake8 linter
(optional) Pylint linter
