
jesper-sk/msc-thesis-ai-imp


Code style: black

Implementation for MSc thesis AI

Installing the project

  1. Clone the repository, making sure to include the submodule(s) by passing the --recurse-submodules flag. If you have already cloned it without them, you can fetch the submodule(s) afterwards with:

    git submodule update --init
  2. Install the project's dependencies. This project uses pipenv for dependency management.

    pipenv install --dev
    1. If needed, install pipenv by running pip3 install pipenv (pip install pipenv on Windows).
    2. The --dev flag ensures that all development dependencies are installed as well. If you only plan to run the project rather than develop it, you can omit it.
    3. Alternatively, you can install all dependencies with pip directly from the provided requirements.txt file (pip install -r requirements.txt). A virtual environment is still recommended.
  3. Ensure the necessary project resources are installed.

    1. You can download WordNet 3.0 here. The default extraction location is data/WordNet-3.0.
    2. The WSD Evaluation Framework data can be downloaded from here. The default extraction location is data/wsdeval/WSD_Evaluation_Framework.
    3. The XL-WSD data can be downloaded from here ("Data"). The default extraction location is data/xl-wsd.
  4. Run the project's CLI through pipenv:

    pipenv run python app
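To check that step 3 left the resources where the code looks by default, a small script along the following lines may help. It is a minimal sketch, not part of the repository: the missing_resources helper and the DEFAULT_RESOURCES mapping are hypothetical, and the script assumes the default extraction paths listed above, resolved relative to the repository root.

```python
from pathlib import Path
from typing import List

# Default extraction locations from the installation steps above.
# The dictionary keys are illustrative labels, not names used by the project.
DEFAULT_RESOURCES = {
    "WordNet 3.0": Path("data/WordNet-3.0"),
    "WSD Evaluation Framework": Path("data/wsdeval/WSD_Evaluation_Framework"),
    "XL-WSD": Path("data/xl-wsd"),
}


def missing_resources(root: Path = Path(".")) -> List[str]:
    """Return the labels of resources whose default directory is absent under root."""
    return [
        name
        for name, rel_path in DEFAULT_RESOURCES.items()
        if not (root / rel_path).is_dir()
    ]


if __name__ == "__main__":
    missing = missing_resources()
    if missing:
        print("Missing resources:", ", ".join(missing))
    else:
        print("All resources found.")
```

Run from the repository root, it prints which of the three directories still need to be downloaded and extracted.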

Structure

The implementation code lives in three places. Firstly, there is the CLI, which can be run with the pipenv run python app command. It provides the following subcommands:

  • vectorise is used to create CWEs from target sentences.
  • prep-ewiser prepares the CoarseWSD-20 dataset to be interpreted by the EWISER model.
  • split-embeddings splits the CoarseWSD-20 embeddings into synset state clouds using the ground-truth annotations.
  • lemmas extracts all lemmata from a WSD Evaluation Framework variant and saves them as a CSV file.
  • filter-bookcorpus filters the original BookCorpus set of sentences into those that use a given set of nouns.
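The idea behind filter-bookcorpus (keeping only the sentences that use a given set of nouns) can be sketched as below. This is a hypothetical illustration, not the project's code: the filter_sentences function is made up for this example, and it assumes simple lowercase word tokenisation with exact word matches, so inflected forms such as "banks" do not match "bank". The actual subcommand may tokenise or lemmatise differently.

```python
import re
from typing import Iterable, Iterator, Set

# Assumed tokeniser: lowercase alphabetic runs, keeping inner apostrophes.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def filter_sentences(sentences: Iterable[str], nouns: Set[str]) -> Iterator[str]:
    """Yield only the sentences that contain at least one of the given nouns."""
    targets = {noun.lower() for noun in nouns}
    for sentence in sentences:
        tokens = set(TOKEN_RE.findall(sentence.lower()))
        if tokens & targets:  # non-empty intersection: at least one target noun occurs
            yield sentence


if __name__ == "__main__":
    corpus = [
        "She sat on the bank of the river.",
        "He walked home in the rain.",
        "The crane lifted the steel beam.",
    ]
    for kept in filter_sentences(corpus, {"bank", "crane"}):
        print(kept)
```

Because filter_sentences is a generator, it can stream a large sentence file line by line without loading the whole corpus into memory.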

Secondly, the app/ directory contains a number of interactive scripts. These are designed for your editor's scientific (interactive) mode, with "# %%" marking the start of each code cell. This works at least in VSCode and JetBrains PyCharm.
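As an illustration of the cell format, a hypothetical interactive script might look like this. VSCode and PyCharm recognise a "# %%" comment line as the start of a cell. The data and variable names here are invented for the sketch and do not come from the app/ scripts. Each cell can be run on its own in the editor, while the file still runs top to bottom as an ordinary Python script.

```python
# %%
# Cell 1: set up some toy data. Running this cell keeps its variables in the
# editor's interactive session for later cells.
from collections import Counter

sentences = [
    "the bank approved the loan",
    "we sat on the river bank",
]

# %%
# Cell 2: reuse the state from cell 1 without re-running it.
word_counts = Counter(word for s in sentences for word in s.split())
print(word_counts.most_common(3))

# %%
# Cell 3: inspect a single count interactively.
print("bank occurs", word_counts["bank"], "times")
```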

Thirdly, the EWISER submodule (repos/ewiser) contains interfacing scripts in its bin directory. These connect this project to the EWISER implementation, which I did not write (see the submodule's README).

Running unit tests

Run:

pipenv run python -m unittest discover tests "*.py"
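The discovery command above scans the tests directory and, because of the "*.py" pattern, loads every .py file in it (not just files named test*.py, which is unittest's default pattern). A minimal test module that this command would pick up could look like the following. The normalise_lemma helper is hypothetical and stands in for whatever project function is under test; in a real test file it would be imported from the project instead of defined inline.

```python
import unittest


def normalise_lemma(lemma: str) -> str:
    """Hypothetical function under test: lowercase a lemma and strip whitespace."""
    return lemma.strip().lower()


class NormaliseLemmaTest(unittest.TestCase):
    def test_lowercases(self) -> None:
        self.assertEqual(normalise_lemma("Bank"), "bank")

    def test_strips_whitespace(self) -> None:
        self.assertEqual(normalise_lemma("  crane \n"), "crane")
```

Any unittest.TestCase subclass found in such a module is collected automatically, so no explicit unittest.main() call is needed when running through discover.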

Static analysers in use

  1. Black code formatter
  2. Mypy static type analyser
  3. Flake8 linter
  4. (optional) Pylint linter

About

The implementation code of my Master's thesis project.
