
MiMoText/lit2vec


lit2vec

This repository analyzes the relationships between literary works mentioned in scholarly works. Unlike a co-occurrence approach, it uses the word2vec method to measure how similar the contexts are in which literary works are mentioned in scholarly texts.
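The difference between the two approaches can be shown with a toy example. The sketch below is hypothetical: the sentences and the identifiers `w1`, `w2`, `w3` are made up. It also substitutes a simple count-based context vector for word2vec's learned dense vectors, so it illustrates the idea rather than the repository's method. Two works that never appear together get a co-occurrence of 0, yet they can still have a high context similarity.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy sentences; "w1", "w2", "w3" stand for identifiers of literary works.
sentences = [
    ["w1", "epistolary", "novel", "letters"],
    ["w2", "epistolary", "novel", "letters"],
    ["w1", "w3", "tale", "philosophy"],
]

def cooccurrence(a, b, sents):
    """Number of sentences in which both works are mentioned."""
    return sum(1 for s in sents if a in s and b in s)

def context_vector(work, sents):
    """Count-based stand-in for a word2vec vector: bag of words around a work's mentions."""
    ctx = Counter()
    for s in sents:
        if work in s:
            ctx.update(t for t in s if t != work)
    return ctx

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

print(cooccurrence("w1", "w2", sentences))  # 0: never mentioned together
print(round(cosine(context_vector("w1", sentences), context_vector("w2", sentences)), 3))  # 0.707
```

Word2vec replaces these raw counts with dense vectors learned by predicting context words. The underlying intuition is the same: works are similar when they are discussed in similar terms.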

The word2vec model was trained with gensim on our corpus of scholarly works, which currently comprises 26 texts (289,746 tokens). More scholarly works will be added in the future (see Zotero Library).

Table of Contents

  • work_identifier.tsv: contains the identifiers of all literary works together with their MiMoText_ID and, where available, their Wikidata_ID.
  • word2vec.py: script that trains the word2vec model on all.tsv, which cannot be published here due to copyright restrictions.
  • word2vec.model: word2vec model trained with gensim on the MiMoText scholarly work corpus.
  • similarity.py: computes the similarity between all pairs of works using model.wv.similarity().
  • similarity.tsv: output of similarity.py
  • plt_2d.py: plots all work_identifiers in a 2D space
  • plot_2d.html: output of plt_2d.py (download it and open it in a browser to explore it)
  • plt_3d.py: plots all work_identifiers in a 3D space
  • plot_3d.html: output of plt_3d.py (download it and open it in a browser to explore it)
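The pairwise step in similarity.py can be sketched as follows. gensim's `model.wv.similarity()` returns the cosine similarity of two word vectors. To keep the example self-contained, the sketch computes that cosine directly in plain Python. The 3-dimensional vectors, the identifiers `w1` to `w3`, and the TSV column names are hypothetical. The repository's similarity.tsv may use a different layout.

```python
import csv
import io
from itertools import combinations
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (what KeyedVectors.similarity computes)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pairwise_similarities(vectors):
    """Yield (work_a, work_b, similarity) once for every unordered pair of works."""
    for a, b in combinations(sorted(vectors), 2):
        yield a, b, cosine_similarity(vectors[a], vectors[b])

def write_tsv(rows, handle):
    """Write the pairs as tab-separated values with a (hypothetical) header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["work_1", "work_2", "similarity"])
    for a, b, sim in rows:
        writer.writerow([a, b, f"{sim:.4f}"])

# Hypothetical vectors; with a trained gensim model they would come from model.wv[work_id].
vectors = {
    "w1": [1.0, 0.0, 0.0],
    "w2": [1.0, 1.0, 0.0],
    "w3": [0.0, 0.0, 1.0],
}
buffer = io.StringIO()
write_tsv(pairwise_similarities(vectors), buffer)
print(buffer.getvalue())
```

Using `itertools.combinations` computes each unordered pair exactly once. For n works this gives n(n-1)/2 rows and no self-pairs. With a real model, `cosine_similarity(vectors[a], vectors[b])` can be swapped for `model.wv.similarity(a, b)`.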

Screenshot of plots

[Screenshot of 2D plot] [Screenshot of 3D plot]
