This repository analyzes the relationships between literary works mentioned in scholarly works. In contrast to a co-occurrence approach, we use word2vec to measure the contextual similarity between mentions of literary works in scholarly works.
Using gensim, the word2vec model was trained on our scholarly work corpus, which currently comprises 26 texts (289,746 tokens). More scholarly works will be added in the future (see the Zotero Library).
work_identifier.tsv
: contains all identifiers of literary works with their MiMoText_ID and, where one exists, their Wikidata_ID.
word2vec.py
: script to train a word2vec model with all.tsv, which cannot be published here due to copyright issues.
word2vec.model
: word2vec model trained on the MiMoText scholarly work corpus using gensim.
similarity.py
: computes the similarity between all work pairs, using model.wv.similarity().
similarity.tsv
: output of similarity.py
plt_2d.py
: plots all work_identifiers in a 2D space
plot_2d.html
: output of plt_2d.py (download and open it in a browser to explore it)
plt_3d.py
: plots all work_identifiers in a 3D space
plot_3d.html
: output of plt_3d.py (download and open it in a browser to explore it)
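To make the pairwise similarity step more concrete: gensim's `model.wv.similarity()` returns the cosine similarity of two word vectors. The sketch below reproduces that computation with the standard library only and writes one row per work pair as tab-separated text. The vectors, the identifiers, and the column names `work_1`, `work_2`, `similarity` are hypothetical; the actual layout of similarity.tsv may differ.

```python
import csv
import io
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional vectors standing in for trained embeddings.
vectors = {
    "work_A": [1.0, 0.0, 1.0],
    "work_B": [1.0, 0.0, 0.0],
    "work_C": [0.0, 1.0, 0.0],
}


def pairwise_similarities(vectors):
    """Return (id_1, id_2, similarity) for every unordered pair of ids."""
    rows = []
    for a, b in combinations(sorted(vectors), 2):
        rows.append((a, b, cosine_similarity(vectors[a], vectors[b])))
    return rows


rows = pairwise_similarities(vectors)

# Write to an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["work_1", "work_2", "similarity"])  # assumed header
for a, b, sim in rows:
    writer.writerow([a, b, f"{sim:.4f}"])

print(buffer.getvalue())
```

With a trained model, the `cosine_similarity` call could be swapped for `model.wv.similarity(a, b)` while iterating over the identifiers listed in work_identifier.tsv.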