Ontology

Ontology of the MiMoText project.

Context & scope

The MiMoText ontology is dedicated to the domain of literary history and historiography – a broad and far-reaching field of knowledge which we do not intend to address in its entirety. The scope and coverage of the ontology result from the practical requirements of the realization of our knowledge graph and the associated concerns. An essential particularity of our domain is that we model perspectives rather than facts. Therefore, the possibilities of the linked open data paradigm to let complementary and also contradictory statements coexist are very useful to us. This particularity has the consequence that 'reification' (i.e., statements about statements) is an important modeling dimension that includes, among other things, the unambiguous referencing of each statement, including linkage to the dataset to which it can be traced.

Reference publication:

  • Schöch, Christof, Maria Hinzmann, Julia Röttgermann, Katharina Dietz, and Anne Klee. 2022. “Smart Modelling for Literary History.” International Journal of Humanities and Arts Computing 16 (1): 78–93. https://doi.org/10.3366/ijhac.2022.0278.
  • Hinzmann, Maria, Matthias Bremm, Tinghui Duan, Anne Klee, Johanna Konstanciak, Julia Röttgermann, Christof Schöch, and Joëlle Weis. 2024. “Patterns in Modeling and Querying a Knowledge Graph for Literary History (Preprint).” Zenodo. https://doi.org/10.5281/ZENODO.12080340.

Aims

  1. One goal of the ontology is to structure data in the project "Mining and Modeling Text". Essentially, this project is about extracting heterogeneous data from three different types of sources using quantitative methods (mining) and linking them in such a way (modeling) that they can be aggregated in a queryable knowledge graph and offer interesting perspectives as well as innovative research opportunities for researchers.
  2. We take the French Enlightenment novel as our example and have designed the project and the ontology to be transferable to other domains of the humanities (philosophy, history and cultural studies, art history, etc.).

Structure

We structure the ontology in modules to increase interoperability, reuse possibilities, and reproducibility. The objectives can be differentiated:

  • Clarity & reproducibility: The conceptual decisions as well as our source data are thereby understandable and traceable in individual 'packages'.
  • Transferability: Some challenges are also relevant in other Computational Literary Studies projects as well as in Digital Humanities in general, especially in the linked open data context.
  • Use of the MiMoTextBase via the SPARQL endpoint: In particular, more complex queries require thorough knowledge of the data model and the imported data. In addition to our tutorial, the ontology should enable the formulation of SPARQL queries (especially for those who already have SPARQL knowledge) and an interest-specific, targeted understanding of the data model.
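A typical query against the MiMoTextBase follows the Wikidata query pattern. The following sketch shows the general shape only: the prefix URIs, property ID, and item ID are placeholders, not actual MiMoText identifiers, and should be replaced with the namespaces and IDs of the MiMoTextBase instance.

```sparql
# Sketch: retrieve works connected to a given thematic concept.
# Prefix URIs and the IDs P00/Q00 are placeholders.
PREFIX mmd:  <https://example.org/entity/>
PREFIX mmdt: <https://example.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?work ?workLabel WHERE {
  ?work mmdt:P00 mmd:Q00 .      # e.g. a hypothetical 'about' statement
  ?work rdfs:label ?workLabel .
}
LIMIT 10
```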

Modules

The modules are heterogeneous in scope and coverage. Within the modules, all essential modeling decisions are represented and reflected. The functionality of Wikibase as well as the conventions and standards of Wikidata are crucial for our modeling decisions. We have taken other standards into account where possible and would like to expand this in the future. There are currently 13 completed modules:

  • Module 1: theme
  • Module 2: space
  • Module 3: narrative form
  • Module 4: literary work
  • Module 5: author
  • Module 6: mapping
  • Module 7: referencing
  • Module 8: versioning & publication
  • Module 9: terminology
  • Module 10: bibliography
  • Module 11: scholarly work
  • Module 12: literary work analysis
  • Module 13: federation

These modules are related to and based on two pilot projects (1 & 2), a statement type that is central to our domain (3), our two central entity types or 'classes' (4 & 5), two further modules that represent central challenges and our approaches to solving them (6 & 7), one module for integrating the different source types including data sets (8), and a module that represents the modeling of our multilingual controlled vocabularies and depicts terminological decisions (9). Two further modules address the different source types of MiMoText in more detail: a module on our main bibliographic source (10) and a module for the scholarly work data (11). Module 12 concerns the analysis of literary works (topic modeling, sentiment analysis, stylometry), and Module 13 represents the modeling decisions around bidirectional federation with Wikidata. (In the visualizations of ontology version 1.0, all modules show those Q-items and properties for which data are already available in the MiMoTextBase. In some cases we have highlighted extension plans for the near future in gray.)

Module overview and visualizations

The visualizations were created using the software VUE (= Visual Understanding Environment), an open source project of Tufts University.

Classes and properties

Visualization of classes and properties

Classes

(to be updated here, see overview above)

author, bibliography, concept, concept narrative form, controlled vocabulary, data set, entity, fictional prose, human, genre, item (class), literary work, matching table, owl:Thing, property class, publication, scholarly work, spatial vocabulary, thematic concept, thematic vocabulary, topic, topic labels and concepts (11-2020), topic model, vocabulary narrative form

Properties

(including qualifiers)

(to be updated here, see overview above)

about, author, author name string, author of, BGRF ID, characters string, BGRF_plot_theme, BGRF_style_intentionality, close match, described at URL, described by source, distribution format string, equivalent class, equivalent property, exact match, full work available at, genre, instance of, name, narrative form, narrative form string, narrative location, narrative location string, number of pages string, occupation, part of, publication date, publication date string, place of publication, place of publication string, reference URL, related to, represented by, stated in, subclass of, subproperty of, title

Modeling approach & decisions

We are not attempting to model the entire domain of literary history and historiography deductively, but rather, following our purpose, to inductively connect the different types of data sources. Where possible and necessary, for example in the case of narrative form, we have conceptually addressed the corresponding theoretical concepts and striven for a mediation that increases plausibility and acceptance among literary scholars. Each module, which has so far only been visualized, will be explained in more detail with regard to the various decisions made in the modeling process.

Infrastructure & prefixes

Open Science

For the provision of data, we follow open science principles, such as the publication of FAIR data in open access and the use of open source software – in particular Wikibase. We created a custom bot based on the Python library Pywikibot to import RDF triples from TSV files into our Wikibase instance and to keep them updated. Towards the end of the project, we also used QuickStatements to import data.
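The import bot itself is not part of this repository. As a rough illustration of the kind of input it processes, the following sketch parses a TSV file of statement rows into triples; the function name and the three-column layout (item, property, value) are assumptions for illustration, and the actual project files may differ. Writing the parsed statements to a Wikibase instance would then be handled by Pywikibot.

```python
import csv
import io

def read_triples(tsv_text):
    """Parse TSV rows of (item, property, value) into a list of triples.

    The three-column layout is an assumption for illustration; the
    actual import files of the project may be structured differently.
    """
    triples = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if len(row) != 3:
            continue  # skip malformed rows
        item, prop, value = (field.strip() for field in row)
        triples.append((item, prop, value))
    return triples

sample = "Q10\tP36\tQ26\nQ11\tP36\tQ27\n"
print(read_triples(sample))  # [('Q10', 'P36', 'Q26'), ('Q11', 'P36', 'Q27')]
```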

Wikibase ecosystem

We understand our MiMoTextBase as part of the Wikibase ecosystem with the decision for a local Wikibase instance. The infrastructure of a local Wikibase has a significant influence on the data model and therefore also on the query possibilities.

Prefixes

For the prefixes, we decided to stay within the Wikidata labeling logic to increase usability for those who are already familiar with the Wikidata prefixes: 'mmd' for entities (analogous to the Wikidata prefix 'wd') and 'mmdt' for properties (analogous to 'wdt').

For further prefixes and a visualization of the Wikidata / Wikibase data model see: https://en.wikibooks.org/wiki/SPARQL/WIKIDATA_Qualifiers,_References_and_Ranks#/media/File:SPARQL_data_representation.png
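In the header of a query, this naming convention can be sketched as follows; the Wikidata URIs are the standard ones, while the MiMoText base URIs are placeholders to be replaced with those of the MiMoTextBase instance.

```sparql
# Wikidata prefixes (standard) and their MiMoText counterparts.
# The example.org base URIs are placeholders.
PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX mmd:  <https://example.org/entity/>       # MiMoText items
PREFIX mmdt: <https://example.org/prop/direct/>  # MiMoText properties
```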

Linking with the Wikidata-Graph

In linking to the Wikidata cloud, we have so far realized several steps in the spirit of the 'Wikibase ecosystem'. The interview with Wikimedia Germany summarizes our main steps towards bidirectional federation between MiMoTextBase and Wikidata.

  • mapping of thematic and spatial concepts in our controlled vocabularies (done)
  • mapping of properties and classes (done)
  • mapping of author and work items (done)
  • creation of a MiMoText ID in Wikidata (done)
  • linking the literary works we could map from Wikidata to the work items in our MiMoTextBase (done)
  • importing the work and author items not yet available in Wikidata into Wikidata and linking them to our MiMoTextBase (done)
  • request for whitelisting of MiMoTextBase as a SPARQL endpoint queryable from Wikidata (done)
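Once an endpoint is whitelisted, statements from it can be pulled into a query on the Wikidata Query Service via a SERVICE clause. The following sketch shows only the pattern: the endpoint URL, the prefix URI, and the property ID are placeholders, not the actual MiMoTextBase values.

```sparql
# Sketch of a federated query run on the Wikidata Query Service.
# Endpoint URL, prefix URI, and P00 are placeholders.
PREFIX mmdt: <https://example.org/prop/direct/>

SELECT ?work ?theme WHERE {
  SERVICE <https://example.org/sparql> {   # whitelisted MiMoText endpoint
    ?work mmdt:P00 ?theme .                # statements from MiMoTextBase
  }
}
LIMIT 10
```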

Beyond that, we would also be happy to have the opportunity to link our data, in the sense of Linked Open Data, to further resources. For some authority data (e.g. VIAF, GND) we have already considered this.

Formal representation

So far, this repository mainly addresses the conceptual representation. The ontology is part of the MiMoTextBase and downloadable as an RDF dump for further use in a local Wikibase. We are considering making the overall ontology as well as the individual modules available as OWL files, depending also on feedback about possible usage scenarios. Flexible visualization of the ontology would be one of the benefits.

Further interoperability & reuse

We would like to further increase interoperability through a formal representation for different user groups: a) for users within the wikiverse, this will be done via 'entity schemas'; b) in the spirit of the general W3C standard, this will involve a representation of the ontology in OWL created with the ontology editor Protégé. With regard to reuse and mapping, we have so far focused on Wikidata, which we see as a kind of 'hub'. In the development of the ontology, we have explored other projects in or adjacent to our domain and tried to integrate their solution approaches, where transferable, into our modeling (e.g. ArtBase, ONAMA, Enslaved.org, Krater/O4DH). The ELTeC project is particularly closely related, insofar as it also involves modeling novel data in Wikidata (Nešić et al. 2021). However, while in the European Literary Text Collection (ELTeC) the differentiation between the literary work and different edition versions is very important, we have decided to put the literary work at the center and to avoid further differentiations within our MiMoTextBase in order to reduce complexity.

Reification & inference

We have so far focused mainly on referencing (see the referencing module) and planned the use of simple qualifiers (e.g. counting text occurrences with the property "occurrence in text"). We have not yet exploited the full potential of the Wikibase infrastructure, for example for ranking statements or for modeling different degrees of reliability of statements. However, we have discussed this in the context of ranking topics for a more precise modeling of thematic statements based on topic modeling. The question of inferring statements from other statements (implicit statements) plays out differently within the data model of Wikibase (and our ontology working with it) than in ontologies that follow the OWL standards more strictly and, in particular, make a clear separation between classes and instance data ('individuals'). This separation does not exist in the Wikibase ecosystem, which has consequences for inferences. Nevertheless, inference is of course possible, also insofar as there are efforts to increase the interoperability of the Wikibase data model.

Further information

See also:

Community

An ontology is not an ontology as long as it is only used in one project. We therefore see it rather as an ontology proposal. We would like to discuss our modeling decisions in exchange with colleagues from Computational Literary Studies as well as other fields of Digital Humanities in order to jointly develop the ontology for the community, i.e. a data model that can structure information in different projects and link them in the spirit of Linked Open Data. We would also like to get in closer contact with literary scholars to extend the modeling of the relevant types of statements we have identified.

Licence (CC-0)

All data modeling is in the public domain and can be reused without restrictions. We don’t claim any copyright or other rights. If you use the ontology, for example in research or teaching, please reference this ontology using the citation suggestion below.

Citation Suggestion

MiMoText ontology, edited by Maria Hinzmann, with contributions from Matthias Bremm, Tinghui Duan, Anne Klee, Johanna Konstanciak, Julia Röttgermann, Christof Schöch and Moritz Steffes. Release v1.0.0. Trier: TCDH, 2023. URL: https://github.com/MiMoText/ontology
