index.json
1 lines (1 loc) · 341 KB
[{"categories":[""],"contents":"Time: all day Presenters: Emilie Pasche, Julien Gobeill, Donat Agosti, Lyubomir Penev, Quentin Groom, Teodor Georgiev, Esteban Gaillac, Alexandre Flament, Déborah Caucheteur, Pierre-André Michel, Marie Kolsch, Anaïs Mottaz, Patrick Ruch\nOne Health is a comprehensive and unified approach that recognises the close connection between the health of people, animals and whole ecosystems. While life and health sciences are widely represented in digital libraries such as PubMed Central, articles focussing on biodiversity, ecology or environmental sciences are relatively marginal. Addressing this critical gap, we introduce a novel Research Infrastructure named “BiodiversityPMC”, leveraging the Swiss Institute of Bioinformatics Literature Services (SIBiLS), which are already mirroring MEDLINE (over 36 million abstracts), PubMed Central contents (over 6 million full-text articles) and supplementary data files associated with scientific publications (over 6 million supplementary files including OCRized images and tables). The coverage of SIBiLS is expanded to encompass a broader range of biodiversity-related content, including environmental sciences and ecology. This involves indexing half a million taxonomic treatments and articles harvested from Plazi, and a growing set of full-text articles from journals in the field (e.g., Pensoft, European Journal of Taxonomy), which are not included in the original PubMed Central. To ensure comprehensive and standardized access, the contents are normalized using a large collection of life sciences terminologies and ontologies. Each instance of a term (or its synonym) is assigned a unique accession number, to support a semantically richer search experience. 
Of particular interest for the biodiversity communities, SIBiLS contents are normalized using ENVO (Environmental Ontology), ROBI (Relation Ontology Biotic Interactions), an enriched subset of RO (Relation Ontology), and LOTUS, a natural products database. Further, taxonomic names are normalized using both the NCBI Taxonomy and the Open Tree of Life, which include names from the Catalogue of Life. The resulting data graph contains more than 10 billion normalized descriptors. Access to BiodiversityPMC is facilitated through a new graphical user interface, an OpenAPI and a SPARQL endpoint. BiodiversityPMC not only offers traditional search methods (access via keyword search), but also introduces innovative approaches for navigating the complex landscape of health and environmental sciences. An original question-answering interface can help provide new perspectives over the literature. BiodiversityPMC stands out as a valuable resource proficient in addressing a wide spectrum of questions related to biodiversity in the broad sense.\n","id":0,"permalink":"https://plazi.org/posts/2024/biodiversitypmc-the-one-health-library/","tags":["Events"],"title":"BiodiversityPMC: The One Health Library (Poster)"},{"categories":[""],"contents":"Time: 2:30PM CEST Presenters: Donat Agosti\nResearch results are published as scientific articles. They represent an intricate network of citations and facts, representing the existing knowledge, as billions of statements. In biodiversity, this includes a corpus of an estimated 500 million pages. A small but growing part is published in a semantically enhanced open access format, but the overwhelming part is behind multiple barriers, from being print only to closed access portable document formats (PDFs). To make use of the emerging AI tools, this corpus needs to be made available in a machine actionable way. At least part of it has to be curated to serve as training material for AI and machine learning. 
The steps towards fully machine actionable data will be described in this presentation. Starting with print (*), print with metadata (**), to scan-based PDF with metadata (***), text-based PDF with metadata (****), ASCI – standard structured XML with metadata (*****), ASCI – XML with semantic enhancements and metadata (******) and ending with ASCI – XML with semantic enhancements, attributes and metadata (*******). To serve the wider community, the publications have to be open access, infrastructures need to be expanded such as the Biodiversity Literature Repository to allow FAIRizing of data, including specific blocks of text such as taxonomic treatments, recommendations or illustrations, and vocabularies have to be developed and maintained to enable semantic enhancement in cases where they do not exist.\n","id":1,"permalink":"https://plazi.org/posts/2024/a-7-scheme-for-getting-fair-publications/","tags":["Events"],"title":"How do we get there? a 7* scheme of getting open FAIR publications"},{"categories":[""],"contents":"Presenter: Donat Agosti\nSeveral ELIXIR communities are designing text analytic solutions to perform various data curation tasks. The development of pre-trained and large language models is also helping non-expert communities develop successful applications (e.g., triage, named-entity recognition, chatbots, search engines). The span of tasks can be relatively heterogeneous, ranging from well-established and common natural language processing tasks (e.g., named entity recognition, automatic text categorization) up to more complex curation support tasks (e.g., triage or question-answering, bi-directional linking between curated databases and publications). 
There is also a growing focus on converting and annotating publications, to expand the pool of FAIR data available for downstream tasks, as illustrated by the efforts of the Elixir Data Platform and several node and ELIXIR core data resources (e.g., BioStudies, EuropePMC).\nThe first part of the workshop will aim at organizing a forum where communities would report, in short pitch-style presentations, on some experiments or challenges they face with the development of scalable automation methods and their application to biocuration, especially regarding genomic, taxonomic and metabolomic data. Biomedical named entity recognition approaches and literature search will be discussed with an emphasis on data exchange standards (e.g. JATS, BIOC, IOB, TaxPub, RO-Crate) and pre-trained large language models. Particular attention will also be paid to turning the long tail of supplementary data into FAIR digital objects. The discussion will be led by representatives from several nodes, including the ELIXIR-UK, CH, BE and LU nodes. It will enable exchange between diverse biological and biomedical communities and focus groups such as Biodiversity, Plant Sciences, Metabolomics, Rare Diseases, Health Data and Biocuration, and more technological groups like Machine Learning. Members of these various communities and focus groups will thus have the opportunity to share their respective approaches, results and challenges.\nThe second part of the forum will be mainly based on informal discussions, with the objective of helping to set priority areas that ELIXIR could focus on in future developments related to data accessibility and federated data management (science and technology tiers of ELIXIR\u0026rsquo;s strategic priorities) and biocuration. This discussion is important for determining how ELIXIR can best support and augment the ongoing efforts on text mining, data accessibility and biocuration. 
The goal is to develop a coherent strategy that aligns with the evolving needs of the scientific community (e.g., the International Biocuration Society) and leverages the latest advances in text analytics and language modelling.\n","id":2,"permalink":"https://plazi.org/posts/2024/treatmentbank-an-automated-workflow/","tags":["Events"],"title":"TreatmentBank: an automated workflow to convert biodiversity PDFs as input into Biodiversity PMC"},{"categories":[""],"contents":"Presenter: Julia Giora, Donat Agosti\nWe plan to share our experiment to use Plazi treatment files to get new taxon names into TaxonWorks. Plazi will give us their perspective on what we did and ideas for future collaboration and what they plan to work on in the coming years. New to Plazi? See this treatment for an example of how Plazi gets data out of publications into a format we can take advantage of for uploading to our database(s) via files or the Plazi API.\n","id":3,"permalink":"https://plazi.org/posts/2024/from-plazi-to-taxonworks/","tags":["Events"],"title":"From Plazi to TaxonWorks"},{"categories":null,"contents":" Plazi production on 23. December 2023 Source\n1992 was a seminal point in the fight for biodiversity. The Rio Earth Summit put the loss of biodiversity on the top of the political agenda after pressure from scientists. With this the ball was back with the scientists to provide scientific evidence and the tools to measure the loss of diversity from genes to species to ecosystems. Required is a standardized report documenting change that can ultimately be used in the political dialogue, balancing the many interests covering the environment. An example is offered by the assessments of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES).\n30 years later, the state of biodiversity continues to be severe. 
Of an estimated 8 million (multicell) plant and animal species, around 1 million are threatened with extinction, and over 10% of genetic diversity of plants and animals may have been lost over the past 150 years1.\nOver half of the world\u0026rsquo;s GDP (about $58 trillion) is significantly reliant on nature and biodiversity. The World Economic Forum ranks the biodiversity crisis as a top 10 global economic risk, highlighting the urgent need for action.\nThis is happening while we still don\u0026rsquo;t know how many species there are on Earth. In fact, we not only do not know how many species live on Earth. We even do not know how many we know, nor what we know about those we know. Most of the content in our daily expanding corpus, now estimated at 500 million pages, consisting of valuable information, is hidden in printed publications in our libraries or behind paywalls.\nThings that have changed A lot, however, has also been achieved since 1992 that gives us hope. The progress is particularly heartening during the last ten years since the launch of the Bouchout Declaration on Open Biodiversity Knowledge Management.\nFunders like the European Commission, Swiss Science Foundation and the Arcadia Fund require open access to data generated through their programs. Large open access based research infrastructures like the Distributed System of Scientific Collections (DiSSCo) provide digital access to their physical specimens. Global infrastructures like the Global Biodiversity Information Facility (GBIF) aggregate over one billion occurrences under an open license. A lot of progress has also been achieved with regards to the tools for measuring biodiversity. DNA based sampling is becoming the dominating method for sampling species, and citizen science networks such as the Cornell Lab of Ornithology and iNaturalist provide a huge number of observations adding up to 5 million species names. 
Remote sensing now covers every corner of the world with data down to a few meters resolution, adding environmental information to each observation.\nOrganize our knowledge and make it accessible When we collect information about plants or animals, we organize it by adding a unique identifier, using two main methods:\nEither giving them a taxonomic name in the form of a Latin binominal scientific name (if identifiable) as a unique ID. Or by adding a numeric code. This is applied especially for those species recognized by molecular methods that don’t have a taxonomic name yet. While the numeric codes work well in the digital world, the scientific names can be problematic.\nSpecimens are identified as belonging to a particular species, whereas the taxonomic name serves as the gateway to all the published knowledge about it.\nHowever, taxonomic names are a bit tricky to use in the modern, digital world of information. The name might mean different things to different experts, it can change over time with new research, it needs experts\u0026rsquo; knowledge to decipher the taxonomic name and to find the knowledge. It is currently not easy for machines to explore the data behind the name. You might only find a citation of a publication, but not the actual data.\nPlazi’s Mission The goal of Plazi is to bridge this gap by promoting open access to taxonomic data hidden in scientific publications. Plazi converts taxonomic publications into digital, accessible knowledge: Understandable by humans and machines at any time, any place – for everybody.\nPlazi develops tools and services that can find, convert, and store data from these publications and make them easy to use. We also support the creation of clear and consistent vocabularies, explore new ways of publishing, and provide resources to help converting scientific papers into a more accessible format. 
Ultimately, we want to build a global network of resources related to biodiversity, connecting information about species in a way that is easy to understand and use.\n850K Taxonomic Treatments\nWhat we have achieved Over the last 15 years, Plazi built two research infrastructures (TreatmentBank, Biodiversity Literature Repository together with Zenodo) recognized by the European Union, is spearheading the building of the biodiversity PMC together with SIBiLS and Pensoft, catalyzed access to data in publications to over 60 journals, liberated 850,000 taxonomic treatments from 620,000 taxa2, 1,450,000 material citations.\nThe geographic distribution of the collecting locations of material citations extracted from publications. Scientists around the world are contributing to the goal of building a baseline dataset of the species we know, what we know about them and identify them, to their behavior, biotic interactions or distribution. Plazi contributes online learning resources and trainings to grow the community and get scientists involved to contribute to open up data.\nPlazi has been committed to one goal since its foundation in 2008: Providing open access to this knowledge, developing and maintaining infrastructure together with other partners, and promoting access for example through the co-organization of the Bouchout Declaration on Open Biodiversity Knowledge Management which will celebrate in 2024 its tenth anniversary.\nOver EUR 3 million has been raised, including generous support by the Arcadia Fund, to build and maintain this infrastructure.\nNext steps: What needs to be done With 15 years of activity - a long life for what started as a project - Plazi has shown what can be achieved towards open access to biodiversity knowledge.\nWhat is now needed is to expand this activity to convince funders, publishers, editors and authors to create digital accessible knowledge as part of their ongoing publishing activity. 
On the other hand, we need to scale up the building of tools to convert the huge corpus of legacy publications, and infrastructures to provide continuous access to and power to mine its content.\nThis enormously rich digital accessible knowledge will allow its use beyond biodiversity, for example by its seamless integration with biomedical knowledge to understand possible viral spillovers. It will contribute to building the reference catalog of life needed to link all knowledge about life. Finally, it will democratize science by providing equal, unlimited access to data that is mainly collected in the South and housed in the North, and allow using artificial intelligence to mine and create new insights and wealth.\nThe Biodiversity Literature Repository at Zenodo, CERN, serves as a sustainable repository for data and annotations derived from publications. The BiodiversityPMC combines publications from the life sciences with those from the fields of biodiversity and nature conservation. The data is used by the Global Biodiversity Information Facility (GBIF) to link observations with the research results derived from them. The data flows into the checklist bank to expand the Catalogue of Life. TreatmentBank and new publication workflows generate a constant stream of new semantically enhanced publications and legacy data. All these institutions, products and platforms - and more - work with data from Plazi, among others. Plazi is well connected in this network that has formed around these endeavors over the past years and has established itself as a central data broker. 
Earth beyond six of nine planetary boundaries\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nA \u0026ldquo;taxa\u0026rdquo; (plural of \u0026ldquo;taxon\u0026rdquo;) refers to a group of one or more organisms that are classified together as a unit in biological taxonomy.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","id":4,"permalink":"https://plazi.org/15-years/","tags":null,"title":"15 years of discovering known biodiversity"},{"categories":null,"contents":" Plazi production on 23. December 2023 Source\n1992 was a seminal point in the fight for biodiversity. The Rio Earth Summit put the loss of biodiversity on the top of the political agenda after pressure from scientists. With this the ball was back with the scientists to provide scientific evidence and the tools to measure the loss of diversity from genes to species to ecosystems. Required is a standardized report documenting change that can ultimately be used in the political dialogue, balancing the many interests covering the environment. An example is offered by the assessments of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES).\n30 years later, the state of biodiversity continues to be severe. Of an estimated 8 million (multicell) plant and animal species, around 1 million are threatened with extinction, and over 10% of genetic diversity of plants and animals may have been lost over the past 150 years1.\nOver half of the world\u0026rsquo;s GDP (about $58 trillion) is significantly reliant on nature and biodiversity. The World Economic Forum ranks the biodiversity crisis as a top 10 global economic risk, highlighting the urgent need for action.\nThis is happening while we still don\u0026rsquo;t know how many species there are on Earth. In fact, we not only do not know how many species live on Earth. We even do not know how many we know, nor what we know about those we know. 
Most of the content in our daily expanding corpus, now estimated at 500 million pages, consisting of valuable information, is hidden in printed publications in our libraries or behind paywalls.\nThings that have changed A lot, however, has also been achieved since 1992 that gives us hope. The progress is particularly heartening during the last ten years since the launch of the Bouchout Declaration on Open Biodiversity Knowledge Management.\nFunders like the European Commission, Swiss Science Foundation and the Arcadia Fund require open access to data generated through their programs. Large open access based research infrastructures like the Distributed System of Scientific Collections (DiSSCo) provide digital access to their physical specimens. Global infrastructures like the Global Biodiversity Information Facility (GBIF) aggregate over one billion occurrences under an open license. A lot of progress has also been achieved with regards to the tools for measuring biodiversity. DNA based sampling is becoming the dominating method for sampling species, and citizen science networks such as the Cornell Lab of Ornithology and iNaturalist provide a huge number of observations adding up to 5 million species names. Remote sensing now covers every corner of the world with data down to a few meters resolution, adding environmental information to each observation.\nOrganize our knowledge and make it accessible When we collect information about plants or animals, we organize it by adding a unique identifier, using two main methods:\nEither giving them a taxonomic name in the form of a Latin binominal scientific name (if identifiable) as a unique ID. Or by adding a numeric code. This is applied especially for those species recognized by molecular methods that don’t have a taxonomic name yet. 
While the numeric codes work well in the digital world, the scientific names can be problematic.\nSpecimens are identified as belonging to a particular species, whereas the taxonomic name serves as the gateway to all the published knowledge about it.\nHowever, taxonomic names are a bit tricky to use in the modern, digital world of information. The name might mean different things to different experts, it can change over time with new research, it needs experts\u0026rsquo; knowledge to decipher the taxonomic name and to find the knowledge. It is currently not easy for machines to explore the data behind the name. You might only find a citation of a publication, but not the actual data.\nPlazi’s Mission The goal of Plazi is to bridge this gap by promoting open access to taxonomic data hidden in scientific publications. Plazi converts taxonomic publications into digital, accessible knowledge: Understandable by humans and machines at any time, any place – for everybody.\nPlazi develops tools and services that can find, convert, and store data from these publications and make them easy to use. We also support the creation of clear and consistent vocabularies, explore new ways of publishing, and provide resources to help converting scientific papers into a more accessible format. 
Ultimately, we want to build a global network of resources related to biodiversity, connecting information about species in a way that is easy to understand and use.\n850K Taxonomic Treatments\nWhat we have achieved Over the last 15 years, Plazi built two research infrastructures (TreatmentBank, Biodiversity Literature Repository together with Zenodo) recognized by the European Union, is spearheading the building of the biodiversity PMC together with SIBiLS and Pensoft, catalyzed access to data in publications to over 60 journals, liberated 850,000 taxonomic treatments from 620,000 taxa2, 1,450,000 material citations.\nThe geographic distribution of the collecting locations of material citations extracted from publications. Scientists around the world are contributing to the goal of building a baseline dataset of the species we know, what we know about them and identify them, to their behavior, biotic interactions or distribution. Plazi contributes online learning resources and trainings to grow the community and get scientists involved to contribute to open up data.\nPlazi has been committed to one goal since its foundation in 2008: Providing open access to this knowledge, developing and maintaining infrastructure together with other partners, and promoting access for example through the co-organization of the Bouchout Declaration on Open Biodiversity Knowledge Management which will celebrate in 2024 its tenth anniversary.\nOver EUR 3 million has been raised, including generous support by the Arcadia Fund, to build and maintain this infrastructure.\nNext steps: What needs to be done With 15 years of activity - a long life for what started as a project - Plazi has shown what can be achieved towards open access to biodiversity knowledge.\nWhat is now needed is to expand this activity to convince funders, publishers, editors and authors to create digital accessible knowledge as part of their ongoing publishing activity. 
On the other hand, we need to scale up the building of tools to convert the huge corpus of legacy publications, and infrastructures to provide continuous access to and power to mine its content.\nThis enormously rich digital accessible knowledge will allow its use beyond biodiversity, for example by its seamless integration with biomedical knowledge to understand possible viral spillovers. It will contribute to building the reference catalog of life needed to link all knowledge about life. Finally, it will democratize science by providing equal, unlimited access to data that is mainly collected in the South and housed in the North, and allow using artificial intelligence to mine and create new insights and wealth.\nThe Biodiversity Literature Repository at Zenodo, CERN, serves as a sustainable repository for data and annotations derived from publications. The BiodiversityPMC combines publications from the life sciences with those from the fields of biodiversity and nature conservation. The data is used by the Global Biodiversity Information Facility (GBIF) to link observations with the research results derived from them. The data flows into the checklist bank to expand the Catalogue of Life. TreatmentBank and new publication workflows generate a constant stream of new semantically enhanced publications and legacy data. All these institutions, products and platforms - and more - work with data from Plazi, among others. Plazi is well connected in this network that has formed around these endeavors over the past years and has established itself as a central data broker. 
Earth beyond six of nine planetary boundaries\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nA \u0026ldquo;taxa\u0026rdquo; (plural of \u0026ldquo;taxon\u0026rdquo;) refers to a group of one or more organisms that are classified together as a unit in biological taxonomy.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","id":5,"permalink":"https://plazi.org/posts/2023/12/15-years/","tags":null,"title":"15 years of discovering known biodiversity"},{"categories":[""],"contents":"Many of the material citations extracted by Plazi contain geo-coordinates, which opens up the way to searching for them by various geographic features along with the non-geographic attributes such as taxa. It is now possible to search for images that are a part of the treatments by ecoregions, biomes and realms.\nEcoregions, biomes and realms from Olson et al.\nThe World Wildlife Fund (WWF) defines terrestrial ecoregions as follows:\nTerrestrial Ecoregions of the World (TEOW) is a biogeographic regionalization of the Earth\u0026rsquo;s terrestrial biodiversity. Our biogeographic units are ecoregions, which are defined as relatively large units of land or water containing a distinct assemblage of natural communities sharing a large majority of species, dynamics, and environmental conditions. … Ecoregions represent the original distribution of distinct assemblages of species and communities.\n\u0026ndash; David M. Olson et al. 2001. Terrestrial Ecoregions of the World: A New Map of Life on Earth: A new global map of terrestrial ecoregions provides an innovative tool for conserving biodiversity, BioScience, Volume 51, Issue 11, November 2001, Pages 933–938, https://doi.org/10.1641/0006-3568(2001)051[0933:TEOTWA]2.0.CO;2\nIt is now possible to search for images on Ocellus by ecoregions and biomes. 
For example, look at the following queries:\nimages from treatments in the ecoregion called 'Humid Pampas'\nimages from treatments in the biome 'Temperate Grasslands, Savannas and Shrublands'\nSynonyms of biomes are also supported. For example, \u0026lsquo;Temperate Grasslands, Savannas and Shrublands\u0026rsquo; are known as \u0026lsquo;pampas\u0026rsquo; in South America, and searching for \u0026lsquo;pampas\u0026rsquo; gives the same result as above.\nAll this is possible because much of our materialCitations data contains geolocation extracted from treatments, plus a new ecoregions database download in ESRI Shapefile format that includes information on biomes and realms. The data is from RESOLVE and described in the updated Eric Dinerstein, et al. 2017. An Ecoregion-Based Approach to Protecting Half the Terrestrial Realm, BioScience, Volume 67, Issue 6, June 2017, Pages 534–545, https://doi.org/10.1093/biosci/bix014. As shown in the figure above, the data contains 847 terrestrial ecoregions divided into 14 biomes and eight realms.\nThe ecoregions data has been prepared for and included in the Zenodeo API that powers Ocellus. 
Below are a couple of examples of queries to Zenodeo that return the data as JSON.\n$ curl https://test.zenodeo.org/v3/ecoregions { \u0026#34;item\u0026#34;: { … \u0026#34;result\u0026#34;: { \u0026#34;count\u0026#34;: 847, \u0026#34;records\u0026#34;: [ { \u0026#34;id\u0026#34;: 1, \u0026#34;eco_name\u0026#34;: \u0026#34;Adelie Land tundra\u0026#34;, \u0026#34;biome_name\u0026#34;: \u0026#34;Tundra\u0026#34; }, { \u0026#34;id\u0026#34;: 2, \u0026#34;eco_name\u0026#34;: \u0026#34;Admiralty Islands lowland rain forests\u0026#34;, \u0026#34;biome_name\u0026#34;: \u0026#34;Tropical \u0026amp; Subtropical Moist Broadleaf Forests\u0026#34; }, … ] }, … } $ curl https://test.zenodeo.org/v3/biomes { \u0026#34;item\u0026#34;: { … \u0026#34;result\u0026#34;: { \u0026#34;count\u0026#34;: 15, \u0026#34;records\u0026#34;: [ { \u0026#34;id\u0026#34;: 1, \u0026#34;biome_name\u0026#34;: \u0026#34;Tundra\u0026#34;, \u0026#34;synonym\u0026#34;: \u0026#34;Tundra\u0026#34; }, { \u0026#34;id\u0026#34;: 2, \u0026#34;biome_name\u0026#34;: \u0026#34;Tropical and Subtropical Moist Broadleaf Forests\u0026#34;, \u0026#34;synonym\u0026#34;: \u0026#34;Tropical and Subtropical Moist Broadleaf Forests\u0026#34; }, … ] }, … } There is still much to be done. For starters, we need to add more synonyms to the biomes. Second, the synonyms need to identify biomes geographically. For example, right now \u0026lsquo;pampas\u0026rsquo; is a synonym for all of \u0026lsquo;Temperate Grasslands, Savannas and Shrublands\u0026rsquo;, no matter where in the world they may be. But a user in S America might expect only the regions in S America when searching for \u0026lsquo;pampas\u0026rsquo;. For this, a new, more detailed biomes map has to be built with synonyms as an attribute. Third, selection of ecoregions or biomes has to be easier than it is now. 
And finally, the queries have to be faster.\n","id":6,"permalink":"https://plazi.org/posts/2023/12/searching-by-ecoregions/","tags":["news"],"title":"Searching by ecoregions and biomes"},{"categories":[""],"contents":"Presenter: Aja Sherman\n","id":7,"permalink":"https://plazi.org/posts/2023/05/harmonizing-taxonomic-resources/","tags":["Events"],"title":"Harmonizing taxonomic resources is necessary for novel insights into bat roosting dataset"},{"categories":[""],"contents":"Presenter: Julia Giora\nSince 2008, the Swiss-based not-for-profit organization Plazi has been supporting and promoting the development of persistent and openly accessible digital taxonomic literature. To achieve this goal, Plazi makes use of in-house software for data mining and extraction from taxonomic publications, along with other partner institution tools and platforms, to liberate data on animals, plants, fungi, and more. In its mission to make taxonomic data FAIRly available to the community, Plazi has developed sets of training material and courses, which enable taxonomists, collection curators, students, technicians and others to participate in the process of taxonomic data liberation. The participation of several different members of the community is key as data requires deep curation, often very specific to a particular field. Most recently, Plazi led a virtual 2-day workshop as part of the MOBILISE ACTION in Europe, along with two 4-day in-person workshops in Brazil and South Africa. Participants are issued certificates which entitle them to extract data on their own, thus multiplying the output of FAIR data using Plazi’s workflow. Here, we present a summary of the current status of said training tools/material, the next steps in development, and how they can help more and more taxonomists, or enthusiasts, liberate data. 
See presentation\n","id":8,"permalink":"https://plazi.org/posts/2023/05/engaging-the-community-in-fair-data-liberation/","tags":["Events"],"title":"Engaging the community in FAIR taxonomic data liberation: an overview of training resources at Plazi"},{"categories":[""],"contents":"Presenter: Donat Agosti\nScholarly publications are the channel to communicate research findings based on the analysis of specimens and their traits. In an open science world, the link to the specimens together with the description of the methods involved should enable us to reproduce the results. Increasingly, taxonomic publications provide a section with structured or even semantically enhanced tables, including persistent digital identifiers of the specimens. These publications can be processed and reformatted as datasets of material citations that can be reused, for example by the Global Biodiversity Information Facility (GBIF). In many cases these are the only representations of species in GBIF, but in other cases they complement specimens that have been submitted as preserved specimens. In this case, the presence of a material citation provides in return access to the knowledge about the specimen, or in other words, extends the biodiversity knowledge graph from a specimen to what is stated in the respective hosting taxonomic treatment, often far beyond by all the links that are embedded in it. Matching can be done automatically but in many cases depends on human curation. For this case a matching service developed with the Swiss eBioDiv and the European-funded BiCIKL projects will be explained.\n","id":9,"permalink":"https://plazi.org/posts/2023/05/matching-material-citations-to-occurrences/","tags":["Events"],"title":"Matching material citations to occurrences: extending the biodiversity knowledge graph"},{"categories":[""],"contents":"The scientific knowledge on biodiversity is imprisoned in a daily growing corpus of hundreds of millions of pages of scientific publications. 
This knowledge is needed to better understand the dynamics and dimensions of the global biodiversity crisis, to understand the impact of climate change on the distribution of species or to understand the viral spillover from animals to humans. This knowledge is very difficult to access because it is unstructured, available only in print or in formats such as the Portable Document Format (PDF) that are difficult for machines to process, or closed access. The power of access to millions of machine actionable articles in PMC, including millions of supplementary data files, or tens of millions of abstracts in PubMed and tools to annotate and mine and discover new facts is obvious. These tools could be used for TDM and annotations of biodiversity literature - but the PMC/PubMed equivalent has not been available for publications in the biodiversity domain, hence the need for a BiodiversityPMC!\nIn this workshop we will introduce the fledgling “biodiversity PMC” built and maintained by SIBiLS, Zenodo and Plazi, making use of the recently revised copyright law in Switzerland. Legal, institutional and technical aspects, from processing to long-term storage, access and annotation of the data, will be discussed. This will be complemented by the research questions driving this effort, from discovering known biodiversity, to extracting traits to study the impact of climate change, to annotating biotic interactions to understand viral spillover, to building question/answer systems.\n","id":10,"permalink":"https://plazi.org/posts/2023/05/text-mining-and-biodiv-ri/","tags":["Events"],"title":"Text Mining and Biodiversity Research Infrastructure"},{"categories":[""],"contents":"Scientific publications are the means by which scientific knowledge is communicated. In botany as well as all biological sciences, each new species discovered is based on at least one publication including a protologue conformant to the International Code of Nomenclature for algae, fungi, and plants. 
A protologue is based on a standard vocabulary, including a material citation of the type specimen and its hosting institution, generally a figure with the diagnostic characters and a discussion of related species. Increasingly, DNA sequences are cited. The protologues are referenced in subsequent taxonomic treatments, clearly delimited sections of a publications about one particular taxon, and implicit in taxonomic names. Together, these treatments cover the history of the names of the taxon (synonymy), are a very rich source for traits, distribution, and since they are based on physical specimens, an authoritative identification of these specimens. It is assumed that the corpus of biodiversity literature includes a daily growing corpus of 500 million pages, housed in the many libraries of natural history institutions.\nIn the digital age, these citations of implicit links allow text and datamine these publications, for example to extract the history of names, the use of a physical specimen or a gene sequence. It also allows creation of identifiers for each taxon, taxonomic treatment, figure so that they can be directly cited and reused by anybody, anytime and from anywhere.\nIn this symposium the current status of scientific publications, and their development in botany will be exposed. This includes the semantic structure of publications and how to make them citable and accessible via dedicated research infrastructures. 
Efforts to annotate the data in legacy publications and new developments in the world of publishing will be discussed, with an emphasis on how the data is immediately reused by the World Flora Online or GBIF.\n","id":11,"permalink":"https://plazi.org/posts/2023/05/new-value-of-scientific-pubs/","tags":["Events"],"title":"The new value of scientific publications in the digital age"},{"categories":["news"],"contents":" Group photo of the participants\nPlazi and SANBI (South African National Biodiversity Institute) conducted a training Course in Pretoria April 18 to 21, 2023, to teach local scientists how to facilitate the access and reuse of scholarly published taxonomic data by themselves as well as the scientific community at large.\nThe course had the participation of 21 researchers, curators, technicians, and Postgraduate students from different South Africa\u0026rsquo;s biodiversity institutions and Nigeria.\nThe compendium included theoretical and practical classes, covering conceptual to practical aspects of taxonomic publishing to data liberation and linking. It is an extension of the previous course on linking material citations and specimens. It made use of GoldenGate to liberate data, and TreatmentBank and the Biodiversity Literature Repository to disseminate and reuse it. The focus has been on South African fauna and flora. 
During the course 24 publications have been processed and are now available in BLR and GBIF, including 217 treatments, 101 new species, 172 figures and 642 material citations.\nParticipants in the training course\nThe course gave the participants the opportunity to start their certifications as data analysts and data liberators through Plazi’s TreatmentBank, and aimed to prepare new trainers in South Africa to increase data liberation and reuse according to the FAIR data concept.\nThe preparation, organization and conduction of the workshop has been supported by the Arcadia Fund, SANBI, and the BiCIKL and eBiodiv projects and organized by SANBI’s Ian Engelbrecht and Plazi’s Julia Giora and Jonas Castro.\n","id":12,"permalink":"https://plazi.org/posts/2023/05/training-s-african-scientists/","tags":["news","training"],"title":"Enabling South African scientists to convert taxonomic publications into digitally accessible knowledge"},{"categories":[""],"contents":"A goal of natural history institutions is contributing to the understanding of biodiversity and disseminating this knowledge by becoming scientific publishers. Through this, they joined the growing field of scholarly publishing established in 1665 by the Journal des sçavans to publish work from the sciences. One of the characteristics of scientific publishing is to cite previous works and thus linking existing knowledge with new discoveries. To provide access to the growing corpus of publications, libraries began to index individual works. With the advent of the digital age, full text search and text and data mining provide access to the content using machines analysing large corpora of works. The development of the semantic web enabled creating a knowledge graph. A typical example is the growth of knowledge related to a species, including synonyms, cited specimens, figures or DNA sequences. 
During this symposium the state and recent developments of biodiversity publishing, linking and providing increased access to data by Pensoft, CETAF-publishing group, MNHN Paris, Plazi and the Swiss Institute of Bioinformatics, including over 50 journals in a semantically enhanced format, and supported by a series of projects (BiCIKL, eBioDiv, Arcadia fund, MétoSTeM) will be discussed.\n","id":13,"permalink":"https://plazi.org/posts/2023/05/extending-the-biodiv-knowledge-graph/","tags":["Events","Lecture","BiCIKL","TDWG"],"title":"Publishing: Extending the biodiversity knowledge graph"},{"categories":[""],"contents":"We will demonstrate tools and infrastructure from the BiCIKL project to find and establish links between entities. These include:\nInfrastructure to create and manage machine actionable persistent identifiers (PIDs) for digital specimens. This allows the creation and annotation of digital specimens as new actionable objects on the internet, which provide a surrogate for physical specimens. OpenBiodiv is a linked open data (LOD) knowledge graph extracted from the biodiversity literature. OpenBiodiv can discover hidden links, e.g., between authors, taxa, sequences, material citations, publications and others using general search, SPARQL (query language), user applications and API. TreatmentBank liberates data from within publications, and converts, enhances, links, stores, and disseminates it as findable, accessible, interoperable and reusable (FAIR) data. The API has been enhanced allowing better data linkages through identifiers, matching services, and avenues for annotation. The SIBiLS custom search services provide access to biomedical literature, including PubMed Central (PMC) and MEDLINE. They also provide access to half a million taxonomic treatments curated by Plazi. BiCIKL has helped aggregate more publications, enabling the creation of an all-inclusive “One Health” library, the BiodiversityPMC. 
ChecklistBank is an open data repository with a focus on taxonomy and nomenclature. It has tools for name matching, dataset comparison, and for linking taxonomic names from literature. ","id":14,"permalink":"https://plazi.org/posts/2023/05/toolbox-to-link-biodiv-data/","tags":["Events","Lecture","BiCIKL","TDWG"],"title":"The BiCIKL toolbox to link biodiversity data"},{"categories":[""],"contents":"eBioDiv is a Swissuniversities funded project by a consortium of HES-SO/SIB (Swiss Institute of Bioinformatics), Plazi, and the Natural History Museum of Bern (NMBE) with the goal to build a knowledge infrastructure to bring data in taxonomic literature together with physical specimens digitalized through projects funded by SwissCollNet. The linking is based on that fact that specimen, the basis of taxonomic works, are cited formally in the publications as material citations. They can be made FAIR (Findable, accessible, interoperable and reusable). The eBioDiv matching service allows to discover the cited specimen, especially if the natural history institution uploaded them to the Global Biodiversity Information Facility GBIF. The link furthermore allows to follow-up from a specimen to what is known about it in the published literature. The complementing Horizon Europe funded project Biodiversity Community Integrated Knowledge Library BiCIKL allows further linking of a specimen to taxonomic names or gene sequences.\n","id":15,"permalink":"https://plazi.org/posts/2023/03/linking-specimens-with-literature/","tags":["Events","Lecture","eBioDiv"],"title":"eBioDiv: linking specimens with literature, or to what is known about a specimen"},{"categories":[null],"contents":" Participants in the training course\nPlazi’s first online two day training course “Biodiversity and Digital Media: linking material citations in publications to specimens” organized by Julia Giora and Jonas Castro has successfully ended. 
The Training school happened online from the 27th to the 28th of February 2023. It is part of a series of training courses in which Plazi expects to teach interested parties how to use data preparation tools to become a data liberator. It was organized in collaborations with the Mobilise COST Action – Mobilising Data, Policies and Experts in Scientific Collections – providing the Training School (TS) “ This hands-on and interactive training initiative gave the opportunity to 20 participants, from 11 countries, and representing 14 institutions of natural history research to get informed and exercised on the accessibility and reuse of taxonomic data using digital platforms and repositories. The main theoretical topics addressed were the concept of findable, accessible, interoperable and reusable data (FAIR) and Plazi workflow, the repositories for taxonomic data, and an outlook on how to convert data from missing publications aiming a future of publishing according to FAIR data requirements. The practical activities were focused on the learning and use of the eBioDiv Matching Service, with participants linking specimens in Global Biodiversity Information Facility (GBIF) to materials citations extracted from scholarly publications and made available through Plazi\u0026rsquo;s TreatmentBank. The development of tools and learning module is co-funded through BiCIKL and Arcadia.\n","id":16,"permalink":"https://plazi.org/posts/2023/03/specimen-material-citation-matching-service-training-course/","tags":["News","Training"],"title":"Specimen - material citation matching service training course"},{"categories":[null],"contents":"Discovery of species is based on specimens and communicated through taxonomic treatments described in and presented as parts of scientific publications. 
In an ideal world, this would allow us to ask questions such as “what is known about this specimen?” or “what does a gene sequence of a cited physical specimen look like?”, and the results would appear immediately on our computer screens in a format suitable for further analysis.\nThis ideal world of instant insights into biodiversity data is still not here yet, but we have made big steps towards it. Through international initiatives such as DiSSCo and iDigBio, and national ones such as SwissCollNet, tens of millions of specimens are being digitized. These digital specimens are aggregated through the Global Biodiversity Information Facility (GBIF) along with data from other genomic and citizen science projects. In fact, these various sources often refer to the same specimens, thus enriching our knowledge of them.\nMaterial citations are the citation of a specimen in a taxonomic publication, and point to the source of what the scientist discovered through the combined analysis of the specimen and its congeners. Though this relationship of a material citation to its specimen looks simple, it is complex from a technical point of view. The links can be predicted only to some degree of accuracy, for example by a clustering algorithm from GBIF or via a matching algorithm used by Plazi. In either case, they provide a unique starting point for further curation.\nA major issue in digitization is how and what data are collected, and the reliability and quality of the conversion process. Since most of the data is not digitized originally with the intention of linking one data point to another, it is algorithmically not simple to create matches. In other words, human curation is required to accept or reject a proposed match. For this reason, a matching service has been developed allowing users to curate the links. 
Once a link is accepted, an identifier of the linked specimen can be inserted into the material citations, thereby expanding the knowledge graph one link at a time.\nMatching service user interface. The proposed match is between a material citation and possible specimens in GBIF. Each field provides a matching score with green the highest, as well as an overall score.\nPlazi in collaboration with SIBiLS and COST Mobilise will conduct a training course to interested persons on 27-28 February, 2023 to learn the underlying concepts used in digitizing taxonomic publications and occurrences in GBIF, operating the matching service, and how to decide whether a match is acceptable or not.\nThis digitization service, development of the learning materials and the training course is a collaboration between SIBiLS, Plazi and the Natural History Museum of Bern, Switzerland, supported by Swissuniversities eBioDiv project, Arcadia and Horizon Europe funded BiCIKL and COST Mobilise action projects.\n","id":17,"permalink":"https://plazi.org/posts/2023/02/expanding-the-biodiv-knowledge-graph/","tags":["News"],"title":"Expanding the biodiversity knowledge graph"},{"categories":[""],"contents":"This hands-on and interactive training will give students and professionals of natural history institutions an opportunity to get informed about and practice the liberation and reuse of taxonomic data using digital platforms and repositories. 
Applicants need to register online\n","id":18,"permalink":"https://plazi.org/posts/2023/01/training-biodiv-digital-media/","tags":["Events","Training"],"title":"Training course"},{"categories":[""],"contents":"Donat Agosti will participate in this event as part of the closing of the COST Mobilise project.\n","id":19,"permalink":"https://plazi.org/posts/2023/01/dissco-futures/","tags":["Events"],"title":"DiSSCo Futures 2023."},{"categories":[""],"contents":" Enabling Published Taxonomic Data to Address the Biodiversity Crisis The use-case of Biodiversity Literature Repository and TreatmentBank\nDate and Time: Oct 17, 2022, 2:00 PM\nAbstract: To understand the loss of species, a benchmark is needed, e.g. the status of biodiversity in 1992, when the Convention on Biological Diversity recognized the biodiversity crisis, to compare with its status in successive years. Though we are far from knowing how many species there are on planet Earth, we keep track of their descriptions and number through the information kept in our libraries. Each species discovered is represented therein by at least one taxonomic treatment. The library includes an estimated 500 million pages and is updated daily with an estimated 17–18,000 new species annually and over 100,000 treatments augmenting the knowledge of existing species. more\nThe legal landscape of data licensing in publishing Workshop Data liberation for open knowledge systems\nDate and Time: Oct 18, 2022, 4:30 PM\nAbstract: A key question for biodiversity research and applied conservation, which both rely on large, dynamic and multifaceted biodiversity datasets, is how to free biodiversity data to be able to reuse them. The goal of the workshop is to arrive at a better overview of legal rules governing data publication, and empower participants to use appropriate licenses and language that will allow data to be reused. 
Currently, most small publishers express concerns related to copyright and are uncertain if they are allowed to share data contained within a published paper without a clear statement from the author. Similarly, many authors are also unaware of whether or not they retain copyright on their text and data in publications. Moreover, rights and obligations associated with data presented by digital infrastructures often are not clearly stated. On the basis of initial impulse statements providing background information and distinct perspectives on the topic, the main part of the session will be an interactive exchange among participants and invited legal experts. The outcomes of the workshop will contribute to the development of guidelines for the community. The workshop is jointly organized by members of the Biodiversity Heritage Library, the Consortium of European Taxonomic Facilities (CETAF) e-Publishing working group and the Society for the Preservation of Natural History Collections (SPNHC). more\nA Possible Workflow from New and Legacy Publications Keeping the world flora online up-to-date with new species and augmenting taxonomic treatments\nDate and Time: Oct 20, 2022, 10:15 AM\nAbstract: Thousands of new species are discovered each year, and new results are published to add to the knowledge of existing species. A growing number of these are immediately accessible through the Biodiversity Literature Repository (BLR) and reused by Global Biodiversity Information Facility (GBIF), bringing the number of treatments covering plant species to over 25,000 treatments. This includes the findable, accessible, interoperable, and reusable (FAIR) treatments and related figures, and in many cases the material citation of the holotype, and links to the collection, specimen and gene sequences attributed to the codes. 
The FAIR data is deposited in the Biodiversity Literature Repository ensuring long-term access, and includes rich, customized metadata describing its content using standard vocabularies (e.g. Darwin Core (DwC) or Open Biological and Biomedical Ontology (OBO) Foundry), as well as links to related items and data reuse (e.g. GBIF and ChecklistBank). more\nSynospecies, a Linked Data Application to Explore Taxonomic Names Using linked data to explore taxonomic names\nDate and Time: Oct 21, 2022, 11:00 AM\nAbstract: Synospecies is a linked data application to explore changes in taxonomic names (Gmür and Agosti 2021). The underlying source of truth for the establishment of taxa, the assignment and re-assignment of names, are taxonomic treatments. Taxonomic treatments are sections of publications documenting the features or distribution of taxa in ways adhering to highly formalized conventions, and published in scientific journals, which shape our understanding of global biodiversity (Catapano 2010). Plazi, a not-for-profit organization dedicated to liberating knowledge, extracts the relevant information from these treatments and makes it publicly available in digital form. Depending on the original form of a publication, a treatment undergoes several steps during its processing. All these steps affect the available digital artifacts extracted from the treatment's original publication. The treatments are digitized, the text is annotated with a specialized editor, and cross-referenced and enhanced with other sources (Agosti and Sautter 2018). After these steps, the annotated text is transformed to the different structured data-formats used by other digital biodiversity platforms (e.g., Global Biodiversity Information Facility: Plazi.org taxonomic treatment database using Darwin Core Archive), generic linked data tools (e.g. 
lod view; RDF2h Browser) and other consuming applications (e.g. Ocellus via Zenodeo using XML; openBioDiv using XML; HMW using XML; Biotic interaction browser using TaxPub XML; opendata.swiss using RDF). more\n","id":20,"permalink":"https://plazi.org/posts/2022/10/tdwg2022/","tags":["Events"],"title":"TDWG 2022"},{"categories":[""],"contents":"Abstract: The Workshop Day on Open/FAIR Natural History Data is intended for curators, managers, and users of natural history collections in Switzerland. It serves as a forum to draw up an overview of the current situation regarding data management and digitization in Swiss institutions and sets the collections and their data in the wider context of the emerging ecosystem of open/FAIR natural history data with its various infrastructures, stakeholders, initiatives, usage scenarios, value-added services, and governance structures.\nThe workshop day is a pre-event of the GLAMhack 2022, organized by the OpenGLAM working group of the Opendata.ch association, hosted by the Natural History Museum of Bern and supported by Plazi.org and the eBioDiv Project, in cooperation with SwissCollNet and Infofauna.\n","id":21,"permalink":"https://plazi.org/posts/2022/09/glamhack-2022/","tags":["Events"],"title":"Workshop Day on Open/FAIR Natural History Data"},{"categories":[""],"contents":"Abstract\nTaxonomy is the science of charting and describing the world’s biodiversity. Organisms are grouped into taxa, each of which is assigned a rank, building the taxonomic hierarchy. The taxa are described in taxonomic treatments, well defined sections of scientific publications (Catapano 2019). They include a nomenclatural section and one or more sections including descriptions, material citations referring to studied specimens, or notes on ecology and behavior. In case the treatment does not describe a newly discovered taxon, previous treatments are cited in the form of treatment citations. 
This citation can refer to a previous treatment and add additional data, or it can be a statement synonymizing the taxon with another taxon. This allows building a citation network, and ultimately is a constituent part of the catalogue of life. Thus treatments play an important role in understanding the diversity of life on Earth by providing the scientific argument why a group of organisms is a new species, or a synonym, and the data provided will increasingly be important to analyze and compare whole genomes of individual genomes.\nTreatments have been extracted by Plazi since 2008 (Agosti and Egloff 2009), and the TaxPub schema has been described by Catapano (Catapano 2019) to complement existing vocabularies to allow annotation of legacy literature and to produce new publications including the respective annotations (Penev et al. 2010). Today, more than 750,000 treatments have been annotated by Plazi’s TreatmentBank and over 400,000 have been made FAIR digital objects in the Biodiversity Literature Repository in a collaboration of Plazi, Zenodo and Pensoft (Ioannidis-Pantopikos and Agosti 2021, Agosti et al. 2019), and are reused by the Global Biodiversity Information Facility (GBIF), Global Biotic Interaction (GloBI), and the Library System of the Swiss Institute of Bioinformatics (SIBiLS).\nEach treatment on the Zenodo repository is findable through its rich metadata. The insertion of custom metadata in Zenodo provides metadata referring to domain specific vocabularies such as Darwin Core (Ioannidis-Pantopikos and Agosti 2021). Each treatment is accessible through its DataCite Digital Object Identifier (DOI), with the taxonomic treatment registered as a subtype of a publication. The data is interoperable through a machine actionable JSON version of the treatment. 
A license is provided to assure it is reusable.\nThe richness of data and citations within a treatment provide a stepping stone to add treatments not only to knowledge systems such as Wikidata or openBioDiv, but to provide links to many of the cited objects, such as specimens through the material citations, and thus a well curated assemblage of links. Being FAIR digital objects, treatments can be cited and should ultimately be linked to from a taxonomic name used in an identification of an organism.\nMore info\n","id":22,"permalink":"https://plazi.org/posts/2022/08/fair-digital-objects/","tags":["Events"],"title":"Taxonomic Treatments as Open FAIR Digital Objects"},{"categories":[""],"contents":"Abstract\nDivisive Power of Citizenship, one of our Institute’s SNSF-funded projects, concludes this year with its closing international conference on 22/23 September 2022 in Basel, Switzerland. This event will offer a broad platform for debate about citizenship in a global context, addressing scholars working in the fields of Global History, Asian Studies, International Law and Digital Humanities, at a time of terrifying topicality.\nThe central themes of discussion will be expat communities in Asia, and civil internment camps during the Second World War in Asia, which have been rarely studied until now. An additional methodological component will investigate the potential of digital methods in this field, which will analyse the impact of access to precise data at scale relating to networks of foreign residents—addressing interferences with citizenship, statelessness and denaturalisation. The conference will also form the launch for new data resources which have been created during the course of the project, covering Foreign Residents in East Asia between 1863 and 1941.\nThe individual panels will promote interdisciplinary exchange across disciplinary borders. 
Panel I on foreigner status will focus on Expats, imperialists and victims: Citizenship during Transformation Periods; Panel II will discuss sharing of primary sources and secure data preservation; Panel III will address research into civilian internment camp records and introduces next-generation digital resources for historians; Panel IV will investigate legal frameworks protecting civilians, with reference to their citizenship in relation to military conflict (e.g. enemy aliens).\nMore info\n","id":23,"permalink":"https://plazi.org/posts/2022/08/divisive-power/","tags":["Events"],"title":"Publishing Copyright-Free Taxonomic Treatments as New Resources for Biodiversity Research"},{"categories":[""],"contents":"Abstract\nBauhin’s pioneering Flora of Basel „Catalogus Plantarum circa Basileam sponte nascentium“ exactly 400 years ago is a precursor to the artificially identified beginning of taxonomic publishing with Linnaeus Systema plantarum … 1753. Indeed, botanical publishing existed long before Linnaeus; however, his contribution has not only been the Latin Binomen, but the highly structured way he published the taxonomic treatments for each species, and to cite previous works. In the digital age, this kind of structuring of the information and the existing implicit citation links allows building the biodiversity knowledge graph, opening up the entire corpus of taxonomic treatments imprisoned in publications by providing it in a format that can be found and reused, cited and linked to data referenced in the publications. This provides access to all the published data about a specimen in a natural history collection, to trace back the history of a taxonomic name or the traits used to describe and delimit the species. The Swiss based Biodiversity Literature Repository and TreatmentBank provide access to over 750,000 taxonomic treatments, over 1M material citations and are the largest data set provider to the Global Biodiversity Information Facility (GBIF). 
This highly automated workflow has and continues to liberate data from over 70,000 publications, which are further converted to RDF and structured into Linked Open Data via the OpenBiodiv Biodiversity Knowledge Graph. At the same time new ways to structure publications so that their resident data can immediately be reused upon publication are being developed in collaboration with Pensoft publishers, the Muséum d’histoire naturelle, Paris and the CETAF publishing group. The Swiss NGO Plazi, Zenodo at CERN and SIBiLS at the Swiss Institute of Bioinformatics are involved in the EU Horizon 2020 project BiCIKL and the Swissuniversities funded project eBioDiv to develop ways to link data in publications to specimens, taxonomic names and genes and vice versa. This lecture will explain the concept and state of the art in taxonomic publishing, access and reuse of its data.\nMore info\n","id":24,"permalink":"https://plazi.org/posts/2022/08/400-years-of-botanical-collections/","tags":["Events"],"title":"400 Years of Botanical Collections – Implications for Present-Day Research"},{"categories":[""],"contents":"Improved understanding of co-habitation of roosts by multiple species of bats is essential for estimating the risks of zoonotic disease transmission. However, ecological data on roosting environments, species richness, bat-bat interactions, viral infections, and other species interactions are scattered throughout the literature, making them difficult to study on a global scale. The research scope for most roost studies has been narrow, focusing on roost type, bat abundance, and locality data while failing to investigate interspecific roosting interactions. 
To meet this need, we have collaboratively built an open-access dataset of ecological interactions (including co-roosting, trophic, anthropogenic, and parasitic) extracted from the literature to improve our understanding of roost dynamics on a global scale, and to elucidate the role of shared roosts in disease transmission. As of April 2022, \u0026gt; 11,500 interaction records involving \u0026gt; 360 bat species from \u0026gt; 137 countries encompassing a variety of habitats have been extracted from \u0026gt; 175 publications spanning 1860-2020, all accessible via the Coronavirus-Host community at Zenodo. With this benchmark dataset of open-access digitized interaction data, tools, and workflows, we provide evidence of co-roosting events that we aligned with multiple ontologies (interaction terms, taxonomies, administrative regions) and phylogenies suitable for high-throughput analysis. We followed open access and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles for extracting data and choosing methodologies. We identify biases in the coverage of bat interaction records, suggest new tools for biodiversity informatics, and explore obstacles and opportunities in the mining of eco-interactions previously lost in the annals of scientific literature.\nMore info\n","id":25,"permalink":"https://plazi.org/posts/2022/08/what-we-dont-know/","tags":["Events"],"title":"What do we not know – quantifying data gaps and biases in knowledge of bat co-roosting"},{"categories":[""],"contents":"The Open Definition version 2.1 prescribes four conditions for openness of a work \u0026ndash;\nit must be published under an open license or be in the public domain, it must be accessible easily and at no more than the cost of reproduction, preferably downloadable via the internet, it must be readily readable by a computer, and it must be available in an open format. It is clear that openness is not just a matter of attaching an open license. 
Openness has to be a fundamental goal and part of the design. This becomes all the more important for complicated endeavors such as scientific data that need to exist in a stable form far into the future (\u0026ldquo;forever\u0026rdquo;). At Plazi we have been extracting data from biodiversity literature and making it FAIR and open for more than a decade. To do this, we have built (or helped build) the tools, the procedures, the repositories, the standards, and the partnerships to extract, clean, archive and disseminate data for perpetuity. In this presentation we will share our experience building a lasting research infrastructure that is open by design.\n","id":26,"permalink":"https://plazi.org/posts/2022/05/blr-open-by-design/","tags":["Events"],"title":"The Biodiversity Literature Repository — Open By Design"},{"categories":["news"],"contents":" The Swiss-based Plazi NGO has received a grant of EUR 1.5 million from Arcadia – a charitable fund of Lisbet Rausing and Peter Baldwin – to further develop its Biodiversity Literature Repository (BLR) established in collaboration with Zenodo, the open science repository hosted and managed by the European Organization for Nuclear Research (CERN), and the open-access scholarly publisher and technology provider Pensoft.\nSource: Eurekalert\nFigure left: A figure extracted from a scholarly article and made accessible via BLR. The image in preview mode can be downloaded inclusive of source in the metadata. Visible are links to other data elements from the article citing the figure, the source publication, a machine readable license and download alternatives. A deposit of a figure extracted from a scholarly article and made accessible via the Biodiversity Literature Repository. The image in preview mode can be downloaded inclusive of source in the metadata. 
On the right, there are links to other data elements from the article citing the figure, the source publication, a machine readable license and alternatives to download the data. DOI: 10.5281/zenodo.253091\nFigure right: Search results from the Biodiversity Literature Repository via Ocellus image search. Each of the images provides access to the taxonomic treatment of the species or the source article. Only figures from scholarly publications are provided.\nCredit: Plazi, License: CC BY 4.0\nThe Arcadia-supported project helps rediscover known biodiversity by liberating taxonomic treatments, material citations and images trapped in scholarly biodiversity publications, and making them FAIR and open. The project engages the community in the huge and decisive challenge to understand and preserve the biodiversity of our planet.\nOur knowledge about biodiversity is largely imprisoned in a corpus of more than 500 million pages of scientific research publications that is growing daily. Many of these publications are only available in print, and others are PDFs behind a paywall. These data are not FAIR; they are not findable, accessible, interoperable, or reusable. They cannot be linked to new digital resources such as gene sequences, citizen science observations, taxonomic names, or specimens of digitized natural history collections. Extracting and using text and data from such PDFs comes at very high cost, if possible at all.\nThrough its TreatmentBank production service, Plazi is a leader in providing access to biodiversity data liberated from publications. Thanks to the Arcadia support and in collaboration with Pensoft, Zenodo and the Swiss Institute for Bioinformatics Literature Services (SIBiLS), Plazi provides access to over 750,000 taxonomic treatments, 450,000 figures and over 1.1 million material citations from over 53,000 publications in the BLR. 
Ian Engelbrecht from the South African National Biodiversity Institute highlights the value of this service: “Reliable, accessible resources for taxonomic data are scarce, and most online resources provide an interpretation of the scientific literature made by the people who built them. TreatmentBank and the BLR are different in that they go straight to the source, providing the data in a dynamic, accessible format exactly as in the original publications.”\n“Having digital access to previously published species hypotheses in structured ways such as through TreatmentBank makes taxonomic research much more reproducible. Furthermore, this digital access to knowledge in a single portal informs new research in many ways as well as encourages and accelerates biodiversity/species discovery,” points out Torsten Dikow, Curator at the Smithsonian Institution (USA).\nPublished research data are among the best-curated data available. Linking extracted research data and connecting infrastructures in order to enable researchers to access services across the data lifecycle is now a part of the recently funded EU-Horizon 2020 project Biodiversity Community Integrated Knowledge Library (BiCIKL). Together with 15 European and world-level research infrastructures, Plazi is a key participant in BiCIKL.\n“Services provided by Plazi to liberate data from the precious legacy of generations of nature explorers are globally unique, given the level of automation and detail they provide,” says the BiCIKL coordinator and Pensoft founder Prof. Lyubomir Penev. 
“We should strive to radically change the way we publish new data and narratives, so that these can immediately become FAIR, saving the costs and efforts of their extraction and liberation”.\nTreatmentBank and BLR are also integrated into the Swissuniversities-funded project eBioDiv to provide access to data about specimens in the Swiss Natural History collections.\nIn the previous Arcadia funded project (2017-2020), Plazi built a now widely used infrastructure, including the creation of terminology to describe taxonomic treatments and material citations, both at the base to communicate biodiversity data, and to make the Zenodo repository highly customizable. It is now also implemented at the Global Biodiversity Information Facility (GBIF), where Plazi is the major data provider for over 90,000 species.\n“GBIF data is greatly improved by the data flow provided by Plazi. Plazi liberates important data that is critical to the 64 member nations of the GBIF network as they work to provide answers to their biodiversity policy needs,” says Joe Miller, GBIF Executive Secretary.\nWith help of the current award, Plazi will focus on liberating more data from a wider array of taxonomic journals, and in collaboration with Data Futures, Plazi will develop new services to enable the broader community to enrich and curate liberated data as part of their research and to preserve the annotations for the long-term. Services and products to visualize and analyze target data, and metrics on how to measure the scientific output will be provided. 
A series of joint training courses and adequate training materials are also planned.\nThe open access to the liberated data will also serve as the basis for an analysis of the impact of the Bouchout Declaration on Open Biodiversity Knowledge Management, launched in 2014 and signed by more than 90 institutions and 200 individuals, to be presented at a conference in 2024.\nTo participate in the project or for further questions, please email Donat Agosti, President at Plazi.\n","id":27,"permalink":"https://plazi.org/posts/2022/05/arcadia-fund-supports-plazi/","tags":["news"],"title":"Arcadia supports Plazi in its endeavor to rediscover known biodiversity"},{"categories":[""],"contents":" Program Presentation ","id":28,"permalink":"https://plazi.org/posts/2022/01/schweizer-hymenopterologen-tagung/","tags":["Events"],"title":"Einen Einblick und Einstieg in die versteckte Vielfalt der 2021 publizierten Hymenopteren"},{"categories":[""],"contents":"Donat Agosti will present Beyond the Print and PDF Prison: Data About Biodiversity Want to be Free, showcasing Plazi’s groundbreaking approach to the liberation of data from taxonomic treatments within scholarly publications at the World Biodiversity Forum.\nAbstract: Most of what scientists discovered about biodiversity and published, totaling a corpus of an estimated 500 million printed pages stacked up in our libraries and more recently, in digital format, is not known. It is an amount of information that cannot be processed by humans, but even machines can’t cope with it because the publications are either not scanned, or are behind paywalls, or in formats that machines can’t read at scale. It is a tragedy that in this digital age we can’t make use of this data.\nBut it doesn’t need to be like this. Scientific publications are structured; they use standard means to express the results. Arguments cite previous arguments, building a network of knowledge. 
If represented digitally, this knowledge could be represented as a knowledge graph and analysed.\nThe data in publications can be made FAIR: Figures, blocks of texts such as the descriptions of species, or material citations; named entities such as person or taxonomic names can be annotated and linked to reference vocabularies. They can be cited and reused irrespective whether a publication is behind closed doors.\nIn an exemplary way this lecture will show the collaboration between the Biodiversity Literature Repository and the Global Biodiversity Information Facility that has made available data about 80,000 species known in GBIF only because they have been liberated from the publications, thereby enabling the sharing of this scientific knowledge with anybody, anywhere for any purpose.\n","id":29,"permalink":"https://plazi.org/posts/2022/01/world-biodiversity-forum/","tags":["Events"],"title":"Liberation of data from taxonomic treatments"},{"categories":[""],"contents":"Donat Agosti will provide the lecture Nothing in (taxonomic) publishing makes sense except in the light of treatments at the conference of the Society for the Preservation of Natural History Collections (SPNHC).\nThe goal of taxonomic literature is to describe taxonomic diversity as a result of charting the Earth\u0026rsquo;s biological diversity. As research adds to our knowledge, our understanding of taxa increases with additional published results. This is how it has happened since the standard publications by Linnaeus in 1753 and 1758 in the format of taxonomic treatments, clearly delimited sections of text about a particular taxon. Later, material citations were added, providing an explicit link to the material that led to the published research result. Each treatment has a nomenclatural section including the referenced taxonomic name representing the usage of a specific citable name. 
Thus, the taxonomic name Apis mellifera L, 1758 refers to the taxonomic treatment published by Linnaeus 1758 on page 576. Later usage of the name cites this treatment thereby adding new research to the scientific corpus.\nFrom a semantic point of view, a treatment and a material citation provide context to the content. For example, a geo-coordinate or specimen code in a material citation are references to a specimen which is a reference to the taxon of the treatment in which the material citation is in the text of a publication.\nWhile this can be easily perceived by a human, machines depend on the treatment, treatment citation, material citation and taxonomic name annotations for recognizing this relationship.\nThese elements are in most cases distinct entities discoverable and annotatable by machines, can support human curation, but even better, the annotations can be embedded in prospective publications like those championed by Pensoft and the European Journal of Taxonomy.\nOver the last 15 years, Plazi has been spearheading efforts to develop TaxPub, a schema modeling the taxonomic treatments, as well as material citations. Plazi has developed a processing workflow to discover these elements, make them open and citable using TreatmentBank and the Biodiversity Literature Repository and reused in collaboration with the Global Biodiversity Information Facility. Currently 730,000 treatments and over 1M material citations have been liberated from 51,000 publications. They have also been made reusable by GBIF, including over 80,000 taxa that are in GBIF only because of Plazi-provided treatments.\n","id":30,"permalink":"https://plazi.org/posts/2022/01/spnhc/","tags":["Events"],"title":"Nothing in (taxonomic) publishing makes sense"},{"categories":[""],"contents":"We will take you on a journey into the world of biodiversity data standards and tools and will inform you on the latest developments in the world of Biodiversity Informatics, both internationally and locally. 
What is the state of the art in initiatives like GBIF, LifeWatch and DiSSCo, etc and how can we let them work for us? We have invited some very interesting international speakers and will inform you on the progress made by these projects in Belgium. During this conference we will build a bridge between Biodiversity informatics and its older relative “Bioinformatics” and investigate what a crossover of these two worlds can bring to Biodiversity Research. Genomic information meets digital taxonomy!\n","id":31,"permalink":"https://plazi.org/posts/2022/01/empowering-biodiversity/","tags":["Events"],"title":"Discovering Known Biodiversity"},{"categories":[""],"contents":"Plazi is a non-profit organisation founded in 2008 to promote the free accessibility of scientific data, in particular taxonomic treatments and images.\nWhat are taxonomic treatments? A taxonomic treatment is the scientific description of a biological species, i.e. an animal species, a plant species, a fungus or a bacterium. If, for example, an unknown animal species is discovered today, a single individual of this species is selected, the so-called holotype. The species is then given a name that has not been used before and formed according to predefined rules, it is described scientifically and the description is published. The holotype is the ultimate reference specimen.\nThe description contains a list of the external characteristics of the individuals of the species, the time and place of the discovery, the etymological derivation of the name and information on which natural history collections they are deposited in. Some descriptions include DNA analyses, information on the distribution of the species or on how the species can be distinguished from similar species, etc. Often such descriptions also include illustrations showing the individual as a whole or specific details. 
However, a treatment can also be an addition to an already known species description, for example if the distribution of a species has changed or if it turns out that a supposedly new species is identical to an already known species.\nWe do not know what we already know But how can we be sure that a presumably new species that has been discovered has really never been described before? By searching the already published treatments for a suitable description and comparing already existing illustrations. Unfortunately, there is neither a complete list of known species nor a searchable database of all published treatments. The majority of treatments are contained in an estimated 500 million pages of books stored in scientific libraries, some of which have been out of print for decades and cannot be accessed digitally. This makes searching enormously time-consuming, inefficient and also simply unfeasible in practice.\nThis is where Plazi comes in, by indicating treatments in scientific articles and books and making them FAIR. FAIR stands for Findable, Accessible, Interoperable and Reusable. The data are stored in our own databases, in particular in Plazi\u0026rsquo;s own TreatmentBank and in the Biodiversity Literature Repository, and linked to each other so that they can be found, analysed and reused using search engines. The long-term preservation of the liberated data is secured through the collaboration with the Zenodo repository at CERN which is hosting the Biodiversity Literature Repository.\nWe take treatments of animals, plants, fungi and bacteria from existing literature on the one hand, and from daily research publications on the other, and integrate them into our TreatmentBank. 
The data in the TreatmentBank can be freely accessed anywhere and is fed into the Global Biodiversity Information Facility database, for example.\nOur motivation In order to slow down and, if possible, stop the global extinction of species, knowledge about individual species is eminently important. Only what is known can be specifically protected. By making as much information as possible accessible and available efficiently and free of charge, we contribute to the scientific study of our environment as well as to raising humanity\u0026rsquo;s awareness of the diversity of nature and the need to preserve and protect it. We are also helping to narrow the knowledge gap between North and South and to ensure that data is also available in the tropics, where the greatest biodiversity exists.\nSince its foundation, Plazi has opened up access to well over 800,000 treatments and 450,000 illustrations from almost 78,000 articles and books. As a result, Plazi already has the largest digital and freely accessible collection of scientific treatments. It is estimated that there are around 8.7 million species worldwide,1 although the numbers vary considerably depending on the different studies. So far, 2.13 million species have been described scientifically, but this also includes a number of species that have been described several times due to the lack of accessibility of the data. So there is still a lot to do, but our TreatmentBank is growing daily. In 2022, 47,373 treatments from 4,359 papers and about 30,000 images were made available, including 7,385 treatments of species that were only discovered and described as new in 2022!\nMora, Camilo, Derek P. Tittensor, Sina Adl, Alastair G. B. Simpson, und Boris Worm. \u0026ldquo;How Many Species Are There on Earth and in the Ocean?\u0026rdquo; PLOS Biology 9, Nr. 8 (23. August 2011): e1001127. 
https://doi.org/10.1371/journal.pbio.1001127.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","id":32,"permalink":"https://plazi.org/about/mission/","tags":["About","Mission"],"title":"Plazi Mission"},{"categories":[""],"contents":"Plazi ist eine 2008 gegründete Non-Profit-Organisation, die sich für die freie Zugänglichkeit wissenschaftlicher Daten, insbesondere taxonomischer Treatments und Bilder, einsetzt.\nWas sind taxonomische Treatments? Ein taxonomisches Treatment ist die wissenschaftliche Beschreibung einer biologischen Art, also einer Tierart, einer Pflanzenart, eines Pilzes oder eines Bakteriums. Wird heute beispielsweise eine noch nicht bekannte Tierart entdeckt, so wird ein einzelnes Individuum dieser Art ausgewählt , der sogenannte Holotype. Die Art wird dann mit einem bisher nicht verwendeten, nach vorgegebenen Regeln gebildeten Namen bezeichnet, sie wird wissenschaftlich beschrieben und die Beschreibung publiziert. Der Holotype ist dabei das gültige Referenzexemplar.\nDie Beschreibung enthält eine Auflistung der äusseren Merkmale der Individuen der Art, Zeitpunkt und Ort der Funde, die etymologische Herleitung des Namens und Angaben darüber, in welchen naturhistorischen Sammlungen sie hinterlegt sind. Manche Beschreibungen enthalten DNA-Analysen, Angaben zur Verbreitung der Art oder dazu, wie die Art von ähnlichen Arten unterschieden werden kann usw. Oft umfassen solche Beschreibungen auch Illustrationen, welche das Individuum als Ganzes oder bestimmte Einzelheiten darstellen. Ein Treatment kann aber auch eine Ergänzung zu einer bereits bekannten Artenbeschreibung sein, zum Beispiel wenn sich die Verbreitung einer Art verändert hat oder wenn sich herausstellt, dass eine vermeintlich neue Art mit einer bereits bekannten Art identisch ist.\nWir wissen nicht, was wir schon wissen Doch wie kann man sicher sein, dass eine wahrscheinlich neue Art, die entdeckt wurde, wirklich noch nie zuvor beschrieben wurde? 
Indem die bereits veröffentlichten Treatments nach einer passenden Beschreibung durchsucht werden und bereits existierende Abbildungen verglichen werden. Leider existiert aber weder eine vollständige Liste der bekannten Arten noch eine Datenbank, die alle bereits veröffentlichten Treatments enthält und die sich durchsuchen lässt. Die Mehrzahl der Treatments ist in geschätzten 500 Millionen Seiten von Büchern enthalten, die in wissenschaftlichen Bibliotheken lagern, teilweise schon seit Jahrzehnten vergriffen und nicht digital abrufbar sind. Dies macht die Recherche enorm zeitaufwändig, ineffizient und in der Praxis auch schlicht undurchführbar.\nHier setzt Plazi an, indem wir Treatments in wissenschaftlichen Artikeln und Büchern nachweisen und sie FAIR machen. FAIR bedeutet Findable (auffindbar), Accessible (frei zugänglich), Interoperable (computerlesbar) und Reuseable (weiterverwendbar). Die Daten werden in eigenen Datenbanken, insbesondere in der Plazi-eigenen TreatmentBank und im Biodiversity Literature Repository, abgespeichert und miteinander verlinkt, sodass sie mittels Suchmaschinen gefunden, analysiert und weiterverwendet werden können. Die Langzeitspeicherung der Daten wird durch die Zusammenarbeit mit dem Zenodo Speicher von CERN gewährleistet, welcher die Biodiversity Literature Repository einschliesst.\nWir entnehmen Treatments von Tieren, Pflanzen, Pilzen und Bakterien einerseits aus bestehender Literatur, andererseits aus den täglich erscheinenden Forschungspublikationen und integrieren sie in unsere TreatmentBank. Die Daten der TreatmentBank können überall frei abgerufen werden und werden beispielsweise in die Global Biodiversity Information Facility-Datenbank eingespeist.\nUnsere Motivation Um das globale Artensterben zu verlangsamen und nach Möglichkeit zu stoppen, ist das Wissen über die einzelnen Arten eminent wichtig. Nur was bekannt ist, kann gezielt geschützt werden. 
Indem wir möglichst viele Informationen zugänglich und effizient und kostenlos abrufbar machen, leisten wir einen Beitrag zur wissenschaftlichen Erforschung unserer Umwelt sowie zur Sensibilisierung der Menschheit für die Vielfalt der Natur und die Notwendigkeit, diese zu erhalten und zu schützen. Wir tragen auch dazu bei, dass der Wissensgraben zwischen Nord und Süd verkleinert werden kann und dass die Daten auch in den Tropen, wo die grösste biologischen Vielfalt besteht, vorhanden sind.\nSeit seiner Gründung hat Plazi den Zugang zu weit über 800’000 Treatments und 450,000 Abbildungen aus fast 78’000 Artikeln und Büchern erschlossen. Dadurch verfügt Plazi schon jetzt über die grösste digitale und frei zugängliche Sammlung an wissenschaftlichen Treatments. Schätzungen gehen davon aus, dass es weltweit circa 8,7 Millionen Arten gibt,1 wobei die Zahlen je nach Studie deutlich variieren. Wissenschaftlich beschrieben wurden bisher 2,13 Millionen Arten, worin aber auch etliche Arten enthalten sind, die – wegen der fehlenden Zugänglichkeit der Daten – mehrmals beschrieben wurden.. Es bleibt also noch viel zu tun, aber unsere TreatmentBank erhält täglich Zuwachs. So wurden 2022 aus 4,359 Arbeiten 47,373 Treatments und rund 30'000 Bilder zugänglich gemacht, darunter auch 7,385 von Arten die erst in diesem Jahr entdeckt und als neu beschrieben worden waren!\nMora, Camilo, Derek P. Tittensor, Sina Adl, Alastair G. B. Simpson, und Boris Worm. „How Many Species Are There on Earth and in the Ocean?“ PLOS Biology 9, Nr. 8 (23. August 2011): e1001127. https://doi.org/10.1371/journal.pbio.1001127.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","id":33,"permalink":"https://plazi.org/about/mission-de/","tags":["About","Mission"],"title":"Was macht Plazi?"},{"categories":[null],"contents":"At year-end, it is common to review the past year’s highlights as we (Plazi) too did, highlighting a few of the almost 9000 species discovered in 2021. 
A few of the world\u0026rsquo;s leading taxonomic institutions did so as well with their recent discoveries. Since we are interested in not just finding out about the species discovered but also learning about them in more detail, we decided to test how easy it was to locate their original taxonomic treatments in the scientific papers that described the species mentioned in the press releases.\nWe took three press releases, one each from the California Academy of Sciences (CAS), the Royal Botanic Gardens Kew and the Natural History Museum of London (NHM) in which 29 species from 29 different scientific articles are mentioned.\nWe scientists, with at least a basic understanding of how taxonomic works are published, read the press releases and measured the time we took to locate the original sources of the 29 species. In all three press releases, most of the species are mentioned along with their scientific names that could be copied into Ecosia, a search engine, to find the original source paper. If the original articles were open access, we were able to locate most of them within two minutes. However, some species were mentioned only by their vernacular or descriptive names, not their scientific names (for example, São Tomé’s caecilian). In such cases, the search took up to 8 minutes.\nAll three articles referred to species whose taxonomic data were not accessible openly due to paywalls or registration obstacles that sometimes didn’t work due to technical issues. In two cases, typos in the taxonomic names made it more difficult (up to 15 minutes) to locate the original publication.\nArguably more importantly, none of the articles provided author citations or direct links to the original source publications. If direct links to the source publications had been provided then typographic errors or vernacular names would not have mattered.\nThis is only half the story. 
Taxonomic names refer to a section of a scientific publication, a taxonomic treatment, delimited by the author, to present and discuss the results of the discovery of a new species. In the age of the internet, this section could be cited allowing the interested reader to directly and immediately learn about the facts of the new species, including figures and further links to the specimens in the digital collection. This would make the scientific collection more usable. While writing her article, the scientist has this structure in mind. The current publishing process, with a few notable exceptions, removes this structure by publishing a long text that, while understandable by humans, is machine processable only at great cost. As a result, a press officer or journalist writing an article about new species is unable to easily link to the respective treatment of those species.\nWe therefore explored what it takes to make these treatments and the cited specimen (holotype) open access, citable via the Biodiversity Literature Repository (BLR on Zenodo), and reused by the Global Biodiversity Information Facility (GBIF), where increasingly all observations on biodiversity aggregate.\nThe PDF conversion, annotation and dissemination of the data is automated as much as possible, but errors discovered during the quality control process have to be fixed manually. Since the 29 articles examined in this exercise have been published in 24 different journals, most of them very domain-specific, automation is not easy. Furthermore, processing includes the entire article, not just the target text, and thus the time taken to process reflects the processing of the entire article.\nFor journals where a high degree of automation is possible because of their known/consistent layout, e.g. European Journal of Taxonomy or Zootaxa, processing per page takes between 1 and 2.5 minutes. In this case, the average for all journals was 3 minutes with a maximum of 9.5 minutes. 
The total average processing time of articles, ranging from 2—93 pages took 57 minutes. These articles included 891 pages, 170 taxonomic treatments and 408 figures, accessible through BLR and GBIF.\nClearly, the digital age is more than telling interesting stories about research in natural history institutions. Taxonomic names linked to their taxonomic treatments are a way to provide access to the fantastic results its scientists provide.\npress release vernacular name taxonomic name source article and accessibility Plazi mediated links CAS Easter egg weevil Pachyrhynchus obumanuvu DOI (Open Access via Researchgate) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype CAS pygmy pipehorse Cylix tupareomanaia DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype CAS scorpion Centruroides catemacoensis DOI (Open Access via Researchgate) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype CAS São Tomé caecilian \u0026nbsp; DOI (Closed access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype CAS Guitarfish Acroteriobatus andysabini DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype CAS sea star Uokeaster ahi DOI (Closed Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype KEW Killer tobacco plant Nicotiana insecticida DOI (Closed access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype KEW hidden banana seed fungus Fusarium chuoi DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype KEW Ghost orchid Didymoplexis stella-silvae DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype KEW blue Barleria Barleria thunbergiiflora DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype KEW Cape primrose Streptocarpus malachiticola DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype KEW Firework flower Ardisia 
pyrotechnica DOI (Access via Researchgate) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype KEW Bolivian periwinkle Philibertia woodii DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype KEW tooth-fungus Hydnellum nemorosum DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype KEW Voodoo lily Pseudohydrosme ebo DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype KEW bright-blue-fruited rainforest shrubs Chassalia northiana DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype NHM ankylosaur Spicomellus afer DOI (Closed Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype NHM chunky sauropod Rhomaleopakhus turpanensis DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype NHM \u0026nbsp; Brighstoneus simmondsi DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype NHM \u0026nbsp; Pendraig milnerae DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype NHM \u0026nbsp; Megalomys camerhogne DOI (Access via Researchgate) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype NHM Jurassic mouse Borealestes cuillinensis DOI (Closed Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype NHM \u0026nbsp; Amazops amazops DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype NHM moth Xanthopan praedicta DOI (Access via Researchgate) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype NHM \u0026nbsp; Mecopoda sismondoi DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype NHM deep sea polychaete worm Neanthes goodayi DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype NHM giant amphipod Eurythenes atacamensis DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence 
holotype NHM jewelweed Impatiens versicolor DOI (Access via Researchgate) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype NHM Joseph's racer Platyceps josephi DOI (Open Access) TB treatment BLR treatment GBIF treatment GBIF occurrence holotype ","id":34,"permalink":"https://plazi.org/posts/2022/01/access-to-data-in-press-releases/","tags":["News"],"title":"Access to taxonomic treatments mentioned in press releases"},{"categories":[null],"contents":"In 2021 Plazi provided access to the taxonomic treatments and figures of 8848 new species, 647 new genera and 27 new families discovered as part of 11391 changes in taxonomic status in 2021, published in 4420 articles from 95 journals. Here we present a small selection of 12 spectacular species with links to their complete taxonomic treatment. Since we can only protect what we know, we want people to have free access to taxonomic data, always and everywhere.\nSolmaris flavofinis Schuchert \u0026amp; Collins, 2021 Solmaris flavofinis Schuchert \u0026 Collins, 2021 During 75 night-time dives in the Gulfstream off Florida, 46 species were identified and six newly discovered species were described. One of them is Solmaris flavofinis, a hydromedusa named for its yellow-tipped tentacles. (BioOne) Full Treatment\nGypogyna mexicana Ruiz \u0026amp; Bustamante, 2021 Gypogyna mexicana Ruiz \u0026 Bustamante, 2021 The holotype of the jumping-spider Gypogyna mexicana was found in Jalisco, Mexico in 2014 and is now deposited in the Spencer Entomological Museum of the University of British Columbia. (Zootaxa) Full Treatment\nGhatiana rouxi Pati \u0026amp; Thackeray, 2021 Ghatiana rouxi Pati \u0026 Thackeray, 2021 This deep purple freshwater crab was found among four other newly discovered freshwater crabs in the Western Ghats of Goa and Karnataka, India. It is named after the Swiss zoologist, Dr. Jean Roux. 
(Zoosystema) Full Treatment\nEnyalioides feiruzae Venegas, Chávez, García-Ayachi, Duran \u0026amp; Torres-Carvajal, 2021 Enyalioides feiruzae Venegas, Chávez, García-Ayachi, Duran \u0026 Torres-Carvajal, 2021 In the Río Huallaga basin in central Peru a colorful wood lizard was newly discovered and named after Feiruz, a green iguana owned by Catherine Thomson that figured as her muse and lifelong friend. (Evolutionary Systematics) Full Treatment\nDanionella cerebrum Britz, Conway \u0026amp; Rüber, 2021 Danionella cerebrum Britz, Conway \u0026 Rüber, 2021 The brain of the transparent fish Danionella cerebrum has already been well studied even before it was known as a separate species, because it was mistaken as Danionella translucida Roberts, 1986, which looks externally almost identical. Danionella cerebrum has the smallest adult brain among vertebrates. (scientific reports) Full Treatment\nRubus longistipularis Espinel-Ortiz \u0026amp; Romol, 2021 Rubus longistipularis Espinel-Ortiz \u0026 Romol, 2021 In the western Andes of Ecuador a new blackberry was discovered: Rubus longistipularis. Its magnificent flowers have deeply concave, pink petals with fuchsia borders. (PhytoKeys) Full Treatment\nBrighstoneus simmondsi Lockwood, Martill \u0026amp; Maidment, 2021 Brighstoneus simmondsi Lockwood, Martill \u0026 Maidment, 2021 On the Isle of Wight, UK, a sensational new species has been discovered which has been extinct for a long time. Brighstoneus simmondsi is a dinosaur that lived in the Lower Cretaceous, and belongs to a newly described genus. (Journal of Systematic Palaeontology) Full Treatment\nOctopus djinda Amor, 2021 Octopus djinda Amor, 2021 On three shallow-water locations off southwest Australia, 25 individuals of a newly described octopus were found. Even though this species was regularly caught by fishermen, it wasn’t known to be a separate species. Djinda means star in the Nyoongar language. 
(Zootaxa) Full Treatment\nCallogobius falx Fujiwara, Suzuki \u0026amp; Motomura, 2021 Callogobius falx Fujiwara, Suzuki \u0026 Motomura, 2021 The holotype of Callogobius falx was purchased at a local aquarium shop at Ishigaki Island, Japan. Its distribution ranges from southern Japan to the Philippines. (Zootaxa) Full Treatment\nEumillipes persephone Marek, 2021 Eumillipes persephone Marek, 2021 In a drill hole in Western Australia, 60 meters below ground, the first true millipede was discovered. ‘Eu’ means ‘true’ in Greek, ‘millipes’ is a combination of the Latin words for ‘thousand’ and ‘foot’; the name refers to the fact that it’s the first species that really has more than 1000 legs, 1,306 to be exact! (Scientific Reports) Full Treatment\nHiperantha pikachu Pineda \u0026amp; Barros, 2021 Hiperantha pikachu Pineda \u0026 Barros, 2021 Pokémon are not the only creatures that mimic existing species; it also works the other way around: due to similarities of the wings of Hiperantha pikachu with the ears of Pikachu, the newly discovered jewel beetle is named after the Pokémon. (Zootaxa) Full Treatment\nGephyromantis marokoroko Hutter, Andriampenomanana, Andrianasolo, Cobb, Razafindraibe, Abraham \u0026amp; Lambert, 2021 Gephyromantis marokoroko Hutter, Andriampenomanana, Andrianasolo, Cobb, Razafindraibe, Abraham \u0026 Lambert, 2021 The skin texture gave this newly discovered forest frog its name: ‘Marokoroko’ is a Malagasy word meaning ‘rugose’ or ‘rugged’. It can be observed in the mid-altitude rainforest near Andasibe, Madagascar. (Zoosystematics and Evolution) Full Treatment\n","id":35,"permalink":"https://plazi.org/posts/2021/12/new-species-2021/","tags":["News"],"title":"New Species of 2021"},{"categories":[null],"contents":" Figure 1: A snapshot of a published table showing the GenBank accession numbers and localities for the 507 individuals representing 61 species of Cladocera and 19 of Copepoda used in this study. 
DOI: https://doi.org/10.11646/zootaxa.1839.1.1 Figure 2: The same table as in Figure 1 extracted by TreatmentBank. Full table available in HTML and JSON formats. Linking molecular data to taxonomic names and their extensive taxonomic treatments represents a fundamental component in biodiversity assessment. Voucher specimens for sequenced data can be the key nodes to make these connections.\n— Project 15, Biohackathon 2021\nTaxonomic treatments are research results related to a taxon published in a standard way as sections of a publication. Ideally, and in the context of reproducible science, all the data or identifiers to the data are available. For taxonomic publications “all the data” include material citations to the specimens used and genomic data generated as basis for the analysis. Thus a curated link is provided in a publication between a species, more precisely a taxonomic name providing the expert identification of a specimen, the specimen itself represented by its material citation, and one or more DNA sequences. Furthermore, the taxonomic name can be linked to the cited treatment, opening up further data about the species. The treatment includes other data such as traits that can be used to annotate gene sequences.\nIn an ideal world, the linking within the treatment can be used to either start from a specimen or a gene sequence to discover more information about it such as its geographical range, the source specimen, or its location in the phylogenetic tree.\nThis would require these relationships to be explicitly expressed in the publication, or at least available for machine-processing — how else could we process millions of printed pages? 
The natural candidates for expressing this relationship are either the material citation that includes the data about one specimen or tables that include all these data in one well organized location.\nThe reality however is that an increasing number of tables are produced listing the specimen and its gene accession numbers or the specimen code or geographic location information. This would be reasonable if tables were not hard to extract, especially when they span multiple pages, and if their rows could then be provided as material citations representing the links between specimens and genes.\nThe other reality is that many of these tables are published as supplementary material in the form of MS-Excel or MS-Word formatted documents, without identifiers, and thus cannot be automatically discovered, extracted and analyzed programmatically. Of course, the publications that are behind paywalls do not even allow reuse of the data without institutional access or upfront payment.\nIn this hackathon, we focused on the most promising source for specimen- and gene-related data to find and extract tables, and check them for quality. We analysed 14,000 tables already extracted by TreatmentBank to develop a quality control mechanism for determining the correctness of the extracted tables, and are now continuing to develop a more refined mechanism to represent the quality of extraction.\nThis work is based on TreatmentBank’s algorithm to extract tables, even tables that may span multiple pages, and deliver them in different formats such as JATS.\nThe results are very promising. We continue to work to make tables first-class digital citizens and extract the respective data as citable units, ultimately accessible in GBIF as material citations that express the relationship of genomic data with specimen and literature-based data. 
With this, we will be able to close the gap between specimens, genes and literature and provide an expert-curated link between them.\n","id":36,"permalink":"https://plazi.org/posts/2021/11/annotating-gene-sequences/","tags":["BiCIKL"],"title":"Annotating genes sequences with data from herbarium sheets and publications"},{"categories":[null],"contents":" e-BioDiv: processing data liberation At today’s meeting of the Swiss Systematics Society, the species of the year — a species discovered and described by a Swiss scientist — will be selected.\nCurrently 89 species have been listed out of 31 publications, of which 69% are open access and 31% closed access, and the Revue Suisse de Zoologie is the most prolific journal. New species are hardly ever the only taxonomic treatment of a publication. This is shown by the 314 treatments contained within these publications.\ne-BioDiv: summary of liberated data The conversion of the articles into digital accessible knowledge shows more in detail the output of the scientists. For example, 358 figures and 24 tables have been published based on 1,600 specimens examined.\nA geospatial analysis shows the scientific network of the Swiss scientists based on the affiliations of their co-authors. The analysis of the material citations shows where the specimens have been collected and where the material is stored. This shows that the collecting efforts are focused on Southeast Asia, but also that scientists are active worldwide.\nA taxonomic analysis shows that taxonomic efforts are dominated by studies on animals.\nThe data are provided by TreatmentBank with support by the Swissuniversities-funded project eBioDiv. 
All the data are open, well-documented and available in TreatmentBank, the Biodiversity Literature Repository, and are reused by the Global Biodiversity Information Facility.\nAll this work is a part of a project to liberate data in publications produced by and covering Swiss-based natural history collections, and to link material citations to specimens, and specimens back to the material citations. This is a work-in-progress and any recommendations and feedback are welcomed.\n","id":37,"permalink":"https://plazi.org/posts/2021/11/behind-new-species-swiss-made/","tags":[null],"title":"Behind New Species Swiss-Made"},{"categories":[""],"contents":"Plazi will participate in BioHackathon Europe. Each November ELIXIR organizes BioHackathon Europe, which brings together bioinformaticians from around the world. The event takes place in different locations around Europe. The BioHackathon offers an intense week of hacking, with over 160 international participants who work on diverse and exciting projects. The week starts with a half-day symposium to introduce these projects, and is followed by five days of hacking with one sole aim: coding to address problems in bioinformatics.\nPlazi will participate in project 15: CAB2: A step towards Biodiversity data enrichment. Guido Sautter will be participating on site.\nAbstract Project 15: CAB2: A step towards Biodiversity data enrichment Linking molecular data to taxonomic names and their extensive taxonomic treatments represents a fundamental component in biodiversity assessment. Voucher specimens for sequenced data can be the key nodes to make these connections. 
During Biohackathon 2020, several projects investigated how sequence (meta)data could be retrieved from ENA and connected to taxonomic treatment or specimen databases like TreatmentBank and GBIF.\nWith this proposal, we aim to link more voucher specimens to sequences by applying machine learning techniques to specimen images, retrieving sequencing metadata physically on the specimen that can facilitate and maximize the linking process. We will then employ these metadata to improve the ENA linking process, allowing wider data discovery and enhancement. We also aim to develop a standard module to compare ENA, GBIF, and TB geographical data related to specific taxa and return the results in an interactive data exploration dashboard. The improvements will also address the gap-filling of gene names embedded in scientific papers relative to the accession numbers.\nResults obtained in this project will reflect the importance of integrating different data sources in order to deliver consistent and complete biodiversity data to the scientific community and feed into European biodiversity projects such as Bioscan, BiCIKL and ERGA.\nExpected outcomes\nAn adaptable workflow which finds sequenced specimens, captures sequencing data and uses this information to find the sequences. Voucher specimen records with explicit connections to DNA sequence records. 
Publication in BioHackRxiv.\nExpected audience\nParticipants: Maarten Trekels, Steven Verstockt, Sofie Meeus, Kenzo Milleville, Krishna Kumar, Thirukokaranam Chandrasekar, Bachir Balech, Donat Agosti, Alberto Brusati, Anna Sandionigi, Dario Pescini and Marcus Guidoti\nSkillsets: sequence and specimen databases; image analysis; text detection (OCR, HTR); text mining and matching; scientific literature mining\nNumber of expected hacking days: 4\n","id":38,"permalink":"https://plazi.org/posts/2021/11/biohackathon-europe/","tags":["Events"],"title":"BioHackathon Europe"},{"categories":[""],"contents":"Plazi will participate in the SSSDay organised by the Swiss Systematics Society (SSS) to answer questions regarding the Swissuniversities-funded eBioDiv project and the EU Horizon 2020-funded BiCIKL project. The event is hosted by the Musée de Zoologie Lausanne.\nSchedule\n","id":39,"permalink":"https://plazi.org/posts/2021/11/sssday-2021/","tags":["Events"],"title":"SSSDay Q\u0026A"},{"categories":[""],"contents":"Our colleague Prof. Roderick Page recently wrote about quality issues with the data we are extracting from scientific literature and making available for anyone to use anywhere for any purpose. We are thankful for Rod\u0026rsquo;s insightful and helpful criticism. In this post (and several others to come), we explain material citations and other data that we extract, address the data quality issues (why they exist and what we are doing about them), and explain why the extracted data are still important and useful despite the inherent errors.\nDisplayed above are the material citations for which geo-coordinates have been included in the publication. Each dot on the map is linked to a material citation, the treatment and publications. This allows immediate access to check whether the liberated data is correct and to suggest improvements when necessary. A total of 1M material citations have been extracted of which 400,000 (160,000 with geo-coordinates) are available on GBIF. 
The difference in numbers is due to quality control (QC) required before data are transferred to GBIF. The above map is based on a graphic by Rod Page. Our interpretation is released under CC0 Public Domain Dedication. Material citations are one of the data types in taxonomic publications that spur a lot of interest and discussions. They cite the specimens used for the research, linked to a specific taxon, either by being included in a respective taxonomic treatment, table or supplementary materials. They represent the expert’s identification of the specimens, and because of that, are the best possible documentation of the identification of a specimen. This is unlike many digitized specimens from natural history collections that have not passed expert curation, peer-review and publication in scholarly articles, especially revisionary works. The person who identified them as well as the source of the citation can always be tracked with their respective links to the treatment. This kind of material citations are highly valued because of their richness of facts. These facts can include the collecting location, country, habitat, collecting methods, collector, collecting date, and specimen code. Because of this, they play an increasingly important role to answer questions such as who collected where, when, what, or which specimen has been reused.\nAs pointed out in the Plazi symposium at the TDWG 2021 conference, the “key bits of information” produced by TreatmentBank, complementing named entities such as taxonomic names, are taxonomic treatments, treatment citations and figures that are extracted mainly from PDFs, made FAIR and thus made ready for human and machine consumption.\nMaterials citations are a downstream product in Plazi’s workflow, especially the parsing of details, as part of the treatment and the source publication. 
We recognize their availability in this format as highly valuable for third party processing despite all their shortcomings, providing unique data especially for less known species, and in collections not yet digitized. While less than 100% accurate, they hopefully contribute to developing best practices on how to publish specimens in the future.\nThe data quality problems with materials citations are a result of many reasons – for one, they cite one or more specimens in a highly variable way and are embedded in unstructured text. Additionally, OCR procedures, especially of PDFs scanned from historical publications, are prone to a relatively higher number of errors. At the moment they are published for humans to read and understand, but certainly not for machine consumption.\nIn spite of all these quality issues, we believe there is value in extracting and liberating materials citations. Open and FAIR data lend themselves to further refinement, or to build applications to visualize annotations in text, reuse them on maps, or display them as individual occurrences with rich metadata such as by GBIF, or be reused by other publications (Plazi mediated data is used in over 500 scientific papers). For this, we need to not just improve the quality of the extracted data but make the data more structured and extractable to begin with. The recent collaboration between the European Journal of Taxonomy, Pensoft Publishers and Plazi led to guidelines on how best to publish material citations.\nTreatments and material citations of up to 80,000 species are the only source of information about these taxa in the Global Biodiversity Information Facility (GBIF), indicating the important role of research publications to understanding the long tail of little known species.\nQuality control and feedback mechanisms help improve material citations. We believe that 1 million liberated and usable material citations is an important first step despite the errors inherent in the text mining process. 
Material citations are part of a treatment, and thus the taxonomic identification is not an issue. The omnipresent link to the treatment and article and TreatmentBank allows curating individual material citations. We continue to work on QC issues and evolve this process to accommodate large scale, algorithmical curation. Ongoing developments to label the granularity of markup and quality control, together with increasing involvement of users, will make material citations fit for more use cases.\nNo machine translation, especially that of printed literature meant for human-consumption, is going to be 100% accurate. But the scale of potentially valuable data trapped in articles necessitates machine translation as the only viable way to liberate this data and make it available for secondary use. This will and does result in data that are not 100% accurate. Just like any other un- or partially-reviewed source, the material citations should be verified before further use, especially for further research, by comparing against the original text. To facilitate this, the material citations are submitted to an initial quality control stage on TreatmentBank, and we provide links back to the original text making data verification easy. Providing millions of facts in an easily searchable form allows for valuable analysis and synthesis over a vast quantity of research previously impossible. Making the data available results in an inevitable compromise in accuracy but not making it would result in a 100% inaccuracy by way of a complete gap in knowledge about them.\nWe continue to make investments in data QC that will lead to better quality, but we need crowdsourced human-curation from all interested users. In keeping with the spirit of open source that there are fewer bugs when many more eyes are looking at it, this trade-off between data availability and accuracy is hopefully temporary. 
Increased data availability will lead to increased use which will hopefully lead to increased reporting of inaccuracy that we will fix. The overall result will be beneficial to everyone.\n","id":40,"permalink":"https://plazi.org/posts/2021/10/liberation-first-step-toward-quality/","tags":["Data Quality"],"title":"Liberating material citations as a 1st step to better data"},{"categories":["news"],"contents":"On August 25, the new term request by Plazi for a new term for “a reference to or citation of one, a part of, or multiple specimens in scholarly publications” went through the public review process and resulted in the ratified term and in the Quick Reference Guide.\nThe original term name materialCitation has been changed to follow Darwin Core convention of using upper camel case for Class names.\n","id":41,"permalink":"https://plazi.org/posts/materialcitation-accepted-as-a-new-term/","tags":["news"],"title":"MaterialCitation accepted as a new TDWG Darwin Core standard"},{"categories":[""],"contents":"As we continue to connect people and biodiversity data globally by virtual means in these globally uncertain times, TDWG 2021 will be composed of symposia, workshops, contributed oral and poster presentations, demos, and discussions, as well as keynotes and social events. The Interest/Task Group working sessions will be held separately during the month following the virtual conference (November 2021). TDWG Symposia and TDWG Schedule\nOct 20, 8:30PM CET SYM18 Discovering known biodiversity: Digital accessible knowledge Session Type: Symposium (no unsolicited presentations considered)\nOrganizers: Donat Agosti, Plazi, Bern, Switzerland; Alexandros Ioannidis-Pantopikos, Zenodo — CERN, Meyrin, Switzerland\nScientific publishing is building up our knowledge by connecting facts using an intricate network of implicit and explicit citations. 
Taxonomic publications are exceptionally rich beginning with the explicit citation of publications, to implicit citations provided by taxonomic names, treatment citations, materials, actors, collections to a domain-specific vocabulary. In a sense, we are still living in an analog world because most of our knowledge is not digital or, if digital, is designed for human consumption at best and not as digital accessible knowledge (DAK). DAK are facts that are both human and machine readable and are open, findable, accessible, interoperable and reusable, proven as such by their reuse by external services such as the Global Biodiversity Information Facility (GBIF). A second aspect of DAK is that its citations are annotated with respective identifiers. This allows easy integration and connection with existing DAK.\nDiscovering known biodiversity is a challenge taken on by Plazi. This requires defined target resolution of data, sustainable infrastructures, workflows, reference vocabularies, resources, and strategies to discover and convert a rapidly growing corpus of a daunting circa 500 million publications. Current technical and persistent identifier developments will be highlighted.\nLectures\nAgosti D 2021. (Re)Discovering Known Biodiversity: Introduction. Biodiversity Information Science and Standards 5: e75491. doi: 10.3897/biss.5.75491 Guidoti M, Sokolowicz C, Simoes F, Gonçalves V, Ruschel T, Alvares DJ, Agosti D 2021. TreatmentBank: Plazi\u0026rsquo;s strategies and its implementation to most efficiently liberate data from scholarly publications. Biodiversity Information Science and Standards 5: e75690. doi: 10.3897/biss.5.75690 Miller JA, Agosti D, Guidoti M, Rivera Quiroz FA (2021) Linking and the Role of the Material Citation. Biodiversity Information Science and Standards 5: e75543. https://doi.org/10.3897/biss.5.75543 Simoes F, Agosti D, Guidoti M 2021. Delivering Fit-for-Use Data: Quality control. Biodiversity Information Science and Standards 5: e75432. 
doi: 10.3897/biss.5.75432 Ioannidis-Pantopikos A, Agosti D 2021. Biodiversity Literature Repository: Building the customized FAIR repository by using custom metadata. Biodiversity Information Science and Standards 5: e75147. doi: 10.3897/biss.5.75147 Gmür R, Agosti D 2021. Synospecies, an application to reflect changes in taxonomic names based on a triple store based on taxonomic data liberated from publication. Biodiversity Information Science and Standards 5: e75641. doi: 10.3897/biss.5.75641 Sokolowicz C, Guidoti M, Agosti D 2021. Discovering Known Biodiversity: Digital accessible knowledge — Getting the community involved. Biodiversity Information Science and Standards 5: e74369. doi: 10.3897/biss.5.74369 Oct 20, 3:30PM CET SYM04 Where and how to find, store and use links between biodiversity data: the BiCIKL perspective Lectures\nMeeus S, Addink W, Agosti D, Arvanitidis C, Dimitrova M, González-Aranda JM, Holetschek J, Islam S, Jeppesen TS, Mietchen D, Robertson T, Sanchez Cano FM, Trekels M, Groom Q (2021) Hacking Infrastructures Together: Towards better interoperability of infrastructures. Biodiversity Information Science and Standards 5: e74325. doi: 10.3897/biss.5.74325 Penev L, Koureas D, Groom Q, Lanfear J, Agosti D, Casino A, Miller J, Arvanitidis C, Cochrane G, Barov B, Hobern D, Banki O, Addink W, Köljalg U, Ruch P, Copas K, Mergen P, Güntsch A, Benichou L, Gonzalez Lopez JB 2021. Towards Interlinked FAIR Biodiversity Knowledge: The BiCIKL perspective. Biodiversity Information Science and Standards 5: e74233. doi: 10.3897/biss.5.74233 Oct 21, 11PM CET SYM11 Building collaborative resiliency through the Biodiversity Heritage Library Lecture\nAlvares DJ, Guidoti M, Simoes F, Sokolowicz C, Agosti D 2021. The BHL-Plazi Partnership: Getting data from the 1800s directly into 21st century, reused digital accessible knowledge. Biodiversity Information Science and Standards 5: e75604. 
doi: 10.3897/biss.5.75604 ","id":42,"permalink":"https://plazi.org/posts/tdwg-2021-virtual-annual-conference/","tags":["Events"],"title":"Connecting the world of biodiversity data"},{"categories":[null],"contents":"We report progress towards automatically transforming existing analyses of scientific literature into annotations based on W3C\u0026rsquo;s Web Annotation Data Model (WADM). Case studies are presented from the life sciences, and social sciences and humanities, in which these developments have led to the creation of new unrestricted data services for the research community. We discuss the cross-domain potential of annotation infrastructure for releasing scientific facts reported in research literature from copyright restrictions, and demonstrate the utility of common standards-based preservation and discovery methods in disparate activities. We suggest that scientific treatments of literature using WADM annotation can lead to new mechanisms for access to and reuse of research data, and accelerate convergence with the FAIR Principles.\nFull program\n","id":43,"permalink":"https://plazi.org/posts/digital-preservation-conference/","tags":["Events"],"title":"Progress with improving preservation and reuse of scientific research data"},{"categories":["news"],"contents":"\nThree years ago we set off on a journey, guided by a philosophy that scientific data trapped in scholarly articles should be free, powered by the creativity and hard work of a small team scattered around the globe, and supported by Arcadia that decided to invest in our vision. Earlier this month we completed that journey, by all measures, a success. We liberated more data than we set out to liberate, we ended up with more partners than we started with, and we empowered more publications and publishers than three years ago. 
Data liberated by us is already being used by scientists to ask and answer new questions, to pursue new science – a true scientific data lifecycle in constant motion.\nWe are in danger of speaking with less humility than we should, but we are justifiably proud that not only did we do everything we aimed and claimed to do, all of it is visible. While there may be occasional dead links that we will endeavor to fix, there are no smoke and mirrors, there is no empty promise. With Arcadia\u0026rsquo;s support, we liberated almost 400,000 taxonomic treatments, and every one of them is available at Zenodo and GBIF and TreatmentBank and Biolit Repo and Zenodeo and Synospecies and Ocellus, for anyone to use, anywhere in the world, for any purpose they wish.\n","id":44,"permalink":"https://plazi.org/posts/hindsight-is-20x20000/","tags":["news"],"title":"Hindsight is 20x20000"},{"categories":[""],"contents":"Digital Accessible Knowledge, in the context of Plazi, describes data fit for use to understand the global biodiversity, to create a list of all known species on planet Earth, data that is machine-actionable and ready for use by anybody, anywhere at any time.\n","id":45,"permalink":"https://plazi.org/about/digital-accessible-knowledge/","tags":["About","DAK"],"title":"Digital Accessible Knowledge"},{"categories":[""],"contents":"At the Naturhistorisches Museum in Bern.\n","id":46,"permalink":"https://plazi.org/posts/launch-ebiodiv/","tags":["Events"],"title":"eBiodiv Launch"},{"categories":["news"],"contents":"\nBHL and Plazi have agreed to collaborate to create a workflow from existing literature to liberate digital accessible data to its reuse in GBIF. 
This includes the following steps:\nFind, digitize, and provide access to biodiversity-related documents with basic metadata and scanned images for each and every page\nText recognition that allows efficient conversion and processing downstream\nProvide access at publication level, including standard, citable URLs from the scholarly communication ecosystem (e.g. CrossRef DOIs, DataCite DOIs)\nConversion into generic JATS XML or TEI including figures into PDFs with included text and stable URLs\nLiberate data from these biodiversity-related documents; identify, semantically enhance, and link with external resources: named entities, constituent text sections, figures, tables, bibliographic references\nCreate and publish FAIR data on Zenodo, or other repositories with the same capabilities, particularly those providing access enabling reuse\nInteract with targeted reusers of data for data and visualization improvements (e.g. GBIF, openBioDiv)\nMore about the partnership\n","id":47,"permalink":"https://plazi.org/posts/bhl-and-plazi-partnership/","tags":["news"],"title":"BHL and Plazi partnership"},{"categories":[""],"contents":"Treatment Statistics and Article Statistics Introduction\nThis is an analysis and mining tool for the data contained in TreatmentBank. Access is provided at article and treatment level.\nSelect fields, explore and download Plazi treatment and article data and statistics in multiple formats. Selected fields turn from gray to green. Data on all materials (cited specimens) are available through the “Materials Citations Data” domain; these data are summarized by treatment in the “Materials Data” domain. Select the relevant operation (e.g., show individual values, count distinct values, count all values, minimum value, maximum value) for each selected field. The “Get Statistics” button runs the selected query. You can save the results using one of the links indicating the desired format. 
It may be useful to choose an appropriate file name and add an extension to the file (e.g., .csv for comma separated values).\nTreatment Statistics\nArticle Statistics\nRefindit ReFindit provides an easy search function, based on a simple interface, which also collates and sorts the results from the search engines for presentation to the user to read and with the option to refine the results presented or submit a new search.\nRefBank RefBank is a website where you can search for bibliographic references. No registration is required (everyone can upload and edit). Results can be converted in different styles and formats. It serves as the “dirty bucket” of references and source for “clean buckets” such as the references in the Biodiversity Literature Repository.\n","id":48,"permalink":"https://plazi.org/treatmentbank/data-and-statistics/","tags":["Treatment Bank"],"title":"Data and Statistics"},{"categories":[""],"contents":"GoldenGATE editor versions The GoldenGATE Document Editor is a visual editor for marking up documents in XML. It is designed to do most of the markup automatically; manual work is reduced to correcting the output of automated components. For these corrections, there are many specialized dialogs and document views, which display the required information in a concise fashion and provide high-level assistance to the user. In addition, the editor provides assistance for editing and marking up documents manually. A flexible, plug-in-based software architecture allows for quickly integrating new components, and for deploying upgrades of existing ones. This is not restricted to components for automated markup and document views, but also comprises handling of different data formats, and different types of data storage, e.g. the local file system, databases, and web-based data providers.\nThe automated markup for taxonomic documents includes finding taxonomic names, figuring out their genus, species, etc., and obtaining an LSID for them. 
In addition, there are functions for marking up taxonomic treatments and their inner structure, i.e. which part of the treatment provides a morphological description of a taxon, which one lists materials examined, etc. A parser for extracting individual collecting events from the latter is under development.\nOnline documentation material is available at the community portal, which also offers support.\nStandard version This is a stable version that takes HTML documents as input. Minor updates are applied when the program is launched.\nDownload Manual Imagine (PDF based) version This is an alpha version. In case of updates, the user will be asked to update the program if access to the Internet is allowed in the start phase.\nThe main goal of the Imagine version is to provide a markup tool for born-digital PDFs and thus to shorten the OCR or text conversion process, as well as to allow incremental markup and upload to SRS.\nIf you are interested in using this new version, please contact Plazi Info.\nDownload Manual GoldenGATE Web services Asynchronous web service The asynchronous web service provides a bibliographic reference parser (Bib Ref Parser), date tagger, geo-coordinate tagger, materials citation extractor, quantity tagger, taxon name tagger, and TaxPub materials citation extractor.\nSynchronous web service The synchronous web service tests the synchronous version of GoldenGATE Web Services, with calls consisting of a simple request and response, and no user interactivity possible.\n","id":49,"permalink":"https://plazi.org/treatmentbank/desktop-data-mining/","tags":["Treatment Bank"],"title":"Desktop data mining and extraction"},{"categories":[""],"contents":"Golden Gate is a family of software including a server, an editor, a web-based editor, and web-services. 
Golden Gate uses the Image Markup File (IMF) format for data storage, archival and exchange.\n","id":50,"permalink":"https://plazi.org/data-apis-tools/golden-gate/","tags":["Golden Gate","Data, API and Tools"],"title":"Golden Gate"},{"categories":[""],"contents":"\nImage Markup File (IMF) is a file format used to store and exchange annotations made to a PDF reliably. IMF is an open source format. The file is based on a star schema: a series of CSV and PNG files, together with the source file, enclosed in a .ZIP file.\nIMF files can be created and opened with the open source editor GoldenGATE, from which their parts can be exported as XML or Darwin Core Archives, or uploaded to TreatmentBank.\nIMF is used by Plazi to mine and annotate documents in PDF format.\n","id":51,"permalink":"https://plazi.org/data-apis-tools/image-markup-file/","tags":["Golden Gate","Data, API and Tools"],"title":"Image Markup File (IMF)"},{"categories":[""],"contents":" Ocellus is a frontend to the Biodiversity Literature Repository (BLR) community with images on Zenodo, taxonomic treatments in TreatmentBank, and citations on RefBank. Ocellus depends on Zenodeo, a Node.js API that queries, analyzes and aggregates results from these various repositories via a single, unified interface.\n","id":52,"permalink":"https://plazi.org/data-apis-tools/ocellus/","tags":["Data, API and Tools"],"title":"Ocellus"},{"categories":[""],"contents":"TaxPub TaxPub is an extension to the U.S. National Library of Medicine/National Center for Biotechnology Information Journal Article XML Document Type Definition (DTD) providing domain-specific markup for taxonomic information in articles published in the area of biological systematics. TaxPub is described in detail in the article TaxPub: An Extension of the NLM/NCBI Journal Publishing DTD for Taxonomic Descriptions. 
See also TaxPub Documentation.\nCurrently, the following journals use TaxPub:\nBiodiversity Data Journal Cytogenetics Deutsche Entomologische Zeitschrift International Journal of Myriapodology Journal of Hymenoptera Research MycoKeys Nature Conservation NeoBiota Nota Lepidopterologica PhytoKeys Subterranean Biology ZooKeys Zoosystematics and Evolution TaxonX TaxonX is an XML schema for encoding taxonomic literature in order to\nCreate open, stable, persistent, full-text digital surrogates of taxonomic treatments Identify taxonomic treatments and their major structural components to enable networked reference and citation Identify lower level textual data such as scientific names, localities, morphological characters, and bibliographic citations to facilitate their extraction by and integration with external applications and resources Study and describe the structure of systematics publications by creating a few typical corpora of literature, such as an entire journal (e.g. AMNH Novitates), across taxa (e.g. all ant systematics papers post 1995), or faunistic (e.g. all ant systematics papers covering Madagascar ranging from 1758 to 2006) TaxonX is a lightweight and flexible schema which can be quickly learned and applied to the wide variety of formatting present in legacy documents as well as in new publications. It permits, and sometimes relies on (see use of MODS for file-level bibliographical metadata), external schemata. It has loose content requirements that allow for instances to be encoded over time and at many levels of granularity while maintaining validity through iterations. 
Additionally, TaxonX contains mechanisms for semantic normalization of the data contained in treatments.\nTreatment Ontology The Github repo for the ontologies used in representing data from taxonomic treatments in RDF.\n","id":53,"permalink":"https://plazi.org/treatmentbank/schemas-and-ontologies/","tags":["Treatment Bank"],"title":"Schemas and Ontologies"},{"categories":[""],"contents":" TreatmentBank GoldenGATE and related software TaxPub Ocellus Zenodeo Synospecies Synolib ","id":54,"permalink":"https://plazi.org/source-code/","tags":["source code","repos"],"title":"Source Code"},{"categories":[""],"contents":"SynoSpecies is a tool developed by FactsMission AG to leverage the RDF data provided by Plazi. The RDF data of all treatments is stored in an AllegroGraph triple store allowing SPARQL queries over the data. Synospecies allows users to write and submit such queries manually in the advanced mode, and sends such queries in the background when the simpler interface is used.\n","id":55,"permalink":"https://plazi.org/data-apis-tools/synospecies/","tags":["Data, API and Tools"],"title":"Synospecies"},{"categories":[""],"contents":"\nThe Plazi TreatmentBank deals with scientific, published, biosystematic literature documenting and describing all the world’s ca 1.9 Million known species in an estimated corpus of more than 500 Million published pages. The cited publications in Plazi are all available at the Biodiversity Literature Repository at Zenodo/CERN.\nTreatments are well-defined parts of articles that describe the particular usage of a scientific name by an author at the time of the publication. In other words, each scientific name has one or more treatments, depending on whether there exists only an original description of a species, or there are subsequent re-descriptions. 
Similar to bibliographic references, treatments can be cited, and subsequent usages of names cite earlier treatments.\nTreatments are a synthesis of the knowledge of a given species at a given time. They can be very rich in data, explicitly or implicitly, detailed or summarized, and include many references to external data sources, such as scientific names, collection codes, or DNA-codes.\nThe data can be semantically enhanced, and linked. But treatments, as parts of publications, first need to be identified and extracted. Most recently, treatments are tagged in electronic publications with the National Library of Medicine’s Journal Article Tag Suite (JATS) TaxPub extension, which allows for their automatic extraction. Still, the majority of the ca. 2000 journals and books publishing treatments use the PDF format at best. Plazi has the tools to extract treatments, enhance the embedded data and import them into TreatmentBank for public access, where they may be viewed as HTML, XML, or RDF, or harvested with the protocols provided below. The data are also provided for harvesting as Darwin Core Archives.\nFurther information Catapano T 2010. TaxPub: An Extension of the NLM/NCBI Journal Publishing DTD for Taxonomic Descriptions. 
Proceedings of the Journal Article Tag Suite Conference 2010 DOI: 10.5281/zenodo.3484285 Article\nTechnical definition\n","id":56,"permalink":"https://plazi.org/treatmentbank/what-treatment/","tags":["Treatment Bank"],"title":"What is a Treatment?"},{"categories":[""],"contents":"Zenodeo is a Node.js-based REST API to CERN\u0026rsquo;s Zenodo.\n","id":57,"permalink":"https://plazi.org/data-apis-tools/zenodeo/","tags":["Data, API and Tools"],"title":"Zenodeo"},{"categories":[""],"contents":"For questions, suggestions, contributions, please contact us via email.\n","id":58,"permalink":"https://plazi.org/contact/","tags":[""],"title":"Contact"},{"categories":[""],"contents":" Dashboard Here Taxonomy Distribution Map Specimens Downloads Version History Kronestedt, Torbjörn \u0026amp; Marusik, Yuri M., 2011, Studies on species of Holarctic Pardosa groups (Araneae, Lycosidae). VII. The Pardosa tesquorum group, Zootaxa 3131, pp. 1-34: 25-28\nscientific name Pardosa zyuzini status sp. nov. publication ID http://doi.org/10.5281/zenodo.399649 persistent identifier http://treatment.plazi.org/id/730087F2-1E00-FF81-FF61-FC34FDA561A5 treatment provided by Plazi (2016-04-12 02:05:59, last updated by Leidenlab19 2019-12-06 14:06:18) Treatment Pardosa zyuzini sp. nov.\nFigs 7 –8, 22–23, 28, 31, 94– 106, 116\nPardosa paratesquorum (misidentification, in part): Schenkel 1963: 360, fig. 208 b (♀, not ♂). Pardosa paratesquorum (misidentification): Logunov \u0026amp; Marusik 1995: 115; Marusik et al. 1996: 35 –36; Logunov et al. 1998: 139; Marusik \u0026amp; Logunov 1999: 247.\nPardosa cf. paratesquorum: Marusik et al. 2000: 84; Marusik \u0026amp; Buchar 2003: 157; Logunov \u0026amp; Marusik 2004: 63. Pardosa sp. 2: Marusik \u0026amp; Logunov 2009: 151.\nType material. Holotype ♂ and allotype ♀ from MONGOLIA, Övörkhangai Aimag, Zuunbayan-Ulaan Somon, Zamtyn Davaa (46 º 43 ’N 102 º 51 ’E), 2000 m, 14–18 June 1997 (Y.M. Marusik) in ZMMU. – Paratypes. MONGOLIA. 
Övörkhangai Aimag: same data as holotype ( CAS, ISEA, IZAS, NHRS, ZMMU), 110 3 44 ♀. Bayankhongor Aimag: Gurvanbulag Somon, Lake Khokh-Nuur (47 o 32 ’N 98 o 32 ’E), 2600 m, 7–10 June 1997 (Y.M., IBPN), 103. Assonge, Tola (Tuul) River , 1909 (du Chazaud, MNHN), 1 ♀. Arkhangai Aimag: Uu-bulan, Saikhany saravi, 24 June 1976 (Tsug Enkhtuyaa, IBPN), 13 1 ♀. – RUSSIA. Altai: 8 km S of Chagan-Uzun Village (50 º04’N, 88 º 24 ’E), 1800m, grassy bank of Chuya River , 13 June 2009 (A.A. Fomichev, ISEA), 2 ♀; 2 km SE of Kosh-Agach , 27 June 1996 (A. \u0026amp; R. Dudko, ISEA), 13; 70–75 km W of Kosh-Agach, 40–45 km W of Bel’tir, Taltura (Chagan-Uzun) River canyon, 2300–2500 m, mountain stony steppe, 26–28 June 1999 (V.V. Glupov, ISEA), 23 1 ♀; Kosh-Agach Village (50 º01’N, 88 º 38 ’E), 1800m, saline swamps, 13 July 2009 (A.A. Fomichev, ISEA), 13 2 ♀. Tuva: Mongun- Taiga Distr., 12 km downstream from Mugur-Aksy by Kargy River , 1800 m, river bank, 14 June 1989 (D.L., ISEA: SZM 001.1505), 23 1 ♀; SE part of Kyzyl , steppe, 22–24 July 1996 (Y.M., IBPN), 33 2 ♀; Ovyur Distr, pass between Sagly and Onachy rivers, 2200 m, ca 20–25 km W of Sagly Village, wet habitats, 13 June 1989 (D.L., ISEA: SZM 001.1506), 23; Ulug-Khem Dist., 6–7 km E of Choduraa, Chulaanych site, near creek, 10 May 1990 (D.L., ISEA: SZM 001.1514), 143; Tere-Khol’ Lake, Sharlaa stand and around (50 o 1.47 ’N 95 o 3.45 ’E), 1050 m, 6– 14 July 1996 (Y.M., ISEA), 193 6 ♀; 30–35 km W of Erzin, Shara-Nur Lake (50 ° 12 ’N, 94 ° 32 ’E), 900 m, 8 June 1995 (Y.M., ISEA: SZM 001.1512), 53 7 ♀; Erzin Distr., 20 km NW of Erzin Village, Dus-Khol’ Lake, Tes-Khem River , 800 m, 31 May 1989 (D.L., ISEA: SZM 0 0 1.1515 \u0026amp; 001.1517), 393 14 ♀; ~ 20 km WNW of Erzin, Dus- Khol’ Lake shore (50 ° 19 ’N, 95 °01’E), among and under stones, 1050 m, 10 June 1992 (D.L., ISEA: SZM 001.1511), 1 ♀; Sangelen Mt. 
Range, nr Moren Village (50 o 20.53 ’N 95 o 22.92 ’E), 1150 m, pitfall traps in steppe, 14–18 July 1996 (D.O., IBPN), 1303 40 ♀; Sangelen Mt. Range, middle flow of Dzhen-Aryk Creek (50 o 24.31 ’N 95 o 26.28 ’E), 1450 m, pitfall traps, 14–18 July 1996 (Y.M. \u0026amp; D.O., IBPN), 93 1 ♀. 3–5 km S of Erzin Village, Tes- Khem River valley , birch-willow- Caragana forest, 1100 m, 14 August 1989 (D.L., ISEA: SZM 001.1507), 2 ♀; Erzin Distr., 3–5 km S of Erzin Village, Tes-Khem River valley 1100m, dried up bog, near water, 14–15 August 1989 (D.L., ISEA: SZM 001.1513), 16 ♀; Tes-Khem River valley (50 ° 19 ’N, 95 °01’E), 10.06. 1995 (Y.M., ISEA 001.1510), 33 2 ♀; Khol’-Oozhu River valley (50 ° 41 ’N, 95 ° 13 ’E), 16.06. 1995, (Y.M., ISEA: SZM 001.1508), 33. Chita Area: Kyra Dist., ca 3 km E of Kyra Village, Kyra River valley , wet meadow, 850 m, 30 May 1991 (D.L., ISEA: SZM 001.1509), 53. Additional paratypes from Russia (sub Pardosa paratesquorum) are mentioned in the papers referred to above.\nEtymology. The specific name is a patronym in honor of our colleague and friend Alexey A. Zyuzin (Almaty) in recognition of his contribution to the knowledge of Palearctic wolf spiders.\nRemark. The species described by Schenkel (1963) as Pardosa paratesquorum was based on a few males and a single female. The material originated from China (Gansu) and Mongolia. Schenkel (1963) explicitly selected a male as the type (= holotype), regrettably without locality information, and expressed doubts as to whether the female (from Mongolia) was conspecific with the described male. From the fresh material now available to us, and after examining a part of the original material of P. paratesquorum (13, 1♀), it is evident that the female is not conspecific with the male but belongs to P. zyuzini sp. nov. as described here.\nDiagnosis. Males can be distinguished from other members of the tesquorum group by long hairs on metatarsus and tarsus I ( Fig. 106). 
In addition, males are distinguished by the widened embolus abruptly narrowing in apical part before truncate apex ( Figs 98, 102), as well as the shape of the conductor and terminal apophysis ( Figs 95, 96, 102- 104). Females can be recognised by the amphoral shape of the epigynal septum, which fills out the epigynal cavities (cf. P. tesquorumoides) ( Figs 22 –23, 28).\nDescription. Male (holotype). Total length 5.4. Carapace 2.85 long, 2.05 wide.\nProsoma. Carapace ( Fig. 7) blackish-brown with yellowish narrow median band in thoracic part and yellowish unbroken lateral bands, latter often darkened and hardly traceable. Thoracic part with recumbent black pubescence, in median band in addition with whitish hairs. Clypeus yellowish, at least in part (more or less sooty). Chelicerae yellow, more or less sooty, with sooty longitudinal veins, retromargin with 2 teeth. Sternum sooty brown with narrow lighter streak in front (may be absent).\nEyes. Width of row I 43 (slightly procurved when seen from in front), row II 63, row III 85, row II –III 61. Diameter of AME 10, ALE 8, PME 23, PLE 18. Distance between AMEs 8, between AME and ALE 2.\nOpisthosoma. Dorsum ( Fig. 7) dark greyish-brown with yellowish lanceolate spot followed rearwards by a series of yellowish spots in pairs (often obscured), each pair sometimes joined to a transverse bar, each spot with a dark dot in the middle (pattern darkened and obscured in presumably older males). Venter yellowish to dark greyish with light recumbent pubescence and scattered, more erect dark hairs.\nLegs ( Table 1). Yellowish. Femora, except distally, more or less sooty, Fe III and IV sometimes with pseudoannulation. Patellae and tibiae sometimes with faint darker longitudinal streaks or blotches dorsally. Mt+Ta I with numerous thin, long, dark, erect hairs ( Figs 99, 106), notably laterally (fewer of these hairs also present in Mt+Ta II). 
Ti I with one retrolateral spine in distal half.\nPalp ( Figs 94–98, 102– 105): Pt 0.50, Ti 0.45, Cy 1.05. Palp dark brownish-grey, more or less suffused with black and with dark pubescence, patella dorsally largely yellowish brown; cymbium dark in proximal part, lighter distally ( Fig. 97). Tegular apophysis stout, rugose, curved retrolaterad, with small but distinct hooked process basally ( Figs 94, 105). Conductor prominent, terminating in a sclerotized process slightly bent forwards ( Figs 95– 96, 102– 103). Terminal apophysis directed obliquely ventrad, continuing into a sclerite that surrounds the conductor and extends backwards, ending in a sclerotized, triangular basal paleal process ( Figs 95–96, 102– 103). Embolus laminar, grooved, ventral edge turned forward; widening in distal half, then abruptly narrowing, tip truncated ( Figs 95, 98, 102– 104).\nFemale (allotype). Total length 5.3. Carapace 2.70 long, 1.95 wide.\nProsoma and opisthosoma ( Fig. 8). Lighter than in male. Carapace brown with bright yellow median band distinctly widening in postocular area. Lateral bands bright yellow, jagged. Clypeus and chelicerae bright yellow, latter with thin brownish streaks. Abdomen with more contrasting pattern than in male. Palp yellow with darker blotches.\nEyes. Width of row I 41 (slightly procurved when seen from in front), row II 58, row III 79, row II –III 56. Diameter of AME 9, ALE 8, PME 20, PLE 16. Distance between AMEs 6, between AME and ALE 2.\nLegs ( Table 1). Yellow with dark streaks and blotches dorsally (pseudoannulation-like in femora).\nEpigyne ( Figs 22 –23, 28, 31, 100 – 101). Comparatively narrow, with two separated anterior pockets and with amphora-like septum covering epigyneal cavities. Receptacles long, more or less parallel, with spermathecae ovoid ( Figs 31, 101).\nSize variation. Carapace length: males 2.50–2.95 (n= 10), females 2.50–2.90 (n= 10).\nHabitat. 
In Mongolia the species has been collected in lake shores (lowlands below 1100 m), pebbly lake shores (highland 2500 m), pebbly river banks and adjacent overgrazed pasture, overgrazed swampy meadows, within stones in dry river beds, pitfall traps in forest opening ( Marusik \u0026amp; Logunov 1999).\nDistribution ( Fig. 116). Mongolia and Russia (Siberia: Altai, Tuva, Chita Area). This species may occur in China (Xinjiang), which borders Altai.\nReferences Logunov, D. V. \u0026amp; Marusik, Y. M. (1995) Spiders of the family Lycosidae (Aranei) from the Sokhondo Reserve (Chita area, east Siberia). Beitrage zur Araneologie, 4 [1994], 109 - 122.\nLogunov, D. V., Marusik, Y. M. \u0026amp; Koponen, S. (1998) A check-list of the spiders in Tuva, South Siberia with analysis of their habitat distribution. Berichte des naturwissenschaftlich-medizinischen Vereins in Innsbruck, 85, 125 - 159.\nLogunov, D. V. \u0026amp; Marusik, Y. M. (2004) [Order Araneae - spiders.] In: Dubatolov, V. V. et al. Biodiversity of the Sokhondo Nature Reserve. Arthropoda. Novosibirsk-Chita, pp. 41 - 80 (In Russian).\nMarusik, Y. M., Hippa, H. \u0026amp; Koponen, S. (1996) Spiders (Araneae) from the Altai area, southern Siberia. Acta zoologica fennica, 201, 11 - 45.\nMarusik, Y. M. \u0026amp; Logunov, D. V. (1999) On the spiders (Aranei) collected in central Mongolia during a joint American-Mongolian-Russian expedition in 1997. Arthropoda Selecta, 7 [1998], 233 - 254.\nMarusik, Y. M., Logunov, D. V. \u0026amp; Koponen, S. (2000) Spiders of Tuva, South Siberia. Institute for Biological Problems of the North, Russian Academy of Sciences Far East Branch, Magadan, 252 pp.\nMarusik, Y. M. \u0026amp; Buchar, J. (2003) A survey of the East Palaearctic Lycosidae (Aranei). 3. On the wolf spiders collected in Mongolia by Z. Kaszab in 1966 - 1968. Arthropoda Selecta, 12, 149 - 158.\nMarusik, Y. M. \u0026amp; Logunov, D. V. 
(2009) New faunistic records of spiders collected from the mountain Altai (Arachnida: Aranei). Arthropoda Selecta, 18, 145 - 152.\nSchenkel, E. (1963) Ostasiatische Spinnen aus dem Museum d\u0026rsquo;Histoire naturelle de Paris. Memoirs de la Museum national d\u0026rsquo;Histoire naturelle, Paris, (N. S.) (A, Zool.) 25 (1), 1 - 288, (2), 289 - 494.\nFigures FIGURE 116. Distribution of Pardosa eskovi sp. nov. (), P. mulaiki (), P. paratesquorum (), P. tesquorumoides () and P. zyuzini sp. nov. (). One symbol may refer to more than one collecting locality. Arrows indicate type localities (if known). For P. eskovi sp. nov. and P. zyuzini sp. nov. known localities are shown. For P. tesquorumoides, P. mulaiki only selected literature records are shown, and for P. paratesquorum only localities from which we examined material are shown. FIGURES 1 – 8. Habitus, dorsal view. 1, Pardosa eskovi sp. nov. ♀ from Yakutia: Suntar. 2 – 3, P. mulaiki Gertsch ♂ (2) ♀ (3), both from Saskatchewan: Rosetown. 4, P. paratesquorum Schenkel ♂ (paratype of P. daqingshanica Tang, Urita \u0026amp; Song) from Inner Mongolia: Mt Daqing Shan. 5 – 6, P. tesquorumoides Song \u0026amp; Yu ♂ (5) ♀ (6), both from Sichuan: Hongyuan Co. 7 – 8, P. zyuzini sp. nov. ♂ (7) ♀ (8), both from Tuva: Lake Tere Khol’. Scale line (applies to all) 1 mm. FIGURES 19 – 24. Epigynes in ventral view. 19, Pardosa eskovi sp. nov. (from Yakutia: Suntar). 20, P. mulaiki Gertsch (from Saskatchewan). 21, P. tesquorumoides Song \u0026amp; Yu (from Sichuan: Hongyuan Co.). 22 – 23, P. zyuzini sp. nov. (from Tuva: Lake Tere-Khol’). 24, P. paratesquorum Schenkel (from Shanxi: Mt Huo Shan). Scale lines 0.1 mm. FIGURES 25 – 28. Epigynes in ventral view. 25, Pardosa eskovi sp. nov. (Yakutia: Suntar). 26, P. mulaiki Gertsch (from Saskatchewan: Hanley). 27, P. tesquorumoides Song \u0026amp; Yu (from Sichuan: Hongyuan Co.). 28, P. zyuzini sp. nov. (from type locality). Scale line (applies to all) 300 µm. 
FIGURES 29 – 31. Epigynes in dorsal view. 29, Pardosa eskovi sp. nov. (from Yakutia: Suntar). 30, P. mulaiki Gertsch (from Saskatchewan: Hanley). 31, P. zyuzini sp. nov. (from type locality). cd, copulatory duct; sp, spermatheca. Scale line (applies to all) 300 µm. FIGURES 94 – 101. Pardosa zyuzini sp. nov. 94, left bulbus, ventral view. 95 – 96, left terminal part of bulbus in ventral (95) and retrolateral (96) view. 97, left male palp (patella, tibia and cymbium), dorsal view. 98, embolus of left palp, frontal view. 99, male metatarsus I. 100 – 101, epigyne in ventral (100) and dorsal (101) view. Scale lines 0.1 mm (94 – 98, 100 – 102), 0.5 mm (99). FIGURES 102 – 106. Pardosa zyuzini sp. nov., male (from type locality). 99, terminal part of left bulbus in ventral (102), retrolateral (103) and ventro-frontal (104) view. 105, left tegulum with tegular apophysis in ventral view. 106, tarsus and metatarsus of first leg in dorsal view. ba. pr, basal process of palea; cond, conductor; emb, embolus; pal, palea; stg, subtegulum; tg, tegulum; tg. ap, tegular apophysis; tl. ap, terminal apophysis of palea. Scale lines 300 µm (102 – 105), 1000 µm (106). Tables Abbreviations ZMMU Zoological Museum, Moscow Lomonosov State University CAS California Academy of Sciences IZAS Institut Zoologii Akademii Nauk Ukraini - Institute of Zoology of the Academy of Sciences of Ukraine NHRS Swedish Museum of Natural History, Entomology Collections MNHN Museum National d\u0026rsquo;Histoire Naturelle Copyright notice No known copyright restrictions apply. See Agosti, D., Egloff, W., 2009. Taxonomic information exchange and copyright: the Plazi approach. 
BMC Research Notes 2009, 2:53 for further explanation.\n","id":59,"permalink":"https://plazi.org/treatment-example/","tags":[""],"title":"Pardosa zyuzini, Kronestedt, Torbjörn \u0026 Marusik, Yuri M., 2011"},{"categories":[""],"contents":"Biodiversity Literature Repository (BLR) is a Zenodo community for sharing publications related to biosystematics. It was initiated by Plazi, Pensoft and Zenodo at CERN. Its goal is to provide open access to publications that are cited in other publications or in combination with scientific names and, in collaboration with Zenodo, to assign each publication a digital object identifier (DOI) that enables citation of, and direct access to, its digital representation. Its focus is on thematic corpora, such as particular taxonomic groups (e.g. ants), projects or research journals. Illustrations or subarticle elements can be uploaded. Plazi provides a service to convert back issues into semantically enhanced documents and to disseminate taxonomic treatments through TreatmentBank. BLR offers journals the option of obtaining DOIs for their new issues. Currently, the following journals have joined the BLR community and use Zenodo DataCite DOIs:\nHalteres Journal of the Ocean Science Foundation Mitteilungen der Schweizerischen Entomologischen Gesellschaft Odonatologica Revue Suisse de Zoologie Revue de Paléobiologie ","id":60,"permalink":"https://plazi.org/data-apis-tools/biolit-repo/","tags":["Data, API and Tools"],"title":"Biodiversity Literature Repository"},{"categories":[""],"contents":"Biowikifarm is a shared technical platform supporting a number of mediawiki installations used by a large number of projects in biological research for open content publishing. The primary purpose of the shared platform is to maintain the published data and information in a sustainable way over the long term, to work more efficiently, and to distribute administrative and maintenance work among several partners. 
Furthermore, the biowikifarm operates a shared media repository, enabling synergies in re-using media content.\n","id":61,"permalink":"https://plazi.org/data-apis-tools/biowikifarm/","tags":["Data, API and Tools"],"title":"BioWikiFarm"},{"categories":[""],"contents":"The website plazi.org is provided by:\nDonat Agosti\nSpecialist for taxonomic literature\nPresident Plazi\nDirector Plazi GmbH\nEmail: [email protected]\nLimitation of liability for internal content\nThe content of our website has been compiled with meticulous care and to the best of our knowledge. However, we cannot assume any liability for the up-to-dateness, completeness or accuracy of any of the pages.\nLimitation of liability for external links\nOur website contains links to the websites of third parties (“external links”). As the content of these websites is not under our control, we cannot assume any liability for such external content. In all cases, the provider of information of the linked websites is liable for the content and accuracy of the information provided. At the point in time when the links were placed, no infringements of the law were recognisable to us. As soon as an infringement of the law becomes known to us, we will immediately remove the link in question.\nCopyright\nAs per the copyright laws of Switzerland, copyright in the content on this website is held by Plazi, and made available to the public under a CC0 Public Domain Dedication. 
You are not required to give us attribution for using any content published herein, but it would be nice if you do.\n","id":62,"permalink":"https://plazi.org/disclaimer/","tags":[""],"title":"Disclaimer"},{"categories":[""],"contents":"Plazi offers RSS feeds for extracted treatments and updates via Twitter\nNotification RSS-feed (html) RSS-feed (XML) Twitter Plazi Treatment Repo Plazi ","id":63,"permalink":"https://plazi.org/resources/feeds/","tags":[""],"title":"Feeds"},{"categories":[""],"contents":"This privacy policy sets out how Plazi uses and protects any information that you give Plazi when you use this website.\nPlazi is committed to ensuring that your privacy is protected. Should we ask you to provide certain information by which you can be identified when using this website, then you can be assured that it will only be used in accordance with this privacy statement.\nPlazi may change this policy from time to time by updating this page. You should check this page from time to time to ensure that you are happy with any changes.\nWhat we do with the information we gather: We require this information to understand your needs and provide you with a better service, and in particular for the following reasons:\nInternal record keeping. We may use the information to improve our services. We may periodically send information emails. Security We are committed to ensuring that your information is secure. In order to prevent unauthorised access or disclosure we have put in place suitable physical, electronic and managerial procedures to safeguard and secure the information we collect online.\nHow we use cookies: A cookie is a small file which asks permission to be placed on your computer. Once you agree, the file is added and the cookie lets you know when you visit a particular site. Cookies allow web applications to respond to you as an individual. 
The web application can tailor its operations to your needs, likes and dislikes by gathering and remembering information about your preferences.\nOverall, cookies help us provide you with a better website, by enabling us to monitor which pages you find useful and which you do not. A cookie in no way gives us access to your computer or any information about you, other than the data you choose to share with us.\nYou can choose to accept or decline cookies. Most web browsers automatically accept cookies, but you can usually modify your browser setting to decline cookies if you prefer. This may prevent you from taking full advantage of the website.\nLinks to other websites: Our website may contain links to other websites of interest. However, once you have used these links to leave our site, you should note that we do not have any control over that other website. Therefore, we cannot be responsible for the protection and privacy of any information which you provide whilst visiting such sites and such sites are not governed by this privacy statement. You should exercise caution and look at the privacy statement applicable to the website in question.\n","id":64,"permalink":"https://plazi.org/privacy/","tags":[""],"title":"Privacy"},{"categories":[""],"contents":"RefBank is a website where you can search for bibliographic references. No registration is required (everyone can upload and edit). Results can be converted in different styles and formats. 
It serves as the “dirty bucket” of references and source for “clean buckets” such as the references in the Biodiversity Literature Repository.\n","id":65,"permalink":"https://plazi.org/data-apis-tools/refbank/","tags":["Data, API and Tools"],"title":"RefBank"},{"categories":[""],"contents":"ReFindit provides an easy search function, based on a simple interface, which also collates and sorts the results from the search engines for presentation to the user to read and with the option to refine the results presented or submit a new search.\n","id":66,"permalink":"https://plazi.org/data-apis-tools/refindit/","tags":["Data, API and Tools"],"title":"ReFindit"},{"categories":[""],"contents":"TaxPub TaxPub is an extension to the U.S. National Library of Medicine/National Center for Biotechnology Information Journal Article XML Document Type Definition (DTD) providing domain-specific markup for taxonomic information in articles published in the area of biological systematics. more\nTaxonX TaxonX is an XML schema for encoding legacy taxonomic literature in order to identify taxonomic treatments and lower level textual data such as scientific names, localities, morphological characters, and bibliographic citations. more\nTreatment Ontology Ontologies for use in representing data from taxonomic treatments in RDF. more\n","id":67,"permalink":"https://plazi.org/resources/schema-and-ontologies/","tags":[""],"title":"Schemas and Ontologies"},{"categories":[""],"contents":"The Bibliography of Life (BoL) is a contribution towards building a reference bibliography for biodiversity with directly citable, open access to the digital objects of bibliographic references.\nBiodiversity Literature Repository Biodiversity Literature Repository (BLR) is a Zenodo community for sharing publications related to biosystematics. 
It was initiated by Plazi, Pensoft and Zenodo at CERN. Its goal is to provide open access to publications cited in publications or in combination with scientific names and to provide, in collaboration with Zenodo, a digital object identifier (DOI) to the publications to enable citation of the publications, including direct access to their digital representation. Its focus is on thematic corpora, such as particular taxonomic groups (e.g., ants), projects or research journals. Illustrations or subarticle elements can be uploaded. Plazi provides a service to convert back issues into semantically enhanced documents and to disseminate taxonomic treatments through TreatmentBank. BLR allows journals to obtain DOIs for their new issues. Currently, the following journals have joined the BLR community and use Zenodo DataCite DOIs:\nHalteres Journal of the Ocean Science Foundation Mitteilungen der Schweizerischen Entomologischen Gesellschaft Odonatologica Revue Suisse de Zoologie Revue de Paléobiologie RefBank RefBank is a website where you can search for bibliographic references. No registration is required (everyone can upload and edit). Results can be converted in different styles and formats. It serves as the “dirty bucket” of references and source for “clean buckets” such as the references in the Biodiversity Literature Repository\nRefindit provides an easy search function, based on a simple interface, which also collates and sorts the results from the search engines for presentation to the user to read and with the option to refine the results presented or submit a new search. ","id":68,"permalink":"https://plazi.org/resources/bol/","tags":[null],"title":"The Bibliography of Life"},{"categories":[""],"contents":" Access to Plazi Treatments What is a treatment? What is a DarwinCore Archive? 
The Darwin Core Archive used by Plazi Treatment Data representation in Plazi Plazi API Obtaining a list of all the treatments available from Plazi HTTP GET - RSS HTTP GET - ZIP Archive HTTP GET - Page displaying Treatment HTTP GET - generic XML HTTP GET - TaxonX List of Plazi\u0026rsquo;s available DwC-Archives from GBIF API HTTP GET - JSON Appendix: Darwin Core Archive Content Further reading Downloads Support and Questions Version Access to Plazi Treatments What is a treatment? The Plazi TreatmentBank deals with scientific, published, biosystematic literature. It is the literature documenting and describing all the world’s ca 1.9 Million known species in an estimated corpus of over 500 Million published pages. The cited publications in Plazi are all available at the Biodiversity Literature Repository at Zenodo/CERN.\nTreatments are well defined parts of articles that define the particular usage of a scientific name by an author at a given time (the publication)1. With other words, each scientific name has one to several treatments, depending whether there exists only an original description of a species, or whether there are subsequent re-descriptions. Similar to bibliographic references, treatments can be cited, and subsequent usages of names cite earlier treatments.\nTreatments are a synthesis of the knowledge of a given species at a given time. They can be very rich in data, explicitly or implicitly, detailed or summarized, and include many references to external data sources, such as scientific names, collection codes, DNA-codes.\nThe data can be semantically enhanced, and linked. Treatments as parts of publication need be extracted. Most recently, treatments are tagged in electronic publications with the National Library of Medicine’s Journal Article Tag Suites (JATS) TaxPub extension 1. This allows automatic extraction. Still the majority of the ca. 2000 journals and books publishing treatments use the PDF format at best. 
Plazi has tools to extract treatments, enhance the embedded data and import it into its SRS- Treatment Search Portal for public online access.\nThe data, that is, treatments and observation data, can be viewed as HTML, XML, RDF, or can be harvested with the protocols provided below. The data is provided for harvesting as Darwin Core-Archives.\nWhat is a DarwinCore Archive? The Darwin Core Archive format is a simple and extensible schema for sharing biodiversity data, especially catalogue data based on the ratified Darwin Core terms and the Darwin Core text guidelines [4]. Darwin Core is a standard for describing sample data in the Biodiversity Informatics community. It has been developed by the Global Biodiversity Information Facility (GBIF). DarwinCore Archives use a table-based, \u0026ldquo;spreadsheet-style\u0026rdquo; format that is more comfortable and familiar to biologists. The format uses plain text files but is tied to processes that support consistency and stability.\nFig. Schematic representation of a Darwin Core Archive and its components 2\nThe GBIF GNA format consists of a set of files where one (or more) files represent the \u0026lsquo;core\u0026rsquo; taxonomic data where a single row represents a single taxon reference. The DarwinCore Taxon class provides the majority of concepts supported in the format that enable taxonomic and nomenclatural semantics and syntax (classification, taxonomic and nomenclatural synonymy, status, etc.) to be expressed.\nOther files represent \u0026ldquo;extensions\u0026rdquo; to this core table and allow additional data elements to be linked to a taxon in the core table with a many-to-one relationship. The overall topology of one or more of these extensions to the core table is referred to as a \u0026ldquo;star schema\u0026rdquo; and provides a compromise between an overly simple flat-file representation of data and more complex multi-related files. 
In addition to these files, an additional descriptor file named “meta.xml” serves as a key to the other files. Collectively, these files can be further zipped into a single compressed archive file for portability. This compressed file is known as a Darwin Core Archive (DwCA) file 2.\nThe Darwin Core Archive used by Plazi There is one archive per article stored in Plazi, containing the data from all the treatments in the article. Archives contain nine files:\nmeta.xml: description of columns in data files eml.xml: archive meta data, i.e., bibliographic citation of article, etc. taxa.txt: the archive core file, containing one row per taxon in the nomenclature section of a treatment, thus one or multiple rows per treatment, with any after the first for each treatment handling synonymizations. occurrences.txt: occurrence data, containing one row per materials citation, with an ID reference to taxa.txt description.txt: description data, containing one row per descriptive treatment section, with an ID reference to taxa.txt distribution.txt: general distribution data, one row per distribution statement, with an ID reference to taxa.txt media.txt: full text treatments with HTML markup with additional meta data like a bibliographic citation, one row per treatment, with an ID reference to taxa.txt references.txt: bibliographic references to individual treatments, one row per treatment, with an ID reference to taxa.txt vernaculars.txt: vernacular names of treatment taxa, currently empty, as we do not have or mark this kind of data For a detailed description of the content of each file see Appendix: Darwin Core Archive Content\nTreatment Data representation in Plazi The treatment data is stored in the Treatment Search Portal in native, generic XML included in tagged original publications. 
The tagged elements are (a) additionally stored in dedicated index structures to support search and (b) extracted and exported in several formats, including DwCA.\nA treatment document includes two main elements, the header including the metadata based on the Metadata Object Description Schema (MODS) and the body.\n\u0026lt;tax:taxonx\u0026gt; \u0026lt;tax:taxonxHeader\u0026gt; \u0026lt;tax:taxonxBody\u0026gt; The data XML can be converted via XSLT into HTML, TaxonX XML (a schema developed to model biosystematics legacy literature), and RDF and HTML\nHTML:\nhttp://treatment.plazi.org/id/31F96F41-E3E0-02BD-8898-5A4F3A20E45A\n(this is also the persistent httpURI used as identifier for treatments)\nPlain XML:\nhttp://tb.plazi.org/GgServer/xslt/31F96F41E3E002BD88985A4F3A20E45A\nTaxonX XML:\nhttp://tb.plazi.org/GgServer/taxonx/31F96F41E3E002BD88985A4F3A20E45A\nRDF:\nhttp://tb.plazi.org/GgServer/rdf/31F96F41E3E002BD88985A4F3A20E45A\nor\nhttp://treatment.plazi.org/id/31F96F41-E3E0-02BD-8898-5A4F3A20E45A.rdf\nThe terms used in TaxonX and RDF are either imported from existing schemas (such as Darwin Core for observation records, MODS for bibliographic data) or are, if not available, defined in schemas (TaxonX) or ontologies (RDF: in development)\nPlazi API Treatment data is open access and can be accessed via HTTP GET as described in detail below. The treatment data is provided in HTML, various XML flavors, and RDF.\nObtaining a list of all the treatments available from Plazi HTTP GET - RSS http://tb.plazi.org/GgServer/xml.rss.xml\nResponse (RSS, in Atom XML, encoded in UTF-8)\nEntries of interest\nchannel/item/link: the link to the XML treatment channel/item/title: the taxon name and authority Accessing a particular DwC-Archive\nHTTP GET - ZIP Archive tb.plazi.org/GgServer/dwca/\u0026lt;dataSetUUID\u0026gt;.zip\nReplace \u0026lt;dataSetUUID\u0026gt; with any UUID from the GBIF-provided listing (see below). 
It is also possible to directly use the endpoint URL from that listing.\nExample:\nhttp://tb.plazi.org/GgServer/dwca/23A1465DDF212F7DA589F41341B83FCC.zip\nResponse (ZIP Archive, containing XML and tab-separated TXT files, all encoded in UTF-8)\nEntries of interest:\neml.xml: an XML file containing the metadata of the publication, in MODS format taxa.txt: a tab-separated TXT file listing the taxa and treatments the DwC-Archive contains, plus higher taxonomy; the Identifier column takes the form \u0026lt;treatmentUUID\u0026gt;.taxon, and the treatment UUID can be used to access the treatment on the Plazi servers (see below) occurrences.txt: a tab-separated TXT file containing occurrence data; the TaxonID column references the Identifier column in taxa.txt, the data column headers are DwC terms media.txt: a tab-separated TXT file containing HTML versions of the treatments; the TaxonID column references the Identifier column in taxa.txt, the HTML treatments are located in the Description column references.txt: A detailed description of contents can be found here\nhttp://github.com/plazi/Plazi-Communications/wiki/GBIF#darwin-core-archive\nAccessing a particular treatment on the Plazi servers\nHTTP GET - Page displaying Treatment tb.plazi.org/GgServer/html/\u0026lt;treatmentUUID\u0026gt;\nReplace \u0026lt;treatmentUUID\u0026gt; with the actual treatment UUID from the taxa.txt file found in DwC-Archives\nExample:\nhttp://tb.plazi.org/GgServer/html/8C4CE845A6DEE6FDFD1600A70D5BC71B\nResponse (HTML, encoded in UTF-8): a web page displaying the treatment\nHTTP GET - generic XML tb.plazi.org/GgServer/xml/\u0026lt;treatmentUUID\u0026gt;\nReplace \u0026lt;treatmentUUID\u0026gt; with the actual treatment UUID from the taxa.txt file found in DwC-Archives\nExample:\nhttp://tb.plazi.org/GgServer/xml/8C4CE845A6DEE6FDFD1600A70D5BC71B\nResponse (XML, encoded in UTF-8): the raw, generic XML version of the treatment, which all other representations are generated from\nHTTP GET - 
TaxonX tb.plazi.org/GgServer/taxonx/\u0026lt;treatmentUUID\u0026gt;;\nReplace \u0026lt;treatmentUUID\u0026gt; with the actual treatment UUID from the taxa.txt file found in DwC-Archives\nExample:\nhttp://tb.plazi.org/GgServer/taxonx/8C4CE845A6DEE6FDFD1600A70D5BC71B\nResponse (XML, encoded in UTF-8): a TaxonX XML version of the treatment\nList of Plazi\u0026rsquo;s available DwC-Archives from GBIF API GBIF is a regular harvester of Plazi data and can be used as an alternative site.\nHTTP GET - JSON api.gbif.org/v1/organization/7ce8aef0-9e92-11dc-8738-b8a03c50a862/publishedDataset;\nReplace \u0026lt;20k\u0026gt; with any multiple of 20 (including 0) to page through the list. It is also possible to use a limit other than 20, with the offset then being a multiple of that other limit.\nExample (first 20 datasets):\nhttp://api.gbif.org/v1/organization/7ce8aef0-9e92-11dc-8738-b8a03c50a862/publishedDataset?limit=20\u0026amp;offset=0\nResponse (JSON)\n{ \u0026#34;offset\u0026#34;: 0, \u0026#34;limit\u0026#34;: 1, \u0026#34;endOfRecords\u0026#34;: false, \u0026#34;count\u0026#34;: 1129, \u0026#34;results\u0026#34;: [{ \u0026#34;key\u0026#34;: \u0026#34;3e8b196b-c482-47f1-9574-772141310c40\u0026#34;, \u0026#34;installationKey\u0026#34;: \u0026#34;7ce8aef1-9e92-11dc-8740-b8a03c50a999\u0026#34;, \u0026#34;publishingOrganizationKey\u0026#34;: \u0026#34;7ce8aef0-9e92-11dc-8738-b8a03c50a862\u0026#34;, \u0026#34;external\u0026#34;: false, \u0026#34;numConstituents\u0026#34;: 0, \u0026#34;type\u0026#34;: \u0026#34;CHECKLIST\u0026#34;, \u0026#34;title\u0026#34;: \u0026#34;Revision of the ant genus Myrmoteras in the Malay Archipelago (Hymenoptera, Formicidae).\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;UNAVAILABLE\u0026#34;, \u0026#34;language\u0026#34;: \u0026#34;eng\u0026#34;, \u0026#34;homepage\u0026#34;: \u0026#34;http://tb.plazi.org/GgServer/summary/23A1465DDF212F7DA589F41341B83FCC\u0026#34;, \u0026#34;citation\u0026#34;: { \u0026#34;text\u0026#34;: \u0026#34;Plazi.org 
taxonomic treatments database: Revision of the ant genus Myrmoteras in the Malay Archipelago (Hymenoptera, Formicidae).\u0026#34; }, \u0026#34;rights\u0026#34;: \u0026#34;No known copyright restrictions apply. See Agosti, D., Egloff, W., 2009. Taxonomic information exchange and copyright: the Plazi approach. BMC Research Notes 2009, 2:53 for further explanation.\u0026#34;, \u0026#34;lockedForAutoUpdate\u0026#34;: false, \u0026#34;createdBy\u0026#34;: \u0026#34;plazi\u0026#34;, \u0026#34;modifiedBy\u0026#34;: \u0026#34;crawler.gbif.org\u0026#34;, \u0026#34;created\u0026#34;: \u0026#34;2014-06-28T12:55:54.089+0000\u0026#34;, \u0026#34;modified\u0026#34;: \u0026#34;2014-11-25T13:29:20.716+0000\u0026#34;, \u0026#34;contacts\u0026#34;: […], \u0026#34;endpoints\u0026#34;: [{ \u0026#34;key\u0026#34;: 45389, \u0026#34;type\u0026#34;: \u0026#34;DWC_ARCHIVE\u0026#34;, \u0026#34;url\u0026#34;: \u0026#34;http://plazi.cs.umb.edu/GgServer/dwca/23A1465DDF212F7DA589F41341B83FCC.zip\u0026#34;, \u0026#34;createdBy\u0026#34;: \u0026#34;plazi\u0026#34;, \u0026#34;modifiedBy\u0026#34;: \u0026#34;plazi\u0026#34;, \u0026#34;created\u0026#34;: \u0026#34;2014-06-28T12:55:54.604+0000\u0026#34;, \u0026#34;modified\u0026#34;: \u0026#34;2014-06-28T12:55:54.604+0000\u0026#34;, \u0026#34;machineTags\u0026#34;: [] }], \u0026#34;machineTags\u0026#34;: [...], \u0026#34;tags\u0026#34;: [], \u0026#34;identifiers\u0026#34;: [{ \u0026#34;key\u0026#34;: 23594, \u0026#34;type\u0026#34;: \u0026#34;UUID\u0026#34;, \u0026#34;identifier\u0026#34;: \u0026#34;23A1465DDF212F7DA589F41341B83FCC\u0026#34;, \u0026#34;createdBy\u0026#34;: \u0026#34;plazi\u0026#34;, \u0026#34;created\u0026#34;: \u0026#34;2014-06-28T12:55:54.334+0000\u0026#34; }], \u0026#34;comments\u0026#34;: [], \u0026#34;bibliographicCitations\u0026#34;: [], \u0026#34;curatorialUnits\u0026#34;: [], \u0026#34;taxonomicCoverages\u0026#34;: [], \u0026#34;geographicCoverages\u0026#34;: [], \u0026#34;temporalCoverages\u0026#34;: [], 
\u0026#34;keywordCollections\u0026#34;: [], \u0026#34;countryCoverage\u0026#34;: [], \u0026#34;collections\u0026#34;: [], \u0026#34;dataDescriptions\u0026#34;: [] }] } Entries of interest:\nendOfRecords: if false, increasing offset will return further datasets count: total number of available Plazi datasets results.endpoints.url: the URL of the DwC-Archive containing the data on results.identifiers.identifier: the UUID of the dataset results.homepage: the URL of an HTML page listing the taxonomic treatments whose data is contained in the DwC-Archive Appendix: Darwin Core Archive Content taxa.txt\nhttp://rs.tdwg.org/dwc/terms/taxonID: treatment UUID + .taxon for taxon, treatment ID + .syn for new junior synonyms http://rs.tdwg.org/dwc/terms/namePublishedIn: reference string of original description http://rs.tdwg.org/dwc/terms/acceptedNameUsageID: blank, except for new junior synonyms http://rs.tdwg.org/dwc/terms/parentNameUsageID: blank http://rs.tdwg.org/dwc/terms/originalNameUsageID: blank http://rs.tdwg.org/dwc/terms/kingdom: taxon@kingdom http://rs.tdwg.org/dwc/terms/phylum: taxon@phylum http://rs.tdwg.org/dwc/terms/class: taxon@class http://rs.tdwg.org/dwc/terms/order: taxon@order http://rs.tdwg.org/dwc/terms/family: taxon@family http://rs.tdwg.org/dwc/terms/genus: taxon@genus http://rs.tdwg.org/dwc/terms/taxonRank: taxon@rank http://rs.tdwg.org/dwc/terms/scientificName: taxon name http://rs.tdwg.org/dwc/terms/taxonomicStatus: blank except for new junior synonyms, where \u0026ldquo;synonym\u0026rdquo;, \u0026ldquo;homotypicSynonym\u0026rdquo; if we have a syntype http://rs.tdwg.org/dwc/terms/nomenclaturalStatus: blank http://purl.org/dc/terms/references: HTTP URI of treatment occurrences.txt\nhttp://rs.tdwg.org/dwc/terms/occurrenceID: treatment UUID + \u0026ldquo;.mc.\u0026rdquo; + materials citation ID http://rs.tdwg.org/dwc/terms/taxonID: treatment UUID + \u0026ldquo;.taxon\u0026rdquo;, referencing taxa.txt http://rs.tdwg.org/dwc/terms/catalogNumber: 
mc@specimenCode (explode to one record per specimen code if possible) http://rs.tdwg.org/dwc/terms/collectionCode: mc@collectionCode (explode to one record per collection code if possible) http://rs.tdwg.org/dwc/terms/institutionCode: blank http://rs.tdwg.org/dwc/terms/typeStatus: mc@typeStatus (blank if none given) http://rs.gbif.org/terms/1.0/verbatimLabel: mc text http://rs.tdwg.org/dwc/terms/sex: mc@sex (also other specimen types like \u0026ldquo;queen\u0026rdquo;, \u0026ldquo;worker\u0026rdquo;, etc.) http://rs.tdwg.org/dwc/terms/individualCount: mc@specimenCount (explode things like \u0026ldquo;5 workers, 2 females\u0026rdquo; to one record per typified specimen count if possible) http://rs.tdwg.org/dwc/terms/eventDate: mc@collectingDate http://rs.tdwg.org/dwc/terms/recordedBy: mc@collectorName http://rs.tdwg.org/dwc/terms/recordNumber: blank http://rs.tdwg.org/dwc/terms/decimalLatitude: mc@latitude http://rs.tdwg.org/dwc/terms/decimalLongitude: mc@longitude http://rs.tdwg.org/dwc/terms/minimumElevationInMeters: mc@elevation, or mc@elevationMin if given http://rs.tdwg.org/dwc/terms/maximumElevationInMeters: mc@elevationMax if given http://rs.tdwg.org/dwc/terms/country: mc@collectingCountry http://rs.tdwg.org/dwc/terms/stateProvince: mc@stateProvince or mc@collectingRegion http://rs.tdwg.org/dwc/terms/municipality: mc@collectingMunicipality http://rs.tdwg.org/dwc/terms/locality: mc@location http://purl.org/dc/terms/references: HTTP URI of treatment description.txt\nhttp://rs.tdwg.org/dwc/terms/taxonID: treatment UUID + .taxon, referencing taxa.txt http://purl.org/dc/terms/type: subSubSection@type http://purl.org/dc/terms/description: subSubSection text http://purl.org/dc/terms/language: blank (except if we have language detection (might be reusable from spell checker)) http://purl.org/dc/terms/source: article citation distribution.txt\nhttp://rs.tdwg.org/dwc/terms/locationID: treatment UUID + \u0026ldquo;.\u0026rdquo; + location UUID 
http://rs.tdwg.org/dwc/terms/taxonID: treatment UUID + .taxon, referencing taxa.txt http://rs.tdwg.org/dwc/terms/country: mc@collectinCountry http://rs.tdwg.org/dwc/terms/locality: mc@location http://rs.tdwg.org/dwc/terms/occurrenceStatus: mc@typeStatus media.txt\nhttp://purl.org/dc/terms/identifier: treatment UUID + .text http://rs.tdwg.org/dwc/terms/taxonID: treatment UUID + .taxon, referencing taxa.txt http://purl.org/dc/terms/type: purl.org/dc/dcmitype/Text http://iptc.org/std/Iptc4xmpExt/1.0/xmlns/CVterm: \u0026ldquo;http://rs.tdwg.org/ontology/voc/SPMInfoItems#GeneralDescription\u0026rdquo; http://purl.org/dc/terms/format: text/html http://purl.org/dc/terms/title: taxon + author + year http://purl.org/dc/terms/description: treatment HTML http://rs.tdwg.org/dwc/terms/additionalInformationURL: treatment HTTP URI http://ns.adobe.com/xap/1.0/rights/UsageTerms: Public Domain http://purl.org/dc/terms/rights: No known copyright restrictions apply. See Agosti, D., Egloff, W., 2009. Taxonomic information exchange and copyright: the Plazi approach. BMC Research Notes 2009, 2:53 for further explanation. 
http://ns.adobe.com/xap/1.0/rights/Owner: blank http://purl.org/dc/terms/contributor: ((Pensoft|Zootaxa) via )?Plazi http://purl.org/dc/terms/creator: author list, semicolon separated http://purl.org/dc/terms/bibliographicCitation: bibliographic reference string references.txt\nhttp://purl.org/dc/terms/identifier: treatment UUID + .ref for article (treatment) reference, cited treatment ID (from treatmentCitation@httpUri) + .ref for original description reference http://rs.tdwg.org/dwc/terms/taxonID: treatment ID + .taxon, referencing taxa.txt http://eol.org/schema/reference/publicationType: bibRef@type http://eol.org/schema/reference/full_reference: reference text http://eol.org/schema/reference/primaryTitle: bibRef@title http://purl.org/dc/terms/title: bibRef@journal or bibRef@volumeTitle http://purl.org/ontology/bibo/pages: blank http://purl.org/ontology/bibo/pageStart: treatment first page http://purl.org/ontology/bibo/pageEnd: treatment last page http://purl.org/ontology/bibo/journal: bibRef@journal http://purl.org/ontology/bibo/volume: bibRef@part http://purl.org/dc/terms/publisher: bibRef@publisher http://purl.org/ontology/bibo/authorList: bibRef@author, semicolon separated http://purl.org/ontology/bibo/editorList: bibRef@editor, semicolon separated http://purl.org/dc/terms/created: bibRef@year http://purl.org/dc/terms/language: blank http://purl.org/ontology/bibo/uri: bibRef@URL, if available http://purl.org/ontology/bibo/doi: bibRef@DOI, if available vernaculars.txt\nhttp://rs.tdwg.org/dwc/terms/taxonID: treatment UUID + .taxon, referencing taxa.txt http://purl.org/dc/terms/language: en http://rs.tdwg.org/dwc/terms/vernacularName: vernacular name Further reading Plazi background documents\nDownloads Download the description as PDF\nSupport and Questions For support and questions, please contact our support.\nVersion 20150223\n","id":69,"permalink":"https://plazi.org/treatmentbank/treatment-data-access/","tags":["Treatment Bank"],"title":"Treatment Data 
Access"},{"categories":[""],"contents":"TreatmentBank is a resource that stores and provides access to the treatments and data therein.\nLatest Treatments What is a treatment? The Plazi TreatmentBank deals with scientific, published, biosystematic literature. It is the literature documenting and describing all the world’s ca. 1.9 million known species in an estimated corpus of over 500 million published pages. The cited publications in Plazi are all available at the Biodiversity Literature Repository at Zenodo/CERN.\nTreatments are well-defined parts of articles that define the particular usage of a scientific name by an author at a given time (the publication). In other words, each scientific name has one to several treatments, depending on whether there exists only an original description of a species, or whether there are subsequent re-descriptions. Similar to bibliographic references, treatments can be cited, and subsequent usages of names cite earlier treatments.\nTreatments are a synthesis of the knowledge of a given species at a given time. They can be very rich in data, explicitly or implicitly, detailed or summarized, and include many references to external data sources, such as scientific names, collection codes, DNA-codes.\nThe data can be semantically enhanced and linked. Treatments, as parts of publications, need to be extracted. Most recently, treatments are tagged in electronic publications with the National Library of Medicine’s Journal Article Tag Suite (JATS) TaxPub extension. This allows automatic extraction. Still, the majority of the ca. 2000 journals and books publishing treatments use the PDF format at best. Plazi has tools to extract treatments, enhance the embedded data and import it into its SRS- Treatment Search Portal for public online access.\nThe data, that is, treatments and observation data, can be viewed as HTML, XML, RDF, or can be harvested with the protocols provided below. 
The data is provided for harvesting as Darwin Core Archives.\nWhat is TreatmentBank? Scientists describe and communicate the discovery of new biological species with taxonomic treatments that are bound to the names used to refer to these taxa. Often they are very rich in content and increasingly linked to external resources. TreatmentBank is a resource that stores and provides access to the treatments and data therein.\n","id":70,"permalink":"https://plazi.org/data-apis-tools/treatmentbank/","tags":["Treatment Bank","Data, API and Tools"],"title":"TreatmentBank"},{"categories":[""],"contents":"This is an analysis and mining tool for the data contained in TreatmentBank. Access is provided at article and treatment level. Documentation is available.\nThe user can select fields and download Plazi treatment data in multiple formats. Selected fields turn from gray to green. Data on all materials (cited specimens) are available through the “Materials Citations Data” domain; these data are summarized by treatment in the “Materials Data” domain. Select the relevant operation (e.g., show individual values, count distinct values, count all values, minimum value, maximum value) for each selected field. The “Get Statistics” button runs the selected query. You can save the results using one of the links indicating the desired format. It may be useful to choose an appropriate file name and add an extension to the file (e.g., .csv for comma-separated values).\nTreatment Statistics Article Statistics ","id":71,"permalink":"https://plazi.org/data-apis-tools/statistics/","tags":["Treatment Bank","Data, API and Tools"],"title":"Statistics"},{"categories":[""],"contents":"What is a treatment? The Plazi TreatmentBank [1] deals with scientific, published, biosystematic literature. It is the literature documenting and describing all the world’s ca. 1.9 million known species in an estimated corpus of over 500 million published pages. 
The cited publications in Plazi are all available at the Biodiversity Literature Repository [2] at Zenodo/CERN.\nTreatments are well defined parts of articles that define the particular usage of a scientific name by an author at a given time (the publication) [3]. With other words, each scientific name has one to several treatments, depending whether there exists only an original description of a species, or whether there are subsequent re-descriptions. Similar to bibliographic references, treatments can be cited, and subsequent usages of names cite earlier treatments.\nTreatments are a synthesis of the knowledge of a given species at a given time. They can be very rich in data, explicitly or implicitly, detailed or summarized, and include many references to external data sources, such as scientific names, collection codes, DNA-codes.\nThe data can be semantically enhanced, and linked. Treatments as parts of publication need be extracted. Most recently, treatments are tagged in electronic publications with the National Library of Medicine’s Journal Article Tag Suites (JATS) TaxPub extension [3]. This allows automatic extraction. Still the majority of the ca. 2000 journals and books publishing treatments use the PDF format at best. Plazi has tools to extract treatments, enhance the embedded data and import it into its SRS- Treatment Search Portal for public online access.\nThe data, that is, treatments and observation data, can be viewed as HTML, XML, RDF, or can be harvested with the protocols provided below. The data is provided for harvesting as Darwin Core-Archives.\nWhat is a DarwinCore Archive? The Darwin Core Archive format is a simple and extensible schema for sharing biodiversity data, especially catalogue data based on the ratified Darwin Core terms and the Darwin Core text guidelines [4]. Darwin Core is a standard for describing sample data in the Biodiversity Informatics community. 
It has been developed by the Global Biodiversity Information Facility (GBIF). DarwinCore Archives use a table-based, \u0026ldquo;spreadsheet-style\u0026rdquo; format that is more comfortable and familiar to biologists. The format uses plain text files but is tied to processes that support consistency and stability.\nFig. Schematic representation of a Darwin Core Archive and its components [4]\nThe GBIF GNA format consists of a set of files where one (or more) files represent the \u0026lsquo;core\u0026rsquo; taxonomic data where a single row represents a single taxon reference. The DarwinCore Taxon class provides the majority of concepts supported in the format that enable taxonomic and nomenclatural semantics and syntax (classification, taxonomic and nomenclatural synonymy, status, etc.) to be expressed.\nOther files represent \u0026ldquo;extensions\u0026rdquo; to this core table and allow additional data elements to be linked to a taxon in the core table with a many-to-one relationship. The overall topology of one or more of these extensions to the core table is referred to as a \u0026ldquo;star schema\u0026rdquo; and provides a compromise between an overly simple flat-file representation of data and more complex multi-related files. In addition to these files, a descriptor file named “meta.xml” serves as a key to the other files. Collectively, these files can be further zipped into a single compressed archive file for portability. This compressed file is known as a Darwin Core Archive (DwCA) file [4].\nThe Darwin Core Archive used by Plazi There is one archive per article stored in Plazi, containing the data from all the treatments in the article. Archives contain nine files:\nmeta.xml: description of columns in data files eml.xml: archive metadata, i.e., bibliographic citation of article, etc. 
taxa.txt: the archive core file, containing one row per taxon in the nomenclature section of a treatment, thus one or multiple rows per treatment, with any after the first for each treatment handling synonymizations. occurrences.txt: occurrence data, containing one row per materials citation, with an ID reference to taxa.txt description.txt: description data, containing one row per descriptive treatment section, with an ID reference to taxa.txt distribution.txt: general distribution data, one row per distribution statement, with an ID reference to taxa.txt media.txt: full text treatments with HTML markup with additional meta data like a bibliographic citation, one row per treatment, with an ID reference to taxa.txt references.txt: bibliographic references to individual treatments, one row per treatment, with an ID reference to taxa.txt vernaculars.txt: vernacular names of treatment taxa, currently empty, as we do not have or mark this kind of data\nFor a detailed description of the content of each file see Appendix: Darwin Core Archive Content\nTreatment Data representation in Plazi The treatment data is stored in the Treatment Search Portal in native, generic XML included in tagged original publications. 
The tagged elements are (a) additionally stored in dedicated index structures to support search and (b) extracted and exported in several formats, including DwCA.\nA treatment document includes two main elements, the header including the metadata based on the Metadata Object Description Schema (MODS) and the body.\ntax:taxonx tax:taxonxHeader tax:taxonxBody The data XML can be converted via XSLT into HTML, TaxonX XML (a schema developed to model biosystematics legacy literature), and RDF and HTML\nHTML: http://treatment.plazi.org/id/31F96F41-E3E0-02BD-8898-5A4F3A20E45A (this is also the persistent httpURI used as identifier for treatments)\nPlain XML: http://tb.plazi.org/GgServer/xslt/31F96F41E3E002BD88985A4F3A20E45A\nTaxonX XML: http://tb.plazi.org/GgServer/taxonx/31F96F41E3E002BD88985A4F3A20E45A\nRDF: http://tb.plazi.org/GgServer/rdf/31F96F41E3E002BD88985A4F3A20E45A or http://treatment.plazi.org/id/31F96F41-E3E0-02BD-8898-5A4F3A20E45A.rdf\nThe terms used in TaxonX and RDF are either imported from existing schemas (such as Darwin Core for observation records, MODS for bibliographic data) or are, if not available, defined in schemas (TaxonX) or ontologies (RDF: in development)\nPlazi API Treatment data is open access and can be accessed via HTTP GET as described in detail below. The treatment data is provided in HTML, various XML flavors, and RDF.\nObtaining a list of all the treatments available from Plazi\nHTTP GET http://tb.plazi.org/GgServer/xml.rss.xml\nResponse (RSS, in Atom XML, encoded in UTF-8)\nEntries of interest\nchannel/item/link: the link to the XML treatment channel/item/title: the taxon name and authority Accessing a particular DwC-Archive\nHTTP GET http://tb.plazi.org/GgServer/dwca/.zip\nReplace with any UUID from the GBIF-provided listing (see below). 
It is also possible to directly use the endpoint URL from that listing.\nExample:\nhttp://tb.plazi.org/GgServer/dwca/23A1465DDF212F7DA589F41341B83FCC.zip Response (ZIP Archive, containing XML and tab separated TXT files, all encoded in UTF-8)\nEntries of interest:\neml.xml: an XML file containing the meta data of the publication, in MODS format taxa.txt: a tab separated TXT file listing the taxa and treatments the DwC-Archive contains, plus higher taxonomy; the Identifier column takes the form - .taxon, and the treatment UUID can be used to access the treatment on the Plazi servers (see below) occurrences.txt: a tab separated TXT file containing occurrence data; the TaxonID column references the Identifier column in taxa.txt, the data column headers are DwC terms media.txt: a tab separated TXT file containing HTML versions of the treatments; the TaxonID column references the Identifier column in taxa.txt, the HTML treatments are located in the Description column references.txt: A detailed description of contents can be found here http://github.com/plazi/Plazi-Communications/wiki/GBIF#darwin-core-archive Accessing a particular treatment on the Plazi servers\nHTTP GET http://tb.plazi.org/GgServer/html/\u0026lt;treatmentUUID\u0026gt;\nReplace with the actual treatment UUID from the taxa.txt file found in DwC-Archives\nExample:\nhttp://tb.plazi.org/GgServer/html/8C4CE845A6DEE6FDFD1600A70D5BC71B Response (HTML, encoded in UTF-8): a web page displaying the treatment\nHTTP GET http://tb.plazi.org/GgServer/xml/\u0026lt;treatmentUUID\u0026gt;\nReplace with the actual treatment UUID from the taxa.txt file found in DwC-Archives\nExample:\nhttp://tb.plazi.org/GgServer/xml/8C4CE845A6DEE6FDFD1600A70D5BC71B Response (XML, encoded in UTF-8): the raw, generic XML version of the treatment, which all other representations are generated from\nHTTP GET http://tb.plazi.org/GgServer/taxonx/\u0026lt;treatmentUUID\u0026gt;\nReplace with the actual treatment UUID from the taxa.txt file 
found in DwC-Archives\nExample:\nhttp://tb.plazi.org/GgServer/taxonx/8C4CE845A6DEE6FDFD1600A70D5BC71B Response (XML, encoded in UTF-8): a TaxonX XML version of the treatment\nList of Plazi\u0026rsquo;s available DwC-Archives from GBIF API\nGBIF is a regular harvester of Plazi data and can be used as an alternative site.\nHTTP GET https://api.gbif.org/v1/organization/7ce8aef0-9e92-11dc-8738-b8a03c50a862/publishedDataset;\nReplace \u0026lt;20k\u0026gt; with any multiple of 20 (including 0) to page through the list. It is also possible to use a limit other than 20, with the offset then being a multiple of that other limit.\nExample (first 20 datasets):\nhttp://api.gbif.org/v1/organization/7ce8aef0-9e92-11dc-8738-b8a03c50a862/publishedDataset?limit=20\u0026amp;offset=0\nResponse (JSON)\n{ \u0026#34;offset\u0026#34;: 0, \u0026#34;limit\u0026#34;: 1, \u0026#34;endOfRecords\u0026#34;: false, \u0026#34;count\u0026#34;: 1129, \u0026#34;results\u0026#34;: [ { \u0026#34;key\u0026#34;: \u0026#34;3e8b196b-c482-47f1-9574-772141310c40\u0026#34;, \u0026#34;installationKey\u0026#34;: \u0026#34;7ce8aef1-9e92-11dc-8740-b8a03c50a999\u0026#34;, \u0026#34;publishingOrganizationKey\u0026#34;: \u0026#34;7ce8aef0-9e92-11dc-8738-b8a03c50a862\u0026#34;, \u0026#34;external\u0026#34;: false, \u0026#34;numConstituents\u0026#34;: 0, \u0026#34;type\u0026#34;: \u0026#34;CHECKLIST\u0026#34;, \u0026#34;title\u0026#34;: \u0026#34;Revision of the ant genus Myrmoteras in the Malay Archipelago (Hymenoptera, Formicidae).\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;UNAVAILABLE\u0026#34;, \u0026#34;language\u0026#34;: \u0026#34;eng\u0026#34;, \u0026#34;homepage\u0026#34;: \u0026#34;http://tb.plazi.org/GgServer/summary/23A1465DDF212F7DA589F41341B83FCC\u0026#34;, \u0026#34;citation\u0026#34;: { \u0026#34;text\u0026#34;: \u0026#34;Plazi.org taxonomic treatments database: Revision of the ant genus Myrmoteras in the Malay Archipelago (Hymenoptera, Formicidae).\u0026#34; }, \u0026#34;rights\u0026#34;: 
\u0026#34;No known copyright restrictions apply. See Agosti, D., Egloff, W., 2009. Taxonomic information exchange and copyright: the Plazi approach. BMC Research Notes 2009, 2:53 for further explanation.\u0026#34;, \u0026#34;lockedForAutoUpdate\u0026#34;: false, \u0026#34;createdBy\u0026#34;: \u0026#34;plazi\u0026#34;, \u0026#34;modifiedBy\u0026#34;: \u0026#34;crawler.gbif.org\u0026#34;, \u0026#34;created\u0026#34;: \u0026#34;2014-06-28T12:55:54.089+0000\u0026#34;, \u0026#34;modified\u0026#34;: \u0026#34;2014-11-25T13:29:20.716+0000\u0026#34;, \u0026#34;contacts\u0026#34;: [...], \u0026#34;endpoints\u0026#34;: [{ \u0026#34;key\u0026#34;: 45389, \u0026#34;type\u0026#34;: \u0026#34;DWC_ARCHIVE\u0026#34;, \u0026#34;url\u0026#34;: \u0026#34;http://plazi.cs.umb.edu/GgServer/dwca/23A1465DDF212F7DA589F41341B83FCC.zip\u0026#34;, \u0026#34;createdBy\u0026#34;: \u0026#34;plazi\u0026#34;, \u0026#34;modifiedBy\u0026#34;: \u0026#34;plazi\u0026#34;, \u0026#34;created\u0026#34;: \u0026#34;2014-06-28T12:55:54.604+0000\u0026#34;, \u0026#34;modified\u0026#34;: \u0026#34;2014-06-28T12:55:54.604+0000\u0026#34;, \u0026#34;machineTags\u0026#34;: [] }], \u0026#34;machineTags\u0026#34;: [...], \u0026#34;tags\u0026#34;: [], \u0026#34;identifiers\u0026#34;: [{ \u0026#34;key\u0026#34;: 23594, \u0026#34;type\u0026#34;: \u0026#34;UUID\u0026#34;, \u0026#34;identifier\u0026#34;: \u0026#34;23A1465DDF212F7DA589F41341B83FCC\u0026#34;, \u0026#34;createdBy\u0026#34;: \u0026#34;plazi\u0026#34;, \u0026#34;created\u0026#34;: \u0026#34;2014-06-28T12:55:54.334+0000\u0026#34; }], \u0026#34;comments\u0026#34;: [], \u0026#34;bibliographicCitations\u0026#34;: [], \u0026#34;curatorialUnits\u0026#34;: [], \u0026#34;taxonomicCoverages\u0026#34;: [], \u0026#34;geographicCoverages\u0026#34;: [], \u0026#34;temporalCoverages\u0026#34;: [], \u0026#34;keywordCollections\u0026#34;: [], \u0026#34;countryCoverage\u0026#34;: [], \u0026#34;collections\u0026#34;: [], \u0026#34;dataDescriptions\u0026#34;: [] } ] } Entries of 
interest:\nendOfRecords: if false, increasing offset will return further datasets count: total number of available Plazi datasets results.endpoints.url: the URL of the DwC-Archive containing the data on results.identifiers.identifier: the UUID of the dataset results.homepage: the URL of an HTML page listing the taxonomic treatments whose data is contained in the DwC-Archive References Plazi http://plazi.org Biodiversity Literature Repository. https://zenodo.org/collection/user-biosyslit Catapano T. 2010. TaxPub: An Extension of the NLM/NCBI Journal Publishing DTD for Taxonomic Descriptions. Proceedings of the Journal Article Tag Suite Conference 2010 (pdf) Darwin Core Archive Appendix: Darwin Core Archive Content taxa.txt\nhttp://rs.tdwg.org/dwc/terms/taxonID: treatment UUID + .taxon for taxon, treatment ID + .syn for new junior synonyms http://rs.tdwg.org/dwc/terms/namePublishedIn: reference string of original description http://rs.tdwg.org/dwc/terms/acceptedNameUsageID: blank, except for new junior synonyms http://rs.tdwg.org/dwc/terms/parentNameUsageID: blank http://rs.tdwg.org/dwc/terms/originalNameUsageID: blank http://rs.tdwg.org/dwc/terms/kingdom: taxon@kingdom http://rs.tdwg.org/dwc/terms/phylum: taxon@phylum http://rs.tdwg.org/dwc/terms/class: taxon@class http://rs.tdwg.org/dwc/terms/order: taxon@order http://rs.tdwg.org/dwc/terms/family: taxon@family http://rs.tdwg.org/dwc/terms/genus: taxon@genus http://rs.tdwg.org/dwc/terms/taxonRank: taxon@rank http://rs.tdwg.org/dwc/terms/scientificName: taxon name http://rs.tdwg.org/dwc/terms/taxonomicStatus: blank except for new junior synonyms, where \u0026#34;synonym\u0026#34;, \u0026#34;homotypicSynonym\u0026#34; if we have a syntype http://rs.tdwg.org/dwc/terms/nomenclaturalStatus: blank http://purl.org/dc/terms/references: HTTP URI of treatment occurrences.txt\nhttp://rs.tdwg.org/dwc/terms/occurrenceID: treatment UUID + \u0026#34;.mc.\u0026#34; + materials citation ID http://rs.tdwg.org/dwc/terms/taxonID: 
treatment UUID + \u0026#34;.taxon\u0026#34;, referencing taxa.txt http://rs.tdwg.org/dwc/terms/catalogNumber: mc@specimenCode (explode to one record per specimen code if possible) http://rs.tdwg.org/dwc/terms/collectionCode: mc@collectionCode (explode to one record per collection code if possible) http://rs.tdwg.org/dwc/terms/institutionCode: blank http://rs.tdwg.org/dwc/terms/typeStatus: mc@typeStatus (blank if none given) http://rs.gbif.org/terms/1.0/verbatimLabel: mc text http://rs.tdwg.org/dwc/terms/sex: mc@sex (also other specimen types like \u0026#34;queen\u0026#34;, \u0026#34;worker\u0026#34;, etc.) http://rs.tdwg.org/dwc/terms/individualCount: mc@specimenCount (explode things like \u0026#34;5 workers, 2 females\u0026#34; to one record per typified specimen count if possible) http://rs.tdwg.org/dwc/terms/eventDate: mc@collectingDate http://rs.tdwg.org/dwc/terms/recordedBy: mc@collectorName http://rs.tdwg.org/dwc/terms/recordNumber: blank http://rs.tdwg.org/dwc/terms/decimalLatitude: mc@latitude http://rs.tdwg.org/dwc/terms/decimalLongitude: mc@longitude http://rs.tdwg.org/dwc/terms/minimumElevationInMeters: mc@elevation, or mc@elevationMin if given http://rs.tdwg.org/dwc/terms/maximumElevationInMeters: mc@elevationMax if given http://rs.tdwg.org/dwc/terms/country: mc@collectingCountry http://rs.tdwg.org/dwc/terms/stateProvince: mc@stateProvince or mc@collectingRegion http://rs.tdwg.org/dwc/terms/municipality: mc@collectingMunicipality http://rs.tdwg.org/dwc/terms/locality: mc@location http://purl.org/dc/terms/references: HTTP URI of treatment description.txt\nhttp://rs.tdwg.org/dwc/terms/taxonID: treatment UUID + \u0026#34;.taxon\u0026#34;, referencing taxa.txt http://purl.org/dc/terms/type: subSubSection@type http://purl.org/dc/terms/description: subSubSection text http://purl.org/dc/terms/language: blank (except if we have language detection (might be reusable from spell checker)) http://purl.org/dc/terms/source: article citation 
distribution.txt\nhttp://rs.tdwg.org/dwc/terms/locationID: treatment UUID + \u0026#34;.\u0026#34; + location UUID http://rs.tdwg.org/dwc/terms/taxonID: treatment UUID + .taxon, referencing taxa.txt http://rs.tdwg.org/dwc/terms/country: mc@collectinCountry http://rs.tdwg.org/dwc/terms/locality: mc@location http://rs.tdwg.org/dwc/terms/occurrenceStatus: mc@typeStatus media.txt\nhttp://purl.org/dc/terms/identifier: treatment UUID + .text http://rs.tdwg.org/dwc/terms/taxonID: treatment UUID + .taxon, referencing taxa.txt http://purl.org/dc/terms/type: purl.org/dc/dcmitype/Text http://iptc.org/std/Iptc4xmpExt/1.0/xmlns/CVterm: \u0026#34;http://rs.tdwg.org/ontology/voc/SPMInfoItems#GeneralDescription\u0026#34; http://purl.org/dc/terms/format: text/html http://purl.org/dc/terms/title: taxon + author + year http://purl.org/dc/terms/description: treatment HTML http://rs.tdwg.org/dwc/terms/additionalInformationURL: treatment HTTP URI http://ns.adobe.com/xap/1.0/rights/UsageTerms: Public Domain http://purl.org/dc/terms/rights: No known copyright restrictions apply. See Agosti, D., Egloff, W., 2009. Taxonomic information exchange and copyright: the Plazi approach. BMC Research Notes 2009, 2:53 for further explanation. 
http://ns.adobe.com/xap/1.0/rights/Owner: blank http://purl.org/dc/terms/contributor: ((Pensoft|Zootaxa) via )?Plazi http://purl.org/dc/terms/creator: author list, semicolon separated http://purl.org/dc/terms/bibliographicCitation: bibliographic reference string references.txt\nhttp://purl.org/dc/terms/identifier: treatment UUID + .ref for article (treatment) reference, cited treatment ID (from treatmentCitation@httpUri) + .ref for original description reference http://rs.tdwg.org/dwc/terms/taxonID: treatment ID + .taxon, referencing taxa.txt http://eol.org/schema/reference/publicationType: bibRef@type http://eol.org/schema/reference/full_reference: reference text http://eol.org/schema/reference/primaryTitle: bibRef@title http://purl.org/dc/terms/title: bibRef@journal or bibRef@volumeTitle http://purl.org/ontology/bibo/pages: blank http://purl.org/ontology/bibo/pageStart: treatment first page http://purl.org/ontology/bibo/pageEnd: treatment last page http://purl.org/ontology/bibo/journal: bibRef@journal http://purl.org/ontology/bibo/volume: bibRef@part http://purl.org/dc/terms/publisher: bibRef@publisher http://purl.org/ontology/bibo/authorList: bibRef@author, semicolon separated http://purl.org/ontology/bibo/editorList: bibRef@editor, semicolon separated http://purl.org/dc/terms/created: bibRef@year http://purl.org/dc/terms/language: blank http://purl.org/ontology/bibo/uri: bibRef@URL, if available http://purl.org/ontology/bibo/doi: bibRef@DOI, if available vernaculars.txt\nhttp://rs.tdwg.org/dwc/terms/taxonID: treatment UUID + .taxon, referencing taxa.txt http://purl.org/dc/terms/language: en http://rs.tdwg.org/dwc/terms/vernacularName: vernacular name Notes Plazi background documents Download the description as PDF Support and Questions: Please contact our support with any questions Version: 20150223 ","id":72,"permalink":"https://plazi.org/data-apis-tools/treatmentbank-api/","tags":["Treatment Bank","Data, API and Tools"],"title":"TreatmentBank 
API"},{"categories":[""],"contents":"\nAll the data extracted from the taxonomic literature is available for visualization. For that, Plazi offers a tool to create a number of Google Charts. Please read the documentation on using the visualization tools and an overview of the available fields.\n","id":73,"permalink":"https://plazi.org/data-apis-tools/visualization-tools/","tags":["Treatment Bank","Data, API and Tools"],"title":"Visualization Tools"},{"categories":[""],"contents":"All our publications are published under open licenses. If you are unable to locate a copy, please contact us and we will be happy to send it to you.\n2023 Agosti D, Penev L, Ruch P, Benichou L, Ioannidis-Pantopikos A (2023) The Ecosystem of Linked Biodiversity Publications: General Picture of Tools and Services Created by Plazi, Pensoft, MNHN, CETAF, Zenodo, and SIBiLS. Biodiversity Information Science and Standards 7: e110681. https://doi.org/10.3897/biss.7.110681 Benichou L, Agosti D, Egloff W, Hermann E, Kageyama M, Mergen P, Rinaldo C, Buschbom J (2023) Joint statement by CETAF, SPNHC and BHL on DATA within scientific publications: clarification of [non]copyrightability. Research Ideas and Outcomes 9: e115466. https://doi.org/10.3897/rio.9.e115466 Caucheteur D, Pendlington ZM, Roncaglia P, Gobeill J, Mottin L, Matentzoglu N, Agosti D, Osumi-Sutherland D, Parkinson H, Ruch P (2023) COVoc and COVTriage: novel resources to support literature triage. Bioinformatics 39 (1): btac800, doi: ","id":74,"permalink":"https://plazi.org/about/publications/","tags":["Publications"],"title":"Publications"},{"categories":["news"],"contents":"One of the big challenges in biology - how many species are on planet Earth, where do they live and what do they do - is increasingly relevant in the age of the biodiversity crisis. In the digital age this turns awkward, because it becomes obvious that we do not even know what we know about biodiversity. 
Whilst libraries were the undisputed source of knowledge in the analogue age, hard copies on library shelves no longer serve current needs. Neither does the Portable Document Format (PDF), currently the most widespread publishing format and the one required by the Codes of nomenclature. Neither allows machines to access the facts in the estimated 500 million pages of legacy publications. Cataloguing this content takes a huge amount of scientists’ time, for example to build the Catalogue of Life, a goal pursued at the latest since the Rio Earth Summit in 1992. But a catalogue by itself is not enough, because it does not allow exploring the many implicit links, the taxonomic treatments, traits and specimens used to describe new species or to enhance the knowledge of existing ones with additional data.\nIn fact, in the digital age, an expert\u0026rsquo;s identification of a specimen should include a reference to the cited treatments, analogous to depositing a voucher specimen in genomic or ecological studies. Without such evidence, an expert opinion is a cul-de-sac.\nThe missing digital access is even more frustrating because scientists have been building, albeit so far implicitly, a huge citation network: citing specimens, previous treatments of taxa and publications, and using a shared, domain-specific vocabulary, such as Latin binomials for species names or morphological terms, that makes it possible to compare different data sets.\nFinally, the rapid growth of technology that enables unprecedented analyses and visualization and makes the network navigable is widening the gap between the way we operate and what’s possible.\nTo know known biodiversity is a complex endeavour. 
It starts with convincing scientists, publishers and funders that publications are not mere tools for career enhancement, but that their data contributes to building a knowledge graph in which the scientist no longer primarily consumes one publication after the other, but analyses data drawn from many to very many publications. It includes getting access to the literature and converting it into a machine-readable format, which in the most extreme case ranges from scanning the source and text conversion to modeling knowledge and creating and applying domain-specific vocabularies, in order to build new, sustainable infrastructures. It also requires changes in the sciences, from funding new infrastructures to measuring scientific output with alternative metrics based on data beyond the articles per se.\nA combination of highly automated and human-interaction-based tools is needed to control the large amount of converted and annotated data that contributes to building the biodiversity knowledge graph. These controls range from the proper extraction of text streams to complex citations, such as treatment citations, the building blocks of the Catalogue of Life, which include elements such as taxonomic names and bibliographic reference citations that can only be checked with the support of automated processes.\nThe sheer number of facts and links embedded in a single publication, not to speak of the annual production or the backlog waiting in the libraries, can only be processed using automated workflows such as our TreatmentBank service and its interaction with the Biodiversity Literature Repository at Zenodo.\nThanks to sophisticated tools and collaboration with partners, quality control can be delivered. 
For example, thanks to our Synospecies service and its underlying AllegroGraph database, we noticed that several defining treatments appeared to be missing, and further analysis with SPARQL helped us quickly find the reason for the problem.\nTogether, this will help us not only to provide data for reuse in the Biodiversity Literature Repository to unravel and understand known biodiversity, but also to provide data that is fit for use by other services such as the Global Biodiversity Information Facility (GBIF), SIBiLS and LINDAS, and thus to disseminate this data, recognized by the Convention on Biological Diversity as the basis of the Global Taxonomic Impediment hampering the conservation of biodiversity.\n","id":75,"permalink":"https://plazi.org/posts/not-knowing-what-we-know/","tags":["news"],"title":"The digital age and the tragedy of not knowing what we know about biodiversity"},{"categories":[""],"contents":"biowikifarm.net is a shared technical platform supporting a number of MediaWiki installations used by a large number of projects in biological research for open content publishing. The primary purpose of the shared platform is to maintain the published data and information in a sustainable way over the long term, to work more efficiently and to distribute administrative and maintenance work among several partners. 
Furthermore, the biowikifarm operates a shared media repository, enabling synergies in re-using media content.\n","id":76,"permalink":"https://plazi.org/services/biowikifarm/","tags":[""],"title":"Biowikifarm"},{"categories":[""],"contents":" Donat Agosti\nTaxonomic literature, systematics and conservation\nPresident Plazi\nDirector Plazi GmbH\nResearch Associate\nNaturhistorisches Museum, Bern\nBerne, Switzerland\n[email protected] Klemens Böhm\nMember of Board (2008-)\nProfessor\nLehrstuhl für Systeme der Informationsverwaltung\nKarlsruhe Institute of Technology\nKarlsruhe, Germany\n[email protected] Terry Catapano\nLead developer of TaxonX XML schema and TaxPub extension of the NLM/NCBI JATS DTD\nVice-President Plazi\nColumbia University\nNew York, USA\n[email protected] Hong Cui\nNatural language processing, character modeling\nAssociate Professor, Information Technologies\nCollege of Social and Behavioral Sciences\nThe University of Arizona\nTucson, USA\n[email protected] Willi Egloff\nLegal Issues\nDirector Plazi GmbH\nLawyer and Partner\nAdvocomplex, Bern, Switzerland\n[email protected] Brian M. Fisher\nMember of Board (2008-)\nCurator\nEntomology\nCalifornia Academy of Sciences\nSan Francisco, CA, USA\n[email protected] Rob Guralnick\nEcology, modeling\nAssociate Professor\nEcology and Evolutionary Biology\nUniversity of Colorado\nBoulder, Colorado, USA\n[email protected] Gregor Hagedorn\nDirector, Biowikifarm\nHead of Digital World\nMuseum für Naturkunde\nLeibniz-Institut für Evolutions- und Biodiversitätsforschung\nBerlin, Germany\n[email protected] Hubert Höfer\nCurator\nInvertebrates\nState Museum of Natural History\nKarlsruhe, Germany\n[email protected] Puneet Kishor\nOpen Science and Data Policy Advocate\n[email protected] Christiana Klingenberg\nTaxonomist, designer of the markup process and manual, Quality Control\nMember of Board (2008-)\nKarlsruhe, Germany\n[email protected] Norman F. 
Johnson\nMember of Board (2008-)\nProfessor\nMuseum of Biological Diversity\nOhio State University\nColumbus, OH, USA\n[email protected] Daniel Mietchen\nWikipedian\nResearcher at Museum für Naturkunde Berlin Jena, Germany [email protected] Jeremy Miller\nTaxonomist and bioinformatician\nWorkflow design\nNaturalis, Leiden NL\n[email protected] Rod Page\nMember of Board\nProfessor\nDivision of Ecology and Evolutionary Biology\nFaculty of Biomedical and Life Sciences\nUniversity of Glasgow\nGlasgow G12 8QQ, United Kingdom\n[email protected] David Patterson\nInterests in a names-based cyberinfrastructure for the management of digital biodiversity data, the Global Names project, legal issues, and taxonomy Emeritus Professor\nUniversity of Sydney\nSydney, Australia\n[email protected] Lyubo Penev\nProfessor\nBulgarian Academy of Sciences, Sofia\nManaging Director, Pensoft Ltd\nSofia, Bulgaria\n[email protected] Rich Pyle\nMember of Board\nAssociate Zoologist\nHawaii Biological Survey\nBernice Pauahi Bishop Museum\nHonolulu, Hawai'i, USA\n[email protected] Guido Sautter\nDeveloper of the GoldenGATE editor, SRS, TreatmentBank, and Refbank\nKarlsruhe, Germany\n[email protected] Diego Janisch Alvares\nData Quality Analyst, Template Creator\nPorto Alegre, Brazil\n[email protected] Felipe Simoes\nProject Analyst, Template Creator\nPorto Alegre, Brazil\n[email protected] Jonas Castro\nData Quality Analyst, Learning Materials\nPorto Alegre, Brazil\n[email protected] Julia Giora\nData Quality Analyst, Learning Materials\nPorto Alegre, Brazil\n[email protected] Juliana Wingert\nData Quality Analyst\nViamão, Brazil\n[email protected] Tatiana Ruschel\nData Quality Analyst, Learning Materials\nLangenfeld, Germany\n[email protected] Valdenar Gonçalves\nData Quality Analyst\nPorto Alegre, Brazil\n[email protected] ","id":77,"permalink":"https://plazi.org/about/members/","tags":["About","Members"],"title":"Plazi Members"},{"categories":[""],"contents":"Data Visualization Tools All the data extracted 
from the taxonomic literature is available for visualization. For that, Plazi offers a tool to create a number of Google Charts. Please read the documentation on using the visualization tools and an overview of the available fields.\n","id":78,"permalink":"https://plazi.org/treatmentbank/visualization-tools/","tags":["Treatment Bank","Data Visualization"],"title":"Visualization Tools"},{"categories":["news"],"contents":"Specimens are a critical basis for taxonomic research. They are cited in taxonomic treatments and other works. Increasingly, these material citations are extracted from publications, submitted to GBIF as part of data sets, and reused in studies.\nAs of today, GBIF includes 33,393 datasets derived from taxonomic publications and 415,858 material citations, which are labeled as occurrences. 335 publications reuse this data. An estimated 45,000 species are represented in GBIF only through material citations from publications, mainly new species that are extracted from the literature just after publication. This shortens the time needed to enter taxonomic names and re-submit datasets to GBIF.\nHowever, the currently used term occurrence leads to confusion and discussions which need to be resolved.\nSemantically, material citations are not specimens per se, but the citation of a specimen. Furthermore, material citations can be part of a specimen, a specimen, or groups of specimens. 
They can be very verbose or very cursory, a transcription of the original label data or its interpretation.\nFor this reason, Plazi submitted a new term \u0026ldquo;materialCitation\u0026rdquo; for discussion and eventual inclusion in the TDWG Darwin Core standard.\nLinks: doi.org/10.30848/PJB2021-5(1) www.gbif.org/occurrence/3070917303\n","id":79,"permalink":"https://plazi.org/posts/materialcitation-submitted-as-a-new-term/","tags":["news"],"title":"materialCitation as a new class submitted to TDWG Darwin Core standard"},{"categories":[""],"contents":"Over the next years, Plazi’s goal is to link 1 million taxonomic name usages with their corresponding treatments. This will bridge the gap between the general use of scientific names and the underlying taxonomic literature. It will provide a direct, citable link to the section in scientific articles that includes the treatment and the implicit and explicit links to external resources.\nThe treatments are a unique source for objects (i.e. scientific names) in NCBI, taxa in Wikidata or GBIF.\nIt is the only and most direct access to understand what a particular author had in mind when describing a new species or referring to an existing one. It is also the section in the taxonomic literature where nomenclatural changes occur and thus the source for building the Catalogue of Life.\nSome of the treatments listed in the TreatmentBank are stubs, that is, they are not yet populated with the respective content. 
You can help to get the content by uploading the original article (if not digitized, online accessible and provided with a Digital Object Identifier, DOI).\nThe Blue List Bouchout Declaration Uses of Data ","id":80,"permalink":"https://plazi.org/services/legal-issues-menu/","tags":["Legal Issues"],"title":"Legal Issues"},{"categories":[""],"contents":"Zenodo Zenodo, Plazi and Pensoft collaborate in developing the Biodiversity Literature Repository as an open access repository for articles and liberated open FAIR data, including figures and taxonomic treatments related to biodiversity. Customized access is provided to all its content via the BLR website and to images via Ocellus.\nread more\nGBIF Plazi joined GBIF in 2001 and has been an associate participant of GBIF since 2015. Plazi provides occurrences liberated and extracted from the scientific literature, especially from treatments. The assignment of persistent identifiers to treatments makes it possible to cite the source of the occurrence taxon.\nread more\nSwiss Institute of Bioinformatics Plazi is a taxonomic treatment data provider for the Swiss Institute of Bioinformatics Literature Services (SIBiLS), Switzerland.\nread more\nBiodiversity Heritage Library The Biodiversity Heritage Library and Plazi collaborate to turn data imprisoned in legacy literature into digitally accessible knowledge, and to import it into GBIF.\nFactsmission Factsmission collaborates with Plazi to build the knowledge graph Synospecies\nFranz Inc. Franz Inc. supports Plazi with access to AllegroGraph, the database underlying our triple store Synospecies\nNCBI Plazi provides a target for NCBI taxonomic linkout. Access to the published data referred to by the use of an organism name is unique.\nread more\nNLM Journal Archiving and Interchange Tag Suite Together with the Journal Archiving and Interchange Tag Suite team, Plazi and Pensoft developed and maintain TaxPub, a biodiversity domain specific flavor of JATS, now used in taxonomic publications. 
Besides semantically enhancing the articles, this makes it possible to import the articles into PubMed Central and, at the same time, extract content directly into TreatmentBank.\nread more\nPensoft Pensoft and Plazi have a longstanding collaboration in contributing to the next generation of open access, semantically enhanced scientific communication and infrastructure.\nread more\neCH Plazi is a member of eCH, a Swiss association to develop and promote standards in the area of e-Government. Specifically, Plazi is engaged in providing scientific taxonomic names, their synonyms and the data behind them as glue to link species-specific data from government agencies, industry, the conservation sector and citizens.\nWikidata Plazi initiated access in Wikidata to treatments and the data included therein, and through this potentially provides access to all the treatments held in TreatmentBank.\nread more\nZoobank Zoobank and Plazi cooperate in linking nomenclatural acts, especially new taxonomic names, with their treatments, and in using the same UUIDs for the same concept.\nread more\nCETAF Plazi is a member of the CETAF\u0026rsquo;s e-Publishing Working Group.\nDatacamp Datacamp provides Plazi with free data education to improve the skills of our Plazi members.\nOthers Biodiversity Information Standards (TDWG) Force11 RDA/CODATA legal interoperability IG Biodiversity Data Integration IG ClickUp supports Plazi with access to its workplace management tool. ","id":81,"permalink":"https://plazi.org/about/partners/","tags":["About","Partners"],"title":"Partners"},{"categories":[""],"contents":"\nThe purpose of the Bouchout Declaration is to help make digital data about our biodiversity openly available. 
It offers members of the biodiversity community a way to demonstrate their commitment to open science.\nAs signatories, we encourage an overarching approach to Open Biodiversity Knowledge Management which is based on the following fundamental principles:\nThe free and open use of digital resources about biodiversity and associated access services; Licenses or waivers that grant or allow all users a free, irrevocable, world-wide, right to copy, use, distribute, transmit and display the work publicly as well as to build on the work and to make derivative works, subject to proper attribution consistent with community practices, while recognizing that providers may develop commercial products with more restrictive licensing. Policy developments that will foster free and open access to biodiversity data; Tracking the use of identifiers in links and citations to ensure that sources and suppliers of data are assigned credit for their contributions; An agreed infrastructure, standards and protocols to improve access to and use of open data; Registers for content and services to allow discovery, access and use of open data; Persistent identifiers for data objects and physical objects such as specimens, images and taxonomic treatments with standard mechanisms to take users directly to content and data; Linking data using agreed vocabularies, both within and beyond biodiversity, that enable participation in the Linked Open Data Cloud; Dialogue to refine the concept, priorities and technical requirements of Open Biodiversity Knowledge Management; A sustainable Open Biodiversity Knowledge Management that is attentive to scientific, sociological, legal, and financial aspects. 
You can read more about the Bouchout Declaration here, or send us an email to [email protected].\n","id":82,"permalink":"https://plazi.org/services/bouchout-declaration/","tags":["Legal Issues"],"title":"Bouchout Declaration"},{"categories":[""],"contents":" Current Projects Arcadia 2022-2025 The grant will help build on the momentum created in the first Arcadia supported project (years 2018-2021) by: (1) leading the data extraction effort and building a critical mass of FAIR scientific data and related tools involving the community; and (2) establishing a long lasting, self-sustaining research infrastructure.\nBiCIKL (EU) Biodiversity Community Integrated Knowledge Library (BiCIKL) is building the Biodiversity Knowledge Hub (BKH). more\nNIH-R21 Viral Spillover (NIH) The goal of this proposed project is to harness the collective power of bioinformatics, artificial intelligence (AI), and ecology to transform our understanding of taxon-specific zoonotic risk. more\neBioDiv (Swissuniversities) eBioDiv will provide a service for Swiss biodiversity scientists to access and disseminate their research data about species in legacy and prospective publications, and provide access to data about their collections, scientists and specimens. 
more\nMétoTaxa (Fondation Nationale de Science Ouverte, France) The project submitted here aims to bring the various partners together in order to adapt the Métopes production chain (partner 1) to the specific needs of the taxonomic discipline.\nPast Projects ICEDIG 2018-2020 ICEDIG provided a blueprint to build DiSSCo more\nArcadia 2018-2021 Arcadia supported a three-year project to liberate 300,000 taxonomic treatments, build together with Zenodo and Pensoft the Biodiversity Literature Repository (BLR) at Zenodo/CERN, and develop within TreatmentBank tools and services to liberate data and make them accessible.\nEU BON EU BON seeks ways to better integrate biodiversity information and implement it into policy and decision-making of biodiversity monitoring and management in the EU.\nLinkD LinkD (pronounced 'linked') will unite European biodiversity scientists behind a common set of infrastructures and goals leveraging existing investment in biodiversity science infrastructures in support of the long-term community inspired vision of modelling the biosphere.\npro-iBiosphere The aim of pro-iBiosphere is to prepare, through a coordination action, the ground for an integrative system for intelligent management of biodiversity knowledge.\nWiki4R Wiki4R will create an innovative virtual research environment (VRE) for Open Science at scale, engaging both professional researchers and citizen data scientists in new and potentially transformative forms of collaboration.\n","id":83,"permalink":"https://plazi.org/about/projects/","tags":["Projects"],"title":"Ongoing and past projects"},{"categories":["news"],"contents":"\nPlazi’s goal is to discover known biodiversity, and make it widely available using the tools available in the digital age. 
This means liberating data hidden in libraries and, more recently and increasingly, data hidden in the PDF prison, behind paywalls and in unstructured text.\nThe grand challenge is an estimated 500 million published pages of scholarly biodiversity related publications, and 17,000 species newly discovered and described each year, with a multiple of annotations on already known species.\nThis clearly cannot be done by an individual institution, despite the rapidly developing technology. It needs collaboration and a strategy. With its rapidly growing corpus of liberated taxonomic treatments and technology, Plazi hopes to inspire others to join this endeavour. While the greatest interest and richness in the data lies in the details - who collected when where what species - this can only be achieved if its context is liberated, that is, accessible as FAIR taxonomic treatments.\nOur current strategy is to make one million taxonomic treatments openly accessible, including taxonomic names, and by leveraging treatment citations, build the catalogue of life with each name linked to the scientific argument and data. We are well on our way.\nTo tackle this, tools and infrastructures are developed, if necessary. They allow converting PDF documents into a text and multimedia stream that can be further processed by adding tags enhancing single words to entire sections with a meaning (semantic enhancement) so humans and machines can understand its content. This is quite a technical challenge.\nMaterials citations attract great attention. They reference specimens in various formats used in the research process leading to the research results - in the biodiversity world - the published taxonomic treatments. However, they are collateral in the current production. We try to discover them and make them accessible via the Global Biodiversity Information Facility (GBIF) as occurrences. 
We routinely check whether holotype citations are properly extracted, and let the others pass, unless dedicated resources are available.\nWe believe that, despite this raw form, we provide a service by drawing attention to specimens in collections and species that are not recorded in GBIF, and that this will eventually lead to better data. It will raise the awareness of authors and publishers of the value of publishing specimen data in a standardized way, as proposed by EJT and Pensoft.\nWe also believe that together with your input, we can make the material citations fitter for more uses.\nThe launch of the new feedback button in GBIF reflects the commitment by GBIF and Plazi to care about the data. It allows a user to send a request for a review of specific data to Plazi. Its care team will respond in a very short time and report the result once the issue has been resolved, including the update of the record in GBIF. At the same time, these requests will be collected to analyse the reason for the issues and develop respective measures to mitigate them. With users’ input we can improve our production to provide fitter data.\nLiberating data is a great challenge, from building technologies and running operations to funding, on the one hand; on the other, it hopefully contributes to a new way for scientists to publish their research results.\n","id":84,"permalink":"https://plazi.org/posts/data-issue-feedback-loop/","tags":["news"],"title":"A new GBIF Plazi data issue feedback loop"},{"categories":[""],"contents":"An impediment to open sharing of biological content is uncertainty as to whether and how intellectual property rights apply to biodiversity information. To clarify the situation, and in collaboration with the Global Names project, Plazi organized a workshop in Tempe, Arizona in April 2013 in which we brought together providers and users of taxonomic information, data managers, and intellectual property rights lawyers from Europe and the USA. 
The perspectives of interested parties were submitted via a SNARL (Scientific Names Attributes, Rights and Licensing) wiki. The outcomes of the workshop were published by Patterson, D. J., Egloff, W., Agosti, D., Eades, D., Franz, N., Hagedorn, G., Rees, J. A. and Remsen, D. P. in 2014 (doi: 10.1186/1756-0500-7-79). A legal analysis of copyright and scientific images has been added in 2017 and those elements have been added in the list below, marked by * (doi: 10.3897/rio.3.e12502).\nCopyright is not applicable to facts or those elements that are normally included in taxonomic sources. The \u0026lsquo;blue list\u0026rsquo; identifies those elements of scientific publications, databases, monographs, classifications, checklists etc. to which copyright does not apply, and that can be re-used without permission. Permission will be required if a data-use agreement is in place and agreed to by both parties; and all users are reminded that it is appropriate to inform the sources of any re-use and to provide appropriate credit to sources.\nA hierarchical organization (classification), in which, as examples, species are nested in genera, genera in families, families in orders, and so on. Alphabetical, chronological, phylogenetic, palaeontological, geographical, ecological, host-based, or feature-based (e.g. life-form) ordering of taxa. Scientific names of genera or other uninomial taxa, species epithets of species names, binomial combinations as species names, or names of infraspecific taxa; with or without the author of the name and the date when it was first introduced. An analysis and/or reasoning as to the nomenclatural and taxonomic status of the name is a familiar component of a treatment. Information about the etymology of the name; statements as to the correct, alternate or erroneous spellings; reference or citation to the literature where the name was introduced or changed. Rank, composition and/or apomorphy of taxon. 
For species and subordinate taxa that have been placed in different genera, the author (with or without date) of the basionym of the name or the author (with or without date) of the combination or replacement name. Lists of synonyms and/or chresonyms or concepts, including analyses and/or reasoning as to the status or validity of each. Citations of publications that include taxonomic and nomenclatural acts, including typifications. Reference to the type species of a genus or to other type taxa. References to type material, including current or previous location of type material, collection name or abbreviation thereof, specimen codes, and status of type. Data about materials examined. References to image(s) or other media with information about the taxon. Information on overall distribution and ecology, perhaps with a map. Known uses, common names, and conservation status (including Red List status recommendation). Description and / or circumscription of the taxon (features or traits together with the applicable values), diagnostic characters of taxon, possibly with the means (such as a key) by which the taxon can be distinguished from relatives. General information including but not limited to: taxonomic history, morphology and anatomy, reproductive biology, ecology and habitat, biogeography, conservation status, systematic position and phylogenetic relationships of and within the taxon, and references to relevant literature. 
Photographs (or other image or series of images) by a person or persons using a recording device such as a scanner or camera, whether or not associated with light- or electron-microscopes, using X-rays, acoustics, tomography, electromagnetic resonance or other electromagnetic sources, of whole organisms, groups, colonies, life stages especially from dorsal, lateral, anterior, posterior, apical or other widely used perspectives and designed to show overall aspect of organism.* Photographs (or other image or series of images) by a person or persons using a recording device such as a camera associated with light- or electron-microscopes, using X-rays, acoustics, tomography, electromagnetic resonance images or other electromagnetic sources) of parts of organisms such as but not limited to appendages, mouthparts, anatomical features, ultrastructural features, flowers, fruiting bodies, foliage, intra-organismic and inter-organismic connections, of compounds and analyses of compounds extracted from organisms that demonstrate the characteristics of an individual or taxon and/or allow comparison among individuals/taxa.* Photographs (or other image or series of images) of whole organisms, groups, colonies, life stages, parts of organisms made by camera or scanner or comparable devices using automated procedures.* Drawings of organisms or parts of organisms made by a person or persons to demonstrate the characteristics of an individual/taxon or to allow comparisons among taxa.* Graphical/diagrammatic representation (such as, but not limited to, scatter plots with or without trend lines, histograms, or pie charts) of quantifiable features of one or more individuals or taxa for the purposes of showing the characteristics of or allowing comparison of individuals or taxa and made by a person or persons.* ","id":85,"permalink":"https://plazi.org/services/the-blue-list/","tags":["Legal Issues","Blue List"],"title":"The Blue List"},{"categories":[""],"contents":"Lectures by Plazi 
team members are available at BLR and SlideShare.\n","id":86,"permalink":"https://plazi.org/about/lectures/","tags":["About","Lectures"],"title":"Lectures"},{"categories":[""],"contents":"Publications on-going Publications using GBIF datasets including Plazi mediated datasets\n2017 Hugo W, Hobern D, Kõljalg U, Ó Tuama É, Saarenma H 2017. Global Infrastructures for Biodiversity Data and Services. In: Walters M and Scholes RJ (eds.), The GEO Handbook on Biodiversity Observation Networks. doi: 10.1007/978-3-319-27288-7_11\n2016 Alonso LA, Agosti D 2016. Ants. In: Larsen TH (ed.) 2016. Core Standardized Methods for Rapid Biological Field Assessment. Conservation International, Arlington, VA. pp. doi: 10.5281/zenodo.50824\nFaulwetter S, Pafilis E, Fanini L, Bailly N, Agosti D, Arvantidis C, Boicenco L, Catapano T, Claus S, Dekeyzer S, Georgiev T, Legaki A, Mavraki D, Oulas A, Papastefanou G, Penev L, Sautter G, Schigel, Senderov V, Teaca A, Tsompanou M (2016) EMODnet Workshop on mechanisms and guidelines to mobilise historical data into biogeographic databases. Research Ideas and Outcomes 2: e9774. doi: 10.3897/rio.2.e9774\n2015 Dikow T, Agosti D. 2015. Utilizing online resources for taxonomy: a cybercatalog of Afrotropical apiocerid flies (Insecta: Diptera: Apioceridae). Biodiversity Data Journal 3: e5707 (06 Oct 2015) doi: 10.3897/BDJ.3.e5707\nMiller JA, Agosti D, Penev L, Sautter G, Georgiev T, Catapano T, Patterson D, King D, Pereira S, Vos RA, Sierra S 2015. Integrating and visualizing primary data from prospective and legacy taxonomic literature. Biodiversity Data Journal 3: e5063 (12 May 2015) doi: 10.3897/BDJ.3.e5063\nGroom Q. 2015. Piecing together the biogeographic history of Chenopodium vulvaria L. using botanical literature and collections. 
PeerJ January 8, 2015 doi: 10.7717/peerj.723\n","id":87,"permalink":"https://plazi.org/services/uses-of-data/","tags":["Legal Issues"],"title":"Uses of Data"},{"categories":[""],"contents":"\nPlacidus a Spescha was not only one of the first alpinists and naturalists but was also deeply committed to the ideals of the Enlightenment. He often ran afoul of his superiors who felt he neglected his priestly duties in favor of his scientific pursuits. Worse yet, he further upset the status quo through his sympathy and support of the French in the follow-up of the wars spurred by the French Revolution which brought invading French and Austrian troops into his home, Disentis, in the Vorderrhein Tal in Switzerland. A substantial part of his notes were lost when his cloister burned down, or perhaps because it had been part of a contribution to the French during the war, supposedly finding its way ultimately to the Muséum Nationale d’Histoire Naturelle in Paris.\nPlazi came to life at a meeting of our Digital Library project in Karlsruhe on 25 March, 2007. Until then our activities had been channeled through the American Museum of Natural History’s server which, due to administrative restrictions, increasingly proved to be a barrier to future development.\nAnother major hurdle we faced was the issue of copyright. Copyright is one of the biggest impediments for charting and monitoring global biodiversity. The well over 500 million pages of scientific publications-to which every year are added descriptions of more than 20,000 new species–contain a vast amount of knowledge. Were this content at our fingertips or better a mouse-click away, we would work with one of the richest scientific resources. But copyright prevents this. 
Moreover, misunderstanding (or overly prudent interpretation) of copyright in our scientific domain, where publishing aims to disseminate new research to the widest possible audience, prohibits the introduction and use of this huge body of knowledge into the digital realm.\nSwitzerland, the home of Placidus Spescha, has, like each country, its own copyright law based on the concept of the “werk” (work). From a legal point of view, we assume that descriptions, and most likely the entire scientific publication, do not qualify as such a work, since they are not sufficiently original. Taxonomic description follows strict rules set by the International Codes of Nomenclature on which content is required for a valid description, as well as other standards developed in the particular domains of zoology, botany, mycology or virology. Because these codes mandate that descriptions have to be published, they have the quality of a quasi-legal document. Additionally, by their very nature, descriptions of species are presentations of data repeated for the descriptions and re-descriptions of millions of new species in journals that in most cases follow peer-review.\nPlazi provides access to the content of taxonomic literature in a variety of formats. Whenever possible, a PDF version of the original publication is made available on the Biodiversity Literature Repository hosted on CERN’s Zenodo. An increasing number of publications also have been encoded in the TaxonX XML schema which identifies and delineates the significant “atoms” of information that comprise taxonomic descriptions in order to facilitate retrieval, analysis, and integration with other e-science resources. During the mark up process, the documents are enhanced with links to name servers, digital bibliographies, or specimen databases. 
Our TreatmentBank provides a search interface to the marked up descriptions allowing for mining a rich source of taxonomic data as well as visualization of the content.\nFollowing Placidus Spescha’s tradition, our ideas, methods, and data will be shared and discussed at meetings, through TreatmentBank, in newspaper articles and any other avenue which might enlighten our fellow citizens.\n","id":88,"permalink":"https://plazi.org/about/the-story-behind-plazi/","tags":["About"],"title":"The Story behind Plazi"},{"categories":["news"],"contents":"\nManaging collaborations involving multiple locations, time zones and complementing operations is challenging. As an example, Plazi liberates data from scholarly publications by making it, in collaboration with Zenodo at CERN, findable, citable and reusable for anybody, anywhere, at any time. This process happens in multiple locations in Europe, Brazil and North America. The increasingly automated processes are based on agile programming. The maintenance and improvement of high quality data requires an additional amount of time-critical interactions.\nTo operate efficiently, Plazi explored management systems that allow assigning tasks and visualizing progress and, at the same time, provide means for communication between two or more users.\nAfter testing multiple systems, ClickUp was selected as the best performing system. 
\u0026ldquo;From the initial two weeks of usage, we have already noticed less time chatting about tasks, a better and more effective Q\u0026amp;A system (for our use, questions are posted as tasks), and better productivity reports using their brand new dashboard systems\u0026rdquo; summarizes Marco Guidoti, Plazi\u0026rsquo;s head of its Brazilian office.\nPlazi is very grateful that ClickUp decided to partner and provide our team with access to their powerful task management system.\nLinks: clickup.com\n","id":89,"permalink":"https://plazi.org/posts/clickup-supporting-plazi/","tags":["news"],"title":"ClickUp supporting Plazi with task management system "},{"categories":["news"],"contents":"\nThree months ago we started with Pensoft an effort (see also press release) to contribute to a better understanding of the SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) aka corona virus that led to the global COVID-19 pandemic, and in the followup joined the CETAF-Covid-19 task force. With the end of the three months long taskforce we present here our view. A multimedia presentation by the knowledge base task force sub group has been streamed on July 17.\nAccess to data in bat related biodiversity publications\nWe started our effort under the assumptions that taxonomic knowledge of bats, the main known source of zoonotic virus spill-overs to humans, is reasonably well documented. Liberating treatments and figures from digital copies of bat literature was thus our initial aim. The situation, however, turned out to be rather different.\nThe taxonomy of bats is, in fact, not up-to-date on the main taxonomic name services. The Catalogue of Life was missing many taxa and synonyms. Batnames.org, the community resource for bat research, had an online list, but to get the entire list, including synonyms, required contacting the authors first. 
Mammal Species of the World had an online list of bat species, including subspecies that could be downloaded as a MS-Excel spreadsheet, an excerpt of the published catalogue, and eventually the names would link to the PDF including the cited references. The most recent catalogue, the volume 9, Chiroptera in the Handbook of the Mammals of the World was published in 2019 as a book only priced at EUR 160, came with a CD-ROM of the cited literature. One section covered biology, another taxonomy. All these resources were short of changes of status of names, and publications augmenting the knowledge about a taxon.\nAccess to literature was only to some extent available, often only indirectly by the combination of taxon authority and year (e.g. Rhinolophus sinicus K. Andersen, 1905), at best as a bibliographic reference hardly ever complemented with a digital object identifier (DOI) or a link to a digital copy of the referenced work.\nWith this barrier, access to biotic interactions, behavioral traits such as co-roosting as source for lateral transmission of viruses, predator-prey relationships, a main use case by our task force is seriously limited, invoking a major effort to advance. Access to literature was furthermore complicated by the confinement which led to a separation of the scientists from their resources, such as collections of PDFs or hard copies.\nThe first conclusion thus is that at the launch of the task force work, much more time for preparation to data liberation, such as creating a catalogue of bats with extended bibliographic references as starting point to finding the original publications, and for that finding and contacting the appropriate scientists was needed. 
Even for groups with well curated catalogues, the step from a referenced bibliographic record to having all the digital copies at hand comes at a great cost.\nThe CETAF task force as an ad hoc group\nThe formation of the CETAF Covid-19 task force, which we joined early on, has been a very unusual experience. The very wide expertise in the group, the curiosity to find out what the other members do, and sharing of resources led to a very constructive team, as part of an equally inspiring task force membership. The expertise included taxonomy, ecology, biodiversity informatics, and skills from liberating data, cataloguing, building repositories, reuse of data in GBIF and GloBI to scientific analyses.\nAspects of the virus spill-over have been the guiding theme. Discussing and implementing the input workflow to answer respective research questions, being confronted with all the pitfalls, inspired and helped the creation of prospective publishing tools by Pensoft, GloBI and Plazi. The aim was to publish biotic interactions in a format that can be reused by machine without any further semantic conversion. It also inspired the insertion of biotic interactions specific metadata at Zenodo. This allows annotating relevant publications not only with the host and virus names mentioned in the article, but also with the specific interactions needed for further analyses.\nThe second conclusion is that the circumstances led to an ad hoc team of scientists that openly shared data, expertise and time that exceeded what a research team normally has. This setup substantially advanced the knowledge of the biology of the bat virus interactions, with implications well beyond the theme of the task force itself.\nThe role of taxonomic names\nTaxonomic names are one way to organize the knowledge of biodiversity. Over time and with increasing knowledge, the names associated with taxon concepts can change, covering more or less inclusive concepts. These are based on specimens, a.k.a. 
vouchers, which would not necessarily change their names in due course, especially in the databases. Thus there is a substantial amount of data available which will not be discovered with the names in use in 2020 because the links to previous name usages are not available, or because specimens have been misidentified.\nThe taxonomic literature has a convention to cite previous taxonomic treatments and type the relationship, making it explicit when a name changes, and explaining at the same time why. These relationships can be tagged and are at the base of Synospecies, a tool based on facts (RDF triples) in a triple store provided by Franz. This service can provide the current status of a taxonomic name and all its alternatives, for example for the main bat focus species in Covid-19, Rhinolophus sinicus Andersen, 1905. Having the link to the treatment allows access to the data a scientist used to propose, augment or deprecate a taxon.\nFor bats, 1412 treatments covering 1072 taxonomic names, 505 treatment citations, and 63 new species from 95 publications are now available on Plazi. This also makes it possible to link cited specimens in a treatment to the actual specimen and potentially to link a specimen to the treatments where it has been used.\nThe third conclusion is that we have new, complementary tools to mine and visualize data in the biodiversity literature, such as treatments and treatment citations as baseline for the catalogue of life and at the same time provide access to all the data about the taxa.\nAccess to and data extraction\nOur tool and service to mine biodiversity literature is the Plazi data liberation workflow. Its first step is preparing the respective literature for machine conversion. Two bat specific resources, batnames.org and Mammal Species of the World, were accessible online, whereas all the other previous catalogues only exist in printed format. 
As a consequence, we purchased printed copies of the catalogues, cut the spine of the books off, scanned and OCRed them, and deposited digital copies in the Biodiversity Literature Repository and the Corona Host Virus community at Zenodo (e.g. Honacki et al. 1982, Corbet \u0026amp; Hill, 1991; Burgin, 2019). The Dutch digitization company Picturae supported the work by scanning our purchased bat catalogues for us.\nAnother source has been BioOne and Acta Chiropterologica which agreed to collaborate to explore and provide access to data in their bat specific scientific journal. The conversion of 44 articles liberated 552 treatments, 401 figures, 26 new species (Summary; Detail).\nA third source has been a pre-processed copy of Linnaeus’ Systema Naturae from 1758, the starting point of modern taxonomy, supplied by Richard Pyle from Zoobank. This converted copy originated from Dave Remsen who converted the original copy into text which then got refined by Richard Pyle (TreatmentBank; GBIF).\nA fourth source has been Gabor Csorba, who provided an extensive collection of PDFs covering the target groups of Rhinolophidae and Hipposideridae. These files will now be made available on the Biodiversity Literature Repository and Coronavirus Host communities at Zenodo, and chained up to be mined.\nThe processed publications (see stats, list) are available on Plazi’s TreatmentBank and the Biodiversity Literature Repository. They are accessible through Plazi stats, or through more specific tools like Ocellus for images and Synospecies for names respectively. They are also available at GBIF and can easily be found by querying the DOI in the search box.\nNon-taxonomic literature including data on virus host relationship has been collected starting with the most recent review and research publications and following the cited publications. 
They have been uploaded to a staging area in Zotero and manually attributed with the host, virus and host-virus relationship and finally uploaded to BLR and the covhiho community at Zenodo. In the 160 articles 150 hosts, 247 viruses and 1,146 biotic interactions have been extracted.\nThe fourth conclusion is that even for such prominent taxa as bats are, taxonomic and biodiversity data are not readily accessible in a digital format. This access is very time-consuming to provide, and thus are largely absent in research. Even finding the publications requires quite some domain knowledge.\nAll the taxonomic treatments liberated are ultimately based on specimens, as physical specimens and more recently as genes or observations. This relationship can be very loose (Linnaeus refers to his collection) to be very specific by providing a list of materials citation including specimen codes to ultimately persistent identifiers. However, the infrastructure to find and obtain the respective identifiers, to a large extent, does not exist yet.\nThe seventh conclusion is that there is a gap between biodiversity research results and the raw data, that is the specimens. This is detrimental to biodiversity research because a lot of data and knowledge is omitted in situations like the Covid-19 pandemic or understanding future spill-overs.\nData reuse and research data life cycle\nAll the processed data are reused by GBIF as data sets. Through this, as a beginning, the chapters on Rhinolophidae and Hipposideridae of the most recent catalogue of bats, the Handbook of the Mammals of the World Vol. 9, as well as the very first, Linnaeus’ Systema Naturae 1758 with all the treatments are available in their taxonomic backbone.\nTogether with GloBI and Quentin Groom\u0026rsquo;s indexing, this data added to the 17,123 taxonomic names and 85,493 interactions to GloBI. 
GloBI indexed 138 of the publications on Zenodo.\nThe extracted bat names from this corpus of literature are being reused in the CoVoc, a vocabulary to mine literature in PubMed, adding especially terms regarding host and Corona virus relationships. To widen PubMed, a small project has been awarded to the Swiss Institute for Bioinformatics (Patrick Ruch Swiss Institute of Bioinformatics, including Plazi) to include taxonomic treatments as an additional publication type for text and data mining. Together this will allow discovering more published host virus relationships over a wider corpus of publications.\nTogether with Pensoft and GloBI, a standard template for appendices in Pensoft publications has been developed and implemented that allows direct harvesting and conversion of linked FAIR data, such as specimen codes, accession number of biotic interactions. An exemplar application is the publication by Patterson et al., 2020.\nInfrastructure development\nMetadata in publication deposits makes it possible to find target publications more efficiently. Together with Jorrit Poelen (GloBI) and Alex Ioannids from Zenodo, custom metadata for host, virus and their specific relationships reported in the respective publication have been added to Zenodo, see e.g. Lau et al. The metadata is a subset of and references the Molecular Interactions Controlled Vocabulary. This allows GloBI - and in fact anybody - to index this corpus of literature and reuse it. This is the second set of custom metadata implemented in Zenodo for biodiversity deposits, the first covering geo-coordinate pairs for observations listed in taxonomic treatment deposits. This development is using Zenodo’s potential as a general repository to allow customization for specific communities.\nBuilding increasingly complex and larger corpora needs a high degree of automation, which is possible via Zenodo’s API. 
Together with Zenodo, their infrastructure has been extended to deal with custom metadata, in this case for the biotic interactions.\nThe future of publishing\nAs the above shows, published knowledge locked into opaque PDF documents is not helpful, so let’s turn publications into structured, searchable knowledge graphs. To respond quickly to new challenges, publications have to be based on the open access FAIR principles, implicit links to cited material have to be made explicit, and the data therein has to be usable by machines.\nHaving the chance to work with publishers in the task force provides the opportunity to implement obvious changes to the existing publication system. The most obvious is to cut down, as much as possible, parts of or the entire Plazi workflow, from finding to processing publications, that is needed to liberate data. Since there are hundreds of millions of pages to process - including new pages every day - the only way forward is to use semantic publishing. A big advancement in this task force has been the development of publications that include data that can immediately be harvested and reused. The proof is the publishing of biotic interactions and their indexing by GloBI (see publication, reused interaction).\nWhat’s after the CETAF Covid-19 task force?\nThe sheer magnitude of the unavailability of published research data makes for a daunting task, coupled with very limited funding for converting legacy publications into data. This data is, nota bene, the foundation for improving the understanding of the dynamics of virus spill-over.\nEven within the task force group, there is insufficient awareness of the big data - tens of billions of facts - hidden in unstructured publications and of the fledgling tools to extract, store, explore, cite and reuse this data. Research projects as proposed and started in the task force on virus spill-over will be the best communication platform. 
We will spend additional effort to communicate and build the needed awareness.\nWe will continue to explore funding possibilities for research projects formulated within the task force that integrate research questions and legacy literature.\nWe will make use of the momentum and extract all the treatments in the main mammal taxonomic catalogues to give the community access to the mammal classifications that were the standard at the given time, and which are often the reference for name usage in ecological studies or for the identification of specimens in collections that end up in GBIF. We believe that this might serve as a very good example of the value of having entire catalogues and their treatments available. It will also show the difference between having a name and being able to know what’s behind it.\nThese data liberation operations will be maintained thanks to the support of Arcadia. Their unique, generous support is instrumental in changing the way taxonomic data is shared and accessed and in building, together with Zenodo, the underlying infrastructure that makes this change possible and sustainable.\nFinally, our lesson learned, once more, is to invest in collaborations with publishers, specifically Pensoft, CETAF-European Journal of Taxonomy, the Museum National d’Histoire Naturelle, Paris publishing and BioOne, to transition to an era where biodiversity data is from its inception digitally accessible and reused by aggregators like GBIF or ELIXIR.\nWe are looking forward to more task forces of this inspiring kind.\n","id":90,"permalink":"https://plazi.org/news/beitrag/time-for-an-interim-review-of-plazis-covid-19-related-activities/3e26b3bc95a4b39f0a2a9d7fccee8b19/","tags":["news"],"title":"Time for an interim review of Plazi’s Covid-19 related activities"},{"categories":["news"],"contents":"\nTaxonomic treatment. 
The taxonomic treatment is a well-defined part of a publication about a particular taxon, in this case the honey bee Apis mellifera, described by Linnaeus 1758 on pages 576-577. It is based on the implicitly cited specimens in his reference collection.\nA milestone in the development of modern biology was the introduction of the standardization resulting from comparative and reference works. This led to a rapidly growing corpus of knowledge by learning from, referencing, and building on previous work, today known as the data life cycle.\nCarolus Linnaeus added to the data life cycle in multiple ways and, because of this, became the founder of modern taxonomy. He produced 12 editions of Systema Naturae, and only with the tenth edition did his ideas stabilize to a degree sufficient to serve as a solid foundation. That is why the 10th edition of \u0026ldquo;Systema naturae per regna tria naturae: secundum classes, ordines, genera, species, cum characteribus, differentiis, synonymis, locis.\u0026rdquo; has been selected as the starting point of zoological nomenclature.\nLinnaeus is celebrated for his introduction of the Latin binomen, whereby each species is given a generic and a species name, such as Apis mellifera for the honey bee. This standard is still the basis for naming species, though with inevitable typographical errors as well as cryptic abbreviations that may be easily comprehended by trained humans but pose challenges to processing by computers. 
This is why something more robust, such as a globally unique persistent identifier, is needed for more seamless access.\nA second important aspect of Linnaeus\u0026rsquo; work was his effort to build a monograph or catalogue that included all the knowledge about animals existing at the time, following his parallel work on plants.\nA third important aspect was to present the species in a taxonomic hierarchy, whereby he used class, order and genus level, mentioned already in the title of the book “…secundum classes, ordines, genera”.\nA fourth, forgotten aspect announced in the title “..cum characteribus, differentiis, synonymis, locis.” is the provision of a highly standardized taxonomic treatment for each species listed. The taxonomic treatment is the clearly delimited part of the publication that includes a diagnosis of the taxon, a reference to previous work, an extended description, and a note on the distribution and habitat. Treatments are the fundamental units of the knowledge system that Linnaeus created.\nThe reference to previous treatments, though irrelevant in Linnaeus because the arbitrarily chosen starting date of modern taxonomy is his 10th edition, is the prerequisite for understanding the development of the knowledge about a species. These treatment citations are not only links; they also have other functions. They may cite and thus augment the cited treatment with new data, or they may state that a species is actually a synonym of another one, and thus end the life of a taxonomic name, but not the data therein, which is inherited by the senior synonym.\nEcological descriptions can include links to other species, such as whether these are prey or predator, and thus define interactions between the species.\nSystema Naturae is based on Linnaeus’ reference collection, the first types in zoology (the insects are now housed at the Linnean Society in London). 
Making these implicit links between the treatment and the underlying types explicit is planned.\nSystema Naturae is without doubt the most cited publication in science, since each usage of one of the 4,819 names it created refers to this volume. However, the real number of citations is opaque because of the convention not to cite the publication of the authority of a name.\nThis conversion of Linnaeus 1758 has been made possible through the scan of the original work by the Missouri Botanical Garden, accessible via the Biodiversity Heritage Library and the Internet Archive; Dave Remsen, who converted the scan into text; and Richard Pyle from Zoobank, who spent uncounted hours converting the text into a database. He also added the Zoobank ID for each taxonomic name. Plazi converted this database into treatments and uploaded them to TreatmentBank and the Biodiversity Literature Repository (BLR), where each taxon, such as the honey bee, has its own DOI.\nWhen a treatment is uploaded to BLR as a taxonomic treatment data type, a Digital Object Identifier (DOI) is minted, allowing the treatment to be cited like a scholarly publication. For Apis mellifera it is 10.5281/zenodo.3922706.\nSince Zoobank and TreatmentBank share the universally unique identifier (UUID: e.g. 9082C709-6347-4768-A0DC-27DC44400CB2 for Apis mellifera) for the same taxon, switching from the nomenclatural name to the treatment is just a matter of changing the URL from zoobank.org/NomenclaturalActs/ to treatment.plazi.org/id/.\nThe treatments and names in Systema Naturae are already reused by GBIF, and with that the first complete set of taxonomic names known at a given time has been added to the GBIF taxonomic backbone. This now makes it possible to add protologs (the first taxonomic treatments of a new taxon) to all taxonomic treatments that cite Linnaeus 1758 names, directly or via an intermediary name.\nThe conversion is a contribution to the CETAF-Covid19 task force. 
Seven bat species were known in 1758: Vespertilio spectrum, V. perspicillatus, V. vampyrus, V. spasma, V. auritus, V. leporinus and V. murinus.\nQuestions, suggestions or other contributions can be posted on Plazi community.\n","id":91,"permalink":"https://plazi.org/posts/linnaeus-systema-naturae/","tags":["news"],"title":"The taxonomic treatments from Linnaeus’ Systema Naturae, 1758, 10th edition, liberated"},{"categories":["news"],"contents":"On the 1st of April, 2020, a new copyright law was enacted in Switzerland. Among other revisions, it extends the protection of photographs by applying the notion of work to non-individual photographs, provides for a new copyright exception for the use of copyrighted works for purposes of scientific research, and introduces extended collective licenses as a new instrument for collective copyright regulation. As the Plazi workflow is organised under and governed by Swiss copyright law, Plazi had to adapt to these legal amendments.\nThe new exception to copyright for the use of works for scientific research purposes refers to any work to which the researcher has lawful access, and allows free copying and reuse for commercial or non-commercial scientific research. The exception is compulsory and overrules any licence agreements. The extension of the copyright protection for photographs widens the notion of \u0026ldquo;work\u0026rdquo; for this genre. Up to now, photographs qualified for copyright protection only if they showed a certain degree of individuality and originality. This criterion will no longer be applied to photographs. 
As a consequence, every photographer can now claim copyright protection even for his or her most standardised or trivial products.\nWhilst the first amendment facilitates the Plazi workflow, the extension of the protection for photographs could hamper these procedures, as Plazi extracts this kind of photograph from biodiversity literature in order to make them findable and accessible. Therefore, Plazi had to make sure that the extraction of scientific data and images, as it has been practised lawfully for more than 10 years, will also fit into this new legal framework of Swiss copyright law. In order to do so, we concluded an extended collective license with ProLitteris, the Swiss collecting society dealing with rights in photographic works. By this agreement, ProLitteris authorises Plazi to re-use all published photos and other images for the purpose of indexing and making available the worldwide biodiversity literature in the context of BLR.\nAn extended collective license is an agreement between a collecting society representing a substantial number of right owners in a specific category of works and a specific user of such works that applies to members of this collecting society as well as to non-members. This legal instrument has existed for decades in some European countries and must now be implemented in the copyright law of all other EU countries as a follow-up to the Copyright Directive 2019/790/EU. In Switzerland, extended collective licenses have been introduced together with the copyright exception for the use of works for scientific research purposes and with the aforementioned extension of the protection of photographs.\nThanks to this agreement, Plazi can ensure that, in the near future as well, all the data and images extracted from biodiversity literature are available in the Plazi TreatmentBank and in the Biodiversity Literature Repository for free. The re-use of these data by third parties is governed by the copyright regulation applicable to the re-user. 
In most cases, there is no copyright protection at all, as data and standardised images are not copyrightable (see Egloff et al.). It might be different in countries with a special protection regime for non-individual photographs, as has been introduced in Switzerland and as exists in a few other European countries, for example Denmark, Germany, Italy, and Austria.\n","id":92,"permalink":"https://plazi.org/posts/agreement-between-plazi-and-prolitteris/","tags":["news"],"title":"Extended collective licence agreement between Plazi and ProLitteris: Plazi adopts to revised copyright law in Switzerland"},{"categories":["news"],"contents":"Read the Eurekalert! and Knowledge Speak releases.\nThe COVID-19 pandemic presumably started with the escape of the coronavirus from its bat host to humans. To understand the original host, it is important to have access to relevant scientific knowledge about these organisms. The scientific results from charting the world’s biodiversity reside in a vast corpus, which is often “imprisoned” by paywalls or copyright laws, or trapped in formats unfavorable to text and data mining. For the majority of the world’s species, there exist only one or a few articles providing descriptions of the species or adding some additional observations. Even for well-known groups such as birds and mammals, access to primary taxonomic literature requires extensive and time-consuming specialist searches. Bats, suspected hosts of COVID-19 and other viruses such as Ebola, are particularly poorly covered in Catalogue of Life and ITIS, and most taxonomic information is locked within commercial closed-access books and scholarly articles.\nThe current COVID-19 pandemic is also just one of the many occasions in which rapid access to all possible data is crucial. There is already evidence for a possible escape of SARS-like viruses (coronaviruses) from bats to humans. 
Potential hosts include a variety of animals, including pangolins, bats, snakes and civets. The evidence supporting these claims spans from the early 2000s up to papers published shortly after the Wuhan outbreak (Li et al. 2005, Menachery et al. 2015, Hou et al. 2017, Zhou et al. 2020, Lam et al. 2020). Nonetheless, neither a dedicated large-scale study on potential hosts nor efforts to mine data and compile the taxonomic information available for these known reservoirs have been made.\nFor that reason, and in alignment with the recently announced DiSSCo and CETAF COVID-19 Task Force intended to create an efficient network of taxonomists, collection curators and other experts from around the globe, Plazi together with Pensoft is launching an initiative to make broadly accessible taxonomic and other biological traits data about the hosts or vectors of SARS-CoV-2 or other coronaviruses. We will locate and acquire publications relating to the virus’ hosts and deposit them in a newly formed Coronavirus-Host Community, a repository hosted on the Zenodo platform, which will provide persistent open access to these publications, enhanced with taxonomy-specific data derived from the sources through text and data mining processes. Data currently held in the Biodiversity Literature Repository is accessible here and will be shared with the Coronavirus-Host community.\nThe liberated data is open access, will feed automatically into GBIF (see example) and can be reused through the APIs (see the Ocellus example).\nContributions can be made at various levels, from sending suggestions of articles to be added to the public Zotero bibliographies on virus-host associations and on hosts\u0026rsquo; taxonomy (such as bats, pangolins, snakes and others), to helping convert and FAIRize these articles. 
If you’re interested in collaborating, please email us at [email protected].\n","id":93,"permalink":"https://plazi.org/posts/coronavirus-hosts/","tags":["news"],"title":"Plazi and Pensoft launch an initiative to provide access to scholarly published data about Coronavirus hosts"},{"categories":["news"],"contents":"\nA striking example of the impact of open access on the usage of scientific publications. Around 2000, antbase.org, co-funded by the Atherton-Seidal Foundation at the Smithsonian Institution, provided open access to the entire ant taxonomic literature. Within a very short time, this resource was being used worldwide. DOI: 10.5281/zenodo.1343376\nFollowing discussions on the Taxacom list server on the value of Open Science, here are some points to consider.\nThe situation has changed regarding open access and open science. The EU fully requires open access to anything it funds. No funds are awarded to any institution that will not accept a commitment to open access. Many of our institutions signed the Bouchout Declaration on Open Biodiversity Knowledge Management, and open access is, for example, a central part of the development of DiSSCo – the Distributed System of Scientific Collections in Europe.\nMany of our science agencies signed DORA, the San Francisco Declaration on Research Assessment, and increasingly even disregard citation indexes when evaluating scientists and proposals.\nIt is very obvious that open access opens an entirely new door to the way we do science. It saves an enormous amount of time in accessing cited works, from literature to specimens. 
It enables large studies that have not been possible before, and it enables reproducing research.\nIt improves our science, because many eyes suddenly have access to the data, and data can be analyzed in context, including links to any cited material, which has not been possible until now.\nIn fact, it should be our ambition and goal that any publication is accessible through PubMed Central, BHL, BLR, taxonomy at GBIF or a similar global infrastructure, and that the data therein, such as figures, taxonomic treatments or materials cited, is citable.\nThis data can be and is reused, see e.g. the last published EJT: It is not only accessible as PDF, but in various formats in the Biodiversity Literature Repository, in TreatmentBank or GBIF. The types and images are accessible to anybody anywhere in the world at any time. The scientist’s contribution is immediately accessible through services like the Bloodhound tracker, or it can be reused in knowledge systems like openbiodiv or Wikidata. And all the access points always lead back to the source publication.\nThe only stumbling block for most of the literature is that we don\u0026rsquo;t even know that a new species has been described; even worse, to a large extent we do not know what we know at all. This is a major reason for an utterly out-of-date Catalogue of Life and a broken system of links from a taxonomic name to the taxonomic treatment and the referenced specimens and sequences, that is, the door to the literature and to better knowledge about the species.\nOpen science in the digital internet era is a huge benefit to our science. It allows spreading knowledge instantaneously. This is what we want, need and are obliged to do in the age of drastically disappearing biodiversity.\nOpen science is an advantage to science. It needs to be underpinned with an adequate infrastructure. It needs publishers that can publish in a semantically enhanced way so that the data is immediately reusable. 
It needs functional large-scale services and projects such as IPNI, Zoobank, Catalogue of Life, Biodiversity Literature Repository, BHL, GBIF, DiSSCo or idigBio, or large-scale sequencing projects.\nOpen science is exactly what we need. We want to be able to critically review research results, such as what is at the base of the description of a new species: which specimens, which characters, what kind of sequence or other data. We want to be able to understand the growth of data related to a taxon by making use of the citations of previous literature. Open science and its tools allow this.\nOpen science is neither a threat nor stupid: it makes your work visible, and it raises the profile of taxonomy by allowing linking between specimens, sequences, taxonomic names and research results.\nOpen science will help us to overcome the logjam we face in creating a Catalogue of Life, with all the automation that is possible and curatorial tools to correct possible errors in the processing. It will thus help to liberate us from this incredibly awkward situation in which we do not know what we know, because we have learned neither how to publish properly nor how to deal with the daily increasing number of publications adding to the estimated 500 million pages of biodiversity literature that, among other things, encompass the entire Catalogue of Life.\nFunding for open science does not compete with our taxonomic research funds. Rather the opposite: if we can show what a vibrant and relevant field we work in, more money will be directed to charting and understanding global biodiversity.\nFor the first time since Linnaeus, we have the chance to build a system that provides access to all the knowledge we have, similar to the Systema Naturae in its time, only this time not as a book but perhaps as a mobile app in your hand.\nOpen science also means collaboration, and this is happening at a grand scale, not least because our community can compete against science projects from other domains. 
It attracts funding, because we are devoted to open access, are innovative, and make our data accessible to anybody anywhere at any time.\nFinally, it dramatically increases access from the places where biodiversity disappears the fastest: any student, scientist or conservationist has access too, not just we in the North.\nTogether we are now building an incredible infrastructure – an infrastructure that is owned by the scientists and run by scientists for the scientists. It is an open infrastructure intended for anybody, to preserve the world’s biodiversity and to create innovations that generate the wealth and tax income enabling science foundations or philanthropic funds to spend money on its development. Hopefully we can convince these funders to make a special effort to generate new and recover existing knowledge about our biodiversity. It is an infrastructure that makes it possible to document and give credit to each scientist’s contribution to charting the world\u0026rsquo;s biodiversity.\nLinks:\ndoi.org/10.5281/zenodo.1343376 ","id":94,"permalink":"https://plazi.org/posts/open-science-charting-biodiversity/","tags":["news"],"title":"On the Role of Open Science in Charting Biodiversity"},{"categories":["news"],"contents":"\nPlazi Minicurso at the Congresso Brasileiro de Zoologia, March 1, 2020\nTaxonomic literature is an almost untapped resource of data covering our scholarly knowledge of biodiversity. This includes an estimated 500 million printed pages and is augmented annually with over 17,000 taxonomic treatments of species new to science. This data represents in fact an incredibly rich citation network, albeit with most of the citations implicit, understandable only with extensive domain expertise, and often requiring a prohibitive amount of time to chase down the cited resources. 
Plazi’s aim, with the support of Arcadia and CERN’s Zenodo repository, is to liberate the imprisoned data and make it findable, accessible, interoperable and reusable (FAIR).\nThe training course at the Congresso Brasileiro de Zoologia at Aguas de Lindoia is the first in a series of training courses in which Plazi teaches interested parties how to use its data preparation tools and become data liberators. The goal of the course was to add the liberated data to the Global Biodiversity Information Facility (GBIF) and thus make it widely accessible and reusable.\nThe 13 participants showed great interest, even spending more than an extra hour after the official end to upload their data. In total, from five publications they added 12 species that were new to GBIF’s taxonomic backbone, out of a total of 14 taxonomic treatments, 22 figures and 66 materials citations (occurrences in GBIF language).\nData liberated from taxonomic publications by participants in the minicurso. figs: figures, mc: materials citations, treat: treatments; TB: TreatmentBank, BLR: Biodiversity Literature Repository, GBIF: Global Biodiversity Information Facility. The details of the markup can be explored by clicking through the respective representations of the article. 
The following students liberated the data: Victor de Queiroz, Sarah Stephany Pereira Garcia, Henrique Webber Andriolo and Gabriel Vieira.\ndoi treat figs mc TB BLR GBIF\n10.1016/j.rbe.2018.06.003 2 6 1 x x x\n10.1590/1678-4766e2018023 1 3 2 x x x\n10.26107/RBZ-2019-0005 7 4 23 x x x\n10.26107/RBZ-2020-0002 2 4 2 x x x\n10.5281/zenodo.3693112 2 5 38 x x x\nThis minicurso is the launch of training courses that Plazi will offer to get more (citizen) scientists involved and to broaden the community that liberates data from taxonomic publications.\n","id":95,"permalink":"https://plazi.org/posts/first-plazi-training-course/","tags":["news"],"title":"12 new species added to GBIF during the first Plazi training course at the Congresso Brasileiro de Zoologia"},{"categories":[null],"contents":"The Swiss Parliament adopted a new copyright law on 27 September, 2019. Amongst its many new elements, there is a large exception to copyright for the use of protected works for scientific research purposes (Art. 24d). This exception is compulsory and overrules any subsequent license agreements. It allows copying and reuse of copyrighted works for commercial or non-commercial scientific research, and refers to any work to which the researcher has lawful access.\nThe new exception in Swiss copyright law corresponds to the existing exception for reproductions and extractions made by research organisations and cultural heritage institutions to carry out, for the purposes of scientific research, text and data mining of works or other subject matter, as provided by article 3 of Directive 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market. 
The scope of this new exception is even wider insofar as it allows text and data mining also for commercial scientific research and is applicable also to private scholars who may not be associated with a public research institution.\nThis new legislation is very good news for Plazi. It provides for supplementary legitimation of extracting and making available data from biodiversity literature as enabled by the Plazi workflow. This workflow was already organised and ruled according to the Swiss copyright law. The new copyright exception further empowers this process and allows for new developments.\nArt. 24d Verwendung von Werken zum Zweck der wissenschaftlichen Forschung\nZum Zweck der wissenschaftlichen Forschung ist es zulässig, ein Werk zu vervielfältigen, wenn die Vervielfältigung durch die Anwendung eines technischen Verfahrens bedingt ist und zu den zu vervielfältigenden Werken ein rechtmässiger Zugang besteht. Die im Rahmen dieses Artikels angefertigten Vervielfältigungen dürfen nach Abschluss der wissenschaftlichen Forschung zu Archivierungs- und Sicherungszwecken aufbewahrt werden. Dieser Artikel gilt nicht für die Vervielfältigung von Computerprogrammen. \u0026#160;\u0026#x21a9;\u0026#xfe0e; ","id":96,"permalink":"https://plazi.org/posts/2019/10/new-copyright-law-in-switzerland/","tags":["News"],"title":"New Copyright Law in Switzerland Empowers Scientific Research"},{"categories":["news"],"contents":"Am kommenden Montag (21.1.2019) berät die Kommission for Wissenschaft, Bildung und Kultur (WBK) des schweizerischen Ständerates die Revision des Urheberrechtsgesetzes. Nach dem jetzigen Revisionsentwurf sollen in Zukunft auch Fotografien, welche keinen individuellen Charakter haben, wie Werke der Kunst und der Literatur geschützt werden. 
Wir sind der Meinung, dass durch eine solche Bestimmung der wissenschaftlichen Lehre und Forschung, und letztlich unserer IT Industrie und Gesellschaft, erheblicher Schaden zugefügt wird, und zwar aus den folgenden Gründen:\nDer grösste Teil der heute entstehenden Fotografien sind digitale Bilder, nicht selten auch von Automaten oder auf eine möglichst standardisierte halbautomatische Weise hergestellt, die für wissenschaftliche Zwecke nützliche Fakten dokumentieren. Sie veranschaulichen Zustände, Veränderungen, Entwicklungen über längere Zeitabschnitte hinweg. Moderne Technologien erlauben es, diese Fotografien maschinell auszuwerten. Anhand dieser wissenschaftlichen Auswertung können wichtige Erkenntnisse in den Naturwissenschaften ebenso wie in der Medizin und den Sozialwissenschaften gewonnen werden, die andersweitig nicht möglich sind. Wissenschaftliche Lehre und Forschung ist darauf angewiesen, dass Daten zugänglich sind. Deshalb fordern Forschungsinstitutionen wie der Schweizerische Nationalfonds oder das Europäische Forschungsprogramm Horizon 2020 den freien Zugang zu wissenschaftlicher Information. Zu dieser Information gehören auch Daten in Form von Bildern. Ein urheberrechtlicher Schutz nicht individueller Fotografien macht einen Grossteil dieses Bildmaterials für die Wissenschaft unzugänglich. Es ist nicht möglich, den Zugang zu diesen Bildern auf dem Wege der Lizenzierung zu erlangen. Angesichts der Menge benötigter Bilder ist der Aufwand für eine Abklärung des Zugangs von Fall zu Fall viel zu gross. Die damit verbundenen Transaktionskosten wären allenfalls für die ganz grossen Datenverarbeiter wie Google oder Amazon verkraftbar, aber für keine einzige mit privaten oder staatlichen Mitteln finanzierte Forschungseinrichtung, für keinen IT-Betrieb und schon gar nicht für die immer wichtiger werdenden Citizen Sciences. Die geplante Vorschrift verhindert daher auch den Aufbau von IT-Startups im Bereich der Bilderkennung oder der Bildverarbeitung. 
Es ist unbestritten, dass Fotografien mit individuellem Charakter Werke der Literatur und Kunst darstellen und daher urheberrechtlich geschützt werden sollen. Das ist beim geltenden URG denn auch der Fall. Für die Ausweitung dieses Schutzes auf Fotografien, die keinerlei individuellen Charakter haben, gibt es aber keinen stichhaltigen Grund. Fotografien sind ein Kommunikationsmittel wie geschriebene oder gesprochene Texte auch. Wo sie individuellen Charakter haben, werden sie urheberrechtlich geschützt, wo dies nicht der Fall ist, darf es höchstens den Schutz gegen unlauteren Wettbewerb geben. Die vorgeschlagene Sonderregelung für Fotografien ohne individuellen Charakter ist ein massiver, durch nichts zu rechtfertigender Eingriff in die gesellschaftliche und insbesondere die wissenschaftliche Kommunikation. Völlig unverständlich ist schliesslich, dass die neue Regelung auch noch rückwirkend angewandt werden soll. Jede Fotografie in einer historischen oder wissenschaftlichen Sammlung oder Veröffentlichung wäre plötzlich ein geschütztes Rechtsobjekt, das nur noch mit der Einwilligung der – meist unbekannten – Fotografinnen oder Fotografen verwendet werden dürfte. Durch eine solche Regelung verliert die Schweiz den Zugang zu ihrem fotografischen Erbe. Aus diesem Grunde schlagen wir vor, den vorgeschlagenen Artikel 2 Abs. 3bis wieder aus der URG-Revisionsvorlage zu streichen.\nGeschäft 17.069: Urheberrechtsgesetz. Änderung\nNachträge\nMedienmitteilung WBK-S, 22. Sept. 2019, 17:00\nÄNDERUNG DES URHEBERRECHTSGESETZES\nDer Nationalrat behandelte in der Wintersession 2018 die Vorlage zur Revision des Urheberrechtsgesetzes sowie zur Genehmigung und Umsetzung zweier Abkommen der Weltorganisation für geistiges Eigentum (17.069). 
For the Council of States as the second chamber, the WBK has now begun its preliminary deliberations and held a hearing with the following organisations: swissuniversities, Schweizerischer Nationalfonds (SNF), Schweizer Buchhändler- und Verleger-Verband (SBVV), the working group «Lichtbildschutz», Allianz der Konsumentenschutz-Organisationen, Genossenschaft der Urheber und Verleger von Musik (SUISA), Interessengemeinschaft Radio und Fernsehen (IRF) and Verband für Kommunikationsnetzwerke (SUISSEDIGITAL). The focus was on open access in science, the protection of photographs without individual character, the rules on video on demand, time-shifted television, and remuneration for hotels, holiday apartments, hospitals and prisons. In the subsequent debate on whether to take up the bill, the committee underlined the need to modernise the Copyright Act and voted by 11 to 0 with one abstention to take up the bill of the Federal Council. The detailed deliberations will take place at the next committee meeting on 12 February 2019.\nThe detailed deliberations in the WBK-S are on 12 February\n","id":97,"permalink":"https://plazi.org/posts/lichtbildschutz-artikel/","tags":["news"],"title":"Der geplante Lichtbildschutz Artikel (Art 2 Abs. 3bis) behindert Forschung und IT-Industrie"},{"categories":["news"],"contents":"\nVisualization of data liberated from 22 published articles in the journals Zootaxa and PLoS ONE during the training course at Leiden University\u0026rsquo;s Integrative Taxonomy course.\nThe taxonomic literature is a rich, high quality, and largely untapped source of biodiversity knowledge. Taxonomy helps us recognize species and map their distributions through text descriptions, images, and records of when and where they have been observed. 
These are the data we need to answer questions that are relevant to our world today, like setting conservation priorities and anticipating the effects of climate change on biodiversity and ecosystem functions that affect the lives of people.\nBut the true value of taxonomic data remains unrealized because basic biodiversity information remains fragmented and unevenly accessible. In many cases, taxonomic literature contains everything that is known about a species. Semantic enhancement of taxonomic literature provides a solution by mobilizing data elements within taxonomic articles. Standard tags label the content of various elements, including taxonomic names, descriptions, images, and specimen records, and allow them to be shared among online databases.\nIn December 2018, Leiden University’s Integrative Taxonomy course organized by Jeremy Miller featured a unit on semantic enhancement of taxonomic literature. Over the course of the three-day unit, nine students learned about cybertaxonomy through lectures and assigned readings, and used the GoldenGATE document editor to apply semantic tags to 22 taxonomic articles published in the journals Zootaxa and PLoS ONE. Subjects included vertebrates (fish, reptiles, and amphibians) and arthropods (insects, arachnids, millipedes, and crustaceans). In some cases, students checked and corrected semantic tags previously applied using an automated process; in others, students applied tags de novo. A total of 1026 specimen records from 85 species (including 54 new species) were marked, which means they are now freely available through Plazi’s TreatmentBank and are shared with the Global Biodiversity Information Facility. The majority of records came from Thailand, Russia, or Japan.\nThe biodiversity knowledge library is vast, with an estimated half a billion printed pages describing more than a million species and citing many millions of specimens. 
Semantic enhancement of taxonomic literature gives us a way to query biodiversity knowledge, making the data freely available for aggregation, exchange, and reuse. But because biodiversity is one of the most information-rich fields of human knowledge, we need more people to commit their time and expertise to marking up taxonomic literature. The experience of training a classroom of students to mobilize taxonomic data by using GoldenGATE is proof of concept that our curriculum is ready to share with more students.\n","id":98,"permalink":"https://plazi.org/posts/mineable-taxonomic-literature/","tags":["news"],"title":"Training for building and reusing your corpus of mineable taxonomic literature"},{"categories":["news"],"contents":"When Jérôme Constant published yesterday his taxonomic article on eurybrachid planthopper insects in the European Journal of Taxonomy, its data became immediately accessible as FAIR and open access data in TreatmentBank including taxonomic treatments, and the Biodiversity Literature Repository (BLR) as an article and figures.\nThis data is also at the same moment reused by GBIF to describe the dataset aka article, to list all its material citations, provide the taxonomic treatments for the included species, which is especially valuable for those new to the sciences or an individual occurrence. Ocellus is using the figures to index journals and provide a novel access to them. All the bibliographic references have been added to Refbank. Some of the references in the article include a Digital Object Identifier DOI, but most do not.\nIn addition to providing access to data that has been made citable, findable and reusable, the article deposit in BLR provides related identifiers for all the data extracted. They all resolve to a respective digital copy, i.e. 
the figure, taxonomic treatment or article.\nSince a DOI is included in the bibliographic reference for only a limited number of articles, most references do not resolve to the digital object, which in most cases is a PDF. This is despite the fact that most authors have PDFs of all the articles they cite on their computers.\nWouldn’t it be great if authors made all the referenced articles accessible to everybody by creating a DOI for those that are not yet accessible? This would save subsequent readers of their work a tremendous amount of time, since they would not have to find, digitize, or copy the articles again. It would raise the number of taxonomic articles and treatments accessible for use in Wikicite and Wikidata. It would contribute towards building corpora of knowledge for taxa where the entire published record is digital.\nBLR offers exactly such a service. It is part of Zenodo, one of the world’s leading repositories, especially for long tail scientific results that have no other home. This includes data that are underused because they are cumbersome to access, such as the millions of scientific figures or the taxonomic treatments that go largely unnoticed. Zenodo is part of one of the largest science experiments at CERN with a very powerful and sustainable IT infrastructure.\nZenodo is not unique in the way it makes its deposits discoverable and citable. It uses DataCite data types and digital object identifiers (DOIs), which are a global standard and are becoming increasingly discoverable through a collaboration with CrossRef and ORCID to be discussed early in the coming year.\nContributors to Zenodo can upload records individually or in batches. Ideally though, access to scientific articles is provided by publishers, libraries or services like the Biodiversity Heritage Library or Archive.org or national services. 
The challenge, though, is that only an estimated 10% of the legacy literature is digital so far, and that no active program is scanning at a scale that will cover the remaining 90%. For that reason, alternatives like BLR are decisive.\nAccess to a PDF and the minting of a DOI are not the unique value BLR offers. What sets the service apart is the multitude of access points it provides to the articles: its focus on dissemination, findability and citability of the data within an article, and the copies of the article in machine-readable formats that carry all these links.\nFinally, having domain-specific corpora of articles in one place will make it easier to prioritize journals for processing so that their data can be included too, because the respective digital copies are at hand. This ongoing project and service is a contribution to open science supported by Arcadia, Plazi and Zenodo.\n","id":99,"permalink":"https://plazi.org/posts/blr-scholarly-taxonomic-articles/","tags":["news"],"title":"Why do we need BLR to store scholarly taxonomic articles?"},{"categories":["news"],"contents":"The Thursday November 29, 2018 (doi) issue of the Neue Zürcher Zeitung includes a “Gastkommentar” by me (Donat Agosti) arguing that the proposed insertion of an article protecting any kind of photographs of three dimensional objects in the revised copyright law of Switzerland is misguided (Art. 2 al.).\nIn the sciences, illustrations, including photographs, play a crucial role. They are one among many types of research data, and are used to visually document and compare results. Although in the current debate about access to research data in Switzerland and abroad the main focus is on open access to research articles, the movement is broadening to include open data and open science.\nLittle thought is given to the fact that scientific articles are often based on blocks of text or images that are themselves data that could be properly cited and reused in further analyses. 
One of these data types is scientific photographs. These images are taken in a systematic and therefore standardized way, with the goal of contributing to a growing corpus of knowledge in their respective scientific domain. Not only are scientific images (and data) meant to be reused, their principal value (overwhelmingly more than their limited commercial value) is in their reusability. The internet and new developments in natural language processing (NLP) and machine learning (ML) provide the ability to leverage their full potential.\nFor example, the description of the world’s species includes not only tens of millions of scientific publications but also an even greater number of illustrations, among which photographs today play a game-changing role. With today\u0026rsquo;s new technical capabilities, the estimated billions of specimens in our natural history collections can be made accessible. Images document observations of a defined specimen in the field. All these images can be included in scientific publications that verify their identification and put them into context. This is one reason why the European Strategy Forum on Research Infrastructures (ESFRI) decided to fund the digitization of the specimens of European natural history museums (DiSSCo). At the same time, images can be the basis of an index to scientific publications, as the Biodiversity Literature Repository, a collaboration between Plazi, Zenodo/CERN and Pensoft, illustrates.\nThis example is just one of many ongoing projects that are mining scientific publications. Illustrations and visualizations play an important role in most scientific fields, but are still an underutilized asset for scientific discovery.\nThe proposed blanket protection of photographs will be a huge impediment to the scientific endeavour, and may stifle a fledgling movement to open up science that is, nota bene, paid for with taxpayers\u0026rsquo; money and requested by the Swiss National Science Foundation. 
The protection of scientific, non-artistic photographs should thus not be inserted into the Swiss copyright law.\n","id":100,"permalink":"https://plazi.org/posts/schnappschuesse-urheberrecht/","tags":["news"],"title":"Schnappschüsse und das Urheberrecht (Snapshots and copyright law)"},{"categories":["news"],"contents":"When an \u0026rsquo;ex banker\u0026rsquo; - Tim Robertson - meets an \u0026rsquo;ex taxonomist\u0026rsquo; - Donat Agosti - this can be very inspiring, especially when they discover that they share far more than either expected. So, what is it?\nBoth share data lineages. The banker has bank accounts that are created and mutated and that have a certain value at a given time. The taxonomist has species that are described and mutated and that have a certain name at a given time. Both have provenance: a documented lineage of data that includes its origins, what happened to it and where it moved over time, a ledger in banker\u0026rsquo;s terms. It allows the history to be traced, and in taxonomy it ought to allow the history to be reproduced.\nIn banking, each change is documented with bank records, now almost completely automated and well secured. In taxonomy, by contrast, accounts of species are not maintained in registries, and changes are not linked to an electronic registry but are published in parts of scientific publications that cannot even be cited.\nIn both systems, there is a login mechanism: in banking, a highly secured authentication mechanism to log into the respective bank account; in taxonomy, a very loose mechanism whereby a transaction has to appear in a scholarly publication as defined by the Codes.\nBoth have some requirements for creating a new account or a new taxon. 
Whilst an actual money transfer is essential to create an account or mutate its value, creating an available name for a new species or higher taxon requires fulfilling certain criteria, but one is left to trust that these conditions are fulfilled.\nHere the difference begins, which is in fact that taxonomy is many years behind the technological development in banking.\nIn taxonomy too, each transaction is documented by a taxonomic treatment, a piece of text labeled with the respective accepted taxonomic name and published in a scholarly publication. Together, treatments fill hundreds of millions of pages of taxonomic literature, often refer to scholarly illustrations and, in scientific tradition, cite earlier usage of the species in the literature, either in a highly implicit form citing the first author and publication year or by providing a complete history of subsequent taxonomic treatments.\nCitations are typed: they can simply refer to a former usage and add more data; they can accept a change in the name by citing the taxonomic treatment where the change occurred; or they can argue for a change in the name, such as synonymizing a taxon with another or creating a new combination after it has been discovered that the species belongs in a different genus altogether.\nMore importantly, the taxonomic treatment refers to the specimens used to create the new species and its name (the holotype being the most decisive specimen). In most cases these references are implicit, and there is no electronic link to the respective physical object in our natural history collections.\nThe positive aspect is that we have provenance, a well-documented lineage for all the currently accepted names, through a very simple but efficient corpus of linked taxonomic treatments. 
Another positive aspect is that the banks show us how to deal with large data.\nTaxonomists can essentially copy what the banks do to produce a highly automated process for creating an electronic version of the catalogue of life (a sort of central bank for taxonomic names). The currency would be the specimens, which can be cited because they have certified digital copies; whether a new name can be created could be checked against the conditions set by the Codes. An additional effort is needed to digitize all the old records.\nTaxonomic treatments play a central role in this system, similar to the documented transactions in a bank, allowing each step to be recapitulated. Additionally, in taxonomy as a science, they allow the discovery of new species or changes to be reproduced. The treatments summarize all the arguments that a scientist used, and that peer review accepted, in this process. And in good scientific tradition, taxonomic treatments, together with the treatment citations they include, contain all the information needed to build the catalogue of life.\nFormally, most of the technical elements and the legal basis exist and are in operation, both for ongoing publications and for processing the overwhelming corpus of legacy publications. Arcadia is supporting Plazi, in collaboration with CERN – Zenodo and Pensoft; the European Journal of Taxonomy is collaborating to enhance this approach and make it more popular; and GBIF is a long-term user of taxonomic treatments.\n","id":101,"permalink":"https://plazi.org/posts/provenance/","tags":["news"],"title":"Provenance – another look at taxonomic treatments and names"},{"categories":["news"],"contents":"\nZookeys at 10\nHappy Birthday Zookeys, and congratulations to Pensoft, its innovator and publisher.\nToday’s 10th anniversary of Zookeys is a great day for biodiversity. 
The discoveries reported in the press release and celebrated in the 770th issue of the journal are magnificent and open access, unlike the bulk of taxonomic articles.\nThe real impact of Zookeys, its sister Biodiversity Data Journal and the many other natural history journals hosted by Pensoft is neither reported by Pensoft nor widely recognized by taxonomists or the wider scientific community: the revolution caused, and enabled, by the technical changes adopted in the publishing of Zookeys and the subsequent journals.\nYes, Zookeys has been an early adopter of the open access paradigm. Ten years ago, this was novel, and it took quite some courage for a commercial publisher to delve into a largely unknown business model: one whereby publishing has to be paid for, with the consequence that the article is afterwards open to anybody in the world. Now this is widely required by science funders, but still, most fellow taxonomists probably don’t realize what it means that anybody, well beyond the few colleagues, has access to the publication of a new species or other relevant results.\nBut this is only the beginning. Another very big step has been to make Zookeys the first taxonomy journal to be accepted at PubMed Central, thus expanding the coverage of the largest archive of biomedical literature to include taxonomy. This happened by moving away from traditional print/PDF publishing and joining Plazi to develop, together with the US National Library of Medicine, the first domain-specific flavor of the widely used Journal Article Tag Suite, which is used to import scholarly articles into PubMed and PubMed Central. This was not just a technical change. 
It had another consequence, unknown to almost everybody.\nDuring the Linnaeus 250 anniversary celebration in Paris, Pensoft’s president Lyubomir Penev must have been convinced by Plazi’s contribution “1758 Binomen – 2008 e-publications” that publishing has to change so that machines can understand its content: in this context, the many taxonomic treatments that taxonomists communicate in their millions of publications, which include every single report of a new species and its subsequent augmentations.\nToday the FAIR principles, referring to findable, accessible, interoperable and reusable data elements, are a core element of open science. A tag set that allows tagging elements in a publication, from taxonomic treatments to geographic coordinates, scientific names and materials cited, also allows annotating them with persistent identifiers so that they can be cited, or linking them to standard vocabularies such as the widely used Darwin Core. This system enables automatic, immediate annotation and dissemination of taxonomic data across platforms.\nZookeys has been one of the first journals to champion the automatic minting of ZooBank identifiers for new taxa. As GBIF today celebrates its one billionth upload, based on 39,570 datasets, Zookeys is among the 22,698 datasets extracted by Plazi from scholarly articles, in this case fully automatically. This is in stark contrast to Plazi’s effort to convert imprisoned data from PDF-based publications, which needs to be done for all but the Pensoft publications.\nIn other words, Pensoft shows the way forward in what is otherwise an extremely costly expedition to discover known biodiversity; the complexity of merely digitizing volumes is shown by the Biodiversity Heritage Library. Having a tag set in place for tagging taxonomic and nomenclatural elements allows citing well beyond the usual citation of articles. 
Treatments cite treatments, often including qualifiers such as synonymies, references to protologues or just augmentations of earlier treatments, which lends itself well to linked open data applications like synospecies. In fact, this opens the door to creating the catalogue of life by machine, and with that frees the time of thousands of catalogue editors to do science.\nCitable treatments also allow linking cited specimens to the referenced digital objects increasingly produced by projects like idigBio or, in the near future, the recently accepted ESFRI DiSSCo infrastructure. At the same time, having persistent identifiers for treatments allows specimens to be linked with the respective data in publications.\nAll Zookeys\u0026rsquo; figures and the articles themselves are submitted to the Biodiversity Literature Repository, allowing individual figures to be cited. Together with the input by Plazi, this open access repository now includes over 180,000 scholarly images and 30,000 articles, all heavily annotated with links to related items, such as the taxonomic treatment a figure cites, or an article including figures or citing other figures. The 50,000+ figures submitted by Pensoft are part of an emerging image-based index to the taxonomic literature.\nBut all the data made available this way add up to another billion: a billion facts that will be available in the OpenBiodiv knowledge management system, based on facts produced daily by Pensoft and extracted by Plazi with the support of Arcadia and a productive collaboration with Zenodo at CERN.\nIt’s time to celebrate and be happy about what has been achieved during the last ten years. But this success story should also be an encouragement to look optimistically into the next ten years. 
If I had a wish, I would like to see an open biodiversity knowledge management system as the base of all the knowledge we generate daily, and have the billion facts speak for themselves to stimulate discovering the known.\nLast but not least, I wish Pensoft a lot of energy and enthusiasm, innovation and entrepreneurship in support of biodiversity research, and increasing awareness and adoption of this so far rather quiet revolution.\nLinks:\nzookeys.pensoft.net ","id":102,"permalink":"https://plazi.org/posts/zookeys-10th-anniversary/","tags":["news"],"title":"ZooKeys 10th anniversary"},{"categories":["news"],"contents":"\nVisual search results from the Biodiversity Literature Repository. Each of the images provides access to the taxonomic treatment of the species or the source article. Only figures from scholarly publications are provided.\nPlazi has received a grant of EUR 1.1 million from Arcadia – the charitable fund of Lisbet Rausing and Peter Baldwin – to liberate data, such as taxonomic treatments and images, trapped in scholarly biodiversity publications.\nThe project will expand the existing corpus of the Biodiversity Literature Repository (BLR), a joint venture of Plazi and Pensoft, hosted on Zenodo at CERN. The project aims to add hundreds of thousands of figures and taxonomic treatments extracted from publications, and further develop and hone the tools to search through the corpus.\nThe BLR is an open science community platform to make the data contained in scholarly publications findable, accessible, interoperable and reusable (FAIR). BLR is hosted on Zenodo, the open science repository at CERN, and maintained by the Switzerland-based Plazi association and the open access publisher Pensoft.\nIn its short existence, BLR has already grown to a considerable size: 35,000+ articles from 600+ journals have been added. 
From these articles, more than 180,000 images have also been extracted and uploaded to BLR, and 225,000+ sub-article components, including biological names, taxonomic treatments or equivalent defined blocks of text, have been deposited at Plazi’s TreatmentBank. Additionally, over a million bibliographic references have been extracted and added to Refbank.\nThe articles, images and all other sub-article elements are fully FAIR compliant and citable. If an article is behind a paywall, a user can still access its underlying metadata and the link to the original article, and use the DOI assigned to it by BLR for persistent citation.\n“Generally speaking, scientific illustrations and taxonomic treatments, such as species descriptions, are one of the best kept ‘secrets’ in science as they are neither indexed, nor are they citable or accessible. At best, they are implicitly referenced,” said Donat Agosti, president of Plazi. “Meanwhile, their value is undisputed, as shown by the huge effort to create them in standard, comparative ways. From day one, our project has been an eye-opener and a catalyst for the open science scene,” he concluded. Though the target scientific domain is biodiversity, the Plazi workflow and tools are open source and can be applied to other domains – being a catalyst is one of the project’s goals.\nAccess to biodiversity images has already proven useful to scientists and inspirational to artists, for example, and the people behind Plazi are certain that such a well-documented, machine-readable interface will lead to many more innovative uses.\nTo promote BLR’s approach to making these important data accessible, Plazi seeks collaborations with the community and publishers to remove hurdles in liberating the data contained in scholarly publications and making them FAIR.\nThe robust legal aspects of the project are a core basis of BLR’s operation. 
By extracting the non-copyrightable elements from the publications and making them findable, accessible and re-usable for free, the initiative drives the move beyond the PDF and HTML formats to structured data.\nsource: Eurekalert!\nLinks:\nocellus.punkish.org\nplazi.org/news/beitrag/200000-deposits-at-the-biodiversity-literature-repository/e7f04bfb78ad7910350d9440125cbccf/\n","id":103,"permalink":"https://plazi.org/posts/arcadia-fund/","tags":["news"],"title":"Plazi and BLR receive €1.1 million from Arcadia to open up new biodiversity data"},{"categories":["news"],"contents":"This is the view of Donat Agosti, cofounder of Plazi.\nIn 1992, in the follow-up to the Rio Earth Summit, I wrote an article in the Swiss newspaper Neue Zürcher Zeitung titled “Brauchen wir zu wissen, wie viele Arten es gibt?” (Do we need to know how many species there are?). Now, in 2018, we are still discussing this question, and I am still convinced we should. Should, because we still don’t know the answer. But must, because of the rapidly increasing, unprecedented loss of biodiversity.\nOur world has dramatically changed since 1992. I can’t tell you my argumentation in this newspaper article, because I have only a hardcopy somewhere, I can’t find it online, and I have forgotten it [1]. This is annoying in a world where everything seems to have a digital fingerprint. If it were only this one personal work, this might be acceptable. But this is the case for probably 90% of all the printed scientific publications covering the description of the world’s biodiversity. And this is the reason we don’t know how many species we know, not to speak of how many there are.\nThe story doesn’t end here. We continue to publish. No longer on paper, but digitally. The largest part of these new research results is closed access and not registered, and we have no idea what data the articles include. 
Some months to years later, humans enter the data into the respective databases, which might have a link to the article, but access to the deep content - the data - is still not possible. Each article includes on average 7 images – highly important illustrations that follow well-established standards, with the goal of creating a seamless corpus of illustrations depicting the Earth’s species. The same holds true for the taxonomic treatments, which include anything from descriptions, summaries of distribution, behavior and references to the observed specimens to synonymy.\nThis is in stark contrast to what is happening elsewhere. The genomics community develops ever faster and more efficient methods to collect DNA sequences and is building up its own systems to study the world’s diversity. Citizen scientists have incredible tools in place to collect data on their objects, producing millions of observation records every month with very precise geodata and sophisticated quality control to ensure that the identification of each record is correct. This data is now the main staple of the Global Biodiversity Information Facility (GBIF) and probably the only dataset for studying changes in (bird) biodiversity that is sufficient for the monitoring requested back in 1992 in the Convention on Biological Diversity. These are all alternative ways to discover and chart the world’s biodiversity, each with its own constraints.\nBack in 2003, with the help of the US NSF and Deutsche Forschungsgemeinschaft, we started a complementary approach which I like to think of as the “Second wave of biodiversity discovery”: discovering what we should know or, in other words, what we have been publishing and in fact continue to publish. The idea is simple: if we have access to the data in all publications, even if we could not extract all of it, we could link it to what Tim Berners-Lee called the Knowledge Graph. 
This would work better if we model the taxonomic domain and discover the respective elements in the published literature, and more so if we explicitly identify (i.e., tag) and link them upfront in the publishing process – an alternative few are seriously willing to discuss, let alone implement.\nThe user groups we organized at the American Museum of Natural History informed a research team, which led to a first model, put down in the TaxonX schema, that attempted to cover all the elements characteristic of taxonomic treatments. The lucky circumstance of bringing together a very diverse team of scientists, from library to computer to biological sciences, also had the advantage that we had many connections beyond taxonomy itself and a good overview of what was happening in the various domains. One of the fruitful connections has been with the US National Center for Biotechnology Information of the National Library of Medicine and the team that maintains the Journal Article Tag Suite (JATS), which convinced us to use the lessons learned from TaxonX to create a taxonomy-specific extension of JATS, which ultimately became TaxPub.\nIn 2008, at the Linnaean 250 year celebration in Paris, the Bulgarian publisher Pensoft agreed not only to consider the relevance of taxonomic treatments as the core element of taxonomic publishing that ought to be citable and retrievable from each respective taxonomic name, but also to change its publishing workflow to be based on JATS/TaxPub. This at the same time opened the door to Pensoft\u0026rsquo;s submission of taxonomic works into PubMed, another first.\nParallel to this, we increasingly ran into the issue of copyright. Our activities had been on the radar of Kew Botanical Gardens, who invited us to participate in a meeting about access. Together, we questioned the argument that copyright should be used and applied to protect the incredible and potentially highly valuable work done at the Garden. 
On the way home, writing a constitution became possible. Within a very short time this converged with the insight that we had to “incorporate” and “brand” us, a loosely formed group of specialists with the same mission, to be more efficient. This led to the founding Skype call on March 14, 2008, at which the Plazi Association was born.\nPlazi’s mission is to foster open access to taxonomic work and make this knowledge an integral part of the science infrastructure. Our collaboration with Zenodo at CERN, with whom we have actively collaborated from their very beginning, is a very important element in our virtual expedition. It provides our community with a stable, state-of-the-art and, for the time being, unlimited repository, along with shared interests such as making each data object citable using DataCite DOIs, enhanced with links to related items, all with usage statistics and fully automated upload and annotation. With this we (together with Pensoft) provide a repository to the community that is independent of Plazi’s fate and which allows comparison with others in the fledgling DiSSCo. Our collaboration with Zenodo taught us that what we consider huge data is in fact dwarfed by large-scale science projects like the Large Hadron Collider at CERN, and, with that, not to consider storage a limiting factor again.\nDealing with a rapidly growing number of images became another challenge. How can we best make use of them in a repository that has been built for single but very large datasets as opposed to many small images? How can we make use of image analyses to contribute to automated identification of specimens? Why not build the world’s index of scientific taxonomic illustrations and make it another gateway to find out what we know about our species and in which publications? Suddenly having access to so many liberated images, which nobody ever had before, is one of the most stunning results of our expedition. 
It also shows that discoveries cannot all be planned – we originally focused on textual objects and later realized that illustrations are scientific data as well.\nToday, at our tenth anniversary, we recognize that we are probably still far from being immersed in the real large-scale expedition needed to discover the known biodiversity. With increasing experience, we have learned about the challenges. But we are even more convinced that making this knowledge an integral part of the body of global knowledge, readily findable and citable, so that the work in related fields – genomics, citizen science and museum collection digitization – can be linked to it and can in turn make use of the existing knowledge, is decisive for ultimately conserving the world’s biodiversity.\nWe are proud that we have managed to develop highly automated ways to open up scientific publications, provide long-term, sustainable access to all the data therein, and foster the debate on Open Access.\nWe are aware of many shortcomings, but they help us to look into the future, to find solutions and to stay continually engaged in a grand, exciting area of discovery.\nAll this development would not have been possible without continued support from the US NSF and DFG (Collaborative Research: Development of New Digital Library Applications), the European Union Framework Program 7 (FP 7: ViBRant, pro-iBiosphere, EU BON), Horizon 2020 (ICEDIG), Zenodo, the University of Massachusetts (Boston) and a very prolific collaboration with our partner Pensoft. 
Last but not least, the incredible dedication and investment of voluntary work, sympathetic partners and in-kind contributions over the last ten years have been, and still are, a main pillar of Plazi’s drive to uncover known biodiversity.\nIn future News Items, specific aspects of our experiences following our vision to “discover the known biodiversity” will follow, as well as other team members’ views.\nThanks to Neue Zürcher Zeitung I got a copy of the article, and thanks to Zenodo it will be accessible from now on (DOI: 10.5281/zenodo.1198575)\n","id":104,"permalink":"https://plazi.org/news/beitrag/10-years-of-plazi-on-expedition-to-discover-the-known-biodiversity/2136104073c3aabdebe6c5067d767a52/","tags":["news"],"title":"10 years of Plazi. On expedition to discover the known biodiversity"},{"categories":["news"],"contents":"This is the view of Donat Agosti, cofounder of Plazi.\nIn 1992, in the follow-up to the Rio Earth Summit, I wrote an article in the Swiss newspaper Neue Zürcher Zeitung titled “Brauchen wir zu wissen, wie viele Arten es gibt?” (“Do we need to know how many species there are?”). Now, in 2018, we are still discussing this question, and I am still convinced we should. Should, because we still don’t know the answer. But must, because of the rapidly increasing, unprecedented loss of biodiversity.\nOur world has dramatically changed since 1992. I can’t tell you my argumentation in this newspaper article, because I have only a hardcopy somewhere, I can’t find it online, and I forgot 1. This is annoying in a world where everything seems to have a digital fingerprint. If it were only this one personal work, this might be acceptable. But this is the case for probably 90% of all the printed scientific publications covering the description of the world’s biodiversity. And this is the reason we don’t know how many species we know, not to speak of how many there are.\nThe story doesn’t end here. We continue to publish. No longer on paper, but digitally. 
The largest part of these new research results is closed access and not registered, and we have no idea what data each article includes. Months or years later, people enter the data into the respective databases, which might have a link to the article, but access to the deep content - the data - is still not possible. Each article includes on average 7 images – highly important illustrations following well-established standards, made with the goal of creating a seamless corpus of illustrations depicting the Earth’s species. The same holds true for the taxonomic treatments, which include anything from descriptions, summaries of distribution and behavior, and references to the observed specimens, to synonymy.\nThis is in stark contrast to what is happening elsewhere. The genomics community develops ever faster and more efficient methods to collect DNA sequences and is building up its own systems to study the world’s diversity. Citizen scientists have incredible tools in place to collect data on their objects, producing millions of observation records every month with very precise geodata and sophisticated quality control to ensure that the identification of each record is correct. This data is now the main staple of the Global Biodiversity Information Facility (GBIF) and probably the only dataset sufficient to study changes in (bird) biodiversity for the monitoring requested back in 1992 in the Convention on Biological Diversity. These are all alternative ways to discover and chart the world’s biodiversity, each with its own constraints.\nBack in 2003, with the help of the US NSF and Deutsche Forschungsgemeinschaft, we started a complementary approach which I like to think of as the “Second wave of biodiversity discovery”: discovering what we should know or, in other words, what we have been publishing, and in fact what we continually publish. 
The idea is simple: if we had access to the data in all publications, even if we could not extract all of it, we could link it to what Tim Berners-Lee called the Knowledge Graph. This would work better if we model the taxonomic domain and discover the respective elements in the published literature, and more so if we explicitly identify (i.e., tag) and link upfront in the publishing process – an alternative few are seriously willing to discuss, let alone implement.\nThe user groups we organized at the American Museum of Natural History informed a research team, which led to a first model, laid down in the TaxonX schema, that attempted to cover all the elements characteristic of taxonomic treatments. The lucky circumstance of bringing together a very diverse team of scientists, from library to computer to biological sciences, also had the advantage that we had many connections beyond taxonomy itself and a good overview of what was happening in the various domains. One of the fruitful connections has been with the US National Center for Biotechnology Information of the National Library of Medicine, and the team that maintains the Journal Article Tag Suite (JATS), which convinced us to use the lessons learned from TaxonX to create a taxonomy-specific extension of JATS, which ultimately became TaxPub.\nIn 2008, at the Linnaean 250 year celebration in Paris, the Bulgarian publisher Pensoft not only agreed to consider the relevance of taxonomic treatments as the core element of taxonomic publishing that ought to be citable and retrievable from each respective taxonomic name, but also to change its publishing workflow to be based on JATS/TaxPub. This at the same time opened the door to Pensoft’s submission of taxonomic works into PubMed, another first.\nParallel to this, we increasingly ran into the issue of copyright. Our activities had been on the radar of Kew Botanical Gardens, who invited us to participate in a meeting about access. 
Together, we questioned the argument that copyright should be used and applied to protect the incredible and potentially highly valuable work done at the Garden. On the way home, writing a constitution became possible. Within a very short time this converged with the insight that we had to “incorporate” and “brand” us, a loosely formed group of specialists with the same mission, to be more efficient. This led to the founding Skype call on March 14, 2008, where the Plazi Association was born.\nPlazi’s mission is to foster open access to taxonomic work and make this knowledge an integral part of the science infrastructure. Our collaboration with Zenodo at CERN, with whom we have actively collaborated from their very beginning, is a very important element in our virtual expedition. It provides our community a stable, state-of-the-art and, for the time being, unlimited repository, along with shared interests such as making each data object citable using DataCite DOIs, enhanced with links to related items, all with usage statistics, and fully automated upload and annotations. With this we (together with Pensoft) provide a repository to the community that is independent of Plazi’s fate and which allows comparison with others in the fledgling DiSSCo. The collaboration with Zenodo taught us that what we consider huge data is in fact dwarfed by large-scale science projects like the Large Hadron Collider at CERN, and with that not to consider storage a limiting factor again.\nDealing with a rapidly growing number of images became another challenge. How can we best make use of them in a repository that has been built for single, but very large datasets as opposed to many small images? How can we make use of image analyses to contribute to automated identification of specimens? Why not build the world’s index of scientific taxonomic illustrations and make it another gateway to find out what we know about our species and in which publications? 
Suddenly having access to so many liberated images, which nobody ever had before, is one of the most stunning results of our expedition. It also shows that discoveries cannot all be planned – we originally focused on textual objects and later realized that illustrations are scientific data as well.\nToday, at our tenth anniversary, we recognize that we are probably still far from being immersed in the real large-scale expedition needed to discover the known biodiversity. With increasing experience, we have learned about the challenges. But we are even more convinced that making this knowledge an integral part of the body of global knowledge, readily findable and citable, so that the work in related fields – genomics, citizen science and museum collection digitization – can be linked to it and can in turn make use of the existing knowledge, is decisive for ultimately conserving the world’s biodiversity.\nWe are proud that we have managed to develop highly automated ways to open up scientific publications, provide long-term, sustainable access to all the data therein, and foster the debate on Open Access.\nWe are aware of many shortcomings, but they help us to look into the future, to find solutions and to stay continually engaged in a grand, exciting area of discovery.\nAll this development would not have been possible without continued support from the US NSF and DFG (Collaborative Research: Development of New Digital Library Applications), the European Union Framework Program 7 (FP 7: ViBRant, pro-iBiosphere, EU BON), Horizon 2020 (ICEDIG), Zenodo, the University of Massachusetts (Boston) and a very prolific collaboration with our partner Pensoft. 
Last but not least, the incredible dedication and investment of voluntary work, sympathetic partners and in-kind contributions over the last ten years have been, and still are, a main pillar of Plazi’s drive to uncover known biodiversity.\nIn future News Items, specific aspects of our experiences following our vision to “discover the known biodiversity” will follow, as well as other team members’ views.\nThanks to Neue Zürcher Zeitung I got a copy of the article, and thanks to Zenodo it will be accessible from now on (DOI: 10.5281/zenodo.1198575)\n","id":105,"permalink":"https://plazi.org/posts/10-years-of-plazi/","tags":["news"],"title":"10 years of Plazi. On expedition to discover the known biodiversity"},{"categories":["news"],"contents":"\nOcellus: Search term “Perlidae”. All the images are provided with a link to the source.\nThe wealth of data encapsulated in scientific publications is a well-guarded secret. The recent discussions around Open Access are centered on open access to the articles, and thus little effort is made to build an index, for example, of all the illustrations therein. This is even more astounding in descriptive sciences such as biological taxonomy, which deals with discovering and describing the world’s biological diversity, where each article includes an average of 7.2 illustrations.\nThese illustrations are made to build a comparative corpus of data to describe and identify – by comparing highly standardized illustrations – the world’s species. From a legal point of view, these figures do not qualify as a work - something unique, innovative, novel - and thus are not copyrighted and can be re-used. 
Citing them is not just good scientific practice, but necessary to understand their origin: not only the author but increasingly the specimen and all the data attached to it.\nBased on these presumptions, through a daily data extraction workflow, over 172,000 published scientific illustrations have been deposited in the Biodiversity Literature Repository. Together with articles, this amounts to more than 200,000 deposits. Each deposit includes a citation of the source, but more importantly also the taxonomic treatments deposited in TreatmentBank that refer to those illustrations.\nThese deposits represent over 50% of all the deposits in Zenodo, of which the Biodiversity Literature Repository is one community. At the same time, they occupy only 1.2% of the physical space of Zenodo. This calls for a rapid expansion of the process to add more data.\nThe value of Zenodo is its robustness, speed of service, standardization, minting of persistent identifiers (i.e. DataCite DOIs), emphasis on adding related items, machine upload, export in various formats and sustainability. This allows, and necessitates, building applications on top of it.\nCurrently the Biodiversity Literature Community is working on Ocellus to provide visual access to the data, and on documenting the API so that others are encouraged to use this unique resource.\nThe Biodiversity Literature Repository is an open community currently run by Plazi and Pensoft. Publishers and scientists are encouraged to contribute to enhancing this resource.\nSource: EurekAlert!\nLinks:\nocellus.punkish.org ","id":106,"permalink":"https://plazi.org/posts/200000-deposits-at-blr/","tags":["news"],"title":"200.000 deposits at the Biodiversity Literature Repository!"},{"categories":["news"],"contents":"\nPensoft - Plazi project to produce semantically enhanced data\nOpen Science is based on open data. Data per se is not copyrighted and thus freely accessible. 
However, it is kept in well-maintained prisons, ranging from hard disks on a scientist’s desk to large password-protected databases to publications that lost all their semantic structure in transition from a scientist’s lab to the publisher’s print shop. To paraphrase, scientists do all they can to lose the structure of their data in publications.\nPlazi does the opposite: it does all it can to discover data in publications and make it machine readable, ready for Open Science. The increasing amount of extracted, cited and reused data, with over 170,000 illustrations extracted and openly accessible in the Biodiversity Literature Repository and the many taxonomic treatments (220,000) in TreatmentBank that are reused in GBIF, is a promising sign that this kind of extraction is possible at production level.\nPensoft is one of the world’s leading scientific publishers and began in 2010 to publish semantically enhanced publications: publications that can be understood by machines and whose parts can immediately be reused.\nAs a spin-off from pro-iBiosphere, a European Union Framework Program 7 project, Pensoft and Plazi together continued to implement the Open Biodiversity Knowledge Management System OpenBioDiv initiated there. OpenBioDiv is based on Linked Open Data technology, a triple store where all the facts are stored, and an interface for creating simple to complex queries to efficiently use all the data in publications.\nPublishing and extracting data and making it widely accessible comes at a cost. Using complementary technology to deal with prospective publications and legacy publications respectively, Pensoft and Plazi join forces to offer a highly customizable service, from directly publishing semantically enhanced articles to converting unstructured publications into findable, accessible, interoperable and reusable data. 
The data is accessible through the Biodiversity Literature Repository, TreatmentBank or OpenBioDiv.\nLinks:\ndoi.org/10.5281/zenodo.1197129 ","id":107,"permalink":"https://plazi.org/posts/building-openbiodiv/","tags":["news"],"title":"Pensoft and Plazi join forces to expedite building the OpenBioDiv"},{"categories":["news"],"contents":"2017 has been a productive year at Plazi.\n9,601 articles have been processed from 173 different journals resulting in 82,082 taxonomic treatments, 62,958 scientific figures uploaded to the Biodiversity Literature Repository (BLR), including 314,881 bibliographic references of which 37,120 have a digital object identifier (DOI). The 3,177 articles published in 2017 that we mined for data included one new family, 330 new genera and 5,145 new species, and a total of 30,091 taxonomic treatments. The TreatmentBank now includes 216,868 taxonomic treatments from 26,727 articles; of the 850,781 bibliographic references 79,205 have a DOI. The Biodiversity Literature Repository now includes 167,681 open access figures and 12,586 open access publications. Data is transferred on a daily basis to the Global Biodiversity Information Facility (GBIF) and to NCBI, making Plazi one of the largest name providers to the taxonomic backbone. Access to BLR has been greatly improved with the creation of the standards-compliant and well-documented Zenodeo API, a Node.js (server-side JavaScript) API that queries the Zenodo API with more BLR-specific queries. A new web application called Ocellus now provides improved access to the image store in the BLR. 2018 promises to be an exciting year at Plazi.\nOn January 1, ICEDIG started to build the infrastructure to digitize European Natural History collections as part of the larger DiSSCo initiative to provide access to this valuable content. Plazi is responsible for drafting the data sharing policy and collaborating with Zenodo/CERN to build a demo repository for the digitized products within ICEDIG. 
On January 1, the collaboration with the European Journal of Taxonomy (EJT) started to produce a semantically enhanced version of its articles, based on a novel archival version in the TaxPub flavor of the Journal Article Tag Suite (JATS). This will allow EJT’s data to be disseminated more widely, for example to GBIF, and will make text and data mining and visualization of the content easier. In 2018, Plazi will offer a service to convert legacy or traditionally published articles into semantically enhanced publications. In 2018, together with GBIF, we hope to establish a workflow from publications to GBIF that provides immediate access to the names of newly described species and links to their data, such as treatments, tables, figures, and hopefully increasingly the cited material. We aim to provide access to the taxonomic treatments and figures of over 8,500, or half, of the new species described annually. We hope that in 2018 the keepers of scientific collections will start using citable, standardized, automatically discoverable identifiers for the digital copies of their specimens. We also hope that publishers will include those identifiers in their taxonomic works. We hope to establish a workflow from data extracted within the Plazi workflow to Wikidata and WikiCite. ","id":108,"permalink":"https://plazi.org/posts/happy-new-year-2018/","tags":["news"],"title":"Happy New Year 2018"}]