
Minimal WPUB for a scholarly paper (of sorts)

Ivan Herman edited this page Jun 5, 2018 · 8 revisions

As agreed on the call on 2018-06-04, the content of this page has been transferred to:

https://github.com/w3c/wpub/tree/master/experiments/w3c_rec


Inspired by Dave's minimal WPUB for a book, I tried to create one for the equivalent of a scholarly paper. I wanted a real-life example; to avoid copyright issues, I took a W3C document instead: the Model for Tabular Data and Metadata on the Web. I believe that, as far as WPUB is concerned, it is equivalent to a scholarly paper.

The interesting points of this publication, from our point of view:

  • It is a single-document publication, i.e., the entry point and the main content are the same HTML resource.
  • The publication already has a TOC (generated for the Recommendation by ReSpec): it is a section element containing a ul, so it is not a nav element, and it does not use doc-toc either. In a new WPUB this markup would need slight updates, depending on what the final structure turns out to be.
  • Because it is a single-document publication, it is fine to use the HTML title element (as the spec says) for the Title infoset item; the schema.org name property is not needed.
  • The publication refers to further HTML files that are not in the main thread of the paper but may be essential for the publication (i.e., they should be cached/offlined), namely:
    • a diff file comparing the document to its previous incarnation
    • a separate HTML file used as the longdesc value for a diagram
  • The publication refers to a number of CSV and Excel files, as well as images in different formats, that may be essential for the content of the paper
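To make the single-document points above concrete, here is a minimal Python sketch (standard library only) of how a user agent might read the Title from the HTML title element and collect the entries of a ReSpec-style TOC, i.e., the links inside a section with id="toc" rather than a nav. The class name TitleAndTocParser and its toc_id parameter are illustrative assumptions, not part of any spec.

```python
from html.parser import HTMLParser


class TitleAndTocParser(HTMLParser):
    """Collect the <title> text and the links of a ReSpec-style TOC.

    Assumes the TOC is a <section id="toc"> holding a <ul> of links,
    as in the paper above (no <nav>, no doc-toc role).
    """

    def __init__(self, toc_id="toc"):
        super().__init__()
        self.toc_id = toc_id
        self.title = ""
        self.toc = []             # (href, link text) pairs, in document order
        self._in_title = False
        self._section_depth = 0   # > 0 while inside the TOC section
        self._href = None         # href of the TOC link currently open
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "section":
            if self._section_depth:
                self._section_depth += 1          # nested section inside the TOC
            elif attrs.get("id") == self.toc_id:
                self._section_depth = 1           # entering the TOC section
        elif tag == "a" and self._section_depth:
            self._href = attrs.get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "section" and self._section_depth:
            self._section_depth -= 1
        elif tag == "a" and self._href is not None:
            self.toc.append((self._href, "".join(self._text).strip()))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._href is not None:
            self._text.append(data)
```

Fed the markup of the skeletons below, this should yield the document title and TOC pairs such as ("#intro", "1. Introduction"); links outside the TOC section are ignored.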

In other words, the "boundaries" of the publication should include, beyond the CSS files used for rendering, other referenced resources. In my view, these should be listed explicitly in the publication's resource list. The document also refers to a number of other HTML files (e.g., in the references section) that should not be within the boundaries, i.e., should not be cached/offlined.
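A minimal sketch of the boundary test described above, assuming (hypothetically) that a user agent resolves every resource-list entry against the publication's URL and caches only those URLs plus the entry page itself. Fragments are ignored, so in-page links such as #intro stay inside the boundary. The function names are hypothetical, and CSS files are left out for brevity.

```python
from urllib.parse import urldefrag, urljoin


def publication_boundary(base_url, resources):
    """Absolute URLs (without fragments) inside the publication's boundary.

    `resources` is the manifest's resource list; entries are plain strings
    or objects with a "url" key, as in the complex version. CSS files are
    not handled here.
    """
    boundary = {urldefrag(base_url)[0]}  # the entry page itself
    for entry in resources:
        href = entry["url"] if isinstance(entry, dict) else entry
        boundary.add(urldefrag(urljoin(base_url, href))[0])
    return boundary


def should_cache(link, base_url, boundary):
    """True if `link` (relative or absolute) resolves to a URL in the boundary."""
    return urldefrag(urljoin(base_url, link))[0] in boundary
```

With the dated Recommendation URL as the base, "diff.html" and "#intro" fall inside the boundary, while a link to another W3C specification in the references section does not.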

I have created two WPUB skeletons. The simple version has the strict minimum required by our spec. It relies on a number of (reasonable) defaults: the language of the manifest is en-US, persons' names are sufficient, and the references to SVG, PNG, CSV, etc., files do not need a media type because these formats are all "well known" to browsers. The complex version adds extra metadata entries (e.g., for the persons), all using the relevant schema.org terms, while still referring to the values listed in our infoset (schema.org has many more metadata entries that could be used, of course). By the way, I have added the various @type values, although in practice I am not sure they are necessary (they can be deduced from the values).
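The defaults can be sketched as follows. This assumes, hypothetically, that a processor falls back to en-US when the @context carries no @language, and that an explicit fileFormat (as in the complex version) takes precedence over guessing the media type from the file extension with Python's mimetypes module. The function names are illustrative only.

```python
import mimetypes


def manifest_language(manifest, default="en-US"):
    """Language of the manifest's text values: an explicit @language in the
    @context wins, otherwise fall back to the (assumed) default."""
    context = manifest.get("@context", [])
    if not isinstance(context, list):
        context = [context]
    for item in context:
        if isinstance(item, dict) and "@language" in item:
            return item["@language"]
    return default


def resource_media_type(entry):
    """Media type of one resource-list entry: an explicit fileFormat wins,
    otherwise guess from the file extension (None if unknown)."""
    if isinstance(entry, dict):
        if "fileFormat" in entry:
            return entry["fileFormat"]
        entry = entry["url"]
    media_type, _encoding = mimetypes.guess_type(entry)
    return media_type
```

On a typical Python installation, resource_media_type("datatypes.svg") should give image/svg+xml, while a file with an unknown extension yields None, which is the case where an explicit fileFormat helps.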

The publ-resources property is not a schema.org term (nor is publ-toc); it is what we could use for the 'Resource List' infoset item. Because the @context maps it to null, JSON-LD rules mean that this property, with the subtree underneath, will be ignored by any JSON-LD processor; I guess this is a feature, not a bug, at this point.
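The effect of the null mapping can be illustrated with a toy Python approximation: it collects the terms that the inline @context maps to null and drops them, with everything underneath, from the top-level object. It is not a JSON-LD processor (real expansion also resolves remote contexts, nested objects and IRIs), and the function names are hypothetical.

```python
def terms_mapped_to_null(context):
    """Collect the terms that an inline @context maps to null (None in Python)."""
    items = context if isinstance(context, list) else [context]
    nulls = set()
    for item in items:
        if isinstance(item, dict):
            nulls.update(term for term, value in item.items() if value is None)
    return nulls


def visible_to_jsonld(manifest):
    """Toy approximation of what survives JSON-LD processing at the top
    level: null-mapped terms and their whole subtrees are dropped."""
    hidden = terms_mapped_to_null(manifest.get("@context", []))
    return {key: value for key, value in manifest.items() if key not in hidden}
```

Applied to either skeleton's manifest, publ-resources and publ-toc disappear, while schema.org terms such as datePublished remain for ordinary JSON-LD consumers.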

Simple version

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Model for Tabular Data and Metadata on the Web</title>
    <link href="#wpm" rel="publication" />
    ...
    <script id="wpm" type="application/ld+json">
    {
        "@context"              : [
            "https://schema.org",
            {
                 "publ-resources" : null,
                 "publ-toc"       : null
            }
        ],
        "@id"                   : "http://www.w3.org/TR/tabular-data-model/",
        "url"                   : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
        "creator"                : [
            {
                "@type"         : "Person",
                "name"          : "Jeni Tennison"
            },
            {
                "@type"         : "Person",
                "name"          : "Gregg Kellogg"
            },
            {
                "@type"         : "Person",
                "name"          : "Ivan Herman"
            }
        ],
        "datePublished"         : "2015-12-17",
        "publ-resources"        : [
            "datatypes.html",
            "datatypes.svg",
            "datatypes.png",
            "diff.html",
            "test-utf8.csv",
            "test-utf8-bom.csv",
            "test-utf16.csv",
            "test-utf16-bom.csv",
            "test.xls"
        ],
        "publ-toc" : "#toc"
    }
    </script>
</head>
<body>
    ....

    <section id="toc">
        <h2 resource="#h-toc" id="h-toc" class="introductory">Table of Contents</h2>
        <ul class="toc">
            <li class="tocline"><a class="tocxref" href="#intro"><span class="secno">1. </span>Introduction</a></li>
            ...
        </ul>
    </section>
    ...

</body>
</html>
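As a quick well-formedness check for the embedded manifest, the following Python sketch follows the link element with rel="publication" and href="#wpm" to the JSON-LD script with that id and parses it with json.loads. Strict JSON parsing rejects a trailing comma after the last property of an object, so such commas are worth checking for in hand-written manifests. The class and function names are hypothetical, and the sketch only handles the embedded case shown here.

```python
import json
from html.parser import HTMLParser


class ManifestFinder(HTMLParser):
    """Record <link rel="publication" href="#id"> and the text of every
    <script type="application/ld+json"> that has an id."""

    def __init__(self):
        super().__init__()
        self.manifest_id = None
        self.scripts = {}      # script id -> raw JSON text
        self._current = None   # id of the JSON-LD script being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        if tag == "link" and "publication" in rels:
            href = attrs.get("href") or ""
            if href.startswith("#"):           # only embedded manifests
                self.manifest_id = href[1:]
        elif (tag == "script" and attrs.get("type") == "application/ld+json"
              and attrs.get("id")):
            self._current = attrs["id"]
            self.scripts[self._current] = ""

    def handle_endtag(self, tag):
        if tag == "script":
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.scripts[self._current] += data


def load_embedded_manifest(html_text):
    """Parse the embedded manifest; json.loads raises json.JSONDecodeError
    on malformed JSON, e.g. a trailing comma before a closing brace."""
    finder = ManifestFinder()
    finder.feed(html_text)
    finder.close()
    if finder.manifest_id is None or finder.manifest_id not in finder.scripts:
        raise LookupError("no embedded publication manifest found")
    return json.loads(finder.scripts[finder.manifest_id])
```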

Complex version

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Model for Tabular Data and Metadata on the Web</title>
    <link href="#wpm" rel="publication" />
    ...
    <script id="wpm" type="application/ld+json">
    {
        "@context"              : [
            "https://schema.org",
            {
                 "publ-resources" : null,
                 "publ-toc"       : null,
                 "@language"      : "en-US"
            }
        ],
        "@id"                   : "http://www.w3.org/TR/tabular-data-model/",
        "url"                   : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
        "accessMode"            : ["textual", "visual"],
        "accessModeSufficient"  : ["textual"],
        "editor"                : [
            {
                "@type"         : "Person",
                "name"          : "Jeni Tennison",
                "givenName"     : "Jeni",
                "familyName"    : "Tennison",
                "affiliation"   :  {
                    "name"  : "The Open Data Institute",
                    "url"   : "http://theodi.org/"
                }
            },
            {
                "@type"         : "Person",
                "@id"           : "http://greggkellogg.net/",
                "name"          : "Gregg Kellogg",
                "givenName"     : "Gregg",
                "familyName"    : "Kellogg",
                "affiliation"   : {
                    "name"  : "Kellogg Associates",
                    "url"   : "http://kellogg-assoc.com/"
                }
            }
        ],
        "author"                : [
            {
                "@type"         : "Person",
                "name"          : "Jeni Tennison",
                "givenName"     : "Jeni",
                "familyName"    : "Tennison",
                "affiliation"   :  {
                    "name"  : "The Open Data Institute",
                    "url"   : "http://theodi.org/"
                }
            },
            {
                "@type"         : "Person",
                "@id"           : "http://greggkellogg.net/",
                "name"          : "Gregg Kellogg",
                "givenName"     : "Gregg",
                "familyName"    : "Kellogg",
                "affiliation"   : {
                    "name"  : "Kellogg Associates",
                    "url"   : "http://kellogg-assoc.com/"
                }
            },
            {
                "@type"         : "Person",
                "@id"           : "https://www.w3.org/People/Ivan/",
                "name"          : "Ivan Herman",
                "givenName"     : "Ivan",
                "familyName"    : "Herman",
                "affiliation"   : {
                    "name"  : "World Wide Web Consortium",
                    "url"   : "https://www.w3.org"
                }
            }
        ],
        "datePublished"         : "2015-12-17",
        "dateModified"          : "2015-12-17",
        "publ-resources"        : [
            "datatypes.html",
            "datatypes.svg",
            "datatypes.png",
            "diff.html",
            {
                "@type"         : "StructuredValue",
                "url"           : "test-utf8.csv",
                "fileFormat"    : "text/csv"
            },
            {
                "@type"         : "StructuredValue",
                "url"           : "test-utf8-bom.csv",
                "fileFormat"    : "text/csv"
            },
            {
                "@type"         : "StructuredValue",
                "url"           : "test-utf16.csv",
                "fileFormat"    : "text/csv"
            },
            {
                "@type"         : "StructuredValue",
                "url"           : "test-utf16-bom.csv",
                "fileFormat"    : "text/csv"
            },
            {
                "@type"         : "StructuredValue",
                "url"           : "test.xls",
                "fileFormat"    : "application/vnd.ms-excel"
            },
            {
                "@type"         : "StructuredValue",
                "url"           : "test.xlsx",
                "fileFormat"    : "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
            }
        ],
        "publ-toc" : "#toc"
    }
    </script>
</head>
<body>
    ....

    <section id="toc">
        <h2 resource="#h-toc" id="h-toc" class="introductory">Table of Contents</h2>
        <ul class="toc">
            <li class="tocline"><a class="tocxref" href="#intro"><span class="secno">1. </span>Introduction</a></li>
            ...
        </ul>
    </section>
    ...

</body>
</html>