How to Document Properties and Cardinalities Defined in SHACL Shapes? #104

Open
tobiasschweizer opened this issue Nov 26, 2021 · 23 comments

Comments

@tobiasschweizer

Hi there

First of all: nice tool!

I have the following use case: I define SHACL Shapes to define a subset of schema.org types (classes) and properties (following https://datashapes.org/schema).

What I get looks like this:
[Screenshot: Thing_class]

[Screenshot: Thing_shape]

In addition to the SHACL shapes, I added a few RDF(S) definitions to the graph:

    {
 
    Shapes (ThingShape, CreativeWorkShape) ...
  
    },
    {
      "@id": "schema:Thing",
      "@type": "rdfs:Class"
    },
    {
      "@id": "schema:name",
      "@type": "rdf:Property",
      "schema:domainIncludes": {
        "@id": "schema:Thing"
      },
      "schema:rangeIncludes": {
        "@id": "xsd:string"
      }
    },
    {
      "@id": "schema:CreativeWork",
      "@type": "rdfs:Class",
      "rdfs:subClassOf": {
        "@id": "schema:Thing"
      }
    }

Then I run ontospy gendocs graph.json and choose option 2 (Html multi page).

Question: Is there a way to display the properties, their ranges and cardinalities that are defined in the SHACL shapes along with the class?
So far, the properties only show up if they are defined along with schema:domainIncludes (which is somewhat redundant because this is also stated in the shapes).

Ontospy already recognises the relation between the shape and its target class so I wonder if it could include more information from the shape.

Thanks a lot!

@tobiasschweizer
Author

@lambdamusic Hi, I realised that I could solve my problem partially by extracting some information from the SHACL shapes and using this for the docs generation:

...,
{
      "@id": "schema:email",
      "@type": "rdf:Property",
      "schema:domainIncludes": {
        "@id": "schema:Person"
      },
      "schema:rangeIncludes": {
        "@id": "xsd:string"
      },
      "rdfs:comment": "Email address.",
      "rdfs:label": "email"
},
...

Screenshot 2021-12-03 at 18 06 13

However, I am still wondering how to display the cardinalities or additional info like sh:pattern etc.

Could you guide me on how to adjust the template, or just point me to an example? That'd be a great help! Thanks a lot for this cool tool.

@tobiasschweizer
Author

Would it be possible to customise https://github.com/lambdamusic/Ontospy/blob/d23a17544c9c20039b9ce4ab051f8aebfc5a45b6/ontospy/ontodocs/media/templates/html-multi/browser/browser_classinfo.html? If yes, how would I get more information from the internal representation of the ontology?

@lambdamusic
Owner

Ontospy doesn't really process SHACL shapes at the moment (apart from highlighting their links to classes).

At the very least, I suppose you'd have to add a shapes extractor in ontospy.py and then a pythonic representation of shapes in entities.py.

Do you have a sample file you can share for testing?

@tobiasschweizer
Author

Hi there

Ontospy doesn't really process SHACL shapes at the moment (apart from highlighting their links to classes).

Yes, this is what I noticed. For now, I generate the ontology from the shapes and then use this for Ontospy to generate the docs.

Example:

SHACL shape for schema:Thing

{
  "@context": {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "prov": "http://www.w3.org/ns/prov#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "sh": "http://www.w3.org/ns/shacl#",
    "shsh": "http://www.w3.org/ns/shacl-shacl#",
    "dcterms": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
    "rescs": "http://rescs.org/"
  },
  "@graph": [
    {
      "@id": "rescs:dash/thing/ThingShape",
      "@type": "sh:NodeShape",
      "rdfs:comment": {
        "@type": "xsd:string",
        "@value": "The most generic type of item."
      },
      "rdfs:label": {
        "@type": "xsd:string",
        "@value": "Thing"
      },
      "sh:property": [
        {
          "sh:datatype": {
            "@id": "xsd:string"
          },
          "sh:description": "An alias for the item.",
          "sh:maxCount": {
            "@type": "xsd:integer",
            "@value": 1
          },
          "sh:name": "alternateName",
          "sh:path": {
            "@id": "schema:alternateName"
          }
        },
        {
          "sh:datatype": {
            "@id": "xsd:string"
          },
          "sh:description": "A description of the item.",
          "sh:maxCount": {
            "@type": "xsd:integer",
            "@value": 1
          },
          "sh:name": "description",
          "sh:path": {
            "@id": "schema:description"
          }
        },
        {
          "sh:description": "The identifier property represents any kind of identifier for any kind of [[Thing]], such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See [background notes](/docs/datamodel.html#identifierBg) for more details.",
          "sh:maxCount": {
            "@type": "xsd:integer",
            "@value": 1
          },
          "sh:name": "identifier",
          "sh:or": {
            "@list": [
              {
                "sh:datatype": {
                  "@id": "xsd:string"
                }
              },
              {
                "sh:nodeKind": {
                  "@id": "sh:IRI"
                }
              }
            ]
          },
          "sh:path": {
            "@id": "schema:identifier"
          }
        },
        {
          "sh:description": "An image of the item.",
          "sh:maxCount": {
            "@type": "xsd:integer",
            "@value": 1
          },
          "sh:name": "image",
          "sh:nodeKind": {
            "@id": "sh:IRI"
          },
          "sh:path": {
            "@id": "schema:image"
          }
        },
        {
          "sh:datatype": {
            "@id": "xsd:string"
          },
          "sh:description": "The name of the item.",
          "sh:maxCount": {
            "@type": "xsd:integer",
            "@value": 1
          },
          "sh:minCount": {
            "@type": "xsd:integer",
            "@value": 1
          },
          "sh:name": "name",
          "sh:path": {
            "@id": "schema:name"
          }
        },
        {
          "sh:description": "URL of a reference Web page that unambiguously indicates the item's identity. E.g. the URL of the item's Wikipedia page, Wikidata entry, or official website.",
          "sh:name": "sameAs",
          "sh:nodeKind": {
            "@id": "sh:IRI"
          },
          "sh:path": {
            "@id": "schema:sameAs"
          }
        },
        {
          "sh:description": "URL of the item.",
          "sh:maxCount": {
            "@type": "xsd:integer",
            "@value": 1
          },
          "sh:name": "url",
          "sh:nodeKind": {
            "@id": "sh:IRI"
          },
          "sh:path": {
            "@id": "schema:url"
          }
        }
      ],
      "sh:targetClass": {
        "@id": "schema:Thing"
      }
    }
  ]
}

I added a manually coded definition for schema:Thing:

{
      "@id": "schema:Thing",
      "@type": "rdfs:Class"
    }

and then from the shapes I extracted this information (here an example for the property schema:name):

{
      "@id": "schema:name",
      "@type": "rdf:Property",
      "schema:domainIncludes": {
        "@id": "schema:Thing"
      },
      "schema:rangeIncludes": {
        "@id": "xsd:string"
      },
      "rdfs:comment": "The name of the item.",
      "rdfs:label": "name"
    }

This is rather a workaround so I can generate the docs in the "traditional" way from the ontology.
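
The extraction step described above can be sketched in Python. This is a minimal sketch, not part of Ontospy: it assumes the shapes are available as plain JSON-LD dictionaries shaped like the ThingShape example, and the function name properties_from_shape is hypothetical. Only sh:datatype is mapped to a range here; sh:nodeKind and sh:or are left out.

```python
import json


def properties_from_shape(shape):
    """Derive rdf:Property definitions from one JSON-LD sh:NodeShape dict."""
    target = shape["sh:targetClass"]["@id"]
    props = shape.get("sh:property", [])
    if isinstance(props, dict):  # JSON-LD allows a single object instead of a list
        props = [props]
    definitions = []
    for prop in props:
        definition = {
            "@id": prop["sh:path"]["@id"],
            "@type": "rdf:Property",
            "schema:domainIncludes": {"@id": target},
        }
        # Assumption: only a plain sh:datatype is turned into a range.
        if "sh:datatype" in prop:
            definition["schema:rangeIncludes"] = {"@id": prop["sh:datatype"]["@id"]}
        if "sh:description" in prop:
            definition["rdfs:comment"] = prop["sh:description"]
        if "sh:name" in prop:
            definition["rdfs:label"] = prop["sh:name"]
        definitions.append(definition)
    return definitions


# Reduced version of the ThingShape example, keeping only schema:name.
shape = {
    "@id": "rescs:dash/thing/ThingShape",
    "@type": "sh:NodeShape",
    "sh:property": [
        {
            "sh:datatype": {"@id": "xsd:string"},
            "sh:description": "The name of the item.",
            "sh:name": "name",
            "sh:path": {"@id": "schema:name"},
        }
    ],
    "sh:targetClass": {"@id": "schema:Thing"},
}
print(json.dumps(properties_from_shape(shape), indent=2))
```

The resulting dictionaries can be appended to the @graph array next to the hand-written class definitions before running gendocs.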

@tobiasschweizer
Author

Is there a way to also include the cardinalities in the "traditional" way? I guess the SHACL cardinalities would have to be converted to OWL cardinalities. However, I am not sure whether the two can be converted into each other without loss of information.

Do you have an example of OWL cardinalities being documented?
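
To make the conversion question concrete, here is a rough Python sketch of what mapping SHACL counts onto an owl:Restriction could look like. It is an illustration under assumptions, not a lossless translation: SHACL counts are validation constraints checked against the data as given, while OWL cardinality restrictions are axioms under open-world semantics, so the output is only meant as documentation input. The helper names are hypothetical.

```python
def _typed(count):
    """Wrap a count as an xsd:nonNegativeInteger JSON-LD literal."""
    return {"@type": "xsd:nonNegativeInteger", "@value": count}


def owl_restriction(path_iri, min_count=None, max_count=None):
    """Build a JSON-LD owl:Restriction from SHACL-style counts, or None."""
    restriction = {"@type": "owl:Restriction", "owl:onProperty": {"@id": path_iri}}
    if min_count is not None and min_count == max_count:
        # Equal bounds collapse into an exact cardinality.
        restriction["owl:cardinality"] = _typed(min_count)
    else:
        if min_count:  # a minimum of 0 (or none at all) constrains nothing
            restriction["owl:minCardinality"] = _typed(min_count)
        if max_count is not None:
            restriction["owl:maxCardinality"] = _typed(max_count)
    # Only @type and owl:onProperty present means there is nothing to state.
    return restriction if len(restriction) > 2 else None


# schema:name in ThingShape has sh:minCount 1 and sh:maxCount 1.
print(owl_restriction("schema:name", min_count=1, max_count=1))
```

In an ontology, the class would then typically be declared rdfs:subClassOf such a restriction.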

@tobiasschweizer
Author

tobiasschweizer commented Dec 9, 2021

At the very least, I suppose you'd have to add a shapes extractor in ontospy.py and then a pythonic representation of shapes in entities.py.

Yes, I can have a look if I understand how this could work. Unfortunately at the moment I don't have the time to come up with a PR implementing support for SHACL. But I think I could assist you in specifying this feature and then help with smaller specific tasks.

Maybe you could explain to me how modular this design could be. Is there a way to decouple some things from the Ontospy package so users could use their own config without having to change the source (e.g., templating, custom Python classes for the internal repr.)?

@tobiasschweizer
Author

@lambdamusic Would it be possible to discuss this feature request sometime early next year?

I'd propose the following priorities:

  1. add support for OWL cardinalities first (should not be too hard to do, right? :-)) -> a property's cardinalities are displayed for a class (min, max, exact)
  2. If feasible, we could discuss support for SHACL (partial to full in steps)

I am happy to help and provide some code. However, I need some guidance to make this efficient.

@lambdamusic
Owner

Hi Tobias, thanks for contributing. I'm happy to discuss more, in the new year.

FYI right now there is already an active PR that is intended to add various functionalities for SHACL processing. That might address some of your requirements - so feel free to take a look and comment if you like.

In general, I'd rather finish up integrating that code, before adding more SHACL support.

@tobiasschweizer
Author

@lambdamusic That sounds great. I'll have a look at the PR this week. I wish you a great 2022 :-)

@tobiasschweizer
Author

I installed ontospy-2.0.0a0 and tried with my shapes graph.

I can now see a new category "Property Shapes" or "Shape Properties" along with their cardinalities:
Screenshot 2022-06-13 at 11 26 13

@lambdamusic @ajnelson-nist Overall, this looks great! Thanks a lot for your effort!

I have a few questions:

  • for some reason, the sh:or combinations of (local) ranges do not show under "property shapes"/"shape properties" (see screenshot above), but they do when I look at the ontology definition; see the screenshot below for the example schema:funder.
  • sh:IRI does not show as type / range
  • if sh:minCount is missing, a zero is displayed; if sh:maxCount is missing, a dot is shown. I understand that a missing sh:minCount is the same as 0, but would it maybe make sense to show the dot here as well, meaning not specified? This is a detail and I am not even sure whether there is a semantic difference between setting sh:minCount to 0 and omitting it ...

Screenshot 2022-06-13 at 11 20 36

Please let me know in case I can help with something.

@tobiasschweizer
Author

Side note: rdflib changed its default schema prefix to the HTTPS version, so the HTTP namespace now shows up as schema1 ... see RDFLib/pySHACL#118

@ajnelson-nist
Contributor

@tobiasschweizer - thank you for the kind words.

I have a short answer about sh:or - it's not appearing because @balon and I didn't write the SPARQL-based discovery query to find sh:or usage.

If sh:maxCount is missing, we should probably have an asterisk, not a dot. That was likely a design oversight from me, as I recall other practices about maximum cardinalities in this project use an asterisk.

If sh:minCount is absent, I recall that is semantically equivalent to sh:minCount 0.
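
As a small Python sketch of that display rule (absent sh:minCount shown as 0, absent sh:maxCount shown as an asterisk): the function below is hypothetical and not Ontospy's actual rendering code, and the min..max notation is an assumption chosen for illustration. It accepts a JSON-LD property shape whose counts are either plain integers or typed literals.

```python
def _count(property_shape, key):
    """Read a count that may be a plain int or a JSON-LD typed literal."""
    value = property_shape.get(key)
    if isinstance(value, dict):
        value = value.get("@value")
    return value


def format_cardinality(property_shape):
    """Render a property shape's cardinality as 'min..max'."""
    min_count = _count(property_shape, "sh:minCount")
    max_count = _count(property_shape, "sh:maxCount")
    lower = 0 if min_count is None else min_count    # absent minimum means 0
    upper = "*" if max_count is None else max_count  # absent maximum means unbounded
    return f"{lower}..{upper}"


name_shape = {
    "sh:minCount": {"@type": "xsd:integer", "@value": 1},
    "sh:maxCount": {"@type": "xsd:integer", "@value": 1},
    "sh:path": {"@id": "schema:name"},
}
same_as_shape = {"sh:path": {"@id": "schema:sameAs"}}

print(format_cardinality(name_shape))
print(format_cardinality(same_as_shape))
```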

I think what the SHACL display testing really needs next is a test suite of many small ontologies specific to certain SHACL features. It's not immediately obvious to me what the right presentation is for some cases, e.g. sh:path ( ex:property1 ex:property2 ) ., or sh:targetSubjectsOf.

@lambdamusic - if many small SHACL feature-test files would be helpful, in what directory under ontospy/tests would it be helpful to arrange these?

@tobiasschweizer
Author

If sh:maxCount is missing, we should probably have an asterisk, not a dot.

It is in fact an asterisk as I can see now :-) If I was 20 years younger I'd have seen it right away ;-)

@tobiasschweizer
Author

tobiasschweizer commented Jun 15, 2022

didn't write the SPARQL-based discovery query to find sh:or usage.

Querying lists in RDF seemed quite cumbersome to me, especially when their length is not known.

Let me know if you come up with a good way. Not sure if this may help: http://www.snee.com/bobdc.blog/2014/04/rdf-lists-and-sparql.html (Retrieving all the members). The example applies property path syntax. (I got the author's book many years ago)

@ajnelson-nist
Contributor

I can't think offhand of a better way than what was done in that snee post. For the needs of this repository, a general list function could be added to take a list's head identifier and turn it into a Python list of rdflib.Nodes. That'd be so general there's no reason it couldn't be contributed to rdflib, if they don't have such a function already. Within Ontospy, it would satisfy looking around for sh:ors, sh:ands, and some forms of rdfs:Datatypes based on enumerations.
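
For illustration, such a list function can be sketched in plain Python by following rdf:first and rdf:rest from the head node until rdf:nil. The sketch below works on bare (subject, predicate, object) string tuples rather than rdflib objects, and the name rdf_list_members is hypothetical. To my knowledge rdflib already ships rdflib.collection.Collection for the same purpose, which would be the first thing to check before adding a new helper.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"


def rdf_list_members(triples, head):
    """Return the members of the RDF list starting at `head`, in order."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    members, seen, node = [], set(), head
    while node != RDF_NIL:
        # Guard against cycles and against cells missing rdf:first or rdf:rest.
        if node in seen or node not in first or node not in rest:
            raise ValueError(f"malformed RDF list at {node!r}")
        seen.add(node)
        members.append(first[node])
        node = rest[node]
    return members


# A two-element list, such as the object of an sh:or with two alternatives.
triples = [
    ("_:l1", RDF_FIRST, "_:alt1"),
    ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "_:alt2"),
    ("_:l2", RDF_REST, RDF_NIL),
]
print(rdf_list_members(triples, "_:l1"))
```

Because the walk is iterative, the list length does not need to be known in advance.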

Meanwhile, in your screenshot about a grants shape, it's a non-trivial tree of sh:and's and sh:or's. How would you expect that to be rendered in a gendocs page?

@tobiasschweizer
Author

Meanwhile, in your screenshot about a grants shape, it's a non-trivial tree of sh:and's and sh:or's. How would you expect that to be rendered in a gendocs page?

Thanks for asking!

We are using a system which supports SHACL but not RDF(S) inference. This is why we make use of sh:and in the shape definitions to represent inheritance between schema.org classes.

To build the docs, however, there is an additional ontology file which just hardcodes those class relations. And there is a build process which collects the single SHACL shape definitions and puts them in a graph along with the ontology file. Properties are dynamically filled into the ontology definitions to avoid too much redundancy. See https://github.com/Connectome-Consortium/rescs_shacl_shapes#use-with-standard-tools for more details. (There is even a script which gets rid of sh:and, see https://github.com/Connectome-Consortium/rescs_shacl_shapes/blob/main/scripts/transform_shapes_graph.py but note that I did it the easier JSON way instead of querying lists with SPARQL ...)

In short: because the ontology file states the class relations, the sh:and statements need not be processed for my use case. It would be nice, however, if the sh:or statements were processed and displayed. They are used to express that the range of a property like schema:funder could be either a schema:Organization or a schema:Person (as shown in the second screenshot). We try to follow https://datashapes.org/schema but only implement what we need for our use case.

I am not sure how generic that support for sh:or should be though for gendocs.

@ajnelson-nist
Contributor

I think sh:or and sh:and should be supported generically. It seems from your use case you don't have as pressing a need for sh:and, but I think the "right" answer for how to render either as documentation will be nearly equivalent.

How do you think sh:or should render by itself? I suspect the answer will involve either nesting <table> elements in cells, or something involving extra <tbody> elements.

I suggest a small ontology be used as an initial exemplar. Hopefully the citation pattern inlined here shows the source sufficiently well:

@prefix ex: <http://example.com/ns#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:PersonAddressShape
	a sh:NodeShape ;
	rdfs:seeAlso <https://www.w3.org/TR/shacl/#OrConstraintComponent> ;
	sh:targetClass ex:Person ;
	sh:property [
		sh:path ex:address ;
		sh:or (
			[
				sh:datatype xsd:string ;
			]
			[
				sh:class ex:Address ;
			]
		)
	] .

@tobiasschweizer
Author

It seems from your use case you don't have as pressing a need for sh:and

That's right.

How do you think sh:or should render by itself? I suspect the answer will involve either nesting <table> elements in cells, or something involving extra <tbody> elements.

I have no particular requirements regarding HTML. I think the example above could be rendered like Ontospy already does for cases like

      "schema:rangeIncludes": [
        {
          "@id": "schema:Person"
        },
        {
          "@id": "schema:Organization"
        }
      ]

Screenshot 2022-06-17 at 14 38 01

Screenshot 2022-06-17 at 14 38 22

However, I am not sure how sh:or could be used besides indicating ranges. If it should be supported in a generic manner, we probably need more examples of its possible use (the same applies to sh:and). It could also be used to define alternative paths, see https://www.w3.org/TR/shacl/#OrConstraintComponent. I guess we would not want to go this far, right? So we probably should constrain (note the irony :-)) our coverage of SHACL here.
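
A constrained reading of sh:or along those lines could be sketched as follows in Python. The sketch is an assumption-laden illustration, not Ontospy code: it treats sh:or purely as a list of range alternatives on a JSON-LD property shape, looks only at sh:datatype, sh:class and sh:nodeKind, and ignores nested sh:or as well as node-level sh:or with alternative paths. The function name range_alternatives is hypothetical.

```python
RANGE_KEYS = ("sh:datatype", "sh:class", "sh:nodeKind")


def _direct_ranges(node):
    """Collect the range-like constraints stated directly on one node."""
    return [node[key]["@id"] for key in RANGE_KEYS if key in node]


def range_alternatives(property_shape):
    """List the ranges a property shape allows, including sh:or branches."""
    ranges = _direct_ranges(property_shape)
    branches = property_shape.get("sh:or", {}).get("@list", [])
    for branch in branches:
        ranges.extend(_direct_ranges(branch))
    return ranges


# The schema:identifier property shape from ThingShape, reduced to its range parts.
identifier_shape = {
    "sh:name": "identifier",
    "sh:or": {
        "@list": [
            {"sh:datatype": {"@id": "xsd:string"}},
            {"sh:nodeKind": {"@id": "sh:IRI"}},
        ]
    },
    "sh:path": {"@id": "schema:identifier"},
}

ranges = range_alternatives(identifier_shape)
# Same structure as a multi-valued schema:rangeIncludes.
range_includes = [{"@id": iri} for iri in ranges]
print(range_includes)
```

With this reading, an sh:or over two classes would produce the same two-entry list as the schema:rangeIncludes example above.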

@tobiasschweizer
Author

tobiasschweizer commented Jun 17, 2022

I have just noticed that you referred to https://www.w3.org/TR/shacl/#OrConstraintComponent as well ;-)

I meant this one:

ex:OrConstraintExampleShape
	a sh:NodeShape ;
	sh:targetNode ex:Bob ;
	sh:or (
		[
			sh:path ex:firstName ;
			sh:minCount 1 ;
		]
		[
			sh:path ex:givenName ;
			sh:minCount 1 ;
		]
	) .

@tobiasschweizer
Author

@ajnelson-nist Let me know if I can help with anything.

@tobiasschweizer
Author

I will be on vacation in August and I am looking forward to doing some work here in September :-)

@ajnelson-nist
Contributor

@tobiasschweizer Given scattered summer availabilities, we're probably still at the strategy-level thinking on this.

I think the right strategy is to start systematically exercising SHACL capabilities as they are demonstrated in the SHACL specification document. I borrowed a snippet and included an rdfs:seeAlso as a citation. (It'd probably be better to use something from CITO.) That snippet should probably become a standalone file named according to...looks like section number and then 1-based index may be the best option available.

So, the snippet I excerpted could go into: /ontospy/tests/rdf/shacl/4.6.3-ex-OrConstraintExampleShape.ttl.

And then the test framework could loop through that folder (/ontospy/tests/rdf/shacl/) and run gendocs for each sample. I forget offhand if the test framework would require a folder per sample, but it probably would in order to have somewhere to stash the generated HTML.
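
The folder loop could be sketched like this in Python. It is only an outline under assumptions: the helper name plan_shacl_samples is hypothetical, one output folder per sample is assumed, and the actual gendocs invocation is deliberately left out because the right non-interactive call is not settled here.

```python
import tempfile
from pathlib import Path


def plan_shacl_samples(samples_dir, output_root):
    """Pair each .ttl sample with its own output directory, creating it if needed."""
    plan = []
    for sample in sorted(Path(samples_dir).glob("*.ttl")):
        # One folder per sample, named after the file without its .ttl suffix.
        out_dir = Path(output_root) / sample.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        plan.append((sample, out_dir))
    return plan


# Demonstration in a throwaway directory instead of the real tests folder.
with tempfile.TemporaryDirectory() as tmp:
    samples = Path(tmp) / "shacl"
    samples.mkdir()
    (samples / "4.6.3-ex-OrConstraintExampleShape.ttl").write_text("")
    for sample, out_dir in plan_shacl_samples(samples, Path(tmp) / "out"):
        # A real test would run gendocs on `sample` and write the HTML to `out_dir`.
        print(sample.name, "->", out_dir.name)
```

Sorting the samples keeps the run order stable, which makes failures easier to compare between runs.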

The design work can then be isolated to handle each of the SHACL informative examples. If we're lucky, the designs for each of the samples we need would compose when we start combining features like your chain of sh:and nesting sh:or.

How does that sound as a way forward?

@tobiasschweizer
Author

@ajnelson-nist Thanks for your message.

Yes, that sounds very systematic and allows us to take one step at a time, without getting wound up in SHACL's complexity from the beginning.

In September, we could try working on this and split the tasks a bit once we are sure about the basic setup.

Have a good summer!
