Transformation of trustyuris in sub-names yields unexpected result #6
Yes, I agree that this is somewhat unexpected, but it's all according to design. This transformation takes the base URI ( For In your case, you could just not use the hash character in the initial file, and the output is as you'd expect it to be:
The URIs before the transformation then look a bit strange, but if you generate the trusty version directly, that doesn't really matter. Alternatively you could also use There might be feasible ways to solve your initial problem without URI clashes, but I'm not sure it's worth it at this point...
Thanks for the clarification!
in a file myresource.trig
yields
Here, the external references The remedy would be to replace This would also address the initial issue:
The assumption is that you are in control of the base URI like While writing this, I see that you could apply a similar argument for removing the But there can in any case be different rules and procedures for this that all lead to valid trusty URIs and associated resources.

So this is not about the core of the structure and validity of trusty URIs; these are just practical matters of how to generate them. In other words, we could add parameters to select different strategies for producing trusty URIs, without the need to touch the checking of trusty URIs at all. A simple one could be to only transform the base URI and nothing else. Another one could be the one you sketched.

So, I guess in the end the question is: How important is this for you at this point? :) If it's important, we could work on adding these parameters to achieve different transformations.
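To make the difference between such strategies concrete, here is a minimal sketch of two of them. It assumes, based only on the description in this thread, that the current behaviour rewrites every URI that starts with the base URI. The function names, the example URIs and the `.HASH` suffix are all hypothetical placeholders and not the library's actual API or output format.

```python
def transform_by_prefix(uri: str, base: str, trusty_base: str) -> str:
    """Rewrite every URI that starts with the base URI (assumed current behaviour)."""
    if uri.startswith(base):
        return trusty_base + uri[len(base):]
    return uri


def transform_base_only(uri: str, base: str, trusty_base: str) -> str:
    """Rewrite only the base URI itself and leave every other URI untouched."""
    return trusty_base if uri == base else uri


# Hypothetical pre-trusty base URI and a placeholder for its trusty version.
BASE = "http://example.org/myresource"
TRUSTY = "http://example.org/myresource.HASH"  # placeholder, not a real artifact code

uris = [BASE, BASE + "#graph1", "http://example.org/myresource2"]

by_prefix = [transform_by_prefix(u, BASE, TRUSTY) for u in uris]
base_only = [transform_base_only(u, BASE, TRUSTY) for u in uris]

for original, a, b in zip(uris, by_prefix, base_only):
    print(f"{original}\n  prefix strategy:    {a}\n  base-only strategy: {b}")
```

With the prefix strategy, the unrelated `http://example.org/myresource2` is rewritten as well, because it happens to start with the base URI. The base-only strategy leaves it alone, but it also leaves `#graph1` untouched, so a real implementation would probably sit somewhere between the two.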
First off, I guess the issue here has never come up in the relevant scenarios so far, so it's really not a big thing. On the other hand, I also doubt that any application relies on the behaviour as it is now. If that's the case, it could just be changed. However, in its current form it's equivalent to a tacit rule saying: "You have to make sure none of your URIs is the prefix of any other" (or we can guarantee nothing), and that's a tough thing to check. DBpedia, for example, would not be able to use trustyURIs in their current form. Consider:
The links to Episode 1 would be broken in the Star Wars content. While some people might actually be quite ok with that ;-), it illustrates that these collisions happen, probably more likely so for systems in which URIs carry semantics. It's fair to say that DBpedia controls the URI space, but each URI depends on user input and they would first have to do a prefix scan of all their URIs when minting a new one to avoid the issue.

EDIT: they could use a special terminator char. That would definitely be simpler, but also weird.

I'm thinking about using trustyURIs for the webofneeds project, as you once proposed. In experimenting with it, I stumbled upon a few things, like the issue here, and also the complication with SHACL being incompatible with skolemization, which is mandatory for trustyURI content. If trustyURI were used for WoN, I'd need it without skolemization and with the prefix replacement as described above. As I haven't made up my mind yet, I wouldn't want to cause any work on your side, though.
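For what it's worth, the prefix scan mentioned above could look roughly like the following. This is only a sketch: `prefix_collisions` is a made-up helper, the URIs are hypothetical stand-ins under example.org, and the brute-force comparison is chosen for readability, not for a dataset of DBpedia's size.

```python
from itertools import permutations


def prefix_collisions(uris):
    """Return (shorter, longer) pairs where one URI is a strict prefix of another."""
    unique = sorted(set(uris))
    # Entries in `unique` are distinct, so startswith() implies a strict prefix.
    return [(a, b) for a, b in permutations(unique, 2) if b.startswith(a)]


# Hypothetical URIs, loosely modelled on the Star Wars example.
uris = [
    "http://example.org/resource/Star_Wars",
    "http://example.org/resource/Star_Wars_Episode_I",
    "http://example.org/resource/Yoda",
]

for shorter, longer in prefix_collisions(uris):
    print(f"{shorter} is a prefix of {longer}")
```

The check is quadratic in the number of URIs, which is part of why having to do this before minting every new URI is unattractive.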
The important point here is that whatever your base URI is, it's a temporary one anyway that will be transformed into the new trusty URI. So a new URI is minted, and there is no reason why the pre-trusty base URI should correspond to anything that is out there already. So I could add more features to control which URIs are transformed and how. But skolemization will stay. Trusty URIs won't support unskolemized blank nodes (well, I am open to being convinced otherwise, but I don't see how that could happen at the moment). Skolemization is just the cleanest way (theoretically and practically) to deal with blank nodes.
Concerning the skolemization: maybe our use case is a little different. I can think of reasons for skolemization in other use cases, for example, if you have an application that must be able to address any piece of information unambiguously and with minimal effort - that's just not our concern. For our use case, we just want the self-references to be "trusty" and that the trusty URI can be verified. And, it seems, we need the blank nodes for SHACL. I guess making skolemization optional would incur a performance hit because you'd have to do the skolemization when verifying, and you probably don't want that for the existing applications, but maybe this would warrant another module for as-is RDF content except for self-references?
The main problem with blank nodes is not addressability but graph normalization, which is intractable in the general case when blank nodes are involved. So one would have to use a normalization algorithm that works reasonably well on graphs found in real data, but it will break (i.e. not terminate) for certain inputs.
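A small illustration of why blank nodes get in the way of hashing. This is a simplified sketch and not how the trusty URI library computes its hashes: triples are plain string tuples, `naive_digest` and `skolemize` are hypothetical helpers, and the skolem URI scheme is made up for the example.

```python
import hashlib


def naive_digest(triples):
    """Hash the sorted triples as-is, without any blank-node canonicalization."""
    lines = sorted(" ".join(t) for t in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def skolemize(triples, base):
    """Replace blank-node labels ('_:x') by URIs under `base` (hypothetical scheme)."""
    return [
        tuple(base + term[2:] if term.startswith("_:") else term for term in t)
        for t in triples
    ]


# Two descriptions of the same graph that differ only in the blank-node label.
g1 = [("ex:a", "ex:knows", "_:x"), ("_:x", "ex:name", '"Bob"')]
g2 = [("ex:a", "ex:knows", "_:y"), ("_:y", "ex:name", '"Bob"')]

# The naive digest depends on the arbitrary label, so the hashes differ.
print(naive_digest(g1) == naive_digest(g2))

# After skolemization the names are fixed URIs, so sorting the triples is enough.
s1 = skolemize(g1, "http://example.org/genid/")
print(naive_digest(s1) == naive_digest(list(reversed(s1))))
```

Making the first comparison come out equal without skolemizing requires finding a canonical labelling of the blank nodes, which is the normalization problem described above.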
Transforming the following content:
in a file myresource.trig
using
yields
I'd expect the graph names and objects to be sub:graph[12] and sub:part[12]. The result is not incorrect, just not nicely readable and probably not what is intended.