RDF/XML Literals not being parsed properly #75
In the `this.parseDOM()` function, changing:

```javascript
var nv = parsetype.nodeValue;
if (nv === "Literal") {
  frame.datatype = RDFParser.ns.RDF + "XMLLiteral";
  // (this.buildFrame(frame)).addLiteral(dom) // should work but doesn't
  frame = this.buildFrame(frame);
  frame.addLiteral(dom);
  dig = false;
}
```

to:

```javascript
var nv = parsetype.nodeValue;
if (nv === "Literal") {
  frame.datatype = RDFParser.ns.RDF + "XMLLiteral";
  // (this.buildFrame(frame)).addLiteral(dom) // should work but doesn't
  frame = this.buildFrame(frame);
  frame.addLiteral(dom.lastChild.nodeValue);
  dig = false;
}
```

seems to work, since it gets the actual content of the literal node. Might this break something else?
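To see where `dom.lastChild.nodeValue` could break other input, here is a small sketch using hypothetical plain objects that mimic the DOM `childNodes` / `lastChild` / `nodeValue` shape (not a real DOM). For a property whose only child is text, reading the last child agrees with reading all text; with mixed content it keeps only the final node, and yields `null` when that node is an element.

```javascript
// Hypothetical stand-ins for DOM nodes: nodeType 3 = text, 1 = element.
function text(value) {
  return { nodeType: 3, nodeValue: value };
}
function element(name, children) {
  // As in the DOM, element nodes have a null nodeValue.
  return { nodeType: 1, nodeName: name, nodeValue: null, childNodes: children };
}
function property(children) {
  const el = element("dcterms:title", children);
  el.lastChild = children.length ? children[children.length - 1] : null;
  return el;
}

// What the patch does: read only the last child's value.
function lastChildValue(dom) {
  return dom.lastChild ? dom.lastChild.nodeValue : null;
}

// Concatenate every descendant text node (still drops the markup itself).
function allText(node) {
  if (node.nodeType === 3) return node.nodeValue;
  return (node.childNodes || []).map(allText).join("");
}

const simple = property([text("JKE Banking (Change Management)")]);
const mixed = property([text("JKE "), element("b", [text("Banking")])]);
```

Neither function preserves embedded markup, which is what an `rdf:XMLLiteral` is meant to carry, so serializing the child nodes is the more faithful direction.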
I didn't mean to close the issue.
It appears the datatype is incorrect:

```
{
  subject: {
    uri: 'https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/contexts/_pMhMgPsWEeSnQvDHoYok5w/workitems/services.xml',
    value: 'https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/contexts/_pMhMgPsWEeSnQvDHoYok5w/workitems/services.xml'
  },
  predicate: {
    uri: 'http://purl.org/dc/terms/title',
    value: 'http://purl.org/dc/terms/title'
  },
  object: { value: 'JKE Banking (Change Management)', lang: '', datatype: [Object] },
  why: {
    uri: 'https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/workitems/catalog',
    value: 'https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/workitems/catalog'
  }
}
```

Should it be `{ value: 'JKE Banking (Change Management)', lang: undefined, datatype: undefined }`, or somehow a string? Or am I doing this query incorrectly:

```javascript
var sp = this.catalog.statementsMatching(undefined, DCTERMS('title'), 'JKE Banking (Change Management)');
```

Does the string literal object need to be wrapped in `this.catalog.literal`? I tried that too and it still didn't match; I noticed that wrapping the string as a literal leaves the datatype undefined, as shown above.
I'm making some progress. The `addLiteral` function of the RDFParser frameFactory adds the datatype, so the query has to supply it as well:

```javascript
var sp = this.catalog.statementsMatching(
  undefined,
  DCTERMS('title'),
  this.catalog.literal(
    'JKE Banking (Change Management)',
    undefined,
    this.catalog.sym('http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral')));
```

This doesn't seem to match the documentation, which says you should be able to just use a JavaScript string. Is this a bug, or does it work as intended and I have to create these literals with the symbol datatype?
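Why the plain-string query misses: in RDF, two literals are the same term only if their lexical value, language tag and datatype all match, so `"JKE Banking (Change Management)"` as an `xsd:string` is a different term from the same text typed as `rdf:XMLLiteral`. The sketch below models literals as plain objects; `makeLiteral` and `sameLiteral` are hypothetical helpers, not rdflib.js API, and rdflib's own term comparison may differ in detail.

```javascript
const RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
const XSD_NS = "http://www.w3.org/2001/XMLSchema#";

// Hypothetical literal model: lexical value, language tag, datatype IRI.
function makeLiteral(value, lang, datatype) {
  // In RDF 1.1 a literal with neither datatype nor language is an xsd:string;
  // a language-tagged literal is an rdf:langString.
  const dt = datatype || (lang ? RDF_NS + "langString" : XSD_NS + "string");
  return { value: value, lang: lang || "", datatype: dt };
}

// Two literals denote the same RDF term only if all three parts are equal.
function sameLiteral(a, b) {
  return a.value === b.value && a.lang === b.lang && a.datatype === b.datatype;
}

const title = "JKE Banking (Change Management)";
const parsed = makeLiteral(title, undefined, RDF_NS + "XMLLiteral"); // from parseType="Literal"
const plainQuery = makeLiteral(title); // plain string in the query
const typedQuery = makeLiteral(title, undefined, RDF_NS + "XMLLiteral");
```

This is consistent with the observation above: supplying the `rdf:XMLLiteral` datatype matches, while a bare string does not.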
The `rdf:parseType="Literal"` syntax in RDF/XML is for quoting pieces of embedded XML literally. I think you probably just want strings. If you just leave out `parseType="Literal"`, I suspect you will get the strings you want.
Unfortunately I don't control the RDF/XML source; it's from the Rational Team Concert OSLC Service Provider Catalog. So I may just have to deal with RTC's quirk in how it expresses `dcterms:title`. That's no problem. However, isn't there still an issue? The RDF/XML source is:

```xml
<oslc:serviceProvider>
  <oslc:ServiceProvider rdf:about="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/oslc/contexts/_pMhMgPsWEeSnQvDHoYok5w/workitems/services.xml">
    <dcterms:title rdf:parseType="Literal">JKE Banking (Change Management)</dcterms:title>
    <oslc:details rdf:resource="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/process/project-areas/_pMhMgPsWEeSnQvDHoYok5w"/>
    <jfs_proc:supportLinkDiscoveryViaLinkIndexProvider rdf:parseType="Literal">false</jfs_proc:supportLinkDiscoveryViaLinkIndexProvider>
    <jfs_proc:supportContributionsToLinkIndexProvider rdf:parseType="Literal">true</jfs_proc:supportContributionsToLinkIndexProvider>
    <jfs_proc:globalConfigurationAware rdf:parseType="Literal">compatible</jfs_proc:globalConfigurationAware>
    <jfs_proc:consumerRegistry rdf:resource="https://oslclnx2.rtp.raleigh.ibm.com:9443/ccm/process/project-areas/_pMhMgPsWEeSnQvDHoYok5w/links"/>
  </oslc:ServiceProvider>
</oslc:serviceProvider>
```

It seems the value of this property should be an XMLLiteral, but it should include only the value, `JKE Banking (Change Management)` (is this even valid XML?), not the property element itself: `<dcterms:title rdf:parseType="Literal">JKE Banking (Change Management)</dcterms:title>`.
I think my patch above is incorrect. For Literal nodes, the `this.parseDOM()` function:

```javascript
var nv = parsetype.nodeValue;
if (nv === "Literal") {
  frame.datatype = RDFParser.ns.RDF + "XMLLiteral";
  // (this.buildFrame(frame)).addLiteral(dom) // should work but doesn't
  frame = this.buildFrame(frame);
  frame.addLiteral(dom);
  dig = false;
}
```

should normalize the children of the Literal property (so that `===` on embedded XML works consistently regardless of ordering) and use an XML serializer to create the node's value, which should be XML source, not a parsed DOM. I see similar code in the RDFa parser. If this is correct, I can submit a fix.
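A minimal sketch of what the comment above proposes, assuming hypothetical plain-object nodes rather than a real DOM: serialize only the children of the `rdf:parseType="Literal"` property (not the property element itself), escape text and attribute values, and sort attributes by name as one possible reading of "normalize ... regardless of ordering". This is not full XML canonicalization (it ignores namespaces, comments and CDATA); a real fix would more likely use the environment's `XMLSerializer`.

```javascript
// Hypothetical node shapes: text = { nodeType: 3, nodeValue },
// element = { nodeType: 1, nodeName, attributes: [{ name, value }], childNodes: [...] }.

function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}
function escapeAttr(s) {
  return escapeText(s).replace(/"/g, "&quot;");
}

function serializeNode(node) {
  if (node.nodeType === 3) return escapeText(node.nodeValue);
  // Sort attributes by name so equivalent markup serializes identically.
  const attrs = (node.attributes || [])
    .slice()
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((a) => " " + a.name + '="' + escapeAttr(a.value) + '"')
    .join("");
  const inner = (node.childNodes || []).map(serializeNode).join("");
  return "<" + node.nodeName + attrs + ">" + inner + "</" + node.nodeName + ">";
}

// Value of a parseType="Literal" property: its children, not the element itself.
function xmlLiteralValue(propertyElement) {
  return (propertyElement.childNodes || []).map(serializeNode).join("");
}

const titleElement = {
  nodeType: 1,
  nodeName: "dcterms:title",
  attributes: [{ name: "rdf:parseType", value: "Literal" }],
  childNodes: [{ nodeType: 3, nodeValue: "JKE Banking (Change Management)" }],
};
```

For the RTC title this yields just `JKE Banking (Change Management)`, and two literals whose embedded elements differ only in attribute order compare equal with `===`.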
Interesting, I have a problem here in May 2016 with Jim's oslc-client being unable to find Service Providers because the statementsMatching method is not finding XMLLiterals that contain the sought CCM Project Name (name only). I wonder if rdflib.js evolved while Jim's OSLC4JS example has not. |
My patch for XMLLiterals has not been merged into rdflib.js yet.
Because the find-the-service-provider-by-name method is only looking for a string in what is likely to be a fairly small set of titles, we could refactor it to retrieve all statements with a `dcterms:title` predicate and then use a simple JS or Lodash collection filter to pick out the pattern `(.*)${serviceProviderTitle}(.*)`. That may be good enough, compared with trying to get the rdflib store (`this.catalog`) to recognize our particular literal value string. What do you think?
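The following, plus the addition of lodash and escape-string-regexp, allows the method to find the statement that relates the subject URI to the literal title for the sought `serviceProviderTitle`:

```javascript
const _ = require('lodash');
// escape-string-regexp v4 and earlier can be loaded with require(); v5+ is ESM-only
const escapeStringRegexp = require('escape-string-regexp');

// Fetch every statement whose predicate is dcterms:title, whatever its subject or object
// (DCTERMS is an rdflib Namespace for http://purl.org/dc/terms/)
var haveTitle = this.catalog.statementsMatching(undefined, DCTERMS('title'), undefined);

// Look for the sought title anywhere in the literal's value, with regex metacharacters escaped
const regex = new RegExp('.*?' + escapeStringRegexp(serviceProviderTitle) + '.*?');
var sp = _.filter(haveTitle, (s) => regex.test(s.object.value));
```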
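The same search can be sketched without lodash or escape-string-regexp: the pattern only asks whether the title occurs somewhere in the literal's value, so `String.prototype.includes` does the job with no escaping at all. The statements below are hypothetical plain objects shaped like the `{ subject, predicate, object }` entries printed earlier in the thread, and the `example.org` URIs are made up; with rdflib you would pass the array returned by `statementsMatching` instead.

```javascript
const DCTERMS_TITLE = "http://purl.org/dc/terms/title";

// Keep title statements whose object value contains the sought title (no regex needed).
function findByTitle(statements, serviceProviderTitle) {
  return statements.filter(
    (s) =>
      s.predicate.value === DCTERMS_TITLE &&
      s.object.value.includes(serviceProviderTitle)
  );
}

const statements = [
  {
    subject: { value: "https://example.org/sp/1" },
    predicate: { value: DCTERMS_TITLE },
    object: { value: "JKE Banking (Change Management)" },
  },
  {
    subject: { value: "https://example.org/sp/2" },
    predicate: { value: DCTERMS_TITLE },
    object: { value: "<p>Other Project</p>" },
  },
];
```

Like the regex version, substring matching also accepts titles that merely contain the sought name (for example, a project whose name is a prefix of another's), so an exact comparison on the extracted text may be safer when names overlap.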
@jamsden Probably an even easier fix, without introducing a new dependency:
`frame.addLiteral(dom.childNodes)` does indeed work. A DOM such as: JKE Banking (Change Management) Another paragraph And another paragraph </dcterms:title> would parse as the following literal string of XML source: JKE Banking (Change Management) Another paragraph And another paragraph. So this becomes a one-line code change. I'll implement it in my fork, test it, and create a pull request. There is about to be a lot of use of rdflib.js in developing OSLC integrations. This defect is a showstopper, however, since OSLC makes heavy use of `parseType="Literal"`.
This change does not behave nicely in the browser. The browser's DOMParser handles serialization of NodeLists differently than the library used for Node.js. In the browser, objects get serialized as I would propose that the line
would be better as
This both serializes the inner content and preserves its XML content, as required by I'm a little fuzzy on how Node.js handles this. I assume Issue verified in:
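One way to avoid depending on how a given DOM implementation stringifies a `NodeList` (in browsers, `String(nodeList)` gives `"[object NodeList]"`) is to serialize each child explicitly and join the results. In this sketch the per-node serializer is passed in as a parameter: in a browser it could be `(n) => new XMLSerializer().serializeToString(n)`, and the Node.js DOM library presumably offers an equivalent. `innerXml` and `toySerialize` are hypothetical names, not rdflib.js API.

```javascript
// Serialize the children of `node` one at a time and concatenate them.
// `serialize` maps a single node to its XML source string.
function innerXml(node, serialize) {
  // Array.from accepts real NodeLists and plain arrays alike; the arrow
  // keeps Array.from's index argument from reaching `serialize`.
  return Array.from(node.childNodes, (child) => serialize(child)).join("");
}

// Stand-in serializer for plain-object nodes (no escaping; demonstration only).
function toySerialize(n) {
  return n.nodeType === 3 ? n.nodeValue : "<" + n.nodeName + "/>";
}

const prop = {
  childNodes: [
    { nodeType: 3, nodeValue: "JKE Banking " },
    { nodeType: 1, nodeName: "br" },
  ],
};
```

Keeping the serializer outside the helper means the same join logic runs unchanged in the browser and in Node.js; only the injected function differs.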
https://forum.solidproject.org/t/errors-parsing-xml-with-rdflib-js-in-the-browser/448
We are facing the same issue. Is it possible to get that fixed, or do you have any workarounds? Thanks
Is this solving your issue? Or are there other issues?
Hi @bourgeoa, let me try a new version, and if that doesn't work I will come back with more information about the issue and some test data. Thank you
@bourgeoa, we had the same issue as this bug in an implementation of the OSLC AM V3 specification using
@bourgeoa, this fix is not in
Merged in [email protected]
Thanks @bourgeoa! Confirmed |
Given some RDF/XML that contains:
and a query such as:

```javascript
someKb.the(aServiceProvider, DCTERMS('title'));
```

returns:
instead of the text. Am I missing something, or is the `dcterms:title` being parsed incorrectly?
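Until a parser fix is available, a client could post-process the returned value. A minimal sketch, assuming the literal's `value` holds serialized XML when its datatype is `rdf:XMLLiteral`, and that `datatype` is an object with a `value` IRI (as rdflib terms appear to be): strip the tags and decode the five predefined XML entities. `literalText` is a hypothetical helper; regex tag stripping is only adequate for simple markup like the titles in this thread (it ignores CDATA, comments, numeric character references, and `>` inside attribute values).

```javascript
const XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral";

// Decode the five predefined XML entities; &amp; last so "&amp;lt;" becomes "&lt;".
function decodeEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Return plain text for a literal-like object { value, datatype: { value } }.
function literalText(literal) {
  const dt = literal.datatype && literal.datatype.value;
  if (dt !== XML_LITERAL) return literal.value; // not an XMLLiteral: leave untouched
  return decodeEntities(literal.value.replace(/<[^>]*>/g, ""));
}
```

Applied to a value that still contains the `<dcterms:title rdf:parseType="Literal">` wrapper, this returns only the title text.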