Attribution: XML/RDF/Turtle please. #4430

Closed
midijohnny opened this issue Jun 3, 2024 · 16 comments · Fixed by #4499
Labels
💻 aspect: code Concerns the software code in the repository 🕹 aspect: interface Concerns end-users' experience with the software 🌟 goal: addition Addition of new feature 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: api Related to the Django API 🧱 stack: frontend Related to the Nuxt frontend

Comments

@midijohnny

It would be handy if there were some additional formats for the "Credit the creator" section.
In particular, I would suggest that it should at least include a simple, well-formed XML record.

Better: one that corresponds to the Dublin Core specification: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/

Or RDF in general, including a 'Turtle format'.

(Although publishing in Dublin Core XML format would probably be enough for others to automatically translate this to other forms of RDF.)

I would suggest this would also encourage more compliance with attribution, since it would be easier for the author to automatically credit creators.

@midijohnny midijohnny added ✨ goal: improvement Improvement to an existing user-facing feature 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work labels Jun 3, 2024
@openverse-bot openverse-bot moved this to 📋 Backlog in Openverse Backlog Jun 3, 2024
@dhruvkb dhruvkb added 🟩 priority: low Low priority and doesn't need to be rushed 🌟 goal: addition Addition of new feature 💻 aspect: code Concerns the software code in the repository 🕹 aspect: interface Concerns end-users' experience with the software 🧱 stack: api Related to the Django API and removed 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work ✨ goal: improvement Improvement to an existing user-facing feature labels Jun 3, 2024
@obulat obulat added the 🧱 stack: frontend Related to the Nuxt frontend label Jun 3, 2024
@obulat
Contributor

obulat commented Jun 3, 2024

I added "frontend" label because I think this refers to the frontend single result page's "Credit the creator" section, not the API's "attribution" property.

@dhruvkb
Member

dhruvkb commented Jun 5, 2024

Attribution formats like XML should be supported by the API as well imo, so having both is 👌 .

@sarayourfriend
Collaborator

DC sounds great! CC REL already uses DC terms, and the rich-text/HTML version of the attribution would be relatively easy to translate into a DC XML fragment.

@madewithkode
Collaborator

I'd love to take this on.
Quick question off the top of my head: should the generation of the XML attributions be done at the API level or on the frontend?

@sarayourfriend
Collaborator

Currently all frontend attribution generation happens in JavaScript: https://github.com/WordPress/openverse/blob/main/frontend/src/utils/attribution-html.ts. The Python openverse-attribution package also exists, but we can back-port this feature to there later on, if it's needed. For now, just add it to the frontend.

The frontend's attribution-html module generates the HTML for each type of attribution. Rich text is the same as the HTML, but we render the HTML directly, rather than displaying the HTML as code to copy. Plain text is the same, but without any markup.

The XML snippet should just be another option of output. You can use the existing methods for generating HTML to generate the XML.

Are you familiar with DC or RDF @madewithkode? There are a lot of resources online about both, but DublinCore's own documentation tends to be the best, and here's their documentation about RDF/XML specifically: https://www.dublincore.org/specifications/dublin-core/usageguide/#rdfxml and https://www.dublincore.org/specifications/dublin-core/dc-xml-guidelines/

The snippet there already gives a good idea of how to add the parts we'd need, it's essentially 1:1 with that, except we'd also populate dc:rights. Something like this, using https://openverse.org/image/feb91b13-422d-46fa-8ef4-cbf1e6ddee9b?q=galah as an example:

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:dc="http://purl.org/dc/elements/1.1/">

   <rdf:Description rdf:about="https://www.flickr.com/photos/126953422@N04/40593461235">

      <dc:creator>Graham Winterflood</dc:creator>
      <dc:title>Galah in Darwin (Eolophus roseicapilla)</dc:title>
      <dc:rights>"Galah in Darwin (Eolophus roseicapilla)" by Graham Winterflood is licensed under CC BY-SA 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/2.0/?ref=openverse.</dc:rights>

   </rdf:Description> 
</rdf:RDF>

That interprets dc:rights as the broadest possible rights statement, and makes things relatively "uncomplicated" for us, when it comes to deciding how to represent CC with just DC. If we want to bring in CC REL, that's a separate story. I believe we could offer that, but if we want just the most basic RDF representation with just DC, this is probably it. Users can edit down dc:rights to whatever makes sense for their use case. This also has us ignoring a bunch of DC's recommendations for how to format DC XML, including not using DC (with XSI) to designate the type of resource, the type of resource identifier, and more detailed information about the rights statement.

However, I think we shouldn't create the full RDF XML, and instead, just offer the DC elements as XML (and we could follow this up by offering different formats like Turtle or JSON-LD in the future, as separate issues). So then, we'd just have a copyable snippet, with some explanatory text. Maybe like this:

<dc:creator>Graham Winterflood</dc:creator>
<dc:title>Galah in Darwin (Eolophus roseicapilla)</dc:title>
<dc:rights>"Galah in Darwin (Eolophus roseicapilla)" by Graham Winterflood is licensed under CC BY-SA 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/2.0/?ref=openverse.</dc:rights>
<dc:identifier>https://www.flickr.com/photos/126953422@N04/40593461235</dc:identifier>
<dc:type>StillImage</dc:type>

dc:type should be Sound for audio.
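
As a rough illustration of the snippet above, here is a minimal TypeScript sketch of how the five DC elements could be generated. Everything named here (`MediaAttribution`, `escapeXml`, `DC_TYPES`, `dcElements`) is hypothetical and not taken from the Openverse codebase; the only parts that come from the thread are the element names, their order, and the `StillImage`/`Sound` values.

```typescript
// Hypothetical input shape; the real Openverse media type may differ.
interface MediaAttribution {
  title: string
  creator: string
  rights: string
  identifier: string
  mediaType: "image" | "audio"
}

// Escape the characters that must not appear raw in XML text content.
const escapeXml = (text: string): string =>
  text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")

// dc:type values proposed in the thread: StillImage for images, Sound for audio.
const DC_TYPES: Record<MediaAttribution["mediaType"], string> = {
  image: "StillImage",
  audio: "Sound",
}

// Build one <dc:...> element per field, in the order used in the thread.
function dcElements(media: MediaAttribution): string[] {
  const fields: [string, string][] = [
    ["creator", media.creator],
    ["title", media.title],
    ["rights", media.rights],
    ["identifier", media.identifier],
    ["type", DC_TYPES[media.mediaType]],
  ]
  return fields.map(
    ([name, value]) => `<dc:${name}>${escapeXml(value)}</dc:${name}>`
  )
}

const galah: MediaAttribution = {
  title: "Galah in Darwin (Eolophus roseicapilla)",
  creator: "Graham Winterflood",
  rights:
    '"Galah in Darwin (Eolophus roseicapilla)" by Graham Winterflood is licensed under CC BY-SA 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/2.0/?ref=openverse.',
  identifier: "https://www.flickr.com/photos/126953422@N04/40593461235",
  mediaType: "image",
}

console.log(dcElements(galah).join("\n"))
```

Escaping is included because titles and creator names are free text and may contain `&` or `<`, which would otherwise make the snippet ill-formed.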

It can be that simple, if we like. @midijohnny please let me know if I've got this wrong... I'm basing this on just 6 months of Library and Information Services courses I took recently, and only did a small amount of DC, but never anything in XML.

I don't think we should try to use DCMI terms (like implementing RightsStatements) because ultimately DC is so flexible, every institution or system realistically has its own approach to how they want to use it. Listing the DC terms like this as an XML snippet is my guess at the most flexible version of what we could do here.

@openverse-bot openverse-bot moved this from 📋 Backlog to 📅 To Do in Openverse Backlog Jun 11, 2024
@sarayourfriend
Collaborator

Assigned to you, @madewithkode, but it's probably a good idea to wait for @midijohnny to give more input before going too strongly in one direction (snippet, full RDF, which terms to use, etc). I do think it's best to stick with just an XML snippet for this first issue.

@madewithkode
Collaborator

madewithkode commented Jun 11, 2024

I agree with you @sarayourfriend, any more specific details regarding what's required would be appreciated. And thank you for the really in-depth insights on this topic. I'll be sure to check out the resources you shared, as I do not have any prior experience with the other markups/specifications being discussed aside from XML. I'll stand by on this a bit to see if @midijohnny has anything more to add before getting started.

@midijohnny
Author

Great discussion! I'm not an expert in RDF or Dublin Core either, but I would say the example above ("Maybe like this...") is going to be good enough, with one minor alteration: include a root element with a namespace identifier.
That way, we would have a well-formed XML document in a specific namespace.

So something like:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
	<dc:creator>Graham Winterflood</dc:creator>
	<dc:title>Galah in Darwin (Eolophus roseicapilla)</dc:title>
	<dc:rights>"Galah in Darwin (Eolophus roseicapilla)" by Graham Winterflood is licensed under CC BY-SA 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/2.0/?ref=openverse.</dc:rights>
	<dc:identifier>https://www.flickr.com/photos/126953422@N04/40593461235</dc:identifier>
	<dc:type>StillImage</dc:type>
</metadata>

It doesn't have to be 'metadata' - it could be (say) 'attribution' or whatever you think is best.

Having a well-formed document like this - with the namespace included (so people can look up the vocabulary based on the namespace) would provide a large benefit I think.

It means (for instance) somebody downstream can build an XSLT to transform this to what suits them.
You could even consider using this (or something similar) as the 'base' information and use XSLT to transform to the HTML/plain-text format to be displayed on the website - but that is just a suggestion.

For my purposes: I was collecting images to display in an XHTML (i.e. well-formed XML) environment, so if I had the format above it would have made my life easier.
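
The root-element wrapping suggested in this comment could be sketched roughly as follows. `wrapInRoot` and `DC_NAMESPACE` are hypothetical names, and the default root name `attribution` is just one of the options floated above, not a decision from the thread. The function assumes the elements it receives are already well-formed and escaped.

```typescript
// Namespace URI for the Dublin Core element set, as used in the thread's examples.
const DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"

// Wrap pre-built <dc:...> elements in a single root element that declares
// the dc prefix, so the result is a well-formed XML document.
function wrapInRoot(elements: string[], rootName = "attribution"): string {
  const body = elements.map((element) => `\t${element}`).join("\n")
  return `<${rootName} xmlns:dc="${DC_NAMESPACE}">\n${body}\n</${rootName}>`
}

const wrapped = wrapInRoot(
  [
    "<dc:creator>Graham Winterflood</dc:creator>",
    "<dc:type>StillImage</dc:type>",
  ],
  "metadata"
)

console.log(wrapped)
```

Declaring the namespace on the root is what lets a downstream XSLT or XML parser resolve the `dc:` prefix; without it, the bare elements are not namespace-well-formed on their own.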

@midijohnny
Author

For additional context - here's why I logged the original request.
I was building a small example that needed some sample images and I wanted to make sure I displayed the attribution (of course). I had to build my own representation in a file, images.xml, but if the original attribution information had already been available in a relatively simple well-formed document, I would have been able to use that (perhaps with minor edits) straight away.

@sarayourfriend
Collaborator

Perfect, thanks very much @midijohnny! I was wondering how best to include the namespace, that looks great. And makes things more flexible for the future if we want to implement CC REL.

@madewithkode how do you feel about starting on this, when you have time? Do you feel you have enough to go on to get started?

@madewithkode
Collaborator

Sorry I'm late guys, been battling the flu. Really great insights and extra context, @midijohnny.
@sarayourfriend sure, I should be able to start off something with the information at hand, once I'm fully back.

@sarayourfriend
Collaborator

sarayourfriend commented Jun 13, 2024

No worries at all, take your time and get well soon! There's no rush or pressure with this.

@openverse-bot openverse-bot moved this from 📅 To Do to 🏗 In Progress in Openverse Backlog Jun 17, 2024
@zackkrida
Member

zackkrida commented Jun 18, 2024

I wanted to share some prior art here concerning XML. The Dublin Core we're adding in #4499 looks good. I also remembered today that Creative Commons' own License Chooser offers Extensible Metadata Platform (XMP) format, which is XML in a .xmp file.

Fun fact: @obulat implemented it a few years ago in this PR: creativecommons/chooser#272. A small change was made to that implementation shortly after.

I wonder if we should support that format as well?

@sarayourfriend
Collaborator

What's the use case for downloading an XMP snippet? Wouldn't you use your image editor software to add that data, either embedded as an EXIF extension or as a sidecar file? I didn't know it could (or would) be used for attribution, I'm only familiar with it for use to describe the immediate work. I guess if you can add arbitrary additional metadata, attributions would just go there? I'm not sure how you would structure attributions in DC. An array of RightsStatements?

@zackkrida
Member

The only (possible) use case I can think of is in the context of remixing works, where you might want to store or modify the original XMP for your new, derived work.

Even that feels somewhat contrived though. Probably best to wait on that until someone with a clear use case asks for it, as happened with Dublin Core here! 😄

@midijohnny
Author

midijohnny commented Sep 25, 2024

That XML tab looks really good - nice one, should be a good benefit for people - it would have saved me time and effort for sure when I was building my little demo app!


6 participants