Introduction wrongly emphasizes file-based data; other types of data entity are also valid #125

Closed
CaroleGoble opened this issue Jan 12, 2021 · 10 comments · Fixed by #127

@CaroleGoble
Contributor

Describe the bug
"RO-Crate method: “organizing file-based data with associated metadata, using linked data principles, in both human and machine readable formats, with the ability to include additional domain-specific metadata”"

This emphasises the file approach, not the references. Without references it cannot support FDO and many other use cases.
That quote means that anyone not using files and not using data will think RO-Crate is inappropriate. I can point to an entry in a database - that is not file-based data - it in Github. Linked Data principles is too obtuse - references need to be explicit.

It's damaging, confusing and potentially will stop adoption and funding. It's also contrary to our presentations.

URL
https://www.researchobject.org/ro-crate/1.1/introduction.html

Suggested fix
update the text

@CaroleGoble CaroleGoble added the bug Something isn't working label Jan 12, 2021
@stain
Contributor

stain commented Jan 13, 2021

Thanks for spotting this, @CaroleGoble - this text must have been left behind after we added #74 in RO-Crate 1.1

I think the simple use case of organizing files should remain (which, with #74, can of course also be remote "downloadable" data entities), but with #122 we could even semantically have an RO-Crate with no data entities at all, just a meaningful collection of contextual entities.
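
To make the #74 case concrete, here is a minimal sketch in Python of crate metadata whose only data entity is a remote file. The URL, names and helper function are made up for illustration, and required root properties such as `datePublished` and `license` are left out for brevity.

```python
import json

# Hypothetical remote file; this URL is an illustrative assumption.
REMOTE_FILE = "https://example.org/data/survey-2020.csv"


def build_crate(remote_url):
    """Return RO-Crate 1.1 style metadata with one remote data entity."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata file descriptor pointing at the root dataset.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset; hasPart references the data entity by @id.
                "@id": "./",
                "@type": "Dataset",
                "name": "Example crate with a remote data entity",
                "hasPart": [{"@id": remote_url}],
            },
            {
                # The data entity itself: a File identified by an absolute URI,
                # so nothing needs to exist on disk next to the metadata file.
                "@id": remote_url,
                "@type": "File",
                "name": "Survey results (remote)",
            },
        ],
    }


crate = build_crate(REMOTE_FILE)
print(json.dumps(crate, indent=2))
```

The point of the sketch is only that `hasPart` holds a reference, so the crate stays valid JSON-LD whether or not the file is packaged alongside it.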

How to identify and represent entries in a database is a bit unclear at the moment; perhaps we should make a separate issue for that. There is a risk of indirection upon indirection (do we really want an entity of Record type, or does the record still represent some other thing? httpRange-14 trap ahead!). So some clear guidance would be desirable for aggregating records.

Do we have a concrete use case for referencing a database record that we could formalize, perhaps from SYNTHESYS+?

@ptsefton
Contributor

@CaroleGoble I'll take a look at this. What's FDO?

@ptsefton ptsefton self-assigned this Jan 20, 2021
@ptsefton
Contributor

ptsefton commented Jan 20, 2021

@stain - what's the procedure for starting work on a 1.1.x update? I assume that, if this is as important as @CaroleGoble's language suggests, we want to do a rapid patch release on 1.1 rather than wait for 1.2?

@CaroleGoble
Contributor Author

ptsefton pushed a commit that referenced this issue Jan 21, 2021
@ptsefton
Contributor

@CaroleGoble I had a go at rewording that intro part. It's on a branch here: https://github.com/ResearchObject/ro-crate/blob/bug-intro-issue-125/docs/1.1/introduction.md

What do you think?

@ptsefton
Contributor

ptsefton commented Feb 1, 2021

Any comments on this? Otherwise I'll ask for help from @stain to do a patch release so we no longer have misleading information in the spec.

@stain
Contributor

stain commented Feb 1, 2021

As mentioned in the call, I think your branch changes are good.

Because 1.1.1 will still be under the same folder 1.1/, let's prepare this change as a single pull request.

Could you update your branch to make the equivalent change in the 1.2-DRAFT/ folder so it does not get lost? The markdown file should not have otherwise changed, so you can probably just copy it over.

Then as soon as that PR is merged I can do the formal tagging as 1.1.1 - which would update a couple of places that say 1.1.0 and make a new PDF.

@stain
Contributor

stain commented Feb 3, 2021

I think it's still valuable to distinguish between Data and Contextual entities, as highlighted in our draft RO-Crate paper - but from the #125 discussion I agree we can broaden its scope so that hasPart can more clearly go to non-files and non-datasets.

(You could do this even before without violating anything, but this option was not emphasized)

So the question then is what to type these entities as - I would assume they would still be CreativeWork in some form, as they would be carrying information and would have someone who created them. And so I would argue for keeping CreativeWork as the superclass for any data entity (though they may also have the type https://schema.org/StructuredValue, for instance).

This is mostly covered in https://www.researchobject.org/ro-crate/1.1/data-entities.html#referencing-files-and-folders-from-the-root-data-entity but without an example:

Data Entities can also be other types, for instance an online database. These SHOULD be of "@type": "CreativeWork" and typically have a @id which is an absolute URI.
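
As a rough illustration of that sentence, the Python sketch below puts a local file and an online database side by side in `hasPart`, then picks out the data entities that are neither `File` nor `Dataset`. The database URI, its name and the helper function are hypothetical, and the graph is trimmed to the entities needed for the example.

```python
# Hypothetical online database; this URI is an illustrative assumption.
DATABASE = "https://example.org/specimen-database"

graph = [
    {
        "@id": "./",
        "@type": "Dataset",
        "hasPart": [{"@id": "data.csv"}, {"@id": DATABASE}],
    },
    # A conventional file-based data entity.
    {"@id": "data.csv", "@type": "File"},
    # A non-file data entity: typed CreativeWork, identified by an absolute URI.
    {"@id": DATABASE, "@type": "CreativeWork", "name": "Example specimen database"},
]


def non_file_data_entities(graph):
    """Return @ids of root hasPart entries typed as neither File nor Dataset."""
    by_id = {entity["@id"]: entity for entity in graph}
    found = []
    for ref in by_id["./"].get("hasPart", []):
        types = by_id[ref["@id"]]["@type"]
        if isinstance(types, str):
            # @type may be a single string or a list of strings.
            types = [types]
        if not {"File", "Dataset"} & set(types):
            found.append(ref["@id"])
    return found


print(non_file_data_entities(graph))
```

Both entries are referenced from the root in exactly the same way; only the `@type` and the form of the `@id` differ.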

I think hasPart comes down to containment - for me the whole thing about contextual entities is that they are mostly pre-existing things in the world which are brought into the crate - generally to assist description of data entities or other contextual entities. (See #122 on promoting non-data entities)

Perhaps for this 1.1.1 fix we don't need to add the non-file non-dataset example, but it would be good to make a new issue to capture such an example - it could for instance be a molecule accession and trajectory like https://bioexcel-cv19.bsc.es/#/browse/MCV1900002/overview - which includes PDB files, but is not just the set of files. (In fact the above should probably be an RO-Crate!)

@stain stain changed the title from "Misinformation on website" to "Introduction wrongly emphasizes file-based data; other types of data entity are also valid" Feb 3, 2021
@stain
Contributor

stain commented Feb 3, 2021

When/if we add #122 then we can also modify this introduction further to emphasize that any kind of entity can be gathered by the RO-Crate.

@ptsefton
Contributor

ptsefton commented Feb 3, 2021

Created the PR @stain.

Agree that we should tidy up stuff around data entities and how to reference contextual entities - will comment over on #122
