How to represent dataset source? #210

Closed
simonff opened this issue Aug 29, 2018 · 5 comments


simonff commented Aug 29, 2018

For hosted datasets (e.g., the MODIS MOD13Q1 dataset in the EE catalog), I want to point at the original file source. Should I use links for that? If yes, what do I put as 'rel' instead of XXX?

"links": [
  { "rel": "XXX", "href": "https://e4ftl01.cr.usgs.gov/MOLT/MOD13Q1.006/" }
]

If no, should I create a separate Host section?

Collaborator

m-mohr commented Aug 29, 2018

Isn't that similar to what we are discussing in #179? Sounds like this could be solved by the derived_from relation.

Author

simonff commented Aug 29, 2018

It is a very simple case of provenance. But while specifying provenance in general is very hard and not always needed, specifying the data source (especially in the case of EE, which is mostly a mirror) is very easy and always necessary.

Collaborator

m-mohr commented Aug 30, 2018

In #179 it is proposed to simply link them using a link with rel type derived_from until there is a more concrete standard for describing provenance. I think these issues are closely related and should be discussed together. I would imagine a newly created standard for describing provenance would also include something to link to the source?

Other than that, I think most parts of my first comment in #225 also apply here...

Contributor

cholmes commented Aug 31, 2018

When I first started thinking about this, I thought there'd be a 'derived_from' as well as a 'copy_of' (or something like that). The first would represent that processing was done; the second just that the data is stored in a different location but points back to where it came from. As mentioned in #225, I'd see 'copy_of' as including the 'metadata processing': unzipping, putting into a COG, etc.

I'm hesitant to add two more core link relationships, since derived_from already seems like a stretch with so few implementations, and we already have that one in. I could see an extra attribute on derived_from that indicates it is just a copy, not actual processing. Or we could make a new link type but 'incubate' it for a bit. But I'd say if GEE uses it, and Sentinel & Landsat on AWS also link back to their 'source', then it'd be pretty easy to bring into the core.

@m-mohr m-mohr added this to the future milestone Jul 18, 2019
@m-mohr m-mohr modified the milestones: future, new extensions Mar 11, 2021
Collaborator

m-mohr commented Mar 11, 2021

Recently we added the rel type via, which links back to the original metadata. derived_from links to the STAC Item for the source data. Having that should be enough, I think.
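
For illustration, a links array following this comment might look roughly like the sketch below. It assumes the USGS directory URL from the original question is an acceptable target for via, and the derived_from href is a hypothetical placeholder for a source STAC Item, not a real endpoint.

```json
{
  "links": [
    { "rel": "via", "href": "https://e4ftl01.cr.usgs.gov/MOLT/MOD13Q1.006/" },
    { "rel": "derived_from", "href": "https://example.com/stac/source-item.json" }
  ]
}
```

A plain mirror, as in the EE case, would presumably carry only the via link; derived_from would apply only when a source STAC Item actually exists.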

@m-mohr m-mohr closed this as completed Mar 11, 2021
3 participants