USE CASE: MSc Programs #17

Open
analice1pt opened this issue May 7, 2019 · 15 comments

@analice1pt
Collaborator

Creator: Ana Alice Baptista

Problem statement

An application combines and integrates aggregated Linked Open Data (LOD) about courses from several European universities, processes that data, and releases the processed data as LOD. In this case we have LOD as both input and output. The data conforms to three application profiles (AP1, AP2 and AP3). As a result, although the structures of the datasets are similar, some properties and constraints on values differ. For example:
• Property indicating the university of a course: AP1 uses eg1:foo, AP2 uses eg2:bar and AP3 uses eg3:baz.
• Range of that property: AP1 uses eg1:Foo, AP2 uses eg2:Bar and AP3 uses xsd:string.
• Relationships: in AP1 a course is related to one or more universities; in AP2 a course may be related to zero or more universities; in AP3 a course is related to exactly one university.
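
To make the example above concrete, the three profiles can be written down as plain data. This is only a minimal sketch in Python, not part of any proposed vocabulary: the `PropertyRule` class, its field names and the `check_cardinality` helper are hypothetical, and the `eg1:`, `eg2:` and `eg3:` terms are the placeholders used in the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyRule:
    """How one profile expresses 'the university of a course' (hypothetical structure)."""
    prop: str                 # property used by the profile
    value_range: str          # range declared by the profile
    min_count: int            # minimum number of universities per course
    max_count: Optional[int]  # None means unbounded


# The three profiles from the example above.
PROFILES = {
    "AP1": PropertyRule("eg1:foo", "eg1:Foo", 1, None),   # one or more
    "AP2": PropertyRule("eg2:bar", "eg2:Bar", 0, None),   # zero or more
    "AP3": PropertyRule("eg3:baz", "xsd:string", 1, 1),   # exactly one
}


def check_cardinality(profile: str, values: list) -> bool:
    """Return True if the number of university values satisfies the profile."""
    rule = PROFILES[profile]
    if len(values) < rule.min_count:
        return False
    return rule.max_count is None or len(values) <= rule.max_count
```

Under this sketch a course with no university passes AP2 but fails AP1 and AP3, which is the kind of conflict the fourth question below asks about.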

The questions are:
1 - How to map data originating from different datasets?
2 - How to deal with different but equivalent properties?
3 - How to deal with different domains and ranges of equivalent properties?
4 - How to deal with different constraints over values?

Stakeholders

Data providers, universities, future university students and other potential users of the application.

Links

Requirements

R1 – To be able to identify, relate and map possibly conflicting application profiles.
R2 – To be able to state preferred properties, classes and related constraints among possibly conflicting alternatives.
R3 – To be able to identify which data sources are related to a given profile.

Comments

We usually think of application profiles for data that we want to make available. What do we do when we want to develop applications that use data made available by others?

kcoyle added the use case label May 8, 2019

@philbarker
Collaborator

One issue a consuming application has to deal with when using an application profile, which I don't think is mentioned here, is what to do with incoming data that conforms to the base specification but does not adhere to the application profile. Options:

  1. keep all data that arrives so that it can be passed on to others
  2. only ingest the data that adheres to the profile

Option 2 might seem the obvious route, but many applications prescribe a minimal profile, i.e. saying something like "you must supply this much data before we will accept any", while encouraging provision of more data.
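
The two options, combined with a minimal profile, can be sketched roughly as follows. This is an illustration in Python under stated assumptions rather than a proposal: the `ingest` function, its parameters and the idea of representing a record as a dictionary keyed by property name are all hypothetical.

```python
from typing import Optional


def ingest(record: dict, required: set, profile_props: set, keep_extra: bool) -> Optional[dict]:
    """Apply a minimal profile to one incoming record.

    Returns None when a required property is missing; otherwise returns the
    record, with properties outside the profile either kept or dropped.
    """
    if not required.issubset(record):
        return None  # below the minimum the profile asks for
    if keep_extra:
        return dict(record)  # option 1: keep everything so it can be passed on
    # option 2: keep only what the profile describes
    return {p: v for p, v in record.items() if p in profile_props}
```

With `keep_extra=True` the application behaves as in option 1 while still enforcing the "you must supply this much data" minimum; with `keep_extra=False` it behaves as in option 2.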

@analice1pt
Collaborator Author

@philbarker, I think this might be a new use case. When I wrote this use case, I was not thinking about an already existing base profile. Instead, I was thinking that a base profile might emerge from the input data.

@kcoyle
Collaborator

kcoyle commented May 10, 2019

In SHACL and ShEx these two cases are handled with a statement that the graph being investigated (which could be the same as what we define as a profile) is either OPEN (allow properties that are not included in the validation document) or CLOSED (only allow properties that are included in the validation document). I have included the ShEx property sx:closed as a possible property for the profile in my original attempt at the vocabulary.

Would one of you add this as a use case? Thanks.
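
The open/closed distinction described above can be illustrated without either language. The following plain-Python sketch does not use any SHACL or ShEx library; the `unexpected_properties` function and its parameters are hypothetical names chosen for the illustration.

```python
def unexpected_properties(node_props: set, shape_props: set, closed: bool) -> set:
    """Return the properties of a node that violate the shape.

    An open shape tolerates properties it does not mention, so nothing is
    reported; a closed shape reports every property it does not list.
    """
    if not closed:
        return set()
    return node_props - shape_props
```

A node carrying an extra property is accepted when `closed` is false and reported when it is true, which corresponds to the two ingest options discussed earlier in the thread.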

@analice1pt
Collaborator Author

@kcoyle, I am not sure we are talking about the same thing. I think I should make this use case clearer. I am thinking about how to organize the data in the design phase. Suppose that we want to develop an application that only uses data that others make available (e.g., Eurostat, aggregated data from European hospitals and aggregated transportation data). This data may have different properties, but also identical and equivalent properties. My question is: how do we, in the design phase, handle this potential diversity and overlap?

@philbarker
Collaborator

@kcoyle #19 is a use case for what I had in mind

@kcoyle
Collaborator

kcoyle commented May 10, 2019

@analice1pt I wasn't thinking that profiles themselves would handle mapping. Again, perhaps a more detailed example of a single situation (e.g. with just a few elements) would help us think about this.

@marianamalta
Collaborator

@analice1pt the point here is:
Someone else wants to aggregate LOD from providers that don't know each other, did not talk to each other, and each decided to publish according to their own model. This data will then be published again as LOD by an entity that did not produce it.
How is this different from defining an application profile "without" having structured data? The kinds of things (entities) might differ from provider to provider, as well as how those things are described (properties); and even if the models have similarities, they may use different vocabularies/terms to describe each domain/property.
The issue here has more to do with the process, or with keeping track of things (ProviderA published propertyA using termA but now we publish with termB), than with the application profile itself...
I am sorry if I am not reaching the real issue...

@analice1pt
Collaborator Author

analice1pt commented May 10, 2019

@marianamalta, I guess that depends on what an application profile is and how it can be used. I would like APs to support the design process, not only to inform others about the data that I am making available (if any).

@marianamalta
Collaborator

Right. But the process of developing an AP can be complex. How deep do you want to go? This is something (the tracking of the process) that might not have an ending!

@analice1pt
Collaborator Author

analice1pt commented May 17, 2019

I am giving an example of a simple situation that fits in this use case. A more complex situation would involve different ranges or different allowed values for properties.

[Image: 20190517PropertiesDifferentDatasets]

@analice1pt
Collaborator Author

Another example, simple because it involves only one property with different ranges in the MAP. I am assuming that we will be able to specify a range in the MAP that will be somehow related to the range of the property in the base schema.

[Image: 20190517DifferentRanges]

@kcoyle
Collaborator

kcoyle commented May 22, 2019

I'm trying to develop requirements from this. Do any (or all) of these capture the sense?

  1. An application profile may contain mapping information for data from different sources
  2. An application profile may allow for more than one value type for a property
  3. There needs to be a way to define preferred properties in the case where more than one is available in the dataset
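
One way to picture the three candidate requirements together is the following rough Python sketch. Everything in it is an assumption made for illustration: the `PROPERTY_MAP` and `ALLOWED_RANGES` tables, the `normalize` function and the preferred property `eg:university` are hypothetical, and the `eg1:`, `eg2:` and `eg3:` terms are the placeholders from the problem statement.

```python
# Hypothetical mapping section of a profile: source property -> preferred property.
PROPERTY_MAP = {
    "eg1:foo": "eg:university",
    "eg2:bar": "eg:university",
    "eg3:baz": "eg:university",
}

# More than one value type allowed for the preferred property.
ALLOWED_RANGES = {"eg:university": {"eg1:Foo", "eg2:Bar", "xsd:string"}}


def normalize(prop: str, value, value_type: str):
    """Rewrite one (property, value, type) statement to the preferred property.

    Raises ValueError when the value type is not allowed for that property.
    Properties without a mapping are passed through unchanged.
    """
    preferred = PROPERTY_MAP.get(prop, prop)
    allowed = ALLOWED_RANGES.get(preferred)
    if allowed is not None and value_type not in allowed:
        raise ValueError(f"{value_type} is not an allowed range for {preferred}")
    return preferred, value, value_type
```

Here the mapping table stands for requirement 1, the set of allowed ranges for requirement 2, and the single target property for requirement 3. Whether such mappings belong inside a profile at all is exactly what the thread is debating.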

@analice1pt
Collaborator Author

@kcoyle, I agree with those requirements. I would just like to point out that "mapping between data sources" may not capture all the meaning: the idea is to map between the profiles of different data sources.

@kcoyle
Collaborator

kcoyle commented May 29, 2019

@analice1pt Mapping between profiles would require that the individual statements in each profile are identified as belonging to that profile. As a reminder, all properties / elements in a profile are pre-defined in vocabularies. So when a profile reuses dct:title it is identified with dct:title. How would you indicate that this is the use of dct:title in a particular profile?

@analice1pt
Collaborator Author

@kcoyle, I am not sure I understood your comment. I meant that we should be able to map between the properties, types of values, and other constraints of the data sources, not between the data sources themselves. When I referred to APs, I meant both implicit and explicit APs. By implicit APs I mean APs that may be inferred from the data.
