CKAN / DataPackages / JSON Table Schema integration spec #160
To achieve this, it seems sensible to reuse as much as we can from ckanext-datapackager, extending it if necessary.
The first two points could be implemented either in core or separate extension, but I'd like to propose that the third, regarding resource schemas, is implemented in core (in a generic form). It seems central enough to all future work around data import, cleaning, etc. (e.g. closer integration with DataStore) to justify it.
s/The first two points could be implemented either in core or separate extension/The first two points should be implemented as a core extension/
Do you mean an extension shipped with CKAN core, like `datastore`, `recline_view`, etc.? If so, I'm happy with it.
Exactly like those, yes.
> [is a] short url-usable (and preferably human-readable) name of the package. This MUST be lower-case and contain only alphanumeric characters along with ., _ or - characters. It will function as a unique identifier and therefore SHOULD be unique in relation to any registry in which this package will be deposited (and preferably globally unique).
We cannot assume that these identifiers will be unique or that they will not clash with existing CKAN datasets. We can append a suffix to the name, but then we need to consider what happens when we re-upload the same Data Package (i.e. to update the dataset).
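To make the concern concrete, here is a minimal sketch of the naming rule quoted above, together with a naive "append a suffix" strategy. The helper names (`is_valid_datapackage_name`, `unique_name`) are hypothetical and not part of CKAN or ckanext-datapackager; the set of existing names stands in for whatever lookup CKAN would actually do.

```python
import re

# Characters allowed by the quoted rule: lower-case alphanumerics plus ".", "_" and "-".
DATAPACKAGE_NAME = re.compile(r"[a-z0-9._-]+")


def is_valid_datapackage_name(name):
    """Return True if the whole string satisfies the quoted naming rule."""
    return DATAPACKAGE_NAME.fullmatch(name) is not None


def unique_name(name, existing_names):
    """Naive strategy (hypothetical): append "-2", "-3", ... until the name is free."""
    if name not in existing_names:
        return name
    counter = 2
    while "{}-{}".format(name, counter) in existing_names:
        counter += 1
    return "{}-{}".format(name, counter)
```

With this strategy, uploading `budget-2016` a second time produces `budget-2016-2` rather than updating the first dataset, which is exactly the re-upload problem described above: the suffix alone carries no information about which existing dataset the package belongs to.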
We could assign a Data Package a GUID (used for matching when you re-import it) that contains its name and author, so the author becomes its namespace. Then in CKAN we just munge the name to ensure it's unique within CKAN.
That sounds good, although perhaps I'd use the organization name if possible. I'd assume that if two users in the same org upload a `budget-2016` file they would be working on the same one (and if not, they should use different names).
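A rough sketch of how the namespaced identifier could work, assuming the namespace is the organization name and that the mapping from identifier to CKAN dataset name is kept somewhere (here a plain dict called `registry`). All function names and the storage are hypothetical, and `munge_name` is a simplified stand-in, not CKAN's own munging code.

```python
import re


def munge_name(text):
    """Simplified munge: lower-case, then collapse disallowed characters to "-"."""
    return re.sub(r"[^a-z0-9_-]+", "-", text.lower()).strip("-")


def package_key(namespace, name):
    """Stable identifier used to match a re-import: "<namespace>/<name>"."""
    return "{}/{}".format(namespace, name)


def resolve_dataset_name(namespace, name, registry, taken_names):
    """Return (dataset_name, is_update) for an uploaded Data Package.

    registry: dict mapping package_key -> CKAN dataset name (hypothetical store).
    taken_names: set of dataset names already used in this CKAN instance.
    """
    key = package_key(namespace, name)
    if key in registry:
        # Same namespace and name as before: treat as an update.
        return registry[key], True

    candidate = munge_name(name)
    if candidate in taken_names:
        # Clash with an unrelated dataset: prefix the namespace.
        candidate = munge_name("{}-{}".format(namespace, name))
    base, counter = candidate, 2
    while candidate in taken_names:
        candidate = "{}-{}".format(base, counter)
        counter += 1

    registry[key] = candidate
    taken_names.add(candidate)
    return candidate, False
```

Under these assumptions, two users in the same organization uploading `budget-2016` resolve to the same dataset, while the same name from a different organization gets its own munged dataset name.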
I also find it very useful to have a generated "datapackage.json" for each resource in case it is needed. It is not very clear where information that could be generated on the fly (datapackage.json) will be stored.
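For illustration, generating the descriptor on the fly could look roughly like the sketch below, in which case nothing extra needs to be stored. The field mapping (`notes` to `description`, and the resource keys copied across) is an assumption made for this example, not the mapping agreed for ckanext-datapackager, and `dataset_to_datapackage` is a hypothetical name. The sketch builds one descriptor per dataset rather than per resource.

```python
import json


def dataset_to_datapackage(dataset):
    """Build a Data Package descriptor from a CKAN-style dataset dict.

    The mapping used here is an assumption for illustration only.
    """
    descriptor = {"name": dataset["name"], "resources": []}
    if dataset.get("title"):
        descriptor["title"] = dataset["title"]
    if dataset.get("notes"):
        descriptor["description"] = dataset["notes"]

    for resource in dataset.get("resources", []):
        entry = {}
        for key in ("name", "url", "format"):
            if resource.get(key):
                entry[key] = resource[key]
        descriptor["resources"].append(entry)
    return descriptor


def render_datapackage_json(dataset):
    """Serialize the descriptor, e.g. to serve it as datapackage.json."""
    return json.dumps(dataset_to_datapackage(dataset), indent=2, sort_keys=True)
```

A call such as `render_datapackage_json(dataset)` could then back a download link without persisting the file, at the cost of recomputing it on each request.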
As we can't assume that the name is unique and yet that is its intended use, is this a flaw with the specification that may need fixing?
For a given CKAN, should a munge of the title be used to find clashes?
I think it needs addressing in the spec itself. Also see discussion in frictionlessdata/datapackage#220
I've started updating the old https://github.com/ckan/ckanext-datapackager extension to work on CKAN 2.4 and to implement the ideas discussed here about importing and exporting Data Packages in CKAN. I've created a milestone to track this at https://github.com/ckan/ckanext-datapackager/milestones/Importing%20and%20Exporting%20Data%20Packages%20on%20CKAN%202.4. I'm documenting the issues I encounter as I work on it. For example, I've written about the mapping between CKAN fields and Data Package fields at frictionlessdata/ckanext-datapackager#25 (comment). Another issue is how to deal with the extras fields (frictionlessdata/ckanext-datapackager#27). If you guys have some time, please take a look 👍
Rendered doc:
https://github.com/ckan/ideas-and-roadmap/blob/datapackages-spec/specs/datapackages/README.md