CKAN / DataPackages / JSON Table Schema integration spec #160

Merged
merged 3 commits into from
May 19, 2020

amercader
Member


To achieve this, it seems sensible to reuse as much as we can from ckanext-datapackager, extending it if necessary.

The first two points could be implemented either in core or separate extension, but I'd like to propose that the third, regarding resource schemas, is implemented in core (in a generic form). It seems central enough to all future work around data import, cleaning, etc. (e.g. closer integration with DataStore) to justify it.
Contributor

s/The first two points could be implemented either in core or separate extension/The first two points should be implemented as a core extension/

Member Author

Do you mean an extension shipped with CKAN core, like datastore, recline_view etc? If so I'm happy with it.

Contributor

Exactly like those, yes.


> [is a] short url-usable (and preferably human-readable) name of the package. This MUST be lower-case and contain only alphanumeric characters along with ., _ or - characters. It will function as a unique identifier and therefore SHOULD be unique in relation to any registry in which this package will be deposited (and preferably globally unique).

We cannot assume that these identifiers will be unique or that they will not clash with existing CKAN datasets. We can append something to the name, but then we need to consider what happens when we re-upload the same Data Package (i.e. to update the dataset).
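For illustration, the MUST rule quoted above can be checked with a small regular expression. This is only a sketch: `is_valid_package_name` is a hypothetical helper, not part of any existing library, and it assumes "alphanumeric" means ASCII letters and digits.

```python
import re

# Lower-case ASCII alphanumerics plus ".", "_" or "-", as in the quoted wording.
# Restricting to ASCII is an assumption; the quote does not say so explicitly.
NAME_RE = re.compile(r"[a-z0-9._-]+")


def is_valid_package_name(name: str) -> bool:
    """Return True if `name` satisfies the quoted MUST rule (hypothetical helper)."""
    return bool(NAME_RE.fullmatch(name))
```

Note that this only checks the format. Uniqueness within a registry, which is the concern raised here, still has to be checked against the existing datasets.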

We could assign a data package a GUID (used for matching when you re-import it) which contains its name and author, so the author becomes its namespace. Then in CKAN we just munge the name to ensure it's unique within CKAN.

Member Author

That sounds good, although perhaps I'd use the organization name if possible. I'd assume that if two users in the same org upload a budget-2016 file, they would be working on the same one (and if not, they should use different names).
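A rough sketch of how the idea in this thread might look, under some assumptions: the identifier is simply the namespace (organization or author) joined with the package name, existing datasets are tracked in plain in-memory collections, and all function names are hypothetical rather than taken from CKAN or ckanext-datapackager. The munging also replaces dots, on the assumption that CKAN dataset names do not accept them.

```python
import re
from typing import Dict, Set, Tuple


def package_key(namespace: str, name: str) -> str:
    """Identifier used to match re-imports: namespace (org or author) plus name."""
    return f"{namespace}/{name}"


def munge_name(name: str) -> str:
    """Reduce a name to lower-case alphanumerics, '-' and '_' (assumed CKAN rule)."""
    return re.sub(r"[^a-z0-9_-]+", "-", name.lower()).strip("-")


def unique_name(name: str, taken: Set[str]) -> str:
    """Append a numeric suffix until the munged name is not already taken."""
    base = munge_name(name)
    candidate, n = base, 2
    while candidate in taken:
        candidate = f"{base}-{n}"
        n += 1
    return candidate


def import_package(namespace: str, name: str,
                   names_by_key: Dict[str, str],
                   taken: Set[str]) -> Tuple[str, bool]:
    """Return (ckan_name, created). A known key means an update, not a new dataset."""
    key = package_key(namespace, name)
    if key in names_by_key:
        return names_by_key[key], False
    ckan_name = unique_name(name, taken)
    names_by_key[key] = ckan_name
    taken.add(ckan_name)
    return ckan_name, True
```

With this scheme, re-uploading budget-2016 from the same organization resolves to the same dataset, while the same name from a different organization gets a new, suffixed dataset name.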

@danmihaila

I would also find it very useful to have a generated "datapackage.json" for each resource when needed. It is not clear where information that could be generated on the fly (datapackage.json) would be stored.
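To make the "generated on the fly" option concrete, here is a minimal sketch that builds a descriptor from a dataset dict at request time, so nothing extra needs to be stored. The field mapping (`notes` to `description`, resource `url` to `path`) is an assumption for illustration only and may differ from the mapping eventually agreed for the extension; both function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Optional


def dataset_to_datapackage(dataset: Dict[str, Any],
                           resource_ids: Optional[Iterable[str]] = None) -> Dict[str, Any]:
    """Build a Data Package descriptor from a CKAN-style dataset dict.

    If `resource_ids` is given, only those resources are included, which
    allows a descriptor for a single resource.
    """
    wanted = set(resource_ids) if resource_ids is not None else None
    resources = []
    for res in dataset.get("resources", []):
        if wanted is not None and res.get("id") not in wanted:
            continue
        entry = {
            "name": res.get("name"),
            "path": res.get("url"),  # assumed mapping: CKAN url -> path
            "format": (res.get("format") or "").lower(),
        }
        # Drop keys with empty values rather than emitting nulls.
        resources.append({k: v for k, v in entry.items() if v})
    package = {
        "name": dataset["name"],
        "title": dataset.get("title"),
        "description": dataset.get("notes"),  # assumed mapping: notes -> description
        "version": dataset.get("version"),
        "resources": resources,
    }
    return {k: v for k, v in package.items() if v not in (None, "")}


def datapackage_json(dataset: Dict[str, Any],
                     resource_ids: Optional[Iterable[str]] = None) -> str:
    """Serialize the generated descriptor as the text of a datapackage.json file."""
    return json.dumps(dataset_to_datapackage(dataset, resource_ids), indent=2)
```

Generating the descriptor on demand avoids keeping a second copy of the metadata in sync, at the cost of not preserving Data Package fields that have no CKAN equivalent.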

@amercader changed the title from "First draft of the CKAN / DataPackages spec" to "CKAN / DataPackages / JSON Table Schema integration spec" on Oct 30, 2015

> [is a] short url-usable (and preferably human-readable) name of the package. This MUST be lower-case and contain only alphanumeric characters along with ., _ or - characters. It will function as a unique identifier and therefore SHOULD be unique in relation to any registry in which this package will be deposited (and preferably globally unique).

We cannot assume that these identifiers will be unique or that they will not clash with existing CKAN datasets. We can append something to the name, but then we need to consider what happens when we re-upload the same Data Package (i.e. to update the dataset).
Contributor

As we can't assume that the name is unique, and yet uniqueness is its intended use, is this a flaw in the specification that may need fixing?

For a given CKAN, should a munge of the title be used to find clashes?

Member

I think it needs addressing in the spec itself. Also see discussion in frictionlessdata/datapackage#220

@vitorbaptista

I've started changing the old https://github.com/ckan/ckanext-datapackager extension to work on CKAN 2.4 and implement the ideas discussed here about importing/exporting datapackages into CKAN. I've created a milestone to track this on https://github.com/ckan/ckanext-datapackager/milestones/Importing%20and%20Exporting%20Data%20Packages%20on%20CKAN%202.4.

I'm documenting the issues I encounter as I work on it. For example, I've written about the mapping between CKAN fields and datapackage fields at frictionlessdata/ckanext-datapackager#25 (comment). Another issue was how to deal with the extras fields (frictionlessdata/ckanext-datapackager#27).

If you guys have some time, please take a look 👍
