CKAN / DataPackages / JSON Table Schema integration spec #160

amercader · 2015-10-29T13:43:22Z

Rendered doc:

https://github.com/ckan/ideas-and-roadmap/blob/datapackages-spec/specs/datapackages/README.md

rossjones · 2015-10-29T15:24:19Z

specs/datapackages/README.md

+
+For achieving this it seems sensible to reuse as much stuff as we can from ckanext-datapackager, extending it if necessary
+
+The first two points could be implemented either in core or separate extension, but I'd like to propose that the third regarding resource schemas is implemented in core (in a generic form). It seems central enough to all future work around data import, cleaning, etc (eg closer integration with DataStore) to justify it.


s/The first two points could be implemented either in core or separate extension/The first two points should be implemented as a core extension/

Do you mean an extension shipped with CKAN core, like datastore, recline_view etc? If so I'm happy with it.

Exactly like those, yes.

davidread · 2015-10-29T15:52:55Z

specs/datapackages/README.md

+
+> [is a] short url-usable (and preferably human-readable) name of the package. This MUST be lower-case and contain only alphanumeric characters along with ., _ or - characters. It will function as a unique identifier and therefore SHOULD be unique in relation to any registry in which this package will be deposited (and preferably globally unique).
+
+We can not assume that these identifiers will be unique and that they will not clash with existing CKAN datasets. We can append stuff to the name but then we need to consider what happens when we reupload the same Data Package (ie to update the dataset).


We could assign a data package a GUID (used for matching when you reimport it) which contains its name and author, so the author becomes its namespace. Then in CKAN we just munge the name to ensure it's unique within CKAN.

That sounds good, although perhaps I'd use the organization name if possible. I'd assume that if two users in the same org upload a budget-2016 file, they would be working on the same one (and if not, they should use different names).
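
A minimal sketch of the idea in this thread, under stated assumptions: the GUID is derived from a namespace (author or organization) plus the Data Package name, so a re-upload matches the existing dataset, while the CKAN name is munged and suffixed only when it clashes. The function names, the `guid_index` lookup, and the SHA-1 derivation are all hypothetical and not part of CKAN or ckanext-datapackager.

```python
import hashlib
import re


def package_guid(namespace, name):
    """Stable identifier: the same namespace + name always gives the same GUID."""
    key = "{}/{}".format(namespace.strip().lower(), name.strip().lower())
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def unique_ckan_name(name, existing_names):
    """Munge a Data Package name and append a numeric suffix if it is taken.

    Assumes CKAN-style names: lower-case alphanumerics, "-" and "_" only,
    so the "." allowed by the Data Package spec is replaced.
    """
    base = re.sub(r"[^a-z0-9_-]+", "-", name.lower()).strip("-") or "dataset"
    candidate, suffix = base, 1
    while candidate in existing_names:
        suffix += 1
        candidate = "{}-{}".format(base, suffix)
    return candidate


def resolve_upload(namespace, name, guid_index, existing_names):
    """Return (guid, ckan_name, is_update) for an uploaded Data Package.

    guid_index is a hypothetical mapping of GUID -> existing CKAN dataset name.
    """
    guid = package_guid(namespace, name)
    if guid in guid_index:
        # Same namespace and name as before: treat as an update.
        return guid, guid_index[guid], True
    return guid, unique_ckan_name(name, existing_names), False
```

With this scheme, two users in the same organization uploading `budget-2016` resolve to the same dataset, while a different organization uploading the same name gets a new dataset with a suffixed CKAN name.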

danmihaila · 2015-10-30T11:41:17Z

I would also find it very useful to have a generated "datapackage.json" for each resource, in case it is needed. It is not very clear where the information that could be generated on the fly (datapackage.json) will be stored.

rossjones · 2015-10-30T11:59:53Z

specs/datapackages/README.md

+
+> [is a] short url-usable (and preferably human-readable) name of the package. This MUST be lower-case and contain only alphanumeric characters along with ., _ or - characters. It will function as a unique identifier and therefore SHOULD be unique in relation to any registry in which this package will be deposited (and preferably globally unique).
+
+We can not assume that these identifiers will be unique and that they will not clash with existing CKAN datasets. We can append stuff to the name but then we need to consider what happens when we reupload the same Data Package (ie to update the dataset).


As we can't assume that the name is unique and yet that is its intended use, is this a flaw with the specification that may need fixing?

For a given CKAN, should a munge of the title be used to find clashes?

I think it needs addressing in the spec itself. Also see discussion in frictionlessdata/datapackage#220
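
For illustration, a small sketch of the two checks raised in this thread: validating a name against the rule quoted from the spec (lower-case alphanumerics plus `.`, `_` or `-`), and munging titles to look for clashes within one CKAN instance. The helper names and the munging rule are assumptions made for this example, not the behaviour of CKAN's own munge functions.

```python
import re

# Rule quoted from the Data Package spec: lower-case alphanumerics, ".", "_" or "-".
NAME_RE = re.compile(r"[a-z0-9._-]+")


def is_valid_package_name(name):
    """True if the whole name matches the spec's character rule."""
    return NAME_RE.fullmatch(name) is not None


def munge_title(title):
    """Hypothetical munge: lower-case, runs of other characters become "-"."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def find_clashes(title, existing_titles):
    """Return the existing titles that munge to the same slug as `title`."""
    target = munge_title(title)
    return [t for t in existing_titles if munge_title(t) == target]
```

A check like `find_clashes` only detects likely collisions; it cannot tell whether a clash is a re-upload of the same Data Package or an unrelated dataset, which is why the uniqueness question is raised against the spec itself.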

vitorbaptista · 2015-12-04T19:12:11Z

I've started changing the old https://github.com/ckan/ckanext-datapackager extension to work on CKAN 2.4 and implement the ideas discussed here about importing/exporting datapackages into CKAN. I've created a milestone to track this on https://github.com/ckan/ckanext-datapackager/milestones/Importing%20and%20Exporting%20Data%20Packages%20on%20CKAN%202.4.

I'm documenting the issues I'm encountering as I work on it. For example, I've written about the mapping between CKAN fields and datapackage fields at frictionlessdata/ckanext-datapackager#25 (comment). Another issue is how to deal with the extras fields (frictionlessdata/ckanext-datapackager#27).
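
To make the mapping question concrete, here is a rough sketch of an export from a CKAN dataset dict to a Data Package descriptor. The field mapping below (for example `notes` to `description`) and the choice to emit extras as a flat `extras` object are assumptions for illustration only; the mapping actually discussed is in the linked issues and may differ.

```python
import json

# Hypothetical CKAN field -> Data Package field mappings (assumptions).
DATASET_FIELDS = {"name": "name", "title": "title",
                  "notes": "description", "version": "version"}
RESOURCE_FIELDS = {"name": "name", "url": "url",
                   "format": "format", "description": "description"}


def _map_fields(source, mapping):
    """Copy mapped keys that are present and non-empty in the source dict."""
    return {target: source[key]
            for key, target in mapping.items() if source.get(key)}


def dataset_to_datapackage(dataset):
    """Build a descriptor dict from a CKAN dataset dict."""
    descriptor = _map_fields(dataset, DATASET_FIELDS)
    descriptor["resources"] = [_map_fields(res, RESOURCE_FIELDS)
                               for res in dataset.get("resources", [])]
    # CKAN extras are a list of {"key": ..., "value": ...} dicts.
    extras = {e["key"]: e["value"] for e in dataset.get("extras", [])}
    if extras:
        descriptor["extras"] = extras  # placement of extras is an open question
    return descriptor


if __name__ == "__main__":
    example = {
        "name": "budget-2016",
        "title": "Budget 2016",
        "notes": "Annual budget",
        "extras": [{"key": "theme", "value": "finance"}],
        "resources": [{"name": "spending",
                       "url": "http://example.com/spending.csv",
                       "format": "CSV"}],
    }
    print(json.dumps(dataset_to_datapackage(example), indent=2, sort_keys=True))
```

Keeping the mapping in plain dicts makes it easy to revise as the field discussion settles, and the same tables could be inverted for the import direction.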

If you guys have some time, please take a look 👍

First draft of the CKAN / DataPackages spec

0f4a73c

jqnatividad added the Status: In Progress label Oct 29, 2015

rossjones reviewed Oct 29, 2015

Fix formatting

acef2e8

davidread reviewed Oct 29, 2015

Add more prior work, fix fields case, expand schemas implementation

4664182

amercader changed the title ~~First draft of the CKAN / DataPackages spec~~ CKAN / DataPackages / JSON Table Schema integration spec Oct 30, 2015

rossjones reviewed Oct 30, 2015

amercader mentioned this pull request Dec 3, 2015

'name' in datapackages shouldn't be required frictionlessdata/datapackage#220

Closed

jqnatividad mentioned this pull request Dec 15, 2015

Add CKAN Adapter openpermit/OpenPermit.NET#70

Open

amercader mentioned this pull request May 11, 2016

[WIP] Data package output for CKAN ckan/ckan#3007

Closed

5 tasks

rufuspollock merged commit a3fb784 into master May 19, 2020


amercader commented Oct 29, 2015

rossjones Oct 29, 2015

amercader Oct 30, 2015

rossjones Oct 30, 2015

davidread Oct 29, 2015

amercader Oct 30, 2015

danmihaila commented Oct 30, 2015

rossjones Oct 30, 2015

pwalsh Oct 30, 2015

vitorbaptista commented Dec 4, 2015
