Add `schema.fieldsMatch` property; clarified extra/non-specified fields in Table Schema #39

roll · 2024-02-21T16:24:20Z

fixes Clarity around extra/non-specified fields datapackage#367

Overview

It's the first attempt at implementing a new partial/syncSchema/schemaComplete feature that also clarifies a default field-column mapping behavior in Table Schema. Also, it naturally makes field.name unique amongst other fields in Table Schema with backward-compatibility note for implementors.

I propose to add this property to the Table Schema level as it regulates field-column mapping rules where both field and column concepts are defined on the Table Schema level (not on the Data Resource level). Also, because Table Schema can be shared or constitute catalogs, I think it will be right if compliance rules configuration belongs to abstract Table Schema rather than concrete Data Resources.

For example, if I publish a partial Table Schema I declare that any data source having fields A, B, C in any order will be valid to this schema. If I publish non-partial (default) Table Schema I declare that only a data source having only 3 fields A, B, C in exact order is valid to this schema. I think it's important for e.g. government regulation Table Schemas etc

roll · 2024-02-23T09:46:00Z

I think a found one more good argument why partiality configuration is better to be on the Table Schema level.

If we think about it conceptually:

JSON Schema: validates structured data
Table Schema: validates tabular data

JSON Schema has settings like additionalProperties that you can set to false limiting the amount of properties in the DATA. That's basically what we do with this currently called schema.partial -- we constraint or not amount and order of columns in the DATA. I think it's really symmetrical and in both cases it happens within the abstract reusable validation container (rather than in the concrete data).

nichtich · 2024-02-23T11:21:21Z

This seems to mix two indepenent features:

require unique names and any order
allow additional fields

Both should get their own flag. If order is relevant, additional fields can stil be allowed after the last defined field, so both flags are independent from each other.

roll · 2024-02-23T12:37:10Z

@nichtich
Good point!

Probably two flags extraFields and orderedFields will be even better. At least, it's more self-explanatory and better aligns with JSON Schema counterpart spirit

cloudflare-workers-and-pages · 2024-02-23T13:56:08Z

Deploying datapackage with Cloudflare Pages

Latest commit:	`f71a530`
Status:	✅ Deploy successful!
Preview URL:	https://9e56f6e0.datapackage.pages.dev
Branch Preview URL:	https://367-schema-partial.datapackage.pages.dev

View logs

roll · 2024-02-23T14:56:20Z

I've updated the PR to have two properties that make data source requirements stricter if set to true:

exactFields
orderedFields

Selected defaults make it non-breaking in relation to the current v1 definition.

roll · 2024-03-11T10:30:16Z

Hi @peterdesmet,

can you please take a look? WDYT?

peterdesmet

Selected defaults make it non-breaking in relation to the current v1 definition.

@roll, can you clarify this comment? The defaults for both fields are false, allowing partial matching, while frictionless v1 mandated complete matching.

Irrespective, do you think the default should be the most lenient approach?

peterdesmet · 2024-03-13T15:35:52Z

content/docs/specifications/table-schema.md

+
+A Table Schema descriptor `MAY` contain a property `exactFields` that `MUST` be boolean with default value `false`:
+
+- **false** (default): The number of fields in the data source `MUST` be equal or more than the number of elements in the `fields` array.


What about the (common) use case where you have less fields in the data source compared to the fields array?

@peterdesmet
It's a good question. Won't it make the data source invalid against the given Table Schema? I guess software still can read it but my feeling that the missing fields need to be marked as a structural error

I'm just trying to think here about a Table Schema as a data contract and missing fields seems to be a violation but I'm not sure.

In biodiversity informatics, it is common to work with schemas (like this one https://rs.gbif.org/extension/gbif/1.0/distribution_2022-02-02.xml), where a publisher will only use a number of fields, since not all apply to the data they hold. I'd like to be able to use Table Schemas for that. 😄 Fields that are expected to be there can be indicated with the required property (and raise a validation error if they are not).

If you agree, would it be possible to update the wording to allow this use case?

Of course, missing fields can be filled by null and then e.g. required: true be used but it seems a little bit different as it's on an individual rows level

@peterdesmet
Make sense!

I've updated the PR -- if a Table Schema author needs strictness here, they can use exactFields: true. If there is a use case of requiring equal or more fields we can add a separate constraint later

content/docs/specifications/table-schema.md

Co-authored-by: Peter Desmet <[email protected]>

roll · 2024-03-14T15:10:37Z

@peterdesmet
It was a little bit of a surprise for me, but the current wording is:

It MUST contain a property fields. fields MUST be an array where each entry in the array is a field descriptor (as defined below). The order of elements in fields array SHOULD be the order of fields in the CSV file. The number of elements in the fields array SHOULD be the same as the number of fields in the CSV file.

It doesn't use MUST so basically, it's the same as:

exactFields: false
orderedFields: false

Irrespective, do you think the default should be the most lenient approach?

Yeah, I was thinking about it, and I did like the concept of applying constraints similar to JSONSchema e.g. additionalProperties: false. On the other hand, I don't have a strong opinion here but it seems to be the only non-breaking way based on the v1 wording.

This reverts commit f04b7ed.

peterdesmet · 2024-03-14T15:26:15Z

@roll the current SHOULD wording is a "recent" change though. It was suggested by me in 2020. From 2012 to 2020 it was MUST, so I believe that is what most implementors/users consider v1.

roll · 2024-03-14T15:31:23Z

@peterdesmet
Probably it was committed back then by mistake without updating a version according to semver etc but it's been there for 3 years so I think we just need to consider v1 to be as it is now.

BTW even before the change it had been using SHOULD for length:

The number of elements in fields array SHOULD be exactly the same as the number of fields in the CSV file.

This reverts commit f670614.

peterdesmet · 2024-03-14T15:36:38Z

Regarding v1: ok, seems it was SHOULD. But I think a number of data consumers assumed MUST (including frictionless-r) and did offer sync_schema functionality. Even if it is a breaking change, I think the strict approach (orderedFields: true, exactFields: true) makes more sense as a default to me. But I'd like to hear from others too.

roll · 2024-03-14T15:45:13Z

I would vote for any, both false or both true, by default. I agree that both true by default is what, currently, is a de-facto standard on the implementation level.

I'm curious if there are better names to mirror it and use false by default .e.g unorderedFields and arbitraryFields. Personally, I always feel more natural not to use flags that truth by default but, of course, it's a matter of taste

peterdesmet · 2024-03-14T16:00:19Z

@roll, we have some precedents of true defaults (and false):

CSV dialect doubleQuote: true
CSV dialect header: true
Table schema bareNumber: true
Table schema required: false
Table schema unique: false

I think the positive properties (orderedFields, exactFields) are easier to understand than the negative ones (unorderedFields, ...)

roll · 2024-03-14T16:23:33Z

@peterdesmet
I agree, it's not a problem. I'll update defaults for voting now as it seems we're more leaning towards true by default

Upd.
Done

content/docs/specifications/table-schema.md

Co-authored-by: Peter Desmet <[email protected]>

roll · 2024-03-28T08:41:25Z

With Paul's support expressed in the original issue we got the quorum. It was one of the most awaited features in the Data Package, and it's really great that we finally made it! Esp. in a form that beautifully aligned with the set theory that became real only after a great discussion (the initial proposal was way different).

🎉

roll · 2024-03-28T08:41:52Z

ACCEPTED by WG (6/9)

roll added 4 commits February 21, 2024 15:08

Bootstrapped field order section

4d84726

Updted the spec

fe48c96

Updated the profile

f57f411

Fixed styling

bb7011a

roll mentioned this pull request Feb 22, 2024

Frictionless Standards (v2). Feb 19 - Feb 25 frictionlessdata/datapackage#881

Closed

Merge branch 'main' into 367/schema-partial

7488bf1

roll added 2 commits February 23, 2024 14:48

Rebased on two properties

919206a

Removed partial

f832abf

roll changed the title ~~Add schema.partial property; clarified extra/non-specified fields in Table Schema~~ Add schema.exact/orderedFields properties; clarified extra/non-specified fields in Table Schema Feb 23, 2024

Added articles

94c9605

roll mentioned this pull request Feb 26, 2024

Frictionless Standards (v2). Live Tracker frictionlessdata/datapackage#853

Closed

peterdesmet reviewed Mar 13, 2024

View reviewed changes

roll and others added 2 commits March 14, 2024 15:03

Update content/docs/specifications/table-schema.md

8657f3b

Co-authored-by: Peter Desmet <[email protected]>

Merge branch 'main' into 367/schema-partial

1bf8dd3

roll added 2 commits March 14, 2024 15:19

Fixed exactFields

f04b7ed

Revert "Fixed exactFields"

f670614

This reverts commit f04b7ed.

Revert "Revert "Fixed exactFields""

64bd8c0

This reverts commit f670614.

roll added 3 commits March 14, 2024 16:29

Reverce defaults

528978a

Updated the profile

bd74252

Fixed typo

6003b25

peterdesmet mentioned this pull request Mar 15, 2024

Clarity around extra/non-specified fields frictionlessdata/datapackage#367

Closed

roll added 2 commits March 18, 2024 15:21

Rebased to schema.fieldsMatch

b6cfbba

Updated wording

915867e

roll changed the title ~~Add schema.exact/orderedFields properties; clarified extra/non-specified fields in Table Schema~~ Add schema.fieldsMatch property; clarified extra/non-specified fields in Table Schema Mar 18, 2024

roll added the candidate label Mar 19, 2024

peterdesmet reviewed Mar 21, 2024

View reviewed changes

content/docs/specifications/table-schema.md Outdated Show resolved Hide resolved

content/docs/specifications/table-schema.md Outdated Show resolved Hide resolved

content/docs/specifications/table-schema.md Outdated Show resolved Hide resolved

roll and others added 5 commits March 28, 2024 08:28

Update content/docs/specifications/table-schema.md

a901c50

Co-authored-by: Peter Desmet <[email protected]>

Update content/docs/specifications/table-schema.md

3bba057

Co-authored-by: Peter Desmet <[email protected]>

Update content/docs/specifications/table-schema.md

a531108

Co-authored-by: Peter Desmet <[email protected]>

Merge branch 'main' into 367/schema-partial

ea62c66

Merge branch 'main' into 367/schema-partial

f71a530

roll merged commit bd163e8 into main Mar 28, 2024
2 checks passed

roll deleted the 367/schema-partial branch March 28, 2024 08:42

roll mentioned this pull request Mar 28, 2024

Frictionless Standards (v2). Mar 1 - Mar 31 frictionlessdata/datapackage#892

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `schema.fieldsMatch` property; clarified extra/non-specified fields in Table Schema #39

Add `schema.fieldsMatch` property; clarified extra/non-specified fields in Table Schema #39

roll commented Feb 21, 2024 •

edited

Loading

roll commented Feb 23, 2024 •

edited

Loading

nichtich commented Feb 23, 2024

roll commented Feb 23, 2024

cloudflare-workers-and-pages bot commented Feb 23, 2024 •

edited

Loading

roll commented Feb 23, 2024 •

edited

Loading

roll commented Mar 11, 2024

peterdesmet left a comment

peterdesmet Mar 13, 2024

roll Mar 14, 2024

peterdesmet Mar 14, 2024

roll Mar 14, 2024

roll Mar 14, 2024

roll commented Mar 14, 2024

peterdesmet commented Mar 14, 2024

roll commented Mar 14, 2024

peterdesmet commented Mar 14, 2024

roll commented Mar 14, 2024 •

edited

Loading

peterdesmet commented Mar 14, 2024

roll commented Mar 14, 2024 •

edited

Loading

roll commented Mar 28, 2024

roll commented Mar 28, 2024


		A Table Schema descriptor `MAY` contain a property `exactFields` that `MUST` be boolean with default value `false`:

		- false (default): The number of fields in the data source `MUST` be equal or more than the number of elements in the `fields` array.

Add schema.fieldsMatch property; clarified extra/non-specified fields in Table Schema #39

Add schema.fieldsMatch property; clarified extra/non-specified fields in Table Schema #39

Conversation

roll commented Feb 21, 2024 • edited Loading

Overview

roll commented Feb 23, 2024 • edited Loading

nichtich commented Feb 23, 2024

roll commented Feb 23, 2024

cloudflare-workers-and-pages bot commented Feb 23, 2024 • edited Loading

Deploying datapackage with Cloudflare Pages

roll commented Feb 23, 2024 • edited Loading

roll commented Mar 11, 2024

peterdesmet left a comment

Choose a reason for hiding this comment

peterdesmet Mar 13, 2024

Choose a reason for hiding this comment

roll Mar 14, 2024

Choose a reason for hiding this comment

peterdesmet Mar 14, 2024

Choose a reason for hiding this comment

roll Mar 14, 2024

Choose a reason for hiding this comment

roll Mar 14, 2024

Choose a reason for hiding this comment

roll commented Mar 14, 2024

peterdesmet commented Mar 14, 2024

roll commented Mar 14, 2024

peterdesmet commented Mar 14, 2024

roll commented Mar 14, 2024 • edited Loading

peterdesmet commented Mar 14, 2024

roll commented Mar 14, 2024 • edited Loading

roll commented Mar 28, 2024

roll commented Mar 28, 2024

Add `schema.fieldsMatch` property; clarified extra/non-specified fields in Table Schema #39

Add `schema.fieldsMatch` property; clarified extra/non-specified fields in Table Schema #39

roll commented Feb 21, 2024 •

edited

Loading

roll commented Feb 23, 2024 •

edited

Loading

cloudflare-workers-and-pages bot commented Feb 23, 2024 •

edited

Loading

roll commented Feb 23, 2024 •

edited

Loading

roll commented Mar 14, 2024 •

edited

Loading

roll commented Mar 14, 2024 •

edited

Loading