#1290 highlighted a rule, added with microscopy, that we have not considered in the schema. In the samples file:
"The combination of `sample_id` and `participant_id` MUST be unique."
And of course, in participants.tsv: "There MUST be exactly one row for each participant."
In both of these cases, each row is metadata associated with an object identified by one or more of the columns, making those columns the primary key in relational database terms. The values (or tuple of values, if multiple columns) must be unique because they can be used to look up the row. While we could write rules and come up with expressions that would enforce these constraints, it seems less ad hoc to make this a declarative property of a table.
The proposal is to declare the key columns as a property of the table definition. Tooling would then enforce (and perhaps take advantage of) this constraint.
I use `index_columns` because it feels a little less jargony than primary key, but we could also just use `primary_key` and document its meaning.
Note that there are some TSV files (such as events.tsv) where there is no primary key, so we can't get away with an implicit "first column is primary unless otherwise stated". Or we can, but then we would need to declare some file types as not having them.
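To make the intended constraint concrete, here is a minimal sketch of the check a validator might run once a table declares its index columns. The function name `find_duplicate_keys` and the sample rows are hypothetical and not part of the schema or any existing validator; an empty column list stands in for files such as events.tsv that have no primary key.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(tsv_text, index_columns):
    """Return the key tuples that appear in more than one row.

    `index_columns` is the (hypothetical) declared list of key columns.
    An empty list means the table has no primary key, so nothing is checked.
    """
    if not index_columns:
        return []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Build one tuple per row from the declared key columns and count them.
    counts = Counter(tuple(row[col] for col in index_columns) for row in reader)
    return [key for key, n in counts.items() if n > 1]


# Illustrative samples.tsv content: the first and third rows share a key.
samples = (
    "sample_id\tparticipant_id\tsample_type\n"
    "sample-01\tsub-01\ttissue\n"
    "sample-01\tsub-02\ttissue\n"
    "sample-01\tsub-01\tcell line\n"
)

print(find_duplicate_keys(samples, ["sample_id", "participant_id"]))
# → [('sample-01', 'sub-01')]
```

Treating the key as a tuple covers both the single-column case (participants.tsv) and the multi-column case (the samples file) with the same logic.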
effigies added the schema label (Issues related to the YAML schema representation of the specification. Patch version release.) on Sep 26, 2022
I like `index_columns` as well. The only other constraint I'd like to see is that any column used as an index must be a required column. I feel that keeps the meaning of the rule simpler, since something else already guarantees the columns' existence.
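The suggested constraint could be checked against the table definition itself, before any data is read. The sketch below assumes a hypothetical layout in which a table definition is a dict with a `columns` mapping from column name to requirement level and an `index_columns` list; neither the layout nor the function name `optional_index_columns` is taken from the actual schema.

```python
def optional_index_columns(table_def):
    """Return the index columns that are not marked as required.

    `table_def` follows an assumed, illustrative layout:
    {"columns": {name: requirement_level}, "index_columns": [name, ...]}
    """
    columns = table_def.get("columns", {})
    required = {name for name, level in columns.items() if level == "required"}
    return [col for col in table_def.get("index_columns", []) if col not in required]


# Hypothetical definitions: the second one indexes on a non-required column.
good_def = {
    "columns": {
        "sample_id": "required",
        "participant_id": "required",
        "pathology": "recommended",
    },
    "index_columns": ["sample_id", "participant_id"],
}
bad_def = {
    "columns": {"sample_id": "required", "pathology": "recommended"},
    "index_columns": ["sample_id", "pathology"],
}

print(optional_index_columns(good_def))  # → []
print(optional_index_columns(bad_def))   # → ['pathology']
```

A table with no `index_columns` entry passes trivially, which fits files that have no primary key.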