Schema refactor #92

bmcfee · 2015-12-14T15:09:41Z

Rehashing #40 after a conversation with @ejhumphrey

There are good arguments for splitting the JAMS schema into smaller pieces that can be shared and repurposed. Specifically, a database (e.g., a MongoDB document store) for managing JAMS collections could be more reasonably structured (and more easily searchable) if it contained individual annotation objects (indexed by track id) rather than full JAMS objects.
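To make the database idea concrete, here is a minimal sketch in Python (the language of the jams package) that splits a JAMS-style dict into standalone annotation records keyed by track id. The names split_annotations and AnnotationStore and the track_id/position keys are hypothetical. The toy store only stands in for something like MongoDB, and because the current schema has no dedicated track-id field, the caller supplies one.

```python
from collections import defaultdict


def split_annotations(jam, track_id):
    """Turn one JAMS-style dict into standalone annotation records.

    `track_id` is a hypothetical key supplied by the caller; the current
    JAMS schema has no dedicated track-id field.
    """
    records = []
    for position, annotation in enumerate(jam.get("annotations", [])):
        record = dict(annotation)        # shallow copy; the input is left untouched
        record["track_id"] = track_id    # foreign key back to the track
        record["position"] = position    # original order within the file
        records.append(record)
    return records


class AnnotationStore:
    """Toy in-memory stand-in for a database such as MongoDB."""

    def __init__(self):
        self._by_track = defaultdict(list)

    def add_records(self, records):
        for record in records:
            self._by_track[record["track_id"]].append(record)

    def find(self, track_id, namespace=None):
        # .get() avoids creating empty entries for unknown track ids
        hits = self._by_track.get(track_id, [])
        if namespace is not None:
            hits = [r for r in hits if r.get("namespace") == namespace]
        return list(hits)


jam = {
    "file_metadata": {"title": "Example", "duration": 30.0},
    "annotations": [
        {"namespace": "beat", "data": [{"time": 0.5, "duration": 0.0, "value": 1, "confidence": None}]},
        {"namespace": "chord", "data": [{"time": 0.0, "duration": 2.0, "value": "C:maj", "confidence": None}]},
    ],
    "sandbox": {},
}

store = AnnotationStore()
store.add_records(split_annotations(jam, track_id="track-0001"))
print(len(store.find("track-0001")))                                        # 2
print(store.find("track-0001", namespace="chord")[0]["data"][0]["value"])  # C:maj
```

Keeping the original position on each record means a full JAMS object could still be reassembled from its pieces when needed.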

I propose that we refactor the JAMS schema so that annotations can exist independently of the JAMS file format. The JAMS file format would still use the annotation definitions, so there should be no observable difference in how JAMS files work; put another way, the API for JAMS files stays the same, and all the changes would be under the hood.

Digging in a bit more, the current schema looks like:

jams_schema
`- JAMS
   `- FileMetadata
   |  `- [more stuff]
   `- Annotations
   |  `- [more stuff]
   `- Sandbox

and the refactored schema might look like:

jams_common
`- Sandbox

jams_annotation
`- Annotations
   `- [more stuff]

jams_metadata
`- FileMetadata
   `- [more stuff]

jams_file
`- JAMS
   `- jams_metadata.FileMetadata
   `- jams_annotation.Annotations
   `- jams_common.Sandbox

What do folks think?

To make this happen, we'd need a better handle on JSON Schema inheritance (i.e., composing schemas across files with $ref), but I think it's entirely possible.

We might have to tweak the schema ids, which might require a slight modification to the spec. Not sure about this yet.
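One way the split could look is sketched below: JSON Schema documents written as Python dicts, each with its own $id, with jams_file pulling the pieces in through relative $ref values. The example.org URIs and the trimmed-down properties are placeholders, not the real JAMS schema ids or definitions. The stdlib-only checker confirms that every $ref resolves to a registered document and JSON Pointer; it is not a validator, and it ignores nested $id values that would change the base URI.

```python
from urllib.parse import urldefrag, urljoin

# Placeholder base URI; the real JAMS schema ids may look different.
BASE = "https://example.org/jams/"

JAMS_COMMON = {
    "$id": BASE + "jams_common.json",
    "definitions": {"Sandbox": {"type": "object"}},
}

JAMS_ANNOTATION = {
    "$id": BASE + "jams_annotation.json",
    "definitions": {
        "Annotation": {
            "type": "object",
            "properties": {
                "namespace": {"type": "string"},
                "data": {"type": "array"},
                # Cross-file reference, resolved relative to this document's $id
                "sandbox": {"$ref": "jams_common.json#/definitions/Sandbox"},
            },
            "required": ["namespace", "data"],
        },
        # Same-document reference
        "Annotations": {"type": "array", "items": {"$ref": "#/definitions/Annotation"}},
    },
}

JAMS_METADATA = {
    "$id": BASE + "jams_metadata.json",
    "definitions": {
        "FileMetadata": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "duration": {"type": "number"}},
        },
    },
}

JAMS_FILE = {
    "$id": BASE + "jams_file.json",
    "type": "object",
    "properties": {
        "file_metadata": {"$ref": "jams_metadata.json#/definitions/FileMetadata"},
        "annotations": {"$ref": "jams_annotation.json#/definitions/Annotations"},
        "sandbox": {"$ref": "jams_common.json#/definitions/Sandbox"},
    },
}


def iter_refs(node):
    """Yield every "$ref" string found anywhere in a schema document."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def resolve_pointer(document, fragment):
    """Follow a JSON Pointer fragment such as '/definitions/Sandbox'."""
    node = document
    tokens = fragment.split("/")[1:] if fragment else []
    for token in tokens:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def dangling_refs(*schemas):
    """Return (schema $id, $ref) pairs whose target cannot be found."""
    registry = {schema["$id"]: schema for schema in schemas}
    missing = []
    for schema in schemas:
        for ref in iter_refs(schema):
            target, fragment = urldefrag(urljoin(schema["$id"], ref))
            try:
                resolve_pointer(registry[target], fragment)
            except (KeyError, IndexError, TypeError, ValueError):
                missing.append((schema["$id"], ref))
    return missing


print(dangling_refs(JAMS_COMMON, JAMS_ANNOTATION, JAMS_METADATA, JAMS_FILE))  # []
```

A real validator would resolve these references itself; recent releases of the jsonschema package, for example, accept a referencing.Registry holding the component documents. Because relative $ref values are resolved against each document's $id, the choice of ids is exactly where a spec change could bite.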


ejhumphrey · 2016-08-18T15:56:11Z

This is closely enough related that it isn't worth spawning a new issue: I'd like to revisit / upvote the conversation about how identifiers / named entities are referenced in JAMS. For example, I'd like to tag a single annotation with a unique identifier for whoever produced it, so that I can search a collection for all annotations produced by the same entity (human or algorithm). We have the annotator dict, but it's a little too unconstrained to encourage any convention.

bmcfee · 2016-08-18T16:03:42Z

I'm not sure that fits under the scope of JAMS per se; remember the headaches about filenames in #5? We eventually decided that that's better handled at the application level -- for better or worse. I suspect that indexing annotation sources will have similar difficulties.

OTOH, if we do want to add support for foreign-key indexing (for tracks, annotators, etc), maybe it's worth reopening that discussion?

urinieto · 2016-08-18T16:24:04Z

Could we simply add a new identifier field to the annotator dictionary that is basically a unique hash produced from the annotator name, email, affiliation, etc.?
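A minimal sketch of that suggestion: derive the identifier by hashing a canonical serialization of selected annotator fields. The field names, the identifier key, and the strip-plus-casefold normalization are assumptions; JAMS does not define any of them.

```python
import hashlib
import json


def annotator_identifier(annotator, fields=("name", "email", "affiliation")):
    """Derive a stable identifier from selected annotator fields.

    The field names and the normalization (strip + casefold) are
    assumptions; JAMS leaves the annotator dict free-form.
    """
    canonical = {
        field: str(annotator.get(field, "")).strip().casefold()
        for field in fields
    }
    # sort_keys and fixed separators give one byte string per logical record
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


annotator = {"name": "Jane Doe", "email": "jane@example.org", "affiliation": "NYU"}
annotator["identifier"] = annotator_identifier(annotator)  # hypothetical new field

same_person = {"name": " jane doe ", "email": "JANE@example.org", "affiliation": "nyu"}
print(annotator_identifier(same_person) == annotator["identifier"])  # True
```

One caveat of this design: the identifier changes whenever any hashed field changes (say, a new affiliation), so it identifies a particular record of a person more than the person.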


ejhumphrey · 2016-08-18T16:27:40Z

I don't necessarily want to tell users what the namespace should be, but I think we could benefit from some standardization.

bmcfee · 2016-08-18T17:18:49Z

Maybe go rosetta-style? Let identifiers be a list of strings of the form id_space:id_string?

That will at least validate for syntax. If you want semantic validation, that's up to a separate indexing structure that should live outside of jams.

For example, the SALAMI annotators could be identified as salami:0001 or some such. The same goes for annotation tools (org:software:version -> qmul:sonic-visualiser:1.2, qmul:tony:2.0, jku:madmom:0.14.1, etc.), and filenames could just be standard URLs.
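A syntax-only check for the rosetta-style proposal might look like the following. The exact grammar (a lowercase namespace, a colon, then any non-empty run of non-whitespace characters) is an assumption; it deliberately lets the remainder contain further colons, so org:software:version strings and URLs both pass. Whether an identifier actually refers to anything is left to an index outside of JAMS.

```python
import re

# Hypothetical grammar: a lowercase namespace, a colon, then a non-empty
# remainder without whitespace (which may itself contain colons, as in
# org:software:version, or be a URL such as https://...).
IDENTIFIER = re.compile(r"[a-z][a-z0-9_.-]*:\S+")


def is_valid_identifier(value):
    """Syntax-only check; whether the id exists is out of scope."""
    return isinstance(value, str) and IDENTIFIER.fullmatch(value) is not None


def split_identifier(value):
    """Split 'space:rest' into its namespace and the remaining string."""
    if not is_valid_identifier(value):
        raise ValueError(f"not an id_space:id_string identifier: {value!r}")
    space, _, rest = value.partition(":")
    return space, rest


identifiers = ["salami:0001", "qmul:sonic-visualiser:1.2", "jku:madmom:0.14.1"]
print(all(is_valid_identifier(i) for i in identifiers))  # True
print(split_identifier("qmul:tony:2.0"))                 # ('qmul', 'tony:2.0')
print(is_valid_identifier("no namespace here"))          # False
```

Splitting only on the first colon keeps the namespace unambiguous while leaving the rest of the string opaque, which is what a later semantic index would key on.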

bmcfee added the enhancement label Dec 14, 2015

bmcfee assigned ejhumphrey Dec 14, 2015

bmcfee added this to the 0.3.0 milestone Dec 14, 2015

bmcfee mentioned this issue May 6, 2016: Dictionary interface for quick access to annotations #112 (closed)

bmcfee modified the milestones: 0.3.0, 0.4.0 May 11, 2017

This was referenced May 10, 2018 by indexer bmcfee/muda#8 (closed) and JAMS Quick Look #19 (open).

bmcfee mentioned this issue Aug 9, 2019: Support jsonschema>=3.0 #202 (merged)

bmcfee added the schema label (Issues pertaining to schema definitions) Aug 12, 2019

bmcfee mentioned this issue Aug 12, 2019: RFE: One file/one jam paradigm not suited for large datasets #86 (open)
