
jams.sample() [i.e. sampling a series of discrete annotation elements into a time-series] #109

Closed
justinsalamon opened this issue Apr 8, 2016 · 8 comments

@justinsalamon
Contributor

In automatic transcription, it is not uncommon to evaluate the estimated note sequence not just using the note transcription metrics, but also using the melody extraction metrics. Since the melody extraction metrics are frame-based, this requires sampling the note sequence on a fixed time grid to convert it to a time series in the expected format.

Are there any other tasks/use cases where this type of series sampling would come in handy? Is this something worth implementing as a utility function in JAMS? For non-overlapping events the mapping is straightforward. For potentially overlapping events it's a little less so, but (for example) the value array could contain a list (or tuple?) in each frame containing all values mapped to the same timestamp.

Thoughts?

@bmcfee
Contributor

bmcfee commented Apr 8, 2016

I think what you're asking for is something like mir_eval's intervals_to_samples function, but for overlapping data. That is, something that provides regularly spaced samples of the interval-labeled data.

I've implemented this kind of thing a few times, and yeah, it would be useful to have a generic solution. The tricky part here is, like you say, to handle overlapping intervals sanely, and make it generic enough to apply to different namespaces.

I think this can be done with some judicious use of lambda functions. For example, you could imagine providing one function that maps a value field to an object, and a second function that combines (reduces) objects to a single object.

For example, the mapper could be scikit-learn's LabelEncoder.transform method, which knows how to translate strings into sparse encoding vectors, and the reduce function could be np.maximum (or any ufunc). The semantics here would be that each frame gets the union of labels corresponding to intervals containing said frame.

More concretely, it might look like the following:

>>> LE = sklearn.preprocessing.LabelEncoder()
>>> LE.fit(array_of_tags)
>>> frames, labels = jams.sample_annotation(annotation, map=LE.transform, reduce=np.maximum)
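
A minimal sketch of what such a helper could look like under the hood, assuming the sample times are supplied as a sorted grid and the mapper accepts a single observation value; the name sample_annotation, its signature, and the times argument are all hypothetical at this point:

import numpy as np

def sample_annotation(annotation, times, map=None, reduce=None):
    # Hypothetical sketch: sample interval-valued observations onto a time grid.
    #   map    -- applied to each observation value (default: identity)
    #   reduce -- binary function combining mapped values from overlapping intervals
    mapper = map if map is not None else (lambda v: v)
    frames = np.asarray(times, dtype=float)
    labels = [None] * len(frames)

    for obs in annotation.data:
        # Grid points covered by the interval [obs.time, obs.time + obs.duration]
        start = np.searchsorted(frames, obs.time)
        end = np.searchsorted(frames, obs.time + obs.duration, side='right')
        for i in range(start, end):
            value = mapper(obs.value)
            # Fold in whatever earlier, overlapping intervals left at this frame
            labels[i] = value if labels[i] is None else reduce(labels[i], value)

    return frames, labels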

Does this make any kind of sense? Am I over-engineering?

@justinsalamon
Contributor Author

It does make sense. In my specific case I'd define custom map and reduce functions (I just want to keep the pitch value in Hz, and since notes don't overlap reduce would just pass the value along), but I like this logic.
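
For the note case that could boil down to something like the following (purely illustrative; it leans on the hypothetical sample_annotation sketch above and a made-up note_annotation with a known duration):

import numpy as np

# 10 ms grid; keep the pitch value in Hz, and since notes don't overlap
# the reducer just passes the first value through.
times = np.arange(0, note_annotation.duration, 0.01)
frames, pitches = sample_annotation(note_annotation, times,
                                    map=lambda hz: hz,
                                    reduce=lambda a, b: a)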

What did you have in mind for the returned frames and labels? Just np.ndarrays with timestamps and values?

@bmcfee
Contributor

bmcfee commented Apr 8, 2016

> What did you have in mind for the returned frames and labels? Just np.ndarrays with timestamps and values?

  • frames would be an ndarray of time indices (in seconds)
  • labels would be a list. If you have converters that map to something other than ndarrays, it doesn't necessarily make sense to stack them as an ndarray.

@bmcfee
Contributor

bmcfee commented Aug 31, 2017

@justinsalamon any interest in picking this one back up? It does seem like a pretty useful feature.

I wonder if we can simplify the API compared to what we have above? How about something as simple as

values = ann.sample(times)

where values[i] is a list of values derived from intervals that contain time times[i]?
It's then on the caller to figure out what to do with multi-valued time steps.
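
For instance, a caller that wants a single pitch per frame might collapse multi-valued steps like this (a sketch only; ann.sample is just the proposal above, and the grid and tie-breaking rule are made up):

import numpy as np

# Hypothetical: 10 ms grid over a 30-second track
times = np.arange(0, 30.0, 0.01)
values = ann.sample(times)

# Keep the highest pitch in multi-valued frames; mark empty frames as unvoiced (0 Hz)
pitch = np.array([max(v) if v else 0.0 for v in values])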

Similarly, we could also support

values = ann.sample_intervals(intervals, min_overlap=0.1)

where values[i] is the set of values that overlap with the target interval intervals[i] (a start and end time) by at least min_overlap seconds.
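
The overlap test behind such a method could be as simple as the following (a rough sketch; sample_intervals and min_overlap are only proposed here, and values are collected in lists rather than sets to sidestep hashability):

def sample_intervals(annotation, intervals, min_overlap=0.1):
    # intervals: array-like of shape (n, 2) holding [start, end] times in seconds
    values = [list() for _ in intervals]

    for obs in annotation.data:
        obs_start, obs_end = obs.time, obs.time + obs.duration
        for i, (start, end) in enumerate(intervals):
            # Length (in seconds) of the overlap between observation and target interval
            overlap = min(obs_end, end) - max(obs_start, start)
            if overlap >= min_overlap:
                values[i].append(obs.value)

    return values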

bmcfee modified the milestone: 0.3.1 (Sep 1, 2017)
@bmcfee
Contributor

bmcfee commented Sep 1, 2017

Here's a quick hack of a time sampler:

import numpy as np

def sample(data, samples):
    # data = Annotation.data
    # samples = sorted ndarray of sample times

    values = [list() for _ in samples]

    for obs in data:
        # indices of the sample times covered by [obs.time, obs.time + obs.duration]
        start = np.searchsorted(samples, obs.time)
        end = np.searchsorted(samples, obs.time + obs.duration, side='right')
        for i in range(start, end):
            values[i].append(obs.value)

    return values

The output of this is that values[i] is a list containing all the values of intervals in data that overlap with time samples[i].

Interval-based sampling is a bit trickier to implement. Is there need for this? Or does timestamp sampling cover most of our use cases?

@bmcfee
Contributor

bmcfee commented Sep 8, 2017

Offline conversation with @ejhumphrey: punting interval-based sampling for now. A couple of points we should think about:

  • naming: to_samples() might make more sense, as sample() connotes random selection of observations
  • helper function to generate the sampling grid and return the tuple of times, values (a rough sketch follows this list)
  • option to return confidences as well as values
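
A rough sketch of the grid helper from the second bullet, re-using the quick hack above (the name, the sample_rate argument, and the assumption that annotation.duration is set are all guesses):

import numpy as np

def to_samples(annotation, sample_rate):
    # Hypothetical helper: build a regular grid over the annotation's span and
    # return it together with the sampled values as a (times, values) tuple.
    times = np.arange(0, annotation.duration, 1.0 / sample_rate)
    values = sample(annotation.data, times)   # the quick hack above
    return times, values

Confidences could be threaded through in the same way by also collecting obs.confidence in the sampler's inner loop.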

I'll implement the above and PR soon.

@bmcfee
Contributor

bmcfee commented Sep 8, 2017

One more thought: for event-style annotations (e.g., beats) or dense annotations (pitch_hz), this sampling doesn't quite make sense, since a value is only registered at time t if time <= t < time + duration, which almost never happens when duration=0.

I'd prefer not to hack around this and do something clever (i.e., interpolation), since this is going to be idiomatic to the namespace. Any thoughts on how to deal with this? Do we just document it carefully and let the buyer beware?

bmcfee added a commit that referenced this issue Sep 9, 2017
@bmcfee
Contributor

bmcfee commented Sep 28, 2017

implemented by #173
