jams.sample() [i.e. sampling a series of discrete annotation elements into a time series] #109

In automatic transcription, it is not uncommon to evaluate the estimated note sequence not just with the note transcription metrics, but also with the melody extraction metrics. Since the melody extraction metrics are frame-based, this requires sampling the note sequence on a fixed time grid to convert it into a time series in the expected format.

Are there any other tasks/use cases where this type of series sampling would come in handy? Is this something worth implementing as a utility function in JAMS? For non-overlapping events the mapping is straightforward. For potentially overlapping events it's a little less so, but (for example) the value array could contain a list (or tuple?) in each frame holding all values mapped to the same timestamp. Thoughts?
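To make the melody use case concrete, here is a minimal numpy sketch of that conversion; the helper name and the 10 ms hop are illustrative, not an existing JAMS or mir_eval API:

```python
import numpy as np

def notes_to_time_series(intervals, pitches, hop=0.01):
    # intervals: (n, 2) ndarray of note (onset, offset) times in seconds
    # pitches:   length-n array of note pitches in Hz
    times = np.arange(0.0, intervals.max(), hop)
    freqs = np.zeros_like(times)  # 0 Hz marks unvoiced frames
    for (start, end), pitch in zip(intervals, pitches):
        # Later notes overwrite earlier ones in overlapping regions
        freqs[(times >= start) & (times < end)] = pitch
    return times, freqs
```

The resulting (times, freqs) pair is roughly the frame-based format that melody extraction metrics consume.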
Comments
I think what you're asking for is something like mir_eval's interval-to-sample utilities. I've implemented this kind of thing a few times, and yeah, it would be useful to have a generic solution.

The tricky part here is, like you say, to handle overlapping intervals sanely, and make it generic enough to apply to different namespaces. I think this can be done with some judicious use of lambda functions. For example, you could imagine providing one function that maps each observation value to an output, and another that reduces the values of overlapping observations. For example, the mapper could be a scikit-learn LabelEncoder. More concretely, it might look like the following:

```python
>>> LE = sklearn.preprocessing.LabelEncoder()
>>> LE.fit(array_of_tags)
>>> frames, labels = jams.sample_annotation(annotation, map=LE.transform, reduce=np.maximum)
```

Does this make any kind of sense? Am I over-engineering?
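For what it's worth, a rough sketch of how a map/reduce sampler along these lines could work. `sample_annotation` here is hypothetical (it is not the actual JAMS API), and the reducer consumes the whole list of active values per frame (e.g. `np.max`) rather than acting as a binary ufunc:

```python
import numpy as np

def sample_annotation(annotation, times, map=lambda v: v, reduce=None):
    # frames[i] collects the mapped values of all observations
    # that are active at time times[i]
    frames = [[] for _ in times]
    for obs in annotation.data:
        active = (times >= obs.time) & (times < obs.time + obs.duration)
        for i in np.flatnonzero(active):
            frames[i].append(map(obs.value))
    if reduce is None:
        return frames
    # Collapse overlapping values per frame; empty frames become None
    return [reduce(f) if f else None for f in frames]
```

With a label encoder, the call above would then look something like `sample_annotation(annotation, times, map=lambda v: LE.transform([v])[0], reduce=np.max)`.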
It does make sense. In my specific case I'd define custom map and reduce functions (I just want to keep the pitch value in Hz, and since notes don't overlap, reduce would just pass the value along), but I like this logic. What did you have in mind for the returned frames and labels?
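Under the hypothetical sample_annotation sketched above, that case might look like:

```python
# Keep the pitch value in Hz as-is; since notes don't overlap,
# each frame holds at most one value, so reduce can just unwrap it
frames = sample_annotation(annotation, times,
                           map=lambda hz: hz,
                           reduce=lambda vals: vals[0])
```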
@justinsalamon any interest in picking this one back up? It does seem like a pretty useful feature. I wonder if we can simplify the API compared to what we have above? How about something as simple as

```python
values = ann.sample(times)
```

where `times` is an array of sample times and `values` gives the observation values active at each time. Similarly, we could also support

```python
values = ann.sample_intervals(intervals, min_overlap=0.1)
```

where `intervals` is an array of (start, end) pairs and `min_overlap` sets the minimal overlap required for an observation to count toward an interval.
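The overlap test could plausibly be something like the following, assuming `min_overlap` is measured as a fraction of the query interval (that choice is an assumption, not settled by the discussion above):

```python
def overlap_fraction(obs_time, obs_duration, start, end):
    # Length of the intersection between the observation span and [start, end],
    # normalized by the length of the query interval
    covered = max(0.0, min(obs_time + obs_duration, end) - max(obs_time, start))
    return covered / (end - start)

# An observation would count toward an interval when
# overlap_fraction(obs.time, obs.duration, start, end) >= min_overlap.
```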
Here's a quick hack of a time sampler:

```python
import numpy as np

def sample(data, samples):
    # data = Annotation.data
    # samples = sorted ndarray of sample times
    values = [list() for _ in samples]
    for obs in data:
        # Indices of the samples covered by [obs.time, obs.time + obs.duration]
        start = np.searchsorted(samples, obs.time)
        end = np.searchsorted(samples, obs.time + obs.duration, side='right')
        for i in range(start, end):
            values[i].append(obs.value)
    return values
```

The output of this is that `values[i]` holds the list of all observation values active at time `samples[i]`.

Interval-based sampling is a bit trickier to implement. Is there need for this? Or does timestamp sampling cover most of our use cases?
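A quick usage example of the `sample` function above, with a namedtuple standing in for JAMS observations (which carry time, duration, and value fields):

```python
import numpy as np
from collections import namedtuple

Observation = namedtuple('Observation', ['time', 'duration', 'value'])

data = [Observation(time=0.0, duration=1.0, value='A'),
        Observation(time=0.5, duration=1.0, value='B')]
samples = np.arange(0, 2.0, 0.25)

print(sample(data, samples))
# [['A'], ['A'], ['A', 'B'], ['A', 'B'], ['A', 'B'], ['B'], ['B'], []]
```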
Offline conversation with @ejhumphrey: punting interval-based sampling for now; there are a couple of points we should still think about. I'll implement the above and PR soon.
One more thought: for event-style annotations (e.g., beats) or dense annotations (pitch_hz), this sampling doesn't quite make sense, since a value is only registered when a sample time lands exactly on the (zero-duration) observation. I'd prefer not to try to hack around this and do something clever (i.e., interpolation), since this is going to be idiomatic to the namespace. Any thoughts on how to deal with this? Do we just document it carefully and let the buyer beware?
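To illustrate, reusing `sample` and the `Observation` namedtuple from above: a zero-duration beat only shows up when a sample time coincides with it exactly, and vanishes entirely if the grid shifts:

```python
beats = [Observation(time=0.5, duration=0.0, value=1),
         Observation(time=1.0, duration=0.0, value=2)]

print(sample(beats, np.arange(0, 2.0, 0.25)))
# [[], [], [1], [], [2], [], [], []]

print(sample(beats, np.arange(0.1, 2.0, 0.25)))
# [[], [], [], [], [], [], [], []]
```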
implemented by #173