BREAKING: add support for targets as discussed in #3 #123
Conversation
I have just added a …
A good next step would be to have it return an ObjectDict (see #126) instead of a tensor or tuple.
Will do.
What do you think of automatically inferring target_rate = sample_rate * num_frames / num_samples with the …
I would prefer to have it explicitly provided by the user. Here's an example that explains why: Yamnet has a rate of 2.0833 Hz (one classification output for every step of 480 ms). In this example the rate cannot be accurately inferred, especially if the audio is short, because e.g. 1.1 seconds and 1.3 seconds of audio will give the same number of frames.
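To make that ambiguity concrete, here is a minimal sketch. It assumes frames are counted as floor(duration × rate) and a 16 kHz sample rate; Yamnet's exact framing (0.96 s windows) differs slightly, but the effect is the same:

```python
import math

STEP_SECONDS = 0.48            # Yamnet: one classification output per 480 ms
TRUE_RATE = 1 / STEP_SECONDS   # ≈ 2.0833 Hz
SAMPLE_RATE = 16000            # assumed sample rate for this illustration

for duration in (1.1, 1.3):
    num_samples = int(duration * SAMPLE_RATE)
    # Simplified frame count: whole 480 ms steps that fit in the clip
    num_frames = math.floor(num_samples / SAMPLE_RATE * TRUE_RATE)
    inferred = SAMPLE_RATE * num_frames / num_samples
    print(f"{duration}s -> {num_frames} frames, inferred rate {inferred:.4f} Hz")
```

Both 1.1 s and 1.3 s yield 2 frames here, so the inferred rates (≈1.82 Hz and ≈1.54 Hz) disagree with each other and neither recovers the true 2.0833 Hz.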
Noted. When inferring, warn with something like:

```python
warnings.warn(
    f"target_rate is required by {self.__class__.__name__}. "
    f"It has been automatically inferred from targets shape to {inferred_target_rate}. "
    f"If this is incorrect, please use target_rate to pass it directly."
)
```
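A minimal sketch of how that fallback could be wired up. The class name, method name, and attribute layout here are assumptions for illustration, not the library's actual code:

```python
import warnings


class SomeTransform:
    """Hypothetical transform; only the target_rate fallback is sketched."""

    def __init__(self, target_rate=None):
        self.target_rate = target_rate

    def resolve_target_rate(self, sample_rate, num_samples, num_frames):
        # Prefer the explicitly provided rate; otherwise infer it and warn.
        if self.target_rate is not None:
            return self.target_rate
        inferred_target_rate = round(sample_rate * num_frames / num_samples)
        warnings.warn(
            f"target_rate is required by {self.__class__.__name__}. "
            f"It has been automatically inferred from targets shape to {inferred_target_rate}. "
            f"If this is incorrect, please use target_rate to pass it directly."
        )
        return inferred_target_rate
```

With the Yamnet numbers above, a short clip would make the inferred value land on 2 Hz rather than 2.0833 Hz, which is exactly why the warning matters.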
Yes, I'm okay with that.
Updated with the API discussed in Slack.
Thanks :) I'll have a look soon-ish. Thanks for your patience.
This is starting to look quite good already! Here's what I think needs to be done: Before it gets merged:
After it gets merged:
I definitely can (and will), but I am having trouble thinking about what to test. I can do the equivalent of …
At the very least I'd like to have the shape and the dtype of the output tested. And that the output isn't the same as the input. But of course, smart test assertions are useful 👍
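For what it's worth, a sketch of assertions along those lines, using a stand-in transform (the real transform's name and call signature are not assumed here):

```python
import torch


def noisy_identity(samples, targets):
    # Stand-in for a real transform: perturbs samples, passes targets through.
    return samples + 0.01 * torch.randn_like(samples), targets


def test_shapes_dtypes_and_change():
    samples = torch.randn(4, 1, 16000)    # (batch_size, num_channels, num_samples)
    targets = torch.randn(4, 1, 10, 5)    # (batch_size, num_channels, num_frames, num_classes)
    out_samples, out_targets = noisy_identity(samples, targets)
    assert out_samples.shape == samples.shape
    assert out_samples.dtype == samples.dtype
    assert out_targets.shape == targets.shape
    # The transform should actually modify the samples
    assert not torch.equal(out_samples, samples)


test_shapes_dtypes_and_change()
```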
Thanks for the contribution 🚀 |
Thanks for merging this!
I can take care of it but can you please clarify the API?
I can easily create a notebook showing how I use this new feature in pyannote.audio.
I suggest you do that :)
Great! Hmm, how about something like this, for example:
Sounds good to me :)
This is an initial attempt at addressing #3.
The proposed API assumes the following shapes:
- audio samples: (batch_size, num_channels, num_samples)
- targets: (batch_size, num_channels, num_frames, num_classes)
and is used like that
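The usage snippet itself was elided from this excerpt; here is a hypothetical sketch consistent with those shapes (the sample rate, the transform, and the keyword names are assumptions):

```python
import torch

batch_size, num_channels, num_samples = 4, 1, 32000
num_frames, num_classes = 100, 5
sample_rate = 16000
# With these numbers: 16000 * 100 / 32000 = 50.0 frames per second
target_rate = sample_rate * num_frames / num_samples

samples = torch.randn(batch_size, num_channels, num_samples)
targets = torch.randn(batch_size, num_channels, num_frames, num_classes)

# A transform supporting targets would then be called roughly like:
# output = transform(samples, sample_rate=sample_rate,
#                    targets=targets, target_rate=target_rate)
```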