Fix time offset in key and beat analysis #2152
Conversation
I'm still unsure about what the correct offset is. I don't want to dig deeper into this class. What about the …
The frameOverlapFactor is a new config parameter; it defaults to 1 and is 1/m_stepSize.
…set of samples, starting with the first samples.
src/track/keyutils.cpp
Outdated
const int iTotalFrames = iTotalSamples / 2;
QMap<mixxx::track::io::key::ChromaticKey, double> key_histogram;
QMap<mixxx::track::io::key::ChromaticKey, double> key_occurrence;
This is still a histogram and not a distribution if you only normalize the sums for debug output (see below). Also, the term "occurrence" doesn't fit here and doesn't tell me what it actually means.
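To illustrate the distinction, a small hedged sketch (toDistribution() is hypothetical, not code from this PR): a histogram stores raw counts per key; normalizing by the total turns it into a distribution.

```cpp
#include <QMap>

// Hypothetical helper: convert raw per-key counts into a normalized
// distribution whose values sum to 1 (when the total is non-zero).
template <typename K>
QMap<K, double> toDistribution(const QMap<K, double>& histogram) {
    double total = 0.0;
    for (double count : histogram) {
        total += count;
    }
    QMap<K, double> distribution;
    if (total > 0.0) {
        for (auto it = histogram.constBegin(); it != histogram.constEnd(); ++it) {
            distribution.insert(it.key(), it.value() / total);
        }
    }
    return distribution;
}
```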
Oops, yes, I was confused about what a histogram is.
Done.
Wrong abstraction level
I am not sure, because the class has exactly this purpose: DownmixAndOverlapHelper()
It is supposed to do the overlap thing right.
But I am open to fixing an abstraction issue.
I am not sure if we have the same understanding of the issue.
Let's say we have a window size of 4 and a step size of 1.
Originally it starts like this:
####
_####
__####
___####
With this, I have noticed that the first result is significant for the third step. This is the offset I am trying to fix. Due to windowing, the most significant point is the middle of the window.
Now we have
__####
___####
____####
_____####
And the raw chroma detection result fits the waveform. We have some delays due to the moving average and median filter later, but that is not the point here.
In this and the previous version, 2x the step size remains undetected at the end. This becomes worse due to the moving average and median filter. This is not yet addressed in this PR, but I can add it.
Do you have a suggestion for how to improve the abstraction? A minimal sketch of the centering is shown below.
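A minimal sketch of that centering, assuming a callback-style consumer (slideCenteredWindows() and processWindow() are hypothetical names, not the actual DownmixAndOverlapHelper internals):

```cpp
#include <cstddef>
#include <vector>

// Prepend windowSize / 2 zeros so the center of the first window falls
// on input[0]; without the padding it would fall on input[windowSize / 2].
void slideCenteredWindows(const std::vector<float>& input,
        size_t windowSize, size_t stepSize,
        void (*processWindow)(const float*, size_t)) {
    std::vector<float> padded(windowSize / 2, 0.0f);
    padded.insert(padded.end(), input.begin(), input.end());
    for (size_t pos = 0; pos + windowSize <= padded.size(); pos += stepSize) {
        processWindow(padded.data() + pos, windowSize);
    }
}
```

With windowSize = 4 the first processed window becomes [0, 0, input[0], input[1]], i.e. the first real sample sits at the window center, matching the shifted diagram above.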
Effectively you prepend a period (length = windowSize / 2) of silence to the input range to center the first window on the beginning of the input. But then you should do the same at the end of the input range, i.e. append the same amount of silence to the end of the buffer and keep sliding the window and processing until that extra silence has been consumed, too. This could only be done in finalize().

What I still don't get is how those partial windows filled with silence should improve the overall analysis results. Wouldn't it be possible that the results even get worse, because you don't analyze the actual signal? Why pad the signal with extra silence if it may already start or end with almost silence? Are you able to verify and prove the improvements? Otherwise I would recommend keeping the code simple and comprehensible by analyzing only complete windows of actual input data.
OK, the zero padding in finalize is now in place.
Most tracks start and end with silence, so we do not alter the track if we add more silence at the beginning and the end. An issue is live recordings or vinyl rips, because they start with a noise floor. The key detector will see a rectangular jump, containing all frequencies. This does not affect the detection result, because it adds the same offset to all key bins. The beat detector detects a false beat at 0:00 when the noise starts in this case. But this was already the case before this PR, because the internal variables of the beat detector are initialized with zero anyway. I will prepare another PR that does a plausibility check for the first and the last beat. Due to the Hanning window, the first and last samples are almost silenced anyway, so adding silence hardly changes the analyzed signal. If we assume that we hear silence before and after the track, it is also the natural way to analyze it.
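For illustration, a hedged sketch of the end-of-track padding (finalizeWithSilence() and the buffer handling are stand-ins, not the actual finalize() implementation):

```cpp
#include <cstddef>
#include <vector>

// At least windowSize / 2 - stepSize zeros are needed so the last real
// samples can reach a window center; since the window only advances in
// stepSize increments, up to stepSize - 1 more may be required, giving
// windowSize / 2 - 1 in the worst case (see the diff comment quoted below).
void finalizeWithSilence(std::vector<float>& buffer,
        size_t nextWindowStart, // first not-yet-processed window position
        size_t windowSize, size_t stepSize,
        void (*processWindow)(const float*, size_t)) {
    buffer.insert(buffer.end(), windowSize / 2 - 1, 0.0f);
    for (size_t pos = nextWindowStart;
            pos + windowSize <= buffer.size();
            pos += stepSize) {
        processWindow(buffer.data() + pos, windowSize);
    }
}
```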
The AppVeyor issues are unrelated. Is this OK now?
size_t framesToFillWindow = m_windowSize - m_bufferWritePosition;
size_t numInputFrames = framesToFillWindow;
if (numInputFrames < m_windowSize / 2 - 1) {
    // -1 ensures that silence < m_stepSize ramains unprocessed
- typo (and I don't understand this remark)
- use math_max()
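For reference, one way the math_max() suggestion could look, assuming the elided branch body raises numInputFrames to that lower bound (the diff above cuts off before showing it):

```cpp
// Hypothetical rewrite using Mixxx's math_max() helper from
// src/util/math.h, assuming the truncated if-branch assigns the bound:
size_t framesToFillWindow = m_windowSize - m_bufferWritePosition;
size_t numInputFrames = math_max(framesToFillWindow, m_windowSize / 2 - 1);
```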
OK, I have also added a plausibility check for the first beat.
@@ -27,12 +27,11 @@ bool DownmixAndOverlapHelper::processStereoSamples(const CSAMPLE* pInput, size_t
bool DownmixAndOverlapHelper::finalize() {
    // We need to append at least m_windowSize / 2 - m_stepSize silence
    // to have a valid analysis results for the last track samples.
    // Since we go in fixed steps up to "m_stepSize - 1" samples remains
    // unrocessed. That th reason we use "m_windowSize / 2 - 1" below
Some minor typos in comment
Ping
// the beat if this is the case.
size_t firstBeat = 0;
if (beats.size() >= 3) {
    if (m_detectionResults.at(beats.at(0)) <
3 times beats.at(0)?
Ping
int diff = (beats.at(1) - beats.at(0)) - (beats.at(2) - beats.at(1));
// we don't allow a signifcant tempo change after the first beat
if (diff > 2 || diff < -2) {
    // firt beat is off grid. Skip it.
typo
Ping
    // following beats. Skip it.
    firstBeat = 1;
} else {
    int diff = (beats.at(1) - beats.at(0)) - (beats.at(2) - beats.at(1));
Why in the else part? Shouldn't this be applied regardless of whether firstBeat = 0 or 1, i.e. after shifting the index of the first beat?
...but then we would shift the first beat once more. Not sure what the intention was here.
The issue I am trying to solve is that the first beat is often off.
It is detected when a noise starts, or it is detected if the track does not start at a beat.
If the automatically detected cue uses this beat, the track is cued off-beat.
The code drops this beat by these two conditions. Thinking about it, it is probably sufficient to use the time condition only.
What do you think? (A simplified sketch of the idea follows below.)
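A simplified, hedged sketch of the interval condition being discussed (plausibleFirstBeat() and the threshold of 2 are stand-ins for illustration; the m_detectionResults strength condition is omitted):

```cpp
#include <cstdlib>
#include <vector>

// Skip the first detected beat when the first inter-beat interval
// deviates noticeably from the second one, which suggests the first
// "beat" is really noise at 0:00 or an off-grid onset.
size_t plausibleFirstBeat(const std::vector<int>& beats) {
    size_t firstBeat = 0;
    if (beats.size() >= 3) {
        int diff = (beats[1] - beats[0]) - (beats[2] - beats[1]);
        if (std::abs(diff) > 2) {
            firstBeat = 1; // first beat does not fit the following grid
        }
    }
    return firstBeat;
}
```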
I'm not familiar with the details of the beat detection algorithm and would follow your recommendation. I just tried to verify the code and check for inconsistencies or unintended behaviour.
Is there anything else to do before merge?
Thank you for the reminder... done.
…be anywhere in the range.
I have tweaked the results a bit, because I noticed that the beat seems to be late sometimes.
No issues so far. I didn't check if the beat detection has actually been improved, since it might only fix certain edge cases. Thank you for working on all these musical analysis algorithms, Daniel! LGTM
Now the first sample is placed in the middle of the FFT window so that the first samples are not skipped.