
Fix time offset in key and beat analysis #2152

Merged
14 commits merged into mixxxdj:master from daschuer:analyzer_start
Aug 8, 2019

Conversation

daschuer
Member

@daschuer daschuer commented Jun 9, 2019

Now the first sample is placed in the middle of the first FFT window, so that the first samples are no longer skipped.

@uklotzde uklotzde changed the title fix time ofset in key analysis Fix time offset in key analysis Jun 9, 2019
@uklotzde
Contributor

I'm still unsure what the correct offset is, and I don't want to dig deeper into this class.

What about the frameOverlapFactor in AnalyzerQueenMaryKey? Does it serve a similar purpose, so that we don't actually need to do this on our own here?

@daschuer
Member Author

The frameOverlapFactor is a new config parameter; it defaults to 1 and corresponds to 1/m_stepSize.
Here we need to make sure that the first sample is not pushed to the edge of the first Hamming window, where it would have almost no effect.
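To illustrate the point about window edges, here is a minimal sketch using the standard Hamming formula (not the actual Mixxx code): a sample at the very edge of the window is attenuated to roughly 8% of its amplitude, while a sample at the center passes through at full weight.

```cpp
#include <cassert>
#include <cmath>

// Standard Hamming window weight for index n of an N-sample window.
// Edge samples (n = 0 or n = N - 1) are scaled to 0.54 - 0.46 = 0.08,
// so a sample at the window edge contributes almost nothing.
double hammingWeight(int n, int N) {
    const double kPi = 3.14159265358979323846;
    return 0.54 - 0.46 * std::cos(2.0 * kPi * n / (N - 1));
}
```

For a 1025-sample window, hammingWeight(0, 1025) is 0.08 while hammingWeight(512, 1025) is 1.0, which is why the first track sample should land in the middle of the first window rather than at its edge.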

…set of samples, starting with the first samples.
const int iTotalFrames = iTotalSamples / 2;
QMap<mixxx::track::io::key::ChromaticKey, double> key_histogram;
QMap<mixxx::track::io::key::ChromaticKey, double> key_occurrence;
Contributor


This is still a histogram and not a distribution if you only normalize the sums for debug output (see below). Also the term occurrence doesn't fit here and doesn't tell me what it actually means.
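The distinction can be sketched like this (a hypothetical illustration using plain std::map with integer keys, not the Mixxx QMap/ChromaticKey types): a histogram stores summed weights per key, and only dividing every bin by the total turns it into a distribution.

```cpp
#include <cassert>
#include <map>

// Hypothetical sketch: a key histogram maps each chromatic key (as int)
// to a summed weight. Dividing every bin by the total weight turns the
// histogram into a distribution whose values sum to 1.
std::map<int, double> normalizeHistogram(const std::map<int, double>& histogram) {
    double total = 0.0;
    for (const auto& entry : histogram) {
        total += entry.second;
    }
    std::map<int, double> distribution;
    if (total > 0.0) {
        for (const auto& entry : histogram) {
            distribution[entry.first] = entry.second / total;
        }
    }
    return distribution;
}
```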

Member Author


Oops, yes, I was confused about what a histogram is.

src/track/keyutils.h — outdated review thread, resolved
@uklotzde uklotzde added this to the 2.3.0 milestone Jun 12, 2019
@daschuer
Member Author

Done.

Contributor

@uklotzde uklotzde left a comment


Wrong abstraction level

Member Author

@daschuer daschuer left a comment


I am not sure, because the class has exactly this purpose: DownmixAndOverlapHelper() is supposed to handle the overlap correctly.
But I am open to fixing an abstraction issue.

I am not sure if we have the same understanding of the issue.

Let's say we have a window size of 4 and a step size of 1.

Originally it starts like this:

####
_####
__####
___####

With this I have noticed that the first result is only significant for the third step. That is the offset I am trying to fix here. Due to windowing, the most significant point is the middle of the window.

Now we have

__####
___####
____####
_____####

And the raw chroma detection result fits the waveform. We still have some delay due to the moving average and median filter applied later, but that is not the point here.

In both this and the previous version, 2x the step size remains undetected. This becomes worse due to the moving average and median filter. That is not yet addressed in this PR, but I can add it.

Do you have a suggestion for how to improve the abstraction?
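The window layouts sketched above can be modeled with a small helper (a simplified illustration, not the actual DownmixAndOverlapHelper code): without pre-padding, the first analysis result for windowSize = 4 and stepSize = 1 is centered two samples into the signal; with windowSize / 2 samples of prepended silence, the first result is centered on sample 0.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: center sample index of each analysis window.
// With padStart, the first window starts windowSize / 2 samples "before"
// the signal, i.e. half a window of silence is prepended.
std::vector<long> windowCenters(long numSamples, long windowSize,
        long stepSize, bool padStart) {
    std::vector<long> centers;
    long start = padStart ? -windowSize / 2 : 0;
    for (long pos = start; pos + windowSize <= numSamples; pos += stepSize) {
        centers.push_back(pos + windowSize / 2);
    }
    return centers;
}
```

With 8 input samples, windowCenters(8, 4, 1, false) starts at center 2 (the offset under discussion), while windowCenters(8, 4, 1, true) starts at center 0.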

@uklotzde
Contributor

uklotzde commented Jun 14, 2019

Effectively you prepend a period (length = windowSize / 2) of silence to the input range to center the first window on the beginning of the input. But then you should do the same at the end of the input range, i.e. append the same amount of silence to the end of the buffer and keep sliding the window and processing until that extra silence has been consumed, too. This could only be done in finalize(), because only then you know that all available input has been consumed.

What I still don't get is how those partial windows filled with silence should improve the overall analysis results? Wouldn't it be possible that the results could even get worse, because you don't analyze the actual signal? Why pad the signal with extra silence if it may already start or end with almost silence? Are you able to verify and prove the improvements? Otherwise I would recommend to keep the code simple and comprehensible by analyzing only complete windows of actual input data.

@daschuer
Member Author

OK, the zero padding in finalize is now in place.

> Wouldn't it be possible that the results could even get worse, because you don't analyze the actual signal?

Most tracks start and end with silence, so we do not alter the track by adding more silence at the beginning and the end. Live recordings and vinyl rips are an issue, because they start with a noise floor.

The key detector will see a rectangular jump containing all frequencies. This does not affect the detection result, because it adds the same offset to all key bins.

The beat detector detects a false beat at 0:00 when the noise starts in this case. But this was already the case before this PR, because the internal variables of the beat detector are initialized with zero anyway.
The same happens at the end.

I will prepare another PR that does a plausibility check for the first and the last beat.

Due to the Hanning window, the first and last samples are almost silenced anyway. So if we assume that we hear silence before and after the track, the zero padding is consistent with that assumption.

> Why pad the signal with extra silence if it may already start or end with almost silence? Are you able to verify and prove the improvements? Otherwise I would recommend to keep the code simple and comprehensible by analyzing only complete windows of actual input data.

@daschuer
Member Author

daschuer commented Jul 2, 2019

The AppVeyor issues are unrelated. Is this OK now?

size_t framesToFillWindow = m_windowSize - m_bufferWritePosition;
size_t numInputFrames = framesToFillWindow;
if (numInputFrames < m_windowSize / 2 - 1) {
// -1 ensures that silence < m_stepSize ramains unprocessed

@uklotzde uklotzde Jul 2, 2019


  • typo (and I don't understand this remark)
  • use math_max()

@daschuer
Member Author

daschuer commented Jul 3, 2019

OK, I have also added a plausibility check for the first beat.

@@ -27,12 +27,11 @@ bool DownmixAndOverlapHelper::processStereoSamples(const CSAMPLE* pInput, size_t
bool DownmixAndOverlapHelper::finalize() {
// We need to append at least m_windowSize / 2 - m_stepSize silence
// to have a valid analysis results for the last track samples.
// Since we go in fixed steps up to "m_stepSize - 1" samples remains
// unrocessed. That th reason we use "m_windowSize / 2 - 1" below
Contributor


Some minor typos in comment

Contributor


Ping

// the beat if this is the case.
size_t firstBeat = 0;
if (beats.size() >= 3) {
if (m_detectionResults.at(beats.at(0)) <
Contributor


3 times beats.at(0)?

Contributor


Ping

int diff = (beats.at(1) - beats.at(0)) - (beats.at(2) - beats.at(1));
// we don't allow a signifcant tempo change after the first beat
if (diff > 2 || diff < -2) {
// firt beat is off grid. Skip it.
Contributor


typo

Contributor


Ping

// following beats. Skip it.
firstBeat = 1;
} else {
int diff = (beats.at(1) - beats.at(0)) - (beats.at(2) - beats.at(1));
Contributor


Why in the else part? Shouldn't this be applied regardless of whether firstBeat is 0 or 1, i.e. after shifting the index of the first beat?

Contributor


...but then we would shift the first beat once more. Not sure what the intention was here.

Member Author


The issue I am trying to solve is that the first beat is often off.
It is detected when noise starts, or when the track does not start on a beat.
If the automatically detected cue uses this beat, the track is cued off-beat.
The code drops this beat based on these two conditions. Thinking about it, it is probably sufficient to use the time condition only.
What do you think?
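The interval condition discussed here can be sketched as follows (a simplified model of the idea, not the exact PR code; beats are detection-frame indices, and the tolerance of 2 frames matches the `diff > 2 || diff < -2` check in the snippet above):

```cpp
#include <cassert>
#include <vector>

// Simplified sketch: if the gap between beats 1 and 2 differs from the gap
// between beats 2 and 3 by more than a tolerance (in detection frames), the
// first beat is off the grid established by the following beats; skip it.
int firstPlausibleBeat(const std::vector<int>& beats, int tolerance = 2) {
    if (beats.size() < 3) {
        return 0; // too few beats to judge
    }
    int diff = (beats[1] - beats[0]) - (beats[2] - beats[1]);
    if (diff > tolerance || diff < -tolerance) {
        return 1; // first beat is off-grid, start from the second beat
    }
    return 0;
}
```

For beats at frames {50, 110, 210} the first interval (60) differs from the second (100) by more than the tolerance, so the first beat is skipped; a steady grid like {10, 110, 210} keeps its first beat.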

Contributor


I'm not familiar with the details of the beat detection algorithm and would follow your recommendation. I just tried to verify the code and check for inconsistencies or unintended behaviour.

@daschuer daschuer changed the title Fix time offset in key analysis Fix time offset in key and beat analysis Jul 19, 2019
@daschuer
Member Author

Is there anything else to do before merge?

@daschuer
Member Author

Thank you for the reminder .. done.

@daschuer
Member Author

I have tweaked the results a bit, because I noticed that the beat sometimes seems to be late.
Another pending issue causing a late beat is that the algorithm triggers on both the rising and the falling edge of a beat. Sometimes it picks the falling edge even though the rising edge sounds more like a beat.
This can be fixed in a separate PR.

@uklotzde
Contributor

uklotzde commented Aug 8, 2019

No issues so far. I didn't check if the beat detection has actually been improved, since it might only fix certain edge cases.

Thank you for working on all these musical analysis algorithms, Daniel!

LGTM

@uklotzde uklotzde merged commit 58f7a1e into mixxxdj:master Aug 8, 2019
@daschuer daschuer deleted the analyzer_start branch September 26, 2021 17:36