Added sound recognition: emoji and morse code #99

Open

vladturcuman wants to merge 28 commits into lancaster-university:master from vladturcuman:sound_recognition
Conversation
Input pipeline -> fft
[inputpipeline] Set default SPL level to not be active on start
Changed the MicroBitAudioProcessor to also be a DataSource. Added a Hann window before the FFT and Harmonic Product Spectrum after it to get more accurate results for square wave detection.

Added MicroBitSoundRecogniser as a DataSink, which recognises given patterns in the frequency data. It connects to the audio processor to get the frequency analysis. The constructor is protected so that the class is effectively abstract. EmojiRecogniser inherits from it and implements the constructor, adding the happy sound as the sound to be recognised.

To recognise the happy sound, the following main.cpp can be used:

```cpp
#include "MicroBit.h"
#include "MicroBitAudioProcessor.h"
#include "EmojiRecogniser.h"
#include "StreamNormalizer.h"
#include "StreamSplitter.h"

static NRF52ADCChannel        *mic = NULL;
static StreamNormalizer       *processor = NULL;
static MicroBitAudioProcessor *fft = NULL;
static StreamSplitter         *splitter = NULL;
MicroBitSoundRecogniser       *recogniser = NULL;

MicroBit uBit;

// Called whenever a sound is recognised: pause analysis while the name
// scrolls, then resume listening.
void onSound(ManagedString sound)
{
    recogniser->stopAnalisying();
    uBit.display.scroll(sound);
    recogniser->startAnalisying(onSound);
}

int main()
{
    uBit.init();

    // Configure the microphone channel.
    mic = uBit.adc.getChannel(uBit.io.microphone);
    mic->setGain(7, 0);
    uBit.io.runmic.setDigitalValue(1);
    uBit.io.runmic.setHighDrive(true);

    // Normalise the raw stream and split it so multiple sinks can consume it.
    processor = new StreamNormalizer(mic->output, 1.2f, true, DATASTREAM_FORMAT_8BIT_SIGNED, 10);
    splitter  = new StreamSplitter(processor->output);

    // Frequency analysis feeding the emoji recogniser.
    fft = new MicroBitAudioProcessor(*splitter);
    recogniser = new EmojiRecogniser(*fft, uBit);
    recogniser->startAnalisying(onSound);

    while (1) {
        uBit.display.print("-");
        uBit.sleep(1000);
    }
}
```
vladturcuman changed the title from "Added the sound recognition: emoji and morse code" to "Added sound recognition: emoji and morse code" on May 1, 2021
This was referenced May 1, 2021
Get recogniser and interpreter working together
…se code with a reasonable time unit
…e higher microphone sampling rate
Added documentation to rest of Morse Recognition
I think this needs review and some further development before we can merge. It's been a while and the base classes for the audio pipeline have changed. It all looks very cool in theory, however, so I'll add this to my 'to test' list to see how it works with the upstream changes.
This was forked from the JoshuaAHill:fft_josh branch of the JoshuaAHill/codal-microbit-v2 fork.
The changes here allow the micro:bit to recognise sounds - a generic implementation, with samples of the happy, sad, hello, twinkle and soaring emoji sounds added - and to transmit and receive morse code data.
Overview of the data pipeline
Audio processor
Overview
The audio processor takes in the microphone data (sampled at 10^6 / MIC_SAMPLE_DELTA Hz, which is currently 11000 Hz) and produces AudioFrameAnalysis data. An AudioFrameAnalysis holds the fundamental frequencies of a frame - at most MAXIMUM_NUMBER_OF_FREQUENCIES of them, ordered from the most likely to the least likely.

Algorithm
The audio processor accumulates microphone data as it comes in and, after collecting audio_samples_number samples, it processes the frame.
It transforms the data from the time domain to the frequency domain using the CMSIS FFT: arm_rfft_fast_f32. If the standard deviation (std) is lower than std_threshold, the frame is considered silent - no fundamental frequency. It then filters out the frequencies whose magnitude is lower than mean + std_threshold * std; this ensures that only outlier frequencies are considered. Finally, it filters out the neighbouring frequencies around the peaks.
Some of these operations are implemented together to optimize the algorithm.
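The silence check and outlier filter above can be sketched in plain C++. This is a simplified sketch, independent of the CMSIS routines: the magnitude buffer, the k parameter (standing in for std_threshold) and the peak-picking shape are illustrative, not the PR's actual code.

```cpp
#include <cmath>
#include <vector>

// Return the indices of bins whose magnitude exceeds mean + k * std.
// An empty result means the frame is treated as silent.
std::vector<int> pickPeaks(const std::vector<float> &mag, float k)
{
    if (mag.empty())
        return {};

    float mean = 0.0f;
    for (float m : mag) mean += m;
    mean /= mag.size();

    float var = 0.0f;
    for (float m : mag) var += (m - mean) * (m - mean);
    float sd = std::sqrt(var / mag.size());

    // Silent frame: no bin stands out enough to be a fundamental frequency.
    if (sd < k)
        return {};

    // Keep only the outlier frequencies.
    std::vector<int> peaks;
    for (int i = 0; i < (int)mag.size(); i++)
        if (mag[i] > mean + k * sd)
            peaks.push_back(i);
    return peaks;
}
```

For example, pickPeaks({1, 1, 1, 20, 1, 1, 1, 1}, 2.0f) returns {3}: only the bin that stands well above the background survives the filter.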
Emoji
The process of adding new sounds to be recognised is described in this issue.
Sound recogniser
Overview
The sound recogniser takes in data from the audio processor, in the form of AudioFrameAnalysis. It then tries to match the history against samples of sounds.

Sound fingerprint

A sound has multiple sequences, each sequence having multiple samples to account for the randomness. Each sequence also has its own threshold and maximum number of deviations. A sound also has a maximum number of deviations - the total maximum across all sequences.
Emoji recogniser
Overview
The emoji recogniser is a subclass of the sound recogniser that defines the actual samples for the emoji sounds. These are just the parts of the emoji sounds that can be recognised - the parts that remain quite consistent across multiple plays of the sound.
Example
Taking the happy sound as an example, there are a few constants defined:

- happy_sequences - the number of sequences in the happy sound
- happy_max_deviations - the maximum number of deviations in the sound (a deviation is a data point that is more than the allowed threshold off the sampled frequency)
- happy_samples - a 3-dimensional array with the sampled sound
- happy_thresholds - an array with the thresholds for each of the sequences
- happy_deviations - an array with the maximum deviations for each sequence
- happy_nr_samples - an array with the number of samples in each sequence

All of these are packaged in a Sound struct.
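As a rough illustration, the fingerprint data could be packaged along these lines. All type, field and function names here are hypothetical, as are the frequencies and limits; the PR's actual Sound struct may differ.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of a sound fingerprint, mirroring the happy_* constants.
struct SoundSample {
    std::vector<uint16_t> frequencies;  // sampled frequencies, one per frame
};

struct SoundSequence {
    std::vector<SoundSample> samples;   // alternatives, to account for randomness
    uint16_t threshold;                 // allowed frequency deviation for this sequence
    uint8_t  maxDeviations;             // max deviating frames within this sequence
};

struct Sound {
    std::vector<SoundSequence> sequences;
    uint8_t maxTotalDeviations;         // total allowed deviations across all sequences
};

// Build an illustrative one-sequence fingerprint (values are made up).
Sound makeHappySketch()
{
    SoundSequence seq;
    seq.threshold = 50;
    seq.maxDeviations = 1;
    seq.samples.push_back({{1046, 1174, 1318}});

    Sound happy;
    happy.sequences.push_back(seq);
    happy.maxTotalDeviations = 3;
    return happy;
}
```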
Morse Code
We use the following elements for morse code:

- '.' - dot
- '-' - dash
- ' ' - letter space
- ';' - word space
- '#' - end of transmission

Dot and dash are audible, the rest are silent.
The reason 'element gap' is not represented is that it can be easily added appropriately at the time that the morse code is played over the speaker. When interpreting morse played by another source, it can just be ignored.
Morse Encoder
The morse encoder has two uses:
it can encode a string of characters into its morse representation.
it can decode a series of morse elements into the corresponding string.
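A minimal sketch of the encoding direction, using the element characters listed earlier. The lookup table is abbreviated and the function name is hypothetical; the PR's MorseEncoder API may differ.

```cpp
#include <map>
#include <string>

// Encode lowercase text into the element characters described above:
// '.' and '-' for dots and dashes, ' ' as letter space, ';' as word
// space, '#' as end of transmission. Abbreviated letter table.
std::string encodeMorse(const std::string &text)
{
    static const std::map<char, std::string> table = {
        {'s', "..."}, {'o', "---"}, {'e', "."}, {'t', "-"},
    };

    std::string out;
    for (char c : text) {
        if (c == ' ') {            // word boundary
            out += ';';
            continue;
        }
        auto it = table.find(c);
        if (it == table.end())     // skip characters not in the table
            continue;
        out += it->second;
        out += ' ';                // letter space after each letter
    }
    out += '#';                    // end of transmission
    return out;
}
```

For example, encodeMorse("sos") produces "... --- ... #".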
Morse Player
The morse player takes strings of characters and converts them into morse code using a morse encoder.
Then, it takes every element one by one and plays it over the speaker.
It also adds 'element space' frames between consecutive dots and dashes, in order to help differentiate between audible elements.
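The element-gap behaviour can be illustrated as follows. This is a sketch only: the real player inserts silent frames at play time rather than characters, and '_' is used here purely to visualise where those gaps would fall.

```cpp
#include <string>

// Show where the player would insert an element gap: between every two
// consecutive audible elements ('.' or '-'). The '_' marker is purely
// illustrative; in the real player the gap is a silent frame.
std::string withElementGaps(const std::string &morse)
{
    auto audible = [](char c) { return c == '.' || c == '-'; };

    std::string out;
    for (size_t i = 0; i < morse.size(); i++) {
        out += morse[i];
        if (i + 1 < morse.size() && audible(morse[i]) && audible(morse[i + 1]))
            out += '_';   // element gap between consecutive dots/dashes
    }
    return out;
}
```

For example, withElementGaps("... -") produces "._._. -": gaps appear only between consecutive audible elements, never around the (already silent) letter space.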
Morse recogniser
Overview
The morse recogniser takes in data from the audio processor, in the form of AudioFrameAnalysis. It then tries to identify whether the listening frequency was played during that frame. Based on the last frames, it sends dots ('.'), dashes ('-'), letter spaces (' '), word spaces (';'), and end-of-transmission markers ('#') to the connected data sink.

Algorithm
The recogniser first converts the AudioFrameAnalysis data into booleans which represent whether or not the listening frequency was being played in that frame. A timeUnit is calculated based on the sampling frequency and the number of samples taken before analysis by the audio processor - it represents the number of frames a dot should take.

It keeps two counters, zeros and ones, which count for how many timeUnits the frequency wasn't being played / was being played. The value synchronised keeps track of the last transition - true if the last transition was from the frequency not being played to it being played, false otherwise. At each transition, the value of zeros or ones is interpreted into the right character and then reset. All characters are accumulated into a buffer and sent together when the end-of-transmission character ('#') is found.

Morse Interpreter
This class takes morse data from a morse recogniser and uses a morse encoder to decode it.
It then calls an event signalling the fact that it is done processing some data.
The last processed data can then be taken from a public field.
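The run-length counting in the morse recogniser's algorithm above can be sketched as follows. This is a simplification: only the dot/dash decision is shown, the 2 * timeUnit cutoff is an assumed illustrative threshold, and the letter/word spacing and 'synchronised' bookkeeping are omitted.

```cpp
#include <string>
#include <vector>

// Turn per-frame booleans (was the listening frequency heard?) into dots
// and dashes: at each tone-to-silence transition, a run of 'ones' shorter
// than 2 * timeUnit becomes a dot, a longer run becomes a dash.
std::string decodeFrames(const std::vector<bool> &frames, int timeUnit)
{
    std::string out;
    int ones = 0;   // consecutive frames in which the frequency was heard

    for (bool heard : frames) {
        if (heard) {
            ones++;
        } else if (ones > 0) {
            // Transition from tone to silence: classify the finished run.
            out += (ones >= 2 * timeUnit) ? '-' : '.';
            ones = 0;
        }
    }
    if (ones > 0)   // flush a run that reaches the end of the input
        out += (ones >= 2 * timeUnit) ? '-' : '.';
    return out;
}
```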
Sample main.cpp for emoji recognising

Sample main.cpp for morse transmission