Added sound recognition: emoji and morse code #99

Open

vladturcuman wants to merge 28 commits into lancaster-university:master from vladturcuman:sound_recognition
Conversation
Input pipeline -> fft
[inputpipeline] Set default SPL level to not be active on start
Changed the MicroBitAudioProcessor to also be a DataSource. Added a Hann window before the FFT and Harmonic Product Spectrum after it to get more accurate results for square wave detection.

Added MicroBitSoundRecogniser as a DataSink, which recognises given patterns in the frequency data. It connects to the audio processor to get the frequency analysis. The constructor is protected so that the class is effectively abstract. EmojiRecogniser inherits from it and implements the constructor, adding the happy sound as the sound to be recognised.

To recognise the happy sound, the following main.cpp can be used:

```cpp
#include "MicroBit.h"
#include "MicroBitAudioProcessor.h"
#include "EmojiRecogniser.h"
#include "StreamNormalizer.h"
#include "StreamSplitter.h"

static NRF52ADCChannel        *mic = NULL;
static StreamNormalizer       *processor = NULL;
static MicroBitAudioProcessor *fft = NULL;
static StreamSplitter         *splitter = NULL;
MicroBitSoundRecogniser       *recogniser = NULL;

MicroBit uBit;

// Called whenever a sound is recognised: pause analysis while the name
// scrolls, then resume listening.
void onSound(ManagedString sound)
{
    recogniser->stopAnalisying();
    uBit.display.scroll(sound);
    recogniser->startAnalisying(onSound);
}

int main()
{
    uBit.init();

    // Configure the microphone channel.
    mic = uBit.adc.getChannel(uBit.io.microphone);
    mic->setGain(7, 0);
    uBit.io.runmic.setDigitalValue(1);
    uBit.io.runmic.setHighDrive(true);

    // Normalise the raw stream and split it so multiple sinks can consume it.
    processor = new StreamNormalizer(mic->output, 1.2f, true, DATASTREAM_FORMAT_8BIT_SIGNED, 10);
    splitter  = new StreamSplitter(processor->output);

    // Frequency analysis feeding the emoji recogniser.
    fft = new MicroBitAudioProcessor(*splitter);
    recogniser = new EmojiRecogniser(*fft, uBit);
    recogniser->startAnalisying(onSound);

    while (1) {
        uBit.display.print("-");
        uBit.sleep(1000);
    }
}
```
vladturcuman changed the title from "Added the sound recognition: emoji and morse code" to "Added sound recognition: emoji and morse code" on May 1, 2021
This was referenced May 1, 2021
Get recogniser and interpreter working together
…se code with a reasonable time unit
…e higher microphone sampling rate
Added documentation to rest of Morse Recognition
I think this needs review and some further development before we can merge. It's been a while and the base classes for the audio pipeline have changed. It all looks very cool in theory, however, so I'll add this to my 'to test' list to see how it works with the upstream changes.
This was forked from the JoshuaAHill:fft_josh branch of the JoshuaAHill/codal-microbit-v2 fork.
The changes here allow the micro:bit to recognise sounds - a generic implementation, with samples of the happy, sad, hello, twinkle and soaring emoji sounds added - and to transmit and receive morse code data.
Overview of the data pipeline
Audio processor
Overview
The audio processor takes in the microphone data (sampled at 10^6 / MIC_SAMPLE_DELTA Hz, which is currently 11000 Hz) and produces AudioFrameAnalysis data. An AudioFrameAnalysis holds the fundamental frequencies of a frame - at most MAXIMUM_NUMBER_OF_FREQUENCIES of them, ordered from the most likely to the least likely.

Algorithm
The audio processor accumulates microphone data as it comes in and, after collecting audio_samples_number samples, it processes the frame.
It transforms the data from the time domain to the frequency domain using the CMSIS FFT: arm_rfft_fast_f32. If the standard deviation (std) is lower than std_threshold, the frame is considered silent - no fundamental frequency. It then filters out the frequencies whose magnitude is lower than mean + std_threshold * std; this ensures that only outlier frequencies are considered. Finally, it filters out the neighbouring frequencies around the peaks.
Some of these operations are implemented together to optimize the algorithm.
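The silence check and outlier filter above can be sketched in plain C++. This is a simplified sketch, independent of the CMSIS routines: the magnitude buffer, the k parameter (standing in for std_threshold) and the peak-picking shape are illustrative, not the PR's actual code.

```cpp
#include <cmath>
#include <vector>

// Return the indices of bins whose magnitude exceeds mean + k * std.
// An empty result means the frame is treated as silent.
std::vector<int> pickPeaks(const std::vector<float> &mag, float k)
{
    if (mag.empty())
        return {};

    float mean = 0.0f;
    for (float m : mag) mean += m;
    mean /= mag.size();

    float var = 0.0f;
    for (float m : mag) var += (m - mean) * (m - mean);
    float sd = std::sqrt(var / mag.size());

    // Silent frame: no bin stands out enough to be a fundamental frequency.
    if (sd < k)
        return {};

    // Keep only the outlier frequencies.
    std::vector<int> peaks;
    for (int i = 0; i < (int)mag.size(); i++)
        if (mag[i] > mean + k * sd)
            peaks.push_back(i);
    return peaks;
}
```

For example, pickPeaks({1, 1, 1, 20, 1, 1, 1, 1}, 2.0f) returns {3}: only the bin that stands well above the background survives the filter.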
Emoji
The process of adding new sounds to be recognised is described in this issue.
Sound recogniser
Overview
The sound recogniser takes in data from the audio processor, in the form of AudioFrameAnalysis. It then tries to match the history against samples of sounds.

Sound fingerprint

A sound has multiple sequences, each sequence having multiple samples to account for the randomness. Each sequence also has its own threshold and maximum number of deviations. A sound also has a maximum number of deviations - the total maximum across all sequences.
Emoji recogniser
Overview
The emoji recogniser is a subclass of the sound recogniser that defines the actual samples for the emoji sounds. These are just the parts of the emoji sounds that can be recognised - the parts that remain quite consistent across multiple plays of the sound.
Example
Taking the happy sound as an example, there are a few constants defined:

- happy_sequences - the number of sequences in the happy sound
- happy_max_deviations - the maximum number of deviations in the sound (a deviation is a data point that is more than the allowed threshold off the sampled frequency)
- happy_samples - a 3-dimensional array with the sampled sound
- happy_thresholds - an array with the thresholds for each of the sequences
- happy_deviations - an array with the maximum deviations for each sequence
- happy_nr_samples - an array with the number of samples in each sequence

All of these are packaged in a Sound struct.
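As a rough illustration, the fingerprint data could be packaged along these lines. All type, field and function names here are hypothetical, as are the frequencies and limits; the PR's actual Sound struct may differ.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of a sound fingerprint, mirroring the happy_* constants.
struct SoundSample {
    std::vector<uint16_t> frequencies;  // sampled frequencies, one per frame
};

struct SoundSequence {
    std::vector<SoundSample> samples;   // alternatives, to account for randomness
    uint16_t threshold;                 // allowed frequency deviation for this sequence
    uint8_t  maxDeviations;             // max deviating frames within this sequence
};

struct Sound {
    std::vector<SoundSequence> sequences;
    uint8_t maxTotalDeviations;         // total allowed deviations across all sequences
};

// Build an illustrative one-sequence fingerprint (values are made up).
Sound makeHappySketch()
{
    SoundSequence seq;
    seq.threshold = 50;
    seq.maxDeviations = 1;
    seq.samples.push_back({{1046, 1174, 1318}});

    Sound happy;
    happy.sequences.push_back(seq);
    happy.maxTotalDeviations = 3;
    return happy;
}
```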
Morse Code
We use the following elements for morse code:

- '.' - dot
- '-' - dash
- ' ' - letter space
- ';' - word space
- '#' - end of transmission

Dot and dash are audible, the rest are silent.
The reason 'element gap' is not represented is that it can be easily added appropriately at the time that the morse code is played over the speaker. When interpreting morse played by another source, it can just be ignored.
Morse Encoder
The morse encoder has two uses:
it can encode a string of characters into its morse representation.
it can decode a series of morse elements into the corresponding string.
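A minimal sketch of the encoding direction, using the element characters listed earlier. The lookup table is abbreviated and the function name is hypothetical; the PR's MorseEncoder API may differ.

```cpp
#include <map>
#include <string>

// Encode lowercase text into the element characters described above:
// '.' and '-' for dots and dashes, ' ' as letter space, ';' as word
// space, '#' as end of transmission. Abbreviated letter table.
std::string encodeMorse(const std::string &text)
{
    static const std::map<char, std::string> table = {
        {'s', "..."}, {'o', "---"}, {'e', "."}, {'t', "-"},
    };

    std::string out;
    for (char c : text) {
        if (c == ' ') {            // word boundary
            out += ';';
            continue;
        }
        auto it = table.find(c);
        if (it == table.end())     // skip characters not in the table
            continue;
        out += it->second;
        out += ' ';                // letter space after each letter
    }
    out += '#';                    // end of transmission
    return out;
}
```

For example, encodeMorse("sos") produces "... --- ... #".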
Morse Player
The morse player takes strings of characters and converts them into morse code using a morse encoder.
Then, it takes every element one by one and plays it over the speaker.
It also adds 'element space' frames between consecutive dots and dashes, in order to help differentiate between audible elements.
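The element-gap behaviour can be illustrated as follows. This is a sketch only: the real player inserts silent frames at play time rather than characters, and '_' is used here purely to visualise where those gaps would fall.

```cpp
#include <string>

// Show where the player would insert an element gap: between every two
// consecutive audible elements ('.' or '-'). The '_' marker is purely
// illustrative; in the real player the gap is a silent frame.
std::string withElementGaps(const std::string &morse)
{
    auto audible = [](char c) { return c == '.' || c == '-'; };

    std::string out;
    for (size_t i = 0; i < morse.size(); i++) {
        out += morse[i];
        if (i + 1 < morse.size() && audible(morse[i]) && audible(morse[i + 1]))
            out += '_';   // element gap between consecutive dots/dashes
    }
    return out;
}
```

For example, withElementGaps("... -") produces "._._. -": gaps appear only between consecutive audible elements, never around the (already silent) letter space.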
Morse recogniser
Overview
The morse recogniser takes in data from the audio processor, in the form of AudioFrameAnalysis. It then tries to identify whether the listening frequency was played during that frame. Based on the last frames, it sends dots ('.'), dashes ('-'), letter spaces (' '), word spaces (';'), and end-of-transmission markers ('#') to the connected data sink.

Algorithm
The recogniser first converts the AudioFrameAnalysis data into booleans which represent whether or not the listening frequency was being played in that frame. A timeUnit is calculated based on the sampling frequency and the number of samples taken before analysis by the audio processor - it represents the number of frames a dot should take.

It keeps two counters, zeros and ones, which count for how many timeUnits the frequency wasn't being played / was being played. The value synchronised keeps track of the last transition - true if the last transition was from the frequency not being played to it being played, false otherwise. At each transition, the value of zeros or ones is interpreted into the right character and then reset. All characters are accumulated into a buffer and sent together when the end-of-transmission character ('#') is found.

Morse Interpreter
This class takes morse data from a morse recogniser and uses a morse encoder to decode it.
It then calls an event signalling the fact that it is done processing some data.
The last processed data can then be taken from a public field.
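The run-length counting in the morse recogniser's algorithm above can be sketched as follows. This is a simplification: only the dot/dash decision is shown, the 2 * timeUnit cutoff is an assumed illustrative threshold, and the letter/word spacing and 'synchronised' bookkeeping are omitted.

```cpp
#include <string>
#include <vector>

// Turn per-frame booleans (was the listening frequency heard?) into dots
// and dashes: at each tone-to-silence transition, a run of 'ones' shorter
// than 2 * timeUnit becomes a dot, a longer run becomes a dash.
std::string decodeFrames(const std::vector<bool> &frames, int timeUnit)
{
    std::string out;
    int ones = 0;   // consecutive frames in which the frequency was heard

    for (bool heard : frames) {
        if (heard) {
            ones++;
        } else if (ones > 0) {
            // Transition from tone to silence: classify the finished run.
            out += (ones >= 2 * timeUnit) ? '-' : '.';
            ones = 0;
        }
    }
    if (ones > 0)   // flush a run that reaches the end of the input
        out += (ones >= 2 * timeUnit) ? '-' : '.';
    return out;
}
```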
Sample main.cpp for emoji recognising

Sample main.cpp for morse transmission