Support for openWakeWords Custom Verifier #13

Open
eckley opened this issue Jan 27, 2024 · 17 comments
Labels
enhancement New feature or request

Comments

@eckley

eckley commented Jan 27, 2024

It would be great if we could use more of the openWakeWord project's features, such as Custom Verifiers, as described in Custom Verifier Models.
Unless I'm missing something obvious and it's already supported.

@ser

ser commented Feb 17, 2024

I am browsing the code and I am a bit surprised that wyoming-openwakeword does NOT seem to use the openwakeword library, which handles custom verifier models. I would love to understand why.

I have many false triggers and without a custom verifier it's impossible to use wyoming-openwakeword to be honest :(

@ser

ser commented Feb 18, 2024

https://community.home-assistant.io/t/poll-whats-your-biggest-struggle-with-voice-control-right-now/658018/17

Michael writes: @synesthesiam

I’ve started my implementation, but haven’t been able to test it with multiple people yet.

but unfortunately there is no link to any code.

@synesthesiam
Contributor

To answer some questions: I don't use the official openWakeWord library because I wanted to implement batching. My implementation is optimized around having multiple audio streams all trying to detect the same wake word at the same time.

I started down the path of implementing custom verifier models (logistic regressions), but I've been wondering lately if using dynamic time warping (like my old Raven system did) might be better.
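For readers unfamiliar with dynamic time warping, here is a minimal sketch of the distance computation between two sequences of feature frames. It is not Raven's code: the function name `dtw_distance` and the choice of Euclidean distance between frames are assumptions made purely for illustration.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping cost between two
    sequences of feature frames, using Euclidean distance per frame pair."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: skip a frame in a, skip one in b, or match both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]
```

A verifier built on this idea would compare an incoming detection against stored templates of the user's wake word and accept it only when the distance falls below a tuned threshold. Because the alignment may repeat frames, a time-stretched copy of a template still scores a distance of zero.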

@ser

ser commented Feb 18, 2024

OK, now it's clear, thanks for the explanation :)

If you need any testers, I am ready to help anytime. I can also read code, so I suppose I can be valuable as a tester. I had to switch off the assistant because I simply can't stand hearing "sorry but i did not understand that" three times an hour when I play the radio in the background. I have a clear motivation to work on this!

@Greylinux

I just opened an issue on dscripka's repo for this exact issue, not realising that this is the HA add-on version 😳 Whoops! Well, good to see that an issue has been created already. Thanks Mike for your excellent work on all the voice elements.

@synesthesiam synesthesiam self-assigned this Feb 22, 2024
@synesthesiam synesthesiam added the enhancement New feature or request label Feb 22, 2024
@jhbruhn

jhbruhn commented Mar 29, 2024

Would you mind sharing your implementation of the logistic regression based custom verifiers, perhaps as a separate branch? Was it already in a usable state?

I'm trying to get reliable wyoming-satellites with local wake words running. I am currently considering implementing a second wyoming-openwakeword-standalone server which uses the original oWW library directly. That implementation doesn't need batching as it will only have one client and inherits VAD (which I've implemented in a PR already) and CVs from oWW.
But if this implementation here provided Custom Verifiers, I would not need to implement the separate handler. What is your opinion on this?

BTW thanks for all the work on the Assist feature, it's already great as is, and the fact that everything is open source enables people like me to a) understand what is going on in the completely local VA and b) somehow give back by contributing (hopefully usable) code! :)

@ser

ser commented Mar 30, 2024

@jhbruhn I have just rewritten wyoming-openwakeword to use the original libraries, and it indeed works much, much better with a custom verifier.

@jhbruhn

jhbruhn commented Mar 30, 2024

@ser would you mind sharing your implementation? 😍

@jhbruhn

jhbruhn commented Apr 2, 2024

I have just implemented Custom Verifiers on my fork: jhbruhn@1c84f07

But I don't feel it is in a state to make a PR and bring it into wyoming-openwakeword yet:

  1. I have not really tested it yet
  2. It probably makes sense to have different custom verifiers per wyoming client? Because different satellites have different sound characteristics/environments. Perhaps even some form of support in the wyoming protocol makes sense? I don't want to intervene if synesthesiam has a general bigger picture for this in mind 🙂
  3. Using openwakeword pickles is a bit weird because it pulls in openwakeword as a dependency. But training on startup would take a couple of seconds depending on the amount of sample data.

Edit: About 2.: an ensemble of multiple custom verifiers run in parallel would be fine, I think, as they should be very lightweight and might additionally help separate activations for multiple devices at once.
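To make the ensemble idea concrete, here is a rough sketch of how several lightweight verifiers might gate a base wake word score. It is not code from the fork: `verify_detection`, the `Verifier` type, and both threshold defaults are hypothetical, and the verifiers are evaluated one after another here for simplicity rather than truly in parallel.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical type: a verifier maps a feature window to a probability
# that the wake word was spoken in the voice/environment it was trained on.
Verifier = Callable[[Sequence[float]], float]


def verify_detection(
    base_score: float,
    features: Sequence[float],
    verifiers: Dict[str, Verifier],
    base_threshold: float = 0.3,      # assumed value, tune per model
    verifier_threshold: float = 0.5,  # assumed value, tune per verifier
) -> Tuple[bool, Optional[str], float]:
    """Return (accepted, name_of_best_verifier, score)."""
    # Only consult the verifiers once the base model has fired.
    if base_score < base_threshold:
        return False, None, 0.0
    # Without any verifiers, the base model decides alone.
    if not verifiers:
        return True, None, base_score
    scores = {name: verifier(features) for name, verifier in verifiers.items()}
    best_name = max(scores, key=lambda name: scores[name])
    best_prob = scores[best_name]
    if best_prob >= verifier_threshold:
        return True, best_name, best_prob
    return False, None, best_prob
```

Because the verifiers act as a second stage, the base threshold can be set lower than it would be on its own. The name of the winning verifier could serve as a hint for which satellite or speaker the activation belongs to.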

@synesthesiam
Contributor

I do want to rewrite wyoming-openwakeword to use the original library and include custom verifiers. I've added a new speaker field to the wake detection message so it will be possible to link a custom verifier with a speaker name. In the future, then, HA would be able to use this speaker name.

@ser

ser commented Apr 3, 2024

@jhbruhn your implementation looks more interesting than mine. I completely replaced the @synesthesiam code with openwakeword, which makes it hard to publish because it's a mess. I will test whether yours gives the same good results as mine.

@jhbruhn

jhbruhn commented Apr 3, 2024

I want to try adding a custom verifier manager to my implementation which also manages training based on voice recordings. This way, it is a very hands-off approach for custom verifiers, which can be fed with a directory of voice samples (positive and negative), potentially from different speakers, to build an ensemble of custom verifiers which also differentiate speakers. The results from that can then be used for the wyoming speaker attribute.

This way, the current batching implementation can be (somehow) kept. Perhaps the custom verifiers could run in parallel to the wakeword inference as the input features are the same, but I don't want to focus on that for now. Maybe this way @synesthesiam would not have to reimplement this wyoming service to use the original library?

This Custom Verifier manager could then also implement alternative models, perhaps through some kind of hyperparameter optimization during training, or include the aforementioned dynamic time warping algorithm from raven, which might perform even better.

The preliminary results of my implementation mentioned above seem very promising: I didn't notice any false activations, and because I could lower the thresholds in general, I'm also getting fewer false negatives. I've also noticed that, due to the sample data, it is now better at detecting my voice than other people's voices, which makes the speaker identification capabilities even more promising.

@jhbruhn

jhbruhn commented Apr 7, 2024

I have added a very basic implementation of automatic Custom Verifier training based on a directory of samples categorized by speaker: jhbruhn@5ea6254

Whenever a wakeword model is first loaded, it checks the folder for positive (per speaker) and negative samples, and if no cached model is found (either in memory or a pickle file), it trains a new verifier before the wakeword thread starts.

It trains a Logistic Regression custom verifier based on the approach demonstrated by dscripka in the openwakeword repository, and still pulls in openwakeword as a dependency. The difference from the original implementation, though, is that the logistic regression is trained on N+1 labels, where N is the number of speakers: one label per speaker, plus a negative label.
Internally, sklearn should use a one-vs-all ensemble of regressors. Whether that is a good approach still needs to be verified.

The structure is modular, so it would be possible to integrate different custom verifier approaches.

What might be lacking currently is a differentiation between different clients, which might be in different sound environments. But as there is, afaik, no stable Client ID, I skipped this for now.

When a new version of the wyoming library is released, this approach can also include the speaker name with the Detection-event.

The expected sample directory structure is as follows:

<custom-verifier-samples-dir>/
  - positive/
    - speaker_1/
      - sample_1.wav
      - sample_2.wav
      - ...
    - speaker_2/
      - sample_1.wav
      - sample_2.wav
      - ...
  - negative/
    - sample_1.wav
    - sample_2.wav
    - ... 
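Walking this layout to obtain the N+1 labels (one per speaker plus a negative class) might look something like the sketch below. The function name `collect_samples` and the label string `"negative"` are assumptions for illustration and are not taken from the commit.

```python
from pathlib import Path
from typing import List, Tuple

NEGATIVE_LABEL = "negative"  # hypothetical name for the extra (N+1)th class


def collect_samples(samples_dir: Path) -> Tuple[List[Path], List[str]]:
    """Walk the sample directory and return parallel lists of WAV paths and labels."""
    paths: List[Path] = []
    labels: List[str] = []
    positive_dir = samples_dir / "positive"
    if positive_dir.is_dir():
        # Each sub-directory of positive/ is one speaker; its name is the label.
        for speaker_dir in sorted(p for p in positive_dir.iterdir() if p.is_dir()):
            for wav in sorted(speaker_dir.glob("*.wav")):
                paths.append(wav)
                labels.append(speaker_dir.name)
    negative_dir = samples_dir / "negative"
    if negative_dir.is_dir():
        # Negative samples are not tied to a speaker and share a single label.
        for wav in sorted(negative_dir.glob("*.wav")):
            paths.append(wav)
            labels.append(NEGATIVE_LABEL)
    return paths, labels
```

Sorting keeps the sample order stable between runs, which makes cached models easier to reproduce. Note that a speaker directory literally named `negative` would collide with the negative class in this sketch.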

Edit: For better functionality, it might be a good idea to programmatically limit the number of samples used for training because a) training on a Raspberry Pi can take a long time if a lot of samples are used and b) speaker identification can get biased if the number of samples per speaker is unbalanced. A remaining general question is how the sample collection pipeline can be improved. Ideally, the samples could be collected, or at least selected, via the Home Assistant interface. This would require additions to the wyoming protocol.
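One simple way to cap and balance the training set is random downsampling to the smallest class, sketched below. The name `balance_samples` and its parameters are hypothetical, and treating the negative class the same as the speakers is an assumption of this sketch, not something settled in this thread.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def balance_samples(
    paths: Sequence[str],
    labels: Sequence[str],
    max_per_label: int,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Keep the same number of samples for every label, at most max_per_label,
    by randomly downsampling each label to the smallest class size."""
    by_label: Dict[str, List[str]] = defaultdict(list)
    for path, label in zip(paths, labels):
        by_label[label].append(path)
    if not by_label:
        return [], []
    # Every label is cut to the same count so that no speaker dominates.
    keep = min(max_per_label, min(len(v) for v in by_label.values()))
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    out_paths: List[str] = []
    out_labels: List[str] = []
    for label in sorted(by_label):
        for path in sorted(rng.sample(by_label[label], keep)):
            out_paths.append(path)
            out_labels.append(label)
    return out_paths, out_labels
```

With 10, 3, and 7 samples for three labels and `max_per_label=5`, each label ends up with 3 samples. The cap bounds training time on slow hardware, while the equal counts address the bias concern.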

@codemunkie15

Is there any update on when this might be implemented please? Voice matching is the only thing left now stopping me from converting to Assist. You guys rock!

@jhbruhn

jhbruhn commented May 14, 2024

I have tried doing voice matching with the same LogisticRegression classifier the custom verifiers are using, with very little success. Even training a separate LogisticRegression classifier did not yield usable results. Unfortunately, I don't have the time to do further work on this right now, as the custom verifier functionality is enough for me. Perhaps the dynamic time warping approach synesthesiam suggested above might be a path to evaluate further for voice matching? If that could also do the custom-verifier part, the performance should be okay.

Unfortunately, my custom verifier architecture, which you can find on the branch I linked above, would need some rework for that, as the custom verifiers can currently only work on the extracted features, not on the audio stream directly. But that should be easy to implement, as a buffer of the last 2 seconds is already stored for debug purposes IIRC.
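For reference, a rolling buffer holding the last 2 seconds of audio can be as small as the sketch below. This is not the buffer from the branch: the class name `AudioRingBuffer` and the 16 kHz sample rate default are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List

SAMPLE_RATE = 16_000  # assumed sample rate in Hz
BUFFER_SECONDS = 2


class AudioRingBuffer:
    """Keeps only the most recent `seconds` of audio samples."""

    def __init__(self, seconds: float = BUFFER_SECONDS, rate: int = SAMPLE_RATE) -> None:
        # A bounded deque silently drops the oldest samples once it is full.
        self._samples: Deque[int] = deque(maxlen=int(seconds * rate))

    def extend(self, chunk: Iterable[int]) -> None:
        """Append a chunk of samples, evicting the oldest ones as needed."""
        self._samples.extend(chunk)

    def snapshot(self) -> List[int]:
        """Return a copy of the buffered samples, oldest first."""
        return list(self._samples)
```

A verifier that works on raw audio could call `snapshot()` at the moment the base model fires and run its comparison on that window.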

@ser

ser commented Jul 1, 2024

I would also love to see it implemented in the official openwakeword HA add-on, as it's the way to go.

@synesthesiam
Contributor

I have started the process of moving to the official openWakeWord library here: #27


6 participants