I created a wake word model and applied it to STT, but I ran into a problem with the range of the prediction scores. thomas_oww.zip
I am Korean, so the test voice is Korean male speech.
When I created a model for a specific wake word using the model-generation code provided in the Colab notebook and applied it, the prediction scores came out in the 0.00x range, so I couldn't set a usable wake threshold. Has anyone had a similar experience?
In my experiments, with pure silence the oww prediction score ranged from 0.0008 to 0.00099.
When I shouted the wake word, it rose to about 0.0015.
But a loud noise without the wake word pushed it up to 0.002.
So the model isn't usable yet, because the scores barely separate the wake word from background noise.
That's my conclusion.
Is it because openWakeWord only supports English?
The values above were printed by this line in the code: print(f"idx = {idx}, scores[-1] = {scores[-1]}")
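One common cause of uniformly tiny scores like these (worth ruling out before blaming the model) is an audio format mismatch: openWakeWord expects 16 kHz mono PCM as int16 samples. If the capture pipeline delivers float32 samples in [-1.0, 1.0] and they are cast or reinterpreted without rescaling, the model sees near-silence and the scores stay near zero regardless of what is spoken. This is a hypothetical diagnostic sketch, not code from the repository; `to_int16_pcm` is an illustrative helper name:

```python
import numpy as np

def to_int16_pcm(float_audio):
    """Rescale float32 audio in [-1.0, 1.0] to the full int16 range.

    Without this rescaling, float samples cast to int16 collapse to
    values near 0, which would produce uniformly tiny wake word scores.
    """
    clipped = np.clip(float_audio, -1.0, 1.0)
    return (clipped * 32767).astype(np.int16)

# Example: a float32 frame becomes a full-range int16 frame.
frame = np.array([0.0, 0.5, -0.5, 1.0], dtype=np.float32)
pcm = to_int16_pcm(frame)
```

It is also worth confirming the sample rate: audio captured at 44.1 kHz or 48 kHz and fed to the model without resampling to 16 kHz can produce the same symptom.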
def _process_wakeword(self, data):
    """
    Processes audio data to detect wake words.
    """
    if self.wakeword_backend in {'pvp', 'pvporcupine'}:
        pcm = struct.unpack_from("h" * self.buffer_size, data)
        porcupine_index = self.porcupine.process(pcm)
        if self.debug_mode:
            print(f"wake words porcupine_index: {porcupine_index}")
        # Return the already-computed index instead of calling
        # self.porcupine.process(pcm) a second time on the same frame.
        return porcupine_index
    elif self.wakeword_backend in {'oww', 'openwakeword', 'openwakewords'}:
        pcm = np.frombuffer(data, dtype=np.int16)
        # predict() also updates self.owwModel.prediction_buffer,
        # which is read below.
        prediction = self.owwModel.predict(pcm)
        # print(f"prediction = {prediction}\n")
        max_score = -1
        max_index = -1
        wake_words_in_prediction = len(self.owwModel.prediction_buffer.keys())
        if wake_words_in_prediction:
            for idx, mdl in enumerate(self.owwModel.prediction_buffer.keys()):
                scores = list(self.owwModel.prediction_buffer[mdl])
                # print(f"idx = {idx}, scores[-1] = {scores[-1]}")
                if scores[-1] >= self.wake_words_sensitivity and scores[-1] > max_score:
                    max_score = scores[-1]
                    max_index = idx
            if self.debug_mode:
                print(f"wake words oww max_index, max_score: {max_index} {max_score}")
            # print(f"max_index = {max_index}\n")
            return max_index
        else:
            if self.debug_mode:
                print("wake words oww_index: -1")
            return -1

    if self.debug_mode:
        print("wake words no match")
    return -1
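The selection loop above can be sketched in isolation to show why 0.00x scores never trigger: the latest score per model must clear the sensitivity threshold before it can win. The function and variable names here (`pick_wakeword`, `buffers`) are illustrative, not from the library:

```python
from collections import deque

def pick_wakeword(prediction_buffer, sensitivity):
    """Return (index, score) of the best model whose latest score
    clears the sensitivity threshold, or (-1, -1.0) if none does."""
    max_score = -1.0
    max_index = -1
    for idx, mdl in enumerate(prediction_buffer.keys()):
        scores = list(prediction_buffer[mdl])
        if scores and scores[-1] >= sensitivity and scores[-1] > max_score:
            max_score = scores[-1]
            max_index = idx
    return max_index, max_score

# A healthy model yields latest scores well above a threshold like 0.5;
# the scores reported in this issue (0.0008-0.002) would never clear it.
buffers = {
    "thomas": deque([0.0009, 0.72]),
    "other": deque([0.0008, 0.31]),
}
```

With `sensitivity=0.5`, only "thomas" clears the threshold and index 0 is returned; with the 0.00x scores reported above, every frame returns -1, so the wake word can never fire no matter how the threshold is tuned within a sensible range.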