Describe the project you are working on
A reliable and easy to use Voice over IP plugin https://github.com/goatchurchprime/two-voip-godot-4
Describe the problem or limitation you are having in your project
The microphone input is unreliable and sounds distorted across platforms when put to use.
I have gone into the implementation in depth as discussed in my previous proposal.
TL;DR:
When the rate of input samples from the microphone does not exactly match the rate of output samples in the audio system, the buffering between them fails and causes bugs.
The current implementation uses an extra buffer that adds latency to the audio encoding.
Microphone data is not required in the audio system, as evidenced by the fact that it must always be routed to a bus that is set to mute (illustrated below).
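For reference, this is the plumbing currently required to get any microphone data at all. These are existing APIs; the bus name "Record" is an assumption, and the project must have audio/driver/enable_input enabled:
func _ready():
    var mic_player := AudioStreamPlayer.new()
    mic_player.stream = AudioStreamMicrophone.new()
    mic_player.bus = "Record"   # must be a muted bus to avoid feedback
    add_child(mic_player)
    mic_player.play()           # mic samples now flow through the audio system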
Describe the feature / enhancement and how it helps to overcome the problem or limitation
The microphone would be input in the same way as MIDI and the mouse are input, and can be handled in the same way.
The most common purpose of microphone data is to be packaged and compressed into Opus packets for transmission across the network to another player. The unpack-and-playback stage in the other player's game has to handle the jitter and buffering of packets, as well as the mismatch between the clock rates on different computers. Compared to this, any jitter in the rate of input is irrelevant.
The second most common purpose of microphone data is as a trigger (a shout or blow into the mic) to set off an event.
The least common use is to feed the sound into the speakers, because doing so causes a feedback screech.
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
Basing this on the MIDI input, we should control this with a small pair of open/close functions.
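A hypothetical pair, modeled on the existing OS.open_midi_inputs() / OS.close_midi_inputs(); the names and the chunk_size parameter are illustrative, not existing API:
AudioServer.open_microphone_input(chunk_size : int)  # start emitting microphone events
AudioServer.close_microphone_input()                 # stop emitting microphone events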
Then we emit an InputEventMicrophone every time the AudioDriver::input_buffer contains more than chunk_size samples, with the properties:
data : PackedVector2Array  # length = chunk_size
abs_max : float            # maximum absolute amplitude of the samples in the chunk
These packets could be resampled and encoded into Opus chunks the moment they are delivered to the _input() function, rather than in a process loop that has to poll the buffer to check whether it has enough samples to form a chunk. The default Opus chunk is 20 ms long, which is 882 samples at 44.1 kHz.
This interface is also very useful if you want a voice-activated trigger (e.g. shout or blow into the microphone to fire a bullet): monitor the value of abs_max as you would the events from a pressure-sensitive pad.
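A minimal sketch of such a handler, assuming the proposed InputEventMicrophone class and its data/abs_max properties existed (fire_bullet() is a hypothetical game function):
func _input(event):
    if event is InputEventMicrophone:   # proposed class, does not exist yet
        if event.abs_max > 0.5:         # tune the trigger threshold to taste
            fire_bullet()               # hypothetical game function
        # event.data (882 samples) could also be resampled and handed
        # straight to an Opus encoder here, with no polling loop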
To avoid copying large chunks of data around in the InputEventMicrophone, we could instead return indexes into the input_buffer and copy the data to the Opus encoding buffer with a single function call.
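A hypothetical sketch of that zero-copy path (copy_input_samples and the begin/end properties are illustrative names, not existing API):
func _input(event):
    if event is InputEventMicrophone:
        # the event carries begin/end indexes rather than the sample data;
        # one call copies the samples straight into the encoder's buffer
        AudioServer.copy_input_samples(event.begin, event.end, opus_buffer)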
The code required to implement this feature would be modest. All five platforms fill the input_buffer via the AudioDriver::input_buffer_write(int32_t sample) function in chunks of varying size. At the end of each of these chunks we would call a new function:
AudioDriver::generate_microphone_events()
This would construct as many chunk_size-sample events from the buffer as possible, then call Input::get_singleton()->parse_input_event(event) on each one to enter it into the system.
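The chunking logic can be emulated today from GDScript by polling an AudioEffectCapture on the record bus. This sketch only illustrates what generate_microphone_events() would do per chunk; the Dictionary stands in for the proposed event class, and _on_microphone_chunk is a hypothetical handler:
var capture : AudioEffectCapture
const CHUNK_SIZE := 882   # 20 ms at 44.1 kHz

func _ready():
    var bus := AudioServer.get_bus_index("Record")   # assumed muted record bus
    capture = AudioServer.get_bus_effect(bus, 0)     # assumes capture effect in slot 0

func _process(_delta):
    while capture.get_frames_available() >= CHUNK_SIZE:
        var frames := capture.get_buffer(CHUNK_SIZE)   # PackedVector2Array
        var abs_max := 0.0
        for f in frames:
            abs_max = maxf(abs_max, maxf(absf(f.x), absf(f.y)))
        _on_microphone_chunk({"data": frames, "abs_max": abs_max})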
If this enhancement will not be used often, can it be worked around with a few lines of script?
No
Is there a reason why this should be core and not an add-on in the asset library?
The microphone is core.
What are the plans for backwards compatibility? Will both the current and new system co-exist? Or should parts of the old system get deprecated?
Also be aware that many nodes in a game will have _input virtual methods; flooding each of them with microphone events at such a high rate will likely hurt the performance of GDScript code. Wouldn't it make more sense to have a more centralized API than _input for this, one which only calls a single method per chunk? For example, passing a Callable to AudioServer, or registering a custom class like a MicrophoneHandler on AudioServer (similar to the editor plugin APIs) that has a virtual method called on it.
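A hypothetical sketch of the Callable variant (set_microphone_callback is an illustrative name, not existing API):
func _ready():
    AudioServer.set_microphone_callback(_on_microphone_chunk)   # hypothetical API

func _on_microphone_chunk(data : PackedVector2Array, abs_max : float) -> void:
    pass   # one call per chunk, no per-node _input flooding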
Very good points. In general, input events (e.g. button presses) should be pushed into the system as they arrive rather than polled.
The lack of filtering by event type on the _input(event) functions is a missing optimization. I haven't found the proposal (which must exist) where we could define the function:
func _input(event : InputEventKey)
so that the events are filtered before they ever enter GDScript.
There are pragmatic inconsistencies to consider:
Why do Mouse Motions and Screen Drags arrive as InputEvents, but joystick values require you to poll Input.get_joy_axis()?
Why do on-screen buttons emit signals, but physical keyboard buttons arrive as InputEvents?
Microphone audio chunks are arguably events that occur when each buffer chunk fills. They are not like positional data (joysticks) to be polled. The Opus standard allows for packets up to 60ms long.
We could receive the audio chunks from a signal on the AudioServer instead of processing them through the Input.parse_input_event() function. The important point is that they are received as they are created, without an expectation that their timing is consistent with the audio system.
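A hypothetical sketch of the signal variant (the signal name is illustrative, not existing API):
func _ready():
    AudioServer.microphone_chunk_ready.connect(_on_microphone_chunk)   # hypothetical signal

func _on_microphone_chunk(data : PackedVector2Array, abs_max : float) -> void:
    pass   # chunks arrive as they are created, decoupled from audio timing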
We could implement AudioStreamMicrophone using AudioStreamGenerator in GDScript, with a decent buffer size that adapts to conditions. It would also make recordings easy.
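A sketch using existing APIs, assuming a muted "Record" bus with an AudioEffectCapture in slot 0 feeding an AudioStreamGeneratorPlayback obtained from a playing AudioStreamPlayer:
var capture : AudioEffectCapture
var playback : AudioStreamGeneratorPlayback

func _ready():
    var bus := AudioServer.get_bus_index("Record")
    capture = AudioServer.get_bus_effect(bus, 0)   # assumes capture effect in slot 0
    playback = $Player.get_stream_playback()       # $Player uses an AudioStreamGenerator

func _process(_delta):
    var available := capture.get_frames_available()
    if available > 0 and playback.can_push_buffer(available):
        playback.push_buffer(capture.get_buffer(available))   # re-emit mic frames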