Bug: decode function leaks memory #19

Closed
antimora opened this issue Jul 23, 2020 · 6 comments

antimora commented Jul 23, 2020

I am using miniaudio to decode MP3 bytes. I have an ML training task that decodes thousands of MP3 files over many thousands of training iterations. I noticed that memory usage blew up and the OS started swapping.

I ran a simple test and confirmed that miniaudio does leak memory.

from pathlib import Path

from miniaudio import SampleFormat, decode

# Read the MP3 file into memory once
audio_bytes = Path('common_voice_en_20603299.mp3').read_bytes()

# Decode the same bytes repeatedly; memory usage grows with every call
for i in range(10000):
    decoded_audio = decode(audio_bytes,
                           nchannels=1, sample_rate=16000,
                           output_format=SampleFormat.SIGNED32)

Set file: common_voice_en_20603299.zip

10,000 iterations use about 3.3 GB of memory:


[tmp]$ /bin/time -v /usr/bin/python3 /workspaces/ml/tmp/test9.py
        Command being timed: "/usr/bin/python3 /workspaces/ml/tmp/test9.py"
        User time (seconds): 56.05
        System time (seconds): 1.52
        Percent of CPU this job got: 101%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:56.81
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 3274532
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 798090
        Voluntary context switches: 30
        Involuntary context switches: 723
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
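
The "Maximum resident set size" figure above can also be read from inside the script, which makes it easier to compare a few hundred iterations without rerunning /bin/time. The following is only a minimal sketch using the standard-library resource module (Unix only). The helper names peak_rss_kib and rss_growth_kib are hypothetical and not part of miniaudio.

```python
import resource
import sys


def peak_rss_kib() -> int:
    """Return this process's peak resident set size in KiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    return peak // 1024 if sys.platform == "darwin" else peak


def rss_growth_kib(func, iterations: int) -> int:
    """Call func repeatedly and return how much the peak RSS grew, in KiB."""
    func()  # warm up so one-time allocations are not counted as growth
    before = peak_rss_kib()
    for _ in range(iterations):
        func()
    return peak_rss_kib() - before
```

Passing a zero-argument callable that wraps the decode call from the test above should show growth proportional to the iteration count if the leak is present, and growth close to zero otherwise. Because this reads the peak rather than the current RSS, it can only show that memory went up, not that it was later returned.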

After further testing and debugging, I concluded that calling lib.ma_decode_memory repeatedly is what leaks memory.

According to:
https://github.com/dr-soft/miniaudio/blob/b80f7f949152f93a0af499b4d6d07b8e60d0e673/extras/miniaudio_split/miniaudio.h#L4120

ma_free probably needs to be called to free the allocated memory, as in this example: mackron/miniaudio#97 (comment)

It may also be a good idea to release frames and memory, which are allocated by the ffi.new calls (see https://cffi.readthedocs.io/en/latest/ref.html#ffi-release-and-the-context-manager).
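
To illustrate the ownership rule behind that suggestion, here is a generic sketch of the copy-then-free pattern. It is not miniaudio code: it uses ctypes with the C runtime's malloc and free as a stand-in for a library call that hands back a heap buffer, and read_c_buffer is a hypothetical name. The point is only that the caller copies the data into a Python object and then frees the C allocation in a finally block.

```python
import ctypes
import ctypes.util

# Load the C runtime. If find_library returns None, CDLL(None) falls back
# to the symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def read_c_buffer(size: int) -> bytes:
    """Stand-in for a C call that returns a heap buffer the caller must free."""
    ptr = libc.malloc(size)
    if not ptr:
        raise MemoryError("malloc failed")
    try:
        ctypes.memset(ptr, 0x2A, size)      # pretend the library filled the buffer
        return ctypes.string_at(ptr, size)  # copy into a Python-owned bytes object
    finally:
        libc.free(ptr)                      # always release the C allocation
```

If the free call is left out, every call leaves one buffer behind, which is the kind of growth described in this report.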

Here is code showing that repeated calls to ma_decode_memory leak memory.

from pathlib import Path

from miniaudio import (DecodeError, DitherMode, SampleFormat,
                       _array_proto_from_format, _width_from_format,
                       ffi, lib)

# get mp3 bytes
audio_bytes = Path('common_voice_en_20603299.mp3').read_bytes()

data = audio_bytes
nchannels = 1
sample_rate = 16000
dither = DitherMode.NONE
output_format = SampleFormat.SIGNED16
sample_width = _width_from_format(output_format)
samples = _array_proto_from_format(output_format)
frames = ffi.new("ma_uint64 *")
memory = ffi.new("void **")

decoder_config = lib.ma_decoder_config_init(output_format.value, nchannels, sample_rate)
decoder_config.ditherMode = dither.value

for i in range(30000):

    result = lib.ma_decode_memory(data, len(data), ffi.addressof(decoder_config), frames, memory)
    if result != lib.MA_SUCCESS:
        raise DecodeError("failed to decode data", result)

antimora (Author) commented Jul 23, 2020

I found a workaround and am sharing it for those in the same boat.

It appears that pyminiaudio's mp3_read_f32 function correctly frees the memory it allocates and therefore does not leak. However, mp3_read_f32 is currently broken because of #18, which I reported. So here is a re-implementation for those who wish to use pyminiaudio's in-memory MP3 decoding. It includes a small test showing that there is no memory leak:

import array
from pathlib import Path
from typing import Tuple

from miniaudio import DecodeError, ffi, lib

# Read the MP3 file into memory once
audio_bytes = Path('common_voice_en_20603299.mp3').read_bytes()

def mp3_read_f32(data: bytes) -> Tuple[array.array, int, int]:
    '''Decodes the whole mp3 audio data to 32-bit float samples.
    Returns (samples, sample_rate, channels).'''
    config = ffi.new('drmp3_config *')
    num_frames = ffi.new('drmp3_uint64 *')
    memory = lib.drmp3_open_memory_and_read_pcm_frames_f32(data, len(data), config, num_frames, ffi.NULL)
    if not memory:
        raise DecodeError('cannot load/decode data')
    try:
        samples = array.array('f')
        # 4 bytes per float32 sample, one sample per channel per frame
        buffer = ffi.buffer(memory, num_frames[0] * config.channels * 4)
        samples.frombytes(buffer)  # copies the data out of the C buffer
        return samples, config.sampleRate, config.channels
    finally:
        lib.drmp3_free(memory, ffi.NULL)  # free the buffer allocated by dr_mp3
        ffi.release(num_frames)  # release num_frames memory as a precaution


for i in range(10000):
    decoded_audio, sample_rate, channels = mp3_read_f32(audio_bytes)

Here are OS stats for executing this.

[tmp]$ /bin/time -v /usr/bin/python3 /workspaces/ml/tmp/test10.py

        Command being timed: "/usr/bin/python3 /workspaces/ml/tmp/test10.py"
        User time (seconds): 27.25
        System time (seconds): 0.64
        Percent of CPU this job got: 102%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:27.16
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 196120
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 30339
        Voluntary context switches: 39
        Involuntary context switches: 2387
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

The maximum memory usage is 196.12 MB.
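
As a rough back-of-the-envelope check, the two peak figures can be compared to estimate how much each decode call leaks. This sketch assumes the second run's peak approximates the non-leaking footprint, which is only approximate because the two runs use different decoders and output formats.

```python
# Peak RSS values copied from the two /bin/time -v runs above (in kbytes).
LEAKY_PEAK_KIB = 3_274_532   # decode() loop, 10000 iterations
CLEAN_PEAK_KIB = 196_120     # mp3_read_f32() loop, 10000 iterations
ITERATIONS = 10_000

# Assumption: everything above the clean run's peak is leaked memory.
leaked_kib = LEAKY_PEAK_KIB - CLEAN_PEAK_KIB
per_call_kib = leaked_kib / ITERATIONS
ratio = LEAKY_PEAK_KIB / CLEAN_PEAK_KIB

print(f"leaked in total: {leaked_kib} KiB")
print(f"leaked per call: {per_call_kib:.0f} KiB")
print(f"peak ratio:      {ratio:.1f}x")
```

Under that assumption the leak comes to roughly 308 KiB per call, and the leaking run peaks at about 16.7 times the memory of the workaround. That per-call size would match about 5 seconds of mono 16 kHz 32-bit audio, which suggests one decoded buffer is lost per call, though the clip length is not stated here.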

irmen (Owner) commented Oct 30, 2020

Could you perhaps retry with the latest version, 1.36? There have been various updates to the miniaudio library over the past few months. Or is it really a problem in the Python wrapper module?

antimora (Author) commented Oct 31, 2020

I am using a workaround and I haven't tried with updated libraries. Were you able to recreate the bug? I have outlined all the steps to recreate it.

irmen (Owner) commented Oct 31, 2020

not sure why you're tagging Cameron here :)

irmen (Owner) commented Oct 31, 2020

Commit 77bd203 introduces the pre-emptive release of those cffi resources. As #18 is now fixed, your workaround for that particular function should no longer be necessary. I've checked with 10,000 iterations and memory does not grow.

I'll investigate the memory use of the decode function later.

irmen closed this as completed in 9e90dab on Nov 1, 2020

irmen (Owner) commented Nov 1, 2020

I believe this is now fixed in the new release, as are the API bugs in the mp3_load functions.
