-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PlatformPlayer: Implement initial IMA ADPCM decoding support #192
Conversation
In order to decode IMA ADPCM wav files, we first read the incoming wav's header and check if the reported Audio Format is '17', indicating that it is an IMA ADPCM stream. If it is, we begin to decode its samples into signed PCM16LE while maintaining the same amount of channels, and sampling rate. Once that is done, we build a new header for the decoded stream to reflect its new format and then send the completed stream back to PlatformPlayer's wavPlayer so it can handle it like a standard wav stream. Additionally, a simple 'low-pass filter' was added to combat crackling on resulting samples, as it was really overbearing, it vastly improves quality at almost no cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I have a possible explanation for the scratchiness anyway.
{ | ||
if (inputSize % block_size == 0) | ||
{ | ||
prevSample[0] = (int) (((input[inputIndex]& 0xFF)) | ((input[inputIndex+1]) << 8) & 0xFF00); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I'd take a look to see if there was anything odd that might explain the crackling. I'm not sure I understand what this is supposed to be doing. I pulled up the implementation in ffmpeg, but that didn't help either.
I'm guessing this part is supposed to the initial values from the preamble? If so, why does this look like we're in the data chunk? Last year when I was playing with it, I remember this being in its own chunk.
The code I have from last June parses the header and handles each 'chunk' according to what it is. (It's a but lazy up to the "fmt " chunk) My test ADPCM file has a "fact" chuck (4 bytes) that precedes the "data" chunk with the actual data. I can send that to you if you want so you can see how it works. I would guess that this is what you actually want to read. Oh, and you don't want to try to interpret the data chunk header as data either, which you probably are if you're assuming the data starts 48 bytes in.
Line 82, starts off like it's going to read a little endian 16-bit integer, but goes completely off the rails. It puts the first byte in the least significant position, the second byte in the most significant position, then it discards the first byte! This has the effect of just reading the second byte and multiplying it by 256. I'm not sure if this needs to be signed or not, but it should be sign-extended if that's the case.
Line 83 grabs the third byte and saves it as the previous step, which is purportedly the step index, but we don't look it up in the step table, which seems like it would be important.
Line 84 advances the input index beyond byte 4, which we haven't read. I don't know what this could be, so I'm guessing it's just padding for alignment. A comment would help. What follows should be data chunk, which is a four-byte string "data" and the chunk size, a [correction] 32bit int (little endian). After that is the actual data.
The loop at 88 is odd as well. You check to see if there's more than one channel, but assume that there are only two if that's the case. (this is probably fine in practice, but a simple loop would handle an arbitrary number of channels and eliminate the redundant code. The padding is a bit of a mystery though. I don't know if the preamble for each channel will be four bytes or three, with whatever padding at the end depending on the number of channels. (Four channels wouldn't need any). If we have a stereo file, a quick look with a hex editor will tell all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I knew i was missing something... not only is that little-endian read wrong as you pointed out, there's also quite a few mistakes early on (like not checking this value on the table as you also pointed out, not clamping the value of prevStep in case it is out of bounds, and forcefully passing the wrong data from the header to serve as BlockAlign since dataSize was the only one that resulted in any kind of playback...). BlockAlign/FrameSize/BlockSize was rather confusing due to different sources and documents relating to them with those different names despite all of them being the same thing... i'll refer to it as frameSize from now on
As for what this function is supposed to do... it's meant to be able to decode the whole ADPCM stream while still checking for framesize chunks without depending on a helper function, which is that "% block_size == 0" case on line 80. This is what toggles the function between reading the preamble (because when that 'if' becomes true it means we're at the very beginning of a framesize chunk and the preamble is right in line to be read) and decoding stereo or mono adpcm samples, which also have their own share of issues now that i gave this implementation another look after a while.
The way i have it currently set up, it only needs a few key parts from the header to work, those being number of channels and framesize. Those should theoretically be enough to decode an ADPCM stream, and after making some fixes, that does seem to be the case, as the audio is pretty much 'noiseless' (as in, sounds like ) except for the very beginning of the stream which still has a lot of noise... but more on that later.
About the skip on byte 4 (or byte 3, going from 0 to 3) of the preamble: Apparently it is a reserved byte and usually set to '0' according to MultimediaWiki. This would explain why some C implementations i've seen check if this byte is null or not as a means of ascertaining end of stream, as null evaluates to 0 there, it really does seem to serve the purpose of padding.
On the topic of stereo: True, i could use an if statement to cycle through more than two channels, but on a second though we probably won't need to go over 2 channels, as i am yet to find a j2me app that uses stereo adpcm, so i imagine anything beyond that is pretty much non-existent on the platform.
And now back to the noise i mentioned above: It really only happens on the very start of each decoded stream, and is also perfectly reproducible every time. After some back and forth i thought of 2 reasons as to why that might be happening:
1 - The predictor and step are set incorrectly on the first chunk to be decoded, although there aren't many documents that delve into this part of the decode process, the only one i could find is again from MultimediaWiki which says those are usually set as '0' at the start of the process, so in the case of Step, it's actually '7' since that's the first index on the table... and as far as i know this is already being done in the code.
2 (and the most likely reason) - I'm messing something up when passing the stream to be decoded and it's decoding data that shouldn't be decoded (as you pointed out yet again), because i'm passing the whole stream to the decoder, including the header which i get conflicting info on its actual size, some sources say it's 44-bytes long, others point to 60 bytes... This is also helped by the fact that sometimes, only when compiled and ran with Java 8, FreeJ2ME freezes when decoding an ADPCM stream, Java 9 and upwards is rock-solid, but still... there's definitely something wrong or else Java 8 would be solid as well, some out-of-bounds access might be happening in there.
In any case, there's a commit incoming with many fixes and lots of comments to help understand what's being done in there.
*/ | ||
private void buildHeader(byte[] buffer, int numChannels, int sampleRate) | ||
{ | ||
final short bitsPerSample = 16; /* 16-bit PCM */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is everything final?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method only builds the wav header one time per call, so those values don't really change after their first declaration and they're never used outside of this method, which is why they're final.
Would that be a problem? Currently i'm treating those the same way as a 'const' in C... I'm open to suggestions if there's a better way to handle this task.
bah! I had a whole thing about the actual decoding! I don't know what happened to it. Well, here's the executive summary. I'm not writing that out again tonight: See the reference implementation on page 32: The bit where you have all the ifs? That's an outdated optimization. The order of those ifs matters. It's outdated, so you can just do the multiplication as it'll be faster anyway. That what ffmpeg does, as you can see on like 733 here: https://www.ffmpeg.org/doxygen/0.6/adpcm_8c-source.html Anyhow, I think the scratchiness is due to some mix of problems with the decoder and problems reading the file. |
The initial implementation had tons of issues (incorrect reading of chunks and their preambles, lack of clamping on some ima step indices just to name a few). This commit fixes some of them while also adding comments to help understand what is being done inside the decoder's class.
This is an additional pass to further cleanup and improve the code. Checking if the fourth preamble byte is '!= 0' has no use here and does more harm than good, and in cases the preamble is read one or more times, the decoder accounts for it on the resulting stream's size for better time and space efficiency by eliminating the need to copy the data into a new correctly sized byte array. IMA ADPCM's header is also fully stripped from the decoding process as well, it is only used to get information regarding format, number of channels, bit rate and so on.
Hmm... can't figure out why it still crackles a bit only on certain files. Most streams have no crackling at all with the updated code, only Asphalt 4's menu SFX shows any kind of crackling towards the end when played, but that's about all i could find that sounds a bit strange. Also, those playback repeats are definitely something to look at (on another issue perhaps), since FreeJ2ME constantly plays the same audio file two and sometimes even three times consecutively instead of a single one. |
Time for some final touches on this one. Header creation was reworked, decoding algorithm is now much better with little to no crackling even on small streams, everything that isn't going to change (and arrays that aren't going to be reassigned) are now final for better performance, and the code is a bit more readable as well.
Alright, now it's way better. No real crackling anymore even on shorter streams (at least can't hear any on Asphalt 4, 6 and Motocross Trial Extreme, which are the ones i could hear before), and the header is created and slotted at the start of the decoded stream without issue. In theory this means we are able to dump PCM and ADPCM wav files later for debugging if we really need to. Still doesn't sound exactly like J2MELoader, but that one seems to rely a lot on Android's underlying systems, and i've no idea what they do to those files under the hood, not to mention FreeJ2ME still has a ton of issues with audio playback. @recompileorg once you have time, please take a look at this and tell me what you think. Note: Yeah, still a lot of things being declared "final", as that's the only way i could find to improve performance enough to minimize freezes on some of my more limited hardware with Java 8 without incurring in sync issues by using threads... weird that those freezes only happen in Java 8 though. |
Failing to reset the stream will make FreeJ2ME load PCM files without their header, which can cause issues with media playback. The header only needs to be removed for IMA ADPCM files, as those have to be decoded into PCM.
Sorry, I just can't find the time to give it a full and proper review. It's okay to merge, but I would have liked to dig a bit deeper into it. |
No problem, been pretty short on time myself those last few months. Been planning on adding some prebuilts on my repo so that users have an easy way of downloading FreeJ2ME without manually building it by themselves, but between work, other hobbies and home affairs, i wasn't able to yet. Gonna see if i can separate some time tomorrow to sort that out and update the compatibility list while i'm at it. |
Closes #149.
Copying message from commit:
In order to decode IMA ADPCM wav files, we first read the incoming wav's header and check if the reported Audio Format is '17', indicating that it is an IMA ADPCM stream. If it is, we begin to decode its samples into signed PCM16LE while maintaining the same amount of channels, and sampling rate. Once that is done, we build a new header for the decoded stream to reflect its new format and then send the completed stream back to PlatformPlayer's wavPlayer so it can handle it like a standard wav stream.
Additionally, a simple 'low-pass filter' was added to combat crackling on resulting samples, as it was really overbearing, it vastly improves quality at almost no cost.