Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MPEG-H support #952

Merged
merged 10 commits into from
Jun 30, 2021
Merged

Add MPEG-H support #952

merged 10 commits into from
Jun 30, 2021

Conversation

nvincen
Copy link
Contributor

@nvincen nvincen commented Jun 9, 2021

Continuation from #933, and closes #930.

I messed up my pull request.

Comment on lines 392 to 398
MHAConfiguration mhac;

//23008-3 decoder specific config
CodecConfiguration codec_configuration;

// Returns the box type of codec configuration box from audio format.
FourCC GetCodecConfigurationBoxType(FourCC format) const;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MPEG-H decoders need a codec_configuration values, similar to video codecs. So I added a codec_configuration box to allow for the holding and parsing of the configuration_stream.

If there is a better way to implement, please suggest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be handled in MHAConfiguration mhac already? Isn't the mhac the codec configuration for MPEG-H?

I understand that it is a bit confusing, but the approach for audio and video are slightly different. For audio, we are listing all the possible audio decoder configurations while for video, we are using the same CodecConfiguration structure. The code should really be cleaned up.

I think we can just remove line 393 to 398. The functionality has already been implemented in mhac. Let me know if I misunderstood your intention and sorry for the confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed 393 - 398

Comment on lines +1723 to +1724
RCHECK(data.size() > 1);
mpeg_h_3da_profile_level_indication = data[1];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be inside an if (buffer->Reading()) condition? i.e.

  if (buffer->Reading()) {
    RCHECK(data.size() > 1);
    mpeg_h_3da_profile_level_indication = data[1];
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is better I think, though it does not really matter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just left it alone.

@nvincen
Copy link
Contributor Author

nvincen commented Jun 14, 2021

@kqyang - Would you mind having another look?

@nvincen
Copy link
Contributor Author

nvincen commented Jun 22, 2021

@kqyang - Friendly ping to review the latest changes

Copy link
Contributor

@kqyang kqyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Vincent. Sorry for the delay in the code review. The CL looks mostly good to me with a few more comments (with most of them as style nits).

Comment on lines 143 to 146
// The signalling of the codecs parameters is according to RFC6381 [11] and ISO/IEC 23008-3 clause 21 [7].
// The value consists of the following two parts separated by a dot:
// - the sample entry 4CC code ('mha1', 'mha2', 'mhm1', 'mhm2')
// - ‘0x’ followed by the hex value of the profile-levelid, as defined in in ISO/IEC 23008-3 [7]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nits: Keep within 80 characters per line limit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed with below suggestion.

Comment on lines 149 to 152
// The signalling of the codecs parameters is according to RFC6381 [11] and ISO/IEC 23008-3 clause 21 [7].
// The value consists of the following two parts separated by a dot:
// - the sample entry 4CC code ('mha1', 'mha2', 'mhm1', 'mhm2')
// - ‘0x’ followed by the hex value of the profile-levelid, as defined in in ISO/IEC 23008-3 [7]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove the duplicated comments and replace it with:

// See above.

Or even better, combine the two cases:

case kCodecMha1:
case kCodecMhm1:
  ...
  return base::StringPrintf("%s.0x%02x", FourCCToString (CodecToFourCC(codec)), audio_object_type);
FourCC CodecToFourCC(Codec codec) {
  switch (codec) {
    case kCodecMha1:
      return FOURCC_mha1;
    case kCodecMhm1:
      return FOURCC_mhm1;
    default:
      return FOURCC_NULL;
  }

}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged the two cases.

Comment on lines +58 to +59
kCodecMha1,
kCodecMhm1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move them before kCodecAudioMaxPlusOne, i.e. before line 57?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +1723 to +1724
RCHECK(data.size() > 1);
mpeg_h_3da_profile_level_indication = data[1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is better I think, though it does not really matter.

DECLARE_BOX_METHODS(MHAConfiguration);

std::vector<uint8_t> data;
std::uint8_t mpeg_h_3da_profile_level_indication;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace std::uint8_t with uint8_t to be consistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 392 to 398
MHAConfiguration mhac;

//23008-3 decoder specific config
CodecConfiguration codec_configuration;

// Returns the box type of codec configuration box from audio format.
FourCC GetCodecConfigurationBoxType(FourCC format) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be handled in MHAConfiguration mhac already? Isn't the mhac the codec configuration for MPEG-H?

I understand that it is a bit confusing, but the approach for audio and video are slightly different. For audio, we are listing all the possible audio decoder configurations while for video, we are using the same CodecConfiguration structure. The code should really be cleaned up.

I think we can just remove line 393 to 398. The functionality has already been implemented in mhac. Let me know if I misunderstood your intention and sorry for the confusion.

Copy link
Contributor

@kqyang kqyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Two more style comments.

return base::StringPrintf("mhm1.0x%02x", audio_object_type);
// - ‘0x’ followed by the hex value of the profile-levelid, as defined
// in in ISO/IEC 23008-3 [7]
return base::StringPrintf("%s.0x%02x", FourCCToString(CodecToFourCC(codec)).c_str(), audio_object_type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80 characters line limit.

Can you break it into three lines, e.g.

      return base::StringPrintf("%s.0x%02x",
                                FourCCToString(CodecToFourCC(codec)).c_str(),
                                audio_object_type);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, I missed that one. Fixed.

Comment on lines 1240 to 1254
TEST_F(BoxDefinitionsTest, MHACSampleEntry) {
AudioSampleEntry entry;
entry.format = FOURCC_mhaC;
entry.data_reference_index = 2;
entry.channelcount = 5;
entry.samplesize = 16;
entry.samplerate = 44100;
Fill(&entry.mhac);
entry.Write(this->buffer_.get());

AudioSampleEntry entry_readback;
ASSERT_TRUE(ReadBack(&entry_readback));
ASSERT_EQ(entry, entry_readback);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test can be removed as it tests the same thing as the test above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Copy link
Contributor

@kqyang kqyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more question: can you generate a small test content (~3 seconds) and create an end to end test, similar to https://github.com/google/shaka-packager/blob/master/packager/app/test/packager_test.py#L1022?

You can then run:

out/Release/packager_test.py PackagerFunctionalTest.testMpegHMp4 --test_update_golden_files

to update the golden files.

This step is optional but good to have. It is fine to skip it if you cannot fine or generate a copy right free content easily.

Comment on lines +90 to +92
FOURCC_mha1 = 0x6d686131,
FOURCC_mhaC = 0x6d686143,
FOURCC_mhm1 = 0x6d686d31,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not noticing it earlier. I think it has to be moved after line 94 to follow alphabetical order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arranged to alphabetical order.

@nvincen
Copy link
Contributor Author

nvincen commented Jun 30, 2021

One more question: can you generate a small test content (~3 seconds) and create an end to end test, similar to https://github.com/google/shaka-packager/blob/master/packager/app/test/packager_test.py#L1022?

You can then run:

out/Release/packager_test.py PackagerFunctionalTest.testMpegHMp4 --test_update_golden_files 

to update the golden files.

This step is optional but good to have. It is fine to skip it if you cannot fine or generate a copy right free content easily.

I agree it's good to have a small file added. I was able to find some files online and added it to the PR.

@kqyang
Copy link
Contributor

kqyang commented Jun 30, 2021

Thanks for adding the end to end test. Can you generate a smaller file, i.e. take the first three seconds only?

@nvincen
Copy link
Contributor Author

nvincen commented Jun 30, 2021

Thanks for adding the end to end test. Can you generate a smaller file, i.e. take the first three seconds only?

This was a file that I found floating around on the web. Do you know how I could trim the file to 3 seconds? I wasn't able to get ffmpeg to trim the file.

@kqyang
Copy link
Contributor

kqyang commented Jun 30, 2021

This was a file that I found floating around on the web.

Does it explicitly indicate that it can be shared freely? If not, we won't be able to use it in Shaka Packager.

Do you know how I could trim the file to 3 seconds? I wasn't able to get ffmpeg to trim the file.

Does the below command work?

ffmpeg -ss 00:00:00 -i input.mp4 -to 00:00:03 -c copy output.mp4

If not, it's fine not to having end to end test.

This reverts commit ada3493.
@nvincen
Copy link
Contributor Author

nvincen commented Jun 30, 2021

This was a file that I found floating around on the web.

Does it explicitly indicate that it can be shared freely? If not, we won't be able to use it in Shaka Packager.

Do you know how I could trim the file to 3 seconds? I wasn't able to get ffmpeg to trim the file.

Does the below command work?

ffmpeg -ss 00:00:00 -i input.mp4 -to 00:00:03 -c copy output.mp4

If not, it's fine not to having end to end test.

It looks like it's copyrighted actually:

bash-4.2# ffmpeg -i /media/tmp/mha1.mp4
ffmpeg version 4.3.1 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7 (GCC)
  configuration: --prefix=/local/p4clients/pkgbuild-U9Jv8/workspace/src/FFmpeg/build/private/install --disable-static --enable-shared --enable-encoder=vorbis --enable-libvorbis --enable-libmp3lame --enable-libvpx --enable-libaom --enable-libfdk-aac --enable-libopus --enable-openssl
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x5555557c2c40] stream 0, timescale not set
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x5555557c2c40] Could not find codec parameters for stream 0 (Audio: none (mha1 / 0x3161686D), 48000 Hz, 0 channels, 666 kb/s): unknown codec
Consider increasing the value for the 'analyzeduration' and 'probesize' options
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/media/tmp/mha1.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42isom
    creation_time   : 2020-08-05T18:08:52.000000Z
    album_artist    : Pharrell Williams
    track           : 1
    date            : 2020
    disc            : 1/1
    copyright       : (P) 2013 Back Lot Music, under exclusive license to Columbia Records, a Division of Sony Music Entertainment
    artist          : Pharrell Williams
    title           : Happy
    composer        : Pharrell Williams
    album           : Pharrell Williams 360 Reality Audio [360 Reality Audio MPEG-H Tidal]
  Duration: 00:03:54.36, start: 0.000000, bitrate: 680 kb/s
    Stream #0:0(und): Audio: none (mha1 / 0x3161686D), 48000 Hz, 0 channels, 666 kb/s (default)
    Metadata:
      creation_time   : 2020-08-05T18:08:52.000000Z
      handler_name    : soun
    Stream #0:1: Video: mjpeg (Progressive), yuvj444p(pc, bt470bg/unknown/unknown), 1280x1280 [SAR 300:300 DAR 1:1], 90k tbr, 90k tbn, 90k tbc (attached pic)
At least one output file must be specified

My last commit reverts the E2E test.

@kqyang kqyang merged commit f018c9a into shaka-project:master Jun 30, 2021
@kqyang
Copy link
Contributor

kqyang commented Jun 30, 2021

@nvincen I have just merged the change. Thanks for working on this feature and supporting Shaka Packager project!

@ronak2121
Copy link

@kqyang is there any ETA on when this support will be available in a release?

@kqyang
Copy link
Contributor

kqyang commented Aug 31, 2021

@joeyparrish Do we have plans to release v2.6 soon?

@ronak2121
Copy link

Hi @joeyparrish any update on when this support would be in a release?

@joeyparrish
Copy link
Member

I need to check that our new low-latency feature is ready to ship. Assuming it is, I could release v2.6.0 within the next week. Monday is a holiday in the US, so possibly Tuesday if LL is ready.

@joeyparrish
Copy link
Member

Looks like it's ready. I'm going to start the release process.

@joeyparrish
Copy link
Member

@ronak2121
Copy link

Awesome thank you!

@github-actions github-actions bot added the status: archived Archived and locked; will not be updated label Jul 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
status: archived Archived and locked; will not be updated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for MHA1 audio on packager
4 participants