Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Add audio and subtitle stream parsing #50

Conversation

PanCakeConnaisseur
Copy link
Contributor

@PanCakeConnaisseur PanCakeConnaisseur commented Oct 15, 2024

  • Refactor code into common processing for all streams and spezialized functions for audio and video.
  • Update code examples
  • Change line breaks in lib.rs to LF
  • Add handling of streams that contain additional identifiers in square
    brackets such as Stream #0:1[0x2](eng):. These are found in mov
    containers.

Partly discussed in #49

@PanCakeConnaisseur
Copy link
Contributor Author

PanCakeConnaisseur commented Oct 15, 2024

@nathanbabcock: After almost finishing the variant where there was one event and separate struct for each parsed event type (audio, video, ...) I gave the enum-based solution another go. That variant is the subject of this PR. Although I have to match on a generic ParsedInputStream event and then check whether it is video event (see updated README.md), the rest of the code is cleaner and (negligibly) faster. Therefore, I think that the enum-based solution is better after all. What do you think? Feel free to critique away.

You can see the "original" variant in a branch in my fork. Its almost complete except changes in the documentation and some small improvements.

P.S. Note, that this is a breaking change, since I moved some of the fields, such as pix_fmt, into video-specific structs.

P.P.S. I saw that you use the usize type in several places. I adopted that, to be coherent with existing code. Is there a reason why you used usize instead of u32? AFAIK, the latter is more idiomatic and suited for things that are just numbers and don't represent memory addresses. Since my PR introduces a breaking change already, this might be a good time to replace everything with u32. This would make using structs from ffmpeg_sidecar in client code easier, since it would avoid a cast.

* Refactor code into common processing for all streams and spezialized
  functions for audio and video.
* Update code examples
* Change line breaks in lib.rs to LF
* Add handling of streams that contain additional identifiers in square
  brackets such as `Stream #0:1[0x2](eng):`. These are found in mov
  containers.
@PanCakeConnaisseur PanCakeConnaisseur force-pushed the read-metadata-from-audio-streams-with-enums branch from 5828fed to 3e3d3db Compare October 15, 2024 01:13
Copy link
Owner

@nathanbabcock nathanbabcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the solution you landed on for this. The helper methods with matches! go a long way in making it easy to use. I will leave a few small comments that are mostly stylistic, but other than that I think it is in great shape if you're happy with it as well.

Good point about usize. I think the original intuition was that fields such as parent_index and stream_index would primarily be used with bracket syntax to index into e.g. FfmpegMetadata.outputs[parent_index] and seems to require a usize in order to do that. Although funnily enough the FfmpegDuration struct has input_index: u32 instead which has to be manually cast to usize elsewhere... 🤦 So clearly some standardization would be helpful. Personally I would go with usize for indices and u32 for everything else (including the new sample_rate), but I could be convinced otherwise.

Regardless this is looking good to merge into main. After that we can collect a few other breaking changes (including removing some deprecated functions from a recent PR), and then release a new major version once everything is sorted out!

src/event.rs Outdated Show resolved Hide resolved
src/event.rs Outdated
}
pub fn video_data(&self) -> VideoStream {
if let TypeSpecificData::Video(video_stream) = &self.type_specific_data {
video_stream.clone()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nit and works just fine either way, but perhaps we can also return a borrowed &VideoStream (or Option<&VideoStream>) and let the user decide whether they want to clone(). Again taking inspiration from serde_json which returns a borrowed value.

Copy link
Contributor Author

@PanCakeConnaisseur PanCakeConnaisseur Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree here, too. This is more in line with Rust's philosophy of not copying implicitly. Thanks for the feedback. I created an additional commit.

You can squash on merge, right? I have been using mostly Gitlab for the past few years, so I don't know where to activate it in Github and whether only you can activate it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the squash & merge button is embedded directly into the merge button itself, which I think only maintainers can trigger.

image

@PanCakeConnaisseur
Copy link
Contributor Author

Personally I would go with usize for indices and u32 for everything else (including the new sample_rate), but I could be convinced otherwise.

AFAIK, usize is used in pointer-related scenarios which are more low-level. I don't think indices are the same category - they do represent an offset in memory to get to the stored element but not the actual memory address. I would prefer u32 for everything, because this uniformity would make things simpler an I don't see an advantage of using usize.

parent_index...input_index

I just noticed that parent_index is actually the input index, isn't it? Then, I would also suggest to rename parent_index to input_index everywhere for uniformity. AFAIK, ffmpeg only uses the word input and input_index would be easier to understand IMHO.

@PanCakeConnaisseur
Copy link
Contributor Author

Some other minor things that I thought about:

  1. What do you think about video_metadata() instead if video_data(). The former is more descriptive, the latter could be misunderstood to return the actual video data. I chose the latter, because its shorter and "clear enough in the context" but I might be wrong.
  2. I am not super happy with the name TypeSpecificData but couldn't come up with something better.

@nathanbabcock
Copy link
Owner

  1. I'm fine with u32 instead of usize across the board. On a related note maybe it's better to prefer vec.get(i) over vec[i] which seems to do an implicit unwrap().
  2. input_index would be a good name instead of parent_index but this number could correspond to either an input OR an output (depending on which list/event it's contained inside). The umbrella term "parent" isn't ideal but I can't think of what else to name it.
  3. I'm agnostic as to video_metadata() vs video_data(), I think both are clear enough in context.
  4. Agreed that TypeSpecificData sounds a bit too vague for a public export. I think all it needs is the word Stream in there somewhere. Maybe StreamTypeData or StreamTypeMetadata, something like that.

The values in stream metadata structs are not related to memory or array
indexing and should therefore use "number" types.
@PanCakeConnaisseur
Copy link
Contributor Author

u32, usize, vec[i]

I dove deeper into this subject and it is more involved than I thought. One of the realizations I had, is that len() always returns a usize and that accessing Vecs using indices seems to not be idiomatic in Rust. Using iterators would prevent us dealing with those types entirely. I added a new commit that removes the "public" uses of usize without touching the internals much. This way we can introduce the breaking change now and change the internals later if needed/wanted.

input_index vs parent_index

You're right. I had forgotten about outputs. I then considered file_index but since streams can stem from stdout and stdin, this would not fit either. Let's leave it as-is then.

video_metadata() vs video_data()

I realized that the entire Stream struct is actually StreamMetadata and AudioStream is AudioStreamMetadata, etc. I am not sure, that you would want to make these changes. After consideration, I would be fine to leave them as-is, even if its not a super accurate name. Also, Stream is located in event.rs, so you could say that this is a stream event struct and then its not about the actual stream data anymore. Coming up with a perfect taxonomy is probably a rabbit hole which is not worth descending into.

TypeSpecificData

I would prefer not to loose "specific" since it communicates that the attribute of this type contains specialized data only available for certain types of streams. What do you think about StreamTypeSpecificData. Its quite a handful, so maybe its also best to just leave this name as-is.

I don't have anything else regarding this PR, unless you want to discuss the naming further. From my side, we can merge.

Copy link
Owner

@nathanbabcock nathanbabcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed one more commit to go with StreamTypeSpecificData. Since the helper methods cover 99% of use cases, a user would rarely need to import or reference the name of this enum directly, so I think a lengthy extra-descriptive name is fine here!

Thanks for contributing this, and especially for bouncing ideas back and forth. The iteration was super helpful.

I will merge to main for now and move towards a new version release relatively soon, this weekend or maybe next.

@nathanbabcock nathanbabcock merged commit 4e88950 into nathanbabcock:main Oct 18, 2024
3 checks passed
@PanCakeConnaisseur
Copy link
Contributor Author

Thank you for the valuable feedback, too.

@PanCakeConnaisseur PanCakeConnaisseur deleted the read-metadata-from-audio-streams-with-enums branch October 18, 2024 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants