Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix asfvideo unicode handling #2581

Merged
merged 1 commit into from
Apr 17, 2023
Merged

fix asfvideo unicode handling #2581

merged 1 commit into from
Apr 17, 2023

Conversation

neheb
Copy link
Collaborator

@neheb neheb commented Apr 6, 2023

Use convertStringCharset to convert instead of reimplementing.

Some data is UTF-32 and other is UTF-16. Instead of implementing another function for Windows, convert from UCS2-LE to UTF-8 twice.

@ghost
Copy link

ghost commented Apr 6, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@benmccann
Copy link
Contributor

Error:  The output of the testcase mismatch the reference
[INFO] The output has been saved to file /home/runner/work/exiv2/exiv2/test/tmp/sample_960x540.asf.out
[INFO] simply_diff:
/home/runner/work/exiv2/exiv2/test/data/test_reference_files/sample_960x540.asf.out: 16 lines
/home/runner/work/exiv2/exiv2/test/tmp/sample_960x540.asf.out: 16 lines
The first mismatch is in line 11:
< Xmp.video.ExtendedContentDescription         XmpSeq      1  major_brand: mp42, minor_version: 0, compatible_brands: mp42mp41isomavc1, WM/EncodingSettings: Lavf57.83.100,   major_brand: mp42, minor_version: 0, compatible_brands: mp42mp41isomavc1, WM/EncodingSettings: Lavf57.83.100, 
> Xmp.video.ExtendedContentDescription         XmpSeq      1  major_brand
: mp42
, minor_version
: 0
, compatible_brands
: mp42mp41isomavc1
, WM/EncodingSettings
: Lavf57.83.100
,   major_brand
: mp42
, minor_version
: 0
, compatible_brands
: mp42mp41isomavc1
, WM/EncodingSettings
: Lavf57.83.100
, 

@neheb neheb mentioned this pull request Apr 7, 2023
@neheb
Copy link
Collaborator Author

neheb commented Apr 7, 2023

Yeah. This PR works great locally. Not sure how I can fix honestly.

@benmccann
Copy link
Contributor

It seems like it might be more expensive performance-wise to do the conversion twice?

@neheb
Copy link
Collaborator Author

neheb commented Apr 14, 2023

Right. If using iconv. No difference on Windows.

@@ -53,8 +39,11 @@ std::string readStringWcharTag(const BasicIo::UniquePtr& io, size_t length) {
Internal::enforce(length <= io->size() - io->tell(), Exiv2::ErrorCode::kerCorruptedMetadata);
DataBuf FieldBuf(length + 1);
io->readOrThrow(FieldBuf.data(), length, ErrorCode::kerFailedToReadImageData);
std::u16string wst(FieldBuf.begin(), FieldBuf.end());
return utf16ToUtf8(wst);
std::string wst(FieldBuf.begin(), FieldBuf.end() - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can reproduce locally.
when opening the file exiv2/test/tmp/sample_960x540.asf.out, we can see a NULL (^@) character at the end every converted wstring
perhaps we need to not convert the last 2 bytes : std::string wst(FieldBuf.begin(), FieldBuf.end() - 2);

I did this changes locally and test passes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK this will break UTF-8 conversion. Unfortunately, I'm away from my computer for ~2 months so I can't test properly. I don't think the current tests have UTF8 data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this fixed now? I haven't been following what's been happening, but I see that all the checks are passing and the code changes look good to me, so I'm happy to approve this if everything's working.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well merge. If it's not working properly with UTF8 data, it's at least better than before.

@@ -53,8 +39,11 @@ std::string readStringWcharTag(const BasicIo::UniquePtr& io, size_t length) {
Internal::enforce(length <= io->size() - io->tell(), Exiv2::ErrorCode::kerCorruptedMetadata);
DataBuf FieldBuf(length + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can take advantage of this pull request to delete the useless reseved extra byte : DataBuf FieldBuf(length);
same thing in line 51

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to cause memory errors.

@neheb neheb force-pushed the utf branch 3 times, most recently from b83a3f5 to 8d496d0 Compare April 15, 2023 04:41
@codecov
Copy link

codecov bot commented Apr 15, 2023

Codecov Report

Merging #2581 (3c4d4bf) into main (2df5b59) will decrease coverage by 0.01%.
The diff coverage is 60.00%.

@@            Coverage Diff             @@
##             main    #2581      +/-   ##
==========================================
- Coverage   63.92%   63.91%   -0.01%     
==========================================
  Files         103      103              
  Lines       22311    22309       -2     
  Branches    10795    10795              
==========================================
- Hits        14262    14259       -3     
  Misses       5828     5828              
- Partials     2221     2222       +1     
Impacted Files Coverage Δ
src/helper_functions.cpp 85.71% <60.00%> (-2.93%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Use convertStringCharset to convert instead of reimplementing.

Some data is UTF-32 and other is UTF-16. Instead of implementing another
function for Windows, convert from UCS2-LE to UTF-8 twice.

Signed-off-by: Rosen Penev <[email protected]>
@neheb neheb merged commit b2cd60e into Exiv2:main Apr 17, 2023
@neheb neheb deleted the utf branch April 17, 2023 13:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants