Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Correctly detect format when using md5sum with check flag #4645

Merged
merged 9 commits into from
May 27, 2023
Merged

fix: Correctly detect format when using md5sum with check flag #4645

merged 9 commits into from
May 27, 2023

Conversation

kamilogorek
Copy link
Contributor

The first contribution to this repository, so hello there!

Looking at how sparse /tests/by-util/test_hashsum.rs tests are, I wasn't sure if I should add more of them there, or do you rely solely on GNU compatibility tests in this case.

Fixes #4611

The main issue wasn't not handling check flag, but rather regexp being too greedy and treating , b and *c as binary|text markers, where first entry of a should dictate that the format is BSD-reverse, which should skip reading the marker entirely for the remaining of the file.

References:

Binary mode is the default on systems where it’s significant, otherwise text mode is the default.

@sylvestre
Copy link
Contributor

hi
Could you please add some tests ? thanks

@kamilogorek
Copy link
Contributor Author

I wasn't sure if I should add more of them there, or do you rely solely on GNU compatibility tests in this case.

I'll read it as yes then :) Sure thing, added.

@sylvestre
Copy link
Contributor

Windows is failing with:

---- test_hashsum::test_check_md5sum_reverse_bsd stdout ----
write(default): C:\Users\RUNNER~1\AppData\Local\Temp\.tmptXOCuh\a
write(default): C:\Users\RUNNER~1\AppData\Local\Temp\.tmptXOCuh\ b
write(default): C:\Users\RUNNER~1\AppData\Local\Temp\.tmptXOCuh\*c
thread 'test_hashsum::test_check_md5sum_reverse_bsd' panicked at 'Couldn't write *c: The filename, directory name, or volume label syntax is incorrect. (os error 123)', tests\common\util.rs:826:33

@kamilogorek
Copy link
Contributor Author

Hmm, * is a reserved character in Windows. Do you already handle similar cases in another tool in the codebase, so I could see what's the chosen approach?

@uutils uutils deleted a comment from github-actions bot Mar 28, 2023
@uutils uutils deleted a comment from github-actions bot Mar 29, 2023
@sylvestre
Copy link
Contributor

Hmm, * is a reserved character in Windows. Do you already handle similar cases in another tool in the codebase, so I could see what's the chosen approach?

you should look at other examples :)

@kamilogorek
Copy link
Contributor Author

I tried with no luck, thus my question. I'm not very familiar with the historical choices of this repository just yet.
I checked gnuwin32, and they do not perform any other path manipulations other than translating \n and \\.
Also, note to self: default for gnuwin32 is binary, not text.

@sylvestre
Copy link
Contributor

ok, just rename the file from the example. It is still dumb on linux to have * in a file name :)

@kamilogorek
Copy link
Contributor Author

It is possible though, so used conditional assertions, just to be inline with gnu tests on unix

@uutils uutils deleted a comment from github-actions bot Apr 3, 2023
@uutils uutils deleted a comment from github-actions bot Apr 3, 2023
@sylvestre
Copy link
Contributor

We have some differences with GNU:

2023-03-29T17:41:40.2037347Z FAIL: tests/misc/md5sum-bsd
2023-03-29T17:41:40.2037437Z ===========================
2023-03-29T17:41:40.2037447Z 
2023-03-29T17:41:40.2037508Z a: OK
2023-03-29T17:41:40.2037588Z  b: OK
2023-03-29T17:41:40.2037666Z *c: OK
2023-03-29T17:41:40.2037745Z dd: OK
2023-03-29T17:41:40.2037827Z  : OK
2023-03-29T17:41:40.2037905Z a: OK
2023-03-29T17:41:40.2037982Z  b: OK
2023-03-29T17:41:40.2038042Z *c: OK
2023-03-29T17:41:40.2038117Z dd: OK
2023-03-29T17:41:40.2038193Z  : OK
2023-03-29T17:41:40.2038270Z a: OK
2023-03-29T17:41:40.2038347Z  b: OK
2023-03-29T17:41:40.2038523Z *c: OK
2023-03-29T17:41:40.2038583Z dd: OK
2023-03-29T17:41:40.2038660Z  : OK
2023-03-29T17:41:40.2038883Z --- check2.exp	2023-03-29 17:16:24.048190389 +0000
2023-03-29T17:41:40.2039085Z +++ check2.err	2023-03-29 17:16:24.048190389 +0000
2023-03-29T17:41:40.2039207Z @@ -1 +1 @@
2023-03-29T17:41:40.2039424Z -md5sum: WARNING: 1 line is improperly formatted
2023-03-29T17:41:40.2039566Z +md5sum: warning: 1 line is improperly formatted
2023-03-29T17:41:40.2039667Z md5sum: b: No such file or directory
2023-03-29T17:41:40.2039766Z b: FAILED open or read
2023-03-29T17:41:40.2039887Z md5sum: c: No such file or directory
2023-03-29T17:41:40.2039983Z c: FAILED open or read
2023-03-29T17:41:40.2040294Z MD5 (/dev/null) = d41d8cd98f00b204e9800998ecf8427e
2023-03-29T17:41:40.2040381Z a: OK
2023-03-29T17:41:40.2040444Z  b: OK
2023-03-29T17:41:40.2040523Z *c: OK
2023-03-29T17:41:40.2040603Z dd: OK
2023-03-29T17:41:40.2040681Z  : OK
2023-03-29T17:41:40.2040763Z a\b: OK
2023-03-29T17:41:40.2040842Z a\: OK
2023-03-29T17:41:40.2041046Z --- exp	2023-03-29 17:16:24.084190638 +0000
2023-03-29T17:41:40.2041216Z +++ out	2023-03-29 17:16:24.084190638 +0000
2023-03-29T17:41:40.2041344Z @@ -1 +1,2 @@
2023-03-29T17:41:40.2041778Z -\MD5 (test\n\\\\file) = d41d8cd98f00b204e9800998ecf8427e
2023-03-29T17:41:40.2042036Z +MD5 (test
2023-03-29T17:41:40.2042329Z +\\file) = d41d8cd98f00b204e9800998ecf8427e
2023-03-29T17:41:40.2042541Z FAIL tests/misc/md5sum-bsd.sh (exit status: 1)

@sylvestre
Copy link
Contributor

@kamilogorek ping ? :)

@kamilogorek
Copy link
Contributor Author

Ah sorry, totally missed your comment, thanks for the ping.

Fixed 2nd issue (debug representation is escaped for display() - https://doc.rust-lang.org/std/path/struct.Path.html#method.display), however the WARNING: capitalization is set globally via show_warning!, so not sure if it's worth the change really.

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/tail-2/inotify-dir-recreate is no longer failing!
GNU test failed: tests/misc/sha1sum-vec. tests/misc/sha1sum-vec is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/sha224sum. tests/misc/sha224sum is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/sha256sum. tests/misc/sha256sum is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/sha384sum. tests/misc/sha384sum is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/sha512sum. tests/misc/sha512sum is passing on 'main'. Maybe you have to rebase?

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/tail-2/inotify-dir-recreate is no longer failing!
GNU test failed: tests/misc/sha1sum-vec. tests/misc/sha1sum-vec is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/sha224sum. tests/misc/sha224sum is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/sha256sum. tests/misc/sha256sum is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/sha384sum. tests/misc/sha384sum is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/misc/sha512sum. tests/misc/sha512sum is passing on 'main'. Maybe you have to rebase?

// BSD reversed mode format is similar to the default mode, but doesn’t use a character to distinguish binary and text modes.
let mut bsd_reversed = None;

fn gnu_re_template(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please add a rustdoc comment here? thanks

caps.name("binary").unwrap().as_str() == "*",
),
Some(caps) => {
// We set the format during first pass, as mixing of formats is disallowed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move this into a function like

fn handle_captures(caps: Captures, bytes_marker: &str, bsd_reversed: &mut Option<bool>) -> Result<(String, String, bool), HashsumError> {
    if bsd_reversed.is_none() {
        let is_bsd_reversed = caps.name("binary").is_none();
        let format_marker = if is_bsd_reversed {
            ""
        } else {
            r"(?P<binary>[ \*])"
        }
        .to_string();

        *bsd_reversed = Some(is_bsd_reversed);
        gnu_re = gnu_re_template(&bytes_marker, &format_marker)?;
    }

    Ok((
        caps.name("fileName").unwrap().as_str().to_string(),
        caps.name("digest").unwrap().as_str().to_ascii_lowercase(),
        if *bsd_reversed == Some(false) {
            caps.name("binary").unwrap().as_str() == "*"
        } else {
            false
        },
    ))
}

and call it with
Some(caps) => handle_captures(caps, &bytes_marker, &mut bsd_reversed)?,

@@ -629,6 +634,36 @@ where
))
.map_err(|_| HashsumError::InvalidRegex)?;

fn handle_captures(
caps: Captures,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please fix the clippy warnings ? (sorry)

@kamilogorek
Copy link
Contributor Author

Ah, forgot that warnings are failures in CI and missed that locally. Updated

@sylvestre
Copy link
Contributor

thanks :)

@uutils uutils deleted a comment from github-actions bot May 27, 2023
@uutils uutils deleted a comment from github-actions bot May 27, 2023
@sylvestre sylvestre merged commit 0743464 into uutils:main May 27, 2023
@kamilogorek kamilogorek deleted the md5sum-check-format branch May 28, 2023 06:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

md5sum doesn't handle -c
2 participants