Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for glob patterns in quotes #536

Merged
merged 8 commits into from
Jul 31, 2023

Conversation

SalOne22
Copy link
Contributor

Added support for glob patterns in windows paths with spaces that surrounded with quotes

This pull request can be closed if you decided to not change the input files behavior. Because this can be less robust than wild crate. In wild special characters in quotes ("*") are intentionally not expanded.

closes #373

Copy link
Collaborator

@andrews05 andrews05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that windows neither handles this automatically nor even has a "standard" way of dealing with it, but I think this is a good change as it seems glob is more widely used than wild.

Question: Is it still possible to provide a path containing a literal *?

Cargo.toml Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
@SalOne22
Copy link
Contributor Author

Handling * on linux is easy, just need to escape that character with \, but on windows is impossible to escape *, and impossible to name a file with *.

@SalOne22 SalOne22 requested a review from andrews05 July 12, 2023 09:10
@AlexTMjugador
Copy link
Collaborator

AlexTMjugador commented Jul 12, 2023

I believe we need to think more about how to properly fix this.

The wild crate documentation has this remark about how trivially expanding each argument one by one might introduce issues with quoting:

It is more robust than using glob() on values from std::env::args(), because this crate is aware of argument quoting, and special characteres in quotes ("*") are intentionally not expanded.

clap may take care of such quotation shenanigans already, but it may not.

[...] on windows is impossible to escape *, and impossible to name a file with *.

Actually, it is possible to name a file * on Windows if the Win32 file namespaces are used, which is something that most applications and users aren't aware of. For example, Java programs can create files with names that make the Windows Explorer choke. And NTFS itself supports those file names without issues: it is possible to mount a NTFS block device on Linux and do touch '*' on it to create such a file.

[...] it seems glob is more widely used than wild.

wild depends on glob, so I'd argue that its fitness for purpose is not changed by putting glob into the equation: it was already there.

How do built-in Windows commands handle these quotations and glob expansion? Given that Windows is being awkward here as it usually tends to be, I think that's the closest thing to a "source of truth" we have for OxiPNG.

@SalOne22
Copy link
Contributor Author

Here is comparison of different programs (cp, oxipng with wild, oxipng with glob)

There is how cp command works in PowerShell:

PS C:\Users\vladd\Pictures> ls '.\Camera Roll\'


    Catalog: C:\Users\vladd\Pictures\Camera Roll


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----        29.04.2023     12:00         640922 [email protected]
-a----        11.07.2023     17:31       11979363 [email protected]
-a----        29.04.2023     11:59         640922 [email protected]
-a----        11.07.2023     17:31       11979363 [email protected]
-a----        02.04.2023     20:42       15254494 felix-rottmann-n-Ky-79zeXM-unsplash.jpg
-a----        11.07.2023     17:31      132507102 felix-rottmann-n-Ky-79zeXM-unsplash.png
-a----        29.04.2023     12:00        7795303 [email protected]
-a----        11.07.2023     17:31      127038626 [email protected]
-a----        29.04.2023     11:59        7795303 [email protected]
-a----        11.07.2023     17:31      127038626 [email protected]
-a----        02.04.2023     20:42        4714469 gustavo-zambelli-SIC3k8IMhhA-unsplash.jpg
-a----        11.07.2023     17:31       52923463 gustavo-zambelli-SIC3k8IMhhA-unsplash.png


PS C:\Users\vladd\Pictures> cp '.\Camera Roll\*.png' ..\PicTest\
PS C:\Users\vladd\Pictures> ls ..\PicTest\


    Catalog: C:\Users\vladd\PicTest


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----        11.07.2023     17:31       11979363 [email protected]
-a----        11.07.2023     17:31       11979363 [email protected]
-a----        11.07.2023     17:31      132507102 felix-rottmann-n-Ky-79zeXM-unsplash.png
-a----        11.07.2023     17:31      127038626 [email protected]
-a----        11.07.2023     17:31      127038626 [email protected]
-a----        11.07.2023     17:31       52923463 gustavo-zambelli-SIC3k8IMhhA-unsplash.png

this is how oxipng with wild works:

PS C:\Users\vladd\Pictures> oxipng.exe --dir ..\PicTest\ '.\Camera Roll\*.png'
error: The following required arguments were not provided:
    <files>...

USAGE:
    oxipng.exe --dir <directory> <files>...

For more information try --help
PS C:\Users\vladd\Pictures> oxipng.exe --dir ..\PicTest '.\Camera Roll\*.png'
Processing: .\Camera Roll\*.png
Failed to open file for reading

version with glob:

PS C:\Users\vladd\Documents\GitHub\oxipng> cargo r -- --dir C:\Users\vladd\PicTest\ 'C:\Users\vladd\Pictures\Camera Roll\*.png'
    Finished dev [optimized + debuginfo] target(s) in 0.08s
     Running `target\debug\oxipng.exe --dir C:\Users\vladd\PicTest\ "C:\Users\vladd\Pictures\Camera Roll\*.png"`
Processing: C:\Users\vladd\Pictures\Camera Roll\[email protected]
Processing: C:\Users\vladd\Pictures\Camera Roll\[email protected]
Processing: C:\Users\vladd\Pictures\Camera Roll\felix-rottmann-n-Ky-79zeXM-unsplash.png
Processing: C:\Users\vladd\Pictures\Camera Roll\[email protected]
Processing: C:\Users\vladd\Pictures\Camera Roll\gustavo-zambelli-SIC3k8IMhhA-unsplash.png
Processing: C:\Users\vladd\Pictures\Camera Roll\[email protected]
...
PS C:\Users\vladd\Documents\GitHub\oxipng> cargo r -- --dir C:\Users\vladd\PicTest 'C:\Users\vladd\Pictures\Camera Roll\*.png'
    Finished dev [optimized + debuginfo] target(s) in 0.09s
     Running `target\debug\oxipng.exe --dir C:\Users\vladd\PicTest "C:\Users\vladd\Pictures\Camera Roll\*.png"`
Processing: C:\Users\vladd\Pictures\Camera Roll\[email protected]
Processing: C:\Users\vladd\Pictures\Camera Roll\gustavo-zambelli-SIC3k8IMhhA-unsplash.png
Processing: C:\Users\vladd\Pictures\Camera Roll\[email protected]
Processing: C:\Users\vladd\Pictures\Camera Roll\felix-rottmann-n-Ky-79zeXM-unsplash.png
Processing: C:\Users\vladd\Pictures\Camera Roll\[email protected]
Processing: C:\Users\vladd\Pictures\Camera Roll\[email protected]
...

@AlexTMjugador
Copy link
Collaborator

Thanks, those results are interesting. However, PowerShell is a bit more sane and feels more "Unix-like" in these aspects: I wouldn't be surprised if it did glob expansion. How does the picture look like with the old cmd?

@SalOne22
Copy link
Contributor Author

Works exactly the same as on PowerShell:

C:\Users\vladd>cd Pictures

C:\Users\vladd\Pictures>copy "Camera Roll\*.png" ..\PicTest\
Camera Roll\[email protected]
Camera Roll\[email protected]
Camera Roll\felix-rottmann-n-Ky-79zeXM-unsplash.png
Camera Roll\[email protected]
Camera Roll\[email protected]
Camera Roll\gustavo-zambelli-SIC3k8IMhhA-unsplash.png
Скопировано файлов:         6.

C:\Users\vladd\Pictures>dir ..\PicTest
 Том в устройстве C не имеет метки.
 Серийный номер тома: A06D-5A44

 Содержимое папки C:\Users\vladd\PicTest

12.07.2023  17:36    <DIR>          .
12.07.2023  16:15    <DIR>          ..
11.07.2023  17:31        11 979 363 [email protected]
11.07.2023  17:31        11 979 363 [email protected]
11.07.2023  17:31       132 507 102 felix-rottmann-n-Ky-79zeXM-unsplash.png
11.07.2023  17:31       127 038 626 [email protected]
11.07.2023  17:31       127 038 626 [email protected]
11.07.2023  17:31        52 923 463 gustavo-zambelli-SIC3k8IMhhA-unsplash.png
               6 файлов    463 466 543 байт
               2 папок  21 361 975 296 байт свободно

@AlexTMjugador
Copy link
Collaborator

AlexTMjugador commented Jul 12, 2023

This is interesting. The results suggest that if native Windows commands don't allow escaping * either, then they may not work that well with files containing asterisks in their names. @kornelski, please excuse the mention, but given that you authored the wild crate and seem knowledgeable about this stuff, can you give some insight of what's the most correct thing to do here?

@andrews05
Copy link
Collaborator

andrews05 commented Jul 12, 2023

I believe we need to think more about how to properly fix this.

You're right, we do need to sure about this and I'll continue trying to research it...

The wild crate documentation has this remark about how trivially expanding each argument one by one might introduce issues with quoting:

It is more robust than using glob() on values from std::env::args(), because this crate is aware of argument quoting, and special characteres in quotes ("*") are intentionally not expanded.

I think this issue concerns using glob on other systems where globbing is already performed. Given @SalOne22 has restricted usage here to Windows only, I think the behaviour is correct (as in, as normal/expected as anything else on Windows).

src/main.rs Outdated
@@ -295,7 +295,7 @@ Heuristic filter selection strategies:
8 => BigEnt Highest Shannon entropy of bigrams
9 => Brute Smallest compressed size (slow)",
)
.get_matches_from(wild::args());
.get_matches();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just note that there's a subtle difference to be aware of here: wild::args() corresponds to env::args(), while get_matches() uses env::args_os() (corresponding to wild::args_os()). It affects what happens with paths that contain invalid UTF-8.
Not suggesting to change anything, just pointing this out.

@andrews05
Copy link
Collaborator

andrews05 commented Jul 13, 2023

Everything I can find suggests that while it is technically possible for paths containing * or ? to exist, they simply aren't valid in Windows and you shouldn't expect anything to support them. So it makes sense that they should always be treated as wildcards in the shell, regardless of quotes.

That said, I've just realised the glob function can return nothing if there were no matches, even if no wildcard was used. So if you do oxipng /non-existant/path you confusingly get no output.
@SalOne22 Sorry for changing my mind, but I think the function should be restructured something like this where it falls back to the input path on any error or empty result:

fn apply_glob_pattern(path: PathBuf) -> Vec<PathBuf> {
    let matches = path
        .to_str()
        .and_then(|pattern| glob::glob(pattern).ok())
        .map(|paths| paths.flatten().collect::<Vec<_>>());
    match matches {
        Some(paths) if !paths.is_empty() => paths,
        _ => vec![path],
    }
}

Possibly related: #170 (comment). We should make sure this is all working as expected too.

@AlexTMjugador
Copy link
Collaborator

So it makes sense that they should always be treated as wildcards in the shell, regardless of quotes.

That is a reasonable choice, in my opinion. However, in Unix-like shells, it is possible to pass "paths like"/this*.png to commands to escape spaces while allowing glob expansion in the latter part. If Windows also allows this syntax, using it would allow OxiPNG to both work with odd filenames and expand globs. Of course, this syntax is arguably strange to most people, it may not work well on Windows, and I reckon I'm biased to handle these tricky filenames properly because I mount my Windows volumes in Linux environments, but maybe that's why wild decided not to expand globs in this case.

@SalOne22
Copy link
Contributor Author

Here is some tests from windows PowerShell:

PS C:\Users\vladd\Pictures> oxipng.exe -P '.\Camera Roll\'*.png
.\Camera Roll\ is a directory, skipping
Processing: 2022-10-25_21.10.34.png
    1920x1080 pixels, PNG format
    4x8 bits/pixel, RGB + Alpha (non-interlaced)
    IDAT size = 1316423 bytes
    File size = 1316513 bytes
Trying: Entropy
Found better combination:
    zc = 11  f = Entropy   1182298 bytes
    IDAT size = 1182298 bytes (134125 bytes decrease)
    file size = 1182376 bytes (134137 bytes = 10.19% decrease)
Running in pretend mode, no output
Processing: 5120x2880.png
    5120x2880 pixels, PNG format
    3x8 bits/pixel, RGB (non-interlaced)
    IDAT size = 9497362 bytes
    File size = 9500519 bytes
Trying: None
Found better combination:
    zc = 11  f = None      5294159 bytes
    IDAT size = 5294159 bytes (4203203 bytes decrease)
    file size = 5297208 bytes (4203311 bytes = 44.24% decrease)
Running in pretend mode, no output
PS C:\Users\vladd\Pictures> oxipng.exe -P '.\Camera Roll'\*.png
.\Camera Roll is a directory, skipping
Processing: \*.png
Failed to open file for reading

This is tests from cmd:

C:\Users\vladd\Pictures>oxipng.exe -P '.\Camera Roll\'*.png
Processing: '.\Camera
Failed to open file for reading
Processing: Roll\'*.png
Failed to open file for reading

C:\Users\vladd\Pictures>oxipng.exe -P ".\Camera Roll\"*.png
Processing: .\Camera Roll"*.png
Failed to open file for reading

C:\Users\vladd\Pictures>oxipng.exe -P ".\Camera Roll"\*.png
Processing: Camera Roll\[email protected]
    4288x3216 pixels, PNG format
    4x8 bits/pixel, RGB + Alpha (non-interlaced)
    IDAT size = 11979306 bytes
    File size = 11979363 bytes
Reducing image to 3x8 bits/pixel, RGB (non-interlaced)
Trying: Entropy
Found better combination:
    zc = 11  f = Entropy   5701194 bytes
    IDAT size = 5701194 bytes (6278112 bytes decrease)
    file size = 5701251 bytes (6278112 bytes = 52.41% decrease)
Running in pretend mode, no output
Processing: Camera Roll\[email protected]
    4288x3216 pixels, PNG format
    4x8 bits/pixel, RGB + Alpha (non-interlaced)
    IDAT size = 11979306 bytes
    File size = 11979363 bytes
Reducing image to 3x8 bits/pixel, RGB (non-interlaced)

@andrews05
Copy link
Collaborator

Can you try oxipng -r -P C:\ using your branch here?

@SalOne22
Copy link
Contributor Author

PS C:\Users\vladd\Documents\GitHub\oxipng> oxipng.exe -r -P C:\

(no output)

PS C:\Users\vladd\Documents\GitHub\oxipng> cargo r -q -- -r -P C:\

(no output)

@SalOne22
Copy link
Contributor Author

Sorry for the question, but what is the --recursion flag for? We can use **/* pattern to recursively process directories. This works with wild as well.

@AlexTMjugador
Copy link
Collaborator

Sorry for the question, but what is the --recursion flag for? We can use **/* pattern to recursively process directories. This works with wild as well.

That's a good question! I think that flag can be useful for very dense directory hierarchies, where shell glob expansion would cause a too long command line parameter string.

@AlexTMjugador
Copy link
Collaborator

So if the way I see things is correct, we should choose from two approaches here:

  • Meet most people expectations and imitate what built-in Windows commands do, which always expand glob characters, at the cost of introducing problems with technically legal but practically odd filenames that contain those characters.
  • Keep OxiPNG's glob expansion behavior consistent between platforms, which implies not expanding glob characters between quotes and allows handling odd filenames, but may be a bit more awkward at first.

What do you think it's the right tradeoff here? I'd lean towards the correctness and logical predictability of keeping glob expansion behavior consistent between platforms, but UX is also an important concern.

@SalOne22
Copy link
Contributor Author

About escaping * and similar (from glob documentation):

The metacharacters ?, *, [, ] can be matched by using brackets (e.g. [?]). When a ] occurs immediately following [ or [! then it is interpreted as being part of, rather then ending, the character set, so ] and NOT ] can be matched by []] and [!]] respectively. The - character can be specified inside a character sequence pattern by placing it at the start or the end, e.g. [abc-].

@andrews05
Copy link
Collaborator

andrews05 commented Jul 13, 2023

On a Mac, consistent UX within the platform is more important to me than consistent UX across platforms. I'm bothered by applications that use non-native UIs or follow Windows conventions. A good cross-platform application is one that adapts to each platform to "fit in" appropriately.
Expectations are a little different on the command line as opposed to a GUI app, but it would still be weird as heck if you installed some command that was ported from Windows and it used \x for switches instead of -x.

Given we are currently breaking people's expectations and making it harder for them to use oxipng, I'm in favour of switching to glob. Note also that ".\Camera Roll\"*.png works in both unix shells and typical Windows commands but doesn't work with wild (only ".\Camera Roll"\*.png works), so the idea that we're currently consistent between platforms doesn't always hold true. (Even on unix the exact behaviour is determined by the shell, with many small variances)

@AlexTMjugador
Copy link
Collaborator

Alright, at the moment I'm convinced that expanding globs between quotations is the better way to go on Windows. People with advanced needs will bend over backwards with unusual filenames anyway. Let's see if Kornel or Josh have something to add to the discussion.

@andrews05
Copy link
Collaborator

@SalOne22 Could you update from master and fix merge conflicts?

@shssoichiro shssoichiro merged commit 708a019 into shssoichiro:master Jul 31, 2023
Pr0methean pushed a commit to Pr0methean/oxipng that referenced this pull request Dec 1, 2023
Added support for glob patterns in windows paths with spaces that
surrounded with quotes

closes shssoichiro#373
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Windows: Cannot specify infile in double quotes
4 participants