Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to provide a list of inputs from a file or stdin #662

Open
polarathene opened this issue Jun 24, 2022 · 9 comments
Open

Option to provide a list of inputs from a file or stdin #662

polarathene opened this issue Jun 24, 2022 · 9 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed workaround

Comments

@polarathene
Copy link
Contributor

While it's likely a niche case, if providing many explicit inputs to lychee, as file paths and URLs can be lengthy, I am a bit concerned about exceeding the character limit (at least I think there is one for some shells with command args).

It may be worthwhile having an option to provide inputs via a line delimited list instead? (--inputs-from or similar?)


My use-case was with a generated list of files to check (output from git diff-tree), which was simple enough to get, but slightly problematic providing as a list of args to lychee via a shell this way (especially with edge cases like file paths with spaces). I ended up using xargs for the first time to handle this, but found it required -d '\n' to work correctly.

I'm not super great with shell commands/scripts, and I struggled a bit with other approaches before finding the simple solution with xargs. Being able to read a list of lines from a file as inputs, via a supported option for lychee would prevent others like myself from stumbling about in bash/shell 😅


It's totally fine if you'd rather hold off on such an option until there's enough demand, or someone were to open a PR for it 👍

For anyone else that lands here, you can use whatever command to get your list of inputs (1 per line), and provide them to lychee with xargs like this: your-command | xargs -d '\n' lychee --, that simple 😄 (quotes around \n is important)

More info

your-command examples:

  • echo -e 'path/to/file\npath/to/another file.md' (more useful in scripts or testing)
  • cat path/to/input-list-file (if you stored your input list to a file)
  • git diff-tree --no-commit-id --name-only --diff-filter 'AM' -r --merge-base master remote/branch-name (generated list output from a binary or script of your own)

These work with xargs, but prior to getting a solution with xargs I had tried approaches with echo/printf and such but I had problems especially when testing with file paths that had spaces:

  • xargs -d '\n' lychee -- < <(echo -e 'path/to/file\npath/to/another file.md')
  • xargs -d '\n' lychee -- <<< $(echo -e 'path/to/file\npath/to/another file.md')

Instead of piping output of your-command to stdin, you can use < (redirection) / <<< (here-string), or <(your-command; another-command) which is process substitution to provide a temporary pseudo-file stream, which would be useful when a file is expected I think.

Within a script there's other options like expanding an array of path values lychee -- "${FILE_PATHS[@]}", which should keep it all intact if some paths have spaces, provided the array was constructed correctly (another pain point I've had with bash with other projects).

@mre
Copy link
Member

mre commented Jun 29, 2022

As you pointed out, xargs is a common way to read inputs from a file. Let's see if there's more interest in adding an --input-from argument. We can add it once we get a few 👍 / comments.

@mre
Copy link
Member

mre commented Jul 1, 2022

Relevant discussion over on ripgrep: BurntSushi/ripgrep#273

@mre mre changed the title [Feat]: Option to provide a list of inputs from a file or stdin Option to provide a list of inputs from a file or stdin Jul 1, 2022
@polarathene
Copy link
Contributor Author

polarathene commented Jul 1, 2022

Relevant discussion over on ripgrep: BurntSushi/ripgrep#273

Worth noting is the common --files-from option used across mentioned programs in that discussion. If lychee adds support, it might make sense to use that, or as other input types are supported: --inputs-from (plural).


On supported platforms (assuming this option was added too), one could still provide a generated file-list without writing a file explicitly:

lychee --inputs-from <(git diff-tree --name-only --diff-filter 'AM' -r --merge-base base-branch-name pr-branch-name) --

Which as my earlier "More info" section notes, is process substitution to provide a temporary pseudo-file stream.

@sschuberth
Copy link

While it's likely a niche case, if providing many explicit inputs to lychee, as file paths and URLs can be lengthy

I wouldn't say it's that nichey at all. A rather common case should be to check files in your local Git working tree. As the working tree may contain a lot of generated / build files that cannot be easily excluded by path glob, you probably only want to check those files that are committed to Git, where such an option becomes handy.

Otherwise lychee also checks all of the generate files, which in my case contained a lot of links to GitHub and triggered a rate limit, locking me out of GitHub for a few minutes 😒

@lebensterben
Copy link
Member

instead we may invest more in integrating with version control system, such as ignoring files excluded by vcs by default and add an CLI option to only scan files managed by vcs.

@sschuberth
Copy link

instead we may invest more in integrating with version control system

I was thinking about that, too, but that's less generic, and then people will show up to get Mercurial, Subverion, you-name-it supported, too...

@lebensterben
Copy link
Member

by vcs I meant all of them. but we don't have to implement all of them at once.

@mre
Copy link
Member

mre commented Jan 29, 2024

After re-reading the comments here and over at ripgrep, I think we should go with --files-from as the name for this option as @polarathene suggested, because file, rsync, and ack also use --files-from.

I will mark this as ready to be worked on.

  • Add --files-from argument, which takes a list of input files to be checked.

@mre mre added good first issue Good for newcomers help wanted Extra attention is needed and removed request-for-comments labels Jan 29, 2024
@mre
Copy link
Member

mre commented Jan 6, 2025

In the meantime, @thomas-zahner implemented gitignore support in #1500, which might be helpful for people running into this.

@mre mre added the workaround label Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed workaround
Projects
None yet
Development

No branches or pull requests

4 participants