Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make process utils more generic #779

Merged
merged 2 commits into from
Nov 18, 2021
Merged

Conversation

th1000s
Copy link
Collaborator

@th1000s th1000s commented Nov 16, 2021

One of the last changes to the utils PR was to make it less generic, because what apart from git blame is going to use it - but then you started git grep support, so here is the regenericization.

None => break result,
Some("--") => {
result.extend(args_it);
break result;
Copy link
Owner

@dandavison dandavison Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest: return result would be equivalent behavior, right? Do you prefer break result because it's working with the expression-oriented language?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, also it does not evade a dbg!( { .. } ) by returning from inside it. And here we are well past the early return phase.

match (it.next(), it.next()) {
// git blame or git -C/-c etc. and then (maybe) blame
(Some(git), Some(blame))
if git.contains("git") && (blame == "blame" || blame.starts_with('-')) => {}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the thinking behind contains rather than ==?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the issue that we need to match /usr/local/bin/git? (And what will the path look like on Windows?) If so maybe it's worth parsing the string as a Path and extracting the file name, since git is a short word that could generate false positive matches?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(blame == "blame" || blame.starts_with('-'))

I was failing to understand this function yesterday -- if blame.starts_with('-') then when do we get round to checking that the string blame does occur at all?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, to match git.exe and other paths, but this would also pick up magit but fail on Git.exe - something Path based and removing the extension would be better. In the current patch however I completely ignore it (_git).

Regarding blame, I now parse everything, so it should always be present in the second position.

fn calling_process_cmdline<F, T>(extract_args: F) -> Option<T>
where
F: Fn(&[String]) -> ProcessArgs<T>,
{
let mut info = sysinfo::System::new();
let my_pid = std::process::id() as Pid;

// 1) Try the parent process. If delta is set as the pager in git, then git is the parent process.
let parent = parent_process(&mut info, my_pid)?;

Copy link
Owner

@dandavison dandavison Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to be able to write tests with the ability to control what appears to be the parent command. What do you think about refactoring something like the following?

#[cfg(not(test))]
fn get_parent_process_command() -> Option<Vec<String>> {
    let mut info = sysinfo::System::new();
    let my_pid = std::process::id() as Pid;
    let parent = parent_process(&mut info, my_pid)?;
    Some(parent.cmd().iter().map(|s| s.to_owned()).collect())
}

#[cfg(test)]
fn get_parent_process_command() -> Option<Vec<String>> {
    // TODO
    // If env var DELTA_TEST_FAKE_PARENT_PROCESS_COMMAND is set, then parse that.
}

I'm not familiar with testing/patching/DI techniques in Rust -- is there a better way to do this than to use an environment variable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar either, and I don't want to pick one now as they all seem quite intrusive and would also somehow have to wrap the external sysinfo traits.

I added a ProcessInterface trait around sysinfo::System with a real and a mock implementation. Nothing is exported, so this only helps with local unit tests. The mock one also does not record how it is called, so only the final result can be checked, even if the wrong code path was used to create it.

Let me know what you think!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems great for unit testing. I do have one comment: I think it doesn't provide a way to do the style of "end-to-end" tests that we have e.g. in test_example_diffs.rs. My env var hack would permit those I believe, so might be tempting to add that, or a better alternative if we have one? I'd quite like to input some example git grep -p and git grep -W output lines, make the parent process command appear to match, call integration_test_utils::run_delta, and check that delta has formatted the grep output accordingly (with styled context header in the former case, without in the latter).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added something less intrusive than process-global env vars in #783 (docs and names are pending), let me know if this can be used for the integration tests as you envision.

In hindsight, ProcessInterface was overkill since it is only good for testing the internal process model itself. Still, better than manually testing once and then letting it rot away.

@dandavison dandavison mentioned this pull request Nov 17, 2021
@dandavison
Copy link
Owner

This looks excellent. I've rebased my grep branch on top of it and made a start on the required git grep options utility, and will look at the testing approach you're suggesting. I think I'm seeing that the pipe detection tends not to succeed with the grep calls I was trying.

@dandavison dandavison merged commit 57325f9 into dandavison:master Nov 18, 2021
@dandavison
Copy link
Owner

Merged.

I think I'm seeing that the pipe detection tends not to succeed with the grep calls I was trying.

That was entirely incorrect, I just hadn't finished implementing the function according to the pattern you'd set out. Here's what I've now added for querying grep commands: 914a5a8

@dandavison
Copy link
Owner

I think this module will be very useful. For example, we could (should!) use it to detect git show $commit:/path/to/file.ext and syntax highlight the code accordingly.

@dandavison
Copy link
Owner

dandavison commented Nov 18, 2021

It's seeming that it would be useful to be able to ask about

  • The executable name
  • The name of the subcommand in effect if any
  • The options passed to the subcommand (the "keys" at least, if not the "values")
  • The positional arguments (typically file paths)

Shall we aim to have something return a struct with these entries? In addition to git, it might be helpful to know when we have output from rg or grep etc. But it's mainly parsing git command lines that's the issue.

I think there's some ambiguity between option "values" and positionals (we can't rely on the user to use --).

Would it be helpful to consider specifying a grammar more formally for the command lines that we are trying to parse? I'm guessing that we can't use clap because it would fail to parse unless we correctly specified a grammar for the entire git command line API. (And I don't think someone has implemented that for us in one of the git-in-rust projects?) But a simpler, more permissive grammar?

That might be overengineering though, perhaps we almost have what we need already.

@th1000s
Copy link
Collaborator Author

th1000s commented Nov 18, 2021

pipe detection

This will always remain a corner case and people will wonder "why did the syntax highlighting break". The detection will work the first time because nothing from disk is cached and so git is slow, but the second time it is faster and the file type is now missing. Something like libmagic would be a good fallback. However, currently...

$ file src/main.rs
src/main.rs: C source, ASCII text

Maybe it is also possible to wrap the calling git process to keep the pipe open for at least n seconds regardless of the number of written bytes. Shell support to query the piping process would be the ideal solution of course.

@dandavison
Copy link
Owner

Just to check I'm understanding, in git | delta is the issue that the OS creates a buffer associated with the pipe and, if git's output is small enough, it's possible for git to send all its output into the buffer and exit, before delta has finished (or even started) reading from the pipe?

@dandavison
Copy link
Owner

In #774 I've further modified utils/process.rs so that it attempts to classify the parent process as Option<GrepCommand>. I believe that gives us the parsing functionality we need for now; we'll need to evolve a little more to detect git show commit:./file.ext.

pub enum GrepCommand {
    GitGrep((HashSet<String>, HashSet<String>)),  // long and short option keys
    OtherGrep, // rg, grep, ag, ack, etc
}

@th1000s th1000s mentioned this pull request Nov 21, 2021
@th1000s
Copy link
Collaborator Author

th1000s commented Nov 27, 2021

Indeed, your understanding of the git | delta pipe issue is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants