Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preload is useless with tainting #262

Open
JRaspass opened this issue Feb 15, 2023 · 11 comments · May be fixed by #264
Open

Preload is useless with tainting #262

JRaspass opened this issue Feb 15, 2023 · 11 comments · May be fixed by #264

Comments

@JRaspass
Copy link
Contributor

JRaspass commented Feb 15, 2023

Given the following stripped down example:

lib/MyExpensive.pm:

package MyExpensive;

sleep 1;

1;

lib/MyPreload.pm:

package MyPreload;

use Test2::Harness::Runner::Preload;

stage DEFAULT => sub { default; preload 'MyExpensive' };

1;

t/foo.t:

use Test2::V0 -target => 'MyExpensive';

ok $ENV{T2_HARNESS_PRELOAD};

done_testing;

Running the tests with preload shows everything passes and the extra second isn't attributed to the test's "startup" cost:

$ yath test -TP MyPreload
( PASSED )  job  1    t/foo.t
(  TIME  )  job  1    Startup: 0.04258s | Events: 0.00018s | Cleanup: 0.00407s | Total: 0.04683s

However if we add a tainting shebang to the test file:

#!perl -T

use Test2::V0 -target => 'MyExpensive';

ok $ENV{T2_HARNESS_PRELOAD};

done_testing;

Then the test fails and takes longer:

$ yath test -TP MyPreload
[  FAIL  ]  job  1  + <UNNAMED ASSERTION>
(  DIAG  )  job  1    Failed test at t/foo.t line 5.
(  DIAG  )  job  1    Seeded srand with seed '20230215' from local date.
( FAILED )  job  1    t/foo.t
(  TIME  )  job  1    Startup: 1.08933s | Events: 0.00030s | Cleanup: 0.00322s | Total: 1.09285s

Now imagine that you have hundreds of tests, all with tainting enabled and lots of expensive libraries that take seconds to load, suddenly you're in my situation where $WORK's codebase takes tens of minutes to test :-(

I thought this was a regression since I'm sure this used to work but I've yet to find a version where it works correctly.

I appreciate this is hard to make work as you can only adjust the taint mode at perl startup time and the worker you're forking doesn't have it enabled so can we either have multiple workers based on the set of interpreter flags we see in test files? Or an ugly hack to run all of yath under taint worker and all.

@exodist
Copy link
Member

exodist commented Feb 19, 2023

Hmm. It may be possible to add some logic to preload tools that acounts for this.

Maybe something like this:
taint_stage FOO => sub { ... }

This can add a stage that need sot have taint checking turned on. When the runner is initializing stages it can launch a new perl interpreter to start the stage, instead of a plain fork. The one it starts can have taint enabled at the start.

This would only really work for top-level stages, as the new process will not be able to inherit anything preloaded in earlier stages. But child-stages of the taint one(s) would inherit taint checking.

I am not sure how soon I could get to this. But Using the specification above, if you wanted to look into writing a PR I should be able to review it and/or give pointers and release it once it is working.

@JRaspass
Copy link
Contributor Author

So after bashing my head against this for a few hours I'm not really sure how to achieve what you suggest and could do with a few pointers.

So I found where you fork in Test2::Harness::Runner::Preloader::launch_stage() and while I could easily exec a new perl interpreter with tainting enabled, I don't see how to make said interpreter have the correct state to continue execution and go on to call start_stage. Damn this would be so much easier if tainting could be toggled at runtime.

I also played with a far easier but much hackier approach where the yath script would re-exec itself at the start if you passed a taint flag or it read it from the config file, this way the whole thing runs under tainting. I had to sprinkle a few untainting regexes in places like App::Yath::load_options() that do runtime requires with user supplied data to make it work. This avoids having to remember state as I just get it to start again from the start with the same command line options and only exec if ${^TAINT} != taint flag/option. So this would lead to a slower start for people requesting a tainted yath but they're probably in the minority anyways.

@exodist
Copy link
Member

exodist commented Feb 21, 2023

If you want to add one or more taint options to yath itself so that the entire thing runs with taint, I would review and probably merge that.

One thing to note is that yath itself does not need to start with taint, only the yath-runner process (and processes forked from it) need the taint mode enabled for the tests to run in taint. the ::test or ::start command(s) launch yath-runner by executing yath runner ... so you can find that and just make sure it is started with taint as needed based on command line arguments.

@JRaspass
Copy link
Contributor Author

So I expanded the test cases to try and understand the runner logic better, is my understanding correct? Each test is always executed from a new process, those processes can be forks of perl which in turn can have libraries preloaded into them. The shebangs in test files influence which tests share a preloaded fork.

I expanded the test file to look like this:

#!perl

use Test2::V0 -target => 'MyExpensive';

diag "pid:     $$";
diag "ppid:    ${\getppid}";
diag "preload: @{[ $ENV{T2_HARNESS_PRELOAD} // 0 ]}";
diag "taint:   ${^TAINT}";
diag "warning: $^W";

pass;

done_testing;

Turned it into three files which vary in shebang:

$ head -n1 t/perl*.t
==> t/perl.t <==
#!perl

==> t/perl-T.t <==
#!perl -T

==> t/perl-w.t <==
#!perl -w

Added a bit more debug to the "expensive" library:

package MyExpensive;

warn "$$ is loading me...\n";
sleep 3;

1;

Added a shell test for good measure:

#!/bin/sh

echo '#' preload: $T2_HARNESS_PRELOAD >&2

echo ok 1
echo 1..1

And this is the output I get:

$ yath test -TP MyPreload
(INTERNAL)     54623 is loading me...
( STDERR )  job  1    54655 is loading me...
(  DIAG  )  job  1    pid:     54655
(  DIAG  )  job  1    ppid:    54623
(  DIAG  )  job  1    preload: 0
(  DIAG  )  job  1    taint:   1
(  DIAG  )  job  1    warning: 0
( PASSED )  job  1    t/perl-T.t
(  TIME  )  job  1    Startup: 3.10599s | Events: 0.00168s | Cleanup: 0.00284s | Total: 3.11051s
(  DIAG  )  job  2    pid:     54656
(  DIAG  )  job  2    ppid:    54623
(  DIAG  )  job  2    preload: 1
(  DIAG  )  job  2    taint:   0
(  DIAG  )  job  2    warning: 1
( PASSED )  job  2    t/perl-w.t
(  TIME  )  job  2    Startup: 0.03992s | Events: 0.00146s | Cleanup: 0.00410s | Total: 0.04548s
(  DIAG  )  job  3    pid:     54657
(  DIAG  )  job  3    ppid:    54623
(  DIAG  )  job  3    preload: 1
(  DIAG  )  job  3    taint:   0
(  DIAG  )  job  3    warning: 0
( PASSED )  job  3    t/perl.t
(  TIME  )  job  3    Startup: 0.04139s | Events: 0.00156s | Cleanup: 0.00426s | Total: 0.04721s
(  DIAG  )  job  4    preload:
( PASSED )  job  4    t/sh.t
(  TIME  )  job  4    Startup: 0.00789s | Events: 0.00000s | Cleanup: -0.00252s | Total: 0.00537s

So a few observations (feel free to correct any of them).

  • The initial loading line before any test output is the preloader which is then used for perl.t and perl-w.t.
  • The second loading line is inside the taint test and if we had additional taint tests they would also have loading lines, ie there is no preloading whatsoever for these tests and the ENV flag backs this up.
  • The warning global is correct in the perl-w.t test despite it being forked from the same perl as the normal test so you must be emulating this behaviour by parsing the shebang flags yourself and toggling the globals? Yet against I wish ${^TAINT} was writable. I can't think of any flags that wouldn't be possible to do at runtime, even -C is possible by messing with layers and/or decoding @argv.
  • There's a big gap between parent PID and PID and all tests seem to share a parent PID so that confused me.
  • Where's the logic for determining that a test can't use a preload, you must be inspecting the shebang?

@JRaspass
Copy link
Contributor Author

Oh it's worth noting that I couldn't get any of the above to work without applying my hack from #211 (comment). I kinda get the impression that tainting isn't a very popular use case?

@exodist
Copy link
Member

exodist commented Feb 21, 2023

shbang processing is mostly done here: https://github.com/Test-More/Test2-Harness/blob/master/lib/Test2/Harness/TestFile.pm

However the decision to skip the preload for taint happens here: https://github.com/Test-More/Test2-Harness/blob/master/lib/Test2/Harness/Runner/Job.pm#L419 Basically it refuses to do preload+fork for any flags other than -w as it is the only one that can be enabled at runtime.

I am not sure about the popularity of taint mode. I have never worked anywhere that uses it.

@JRaspass
Copy link
Contributor Author

Thanks for the info. As for potential fixes I think regardless of whether it happens in the Runner or the Preloader somewhere a fork need to be replaced with a fork + exec perl -T which means working out some way to convert the state of the currently running fork into some kind of IPC process to pass to the newly formed interpreter, that's the bit I'm really struggling with. That's what motivated my approach to just say screw it and run the whole process tree under a tainted perl instead.

If anyone has any bright ideas on how to do the IPC I'm all ears because at present I'm pretty stalled on this issue.

@exodist
Copy link
Member

exodist commented Feb 21, 2023

I think the best option is to add a taint mode option.
First you need to add the option in https://github.com/Test-More/Test2-Harness/blob/master/lib/App/Yath/Options/Runner.pm#L18
Then you need to modify https://github.com/Test-More/Test2-Harness/blob/master/lib/App/Yath/Command/test.pm#L890 to add the taint flag (This starts the yath-runner process, and it does not fork to do it, completely new instance of perl)
Then finally update the logic at https://github.com/Test-More/Test2-Harness/blob/master/lib/Test2/Harness/Runner/Job.pm#L419 so that it does allow fork if the flag is -T and the runner has taint mode enabled.

Then clean up any places where data in the runner is currently considered tainted, like your hack up above.

@JRaspass
Copy link
Contributor Author

Thanks, I'll give this approach a go.

@JRaspass JRaspass linked a pull request Feb 25, 2023 that will close this issue
@JRaspass
Copy link
Contributor Author

Quick update, this approach does seem to work. I have a very rough around the edges PoC commit that seems to DTRT. ATM it hardcodes tainting on and disables the shebang flag checking but they're all fixable.

The initial work is in the draft PR #264. Not wild about the sprinkling of untaint() everywhere. Open to better ideas.

@JRaspass
Copy link
Contributor Author

Sorry for the radio silence, other stuff came up and I haven't been able to touch this much, this is what's currently known left to do (there's probably more!):

  1. I've described the flag as best I can in Options::Runner but it might also need POD?
  2. I really really hate the logic on working out if we can fork:
# This approach won't scale if we allow even more swiches.
my @allowed_switches = '-w';

# Allow taint and taint + warnings if we're a tainted runner.
push @allowed_switches => qw/-T -wT -Tw/ if ${^TAINT};

my $allowed_switches     = join '|', map { quotemeta } @allowed_switches;
my $allowed_switches_re  = qr/\s*(?:$allowed_switches)\s*/;

return $self->{+USE_FORK} = 0 if grep { $_ !~ $allowed_switches_re } $self->switches;

# We're running under the taint but the test hasn't requested taint.
return $self->{+USE_FORK} = 0 if ${^TAINT} && !grep { /\s*-w?Tw?\s*/ } $self->switches;

The complication mainly stems from parsing the shebang with regexes and accounting for argument bundling. I'm hesitant to start actually parsing the shebang properly as we'd have to ensure we parse it the same way perl does. Suggestions on how to improve this logic are VERY welcome!
3. I'm really struggling on how to add a test for this new taint flag. I agree with you that without it it'll be very possible to add a regression in this functionality. Any pointers here would be welcome too.

So mainly I'm blocked on points 2 and 3. In terms of good news though, the patch works! I've been trying it out at $WORK and it really shaves heaps of time off our test suite run, so that's good 🎉 (99% of our tests enable tainting since our product also does).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants