buildomat: retain log files from test failures #542

jclulow · 2021-12-22T01:10:52Z

As seen in #540, sometimes a test run under buildomat fails. We see in the stdio output of the job the path of a log file; viz., /tmp/omicron_nexus-af7cc18609452f5b-test_paginated_single_column_descending.3657.7.log. We don't currently preserve the contents of the log file, though.

This PR adds a buildomat output rule to save any file that matches the glob /tmp/omicron*.log which would have caught this log file. Buildomat will save any matching files, including links to their contents (in the check run page and in the full details of the underlying job), whether the job succeeds or fails. As long as successful tests clean up their log files, but failures do not, we will only preserve the failures.

jclulow · 2021-12-22T01:49:02Z

I guess, looking at the check run, that it does in fact create and leave a few log files even in the success case! All up only about 28KB though.

ahl · 2021-12-22T03:07:55Z

Thanks for doing this and the 28K per build seems like something we can live with until we eventually need to clean them up. If we're stressed about it may be we could gzip them in the interim.

davepacheco · 2021-12-22T18:02:24Z

@jclulow What do you think about setting $TMPDIR to something more specific inside /tmp (like /tmp/omicron_ci) and then saving that whole directory? I'd like to save the CockroachDB temporary directories as well. I believe we'll need those to debug #540. In my setup, I've got TMPDIR set to /dangerzone/omicron_tmp. When I use omicron-dev db-run, the database temporary directory was /dangerzone/omicron_tmp/.tmpa1H8X6/data/cockroach-temp218835377.

I think ideally we would do this only for the cargo test phase and not cargo build because the build process leaves a bunch of rustc directories as well as the crdb-base directory. This looks easy to do with the current buildomat script (which runs test separately after build).

jclulow · 2021-12-22T19:54:44Z

@jclulow What do you think about setting $TMPDIR to something more specific inside /tmp (like /tmp/omicron_ci) and then saving that whole directory? I'd like to save the CockroachDB temporary directories as well. I believe we'll need those to debug #540. In my setup, I've got TMPDIR set to /dangerzone/omicron_tmp. When I use omicron-dev db-run, the database temporary directory was /dangerzone/omicron_tmp/.tmpa1H8X6/data/cockroach-temp218835377.

Doesn't seem unreasonable in principle. I have two initial questions:

what would you estimate as the average and maximum file count under the separate TMPDIR directory?
how large are these files in aggregate?

I think ideally we would do this only for the cargo test phase and not cargo build because the build process leaves a bunch of rustc directories as well as the crdb-base directory. This looks easy to do with the current buildomat script (which runs test separately after build).

Would we want to perhaps preserve these files, then, only if the cargo test phase fails?

davepacheco · 2021-12-23T17:43:12Z

@jclulow What do you think about setting $TMPDIR to something more specific inside /tmp (like /tmp/omicron_ci) and then saving that whole directory? I'd like to save the CockroachDB temporary directories as well. I believe we'll need those to debug #540. In my setup, I've got TMPDIR set to /dangerzone/omicron_tmp. When I use omicron-dev db-run, the database temporary directory was /dangerzone/omicron_tmp/.tmpa1H8X6/data/cockroach-temp218835377.

Doesn't seem unreasonable in principle. I have two initial questions:
* what would you estimate as the average and maximum file count under the separate `TMPDIR` directory?

* how large are these files in aggregate?

Are you asking how many files get created at all (possibly removed) or that get left at the end of a test run?

This is hard to gauge and will certainly change over time. I'll use the failure I injected in #546 as a starting point. I injected a single failure into a test that had set up a database instance. According to the output from this run, GitHub uploaded 38 files totalling 789914 bytes of logical size 279733 bytes after zip compression. This is probably a minimum (and also typical) for any test that starts a database and then fails. There are currently 31 tests using the #[nexus_test] macro, which probably comprises most of the tests that create a database.

I have not checked whether other tests create and then remove other kinds of files on success. Offhand, I don't know of any other uses of files in $TMPDIR (aside from the database and the tests' log files). So the largest ones are probably the tests that create the most data. But I don't think anything creates meaningfully more than the rest.

So putting this together, I'd estimate we probably have 40 tests that each create about 40 files totalling 1 MiB per test. On a successful run, these are all removed. If all the tests failed, that would be 1,600 files totalling 40 MiB.

I think it would be reasonable to put these all into a tarball to preserve metadata and to avoid file count becoming much of a problem.

I think ideally we would do this only for the cargo test phase and not cargo build because the build process leaves a bunch of rustc directories as well as the crdb-base directory. This looks easy to do with the current buildomat script (which runs test separately after build).

Would we want to perhaps preserve these files, then, only if the cargo test phase fails?

I think that would be fine. My experience from working on #546 is that on a successful build and test, the only thing that's left in $TMPDIR is "crdb-base", the seed directory. (See #548.) If we add that to an exclude list (which I think we want to do anyway), maybe it would be simplest to just always save anything found in $TMPDIR?

buildomat: retain log files from test failures

1176f55

ahl approved these changes Dec 22, 2021

View reviewed changes

davepacheco mentioned this pull request Dec 22, 2021

tests that make their own logctx need to clean it up on success #543

Merged

davepacheco mentioned this pull request Dec 22, 2021

save test outputs from failed GitHub Actions #546

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

buildomat: retain log files from test failures #542

buildomat: retain log files from test failures #542

jclulow commented Dec 22, 2021 •

edited

Loading

jclulow commented Dec 22, 2021

ahl commented Dec 22, 2021

davepacheco commented Dec 22, 2021

jclulow commented Dec 22, 2021

davepacheco commented Dec 23, 2021

buildomat: retain log files from test failures #542

Are you sure you want to change the base?

buildomat: retain log files from test failures #542

Conversation

jclulow commented Dec 22, 2021 • edited Loading

jclulow commented Dec 22, 2021

ahl commented Dec 22, 2021

davepacheco commented Dec 22, 2021

jclulow commented Dec 22, 2021

davepacheco commented Dec 23, 2021

jclulow commented Dec 22, 2021 •

edited

Loading