Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buildomat: retain log files from test failures #542

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

jclulow
Copy link
Collaborator

@jclulow jclulow commented Dec 22, 2021

As seen in #540, sometimes a test run under buildomat fails. We see in the stdio output of the job the path of a log file; viz., /tmp/omicron_nexus-af7cc18609452f5b-test_paginated_single_column_descending.3657.7.log. We don't currently preserve the contents of the log file, though.

This PR adds a buildomat output rule to save any file that matches the glob /tmp/omicron*.log which would have caught this log file. Buildomat will save any matching files, including links to their contents (in the check run page and in the full details of the underlying job), whether the job succeeds or fails. As long as successful tests clean up their log files, but failures do not, we will only preserve the failures.

@jclulow
Copy link
Collaborator Author

jclulow commented Dec 22, 2021

I guess, looking at the check run, that it does in fact create and leave a few log files even in the success case! All up only about 28KB though.

@ahl
Copy link
Contributor

ahl commented Dec 22, 2021

Thanks for doing this and the 28K per build seems like something we can live with until we eventually need to clean them up. If we're stressed about it may be we could gzip them in the interim.

@davepacheco
Copy link
Collaborator

@jclulow What do you think about setting $TMPDIR to something more specific inside /tmp (like /tmp/omicron_ci) and then saving that whole directory? I'd like to save the CockroachDB temporary directories as well. I believe we'll need those to debug #540. In my setup, I've got TMPDIR set to /dangerzone/omicron_tmp. When I use omicron-dev db-run, the database temporary directory was /dangerzone/omicron_tmp/.tmpa1H8X6/data/cockroach-temp218835377.

I think ideally we would do this only for the cargo test phase and not cargo build because the build process leaves a bunch of rustc directories as well as the crdb-base directory. This looks easy to do with the current buildomat script (which runs test separately after build).

@jclulow
Copy link
Collaborator Author

jclulow commented Dec 22, 2021

@jclulow What do you think about setting $TMPDIR to something more specific inside /tmp (like /tmp/omicron_ci) and then saving that whole directory? I'd like to save the CockroachDB temporary directories as well. I believe we'll need those to debug #540. In my setup, I've got TMPDIR set to /dangerzone/omicron_tmp. When I use omicron-dev db-run, the database temporary directory was /dangerzone/omicron_tmp/.tmpa1H8X6/data/cockroach-temp218835377.

Doesn't seem unreasonable in principle. I have two initial questions:

  • what would you estimate as the average and maximum file count under the separate TMPDIR directory?
  • how large are these files in aggregate?

I think ideally we would do this only for the cargo test phase and not cargo build because the build process leaves a bunch of rustc directories as well as the crdb-base directory. This looks easy to do with the current buildomat script (which runs test separately after build).

Would we want to perhaps preserve these files, then, only if the cargo test phase fails?

@davepacheco
Copy link
Collaborator

@jclulow What do you think about setting $TMPDIR to something more specific inside /tmp (like /tmp/omicron_ci) and then saving that whole directory? I'd like to save the CockroachDB temporary directories as well. I believe we'll need those to debug #540. In my setup, I've got TMPDIR set to /dangerzone/omicron_tmp. When I use omicron-dev db-run, the database temporary directory was /dangerzone/omicron_tmp/.tmpa1H8X6/data/cockroach-temp218835377.

Doesn't seem unreasonable in principle. I have two initial questions:

* what would you estimate as the average and maximum file count under the separate `TMPDIR` directory?

* how large are these files in aggregate?

Are you asking how many files get created at all (possibly removed) or that get left at the end of a test run?

This is hard to gauge and will certainly change over time. I'll use the failure I injected in #546 as a starting point. I injected a single failure into a test that had set up a database instance. According to the output from this run, GitHub uploaded 38 files totalling 789914 bytes of logical size 279733 bytes after zip compression. This is probably a minimum (and also typical) for any test that starts a database and then fails. There are currently 31 tests using the #[nexus_test] macro, which probably comprises most of the tests that create a database.

I have not checked whether other tests create and then remove other kinds of files on success. Offhand, I don't know of any other uses of files in $TMPDIR (aside from the database and the tests' log files). So the largest ones are probably the tests that create the most data. But I don't think anything creates meaningfully more than the rest.

So putting this together, I'd estimate we probably have 40 tests that each create about 40 files totalling 1 MiB per test. On a successful run, these are all removed. If all the tests failed, that would be 1,600 files totalling 40 MiB.

I think it would be reasonable to put these all into a tarball to preserve metadata and to avoid file count becoming much of a problem.

I think ideally we would do this only for the cargo test phase and not cargo build because the build process leaves a bunch of rustc directories as well as the crdb-base directory. This looks easy to do with the current buildomat script (which runs test separately after build).

Would we want to perhaps preserve these files, then, only if the cargo test phase fails?

I think that would be fine. My experience from working on #546 is that on a successful build and test, the only thing that's left in $TMPDIR is "crdb-base", the seed directory. (See #548.) If we add that to an exclude list (which I think we want to do anyway), maybe it would be simplest to just always save anything found in $TMPDIR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants