add fedora-ostree-pruner #79
Conversation
Force-pushed from 64ffb6a to 19b592f
OK, this is ready for an initial round of review. I've done some initial tests, but need to run more.
prune_prod_repo(args.test)

# If we were asked to run in a loop, then run once a day
if args.loop:
The reason I wanted to run this in a loop rather than as part of a Kubernetes cron job is that historically it's been really hard to get logs from OpenShift pods in Fedora infra. I've recently learned we have a Kibana instance and should be able to get logs more easily. I'll try to confirm that and then maybe we'll convert this into a cron job.
On the topic of frequency: pruning is actually pretty I/O intensive (since it has to read a lot from the repo just to see if there is anything to prune) and slow on these large repos. I think maybe we should just run it on Friday or Saturday (once a week) during the night.
> The reason I wanted to run this in a loop rather than as part of a Kubernetes cron job is that historically it's been really hard to get logs from OpenShift pods in Fedora infra. I've recently learned we have a Kibana instance and should be able to get logs more easily. I'll try to confirm that and then maybe we'll convert this into a cron job.

I've found getting logs from the Kibana instance to be unreliable; I never really got it to work well. So I stand by this.

> On the topic of frequency: pruning is actually pretty I/O intensive (since it has to read a lot from the repo just to see if there is anything to prune) and slow on these large repos. I think maybe we should just run it on Friday or Saturday (once a week) during the night.

But I still think we should do this once a week during the night. Maybe I'll convert this into a systemd timer (run inside the container).
Personally would stick with this approach over a systemd timer.
Yeah, agreed this will be a very heavy I/O operation. Every seven days sounds reasonable to me.
Yeah, the problem is trying to make it happen at a specific day/time. Basically what happens here is it waits X period of time, runs (which could take a really long time), then waits X period of time again. It's offset by the time it took to run, so you get an inconsistent pattern. Maybe there is a Python package for "sleep until X time on X day" that we could use instead.
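For illustration, the standard library alone can get close to "sleep until X time on X day"; this sketch assumes a Saturday 02:00 local-time schedule and uses a placeholder instead of the real prune calls:

```python
import datetime
import time

def seconds_until(weekday: int, hour: int) -> float:
    """Seconds until the next occurrence of `weekday` (Mon=0 ... Sun=6) at `hour`:00."""
    now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    target += datetime.timedelta(days=(weekday - now.weekday()) % 7)
    if target <= now:
        target += datetime.timedelta(days=7)
    return (target - now).total_seconds()

while True:
    time.sleep(seconds_until(weekday=5, hour=2))  # Saturday at 02:00 local time
    # The prune itself may run for hours, but the next wakeup is recomputed from
    # the wall clock, so the schedule doesn't drift with the run time.
    print('would run prune_compose_repo() and prune_prod_repo() here')
```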
here is some output from running this against the prod repo from
OK, I added a commit which switches to a generic prune of content for deleted refs. This should be a little more efficient than running a prune for each deleted ref.
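A minimal sketch of that flow (my illustration; the repo path and ref list are example values, not the PR's):

```python
import subprocess

REPO = '/srv/ostree/repo'                          # example path
REFS_TO_DELETE = ['fedora/28/x86_64/workstation']  # example refs with a 'delete' policy

for ref in REFS_TO_DELETE:
    # Deleting the ref makes its history unreachable (unless another ref still
    # points at those commits, in which case they are kept)...
    subprocess.run(['ostree', 'refs', f'--repo={REPO}', '--delete', ref], check=True)

# ...and a single generic prune afterwards collects everything that no
# remaining ref can reach, instead of one prune per deleted ref.
subprocess.run(['ostree', 'prune', f'--repo={REPO}', '--refs-only'], check=True)
```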
Talked to @jlebon about the safety of running the pruner concurrently with other applications. Pasting here for posterity:
Force-pushed from 09d5d83 to 5812255
I guess the question is, should we implement that here? I think probably not, and just let the default behavior apply.
OSTREECOMPOSEREPO = '/mnt/koji/compose/ostree/repo'
OSTREEPRODREPO = '/mnt/koji/ostree/repo'
OSTREECOMPOSEREPO = '/mnt/fedora_koji_prod/koji/compose/ostree/repo'
OSTREEPRODREPO = '/mnt/fedora_koji_prod/koji/ostree/repo'
This doesn't look right.
Yeah, it's there because I can test this out on a composer staging machine (which has read-only mounts under /mnt/fedora_koji_prod/koji). I will remove this before the final push.
# The # of commits to retain in the compose repo for each branch
COMPOSE_REPO_POLICY = 'depth:60'

# The policy for each ref in the prod repo. None means "keep all"
I think it'd be good here to document the syntax.
still need to take a stab at this
ok took a stab at updating the comment with the syntax
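For readers of this thread, here is a hedged summary of the policy syntax as it can be inferred from the config shown below (my paraphrase, not necessarily the exact comment that was added):

```python
# Policy values used in this file (inferred summary):
#   None        -> keep everything on the ref
#   'delete'    -> delete the ref entirely so its content can be pruned
#   'depth:N'   -> keep only the most recent commits on the ref ('depth:0' keeps just the tip)
#   'time:DAYS' -> keep only commits from the last DAYS days (e.g. 'time:90')
```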
PROD_REF_POLICIES[f'fedora/{arch}/coreos/{stream}'] = None

# Beneath this is code, no config needed here
def runcmd(cmd: list, **kwargs: int) -> subprocess.CompletedProcess:
Is that `int` annotation correct?
I think so: https://www.python.org/dev/peps/pep-0484/#id37
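A toy example of the PEP 484 rule being referenced: the annotation after `**kwargs` describes the type of each keyword-argument value (this is just an illustration, not code from the PR):

```python
def example(**kwargs: int) -> None:
    # Every extra keyword argument is typed as int, so arithmetic on the
    # values type-checks; bools also pass, since bool subclasses int.
    for name, value in kwargs.items():
        print(name, value + 1)

example(retries=3, timeout=30)   # OK
# example(mode="fast")           # a type checker would flag this: str is not int
```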
FEDORA_STABLE_LIST = [33, 34, 35]
FEDORA_EOL_LIST = [27, 28, 29, 30, 31, 32]

ATOMIC_HOST_ARCHES = ['x86_64', 'aarch64', 'ppc64le']
SILVERBLUE_ARCHES = ['x86_64', 'aarch64', 'ppc64le']  # Applies to Kinoite
FEDORA_COREOS_ARCHES = ['x86_64', 'aarch64']

# https://github.com/coreos/fedora-coreos-tracker/blob/master/stream-tooling.md#introduction
FEDORA_COREOS_PRODUCTION_STREAMS = ['next', 'testing', 'stable']

# The # of commits to retain in the compose repo for each branch
COMPOSE_REPO_POLICY = 'depth:60'

# The policy for each ref in the prod repo. None means "keep all"
PROD_REF_POLICIES = dict()
for arch in SILVERBLUE_ARCHES:
    # Keep only the last 90 days of rawhide Silverblue/Kinoite
    PROD_REF_POLICIES[f'fedora/rawhide/{arch}/silverblue'] = 'time:90'
    PROD_REF_POLICIES[f'fedora/rawhide/{arch}/kinoite'] = 'time:90'
    PROD_REF_POLICIES[f'fedora/rawhide/{arch}/workstation'] = 'delete'
    # For Silverblue/Kinoite stable keep all stable/updates (they are aliased). For testing,
    # keep just the last 90 days.
    for release in FEDORA_STABLE_LIST:
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/silverblue'] = None
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/updates/silverblue'] = None
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/testing/silverblue'] = 'time:90'
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/kinoite'] = None
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/updates/kinoite'] = None
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/testing/kinoite'] = 'time:90'
    # For EOL Silverblue/Kinoite since the updates ref and stable ref are aliased
    # we'll specify depth of 1 for both of those.
    for release in FEDORA_EOL_LIST:
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/silverblue'] = 'depth:0'
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/updates/silverblue'] = 'depth:0'
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/testing/silverblue'] = 'delete'
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/kinoite'] = 'depth:0'
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/updates/kinoite'] = 'depth:0'
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/testing/kinoite'] = 'delete'
    # Delete any references to Atomic Workstation
    for release in FEDORA_EOL_LIST:
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/workstation'] = 'delete'
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/updates/workstation'] = 'delete'
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/testing/workstation'] = 'delete'
for arch in ATOMIC_HOST_ARCHES:
    # Delete all atomic host rawhide
    PROD_REF_POLICIES[f'fedora/rawhide/{arch}/atomic-host'] = 'delete'
    # For EOL ATOMIC HOST we keep only the last commit on the stable ref
    for release in FEDORA_EOL_LIST:
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/atomic-host'] = 'depth:0'
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/updates/atomic-host'] = 'delete'
        PROD_REF_POLICIES[f'fedora/{release}/{arch}/testing/atomic-host'] = 'delete'
for arch in FEDORA_COREOS_ARCHES:
    # For production Fedora CoreOS Streams we don't prune anything right now
    for stream in FEDORA_COREOS_PRODUCTION_STREAMS:
        PROD_REF_POLICIES[f'fedora/{arch}/coreos/{stream}'] = None
I think ideally all this would be a separate JSON or YAML file instead so we clearly delineate configuration from code. Then it could eventually be a configmap instead of just baked into the container.
OK to start like this too though.
I think I started here because I couldn't find a compact way to represent all of what we're doing here in a config file. Here I'm using for loops to apply configuration to many different branches. Ideas on the best way to expose this in YAML?
Yeah, it would need some design work. Maybe we can circle back to this after the first prune since at least all the deleted refs will be gone.
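One possible shape, purely as an illustration (the keys, file layout, and loader below are my invention, not something proposed in the PR): keep compact template entries in YAML and expand them with a small loader, mirroring what the for loops do today.

```python
import yaml  # PyYAML

CONFIG = """\
silverblue_arches: [x86_64, aarch64, ppc64le]
prod_ref_templates:
  - ref: "fedora/rawhide/{arch}/silverblue"
    policy: "time:90"
  - ref: "fedora/rawhide/{arch}/workstation"
    policy: "delete"
"""

def load_policies(text: str) -> dict:
    cfg = yaml.safe_load(text)
    policies = {}
    for entry in cfg['prod_ref_templates']:
        for arch in cfg['silverblue_arches']:
            # A missing 'policy' key maps to None, i.e. "keep all".
            policies[entry['ref'].format(arch=arch)] = entry.get('policy')
    return policies

print(load_policies(CONFIG))
```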
It might be useful to set up something like coreos/fedora-coreos-pipeline#359 except run before the pruner, and run against all active commits so that we know we have those backed up as container images, and can rely on container registry GC.
Ultimately, I think we should consider the ostree repos to be more like a cache. We should be able to rebuild and get "the same thing". (And really, we should think of RPMs in the same way, more like a cache of compiling the source code like Nix does) I think a first problem here is getting to a repository of manageable size for which future GCs are more reasonable. To that end, one approach might be to:
(This suggestion really parallels my one above to use

Now, regarding questions around concurrency of writes and pruning - in general today, ostree has a very dumb "lock the world" approach to GC - it's not "mark and sweep". In general, if one has a prune running concurrently with writes, it is certainly possible for prune to delete an object which is expected to exist by an incoming write. However, my instinct says that in practice this is going to be quite unlikely. It'd need to be something like trying to revert the tip of a ref to a package that existed before the prune cutoff. In the end, if we detect that the tip of a ref somehow lost an object, well, we can just rebuild that thing right?

If the "GC via repo swap" approach works well enough, that also seems fine to me to continue as a SOP. I haven't done any benchmarking but my instinct says that when the amount of "dead objects" grows to 20% or more of live data (and here it's probably like 10x more dead objects than live, right?) it's better to make the operation O(live data) instead of O(all data) like

We could probably make the "prune via swap" be a more first-class thing in ostree, like
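As a rough sketch of the "GC via repo swap" idea (my own illustration, not code from this PR; the paths and the final swap step are assumptions):

```python
import subprocess

OLD_REPO = '/srv/ostree/repo'      # example path; the real repos live under /mnt/koji
NEW_REPO = '/srv/ostree/repo.new'  # scratch repo on the same filesystem

def run(cmd):
    print('Running:', ' '.join(cmd))
    subprocess.run(cmd, check=True)

# Create an empty archive-mode repo to copy into.
run(['ostree', 'init', f'--repo={NEW_REPO}', '--mode=archive'])

# Copy only the tip commit of each ref (depth 0) from the old repo, so the new
# repo contains just the live objects reachable from the ref tips.
refs = subprocess.run(['ostree', 'refs', f'--repo={OLD_REPO}'],
                      capture_output=True, text=True, check=True).stdout.split()
for ref in refs:
    run(['ostree', 'pull-local', f'--repo={NEW_REPO}', '--depth=0', OLD_REPO, ref])

# The last step would be swapping NEW_REPO into place of OLD_REPO during a
# maintenance window; that part is deliberately left out of this sketch.
```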
I think this is true for CoreOS because we have the tarball (or OCI archive) for each build in our original build archive so we can rebuild and get a new set of Fedora CoreOS commits from scratch. For Silverblue and Kinoite, I think the only place the content lives is in the repo.
Yes, we have the OCI archive for FCOS, but we can easily do so for other editions too -
It would be embarrassing if we somehow lost the recent ostree commits for them, but hardly fatal. The client will happily upgrade to a new commit built from the same RPMs. (zincati and the upgrade graph I think care more about this, though, for FCOS.) But I don't see any reason to put the tips of the refs at risk - that's why I am saying we should explicitly copy those out with
Some more IRC discussions about this:
While I agree that races affecting commit content between importing and pruning objects will be rare, I think we should aim for stronger semantics for this. ISTM the core issue here is that we're not consistently using locks. I think repo swapping and saving/restoring content as SOP would still be subject to races without any locking.

The prune code already takes an exclusive lock. I think we need to support writing taking shared locks (and respecting exclusive locks) even if no transactions are in use. Then at least a compose can just fail if it tries to push at the same time the pruner is running, which is better than allowing it but possibly losing data. We could have it wait instead, but I'm not sure how viable that is depending on how long the pruner will take.

I like the idea of the repo swap for the initial prune (but e.g. done manually by releng running a script when we know it's safe to do).
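To make those semantics concrete, here is a purely illustrative sketch of shared vs. exclusive locking using plain file locks (this is not ostree's locking implementation; the lock-file path is an assumption):

```python
import fcntl
from contextlib import contextmanager

LOCKFILE = '/srv/ostree/repo/.prune-lock'  # example path

@contextmanager
def repo_lock(exclusive: bool):
    # Writers take a shared lock (many can run at once); the pruner takes an
    # exclusive lock. Non-blocking mode makes a conflicting caller fail fast
    # instead of silently racing.
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    with open(LOCKFILE, 'w') as f:
        try:
            fcntl.flock(f, mode | fcntl.LOCK_NB)
        except BlockingIOError:
            raise RuntimeError('repo is locked by another operation; try again later')
        try:
            yield
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

# Pruner:           with repo_lock(exclusive=True): ...run the prune...
# Compose/importer: with repo_lock(exclusive=False): ...write new commits...
```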
In the case we're using transactions, that's already done for us. Otherwise, we should do it ourselves. Otherwise we run the risk of racing with the pruner. For more context, see: ostreedev/ostree#2474 coreos/fedora-coreos-releng-automation#79
In the case we're using transactions, that's already done for us. Otherwise, we should do it ourselves. Otherwise we run the risk of racing with e.g. a prune operation. For more context, see: ostreedev/ostree#2474 coreos/fedora-coreos-releng-automation#79
If everything looks good here now I'll proceed to squash the fixup commits and also run tests (which will probably take some time).
Force-pushed from 072ff1f to ecf80f9
Force-pushed from a033f04 to a44b3b0
@jlebon - pushed up a new commit to take advantage of
Force-pushed from a44b3b0 to bd1ca6e
Force-pushed from bd1ca6e to 4c2fb3c
I didn't do a full re-review, but the interdiffs look sane to me!
This will be run in Fedora's infrastructure and will prune out OSTree repos that are used to serve content to Fedora users.
Just delete the ref first and then we'll do a generic prune for unreachable objects after looping through all the refs in the repo. This also means that if more than one ref points to a commit, we won't delete it like we were doing with the `ostree prune --delete-commit=$commit` approach.
When we prune commits, the static deltas will get dropped too, so this function is not needed.
OSTree just learned a new `--commit-only` option to `ostree prune` [1]. This allows us to delete just the commit objects in the history of a ref that we want to prune based on our policy, which is a fast operation. After we've deleted all the commit objects we can then go back and do one sweep over the repo to clean up any unreachable objects. Previously, every time we ran `ostree prune` and deleted a certain amount of history from a ref, the prune operation also computed reachability for every object in the repo at that time, which is expensive. If you have many refs this means the entire pruning process can take a long time. With this new mode of operation, deleting the commit objects should take little time per run, and then we can have one big expensive run at the very end that does the expensive reachability determination for every object in the repo.
This commit adds a decorator that will catch exceptions and continue execution so that we don't lose logs in the container we are running in.
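The two-phase flow described in the `--commit-only` commit message above could look roughly like the following (an illustration with example values for the repo path, ref, and depth; not the PR's exact code):

```python
import subprocess

REPO = '/srv/ostree/repo'                  # example path
REF = 'fedora/rawhide/x86_64/silverblue'   # example ref
DEPTH = 60                                 # example policy depth

def run(cmd):
    subprocess.run(cmd, check=True)

# Phase 1 (cheap, once per ref): drop only the commit objects past the policy
# depth; no reachability computation happens here.
run(['ostree', 'prune', f'--repo={REPO}', '--commit-only',
     f'--depth={DEPTH}', f'--only-branch={REF}'])

# Phase 2 (expensive, once per run): a single sweep that computes reachability
# from the remaining refs and deletes every unreachable object.
run(['ostree', 'prune', f'--repo={REPO}', '--refs-only'])
```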
Force-pushed from 4c2fb3c to ea1f901
@jlebon - made one more update here. It's worth noting that the code here hardcodes us running in test mode, so theoretically no changes apply yet and we need a follow-up commit to get some pruning to actually take place.
This is no longer baked into COSA for some reason and config-bot needs it.
runcmd(cmd)


def catch_exceptions_and_continue(func):
The only thing I'm not sure about is this. Right now we do:
prune_compose_repo(...)
prune_prod_repo(...)
Is it the right thing to do to continue with pruning the prod repo if an error occurred in the compose repo?
I think they're mostly independent, but it probably depends on the error, i.e. if it's some sort of I/O error, the problem will probably apply to both operations.
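For context, a hedged sketch of what such a catch-and-continue decorator might look like (illustrative; not necessarily the exact code in this PR):

```python
import functools
import logging
import traceback

def catch_exceptions_and_continue(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Log and swallow the error so a failure in one prune doesn't kill
            # the long-running container (and take its logs with it).
            logging.error('Caught exception in %s; continuing:\n%s',
                          func.__name__, traceback.format_exc())
    return wrapper

@catch_exceptions_and_continue
def prune_compose_repo():
    raise RuntimeError('simulated compose repo failure')

@catch_exceptions_and_continue
def prune_prod_repo():
    print('prod repo prune still runs')

prune_compose_repo()  # logs the error instead of raising
prune_prod_repo()     # still executes
```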