-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
content: require backing store for checkpoint put #6251
Comments
chu11
added a commit
to chu11/flux-core
that referenced
this issue
Sep 3, 2024
Problem: A backing store is required for content.flush but it is not required for content.checkpoint-put. This is inconsistent and can lead to checkpointing problems done the line. Require content.checkpoint-put to only work if there is a backing store available. As a consequence, remove code that handled "cached" checkpoints when a backing store is not available. Fixes flux-framework#6251
chu11
added a commit
to chu11/flux-core
that referenced
this issue
Sep 3, 2024
Problem: A backing store is required for content.flush but it is not required for content.checkpoint-put. This is inconsistent and can lead to checkpointing problems done the line. Require content.checkpoint-put to only work if there is a backing store available. As a consequence, remove code that handled "cached" checkpoints when a backing store is not available. Fixes flux-framework#6251
chu11
added a commit
to chu11/flux-core
that referenced
this issue
Sep 4, 2024
Problem: A backing store is required for content.flush but it is not required for content.checkpoint-put. This is inconsistent and can lead to checkpointing problems done the line. Require content.checkpoint-put to only work if there is a backing store available. As a consequence, remove code that handled "cached" checkpoints when a backing store is not available. Fixes flux-framework#6251
chu11
added a commit
to chu11/flux-core
that referenced
this issue
Dec 18, 2024
Problem: A backing store is required for content.flush but it is not required for content.checkpoint-put. This is inconsistent and can lead to checkpointing problems done the line. Require content.checkpoint-put to only work if there is a backing store available. As a consequence, remove code that handled "cached" checkpoints when a backing store is not available. Fixes flux-framework#6251
chu11
added a commit
to chu11/flux-core
that referenced
this issue
Dec 18, 2024
Problem: A backing store is required for content.flush but it is not required for content.checkpoint-put. This is inconsistent and can lead to checkpointing problems done the line. Require content.checkpoint-put to only work if there is a backing store available. As a consequence, remove code that handled "cached" checkpoints when a backing store is not available. Fixes flux-framework#6251
chu11
added a commit
to chu11/flux-core
that referenced
this issue
Dec 18, 2024
Problem: A backing store is required for content.flush but it is not required for content.checkpoint-put. This is inconsistent and can lead to checkpointing problems done the line. Require content.checkpoint-put to only work if there is a backing store available. As a consequence, remove code that handled "cached" checkpoints when a backing store is not available. Fixes flux-framework#6251
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
As noted in #6242, there appears to be an inconsitency in the content module
content.flush
will fail if a backing store is not loadedcontent.checkpoint-put
will succeed if a backing store is not loadedthis appears to be inconsistent with each other. And since normally we want to
content.flush
before doingcontent.checkpoint-put
, it's not clear how to handle this inconsistency.It's not 100% clear to me why it was done this way, but my theory is it's b/c we can write data to the content module and
content.flush
later on when the backing module is loaded. So the idea was to support that same idea with checkpoints. When a backing module is loaded, the checkpoints stored in memory are flushed to the backing store. However, no checkpoint "flush" equivalent RPC exists.IMO there are two things we could do to make this consistent:
A) return ENOSYS on
checkpoint-put
if the backing store does not exist. In/etc/rc1
and/etc/rc3
we already do special handling to ensure that backing store necessary things are not done if thenone
backing module is used. This would also allow fixes in #6242 and #6240 to be smoother.B) support a
content-checkpoint-flush
equivalent tocontent.flush
. So we can store checkpoints in memory and flush later on. (alternatelycontent.flush
could do both, but that requires writing a checkpoint before callingcontent.flush
. Or alternately a new "do both" flush could be done.)I like option
A
better. I think conceptually users will not think that acheckpoint
should work only in memory, it works b/c a backing store exists.AFAICT, the only thing this would change is a bunch of tests that currently test that
checkpoint-put
works without a backing store. Inrc1
we already load the backing module before doing a content restore. So I think all cases that assume a backing store exists for checkpointing already checks that a backing store is desired.The text was updated successfully, but these errors were encountered: