Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

content: require backing store for checkpoint put #6251

Open
chu11 opened this issue Aug 31, 2024 · 0 comments
Open

content: require backing store for checkpoint put #6251

chu11 opened this issue Aug 31, 2024 · 0 comments
Assignees

Comments

@chu11
Copy link
Member

chu11 commented Aug 31, 2024

As noted in #6242, there appears to be an inconsitency in the content module

  • a content.flush will fail if a backing store is not loaded
  • a content.checkpoint-put will succeed if a backing store is not loaded

this appears to be inconsistent with each other. And since normally we want to content.flush before doing content.checkpoint-put, it's not clear how to handle this inconsistency.

It's not 100% clear to me why it was done this way, but my theory is it's b/c we can write data to the content module and content.flush later on when the backing module is loaded. So the idea was to support that same idea with checkpoints. When a backing module is loaded, the checkpoints stored in memory are flushed to the backing store. However, no checkpoint "flush" equivalent RPC exists.

IMO there are two things we could do to make this consistent:

A) return ENOSYS on checkpoint-put if the backing store does not exist. In /etc/rc1 and /etc/rc3 we already do special handling to ensure that backing store necessary things are not done if the none backing module is used. This would also allow fixes in #6242 and #6240 to be smoother.

B) support a content-checkpoint-flush equivalent to content.flush. So we can store checkpoints in memory and flush later on. (alternately content.flush could do both, but that requires writing a checkpoint before calling content.flush. Or alternately a new "do both" flush could be done.)

I like option A better. I think conceptually users will not think that a checkpoint should work only in memory, it works b/c a backing store exists.

AFAICT, the only thing this would change is a bunch of tests that currently test that checkpoint-put works without a backing store. In rc1 we already load the backing module before doing a content restore. So I think all cases that assume a backing store exists for checkpointing already checks that a backing store is desired.

@chu11 chu11 self-assigned this Aug 31, 2024
chu11 added a commit to chu11/flux-core that referenced this issue Sep 3, 2024
Problem: A backing store is required for content.flush but it
is not required for content.checkpoint-put.  This is inconsistent
and can lead to checkpointing problems done the line.

Require content.checkpoint-put to only work if there is a backing
store available.  As a consequence, remove code that handled
"cached" checkpoints when a backing store is not available.

Fixes flux-framework#6251
chu11 added a commit to chu11/flux-core that referenced this issue Sep 3, 2024
Problem: A backing store is required for content.flush but it
is not required for content.checkpoint-put.  This is inconsistent
and can lead to checkpointing problems done the line.

Require content.checkpoint-put to only work if there is a backing
store available.  As a consequence, remove code that handled
"cached" checkpoints when a backing store is not available.

Fixes flux-framework#6251
chu11 added a commit to chu11/flux-core that referenced this issue Sep 4, 2024
Problem: A backing store is required for content.flush but it
is not required for content.checkpoint-put.  This is inconsistent
and can lead to checkpointing problems done the line.

Require content.checkpoint-put to only work if there is a backing
store available.  As a consequence, remove code that handled
"cached" checkpoints when a backing store is not available.

Fixes flux-framework#6251
chu11 added a commit to chu11/flux-core that referenced this issue Dec 18, 2024
Problem: A backing store is required for content.flush but it
is not required for content.checkpoint-put.  This is inconsistent
and can lead to checkpointing problems done the line.

Require content.checkpoint-put to only work if there is a backing
store available.  As a consequence, remove code that handled
"cached" checkpoints when a backing store is not available.

Fixes flux-framework#6251
chu11 added a commit to chu11/flux-core that referenced this issue Dec 18, 2024
Problem: A backing store is required for content.flush but it
is not required for content.checkpoint-put.  This is inconsistent
and can lead to checkpointing problems done the line.

Require content.checkpoint-put to only work if there is a backing
store available.  As a consequence, remove code that handled
"cached" checkpoints when a backing store is not available.

Fixes flux-framework#6251
chu11 added a commit to chu11/flux-core that referenced this issue Dec 18, 2024
Problem: A backing store is required for content.flush but it
is not required for content.checkpoint-put.  This is inconsistent
and can lead to checkpointing problems done the line.

Require content.checkpoint-put to only work if there is a backing
store available.  As a consequence, remove code that handled
"cached" checkpoints when a backing store is not available.

Fixes flux-framework#6251
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant