Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-3314: CSI Changed Block Tracking #4082

Merged
merged 59 commits into from
Jun 12, 2024
Merged

Conversation

ihcsim
Copy link
Contributor

@ihcsim ihcsim commented Jun 12, 2023

  • One-line PR description: KEP for CSI changed block tracking (CBT)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 12, 2023
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 12, 2023
Signed-off-by: Ivan Sim <[email protected]>
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jun 29, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 29, 2023
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jun 29, 2023

The `BlockMetadataType` enumeration specifies the style used: `FIXED_LENGTH` or `VARIABLE_LENGTH`.
When the *block-based* style (`FIXED_LENGTH`) is used it is up to the SP plugin to define the
block size.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will backup clients make use of FIXED_LENGTH/VARIABLE_LENGTH?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A backup client would, in general, have to map the "view" of the block device returned by the API to its own view of the block device. For example, a client could have its own concept of block size and would have to map, say, a VMware extent map, to a range of blocks; conversely, it may have to map a range of AWS EBS blocks to an extent map.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of FIXED_LENGTH/VARIABLE_LENGTH is still not clear to me.

The API returns an ascending, non-overlapping sequence of BlockMetadata elements. Backup clients may accumulate adjacent BlockMetadata elements (i.e. convert from blocks to extents) and then align offsets/lengths to the backup client's internal block size, but this does not require BlockMetadataType.

Can you give a concrete example of how a backup client will use BlockMetadataType to clarify?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we discussed this in the CSI WG, the argument was thus:

  • If the SP uses FIXED_LENGTH, it's a promise that lengths will be constant for the given pair of snapshots, which may unlock certain optimizations on the backup software side which wouldn't be possible with VARIABLE_LENGTH
  • If the SP uses VARIABLE_LENGTH, it remains free to return whatever lengths it wants -- possible folding consecutive change blocks and thereby compressing the metadata, and the backup software has to cope with it.

If these details aren't clear from the CSI spec, then it's a failure of spec writing, and we should amend the spec to be more clear about the exact semantics of that field.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bswartz While the spec covers the effect of BlockMetadataType on the Length field, it's still unclear to me how a backup client would use this information.

The one example I can think of, making decisions about backup data layout (e.g. block size and the backup client's internal metadata parameters), doesn't seem to be well-served by BlockMetadataType:

BlockMetadataType only applies for the current API call. Maybe it could even change between successive calls on the same snapshot pair. Definitely between different snapshot pairs from the same volume.

Is a backup client that uses this information making the assumption that BlockMetadataType and the block size does not change? If yes, then either backup clients cannot use BlockMetadataType for this purpose or the API design needs adjustment.

An API that captures permanent attributes like block size would be something like GetVolumeMetadataHints(volume_id) -> VolumeMetadataHints where the spec guarantees the stability of the hints across snapshots of the volume.

Or maybe I'm still missing how backup clients could use BlockMetadataType?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recall that these new snapshot metadata APIs are stream-oriented, so "one API call" is actually the whole stream of metadata for a given pair of snapshots. If the SP commits to always use a Length of 4096 (for example) then the backup software can safely allocate a fixed-size bitmap where each bit represents 4k of change for the whole snasphot.

If an SP returns variable length blocks, then the backup software has to be prepared to use something other than a bitmap to represent the delta, or be prepared to guess a optimal bitmap granularity and be prepared to adjust delta blocks to fit the bitmap, possibly wasting small amounts of space by backing up unchanged data.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXED_LENGTH is expensive over the wire because adjacent BlockMetadata elements cannot be sent merged. It would be more efficient for the SP to include an optional block size hint field in the response and allow the Length field to be vary instead of having BlockMetadataType.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thing the API is alpha and we can evolve it. I agree this isn't the most efficient transport mechanism, but it's dramatically more efficient that what's being done in the absence of this feature. The designers were optimizing for simplicity rather than efficiency, which is the right place to focus in an alpha version of anything.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for considering this feedback!

repeated BlockMetadata block_metadata = 3;
}

message GetMetadataDeltaRequest {
Copy link

@stefanha stefanha Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a difference between GetMetadataAllocatedRequest and GetMetadataDeltaRequest aside from the snapshot_id vs base_snapshot_id/target_snapshot_id fields?

If there is no difference, maybe make base_snapshot_id optional (like the CSI spec's Snapshot group_snapshot_id field) and unify the two requests to reduce duplication. When base_snapshot_id is empty then allocated data is reported. When base_snapshot_id is non-empty then the delta is reported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a difference between GetMetadataAllocatedRequest and GetMetadataDeltaRequest aside from the snapshot_id vs base_snapshot_id/target_snapshot_id fields?

No, not really. We had considered your suggestion and decided to explicitly document separate methods for clarity.

```
cbt.storage.k8s.io/driver: NAME_OF_THE_CSI_DRIVER
```
The presence of this label allows a backup application to efficiently locate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you have multiple CRs for the same driver?

Instead of using labels, can the name of the object == the name of the driver, which is what we require for the CSIDriver object: https://kubernetes.io/docs/reference/kubernetes-api/config-and-storage-resources/csi-driver-v1/

Or should we put it in the CSIDriver object itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't see why we can't use the object name, instead of labels. We explored using the CSIDriver object to store this information, but was told that it was mainly used for kubelet/storage related spec, not things like HTTP endpoints.

Copy link
Contributor

@carlbraganza carlbraganza May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ambiguous to find multiple CRs for a given driver; which one should the application pick?

I agree the object could be named for the driver but the spec doesn't mandate that - it would totally depend on the driver installer. The reason is that the spec does not require these CRs to be in any particular namespace; the app finds them via a label search.

I think there were discussions on modifying the CSIDriver object in the WG and the concept was shot down.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which spec doesn't mandate the name of the CR? We could define this requirement as such like we did for the CSIDriver object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. I'll add something on the name of the CR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had forgotten to remove the CR label requirement - fixed now!

existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make sure that our standard k8s-csi logging methods are not logging any tokens/secrets.

Copy link
Contributor

@carlbraganza carlbraganza May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the sentiment, but should we be stating that in the spec?

Copy link
Member

@msau42 msau42 May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think this needs to be explicit in the spec. It's just an implementation requirement for our common libraries/sidecars as this has been a source of CVEs in the past

@xing-yang
Copy link
Contributor

/assign @liggitt
for review from SIG-Auth.

@@ -0,0 +1,3 @@
kep-number: 3314
alpha:
approver: "@johnbelamaric"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been looking at this before as shadow, and John is overbooked. So let's switch to me with PRR approval on this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@k8s-ci-robot k8s-ci-robot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Jun 5, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Jun 5, 2024
@soltysh
Copy link
Contributor

soltysh commented Jun 7, 2024

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 7, 2024
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PRR is mostly good, I'll wait for approval from sig-storage and sig-auth.

- Components depending on the feature gate:
- [x] Other
- Describe the mechanism:
The new components will be implemented as part of the out-of-tree CSI framework.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still holds, not a blocking, but would be nice to answer this and questions around monitoring and troubleshooting.

to Kubernetes backup applications
by creating a [SnapshotMetadataService CR](#snapshot-metadata-service-custom-resource)
that contains the service's TCP endpoint address, CA certificate and
an audience string needed for token authentication.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's follow up offline on this, but we may want to enforce some syntax/restrictions on the allowed audience strings. The primary goal is to prevent reusing some reserved audience for kube-apiserver. cc @liggitt

@msau42
Copy link
Member

msau42 commented Jun 12, 2024

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 12, 2024
@xing-yang
Copy link
Contributor

@soltysh could you provide approval for PRR? Thanks.

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
for PRR

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ihcsim, msau42, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 12, 2024
@k8s-ci-robot k8s-ci-robot merged commit cf708c9 into kubernetes:master Jun 12, 2024
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.31 milestone Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.