-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Object bucket provisioning #2100
Conversation
…e references to accessSecret, clarity principal and serviceAcct
/assign @xing-yang |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of my comments might be handled with some doc cleanup, but I'd really like to understand the BAR/BA "delete on pod termination" rationale.
spec: | ||
protocol: | ||
name: [3] | ||
version: [4] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know there's been some discussion about this, but I'm generally uncomfortable about pulling version out separately. Among other things, it implies you are going to do something with this information (like assume backward compatibility or some other semantic-version rule). But we don't own the versioning scheme, so there really isn't anything you can do with it other than clearly declare the requestors expectation and the driver's support. For that, I think building the version into the protocol name makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this sentiment in general. However, this field is filled by COSI in some cases and we think it is better to have COSI fill an unfilled field rather than update an existing one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To simplify the prototype and avoid the controversy of what binding logic will look like for now, I'm fine with removing version from prototype from all objects. For alpha/beta, we can discuss and reintroduce.
1. `finalizers`: added by COSI to defer `BucketRequest` deletion until backend deletion succeeds. | ||
1. `protocol.name`: (required) specifies the desired protocol. One of {“s3”, “gs”, or “azureBlob”}. | ||
1. `protocol.version`: (optional) specifies the desired version of the `protocol`. For "s3", a value of "v2" or "v4" could be used. | ||
1. `bucketPrefix`: (optional) prefix prepended to a generated new bucket name, eg. “yosemite-photos-". If `bucketInstanceName` is supplied then `bucketPrefix` is ignored because the request is for access to an existing bucket. If `bucketPrexix` is specified then it is added to the `Bucket` instance as a label. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is bucketPrefix optional for greenfield? I think this section could be made a little easier to understand by clearly calling out when a parameter only applies to either greenfield or brownfield.
1. `protocol.version`: (optional) specifies the desired version of the `protocol`. For "s3", a value of "v2" or "v4" could be used. | ||
1. `bucketPrefix`: (optional) prefix prepended to a generated new bucket name, eg. “yosemite-photos-". If `bucketInstanceName` is supplied then `bucketPrefix` is ignored because the request is for access to an existing bucket. If `bucketPrexix` is specified then it is added to the `Bucket` instance as a label. | ||
1. `bucketClassName`: (optional for greenfield, omitted for brownfield) name of the `BucketClass` used to provision this request. If omitted for a greenfield bucket request, a default bucket class matching the protocol, if available, is used. If the greenfield bucket class is missing or does not support the requested protocol, an error is logged and the request is retried (with exponential backoff). A `BucketClass` is necessary for greenfield requests since BCs support a list of allowed namespaces. A `BucketClass` is not needed for brownfield requests since the `Bucket` instance, created by the admin, also contains `allowedNamespaces`. | ||
1. `bucketInstanceName`: (optional) name of the cluster-wide `Bucket` instance. If blank, then COSI assumes this is a greenfield request and will fill in the name during the binding step. If set by the user, then this names the `Bucket` instance created by the admin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is bucketInstanceName required for brownfield?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch. I'll fix the doc
spec: | ||
provisioner: [4] | ||
retentionPolicy: [5] | ||
anonymousAccessMode: [6] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear on why this is relevant here. Who will be looking at this field?
But this opens up a broader question - I assume this field in the class is a direction to the driver/backend to add anonymous access to the bucket. But we're not providing a mechanism to access a bucket anonymously, right? We should be clear about that - that the anonymous access is not access from within k8s - it is "out of band" access. Would clear up the confusion around this for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do NOT provide a mechanism to anonymously consume the bucket. We only provide a mechanism to provision with anonymous access. I will update the doc.
- "publicReadOnly": Read only, uncredentialed users can call ListBucket and GetObject. | ||
- "publicWriteOnly": Write only, uncredentialed users can only call PutObject. | ||
- "publicReadWrite": Read/Write, same as `ro` with the addition of PutObject being allowed. | ||
1. `bucketClassName`: Name of the associated bucket class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is either supplied by the controller (when greenfield) or ignored (brownfield), right? so shouldn't this be "optional"?
|
||
#### BucketClass | ||
|
||
An immutable, cluster-scoped, custom resource to provide admins control over the handling of bucket provisioning. The `BucketClass` (BC) defines a retention policy, driver specific parameters, and the provisioner name. A list of allowed namespaces can be specified to restrict new bucket creation and access to existing buckets. A default bucket class can be defined for each supported protocol. This allows the bucket class to be omitted from a `BucketRequest`. Relevant `BucketClass` fields are copied to the `Bucket` instance to handle the case of the BC being deleted or re-created. If an object store supports more than one protocol then the admin should create a `BucketClass` per protocol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the class really immutable? Specifically, are changes to "isDefaultBucketClass" disallowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is immutable except for "isDefaultBucketClass". I'll update doc
1. `bucketInstanceName`: name of the `Bucket` instance bound to this BA. | ||
1. `bucketAccessRequest`: an `objectReference` containing the name, namespace and UID of the associated `BucketAccessRequest`. | ||
1. `serviceAccount`: an `ObjectReference` containing the name, namespace and UID of the associated `BAR.serviceAccountName`. If empty then integrated Kubernetes -> cloud identity is not being used, in which case, `BucketAccess.principal` contains the user identity, which is minted by the provisioner. | ||
1. `mintedSecretName`: name of the provisioner-generated Secret containing access credentials. This Secret exists in the provisioner’s namespace, is read by the cosi-node-adapter, and written to the secret mount defined in the app pod's csi-driver spec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional? only if serviceAccount is empty/null?
{“Effect”:“Allow”,“Action”:“s3:GetObject”,“Resource”:“arn:aws:s3:::profilepics/*“}, | ||
{“Effect”:“Deny”,“Action”:“s3:*“,”NotResource”:“arn:aws:s3:::profilepics/*“} | ||
``` | ||
1. `principal`: username/access-key for the object storage provider to uniquely identify the user who has access to this bucket. This value is minted by the provisioner (and set in the BA) when the `BucketAccessRequest` omits a ServiceAccount. There several use cases involving `principal`: 1) static/driverless brownfield where the admin creates the BA and fills in principal, so here principal is minted by the admin. 2) brownfield where COSI creates the `BucketAccess` and fills in `principal` based on the driver's minted principal. 3) the `BucketAccessRequest` specifies a ServiceAccount which is linked by the backend to a user id, and `principal` is empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me many reads of this before I think I understand it. Let me give it a shot and see if I've got it right:
a. Not relevant if serviceAccount is supplied.
b. For brownfield cases when there is already a storage-side principal that you want to use for access, the admin can supply this, the driver will (possibly) enable access for that principal to the bucket in question and will return credentials.
c. For greenfield cases, this could be populated by the driver. Maybe so that the driver can request deletion of this principal during deprovisioning?
At any rate, some crisper explanation would be very useful here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your understanding is accurate. I agree, clearer explanation is required here.
csi: [3] | ||
driver: cosi.sigs.k8s.io [4] | ||
volumeAttributes: [5] | ||
bucketAccessRequestName: <my-bar-name> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any information on what is actually written to the volume. For example:
-
I would expect that we always need endpoint information written somewhere
-
I would expect credential information to be optional, depending on whether or not identity mapping is being used
-
I would expect that there is a well-defined location and formation for both of the above information, but that it is probably protocol-specific. It would be worth being at least partially explicit about this. Also, I've heard some mumbling about mapping this stuff to environment variables (I can't see how that would work given the possibility of having multiple of these volumes in a Pod), but if that's still the plan, it should be clear on how it is accomplished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Endpoint and credential information will be passed in via a file, whose format is determined by the protocol. The provisioner is expected to fill the contents of the file appropriately with both endpoint and credential information. This is then mounted as a file into the pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is determined by the protocol, wouldn't it make sense to document file names/locations for each protocol?
Also, I assume that environment variable mapping is not in scope, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 KEP should detail what files are surfaced where for which protocols.
|
||
When the app pod is started, the kubelet will gRPC call _NodeStageVolume_ and _NodePublishVolume_, which are received by the cosi-node-adapter. The pod waits until the adapter responds to the gRPC requests. The adapter ensures that the target bucket has been provisioned and is ready to be accessed. | ||
|
||
When the app pod terminates, the kubelet gRPC calls _NodeUnstageVolume_ and _NodeUnpublishVolume_, which the cosi-node-adapter also receives. The adapter orchestrates tear-down of bucket access resources, but does not delete the `BucketRequest` nor `Bucket` instances. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels SUPER janky to me. Based on this line (and a later comment), it sounds like you are saying that when a pod terminates, you delete a Kubernetes resource that was separately created by the user. The equivalent would be to delete a PVC when a Deployment is deleted. Maybe that solves some nasty edge case and is warranted, but I think it warrants a pretty well articulated rationale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an error in the doc. You're right, the termination of the pod does not delete BA* resources.
- When the pod terminates on its own, we do not delete any BA* resources
- If BAR is deleted, then we wait for pod to terminate before clearing BA* resources
- If BA is deleted, then we forcefully pull the creds from underneath the pod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addresses my issues, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a bunch of open questions, but those are mostly around the details. I believe we've achieved consensus on the overall direction. Approving to unblock prototype. The expectation is that this KEP will be revised based on the prototype, and API review prior to an alpha implementation.
/lgtm
/approve
Adding a hold to let the authors decide when they want to merge.
/hold
spec: | ||
protocol: | ||
name: [3] | ||
version: [4] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To simplify the prototype and avoid the controversy of what binding logic will look like for now, I'm fine with removing version from prototype from all objects. For alpha/beta, we can discuss and reintroduce.
- "publicReadOnly": Read only, uncredentialed users can call ListBucket and GetObject. | ||
- "publicWriteOnly": Write only, uncredentialed users can only call PutObject. | ||
- "publicReadWrite": Read/Write, same as `ro` with the addition of PutObject being allowed. | ||
> Note: `anonymousAccessMode` is intended for workloads to consume the bucket out of band, i.e. COSI does not provide a automated mechanism for a k8s workload to consume the bucket anonymously |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If COSI does not provide a way to consume the bucket anonymously, why does this even need to be a first class field? Why can't it be an opaque BucketClass parameter?
csi: [3] | ||
driver: cosi.sigs.k8s.io [4] | ||
volumeAttributes: [5] | ||
bucketAccessRequestName: <my-bar-name> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 KEP should detail what files are surfaced where for which protocols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - great job @wlan0 @jeffvance @krishchow
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: guymguym, saad-ali, wlan0 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold |
- `azureBlob`: data required to target a provisioned azure container. | ||
- `gs`: data required to target a provisioned GS bucket. | ||
- `s3`: data required to target a provisioned S3 bucket. | ||
1. `parameters`: a copy of the BucketClass parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wlan0 @msau42 reminded me that PV/PVC actually handles this in a better way. Instead of copying all parameters over from the Class object (which may include lots of unnecessary information), the CSI CreateVolumeResponse
contains a set of volume_context
-- basically letting the driver decide what additional important info should be maintained about the volume. We should do something similar here. See https://github.com/container-storage-interface/spec/blob/master/csi.proto#L488
KEP for bucket provisioning API. Creating a new PR since #1383 hits the ratelimit too often
@jeffvance @brahmaroutu @rrati @krishchow @sajanjswl @xing-yang @saad-ali @guymguym @alarge