-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Tinkerbell provider #1830
Refactor Tinkerbell provider #1830
Conversation
Signed-off-by: moadqassem <[email protected]>
Signed-off-by: moadqassem <[email protected]>
af54b25
to
ec017c7
Compare
pkg/cloudprovider/provider/baremetal/plugins/tinkerbell/client/hardware.go
Outdated
Show resolved
Hide resolved
pkg/cloudprovider/provider/baremetal/plugins/tinkerbell/client/template.go
Show resolved
Hide resolved
pkg/cloudprovider/provider/baremetal/plugins/tinkerbell/client/template.go
Outdated
Show resolved
Hide resolved
pkg/cloudprovider/provider/baremetal/plugins/tinkerbell/client/template.go
Outdated
Show resolved
Hide resolved
func createGrowPartitionAction(destDisk string) Action { | ||
return Action{ | ||
Name: "grow-partition", | ||
Image: "quay.io/tinkerbell/actions/cexec:latest", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be pinned to c5bde803d9f6c90f1a9d5e06930d856d1481854c, which is the same as "latest" at the moment. Or is there a reason not to pin this image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? There is always enhancements that er com into latest and all our tinkerbell actions are using latest,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You pinned every single other image in every single other action. Except this one. Surely there had to be a reason. :) When I look at the code, I might now have accidentally broke it by thinking "oh, someone forgot to pin this, let me just fix that". So a comment why this one action is not pinned to a version would be appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you have overlooked something, while the tags are pinned for the all the images, due the the image repo and path are different 😉
quay.io/tinkerbell/actions/cexec:latest
quay.io/tinkerbell-actions/writefile:v1.0.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway if you really would like to have this in, I have no problem with that 😉. Will pin it down
ClusterName string | ||
OSImageURL string | ||
TinkClient ctrlruntimeclient.Client | ||
//KubeClient ctrlruntimeclient.Client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a leftover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably yeah, it is from the original PR. I just wanted to know what's the idea of this field. But this was at the beginning of my implementation, but since I have refactored quite a lot I can simply take it out.
ClusterName: tinkSpec.ClusterName.Value, | ||
TinkClient: tinkClient, | ||
HardwareRef: tinkSpec.HardwareRef, | ||
//KubeClient: k8sClient, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another leftover
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably yeah, it is from the original PR. I just wanted to know what's the idea of this field. But this was at the beginning of my implementation, but since I have refactored quite a lot I can simply take it out.
} | ||
|
||
if !allowProvision { | ||
log.Infow("server is not allowed to be provisioned; either hardware allowPXE and allowWorkflow is set tp false", "hardware", hardware.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tp -> to
|
||
if !allowProvision { | ||
log.Infow("server is not allowed to be provisioned; either hardware allowPXE and allowWorkflow is set tp false", "hardware", hardware.Name) | ||
return nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning nil, nil here will make the baremetal provider return a baremetalServer struct with nil in it, which will then later explode because its GetName() functions are not capable of handling nils.
If a server cannot be provisioned, it should IMHO be an error that can bubble up and not hidden in the MC log where few people would ever look (especially not the user who doesn't have access to MC logs in KKP systems).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I agree this makes sense
result = append(result, e) | ||
workflow := &tinkv1alpha1.Workflow{ | ||
ObjectMeta: metav1.ObjectMeta{ | ||
Name: workflowName + "-workflow", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please somehow make this single line re-usable. At the moment it's both an implementation detail, but also used by the Tinkerbell driver directly (i.e. it has knowledge it should not have).
Also, is it intended that "-workflow" is appended to the workflow name, or is the parameter for this function misnamed? In the driver, the server's name is used for this parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can but honestly I took over this PR only yesterday and I cannot fix everything in one shot 😅 I can address that in up coming PRs
if hw.Hardware.Network == nil { | ||
return fmt.Errorf("tinkerbell hardware network configs can not be empty") | ||
// Delete the associated Workflow. | ||
workflowName := targetHardware.Name + "-workflow" // Assuming workflow names are derived from hardware names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this is the spot I mean regarding leaking implementation logic)
} | ||
|
||
if hw.Hardware.Metadata == "" { | ||
return fmt.Errorf("tinkerbell hardware metadata can not be empty") | ||
// Reset the hardware ID in the tinkerbell cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ID and state, so it's more obvious that from this point on, the machine is also not Staged anymore.
if err := d.hardwareClient.Delete(ctx, string(uid)); err != nil { | ||
if resourceNotFoundErr(err) { | ||
return nil | ||
func GetConfig(driverConfig tinktypes.TinkerbellPluginSpec, aa func(configVar providerconfigtypes.ConfigVarString, envVarName string) (string, error)) (*tinktypes.Config, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is aa
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea tbh! part of the previous implementation(The PR that I rebased my changes on top, the one mentioned in the description). But I can take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok found out, will rename after its functionality
if driverConfig.Auth.Kubeconfig.Value != "" { | ||
val, err := base64.StdEncoding.DecodeString(driverConfig.Auth.Kubeconfig.Value) | ||
if err != nil { | ||
// An error here means that this is not a valid base64 string | ||
// We can be more explicit here with the error for visibility. Webhook will return this error if we hit this scenario. | ||
return nil, fmt.Errorf("failed to decode base64 encoded kubeconfig. Expected value is a base64 encoded Kubeconfig in JSON or YAML format: %w", err) | ||
} | ||
config.Kubeconfig = string(val) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to support this? Inlining kubeconfigs has never been a pleasant thing in KKP and I thought it's better practice to reference Secrets instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda agree with you tbh, and it is less error prune. Ok I will adjust that to be read from a secret instead of Inlining it. But can I do it in a follow up PR instead? I kinda need this image now 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's a sideeffect from using providerconfigtypes.ConfigVarString for the field, but I would be totally okay with commenting that inline strings are not supported for this field.
func (h *Hardware) GetID() string { | ||
if h.Spec.Metadata != nil && | ||
h.Spec.Metadata.Instance != nil && | ||
h.Spec.Metadata.Instance.ID != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
&& h.Spec.Metadata.Instance.ID != ""
is redundant and can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this redundant ? You can have a hardware object without an id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but if the ID is an empty string, you can just return it. Because the default would be to return an empty string anyway. You are adding a bit of unnecessary logic to this function. The function doesn't need to concern itself with how the ID field looks like, it should just return it if instance metadata exists, regardless of its value.
return string(h.Status.State) | ||
} | ||
|
||
return "Unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make comparisons easier, couldn't this be one of the constants? Now this function would need to be understood as "returns a proper State type [no matter that it's currently just a string], or the random string 'Unknown'".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True can be addressed as well
) | ||
|
||
const ( | ||
Staged string = "Staged" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A dedicated type HardwareState string
would help identify what these constants are used for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I added that then I took out as I actually see no real value other than the hassle of keep casting the value when comparing it with the API field. Especially the API doesn't know this HardwareState
or isn't considering as an Enum.
Signed-off-by: moadqassem <[email protected]>
Signed-off-by: moadqassem <[email protected]>
Signed-off-by: moadqassem <[email protected]>
Signed-off-by: moadqassem <[email protected]>
d36b200
to
4b80b0f
Compare
/approve |
LGTM label has been added. Git tree hash: be437d850b7526466e76ce80553cc8ea6adcb773
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: xrstf The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
In a previous PR here, we started a major refactoring for the tinkerbell provider adding new features and removing deprecated ones. This PR however, address a lot of changes for the original one as the current author is on a long vacation and we need this changes in ASAP. For more details on the functionality and what points this PR amends please take a look at the previous PR and its review.
Which issue(s) this PR fixes:
Fixes #
What type of PR is this?
Special notes for your reviewer:
Does this PR introduce a user-facing change? Then add your Release Note here:
Documentation: