Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Tinkerbell provider #1830

Merged

Conversation

moadqassem
Copy link
Member

@moadqassem moadqassem commented Jul 25, 2024

What this PR does / why we need it:
In a previous PR here, we started a major refactoring for the tinkerbell provider adding new features and removing deprecated ones. This PR however, address a lot of changes for the original one as the current author is on a long vacation and we need this changes in ASAP. For more details on the functionality and what points this PR amends please take a look at the previous PR and its review.
Which issue(s) this PR fixes:

Fixes #

What type of PR is this?

Special notes for your reviewer:

Does this PR introduce a user-facing change? Then add your Release Note here:

Refactor the tinkerbell provider

Documentation:

None

@kubermatic-bot kubermatic-bot added do-not-merge/docs-needed Indicates that a PR should not merge because it's missing one of the documentation labels. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. dco-signoff: yes Denotes that all commits in the pull request have the valid DCO signoff message. sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management. labels Jul 25, 2024
@kubermatic-bot kubermatic-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 25, 2024
@moadqassem moadqassem requested a review from xrstf July 25, 2024 15:07
@kubermatic-bot kubermatic-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. docs/none Denotes a PR that doesn't need documentation (changes). and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/docs-needed Indicates that a PR should not merge because it's missing one of the documentation labels. labels Jul 25, 2024
@moadqassem moadqassem force-pushed the refactor-tinkerbell-provider branch from af54b25 to ec017c7 Compare July 25, 2024 15:19
func createGrowPartitionAction(destDisk string) Action {
return Action{
Name: "grow-partition",
Image: "quay.io/tinkerbell/actions/cexec:latest",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be pinned to c5bde803d9f6c90f1a9d5e06930d856d1481854c, which is the same as "latest" at the moment. Or is there a reason not to pin this image?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? There is always enhancements that er com into latest and all our tinkerbell actions are using latest,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You pinned every single other image in every single other action. Except this one. Surely there had to be a reason. :) When I look at the code, I might now have accidentally broke it by thinking "oh, someone forgot to pin this, let me just fix that". So a comment why this one action is not pinned to a version would be appreciated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you have overlooked something, while the tags are pinned for the all the images, due the the image repo and path are different 😉

quay.io/tinkerbell/actions/cexec:latest
quay.io/tinkerbell-actions/writefile:v1.0.0

Copy link
Member Author

@moadqassem moadqassem Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway if you really would like to have this in, I have no problem with that 😉. Will pin it down

ClusterName string
OSImageURL string
TinkClient ctrlruntimeclient.Client
//KubeClient ctrlruntimeclient.Client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a leftover?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yeah, it is from the original PR. I just wanted to know what's the idea of this field. But this was at the beginning of my implementation, but since I have refactored quite a lot I can simply take it out.

ClusterName: tinkSpec.ClusterName.Value,
TinkClient: tinkClient,
HardwareRef: tinkSpec.HardwareRef,
//KubeClient: k8sClient,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another leftover

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yeah, it is from the original PR. I just wanted to know what's the idea of this field. But this was at the beginning of my implementation, but since I have refactored quite a lot I can simply take it out.

}

if !allowProvision {
log.Infow("server is not allowed to be provisioned; either hardware allowPXE and allowWorkflow is set tp false", "hardware", hardware.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tp -> to


if !allowProvision {
log.Infow("server is not allowed to be provisioned; either hardware allowPXE and allowWorkflow is set tp false", "hardware", hardware.Name)
return nil, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning nil, nil here will make the baremetal provider return a baremetalServer struct with nil in it, which will then later explode because its GetName() functions are not capable of handling nils.

If a server cannot be provisioned, it should IMHO be an error that can bubble up and not hidden in the MC log where few people would ever look (especially not the user who doesn't have access to MC logs in KKP systems).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I agree this makes sense

result = append(result, e)
workflow := &tinkv1alpha1.Workflow{
ObjectMeta: metav1.ObjectMeta{
Name: workflowName + "-workflow",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please somehow make this single line re-usable. At the moment it's both an implementation detail, but also used by the Tinkerbell driver directly (i.e. it has knowledge it should not have).

Also, is it intended that "-workflow" is appended to the workflow name, or is the parameter for this function misnamed? In the driver, the server's name is used for this parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can but honestly I took over this PR only yesterday and I cannot fix everything in one shot 😅 I can address that in up coming PRs

if hw.Hardware.Network == nil {
return fmt.Errorf("tinkerbell hardware network configs can not be empty")
// Delete the associated Workflow.
workflowName := targetHardware.Name + "-workflow" // Assuming workflow names are derived from hardware names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this is the spot I mean regarding leaking implementation logic)

}

if hw.Hardware.Metadata == "" {
return fmt.Errorf("tinkerbell hardware metadata can not be empty")
// Reset the hardware ID in the tinkerbell cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ID and state, so it's more obvious that from this point on, the machine is also not Staged anymore.

if err := d.hardwareClient.Delete(ctx, string(uid)); err != nil {
if resourceNotFoundErr(err) {
return nil
func GetConfig(driverConfig tinktypes.TinkerbellPluginSpec, aa func(configVar providerconfigtypes.ConfigVarString, envVarName string) (string, error)) (*tinktypes.Config, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is aa?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea tbh! part of the previous implementation(The PR that I rebased my changes on top, the one mentioned in the description). But I can take a look

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok found out, will rename after its functionality

Comment on lines +179 to +187
if driverConfig.Auth.Kubeconfig.Value != "" {
val, err := base64.StdEncoding.DecodeString(driverConfig.Auth.Kubeconfig.Value)
if err != nil {
// An error here means that this is not a valid base64 string
// We can be more explicit here with the error for visibility. Webhook will return this error if we hit this scenario.
return nil, fmt.Errorf("failed to decode base64 encoded kubeconfig. Expected value is a base64 encoded Kubeconfig in JSON or YAML format: %w", err)
}
config.Kubeconfig = string(val)
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to support this? Inlining kubeconfigs has never been a pleasant thing in KKP and I thought it's better practice to reference Secrets instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda agree with you tbh, and it is less error prune. Ok I will adjust that to be read from a secret instead of Inlining it. But can I do it in a follow up PR instead? I kinda need this image now 😂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's a sideeffect from using providerconfigtypes.ConfigVarString for the field, but I would be totally okay with commenting that inline strings are not supported for this field.

func (h *Hardware) GetID() string {
if h.Spec.Metadata != nil &&
h.Spec.Metadata.Instance != nil &&
h.Spec.Metadata.Instance.ID != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&& h.Spec.Metadata.Instance.ID != "" is redundant and can be removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this redundant ? You can have a hardware object without an id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but if the ID is an empty string, you can just return it. Because the default would be to return an empty string anyway. You are adding a bit of unnecessary logic to this function. The function doesn't need to concern itself with how the ID field looks like, it should just return it if instance metadata exists, regardless of its value.

return string(h.Status.State)
}

return "Unknown"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make comparisons easier, couldn't this be one of the constants? Now this function would need to be understood as "returns a proper State type [no matter that it's currently just a string], or the random string 'Unknown'".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True can be addressed as well

)

const (
Staged string = "Staged"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A dedicated type HardwareState string would help identify what these constants are used for.

Copy link
Member Author

@moadqassem moadqassem Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I added that then I took out as I actually see no real value other than the hassle of keep casting the value when comparing it with the API field. Especially the API doesn't know this HardwareState or isn't considering as an Enum.

Signed-off-by: moadqassem <[email protected]>
Signed-off-by: moadqassem <[email protected]>
Signed-off-by: moadqassem <[email protected]>
@moadqassem moadqassem force-pushed the refactor-tinkerbell-provider branch from d36b200 to 4b80b0f Compare July 25, 2024 18:04
@xrstf
Copy link
Contributor

xrstf commented Jul 25, 2024

/approve
/lgtm

@kubermatic-bot kubermatic-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 25, 2024
@kubermatic-bot
Copy link
Contributor

LGTM label has been added.

Git tree hash: be437d850b7526466e76ce80553cc8ea6adcb773

@kubermatic-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xrstf

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubermatic-bot kubermatic-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 25, 2024
@kubermatic-bot kubermatic-bot merged commit d89c4fb into kubermatic:main Jul 25, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Denotes that all commits in the pull request have the valid DCO signoff message. docs/none Denotes a PR that doesn't need documentation (changes). lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants