Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ensure-organization.sh followup #1737

Merged
merged 10 commits into from
Mar 11, 2021

Conversation

spiffxp
Copy link
Member

@spiffxp spiffxp commented Mar 1, 2021

Followup to #1726, intended to fix #1659

Basically:

  • drop StorageBucketLister
  • replace per-project ServiceAccountLister with org-level iam.serviceAccountLister (though still not actually sure why this role is necessary)
  • add functions to lib_iam to allow removal of stale bindings
  • lib_iam will only support creating custom roles from file
  • try diffing results of binding changes (adding yq to the mix to do this)
  • remove the stale e2e project binding removal calls introduced in audit-followup: remove stale iam bindings for e2e projects #1722

Order is going to matter for rolling this out:

  • ensure-organization.sh to create the org-level roles
  • terraform apply in prow-build / prow-build-trusted
  • check to see if terraform removed the project-level role
  • ensure-main-project.sh
  • followup PR to remove ensure_removed calls

There is one more custom role I am interested in creating, for the "cluster-admins" group, but I can do that in followup.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. wg/k8s-infra labels Mar 1, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: spiffxp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from dims and nikhita March 1, 2021 09:23
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 1, 2021
@spiffxp spiffxp force-pushed the ensure-organization-followup branch 2 times, most recently from b58b175 to 018013a Compare March 2, 2021 02:31
infra/gcp/lib_iam.sh Outdated Show resolved Hide resolved
infra/gcp/lib_iam.sh Show resolved Hide resolved
infra/gcp/lib_util.sh Show resolved Hide resolved
infra/gcp/roles/iam.serviceAccountLister.yaml Outdated Show resolved Hide resolved
infra/gcp/roles/specs/iam.serviceAccountLister.yaml Outdated Show resolved Hide resolved
# not actually sure why this account is necessary but here we are
title: Secret Account Lister
description: Can list ServiceAccounts
name: iam.serviceAccountLister
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is is identical to the Role defined above in infra/gcp/roles/iam.serviceAccountLister.yaml ?? Why do we have both?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I agree, it's silly in this case to have two files that look identical

but the idea is to consistently follow the pattern of:

  • custom roles are defined via a (custom) spec
  • specs generate concrete roles
  • custom role definitions are checked in to git (so we know when custom roles get new permissions)
  • then those get applied

this lets us apply common defaults, and moves us toward a model of "write a spec, have something else apply it"

we can skip the "check in concrete roles" step and go straight to applying, but at the moment I find it useful to have them checked in for diffing when regeneration picks up new p

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the non-spec is the readback? I'm looking at the script and trying to figure out what it's doing. It mixes the spec and the result of gcloud iam roles describe. It's not clear to me when that script is run or who runs it or when.

Copy link
Member Author

@spiffxp spiffxp Mar 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the non-spec is the readback?

Yes. FWIW, these files can't be directly applied via gcloud, the name field needs munging depending on whether they're destined for project or organization. I felt like it was better not to mix that concern in here.

The script is currently run by humans (independent of the other ensure- scripts). Now that @hh has proven pr-creator to be reusable out of test-infra, the last followup item I have for #1659 is to setup prowjobs that:

  • run this on postsubmit and periodically
  • open PRs if something changed
  • a human would review the PR
  • a human would manually deploy the role change once the PR merged
  • they would run ensure-organization.sh as none of these are created at the project level

The reason for mixing the spec into comments of the generated role was to be 100% clear about what spec the role was generated from (vs. "this file", "this time", "this repo", "this hash" etc... but perhaps those would be a better / more concise approach to aim for)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I am missing is why we check the readback in, and how that is different from the audit.

IOW, why is it more than

  • Here's a spec
  • Here's a script to apply the spec
  • Here's a periodic script to audit the results
  • Here's the last known-good audit state

If the periodic audit shows new perms, alert. I'm missing the role of the intermediate?

I trust you so I am totally willing to OK this and just grow my comprehension async to this PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be right, this could be spurious and an unnecessary extra step.

I sorta want to keep this in as an experiment and remove later if we feel it's too annoying. Reasons why:

  • this might make it easier to vet audit output "here's what I absolutely expect it to be", and regenerating from primitive roles at time of audit might pick up new permissions I didn't know about last time I said "sure, refresh the role"
  • it may be easier for contributors to find out "ok but what exact permissions does this role have" even if they don't have iam permissions or gcloud handy

I guess I sort of think of this as the equivalent to a terraform state file. What was the state of the world the last time I managed it. Compare that to what the state of the world is now (did something else change things?) + the changes I intend to make to the state of the world.

spiffxp added 6 commits March 2, 2021 20:16
add more functions, use _ensure "private" helpers, try diffing results

-ensure_custom_iam_role_from_file
+ensure_custom_org_iam_role_from_file
+ensure_custom_project_iam_role_from_file
+ensure_removed_custom_org_role
+ensure_removed_custom_project_role
+custom_project_role_name
custom_org_role_name
@spiffxp spiffxp force-pushed the ensure-organization-followup branch from 018013a to 310431d Compare March 3, 2021 15:54
@k8s-ci-robot k8s-ci-robot added area/prow Setting up or working with prow in general, prow.k8s.io, prow build clusters sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 3, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Mar 3, 2021

replace per-project ServiceAccountLister with org-level iam.serviceAccountLister (though still not actually sure why this role is necessary)

btw @thockin do you have the context for why we needed to make this role in the first place? I haven't plumbed the depths of aaa's creation to figure that out yet

@thockin
Copy link
Member

thockin commented Mar 3, 2021 via email

@spiffxp
Copy link
Member Author

spiffxp commented Mar 3, 2021

If that's the case, maybe we can drop it entirely (should be subsumed by custom audit.viewer role now)

@spiffxp
Copy link
Member Author

spiffxp commented Mar 3, 2021

If that's the case, maybe we can drop it entirely (should be subsumed by custom audit.viewer role now)

I don't have an account that's unprivileged enough to be affected by this so I don't want to try it as part of this PR, opened #1753

spiffxp added 2 commits March 3, 2021 16:40
order is going to matter for rolling this out:
- ensure-organization.sh to create the org-level roles
- terraform apply in prow-build / prow-build-trusted
- check to see if terraform removed the project-level role
- ensure-main-project.sh
- followup PR to remove ensure_removed calls
force all custom roles to be created from file
@spiffxp spiffxp force-pushed the ensure-organization-followup branch from 07eccc2 to 3aab780 Compare March 3, 2021 23:03
@spiffxp spiffxp changed the title [wip] ensure-organization.sh followup ensure-organization.sh followup Mar 3, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 3, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Mar 4, 2021

/hold
ok, wait, the latest audit pr does not line up with what I expect the state of the world to be

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 4, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Mar 9, 2021

OK, looking for LGTM. I'm no longer concerned about what the audit PRs are showing, I feel comfortable enough to deploy this, but I'm not going to remove /hold until I have time. Or someone else able to deploy is welcome to.

@dims
Copy link
Member

dims commented Mar 9, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 9, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Mar 11, 2021

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 11, 2021
@k8s-ci-robot k8s-ci-robot merged commit 4295316 into kubernetes:main Mar 11, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 11, 2021
@spiffxp spiffxp deleted the ensure-organization-followup branch March 13, 2021 17:28
@spiffxp
Copy link
Member Author

spiffxp commented Jun 11, 2021

Related to refactoring infra/gcp, ref: #516

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/prow Setting up or working with prow in general, prow.k8s.io, prow build clusters cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

audit followup: organization resources should be managed by script
5 participants