-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
populate-owners: Also slurp OWNERS_ALIASES #1285
populate-owners: Also slurp OWNERS_ALIASES #1285
Conversation
This is interesting work but I think I'd like to see a design doc and a little more planning first to make sure we've thought through the edges. |
Summarizing my initial comment:
Is that sufficient to motivate this approach? Or are you sold on the approach already, and just concerned about implementation details? |
A third approach would be to expand upstream aliases while copying over their |
95ea966
to
28d6c5c
Compare
Done with 95ea966 -> 3188c52. I've removed the WIP prefix from the PR subject. |
28d6c5c
to
3188c52
Compare
I think more detail is necessary to specify how we handle I'd also like to hear if it's possible to write this in Go and use the upstream types so we can load in the The original implementation was 90% functionality with 5% of the work -- as the effort gets more complex I'd like to see it adopt some more rigor and perhaps skip the "complex Python script" portion and jump straight into "manage upstream |
The implementation for that is here. Translating from Python, I try to leave upstream aliases alone. If and only if at least two upstream repositories use the same alias with different definitions, I namspace them by renaming to
I don't do anything to upstream
I think the complexity of unifying aliases rules out Bash. I looked into Go as a next choice, but Go doesn't have native shebang support, and this logic seemed too simple to warrant a full compiled command. If you have a non-Python scripting language you'd prefer, let me know. If you want a Go command that gets
I'm fine with this. But I'm also comfortable with the level of structure I have now in Python. So I'm happy to translate if you want, but I'm fine with this PR as it stands too. |
Right, I just meant that reading code is harder than reading a document of intent at the outset. I would like to see some human-readable documentation about what is being done to sync this (now that it is not just "move file A to location B") so that users can self-service debug without reading code.
The biggest thing in my opinion is who is to maintain the system. I personally see a lot of positives to having an ecosystem of Go utilities used for running the CI, with consistent coding practices, etc, between them. I assume you would like the DPTP team to own this sync once you have written it, so I think I would prefer the Go approach but I don't feel super strongly. I think the code as written is also approaching the level of complexity where it would be very valuable to have some unit tests and perhaps an integration test to validate functionality. |
I can add an English translation to docs somewhere in this repo. Do you have a preference for where those live? Maybe move the slurper under
I think the Python script will be fairly stable. And I'm fine with being in an I started in on a Go rewrite, but the upstream package contains no public structures for writing altered owners or alias content to disk. It seems unnecessary to outsource the reading if we need to keep local code for writing. Would you suggest I just stick to the Go stdlib, or would you rather have me work up some kind of writing API in a test-infra PR?
I'll add these regardless of the language we end up using. |
Reasonable thoughts -- let's keep in Python for now, add it under the |
3188c52
to
2777fc3
Compare
Heh, this is what I get for putting my head down for a few hours ;). I'd pretty much completed the Python -> Go port by the time I noticed your comment, so I've just gone ahead and finished that up with 3188c52 -> 2777fc3. There are also some unit tests, although once things involve GitHub-access mocking tests becomes more difficult. How much work do you think it's worth to get stable tests there? See the FIXME in For integration tests, I think it's "run the script, use Also, it looks like we already have aliases getting injected from openshift/cluster-api-provider-aws. |
|
||
// owners is copied from k8s.io/test-infra/prow/repoowners's Config | ||
type owners struct { | ||
Approvers []string `json:"approvers,omitempty" yaml:"approvers,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach has more structure than Python, but unfortunately the OWNERS
content doesn't seem to be particularly structured. I've based this structure on Config
, but there's also options
and filters
property declared in FullConfig
. And in this repo, there's a filters
consumer here. Some upstream OWNERS
also use features
and aliases
, and I haven't tracked down the origin of those. Are we worried about preserving those sorts of things during this copy? Should I declare new properties as we hit them, or should I just make owners
an interface{}
to ensure we preserve all upstream properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's ignore filters
features
and aliases
since I think they are fairly broken upstream anyway.
If no upstreams define aliases, then the utility removes `OWNER_ALIASES` from the root of this repository. | ||
|
||
The utility also iterates through the `ci-operator/jobs/{organization}/{repository}` directories, writing `OWNERS` to reflect the upstream configuration. | ||
If the upstream did not have an `OWNERS` file, the utility removes the associated `ci-operator/jobs/{organization}/{repository}/OWNERS`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overwrite is fine but remove seems wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overwrite is fine but remove seems wrong
Really? If upstream adds an OWNERS
, and we slurp it, and then upstream removes their OWNERS
, I don't think we want to leave the stale upstream content in this repo. Of course, it would be best if upstream didn't drop their OWNERS
, but that's up to them. And if upstream does drop their OWNERS
, falling back to whoever owns the parent directory seems sane to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's much more common that instead of an upstream dropping their OWNERS, a downstream dir here gets something added before upstream catches up. In any case, this should be rare. As long as it's obvious in review what happened I think it should be ok
|
||
// owners is copied from k8s.io/test-infra/prow/repoowners's Config | ||
type owners struct { | ||
Approvers []string `json:"approvers,omitempty" yaml:"approvers,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's ignore filters
features
and aliases
since I think they are fairly broken upstream anyway.
// owners is copied from k8s.io/test-infra/prow/repoowners's Config | ||
type owners struct { | ||
Approvers []string `json:"approvers,omitempty" yaml:"approvers,omitempty"` | ||
Reviewers []string `json:"reviewers,omitempty" yaml:"reviewers,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need the yaml
annotations here, if not possible with the YAML lib you imported, we could use the one that is prevalent in k8s -- github.com/ghodss/yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need the
yaml
annotations here...
I needed them to get omitempty
support. I'm fine switching YAML libraries if you prefer, but I like having my dependencies semantically versioned ;). And annotating for the two serialization syntaxes doesn't seem like a big enough deal to be worth preferring the wrapper package to the base implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/shrug
The ghodss library works great for k8s, if you version with a vendored dep instead of the import path it's fine. Minor nit either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ghodss library works great for k8s...
The ghodss library wraps gopkg.in/yaml.v2 (link in my earlier comment). Adding yaml
annotations here seems like a cheaper approach than adding however many additional lines of Go ghodss would bring in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. Like I said, it's a nit. fine as-is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having both json
and yaml
tags here mean there is a chance they might differ. k8s has had convention that yaml and json fieldnames should always be same.
I would suggest we keep json
only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having both json and yaml tags here mean there is a chance they might differ.
Which folks could catch in review. But:
- These structs should be pretty stable. I don't foresee a lot of churn which could bring in those differences.
- These structs aren't public. So even if differences leak in, it shouldn't bother anyone.
"testing" | ||
) | ||
|
||
func assertEqual(t *testing.T, actual, expected interface{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you reflect.DeepEqual
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you
reflect.DeepEqual
here?
Because I wanted to print more useful error messages than a slice of pointer offsets ;). I could use github.com/stretchr/testify/assert
, but this didn't seem to be worth reaching outside the stdlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k8s apimachinery has a nice set of diff functions you can use as well
I think in general let's just use reflect
and not maintain extra code here. We can revisit if dev iteration is annoying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in general let's just use
reflect
and not maintain extra code here. We can revisit if dev iteration is annoying
Done with f28cfdf -> eb35a31.
tools/populate-owners/main_test.go
Outdated
assertEqual(t, orgRepos, expected) | ||
} | ||
|
||
func TestGetOwners(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fake this out so it's stable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fake this out so it's stable?
Done with eb35a31 -> 8bcba67, which splits the post-clone processing into a tested extractOwners
method.
tools/populate-owners/main_test.go
Outdated
error *regexp.Regexp | ||
}{ | ||
{ | ||
name: "no collisions", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the name a little more descriptive:
no alias name collisions, so no namespaced aliases created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make the name a little more descriptive...
Done (for this entry and other names in this test) with 2777fc3 -> a73bc68.
tools/populate-owners/main_test.go
Outdated
}{ | ||
{ | ||
name: "no collisions", | ||
input: []*orgRepo{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can collapse some of this indentation to make it a bit more readable:
input: []*orgRepo{{
Organization: "a",
Repository: "b",
Aliases: &aliases{Aliases: map[string][]string{
"ab": {
"alice",
"bob",
},
}},
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can collapse some of this indentation...
Done with 2777fc3 -> a73bc68.
} | ||
|
||
assertEqual(t, collected, test.collected) | ||
if test.namespaced != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait. is namespaceAliases
mutating the input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is
namespaceAliases
mutating the input?
Yes, because if we need to namespace an alias, we want to use the new, namespaced form when writing the upstream OWNERS
into this repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either let's add godoc to explain the data flow or let's not mutate the input and just have clear input --> output flow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either let's add godoc to explain the data flow...
Added with a73bc68 -> f28cfdf.
f28cfdf
to
eb35a31
Compare
eb35a31
to
4c2b655
Compare
4c2b655
to
8bcba67
Compare
Ok, I think I've addressed everything that's come up in review so far, excepting things @stevekuznetsov has commented on but said are fine as they stand. Let me know if I missed something and still need to make changes. And thanks for all the review :). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM sans one comment
@@ -1,42 +0,0 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last comment -- can you make this script run your tool with the correct inputs and outputs? This keeps a consistent API for repo admins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... can you make this script run your tool with the correct inputs and outputs?
It doesn't really have inputs and outputs, you just run it with your $PWD
somewhere inside the repository (the path comes in here which gets fed to getRepoRoot
). But I've added a wrapper with 8bcba67 -> 9fbc825, which also:
-
Updates
OWNERS
andOWNERS_ALIASES
with a fresh run. -
Adds
os.O_TRUNC
to avoid occasional corruption like:diff --git a/ci-operator/jobs/openshift/azure-misc/OWNERS b/ci-operator/jobs/openshift/azure-misc/OWNERS index 132c684..bf0dec5 100644 --- a/ci-operator/jobs/openshift/azure-misc/OWNERS +++ b/ci-operator/jobs/openshift/azure-misc/OWNERS @@ -3,10 +3,11 @@ # See the OWNERS docs: https://git.k8s.io/community/contributors/guide/owners.md approvers: +- pweil- - jim-minter - kargakis - kwoodson -- mjudeikis +is - pweil- reviewers: - jim-minter
although the exact source of that corruption is not clear to me.
Some slurped repositories use OWNERS_ALIASES [1], so we need to pull these in to resolve entries in their OWNERS. OWNERS_ALIASES can only exist in the repo root [2]. That means we may need to namespace the upstream aliases to avoid collisions when assembing the composite OWNERS_ALIASES here. Defining aliases in upstream repos will be useful for situations where a particular repo team (e.g. the installer team) should have control over other directories within this repositories. For example, $ cat cluster/test-deploy/aws/OWNERS approvers: - installer-approvers reviewers: - installer-reviewers is simpler and more maintainable with an auto-maintained OWNERS_ALIASES than the copy/paste we used in 49f60b7 (cluster/test-deploy/aws/OWNERS: Delegate to openshift/installer, 2018-08-27, openshift#1290). With script control over OWNERS_ALIASES, *this* repo can't define its own aliases. But that doesn't seem like a large limitation, because this repo is unlikely to need aliases that are not defined in *any* of the upstream repositories. And if we need to grow a location for local alias configuration, we can always add that in the future. I'm using 'git clone...' over SSH to access the files, because that provides access to private repositories. Using: https://raw.githubusercontent.com/{organization}/{repository}/HEAD/{path} only works if you have an auth token or the repository is public. Using clone for two files is a bit heavy. Ideally we could pull just the two files we're interested. Git's archive command supports that use-case, but GitHub doesn't seem to have enabled the server side of that transaction: $ git archive --format=tar --remote ssh://[email protected]/openshift/installer.git HEAD OWNERS OWNERS_ALIASES Invalid command: 'git-upload-archive '/openshift/installer.git'' You appear to be using ssh to clone a git:// URL. Make sure your core.gitProxy config option and the GIT_PROXY_COMMAND environment variable are NOT set. fatal: The remote end hung up unexpectedly $ git --version git version 1.8.3.1 Until they enable that, I'm fine consuming more of their bandwidth than is strictly necessary ;). Using reflect in assertEqual gives us a nice, compact implementation. But the output can be a bit hard to read. With an introduced error: $ go test . ... --- FAIL: TestOrgRepos (0.00s) main_test.go:15: unexpected result: [0xc42006e1e0 0xc42006e240] != [0xc42006e2a0] --- FAIL: TestGetOwners (2.30s) main_test.go:15: unexpected result: &{Directories:[] Organization:openshift Repository:installer Owners:0xc42006e420 Aliases:<nil> Commit:aaa47f6a54f0fa0449d36de01ccb9f56729b608a} != &{Directories:[] Organiza ... I'd mocked up an assertEqual based on comparing json.MarshalIndent strings which produced: --- FAIL: TestOrgRepos (0.00s) main_test.go:30: unexpected result: [ { "directories": [ "/tmp/populate-owners-389376240/a/b" ], "organization": "a", "repository": "b" }, { "directories": [ "/tmp/populate-owners-389376240/c/d" ], "organization": "c", "repository": "d" } ] != [ { "directories": [ "/tmp/populate-owners-389376240/a/b" ], "organization": "a", "repository": "b" } ] --- FAIL: TestGetOwners (2.38s) main_test.go:30: unexpected result: { "organization": "openshift", "repository": "installer", "owners": { "approvers": [ "aaronlevy", "abhinavdahiya", "crawford", "smarterclayton", "wking", "yifan-gu" ], "reviewers": [ "vikramsk" ] }, "commit": "aaa47f6a54f0fa0449d36de01ccb9f56729b608a" } != { "organization": "openshift", "repository": "installer", "owners": { "approvers": [ "aaronlevy", "abhinavdahiya", "crawford", "smarterclayton", "wking", "yifan-gu" ], "reviewers": [ "vikramsk" ] }, "commit": "2587b3ed493c18747f2b37e1ab6daebdd277631a" } and of course, there are a number of third-party libraries that do an even better job (e.g. [3]). But I don't think this functionality is worth an external dependency, and Steve doesn't think it's worth maintaining much local code for comparison [4], so we're going with the more compact reflect approach despite the less-useful output. And the tests should all be passing anyway, right? ;) The GIT_*_DATE tick business is based on [5]. I haven't bothered running a new commit after each test.setup change, because we don't need a different commit hash. In production, extractOwners will be run on a fresh clone, so we will always have a different commit hash if the content changes. The stub populate-owners.sh wraps the 'go run' at Steve's request [6]. We no longer need anything Bash-specific, so I'm using a POSIX shebang in the wrapper. The 'exec' allows us to replace the shell command with Go without executing a new process [7] (although Go itself will proceed to launch a few processes). The O_TRUNC is an attempt to avoid occasional corruption like: diff --git a/ci-operator/jobs/openshift/azure-misc/OWNERS b/ci-operator/jobs/openshift/azure-misc/OWNERS index 132c684..bf0dec5 100644 --- a/ci-operator/jobs/openshift/azure-misc/OWNERS +++ b/ci-operator/jobs/openshift/azure-misc/OWNERS @@ -3,10 +3,11 @@ # See the OWNERS docs: https://git.k8s.io/community/contributors/guide/owners.md approvers: +- pweil- - jim-minter - kargakis - kwoodson -- mjudeikis +is - pweil- reviewers: - jim-minter although the exact source of that corruption is not clear to me. [1]: https://github.com/openshift/cluster-api-provider-aws/blob/44e974de04f496f9552ecd37b73cad01b6d69f4d/OWNERS_ALIASES [2]: https://github.com/kubernetes/community/blame/0741bdfbb56dcd4829754560f79a2f3da32cb34f/contributors/guide/owners.md#L54 [3]: https://godoc.org/github.com/stretchr/testify/assert#Equal [4]: openshift#1285 (comment) [5]: https://github.com/git/git/blob/v2.18.0/t/test-lib-functions.sh#L128-L138 [6]: openshift#1285 (comment) [7]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#exec
8bcba67
to
9fbc825
Compare
9fbc825
to
940f45a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: stevekuznetsov, wking The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…OWNERS The old shell script used to do this, but I'd accidentally dropped the functionality in e1f993f (populate-owners: Also slurp OWNERS_ALIASES, 2018-08-25, openshift#1285).
This will make it easier to add special-case OWNERS files to openshift/release while still getting the members slurped up with the populate-owners tooling. That tooling pulls OWNERS_ALIASES since openshift/release@e1f993fb (populate-owners: Also slurp OWNERS_ALIASES, 2018-08-25, openshift/release#1285).
The previous content was copy/pasted into this repo with 49f60b7 (cluster/test-deploy/aws/OWNERS: Delegate to openshift/installer, 2018-08-27, openshift#1290). But with upstream OWNERS_ALIASES being slurped since e1f993f (populate-owners: Also slurp OWNERS_ALIASES, 2018-08-25, openshift#1285) and the installer repo defining aliases since openshift/installer@62f87acb (OWNERS_ALIASES: Define aliases for reviewers and approvers, 2018-08-24, openshift/installer#184), we can DRY this up by using the upstream aliases.
As discussed in #1601, pulling all projects serially is slow, because even with shallow clone you're pulling all the files in the HEAD tree. As discussed in e1f993f (populate-owners: Also slurp OWNERS_ALIASES, 2018-08-25, openshift#1285), GitHub does not enable git-upload-archive, so we can't use the Git protocol to select specific files. One way to speed things up would be to allow fetching subsets of projects, but that risks corrupting OWNERS_ALIASES. If the utility only fetched some repositories, it would have an incomplete picture of aliases. For example, say the repository contained ci-operator/jobs/a/b and ci-operator/jobs/c/d, both of which defined aliases. A full run would result in aliases for both a/b and c/d in OWNERS_ALIASES. A subsequent run with only a/b would update any a/b aliases, but all other aliases, including those which had previously been injected by c/d, would be removed. You could work around that by caching information about which repositories were involved in OWNERS_ALIASES, but it would be a bit complicated. This commit, on the other hand, takes advantage of the fact that most of our repositories are public. It optimistically attempts an efficient HTTP pull, and only falls back to the shallow clone if the efficient pull fails. That's slightly more work for private repositories, where the HTTP attempt is wasted effort, but it's much more efficient for public repos. Another potential issue is consuming your API quota of 60 unauthenticated requests per hour [1]. But if you hit that limit, we just fall back to Git, so it's not the end of the world. Another alterantive would be to pull in parallel (as we used to before e1f993f), but that still has the bandwidth cost of shifting lots of files that we don't care about for this use case. You could save some bandwidth by caching blobs locally, but then you'd have the complication of cache management. The "got owners ..." message is because the HTTP pulls don't produce any terminal output. The Git pulls write status updates to stderr, but now that it might be a while between Git pulls, it's good to let the caller know something is happening ;). Docs for the commit endpoint I'm hitting are in [2]. [1]: https://developer.github.com/v3/#rate-limiting [2]: https://developer.github.com/v3/repos/commits/#get-the-sha-1-of-a-commit-reference
If slurped repositories use
OWNERS_ALIASES
, we'll need to pull these in to resolve entries in theirOWNERS
.OWNERS_ALIASES
can only exist in the repo root. That means we may need to namespace the upstream aliases to avoid collisions when assembing the compositeOWNERS_ALIASES
here.Defining aliases in upstream repos will be useful for situations where a particular repo team (e.g. the installer team) should have control over other directories within this repositories (e.g.
cluster/test-deploy/aws
). Having:is simpler and more maintainable with auto-maintained
OWNERS_ALIASES
than it would be if we manually copy/pasted lists of GitHub usernames.With script control over
OWNERS_ALIASES
, this repo can't define its own aliases. But that doesn't seem like a large limitation, because this repo is unlikely to need aliases that are not defined in any of the upstream repositories.This is still a WIP, because the raw.githubusercontent.com doesn't work out of the box with private repositories. I'll probably return those to using Git over SSH.
CC @stevekuznetsov.