Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

populate-owners: Also slurp OWNERS_ALIASES #1285

Merged

Conversation

wking
Copy link
Member

@wking wking commented Aug 25, 2018

If slurped repositories use OWNERS_ALIASES, we'll need to pull these in to resolve entries in their OWNERS. OWNERS_ALIASES can only exist in the repo root. That means we may need to namespace the upstream aliases to avoid collisions when assembing the composite OWNERS_ALIASES here.

Defining aliases in upstream repos will be useful for situations where a particular repo team (e.g. the installer team) should have control over other directories within this repositories (e.g. cluster/test-deploy/aws). Having:

$ cat cluster/test-deploy/aws/OWNERS
approvers:
  - installer-approvers
reviewers:
  - installer-reviewers

is simpler and more maintainable with auto-maintained OWNERS_ALIASES than it would be if we manually copy/pasted lists of GitHub usernames.

With script control over OWNERS_ALIASES, this repo can't define its own aliases. But that doesn't seem like a large limitation, because this repo is unlikely to need aliases that are not defined in any of the upstream repositories.

This is still a WIP, because the raw.githubusercontent.com doesn't work out of the box with private repositories. I'll probably return those to using Git over SSH.

CC @stevekuznetsov.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 25, 2018
@stevekuznetsov
Copy link
Contributor

This is interesting work but I think I'd like to see a design doc and a little more planning first to make sure we've thought through the edges.

@wking
Copy link
Member Author

wking commented Aug 27, 2018

... I think I'd like to see a design doc and a little more planning first to make sure we've thought through the edges.

Summarizing my initial comment:

  • Current master:
    • We pull upstream OWNERS
    • We do not pull upstream OWNERS_ALIASES
  • Problems with that:
    • An upstream repo that uses aliases in OWNERS will have those aliases interpreted as bare usernames in this repo.
    • Thre's no DRY way to delegate additional directories (besides ci-operator/jobs/{org}/{repo}) to groups maintained upstream.
  • Potential fixes:
    • Pull upstream OWNERS_ALIASES (this PR).
    • Forbid upstream repos from using OWNERS_ALIASES, but this restricts their ability to stay DRY with internal subgroup delegation.

Is that sufficient to motivate this approach? Or are you sold on the approach already, and just concerned about implementation details?

@wking
Copy link
Member Author

wking commented Aug 27, 2018

A third approach would be to expand upstream aliases while copying over their OWNERS. That would address the "aliases interpreted as usernames" problem, but not the "DRY additional directory delegation" issue.

@wking wking changed the title wip: populate-owners: Also slurp OWNERS_ALIASES populate-owners: Also slurp OWNERS_ALIASES Aug 27, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 27, 2018
@wking wking force-pushed the slurp-owners-aliases-too branch from 95ea966 to 28d6c5c Compare August 27, 2018 17:45
@wking
Copy link
Member Author

wking commented Aug 27, 2018

This is still a WIP, because the raw.githubusercontent.com doesn't work out of the box with private repositories. I'll probably return those to using Git over SSH.

Done with 95ea966 -> 3188c52. I've removed the WIP prefix from the PR subject.

@stevekuznetsov
Copy link
Contributor

I think more detail is necessary to specify how we handle OWNERS_ALIASES name collisions, filter lists in upstream OWNERS files, and perhaps some others.

I'd also like to hear if it's possible to write this in Go and use the upstream types so we can load in the OWNERS from upstream and manage the data in a more structured way if possible. Or, alternatively, why that approach would not be viable.

The original implementation was 90% functionality with 5% of the work -- as the effort gets more complex I'd like to see it adopt some more rigor and perhaps skip the "complex Python script" portion and jump straight into "manage upstream OWNERS using native types" portion. I agree disallowing OWNERS_ALIASES from upstream is not valid.

@wking
Copy link
Member Author

wking commented Aug 27, 2018

I think more detail is necessary to specify how we handle OWNERS_ALIASES name collisions...

The implementation for that is here. Translating from Python, I try to leave upstream aliases alone. If and only if at least two upstream repositories use the same alias with different definitions, I namspace them by renaming to {upstream-org}-{upstream-repo}-{upstream-alias}. If that rename collides with an existing alias, I abort with an error (but I think that collisions there would be unlikely).

... filter lists in upstream OWNERS files...

I don't do anything to upstream OWNERS files other than potentially namespacing aliases as discussed above. They are washed through Python, so things like key order, indents, and # comments may change.

I'd also like to hear if it's possible to write this in Go...

I think the complexity of unifying aliases rules out Bash. I looked into Go as a next choice, but Go doesn't have native shebang support, and this logic seemed too simple to warrant a full compiled command. If you have a non-Python scripting language you'd prefer, let me know. If you want a Go command that gets go run ..., I can do that to.

... and use the upstream types so we can load in the OWNERS from upstream and manage the data in a more structured way if possible.

I'm fine with this. But I'm also comfortable with the level of structure I have now in Python. So I'm happy to translate if you want, but I'm fine with this PR as it stands too.

@stevekuznetsov
Copy link
Contributor

The implementation for that is here. Translating from Python ....

Right, I just meant that reading code is harder than reading a document of intent at the outset. I would like to see some human-readable documentation about what is being done to sync this (now that it is not just "move file A to location B") so that users can self-service debug without reading code.

But I'm also comfortable with the level of structure I have now in Python. So I'm happy to translate if you want, but I'm fine with this PR as it stands too.

The biggest thing in my opinion is who is to maintain the system. I personally see a lot of positives to having an ecosystem of Go utilities used for running the CI, with consistent coding practices, etc, between them. I assume you would like the DPTP team to own this sync once you have written it, so I think I would prefer the Go approach but I don't feel super strongly.

I think the code as written is also approaching the level of complexity where it would be very valuable to have some unit tests and perhaps an integration test to validate functionality.

@wking
Copy link
Member Author

wking commented Aug 28, 2018

The implementation for that is here. Translating from Python ....

Right, I just meant that reading code is harder than reading a document of intent at the outset.

I can add an English translation to docs somewhere in this repo. Do you have a preference for where those live? Maybe move the slurper under tools/populate-owners and add a README there?

I personally see a lot of positives to having an ecosystem of Go utilities used for running the CI, with consistent coding practices, etc, between them. I assume you would like the DPTP team to own this sync once you have written it, so I think I would prefer the Go approach but I don't feel super strongly.

I think the Python script will be fairly stable. And I'm fine with being in an OWNERS file covering the slurper, whichever language it ends up being written in. Still, it's good to have more than one owner, so I'm fine iterating until we hit on something we both want to maintain ;).

I started in on a Go rewrite, but the upstream package contains no public structures for writing altered owners or alias content to disk. It seems unnecessary to outsource the reading if we need to keep local code for writing. Would you suggest I just stick to the Go stdlib, or would you rather have me work up some kind of writing API in a test-infra PR?

I think the code as written is also approaching the level of complexity where it would be very valuable to have some unit tests and perhaps an integration test to validate functionality.

I'll add these regardless of the language we end up using.

@stevekuznetsov
Copy link
Contributor

Reasonable thoughts -- let's keep in Python for now, add it under the tools/ directory like you suggest and add you and I as OWNERS. A README there would be fine, too.

@wking wking force-pushed the slurp-owners-aliases-too branch from 3188c52 to 2777fc3 Compare August 28, 2018 22:11
@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 28, 2018
@wking
Copy link
Member Author

wking commented Aug 28, 2018

...let's keep in Python for now...

Heh, this is what I get for putting my head down for a few hours ;). I'd pretty much completed the Python -> Go port by the time I noticed your comment, so I've just gone ahead and finished that up with 3188c52 -> 2777fc3.

There are also some unit tests, although once things involve GitHub-access mocking tests becomes more difficult. How much work do you think it's worth to get stable tests there? See the FIXME in TestGetOwners for an example, and let me know if you want me to drop that test, work up a more stable mock, or what.

For integration tests, I think it's "run the script, use git diff to see if the changes make sense, and git reset --hard HEAD if you don't like them". That's what I've been using in development anyway ;).

Also, it looks like we already have aliases getting injected from openshift/cluster-api-provider-aws.


// owners is copied from k8s.io/test-infra/prow/repoowners's Config
type owners struct {
Approvers []string `json:"approvers,omitempty" yaml:"approvers,omitempty"`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach has more structure than Python, but unfortunately the OWNERS content doesn't seem to be particularly structured. I've based this structure on Config, but there's also options and filters property declared in FullConfig. And in this repo, there's a filters consumer here. Some upstream OWNERS also use features and aliases, and I haven't tracked down the origin of those. Are we worried about preserving those sorts of things during this copy? Should I declare new properties as we hit them, or should I just make owners an interface{} to ensure we preserve all upstream properties?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's ignore filters features and aliases since I think they are fairly broken upstream anyway.

If no upstreams define aliases, then the utility removes `OWNER_ALIASES` from the root of this repository.

The utility also iterates through the `ci-operator/jobs/{organization}/{repository}` directories, writing `OWNERS` to reflect the upstream configuration.
If the upstream did not have an `OWNERS` file, the utility removes the associated `ci-operator/jobs/{organization}/{repository}/OWNERS`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overwrite is fine but remove seems wrong

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overwrite is fine but remove seems wrong

Really? If upstream adds an OWNERS, and we slurp it, and then upstream removes their OWNERS, I don't think we want to leave the stale upstream content in this repo. Of course, it would be best if upstream didn't drop their OWNERS, but that's up to them. And if upstream does drop their OWNERS, falling back to whoever owns the parent directory seems sane to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's much more common that instead of an upstream dropping their OWNERS, a downstream dir here gets something added before upstream catches up. In any case, this should be rare. As long as it's obvious in review what happened I think it should be ok


// owners is copied from k8s.io/test-infra/prow/repoowners's Config
type owners struct {
Approvers []string `json:"approvers,omitempty" yaml:"approvers,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's ignore filters features and aliases since I think they are fairly broken upstream anyway.

// owners is copied from k8s.io/test-infra/prow/repoowners's Config
type owners struct {
Approvers []string `json:"approvers,omitempty" yaml:"approvers,omitempty"`
Reviewers []string `json:"reviewers,omitempty" yaml:"reviewers,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need the yaml annotations here, if not possible with the YAML lib you imported, we could use the one that is prevalent in k8s -- github.com/ghodss/yaml

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need the yaml annotations here...

I needed them to get omitempty support. I'm fine switching YAML libraries if you prefer, but I like having my dependencies semantically versioned ;). And annotating for the two serialization syntaxes doesn't seem like a big enough deal to be worth preferring the wrapper package to the base implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/shrug

The ghodss library works great for k8s, if you version with a vendored dep instead of the import path it's fine. Minor nit either way

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ghodss library works great for k8s...

The ghodss library wraps gopkg.in/yaml.v2 (link in my earlier comment). Adding yaml annotations here seems like a cheaper approach than adding however many additional lines of Go ghodss would bring in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. Like I said, it's a nit. fine as-is

Copy link
Contributor

@abhinavdahiya abhinavdahiya Aug 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having both json and yaml tags here mean there is a chance they might differ. k8s has had convention that yaml and json fieldnames should always be same.

I would suggest we keep json only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having both json and yaml tags here mean there is a chance they might differ.

Which folks could catch in review. But:

  • These structs should be pretty stable. I don't foresee a lot of churn which could bring in those differences.
  • These structs aren't public. So even if differences leak in, it shouldn't bother anyone.

"testing"
)

func assertEqual(t *testing.T, actual, expected interface{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you reflect.DeepEqual here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you reflect.DeepEqual here?

Because I wanted to print more useful error messages than a slice of pointer offsets ;). I could use github.com/stretchr/testify/assert, but this didn't seem to be worth reaching outside the stdlib.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k8s apimachinery has a nice set of diff functions you can use as well
I think in general let's just use reflect and not maintain extra code here. We can revisit if dev iteration is annoying

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general let's just use reflect and not maintain extra code here. We can revisit if dev iteration is annoying

Done with f28cfdf -> eb35a31.

assertEqual(t, orgRepos, expected)
}

func TestGetOwners(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fake this out so it's stable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fake this out so it's stable?

Done with eb35a31 -> 8bcba67, which splits the post-clone processing into a tested extractOwners method.

error *regexp.Regexp
}{
{
name: "no collisions",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the name a little more descriptive:

no alias name collisions, so no namespaced aliases created

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the name a little more descriptive...

Done (for this entry and other names in this test) with 2777fc3 -> a73bc68.

}{
{
name: "no collisions",
input: []*orgRepo{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can collapse some of this indentation to make it a bit more readable:

			input: []*orgRepo{{
				Organization: "a",
				Repository:   "b",
				Aliases: &aliases{Aliases: map[string][]string{
					"ab": {
						"alice",
						"bob",
					},
				}},
			},

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can collapse some of this indentation...

Done with 2777fc3 -> a73bc68.

}

assertEqual(t, collected, test.collected)
if test.namespaced != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait. is namespaceAliases mutating the input?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is namespaceAliases mutating the input?

Yes, because if we need to namespace an alias, we want to use the new, namespaced form when writing the upstream OWNERS into this repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either let's add godoc to explain the data flow or let's not mutate the input and just have clear input --> output flow

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either let's add godoc to explain the data flow...

Added with a73bc68 -> f28cfdf.

@openshift-ci-robot openshift-ci-robot added the ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯ label Aug 29, 2018
@wking wking force-pushed the slurp-owners-aliases-too branch 3 times, most recently from f28cfdf to eb35a31 Compare August 29, 2018 18:59
@wking wking force-pushed the slurp-owners-aliases-too branch from eb35a31 to 4c2b655 Compare August 29, 2018 19:37
@wking wking force-pushed the slurp-owners-aliases-too branch from 4c2b655 to 8bcba67 Compare August 29, 2018 19:40
@wking
Copy link
Member Author

wking commented Aug 29, 2018

Ok, I think I've addressed everything that's come up in review so far, excepting things @stevekuznetsov has commented on but said are fine as they stand. Let me know if I missed something and still need to make changes. And thanks for all the review :).

Copy link
Contributor

@stevekuznetsov stevekuznetsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM sans one comment

@@ -1,42 +0,0 @@
#!/bin/bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment -- can you make this script run your tool with the correct inputs and outputs? This keeps a consistent API for repo admins

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... can you make this script run your tool with the correct inputs and outputs?

It doesn't really have inputs and outputs, you just run it with your $PWD somewhere inside the repository (the path comes in here which gets fed to getRepoRoot ). But I've added a wrapper with 8bcba67 -> 9fbc825, which also:

  • Updates OWNERS and OWNERS_ALIASES with a fresh run.

  • Adds os.O_TRUNC to avoid occasional corruption like:

    diff --git a/ci-operator/jobs/openshift/azure-misc/OWNERS b/ci-operator/jobs/openshift/azure-misc/OWNERS
    index 132c684..bf0dec5 100644
    --- a/ci-operator/jobs/openshift/azure-misc/OWNERS
    +++ b/ci-operator/jobs/openshift/azure-misc/OWNERS
    @@ -3,10 +3,11 @@
     # See the OWNERS docs: https://git.k8s.io/community/contributors/guide/owners.md
    
     approvers:
    +- pweil-
     - jim-minter
     - kargakis
     - kwoodson
    -- mjudeikis
    +is
     - pweil-
     reviewers:
     - jim-minter

    although the exact source of that corruption is not clear to me.

Some slurped repositories use OWNERS_ALIASES [1], so we need to pull
these in to resolve entries in their OWNERS.  OWNERS_ALIASES can only
exist in the repo root [2].  That means we may need to namespace the
upstream aliases to avoid collisions when assembing the composite
OWNERS_ALIASES here.

Defining aliases in upstream repos will be useful for situations where
a particular repo team (e.g. the installer team) should have control
over other directories within this repositories.  For example,

  $ cat cluster/test-deploy/aws/OWNERS
  approvers:
    - installer-approvers
  reviewers:
    - installer-reviewers

is simpler and more maintainable with an auto-maintained
OWNERS_ALIASES than the copy/paste we used in 49f60b7
(cluster/test-deploy/aws/OWNERS: Delegate to openshift/installer,
2018-08-27, openshift#1290).

With script control over OWNERS_ALIASES, *this* repo can't define its
own aliases.  But that doesn't seem like a large limitation, because
this repo is unlikely to need aliases that are not defined in *any* of
the upstream repositories.  And if we need to grow a location for
local alias configuration, we can always add that in the future.

I'm using 'git clone...' over SSH to access the files, because that
provides access to private repositories.  Using:

  https://raw.githubusercontent.com/{organization}/{repository}/HEAD/{path}

only works if you have an auth token or the repository is public.

Using clone for two files is a bit heavy.  Ideally we could pull just
the two files we're interested.  Git's archive command supports that
use-case, but GitHub doesn't seem to have enabled the server side of
that transaction:

  $ git archive --format=tar --remote ssh://[email protected]/openshift/installer.git HEAD OWNERS OWNERS_ALIASES
  Invalid command: 'git-upload-archive '/openshift/installer.git''
    You appear to be using ssh to clone a git:// URL.
    Make sure your core.gitProxy config option and the
    GIT_PROXY_COMMAND environment variable are NOT set.
  fatal: The remote end hung up unexpectedly
  $ git --version
  git version 1.8.3.1

Until they enable that, I'm fine consuming more of their bandwidth
than is strictly necessary ;).

Using reflect in assertEqual gives us a nice, compact implementation.
But the output can be a bit hard to read.  With an introduced error:

  $ go test .
  ...
  --- FAIL: TestOrgRepos (0.00s)
  	main_test.go:15: unexpected result: [0xc42006e1e0 0xc42006e240] != [0xc42006e2a0]
  --- FAIL: TestGetOwners (2.30s)
  	main_test.go:15: unexpected result: &{Directories:[] Organization:openshift Repository:installer Owners:0xc42006e420 Aliases:<nil> Commit:aaa47f6a54f0fa0449d36de01ccb9f56729b608a} != &{Directories:[] Organiza
  ...

I'd mocked up an assertEqual based on comparing json.MarshalIndent
strings which produced:

  --- FAIL: TestOrgRepos (0.00s)
  	main_test.go:30: unexpected result: [
  		  {
  		    "directories": [
  		      "/tmp/populate-owners-389376240/a/b"
  		    ],
  		    "organization": "a",
  		    "repository": "b"
  		  },
  		  {
  		    "directories": [
  		      "/tmp/populate-owners-389376240/c/d"
  		    ],
  		    "organization": "c",
  		    "repository": "d"
  		  }
  		] != [
  		  {
  		    "directories": [
  		      "/tmp/populate-owners-389376240/a/b"
  		    ],
  		    "organization": "a",
  		    "repository": "b"
  		  }
  		]
  --- FAIL: TestGetOwners (2.38s)
  	main_test.go:30: unexpected result: {
  		  "organization": "openshift",
  		  "repository": "installer",
  		  "owners": {
  		    "approvers": [
  		      "aaronlevy",
  		      "abhinavdahiya",
  		      "crawford",
  		      "smarterclayton",
  		      "wking",
  		      "yifan-gu"
  		    ],
  		    "reviewers": [
  		      "vikramsk"
  		    ]
  		  },
  		  "commit": "aaa47f6a54f0fa0449d36de01ccb9f56729b608a"
  		} != {
  		  "organization": "openshift",
  		  "repository": "installer",
  		  "owners": {
  		    "approvers": [
  		      "aaronlevy",
  		      "abhinavdahiya",
  		      "crawford",
  		      "smarterclayton",
  		      "wking",
  		      "yifan-gu"
  		    ],
  		    "reviewers": [
  		      "vikramsk"
  		    ]
  		  },
  		  "commit": "2587b3ed493c18747f2b37e1ab6daebdd277631a"
  		}

and of course, there are a number of third-party libraries that do an
even better job (e.g. [3]).  But I don't think this functionality is
worth an external dependency, and Steve doesn't think it's worth
maintaining much local code for comparison [4], so we're going with
the more compact reflect approach despite the less-useful output.  And
the tests should all be passing anyway, right?  ;)

The GIT_*_DATE tick business is based on [5].  I haven't bothered
running a new commit after each test.setup change, because we don't
need a different commit hash.  In production, extractOwners will be
run on a fresh clone, so we will always have a different commit hash
if the content changes.

The stub populate-owners.sh wraps the 'go run' at Steve's request [6].
We no longer need anything Bash-specific, so I'm using a POSIX shebang
in the wrapper.  The 'exec' allows us to replace the shell command
with Go without executing a new process [7] (although Go itself will
proceed to launch a few processes).

The O_TRUNC is an attempt to avoid occasional corruption like:

  diff --git a/ci-operator/jobs/openshift/azure-misc/OWNERS b/ci-operator/jobs/openshift/azure-misc/OWNERS
  index 132c684..bf0dec5 100644
  --- a/ci-operator/jobs/openshift/azure-misc/OWNERS
  +++ b/ci-operator/jobs/openshift/azure-misc/OWNERS
  @@ -3,10 +3,11 @@
   # See the OWNERS docs: https://git.k8s.io/community/contributors/guide/owners.md

   approvers:
  +- pweil-
   - jim-minter
   - kargakis
   - kwoodson
  -- mjudeikis
  +is
   - pweil-
   reviewers:
   - jim-minter

although the exact source of that corruption is not clear to me.

[1]: https://github.com/openshift/cluster-api-provider-aws/blob/44e974de04f496f9552ecd37b73cad01b6d69f4d/OWNERS_ALIASES
[2]: https://github.com/kubernetes/community/blame/0741bdfbb56dcd4829754560f79a2f3da32cb34f/contributors/guide/owners.md#L54
[3]: https://godoc.org/github.com/stretchr/testify/assert#Equal
[4]: openshift#1285 (comment)
[5]: https://github.com/git/git/blob/v2.18.0/t/test-lib-functions.sh#L128-L138
[6]: openshift#1285 (comment)
[7]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#exec
@wking wking force-pushed the slurp-owners-aliases-too branch from 8bcba67 to 9fbc825 Compare August 29, 2018 22:31
@wking wking force-pushed the slurp-owners-aliases-too branch from 9fbc825 to 940f45a Compare August 29, 2018 22:42
Copy link
Contributor

@stevekuznetsov stevekuznetsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 30, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stevekuznetsov, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 30, 2018
@openshift-merge-robot openshift-merge-robot merged commit 8f42984 into openshift:master Aug 30, 2018
@wking wking deleted the slurp-owners-aliases-too branch August 30, 2018 20:47
wking added a commit to wking/openshift-release that referenced this pull request Aug 30, 2018
…OWNERS

The old shell script used to do this, but I'd accidentally dropped the
functionality in e1f993f (populate-owners: Also slurp OWNERS_ALIASES,
2018-08-25, openshift#1285).
wking added a commit to wking/openshift-installer that referenced this pull request Aug 30, 2018
This will make it easier to add special-case OWNERS files to
openshift/release while still getting the members slurped up with the
populate-owners tooling.  That tooling pulls OWNERS_ALIASES since
openshift/release@e1f993fb (populate-owners: Also slurp
OWNERS_ALIASES, 2018-08-25, openshift/release#1285).
wking added a commit to wking/openshift-release that referenced this pull request Aug 31, 2018
The previous content was copy/pasted into this repo with 49f60b7
(cluster/test-deploy/aws/OWNERS: Delegate to openshift/installer,
2018-08-27, openshift#1290).  But with upstream OWNERS_ALIASES being slurped
since e1f993f (populate-owners: Also slurp OWNERS_ALIASES,
2018-08-25, openshift#1285) and the installer repo defining aliases since
openshift/installer@62f87acb (OWNERS_ALIASES: Define aliases for
reviewers and approvers, 2018-08-24, openshift/installer#184), we can
DRY this up by using the upstream aliases.
wking added a commit to wking/openshift-release that referenced this pull request Oct 4, 2018
As discussed in #1601, pulling all projects serially is slow, because
even with shallow clone you're pulling all the files in the HEAD tree.
As discussed in e1f993f (populate-owners: Also slurp OWNERS_ALIASES,
2018-08-25, openshift#1285), GitHub does not enable git-upload-archive, so
we can't use the Git protocol to select specific files.

One way to speed things up would be to allow fetching subsets of
projects, but that risks corrupting OWNERS_ALIASES.  If the utility
only fetched some repositories, it would have an incomplete picture of
aliases.  For example, say the repository contained
ci-operator/jobs/a/b and ci-operator/jobs/c/d, both of which defined
aliases.  A full run would result in aliases for both a/b and c/d in
OWNERS_ALIASES.  A subsequent run with only a/b would update any a/b
aliases, but all other aliases, including those which had previously
been injected by c/d, would be removed.  You could work around that by
caching information about which repositories were involved in
OWNERS_ALIASES, but it would be a bit complicated.

This commit, on the other hand, takes advantage of the fact that most
of our repositories are public.  It optimistically attempts an
efficient HTTP pull, and only falls back to the shallow clone if the
efficient pull fails.  That's slightly more work for private
repositories, where the HTTP attempt is wasted effort, but it's much
more efficient for public repos.

Another potential issue is consuming your API quota of 60
unauthenticated requests per hour [1].  But if you hit that limit, we
just fall back to Git, so it's not the end of the world.

Another alterantive would be to pull in parallel (as we used to before
e1f993f), but that still has the bandwidth cost of shifting lots of
files that we don't care about for this use case.  You could save some
bandwidth by caching blobs locally, but then you'd have the
complication of cache management.

The "got owners ..." message is because the HTTP pulls don't produce
any terminal output.  The Git pulls write status updates to stderr,
but now that it might be a while between Git pulls, it's good to let
the caller know something is happening ;).

Docs for the commit endpoint I'm hitting are in [2].

[1]: https://developer.github.com/v3/#rate-limiting
[2]: https://developer.github.com/v3/repos/commits/#get-the-sha-1-of-a-commit-reference
derekhiggins pushed a commit to derekhiggins/release that referenced this pull request Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants