Sync issue happened for wg-manifests and wg-deployment #384

Closed
PatrickXYS opened this issue Nov 5, 2020 · 13 comments · Fixed by #385

@PatrickXYS
Member

After this PR merged, we should have the wg-manifests and wg-deployment GitHub teams established, but they do not show up in this query: https://github.com/orgs/kubeflow/teams?query=wg-

Obviously the sync script is not working. I'm guessing the issue that happened before has happened again.

/cc @chensun @jlewi @Bobgy

@Bobgy
Contributor

Bobgy commented Nov 5, 2020

/assign
I'll investigate now

@Bobgy
Contributor

Bobgy commented Nov 5, 2020

Found error

ValueError: chensun is not in the allowed set of admins. Allowed admins is Bobgy, google-admin, googlebot, james-jwu, jlewi, k8s-ci-robot, richardsliu, rmgogogo, theadactyl

I'll prioritize #346 after immediate work items for Kubeflow 1.2
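
For readers unfamiliar with the check behind this error, here is a minimal sketch of an allow-list validation that would raise a message of this shape. It is an illustration only, not the actual validate.py: the function name `check_admins` and the way admins are passed in are assumptions. Only the allowed names and the error wording come from the traceback above.

```python
# Hypothetical sketch; not the real validate.py from this repository.
ALLOWED_ADMINS = [
    "Bobgy", "google-admin", "googlebot", "james-jwu", "jlewi",
    "k8s-ci-robot", "richardsliu", "rmgogogo", "theadactyl",
]


def check_admins(admins, allowed=ALLOWED_ADMINS):
    """Raise ValueError for the first admin that is not in the allowed list."""
    for name in admins:
        if name not in allowed:
            raise ValueError(
                f"{name} is not in the allowed set of admins. "
                f"Allowed admins is {', '.join(allowed)}"
            )


if __name__ == "__main__":
    try:
        # chensun is listed as an admin but is absent from the allow-list.
        check_admins(["jlewi", "chensun"])
    except ValueError as err:
        print(err)
```

With a check like this, adding a new admin to the org config also requires adding the name to the allow-list wherever the check actually runs.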

@Bobgy
Contributor

Bobgy commented Nov 5, 2020

Hmm, the change was correct in #376, but the running job hasn't been updated. I'll investigate further.

@Bobgy
Contributor

Bobgy commented Nov 5, 2020

/cc @jlewi

@PatrickXYS
Member Author

Maybe add @chensun to the kubeflow org, and then check the cron job to see how it goes.

@PatrickXYS
Member Author

I think we can improve the process by finding a way to expose logs / errors / issues to non-Google folks. That would also save the google-admins the time spent checking those logs.

@Bobgy
Contributor

Bobgy commented Nov 5, 2020

OK, I found the problem: the validate.py change in #376 needs to be manually updated by

I've updated that, so the next sync should work.

@Bobgy
Contributor

Bobgy commented Nov 5, 2020

> I think we can improve the process by finding a way to expose logs / errors / issues to non-Google folks. That would also save the google-admins the time spent checking those logs.

I think the blocker is that fixing anything there requires high privileges, so you'll need to wait for a google-admin to do that. Just exposing the logs might not be enough, but it could help community members find a fix and contribute a PR...

Any ideas for a good way to expose those logs?
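
One possible direction, sketched under assumptions: since the sync job emits one JSON object per log line, a small filter could extract only the fatal entries so that a short summary can be shared without granting cluster access. The function name `fatal_messages` is hypothetical, and where the summary would be published is left open. The sample line is a peribolos log line quoted in this thread, and the second sample line is made up to show that non-JSON output is skipped.

```python
import json


def fatal_messages(lines):
    """Yield (time, msg) for every JSON log line whose level is 'fatal'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if isinstance(entry, dict) and entry.get("level") == "fatal":
            yield entry.get("time", ""), entry.get("msg", "")


if __name__ == "__main__":
    sample = [
        '{"component":"peribolos","file":"prow/cmd/peribolos/main.go:202",'
        '"func":"main.main","level":"fatal","msg":"Configuration failed: '
        'failed to configure kubeflow members: authenticated user jlewi is '
        'not an admin of kubeflow","severity":"fatal",'
        '"time":"2020-11-05T03:10:19Z"}',
        "plain text line that is not JSON",  # made-up example line
    ]
    for when, msg in fatal_messages(sample):
        print(when, msg)
```

This only addresses visibility. As noted above, applying a fix would still need someone with admin privileges.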

@Bobgy
Contributor

Bobgy commented Nov 5, 2020

There's one more problem

{"component":"peribolos","file":"prow/cmd/peribolos/main.go:202","func":"main.main","level":"fatal","msg":"Configuration failed: failed to configure kubeflow members: kubeflow must specify {[jlewi james-jwu google-admin googlebot] true} as admins, missing [jlewi]","severity":"fatal","time":"2020-11-05T02:50:10Z"}
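
The message above says the org config must list a fixed set of required admins and that jlewi is missing from it. A minimal sketch of that kind of check follows, written in Python for illustration only. peribolos itself is written in Go and this is not its code. `missing_admins` is a hypothetical helper, the required list is taken from the log message, and the configured list is a made-up example.

```python
def missing_admins(configured, required):
    """Return the required admins absent from the configured list, in required order."""
    present = set(configured)
    return [name for name in required if name not in present]


if __name__ == "__main__":
    required = ["jlewi", "james-jwu", "google-admin", "googlebot"]  # from the log message
    configured = ["james-jwu", "google-admin", "googlebot", "Bobgy"]  # made-up example
    print(missing_admins(configured, required))  # prints ['jlewi']
```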

Bobgy added a commit to Bobgy/internal-acls that referenced this issue Nov 5, 2020
k8s-ci-robot pushed a commit that referenced this issue Nov 5, 2020
* fix: update kubeflow admin config. Fixes #384

* update

@Bobgy
Contributor

Bobgy commented Nov 5, 2020

/reopen
Still not working

{"component":"peribolos","file":"prow/cmd/peribolos/main.go:202","func":"main.main","level":"fatal","msg":"Configuration failed: failed to configure kubeflow members: authenticated user jlewi is not an admin of kubeflow","severity":"fatal","time":"2020-11-05T03:10:19Z"}

@k8s-ci-robot

@Bobgy: Reopened this issue.


@Bobgy
Contributor

Bobgy commented Nov 5, 2020

The sync was successful

@Bobgy Bobgy closed this as completed Nov 5, 2020

@PatrickXYS
Member Author

> > I think we can improve the process by finding a way to expose logs / errors / issues to non-Google folks. That would also save the google-admins the time spent checking those logs.
>
> I think the blocker is that fixing anything there requires high privileges, so you'll need to wait for a google-admin to do that. Just exposing the logs might not be enough, but it could help community members find a fix and contribute a PR...
>
> Any ideas for a good way to expose those logs?

I can come up with some ideas, including giving viewer access, but it looks like the GKE cluster the cronjob is running on is not used internally, so I don't have a good idea.
