Svcwatcher core after losing master/leader #227

Open
TothFerenc opened this issue Jul 22, 2020 · 5 comments

@TothFerenc
Contributor

Is this a BUG REPORT or FEATURE REQUEST?:
bug

What happened:
The Svcwatcher Pod lost the master (leader) lease for some reason, so the process exited with a stack dump:

E0704 17:44:12.997128       1 svcwatcher.go:93] Lost master
F0704 17:44:12.997152       1 svcwatcher.go:97] Lost lease
E0704 17:44:12.997232       1 event.go:269] Unable to write event: 'can't create an event with namespace 'default' in namespace 'kube-system'' (may retry after sleeping)
goroutine 1 [running]:
github.com/golang/glog.stacks(0xc000374400, 0xc00028e000, 0x3b, 0x9e)
        /go/src/github.com/nokia/danm/vendor/github.com/golang/glog/glog.go:769 +0xb8
github.com/golang/glog.(*loggingT).output(0x20605c0, 0xc000000003, 0xc00023c0e0, 0x1fce7c9, 0xd, 0x61, 0x0)
        /go/src/github.com/nokia/danm/vendor/github.com/golang/glog/glog.go:720 +0x372
github.com/golang/glog.(*loggingT).println(0x20605c0, 0xc000000003, 0xc00002feb0, 0x1, 0x1)
        /go/src/github.com/nokia/danm/vendor/github.com/golang/glog/glog.go:633 +0xe7
github.com/golang/glog.Fatalln(...)
        /go/src/github.com/nokia/danm/vendor/github.com/golang/glog/glog.go:1141
main.main()
        /go/src/github.com/nokia/danm/cmd/svcwatcher/svcwatcher.go:97 +0x9e4
E0704 17:44:20.320002       1 event.go:269] Unable to write event: 'can't create an event with namespace 'default' in namespace 'kube-system'' (may retry after sleeping)
glog: Flush took longer than 10s

What you expected to happen:
No core dump before exit.

How to reproduce it:
It happens frequently during deployment.

Anything else we need to know?:

Environment:

  • DANM version (use danm -version):
# /opt/cni/bin/danm -version
2020/07/22 12:31:18 DANM binary was built from release: v4.2.0-0
2020/07/22 12:31:18 DANM binary was built from commit: c0a4c1570845556cf911a46df475c45a85941bb2
  • Kubernetes version (use kubectl version):
# kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:41:22Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:33:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
@Levovar
Collaborator

Levovar commented Aug 3, 2020

So, this is line 97, where it cores:
https://github.com/nokia/danm/blob/master/cmd/svcwatcher/svcwatcher.go#L97
It is literally a library call without references to any objects.
I think I have already stated earlier that glog is shite :)
Maybe the non-newline API wouldn't core, but I absolutely refuse to deep dive into its code. The solution is to remove the usage of the whole library.

The "cannot create event" remark above is more interesting to me.

@Levovar
Collaborator

Levovar commented Aug 3, 2020

Regarding the eventing issue: the leader election library creates an event recorder without a namespace defined, so it defaults to default.
But our component runs in kube-system, so when we actually want to record an event, it fails.
Something like: tsuru/remesher#5

Which is funny, because as far as I can tell the Events are raised using the meta of the provided EndpointsLock: https://github.com/kubernetes/client-go/blob/00dbcca6ee44c678754d3f5fda1bd0e704b26fe2/tools/leaderelection/resourcelock/endpointslock.go#L100,
and lo and behold, we do set the proper namespace into the lock:
https://github.com/nokia/danm/blob/master/cmd/svcwatcher/svcwatcher.go#L74

soo...
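
The mismatch described above can be modelled in a few lines of plain Go. This is only a simplified sketch of the suspected mechanism, not the client-go implementation: objectRef, eventNamespace, createEvent, and the lock name "svcwatcher" are hypothetical stand-ins, and the assumption is that an event falls back to the default namespace when the referenced object's metadata carries none.

```go
package main

import "fmt"

// objectRef is a hypothetical stand-in for the metadata of the object
// that an event refers to (for example the leader election lock).
type objectRef struct {
	Name      string
	Namespace string
}

// eventNamespace models the assumed defaulting: with no namespace on the
// referenced object, the event is placed in "default".
func eventNamespace(ref objectRef) string {
	if ref.Namespace == "" {
		return "default"
	}
	return ref.Namespace
}

// createEvent models an event sink bound to one namespace that rejects
// events built for a different one.
func createEvent(sinkNamespace string, ref objectRef) error {
	ns := eventNamespace(ref)
	if sinkNamespace != "" && ns != sinkNamespace {
		return fmt.Errorf("can't create an event with namespace '%v' in namespace '%v'", ns, sinkNamespace)
	}
	return nil
}

func main() {
	// Metadata without a namespace: the event defaults to "default" and is rejected.
	fmt.Println(createEvent("kube-system", objectRef{Name: "svcwatcher"}))
	// Metadata with the proper namespace: the event is accepted.
	fmt.Println(createEvent("kube-system", objectRef{Name: "svcwatcher", Namespace: "kube-system"}))
}
```

If this model is close to what happens, the lock configuration at svcwatcher.go#L74 can be correct and the event can still fail whenever the metadata handed to the recorder is empty at the moment the event is raised.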

@Levovar
Collaborator

Levovar commented Aug 3, 2020

I guess others also have issues with the library :)
https://bugzilla.redhat.com/show_bug.cgi?id=1842002

@Levovar
Collaborator

Levovar commented Aug 10, 2020

@TothFerenc any comments on the above? I'm kind of of the opinion that this is how stuff works, and we just need to live with it.

@TothFerenc
Contributor Author

Maybe we can create a new TODO issue about log module harmonization (using the same logging engine across all DANM components), and this issue can depend on it.
Of course, I will close this issue if the client libraries are fixed in the meantime.
