How to hook up third-party daemons? #35

jfilak · 2016-10-24T10:58:03Z

I'm an ABRT developer, and I would love to create a problem daemon that reports problems detected by ABRT to node-problem-detector.

ABRT's architecture is similar to node-problem-detector's: agents report detected problems to abrtd. An ABRT agent is either a tiny daemon watching logs (or systemd-journal) or a language error handler (Python sys.excepthook, Ruby at_exit callback, /proc/sys/kernel/core_pattern, Node.js uncaughtException event handler, Java JNI agent).
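Language error handlers such as Python's `sys.excepthook` can be sketched as follows. This is a minimal illustration, assuming a hypothetical `report` callback that stands in for whatever transport delivers the problem to abrtd (or to NPD); the dict fields are illustrative and are not ABRT's problem format. The hook packages the uncaught exception and then falls through to the previous hook, so the traceback is still printed.

```python
import sys
import traceback
from typing import Callable, Dict


def make_excepthook(report: Callable[[Dict[str, str]], None]):
    """Build a sys.excepthook replacement that forwards uncaught exceptions.

    `report` is a hypothetical callback meaning "send this problem to the
    daemon"; it is not part of any real ABRT or NPD API.
    """
    previous_hook = sys.excepthook

    def hook(exc_type, exc_value, exc_tb):
        problem = {
            "type": "Python",
            "reason": exc_type.__name__,
            "message": str(exc_value),
            "backtrace": "".join(
                traceback.format_exception(exc_type, exc_value, exc_tb)
            ),
        }
        try:
            report(problem)
        finally:
            # Preserve the default behaviour: print the traceback to stderr.
            previous_hook(exc_type, exc_value, exc_tb)

    return hook
```

Installing it is a single assignment, e.g. `sys.excepthook = make_excepthook(send_to_daemon)`, where `send_to_daemon` is a delivery function you would have to provide.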

I've created a Docker image that is capable of detecting kernel oopses, vmcores and core files on a host:
https://github.com/jfilak/docker-abrt/tree/atomic_minimal
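Core-file detection builds on the `/proc/sys/kernel/core_pattern` hook mentioned above: when the pattern starts with `|`, the kernel runs the named program and pipes the core dump to its stdin instead of writing a file. The helper below is a sketch, not ABRT's code; it reports which program (if any) currently receives core dumps. The handler path used in the test values is made up.

```python
from pathlib import Path
from typing import Optional

CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")


def core_handler(pattern: str) -> Optional[str]:
    """Return the program the kernel pipes core dumps to, or None.

    A pattern starting with '|' means "run this program and feed it the
    core dump"; anything else is a file name template.
    """
    pattern = pattern.strip()
    if not pattern.startswith("|"):
        return None
    # The first token after '|' is the handler path; the remaining tokens
    # are arguments containing %-specifiers such as %p (PID).
    parts = pattern[1:].split()
    return parts[0] if parts else None


def current_core_handler() -> Optional[str]:
    # Linux only. core_pattern is a system-wide setting, so a container
    # reading it sees the host's value.
    return core_handler(CORE_PATTERN.read_text())
```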

(It should be possible to detect uncaught Python, Ruby and Java exceptions in the future.)

ABRT provides several ways of reporting detected problems to users (e-mail, FTP/SCP upload, D-Bus signal, Bugzilla bug, micro-Report, systemd-journal catalog message), and it is trivial to add another report destination.

The Design Doc defines a "Problem Report Interface", but I've failed to find out how to register a new problem daemon with node-problem-detector or how to use the "Problem Report Interface" from a third-party daemon.

derekwaynecarr · 2016-10-24T18:02:32Z

/subscribed

Random-Liu · 2016-10-24T19:36:26Z

@jfilak Cool, we'd really like to integrate with third-party daemons!

NPD was introduced in K8s 1.3 for several reasons:

1. We had an urgent requirement for kernel problem detection. At that time we were suffering from some kernel deadlocks, such as the unregister_netdevice kernel race (kubernetes#20096, "Mitigate impact of unregister_netdevice kernel race").
2. Node problem detection is a necessity, but it is very environment-dependent. We don't have enough knowledge or bandwidth to build a concrete solution for every environment, so we made NPD composable to open the door to community help and integration with third-party solutions.
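The unregister_netdevice race is the kind of problem the kernel monitor catches by matching kernel log lines against regular expressions. A minimal sketch of that matching step follows; the rule name and the `Rule`/`Problem` types are hypothetical Python stand-ins (NPD itself is written in Go and reads its rules from configuration).

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Rule:
    reason: str          # hypothetical machine-readable cause
    pattern: re.Pattern  # compiled regex applied to each log line


@dataclass
class Problem:
    reason: str
    message: str


# Illustrative rule: the kernel prints this message while it is stuck
# waiting for a network device's reference count to drop to zero.
RULES: List[Rule] = [
    Rule(
        reason="UnregisterNetDevice",
        pattern=re.compile(
            r"unregister_netdevice: waiting for \w+ to become free\. Usage count = \d+"
        ),
    ),
]


def scan(lines: Iterable[str], rules: List[Rule] = RULES) -> Iterator[Problem]:
    """Yield one Problem per log line that matches a rule (first match wins)."""
    for line in lines:
        for rule in rules:
            if rule.pattern.search(line):
                yield Problem(reason=rule.reason, message=line.strip())
                break
```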

> The Design Doc defines a "Problem Report Interface", but I've failed to find out how to register a new problem daemon with node-problem-detector or how to use the "Problem Report Interface" from a third-party daemon.

In the first version, we architecturally separated out the "problem daemon" and defined the "problem report interface", but the kernel monitor (the first "problem daemon") is still integrated in-process because at that time it was the only daemon.
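The separation between problem daemons and the problem report interface can be illustrated with a small in-process sketch. The `Problem` fields, the `Report` callable type and the `Aggregator` class are hypothetical Python stand-ins, not NPD's Go types. The point is that a daemon only sees a function to call, so moving a daemon out of process means replacing that function with an inter-process client.

```python
import queue
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class Problem:
    source: str   # which problem daemon detected it, e.g. "kernel-monitor"
    reason: str   # short machine-readable cause
    message: str  # human-readable detail
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# The "problem report interface" in this sketch: a daemon is handed a
# callable and does not know what happens to problems after reporting.
Report = Callable[[Problem], None]


class Aggregator:
    """Collects problems from any number of in-process daemons."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Problem]" = queue.Queue()

    def report(self, problem: Problem) -> None:
        self._queue.put(problem)

    def drain(self) -> List[Problem]:
        """Return and remove every problem queued so far."""
        out: List[Problem] = []
        while True:
            try:
                out.append(self._queue.get_nowait())
            except queue.Empty:
                return out


def kernel_monitor(report: Report, lines: List[str]) -> None:
    # Stand-in for an in-process daemon: it calls `report` directly.
    for line in lines:
        if "BUG:" in line:
            report(Problem("kernel-monitor", "KernelBug", line.strip()))
```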

We plan to support inter-process integration, and now seems to be the time. :)

@dchen1107
/cc @kubernetes/sig-node

dchen1107 · 2016-10-24T22:04:13Z

@jfilak Thanks for your interest in integrating a new problem detector with k8s's generic NPD!

By design, NPD should make it easy to plug in or swap different problem detector containers, and to aggregate and report all problems to the upstream layers and the users. Do you want to give a demo at one of our sig-node meetings?

jfilak · 2016-10-25T08:23:38Z

> Do you want to give a demo at one of our sig-node meetings?

Yes, I do. Thank you for the offer. However, I need some time to get familiar with Kubernetes and to polish the image. So far I've only tested the image on bare metal with Docker.

Random-Liu · 2016-10-25T19:20:05Z

@jfilak Thanks!
If you are interested, you can join our Slack (http://slack.kubernetes.io/) and the #sig-node channel. :)

adohe-zz · 2016-11-07T09:24:30Z

/subscribed

andyxning · 2017-09-03T14:19:54Z

@jfilak I am working on porting the Nagios Plugin Interface to NPD; it uses stdout and the exit status code to report the status of a service or node. IIUC, ABRT cannot use stdout and an exit status code to communicate with NPD, right?
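For reference, the Nagios plugin convention relies on the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) plus a one-line summary on stdout, optionally followed by `|` and performance data. A minimal sketch of a runner that maps a plugin invocation onto that convention; `CheckResult` and `run_check` are hypothetical names, and treating timeouts and launch failures as UNKNOWN is a design choice of the sketch, not something specified in this thread.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# Standard Nagios plugin return codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


@dataclass
class CheckResult:
    state: str
    output: str


def run_check(argv: List[str], timeout: float = 10.0) -> CheckResult:
    """Run a Nagios-style plugin and map its exit code and stdout to a result."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return CheckResult("UNKNOWN", f"check failed to run: {exc}")
    # The first line is the summary; anything after '|' is performance data.
    first_line = proc.stdout.splitlines()[0] if proc.stdout else ""
    summary = first_line.split("|", 1)[0].strip()
    # Exit codes outside 0-3 (including signals) are treated as UNKNOWN.
    return CheckResult(NAGIOS_STATES.get(proc.returncode, "UNKNOWN"), summary)
```

Usage is simply `run_check(["/path/to/plugin", ...])`; the resulting state could then be turned into an NPD problem when it is not OK.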

If so, how about using a localhost HTTP API that listens for reports from third-party daemon checkers?
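The localhost HTTP idea could look roughly like this: a receiver bound to 127.0.0.1 that accepts one JSON problem per POST. The `/problems` path, the required `reason` field and the 202 response are assumptions made for this sketch, not an existing NPD API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Dict, List

# Problems received so far (hypothetical in-memory store for the sketch).
received: List[Dict[str, Any]] = []
_lock = threading.Lock()


class ProblemHandler(BaseHTTPRequestHandler):
    # Hypothetical endpoint: third-party daemons POST one JSON object per request.
    def do_POST(self) -> None:
        if self.path != "/problems":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            problem = json.loads(self.rfile.read(length))
        except ValueError:  # covers JSONDecodeError and bad UTF-8
            self.send_error(400, "invalid JSON")
            return
        if not isinstance(problem, dict) or "reason" not in problem:
            self.send_error(400, "missing 'reason'")
            return
        with _lock:
            received.append(problem)
        self.send_response(202)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args: Any) -> None:
        pass  # keep the sketch quiet


def serve(port: int = 0) -> ThreadingHTTPServer:
    """Start the receiver on the loopback interface in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), ProblemHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A third-party daemon such as an ABRT reporter would then POST something like `{"source": "abrt", "reason": "KernelOops", "message": "..."}`. Binding to the loopback address keeps the endpoint off the network.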

fejta-bot · 2018-01-04T10:45:15Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

fejta-bot · 2018-02-08T22:00:16Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

fejta-bot · 2018-03-10T22:46:34Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Random-Liu mentioned this issue on Jan 7, 2017 in "NPD Kubernetes 1.6 Planning #58" (closed, 11 tasks).

k8s-ci-robot added the lifecycle/stale label on Jan 4, 2018.

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 8, 2018.

k8s-ci-robot closed this as completed on Mar 10, 2018.
