This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

persistent log for PAI services #2129

Closed
fanyangCS opened this issue Feb 2, 2019 · 7 comments

@fanyangCS
Contributor

What would you like to be added:
Persistent logs for key PAI services such as RM, NameNode, etc.

Why is this needed:
The logs will be useful for debugging.

Without this feature, how does the current module work:
Right now, when a PAI service restarts, its log is gone, which makes debugging hard.

Components that may involve changes:
Key PAI services.

@fanyangCS
Contributor Author

fanyangCS commented Mar 1, 2019

Partially resolved by #2244.

@squirrelsc
Member

@scarlett2018 recently there have been some weird issues in OpenPAI, and because there are no logs, they are hard to troubleshoot. Logs of all pods should be persisted in near real time.

@squirrelsc changed the title from "persistent log for key PAI services" to "persistent log for PAI services" on Mar 26, 2019
@xudifsd
Member

xudifsd commented Mar 26, 2019

We have two options:

  • centralized log storage like fluentd
  • persist logs locally and define a mechanism to view them, for example an nginx service on each node serving log view requests

@squirrelsc
Member

> We have two options:
>
>   • centralized log storage like fluentd
>   • persist logs locally and define a mechanism to view them, for example an nginx service on each node serving log view requests

It's better to centralize logs. That way we don't need to deploy Nginx services on each host.

@xudifsd
Member

xudifsd commented Mar 26, 2019

@squirrelsc they both have pros and cons. The centralized option needs a fluentd service deployed on every host, Elasticsearch deployed as well, and a lot of space dedicated to the service. IMO, the local-host option is much more lightweight.

@squirrelsc
Member

> @squirrelsc they both have pros and cons. The centralized option needs a fluentd service deployed on every host, Elasticsearch deployed as well, and a lot of space dedicated to the service. IMO, the local-host option is much more lightweight.

We can just put all logs on HDFS; no Elasticsearch is needed. If there is an alert, we have the pod name, and it can be used to find the log files.
As for fluentd, how are the log files of user containers uploaded? Can the same mechanism be used to upload logs of OpenPAI services?
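
To make the "use the pod name to find the log files" idea concrete, here is a minimal sketch. It assumes a purely hypothetical HDFS layout of one directory per day and one file per pod; the root directory `/pai-service-logs`, the function name `log_path_for_pod`, and the example pod name are illustrative only and are not taken from OpenPAI.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical root directory for persisted service logs (an assumption,
# not an existing OpenPAI path).
LOG_ROOT = PurePosixPath("/pai-service-logs")


def log_path_for_pod(pod_name: str, day: date) -> PurePosixPath:
    """Build the assumed HDFS path of one pod's log file for one day."""
    # Reject names that would escape the per-day directory.
    if not pod_name or "/" in pod_name:
        raise ValueError(f"invalid pod name: {pod_name!r}")
    return LOG_ROOT / day.isoformat() / f"{pod_name}.log"


if __name__ == "__main__":
    # Example: an alert carries the pod name, so the path is derived directly.
    print(log_path_for_pod("hadoop-name-node-abc12", date(2019, 3, 26)))
```

With a deterministic layout like this, an alert that carries only a pod name and a date is enough to locate the file, with no search index involved.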

@fanyangCS
Contributor Author

Closed; now tracked in #4992.
