Add rolling-file appender to core logging #84735
Conversation
self-review
src/core/server/logging/appenders/rolling_file/policies/size_limit/size_limit_policy.ts
src/core/server/logging/appenders/rolling_file/rolling_file_appender.test.ts
```ts
try {
  await this.strategy.rollout();
  await this.fileManager.closeStream();
} catch (e) {
  // eslint-disable-next-line no-console
  console.log('Error while rolling file: ', e);
}
```
I'm forced to try/catch this, else it could result in an unhandled promise rejection from the call in `append`.

I'm not sure what to do with the error though. Crashing the server if the rollout failed seems extreme, but I'm not sure `console.log` is enough. I could eventually keep an internal `ConsoleAppender` and use it to log the error instead of relying on `console.log`, WDYT?
Would a failed rollout never, sometimes, or always prevent future logs from being written? My guess is sometimes?

Eventually keeping an internal `ConsoleAppender` sounds like a good idea to me. In the short term, I don't have many other suggestions, other than changing `console.log` to `console.error`, so that we can at least see this in stderr instead of stdout.
In the context of audit logging specifically, we've discussed optionally halting the server to prevent users from interacting with Kibana when we can't audit their actions: #60636. This is out of scope for this PR, but it might inform how we decide to handle these failures.
In the hypothetical future, we could leverage the notification center to alert administrators about this as well.
> Would a failed rollout never, sometimes, or always prevent future logs from being written?

It's never "never" when you interact with the filesystem, but barring a hard I/O failure, I think the worst case is that we fail to roll the log file, and the next rolling cycle still writes to the previous (non-renamed) file. We basically skip one rolling cycle, which seemed acceptable (also, we can't really do anything about it except warn about the failure).
> I'm not sure what to do with the error though. Crashing the server if the rollout failed seems extreme, but I'm not sure console.log is enough.

It seems Elasticsearch logs a caught exception without taking any additional measures (we can check with them further). See elastic/elasticsearch#45523
Pinging @elastic/kibana-core (Team:Core)
I tested this new appender for Kibana's new audit log, and it's working very nicely! This is a really impressive PR, thanks again for putting this up so quickly.
```ts
  this.currentFileTime = birthtime.getTime();
  this.currentFileSize = size;
} catch (e) {
  this.currentFileTime = Date.now();
```
Are there only certain failure scenarios that should cause us to reset the file info like this? Or do we really want to reset whenever any error is caught?
Ideally, we would only catch `ENOENT` (or similar) errors. But this is kind of the first time we've had a logger/appender implementation complex enough to potentially throw errors, and I'm not really sure how we should handle failures internal to the logging system or its appenders, so for safety I used a catch-all instead.
> Ideally, we would only catch ENOENT (or such) errors. But that's kinda the first time we got a logger/appender implementation complex enough to potentially throw errors,

Shouldn't we at least log an exception other than `ENOENT` to the console?
Good point, I will add an `ENOENT` check and log the error.

The remaining question (same as in #84735 (comment)) is, should we:

- use `console.log`
- use `console.error`
- create a `ConsoleAppender` in the `RollingFileAppender` and use it internally for all error logging (we would use the same `layout` as the one provided to the rolling-file appender). In that case, it will log to `stdout` as if we were using `console.log`.

wdyt?
The 3rd option looks fine. However, I'd expect it to use the default `layout` that a user specifies for the `root` context.

If the 3rd option is too hard to implement, the 2nd one is good enough.
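A minimal sketch of what the third option could look like. All class and method names here are invented for illustration; this is not the actual Kibana implementation, just the shape of "route the appender's own errors through an internal console appender instead of bare `console.log`":

```javascript
// Hypothetical internal fallback appender: formats records with a layout
// and writes them to stderr.
class InternalConsoleAppender {
  constructor(layout) {
    this.layout = layout;
  }
  append(record) {
    // eslint-disable-next-line no-console
    console.error(this.layout.format(record));
  }
}

class RollingFileAppenderSketch {
  constructor(layout) {
    // Reuse the layout the rolling-file appender itself was configured with
    // (whether to use this one or the root context's is the open question above).
    this.errorAppender = new InternalConsoleAppender(layout);
  }

  async rollFile(rollout) {
    try {
      await rollout();
    } catch (e) {
      // Neither crash the server nor leak an unhandled rejection:
      // route the failure through the internal console appender.
      this.errorAppender.append({
        level: 'error',
        message: `Error while rolling file: ${e.message}`,
      });
    }
  }
}
```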
```ts
export { RollingStrategy } from './strategy';
export type RollingStrategyConfig = NumericRollingStrategyConfig;

export const rollingStrategyConfigSchema = schema.oneOf([numericRollingStrategyConfigSchema]);
```
Since we only have a single strategy available, what do you think about providing a default value, so that consumers aren't forced to specify it? I mistakenly omitted this from my yml, and the `@kbn/config-schema` error message was hard to decipher at best.
Yea, I even added default values for both the policy and the strategy at some point, so that

```yaml
rolling-file:
  kind: rolling-file
  path: /Users/pierregayvallet/Documents/test-log/kibana.log
  layout:
    kind: pattern
```

would be enough. Then I reverted it, thinking that we don't do that for other appenders (e.g. you must always specify `layout`; there isn't a default value for it).
@elastic/kibana-core do you have any opinion on this?
I don't see a reason not to add defaults; we already have them for some values:

kibana/src/core/server/logging/layouts/pattern_layout.ts, lines 36 to 48 in ab92bbb:

```ts
const DEFAULT_PATTERN = `[%date][%level][%logger]%meta %message`;

export const patternSchema = schema.string({
  validate: (string) => {
    DateConversion.validate!(string);
  },
});

const patternLayoutSchema = schema.object({
  highlight: schema.maybe(schema.boolean()),
  kind: schema.literal('pattern'),
  pattern: schema.maybe(patternSchema),
});
```
src/core/server/logging/appenders/rolling_file/strategies/numeric/pattern_matcher.ts
src/core/server/logging/appenders/rolling_file/strategies/numeric/pattern_matcher.ts
src/core/server/logging/appenders/rolling_file/strategies/numeric/pattern_matcher.ts
src/core/server/logging/appenders/rolling_file/strategies/numeric/rolling_tasks.ts
src/core/server/logging/appenders/rolling_file/strategies/numeric/rolling_tasks.ts
```ts
export const sizeLimitTriggeringPolicyConfigSchema = schema.object({
  kind: schema.literal('size-limit'),
  size: schema.byteSize({ min: '1b', defaultValue: '100mb' }),
```
Have you discussed the defaults with Cloud? The legacy appender has other limits, and we might want a higher minimum and an explicit maximum limit:

kibana/packages/kbn-legacy-logging/src/schema.ts, lines 77 to 82 in e176def:

```ts
  // > 1MB
  .greater(1048576)
  // < 1GB
  .less(1073741825)
  // 10MB
  .default(10485760),
```
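For readability, the raw byte values in that legacy schema decode as follows (plain arithmetic, nothing assumed beyond the snippet above):

```javascript
// Decoding the legacy schema's byte thresholds.
const oneMB = 1024 * 1024;
console.log(oneMB);                  // 1048576    -> the "> 1MB" lower bound
console.log(1024 * 1024 * 1024 + 1); // 1073741825 -> the "< 1GB" upper bound
console.log(10 * oneMB);             // 10485760   -> the 10MB default
```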
I haven't. @Kushmaro maybe you can help us here?
@pgayvallet By our log delivery feature design (see the storage section), we concluded that the smallest Kibana instance in Cloud will have around 1GB of storage space.

Is the 100mb referring to the log size rotation limit? Cloud also enforces a quota on the log path, and I *think* that quota is 100mb total, so maybe a more sensible rotation limit would be 10mb.
@s-nel for awareness and further guidelines (he's on PTO though so may take a while)
> is the 100mb referring to the log size rotation limit

That's the maximum size the log file can reach before it gets rolled over.
We use 100MB rollover size for ES audit logs and keep 2 files. We might choose something smaller for Kibana since the instances don't have much storage.
src/core/server/logging/appenders/rolling_file/strategies/fs.ts
src/core/server/logging/appenders/rolling_file/strategies/fs.ts
src/core/server/logging/appenders/rolling_file/policies/time_interval/time_interval_policy.ts
src/core/server/logging/appenders/rolling_file/strategies/numeric/numeric_strategy.ts
src/core/server/logging/integration_tests/rolling_file_appender.test.ts
```ts
const logger = root.logger.get('test.rolling.file');

// size = 100b, message.length ~= 40b, should roll every 3 message
```
We can create a buffer of fixed size to make sure we have better control over the rolling process:

```ts
const asciiBuf = Buffer.alloc(5, 'a', 'utf8');
```
I'm only using ASCII chars in the message, so we kinda know that `sizeInBytes === msg.length`, but I can use a buffer to add more control if we want to.
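A small standalone check (not from the PR) of that assumption: byte size equals string length only for single-byte characters, which is why a pre-allocated buffer gives tighter control over the rolling threshold.

```javascript
const asciiMsg = 'a'.repeat(40);
// For pure-ASCII strings, UTF-8 byte size equals string length.
console.log(Buffer.byteLength(asciiMsg, 'utf8') === asciiMsg.length); // true

const accented = 'é'.repeat(40);
// Multibyte characters break the assumption: 'é' is 2 bytes in UTF-8.
console.log(Buffer.byteLength(accented, 'utf8') === accented.length); // false

// A pre-allocated buffer has an exact, known byte size.
const asciiBuf = Buffer.alloc(40, 'a', 'utf8');
console.log(asciiBuf.length); // 40
```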
@restrry I addressed most of your comments. The points I still need input on are
PTAL
We can change `size-limit` policy defaults in a follow-up.
💚 Build Succeeded
* You need to start somewhere
* revert comment
* rename default strategy to numeric
* add some tests
* fix some tests
* update documentation
* update generated doc
* change applyBaseConfig to be async
* fix integ tests
* add integration tests
* some renames
* more tests
* more tests
* nits on README
* some self review
* doc nits
* self review
* use `escapeRegExp` from lodash
* address some review comments
* a few more nits
* extract `isDevCliParent` check outside of LoggingSystem.upgrade
* log errors from context
* add defaults for policy/strategy
* master: (53 commits)
  * Fixing recovered instance reference bug (elastic#85412)
  * Switch to new elasticsearch client for Visualizations (elastic#85245)
  * Switch to new elasticsearch client for TSVB (elastic#85275)
  * Switch to new elasticsearch client for Vega (elastic#85280)
  * [ILM] Add shrink field to hot phase (elastic#84087)
  * Add rolling-file appender to core logging (elastic#84735)
  * [APM] Service overview: Dependencies table (elastic#83416)
  * [Uptime] Update empty message for certs list (elastic#78575)
  * [Graph] Fix graph saved object references (elastic#85295)
  * [APM] Create new API's to return Latency and Throughput charts (elastic#85242)
  * [Advanced settings] Reset to default for empty strings (elastic#85137)
  * [SECURITY SOLUTION] Bundles _source -> Fields + able to sort on multiple fields in Timeline (elastic#83761)
  * [Fleet] Update agent listing for better status reporting (elastic#84798)
  * [APM] enable 'sanitize_field_names' for Go (elastic#85373)
  * Update dependency @elastic/charts to v24.4.0 (elastic#85452)
  * Introduce external url service (elastic#81234)
  * Deprecate disabling the security plugin (elastic#85159)
  * [FLEET] New Integration Policy Details page for use in Integrations section (elastic#85355)
  * [Security Solutions][Detection Engine] Fixes one liner access control with find_rules REST API
  * chore: 🤖 remove extraPublicDirs (elastic#85454)
  * ...
Summary

Fix #56291

This PR adds a new `rolling-file` appender to core's logging system. The appender currently supports two triggering policies: time-based (`time-interval`) and size-based (`size-limit`), and one rolling strategy: appending a numeric suffix to the end of the file when rolling it (`numeric`). See `src/core/server/logging/README.md` in the PR for more documentation.

Examples
Rolling every 24h

This appender is configured to roll the file every 24 hours (`interval: 24h`), every day at 0:00am (`modulate: true`). It will retain a maximum of 2 rolled files (`max: 2`):

- A new kibana.log file is created and starts being written to.
- kibana.log is renamed to kibana-1.log. A new kibana.log file is created and starts being written to.
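The configuration block for this example did not survive extraction. Below is a hypothetical reconstruction based only on the appender/policy/strategy kinds and option names mentioned in this PR (`rolling-file`, `pattern`, `time-interval`, `numeric`, `interval`, `modulate`, `max`); the exact nesting is an assumption, and the PR's README is the authoritative reference. The path is a placeholder.

```yaml
logging:
  appenders:
    rolling-file:
      kind: rolling-file
      path: /var/log/kibana/kibana.log
      layout:
        kind: pattern
      policy:
        kind: time-interval
        interval: 24h
        modulate: true
      strategy:
        kind: numeric
        max: 2
```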
Rolling once the file reaches 50mb

This appender is configured to roll the file once it reaches `50mb`. The same rolling strategy as in the previous example applies.

Checklist