task/internal/syslog: Add capability to ignore kernel failures #1666
base: main
@@ -102,56 +102,21 @@ def syslog(ctx, config):
         # flush the file fully. oh well.
 
         log.info('Checking logs for errors...')
+        exclude_errors = config.get('ignorelist', [])
+        log.info('Exclude error list: {0}'.format(exclude_errors))
         for rem in cluster.remotes.keys():
             log.debug('Checking %s', rem.name)
-            stdout = rem.sh(
-                [
+            args = [
                     'egrep', '--binary-files=text',
-                    '\\bBUG\\b|\\bINFO\\b|\\bDEADLOCK\\b',
+                    '\\bBUG\\b|\\bINFO\\b|\\bDEADLOCK\\b|\\bOops\\b|\\bWARNING\\b|\\bKASAN\\b',
                     run.Raw(f'{archive_dir}/syslog/kern.log'),
-                    run.Raw('|'),
-                    'grep', '-v', 'task .* blocked for more than .* seconds',
-                    run.Raw('|'),
-                    'grep', '-v', 'lockdep is turned off',
-                    run.Raw('|'),
-                    'grep', '-v', 'trying to register non-static key',
-                    run.Raw('|'),
-                    'grep', '-v', 'DEBUG: fsize',  # xfs_fsr
-                    run.Raw('|'),
-                    'grep', '-v', 'CRON',  # ignore cron noise
-                    run.Raw('|'),
-                    'grep', '-v', 'BUG: bad unlock balance detected',  # #6097
-                    run.Raw('|'),
-                    'grep', '-v', 'inconsistent lock state',  # FIXME see #2523
-                    run.Raw('|'),
-                    'grep', '-v', '*** DEADLOCK ***',  # part of lockdep output
-                    run.Raw('|'),
-                    'grep', '-v',
-                    # FIXME see #2590 and #147
-                    'INFO: possible irq lock inversion dependency detected',
-                    run.Raw('|'),
-                    'grep', '-v',
-                    'INFO: NMI handler (perf_event_nmi_handler) took too long to run',  # noqa
-                    run.Raw('|'),
-                    'grep', '-v', 'INFO: recovery required on readonly',
-                    run.Raw('|'),
-                    'grep', '-v', 'ceph-create-keys: INFO',
-                    run.Raw('|'),
-                    'grep', '-v', 'INFO:ceph-create-keys',
-                    run.Raw('|'),
-                    'grep', '-v', 'Loaded datasource DataSourceOpenStack',
-                    run.Raw('|'),
-                    'grep', '-v', 'container-storage-setup: INFO: Volume group backing root filesystem could not be determined',  # noqa
-                    run.Raw('|'),
-                    'egrep', '-v', '\\bsalt-master\\b|\\bsalt-minion\\b|\\bsalt-api\\b',
-                    run.Raw('|'),
-                    'grep', '-v', 'ceph-crash',
-                    run.Raw('|'),
-                    'egrep', '-v', '\\btcmu-runner\\b.*\\bINFO\\b',
-                    run.Raw('|'),
-                    'head', '-n', '1',
Which of these need to be moved to the qa suite? Please post a ceph PR. It needs to be backported to octopus/pacific before this can be merged.

I think most of this is stale. We will have to run all relevant suites and find out.

We could do that, but I'm not sure it is worth the effort. If the objective is to make the exclude list configurable, is there a problem with leaving these in (meaning that these would always be on the exclude list)?

@batrick What's the best filter to exercise only kernel code? Which of the following covers all?

In the past, this has been confusing to newcomers. There are all sorts of magic defaults in teuthology (e.g. the log ignorelist). Best to move these to the ceph.git qa/ when possible.

This will have the best coverage. Add
-                ],
-            )
+                ]
+            for exclude in exclude_errors:
+                args.extend([run.Raw('|'), 'egrep', '-v', exclude])
+            args.extend([
+                run.Raw('|'), 'head', '-n', '1',
+            ])
+            stdout = rem.sh(args)
             if stdout != '':
                 log.error('Error in syslog on %s: %s', rem.name, stdout)
                 set_status(ctx.summary, 'fail')
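For reviewers who want to see the command this hunk ends up producing, the pipeline construction can be sketched outside teuthology roughly as follows. `Raw`, `build_syslog_check`, and `render` are hypothetical stand-ins (the real code uses `run.Raw` and `rem.sh`), the log path is made up, and the quoting in `render` is only an assumption about how the argument list becomes a shell command.

```python
import shlex


class Raw:
    """Hypothetical stand-in for teuthology's run.Raw: a token passed unquoted."""

    def __init__(self, value):
        self.value = value


def build_syslog_check(kern_log, ignorelist):
    """Build the egrep pipeline from the diff as a flat argument list."""
    args = [
        'egrep', '--binary-files=text',
        r'\bBUG\b|\bINFO\b|\bDEADLOCK\b|\bOops\b|\bWARNING\b|\bKASAN\b',
        Raw(kern_log),
    ]
    # One `egrep -v` stage per configured pattern, so every ignorelist
    # entry is treated as an extended regular expression.
    for exclude in ignorelist:
        args.extend([Raw('|'), 'egrep', '-v', exclude])
    # Only the first surviving line is reported.
    args.extend([Raw('|'), 'head', '-n', '1'])
    return args


def render(args):
    """Assumed rendering: quote plain strings, pass Raw tokens through."""
    return ' '.join(
        a.value if isinstance(a, Raw) else shlex.quote(a) for a in args
    )


cmd = render(build_syslog_check(
    '/tmp/archive/syslog/kern.log',  # made-up path for illustration
    ['lockdep is turned off', 'CRON'],
))
print(cmd)
```

Note that the removed code mostly used plain `grep -v`, while the new loop uses `egrep -v` for every entry, so an entry such as `*** DEADLOCK ***` would need its metacharacters escaped if it were carried over into an `ignorelist`.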
@jtlayton are we missing anything here or LGTY?
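One way to sanity-check the widened pattern and a candidate ignorelist before running the suites is a small local approximation, sketched below. It uses Python's `re` module in place of `egrep`, which is only an approximation of extended regular expressions; `first_error` and the sample lines are hypothetical and not taken from any real log.

```python
import re

# Pattern from the new hunk, compiled with Python's `re` as an
# approximation of egrep's extended regular expressions.
ERROR_RE = re.compile(
    r'\bBUG\b|\bINFO\b|\bDEADLOCK\b|\bOops\b|\bWARNING\b|\bKASAN\b'
)


def first_error(lines, ignorelist):
    """Return the first matching line not excluded by any ignorelist pattern."""
    excludes = [re.compile(p) for p in ignorelist]
    for line in lines:
        if ERROR_RE.search(line) and not any(e.search(line) for e in excludes):
            return line
    return None


# Made-up sample lines, for illustration only.
sample = [
    'kernel: INFO: task foo blocked for more than 120 seconds',
    'kernel: WARNING: CPU: 0 PID: 1 at fs/example.c:10',
    'kernel: everything is fine',
]

print(first_error(sample, []))
print(first_error(sample, ['task .* blocked for more than .* seconds']))
```

With an empty ignorelist every formerly excluded message is reported again, which is the behavior the thread above is concerned about for suites that do not yet set their own list.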