Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic IOPs check #19

Merged
merged 3 commits into from
Feb 21, 2023
Merged

Add basic IOPs check #19

merged 3 commits into from
Feb 21, 2023

Conversation

micahlee
Copy link
Contributor

@micahlee micahlee commented Jan 19, 2023

Desired Outcome

The desired outcome of this PR is to include disk performance checks in Conjur preflight. Namely this includes:

  • Read, write, and sync latency
  • Read and write operations per second (IOPs)

This is to help A) begin collecting better information from support cases related to disk performance, and B) provide a mechanism to warn on insufficient performance before attempting to run an auto-failover Conjur cluster.

Implemented Changes

This PR is organized into discrete commits and is best reviewed by-commit. This includes:

  • A commit for adding debug logs to the CLI
  • A commit to improve the testing of some existing code
  • A commit to add the disk performance checks and tests for them.

Here is an example output of the new disk performance checks:

========================================
Conjur Enterprise Preflight Qualification
Version: 0.2.0-6835dc5 (Build 31)
========================================

...

Disk
----
...
INFO - FIO - read iops (/): 878.84 (Min: 626, Max: 1008, StdDev: 24.33)
INFO - FIO - write iops (/): 914.83 (Min: 674, Max: 1080, StdDev: 24.87)
INFO - FIO - read latency (99%, /): 0.00 ms
INFO - FIO - write latency (99%, /): 0.00 ms
INFO - FIO - sync latency (99%, /): 1.22 ms
...

Additional example outputs are available in the artifacts for each Jenkins build.

Connected Issue/Story

CyberArk internal issue ID: ONYX-29736

Definition of Done

At least 1 todo must be completed in the sections below for the PR to be
merged.

Changelog

  • The CHANGELOG has been updated, or
  • This PR does not include user-facing changes and doesn't require a
    CHANGELOG update

Test coverage

  • This PR includes new unit and integration tests to go with the code
    changes, or
  • The changes in this PR do not require tests

Documentation

  • Docs (e.g. READMEs) were updated in this PR
  • A follow-up issue to update official docs has been filed here: ONYX-31672
  • This PR does not require updating any documentation

Behavior

  • This PR changes product behavior and has been reviewed by a PO, or
  • These changes are part of a larger initiative that will be reviewed later, or
  • No behavior was changed with this PR

Security

  • Security architect has reviewed the changes in this PR,
  • These changes are part of a larger initiative with a separate security review, or
  • There are no security aspects to these changes

@micahlee micahlee force-pushed the onyx-29736-basic-iops branch 18 times, most recently from 4d09420 to e0b023b Compare January 20, 2023 16:46
@micahlee micahlee marked this pull request as ready for review January 20, 2023 16:53
@micahlee micahlee requested a review from a team as a code owner January 20, 2023 16:53
@micahlee micahlee changed the base branch from master to integration-tests January 20, 2023 18:58
Base automatically changed from integration-tests to master January 20, 2023 19:30
@micahlee micahlee force-pushed the onyx-29736-basic-iops branch 2 times, most recently from 824d27c to ffc2cba Compare January 25, 2023 15:50
@micahlee micahlee force-pushed the onyx-29736-basic-iops branch 2 times, most recently from a63f0f1 to 904c0c8 Compare February 7, 2023 22:06
This makes it easier to report verbose program
logging to troubleshooting issues with the
preflight itself.
@micahlee micahlee force-pushed the onyx-29736-basic-iops branch 3 times, most recently from e53c276 to bd512ad Compare February 9, 2023 18:48
@micahlee micahlee requested a review from a team February 9, 2023 19:11
These are based on the `fio` tool, which must
be installed separately as a prereq for these
checks.

Further, the IOPs checks require `libaio` to be
installed separately, and is only available on
linux.
Copy link

@imheresamir imheresamir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@imheresamir
Copy link

Just curious, did we investigate any Go libraries for the FIO disk/iops checks? So we could call a function in a library instead of delegating the checks to external executables?

@micahlee
Copy link
Contributor Author

micahlee commented Feb 21, 2023

Just curious, did we investigate any Go libraries for the FIO disk/iops checks? So we could call a function in a library instead of delegating the checks to external executables?

Thanks @imheresamir,

Yes, an there's not a lot out there. I did consider using https://github.com/cunnie/gobonniego, which would have covered IOPs but not latency. There are some latency libraries, but I wasn't able to find a license compatible one (e.g. https://github.com/koct9i/ioping is GPL).

The main reason to use fio, though, is that job parameters to evaluate performance for etcd are already well documented:

So this gives us more assurance that what we're capturing matches what we need to evaluate suitability for auto-failover.

That said, if we do learn about a library solution that would work for us, I am not opposed to finding a way to push toward that instead.

@micahlee micahlee merged commit 8a3e906 into master Feb 21, 2023
@micahlee micahlee deleted the onyx-29736-basic-iops branch February 21, 2023 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants