Tweak Metrics collection traces #931

Open
luqmana opened this issue Apr 15, 2022 · 2 comments
Labels
nexus Related to nexus

Comments

@luqmana
Contributor

luqmana commented Apr 15, 2022

The nexus log can get a bit noisy with all the successful requests to the metric collection endpoints every 10s:

Apr 15 17:27:58.715 INFO request completed, response_code: 200, uri: /metrics/collect/e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c, method: GET, req_id: 4134bd35-62ea-490f-a28f-67bb6413c3ee, remote_addr: [fd00:1de::6]:47965, local_addr: [fd00:1de::7]:12221, component: dropshot_internal, name: e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c
Apr 15 17:28:08.713 INFO request completed, response_code: 200, uri: /metrics/collect/e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c, method: GET, req_id: 9f5530f3-81e7-489c-bb62-c2ae8afa2a8d, remote_addr: [fd00:1de::6]:47965, local_addr: [fd00:1de::7]:12221, component: dropshot_internal, name: e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c
Apr 15 17:28:18.710 INFO request completed, response_code: 200, uri: /metrics/collect/e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c, method: GET, req_id: 1273060b-7ca2-4d6a-bce1-c52c2062f066, remote_addr: [fd00:1de::6]:47965, local_addr: [fd00:1de::7]:12221, component: dropshot_internal, name: e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c

It'd be nice to show them only on errors, or perhaps at a different log level.
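A minimal sketch of the "different log level" idea, using only the standard library. The `Level` enum, `METRICS_PREFIX`, and `completion_level` are hypothetical names invented for illustration; they are not Dropshot or slog APIs. The choice to log every non-2xx response at `Warn` is also an assumption.

```rust
/// Hypothetical log levels, standing in for whatever the logging crate provides.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Level {
    Debug,
    Info,
    Warn,
}

/// Assumption: metric collection requests are identified by this URI prefix,
/// taken from the log lines quoted in this issue.
const METRICS_PREFIX: &str = "/metrics/collect/";

/// Pick the level for a "request completed" message.
/// Successful metric collections drop to Debug, other successes stay at Info,
/// and anything outside 2xx is raised to Warn so failures remain visible.
fn completion_level(uri: &str, response_code: u16) -> Level {
    let success = (200..300).contains(&response_code);
    if !success {
        Level::Warn
    } else if uri.starts_with(METRICS_PREFIX) {
        Level::Debug
    } else {
        Level::Info
    }
}

fn main() {
    let uri = "/metrics/collect/e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c";
    println!("{:?}", completion_level(uri, 200));
    println!("{:?}", completion_level(uri, 500));
    // "/some/other/endpoint" is a made-up URI for comparison.
    println!("{:?}", completion_level("/some/other/endpoint", 200));
}
```

With a scheme like this, a log configured at `Info` would skip the 10s collection polls but still record a failed collection.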

@luqmana luqmana added the nexus Related to nexus label Apr 15, 2022
@smklein
Collaborator

smklein commented Apr 18, 2022

I think this particular line is coming from dropshot: https://github.com/oxidecomputer/dropshot/blob/da09c39441e2d4c984d076f90cd550d9c6f76951/dropshot/src/server.rs#L722-L724

If we only log errors, I think this will impact all dropshot servers, not just Nexus. That's definitely an option - I also think being able to filter our logs would help a lot, since "under some, but not all conditions" this information could be useful.
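As a rough illustration of filtering after the fact, here is a standard-library sketch that drops successful metric collection lines from text in the format quoted above. The helpers `field` and `is_noisy` are hypothetical, and the parsing assumes fields are separated by `", "` and keys by `": "`, which holds for the sample lines in this issue but is not a guaranteed format.

```rust
/// Look up the value for `key` in a log line of the form
/// "<timestamp> <LEVEL> <message>, key: value, key: value, ...".
/// Assumption: no value contains ", " and no key contains ": ".
fn field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    line.split(", ")
        .filter_map(|part| part.split_once(": "))
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v)
}

/// True for the lines the issue calls noisy: successful metric collections.
fn is_noisy(line: &str) -> bool {
    let ok = field(line, "response_code") == Some("200");
    let metrics = field(line, "uri")
        .map_or(false, |u| u.starts_with("/metrics/collect/"));
    ok && metrics
}

fn main() {
    // Shortened, hypothetical lines modeled on the ones in this issue.
    let lines = [
        "Apr 15 17:27:58.715 INFO request completed, response_code: 200, uri: /metrics/collect/e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c, method: GET",
        "Apr 15 17:28:08.713 INFO request completed, response_code: 500, uri: /metrics/collect/e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c, method: GET",
    ];
    // Only the line with the error response survives the filter.
    for line in lines.iter().filter(|l| !is_noisy(l)) {
        println!("{line}");
    }
}
```

Filtering at read time keeps the full record on disk, so the information is still there for the "some, but not all" conditions where it is useful.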

@bnaecker
Collaborator

It's true this is the normal Dropshot request handling log. It's also true that we could make these messages in particular less frequent, by running a separate server just for handling the metrics requests. There's already the ProducerServer type for this, so it would just be a matter of moving the handler functions to that server and figuring out a new address and/or port.

leftwo pushed a commit that referenced this issue Oct 4, 2023
Crucible updates
    all Crucible connections should set TCP_NODELAY (#983)
    Use a fixed size for tag and nonce (#957)
    Log crucible opts on start, order crutest options (#974)
    Lock the Downstairs less (#966)
    Cache dirty flag locally, reducing SQLite operations (#970)
    Make stats mutex synchronous (#961)
    Optimize requeue during flow control conditions (#962)
    Update Rust crate base64 to 0.21.4 (#950)
    Do less in control (#949)
    Fix --flush-per-blocks (#959)
    Fast dependency checking (#916)
    Update actions/checkout action to v4 (#960)
    Use `cargo hakari` for better workspace deps (#956)
    Update actions/checkout digest to 8ade135 (#939)
    Cache block size in Guest (#947)
    Update Rust crate ringbuffer to 0.15.0 (#954)
    Update Rust crate toml to 0.8 (#955)
    Update Rust crate reedline to 0.24.0 (#953)
    Update Rust crate libc to 0.2.148 (#952)
    Update Rust crate indicatif to 0.17.7 (#951)
    Remove unused async (#943)
    Use a synchronous mutex for bw/iop_tokens (#946)
    Make flush ID non-locking (#945)
    Use `oneshot` channels instead of `mpsc` for notification (#918)
    Use a strong type for upstairs negotiation (#941)
    Add a "dynamometer" option to crucible-downstairs (#931)
    Get new work and active count in one lock (#938)
    A bunch of misc test cleanup stuff (#937)
    Wait for a snapshot to finish on all downstairs (#920)
    dsc and clippy cleanup. (#935)
    No need to sort ackable_work (#934)
    Use a strong type for repair ID (#928)
    Keep new jobs sorted (#929)
    Remove state_count function on Downstairs (#927)
    Small cleanup to IOStateCount (#932)
    let cmon and IOStateCount use ClientId (#930)
    Fast return for zero length IOs (#926)
    Use a strong type for client ID (#925)
    A few Crucible Agent fixes (#922)
    Use a newtype for `JobId` (#919)
    Don't pass MutexGuard into functions (#917)
    Crutest updates, rename tests, new options (#911)

Propolis updates
    Update tungstenite crates to 0.20
    Use `strum` crate for enum-related utilities
    Wire up bits for CPUID customization
    PHD: improve artifact store (#529)
    Revert abort-on-panic in 'dev' cargo profile
leftwo added a commit that referenced this issue Oct 5, 2023
(Same Crucible and Propolis update list as the Oct 4, 2023 commit above.)

---------

Co-authored-by: Alan Hanson <[email protected]>