Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasm: introduce engine probe #14604

Merged
merged 7 commits into from
Nov 2, 2023
Merged

Conversation

rockwotj
Copy link
Contributor

@rockwotj rockwotj commented Nov 1, 2023

Introduce a probe for reporting the resource usage of WebAssembly VMs.

The intention of decoupling this from transform_probe is that even
if we have many different "types" of VMs (authn, etc) we still want
to report the baseline resource usage.

Currently this PR only introduces tracking for memory usage.
We report the maximum (allowed) memory usage and the used memory,
which is updated when the engine is done with startup or after
a transform is invoked.

A followup PR will introduce CPU resource usage tracking.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

@rockwotj
Copy link
Contributor Author

rockwotj commented Nov 1, 2023

Force push: Fix the method we call to report memory usage

There are other types of probes internal to the wasm subsystem so rename
the transform probe so that we can disambiguate from other probes, since
the transform probe is "external".

Signed-off-by: Tyler Rockwood <[email protected]>
@rockwotj
Copy link
Contributor Author

rockwotj commented Nov 1, 2023

Force push: Fix conflicts with dev

@rockwotj
Copy link
Contributor Author

rockwotj commented Nov 1, 2023

sigh, failures are all known failures around decommissioning

@rockwotj
Copy link
Contributor Author

rockwotj commented Nov 2, 2023

/ci-repeat

@vbotbuildovich
Copy link
Collaborator

new failures detected in https://buildkite.com/redpanda/redpanda/builds/40290#018b8eec-4815-4b1e-b8b6-4730e0e8ddad: "rptest.tests.rpk_registry_test.RpkRegistryTest.test_produce_consume_proto"

@vbotbuildovich
Copy link
Collaborator

new failures detected in https://buildkite.com/redpanda/redpanda/builds/40290#018b8ef8-aa1f-4973-8ee9-72845642a9ec: "rptest.tests.memory_stress_test.MemoryStressTest.test_fetch_with_many_partitions.memory_share_for_fetch=0.7"
"rptest.tests.rpk_registry_test.RpkRegistryTest.test_produce_consume_proto"

StephanDollberg
StephanDollberg previously approved these changes Nov 2, 2023
}

engine_probe engine_probe_cache::make_probe(const ss::sstring& name) {
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit said that there is no better pattern for this conditional insert but wouldn't just using operator[] be the easiest here?

Can just check for a default constructed nullptr shared_ptr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I miss some of the gtl map helpers from my time at Google, but yes default construction works fine - I've made the switch

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I meant you can then also replace the emplace call? Something like:

    auto& probe = _probe_cache[name];
    if (!probe) {
        probe = ss::make_lw_shared<internal::engine_probe_impl>(
      this, name);
    }
    return engine_probe(probe);

or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We store these as raw pointers, and the destructor of the probe removes the map entry (a sneaky pattern), because we want this to be a cache, but to cleanup the probes when all the holders to the impl are deleted.

I'm open to better approaches here!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry missed that, I think you could still avoid the extra emplace call by rearranging the code a bit.

This sounds like the classical weak_ptr pattern but lw_shared probably doesn't support that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does support weak_ptr but I still need to remove the map entries if you have ideas on how to do that.

ref to weak_ptr: https://github.com/redpanda-data/seastar/blob/245e0ccfa6d58d7e0dca2b4034ce1bc43e39bdc5/include/seastar/core/weak_ptr.hh#L97-L113

@rockwotj
Copy link
Contributor Author

rockwotj commented Nov 2, 2023

Force push: Address review comment: #14604 (comment)

@rockwotj
Copy link
Contributor Author

rockwotj commented Nov 2, 2023

CI Failures: #14623, #14619

engine_probe engine_probe_cache::make_probe(const ss::sstring& name) {
auto probe = _probe_cache[name];
if (probe) {
return engine_probe(probe->shared_from_this());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling shared from this is odd here, no? Just copying should be fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We store them as raw pointers, see my other comment.

StephanDollberg
StephanDollberg previously approved these changes Nov 2, 2023
@rockwotj
Copy link
Contributor Author

rockwotj commented Nov 2, 2023

Force push: Remove emplace call

The engine probe will be responsible for tracking all the resources that
a Wasm engine is actively using, such as memory and CPU.

This commit only adds memory capabilities.

Signed-off-by: Tyler Rockwood <[email protected]>
To ensure that when we are reporting our absolute values across many
components we don't mess up the underlying value.

Signed-off-by: Tyler Rockwood <[email protected]>
We're going to periodically report memory usage when the engine is
invoked. This means on start and each time a transform is performed.

We also need to report our usage to 0 when the engine is stopped.

Signed-off-by: Tyler Rockwood <[email protected]>
This is an internal detail to the wasm subsystem and the wasm subsystem
will be responsible for reporting VM resource usage.

Signed-off-by: Tyler Rockwood <[email protected]>
@rockwotj
Copy link
Contributor Author

rockwotj commented Nov 2, 2023

Force push: Add pragma once to header

@vbotbuildovich
Copy link
Collaborator

new failures detected in https://buildkite.com/redpanda/redpanda/builds/40340#018b91af-82f3-45c4-b5af-9da70fc8d536: "rptest.tests.memory_stress_test.MemoryStressTest.test_fetch_with_many_partitions.memory_share_for_fetch=0.5"

@rockwotj
Copy link
Contributor Author

rockwotj commented Nov 2, 2023

CI Failures: #14623 and bad test infra

@rockwotj rockwotj merged commit 88f913e into redpanda-data:dev Nov 2, 2023
24 checks passed
@rockwotj rockwotj deleted the wasm-probe branch November 2, 2023 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants