
Allow ra metrics to be reported upon parsing completed or accumulated #37

Closed
wants to merge 69 commits

Conversation

pgomulka
Owner

RA metrics can be implemented so that they are either reported before documents are indexed (for example when a new field is added),
or accumulated and reported upon shard commit as additional metadata.

This commit adds a new method, DocumentSizeReporter#onParsingCompleted, and a DocumentSizeAccumulator that is used to accumulate the size in between commits. DocumentSizeReporter can be parameterised with a DocumentSizeAccumulator.

based on elastic#108449
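A minimal sketch of the shape described above might look like the following; the names onParsingCompleted, onIndexingCompleted and DocumentSizeAccumulator come from this PR and elastic#108449, while the exact signatures (long byte counts, getAndReset) are assumptions for illustration only.

```java
// Sketch only: assumed signatures for the reporter/accumulator pair described in this PR.
interface DocumentSizeReporter {
    // reported once the document has been parsed, before it is indexed
    void onParsingCompleted(long normalizedBytesParsed);

    // reported once indexing of the document has completed
    void onIndexingCompleted(long normalizedBytesParsed);
}

interface DocumentSizeAccumulator {
    // add the size of a parsed document to the running total
    void add(long bytes);

    // drain the total accumulated since the last shard commit,
    // e.g. to attach it to the commit as additional metadata
    long getAndReset();
}
```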

pgomulka and others added 30 commits May 14, 2024 14:13
RA metrics can be implemented so that they are either reported before documents are indexed (for example
when a new field is added),
or accumulated and reported upon shard commit as additional metadata.

This commit adds a new method, DocumentSizeReporter#onParsingCompleted,
and a DocumentSizeAccumulator that is used to accumulate the size in between commits.
DocumentSizeReporter can be parameterised with a DocumentSizeAccumulator.

based on elastic#108449
See https://bugs.openjdk.org/browse/JDK-8329528. The applied workaround
was suggested on the linked issue, and was tested and confirmed to avoid
the G1 bug.
Encapsulates this component of the snapshot deletion process so we can
follow up with some optimizations in isolation.

Relates elastic#108278
The instances returned here are immutable so we can use singletons.
We can also be smarter about the way we serialize them so that the arrays
have the minimal size, e.g. only 2 bytes for JSON instead of a wasteful
1024.
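The "minimal size" idea can be pictured with a small, hypothetical helper; this is not the PR's actual code, and the names and the copy-based approach are assumptions purely for illustration.

```java
import java.util.Arrays;

// Illustration only: copy just the populated prefix of a fixed-size backing array,
// so a value that needs 2 bytes is not serialized as a 1024-byte array.
final class MinimalArraySketch {
    static byte[] minimalCopy(byte[] backing, int highestUsedIndex) {
        return Arrays.copyOf(backing, highestUsedIndex + 1);
    }

    public static void main(String[] args) {
        byte[] backing = new byte[1024];
        backing[0] = 'o';
        backing[1] = 'k';
        System.out.println(minimalCopy(backing, 1).length);   // prints 2
    }
}
```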
…lastic#108589)

I observed that the `testRegisterRepositorySuccessAfterCreationFailed` test
never reaches its assertion blocks, because the listener is not invoked.

There are two problems:

1. The test setup used mocks. Mocks interrupt listener chain propagation, so registerRepository never returned a response or a failure.
2. The assertions in the listener are silently ignored because the listener is never invoked, so the test passes anyway.

Putting a repository relies on a cluster state update. I replace the mocked
ClusterService and ThreadPool with test implementations of these, and also
add a blocking call on the listener to ensure we get a result.

Addresses a
[comment](elastic#108531 (review))
asking to break the larger PR elastic#108531 down into smaller pieces.
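A common way to add the blocking call on the listener is to hand the production method a future and wait on it. The sketch below assumes a listener-based registerRepository signature and an arbitrary 10-second timeout; the names around PlainActionFuture are illustrative, not the PR's exact test code.

```java
// Sketch: block until the listener completes so assertions cannot be skipped silently.
PlainActionFuture<AcknowledgedResponse> future = new PlainActionFuture<>();
repositoriesService.registerRepository(request, future);   // listener-based API under test (assumed signature)
AcknowledgedResponse response = future.actionGet(TimeValue.timeValueSeconds(10));   // fails the test on timeout
assertTrue(response.isAcknowledged());
```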
…ncy (elastic#108463)

SystemIndexThreadPoolTestCase#testUserThreadPoolsAreBlocked sometimes blocked instead of throwing the expected exception (due to a queue on a thread pool being full).
Submitting a busy task does not guarantee that it is executed immediately by the thread pool; some other task may have been executing at the time.
This commit refactors the way the thread pool is populated and makes sure that all the busy tasks are executing on the thread pools before the queues are filled.

Based on the test failure in elastic#107625: the thread pool's threads were busy, but I cannot tell whether a queue was full before the search request was submitted.
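The ordering guarantee described above can be shown with plain java.util.concurrent primitives. This is a generic sketch rather than the PR's test code, and the pool and queue sizes are arbitrary.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Make sure every worker thread is provably busy before filling the queue,
// so the subsequent rejection is deterministic instead of timing-dependent.
public class BusyPoolSketch {
    public static void main(String[] args) throws Exception {
        int threads = 2, queueSize = 2;
        ThreadPoolExecutor pool = new ThreadPoolExecutor(threads, threads, 0, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueSize));
        CountDownLatch allBusy = new CountDownLatch(threads);
        CountDownLatch release = new CountDownLatch(1);
        Runnable busyTask = () -> {
            allBusy.countDown();                 // signal "I am running on a worker thread"
            try {
                release.await();                 // stay busy until the test releases us
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < threads; i++) pool.execute(busyTask);
        allBusy.await();                         // every thread is now occupied
        for (int i = 0; i < queueSize; i++) pool.execute(busyTask);   // fill the queue
        try {
            pool.execute(busyTask);              // must be rejected, never silently block
        } catch (RejectedExecutionException expected) {
            System.out.println("rejected as expected");
        }
        release.countDown();
        pool.shutdown();
    }
}
```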
…lastic#108540)

We should make sure to find leaks reported by both of these; these days
our `LeakTracker` will likely be more sensitive than Netty's in some
cases, since our objects refer to Netty objects and thus get collected
first.
This PR adds basic infra for mapper metrics and adds first metrics for
synthetic source load latency.
Dry up the code for all the `byte[]`-backed versions of the big array
a little. The motivation here (outside of making the code drier now) is
to follow up with possible optimizations for sparsely populated variants
of this thing.
The workaround requires two JDK args, but SystemJvmOptions actually
operates on individual JDK args. This commit adjusts SystemJvmOptions to
allow adding sets of JDK args together.
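A minimal sketch of what "adding sets of JDK args together" could look like; the helper names and the exact flag strings for the G1 workaround are assumptions here, not a quote of the real SystemJvmOptions code.

```java
import java.util.List;
import java.util.stream.Stream;

// Illustrative shape: a workaround contributes a whole set of JVM args at once,
// alongside options that are still contributed individually.
final class JvmOptionsSketch {
    // assumed two-argument form of the G1 workaround; treat the flag names as illustrative
    static Stream<String> g1WorkaroundOptions() {
        return Stream.of("-XX:+UnlockDiagnosticVMOptions", "-XX:G1NumCollectionsKeepPinned=10000000");
    }

    static List<String> systemJvmOptions() {
        return Stream.of(
                Stream.of("-Dfile.encoding=UTF-8"),   // example of an existing single option
                g1WorkaroundOptions()                 // a whole set added together
        ).flatMap(s -> s).toList();
    }
}
```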
This PR prevents assigning DLS/FLS to `search` if `replication` is also
assigned, to avoid edge-case failure modes. The use of _existing_ API
keys with DLS/FLS in `search` with `replication` is likewise blocked.
With this commit we add a new E2E test for retrieving stacktraces
without any use of profiling APIs. We add this test because we have
observed very rare values with wrong document counts which we could
narrow down to wrong results from the aggregations API but we don't
understand the root cause. This test is a first step towards a minimal
reproduction that solely relies on core Elasticsearch APIs.
)

In this PR we update the snapshot and restore implementations to include
the failure store of a data stream when the data stream is being snapshotted.

- When a data stream is requested to be snapshotted, this implies all of its backing indices and failure stores.
- When a data stream is being restored, this implies all of its backing indices and failure stores (assuming the feature flag is also enabled).
- When individual backing or failure store indices that have been removed are being restored, they need to be manually added to the data stream. This functionality worked out of the box.
rjernst and others added 29 commits May 15, 2024 09:14
This commit re-bumps the bundled JDK to Java 22 now that we have
a tested workaround for the G1GC bug
(https://bugs.openjdk.org/browse/JDK-8329528).

relates elastic#108571
relates elastic#106987
This is just to find out whether we ever remove the tasks from the
threadPoolExecutor, in order to make the Kibana (system) ThreadPool tests
reliable.
Currently it is possible for the MockTransportService disrupt behavior
to swallow requests if either the connection is already closed (in which
case response pruning has already occurred) or the behavior is added
after the clear callback has been triggered.
* mixed cluster tests are executable

* add tests from upgrade tests

* [ML] Add mixed cluster tests for existing services

* clean up

* review improvements

* spotless

* remove blocked AzureOpenAI mixed IT

* improvements from DK review

* temp for testing

* refactoring and documentation

* Revert manual testing configs of "temp for testing"

This reverts parts of commit fca46fd.

* revert TESTING.asciidoc formatting

* Update TESTING.asciidoc to avoid reformatting

* add minimum version for tests to match minimum version in services

* spotless
…ic#108691)

It's possible for a node-left task to get interrupted prior to removing
the node from the master's list of faultyNodes. Nodes on the faultyNodes
list do not receive cluster state updates, and are eventually removed.

Subsequently, when the node attempts to rejoin, after test network
disruptions have ceased, the node-join request can succeed, but the
node will never receive the cluster state update, consider the node-join
a failure, and will resend node-join requests until the LagDetector
removes the node from the faultyNodes list.
elastic#108690 will address the
node-join issue.

Closes elastic#91447
This removes the `OPTIONS` clause of the `FROM` command.
Previously readiness waited only on a master node being elected.
Recently it was also made to wait on file settings being applied, yet
the node may be fully started before those file settings are applied.
The test expected readiness to be ok right after the node finished starting.

This commit retries the readiness check until it succeeds, since the
readiness state is updated asynchronously to the node finishing starting.

closes elastic#108523
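A sketch of the retry, assuming an ESTestCase-style test: assertBusy is the usual helper for polling until an asynchronous condition holds, and readinessProbeSucceeds() is a hypothetical placeholder for the actual readiness check.

```java
// Poll until the readiness state catches up with node startup, instead of asserting once.
assertBusy(() -> assertTrue("node should eventually report ready", readinessProbeSucceeds()));
```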
This commit overrides dumpDebug for DockerTests to pull the log from
docker rather than looking in the filesystem.
In preparation for elastic#108210, this commit adds a separate method to gather
MappedActionFilter instances. For now this remains compatible with the
existing getActionFilters by allowing MappedActionFilter to exist in
both places.
…08593)

* [DOCS] Fix documentation for timeout-related parameters

Closes elastic#108224
Upon further discussion we decided not to do any of these things, so
this commit removes the leftover TODO comments.
The failure was happening because the test was incorrectly setting up the initial state.

It was generating unassigned primary shards with corresponding reserved space for initial primary data on the node specified in lastAllocatedNodeId. The assumption was that this would be the node the shard is going to be assigned to; however, this was not the case, as the gateway allocator is bypassed when computing the desired balance. As a result the shard was started on some other random node, skewing the expected disk computation.

The fix uses the INITIALIZING shard state to guarantee that the shard is initialized for the first time on the expected node.
…#108633)

This PR adds the missing role description for the `transport_client` role,
and a test to enforce that all reserved roles are described.
The description also serves as self-documentation for roles,
thus it is reasonable to make this a requirement for all reserved roles.

Relates to elastic#108422, which included descriptions for other reserved roles.
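A hedged sketch of what the enforcement test could look like; ReservedRolesStore.roleDescriptors() and RoleDescriptor#getDescription() are assumed accessors used only to illustrate the "every reserved role is described" check.

```java
// Assumed shape of the enforcement test: every reserved role must carry a non-empty description.
public void testAllReservedRolesHaveDescription() {
    for (RoleDescriptor role : ReservedRolesStore.roleDescriptors()) {   // assumed accessor
        assertThat("role [" + role.getName() + "] should have a description",
            role.getDescription(), not(emptyOrNullString()));            // Hamcrest matchers
    }
}
```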
…ulkAction (elastic#108449)

Previously, DocumentSizeReporter reported upon indexing being completed, in TransportShardBulkAction#onComplete.
This commit renames the method to onIndexingCompleted and moves that reporting to IndexEngine in the serverless plugin.
This will be followed up in a separate PR that will report in an Engine#index subclass (serverless).
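A generic decorator sketch of where the report now lives; this is not the real Engine API, and every type here other than the DocumentSizeReporter#onIndexingCompleted call is illustrative.

```java
// Generic decorator sketch: the size report happens where indexing completes
// (onIndexingCompleted), rather than in the transport bulk action. All types are illustrative.
record ParsedDoc(long sizeInBytes) {}

interface Indexer {
    void index(ParsedDoc doc);
}

final class ReportingIndexer implements Indexer {
    private final Indexer delegate;
    private final DocumentSizeReporter reporter;   // see the earlier interface sketch

    ReportingIndexer(Indexer delegate, DocumentSizeReporter reporter) {
        this.delegate = delegate;
        this.reporter = reporter;
    }

    @Override
    public void index(ParsedDoc doc) {
        delegate.index(doc);                              // perform the actual indexing
        reporter.onIndexingCompleted(doc.sizeInBytes());  // report once indexing has completed
    }
}
```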
@pgomulka pgomulka closed this May 16, 2024