Add metric for total downsampling latency #106747

Merged

kkrik-es merged 12 commits into elastic:main on Mar 27, 2024

Conversation

kkrik-es
Contributor

This metric tracks latency on the master side, in addition to the per-shard metrics that were added in #106632.
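To make the per-shard vs. request-level distinction concrete, here is a minimal sketch in plain Java. It does not use the Elasticsearch telemetry API: each "histogram" is just an in-memory list of samples, and the Sample record, the recordSample helper and the latency values are hypothetical. Only the two metric names and ActionStatus.SUCCESS come from this PR's diff. One downsample request over a three-shard index yields one sample per shard plus a single sample for the whole operation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the Elasticsearch telemetry API: each histogram is
// modeled as an in-memory list of samples so the two granularities are visible.
public class DownsampleLatencySketch {
    enum ActionStatus { SUCCESS }

    record Sample(long millis, ActionStatus status) {}

    // Metric names as they appear in this PR's diff (before the review rename).
    static final String LATENCY_SHARD = "es.tsdb.downsample.latency.shard.histogram";
    static final String LATENCY_MASTER = "es.tsdb.downsample.latency.master.histogram";

    static final Map<String, List<Sample>> HISTOGRAMS = new HashMap<>();

    // Append one sample to the named histogram, creating it on first use.
    static void recordSample(String metric, long millis, ActionStatus status) {
        HISTOGRAMS.computeIfAbsent(metric, k -> new ArrayList<>()).add(new Sample(millis, status));
    }

    public static void main(String[] args) {
        // One downsample request over a 3-shard index: one sample per shard...
        long[] shardMillis = {120, 95, 140};
        for (long ms : shardMillis) {
            recordSample(LATENCY_SHARD, ms, ActionStatus.SUCCESS);
        }
        // ...plus one end-to-end sample for the whole request (made-up value).
        recordSample(LATENCY_MASTER, 310, ActionStatus.SUCCESS);

        System.out.println(HISTOGRAMS.get(LATENCY_SHARD).size());  // 3
        System.out.println(HISTOGRAMS.get(LATENCY_MASTER).size()); // 1
    }
}
```

Because shards run in parallel, the request-level value is not the sum of the shard values; it is measured separately, which is why a dedicated metric is useful.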

@kkrik-es kkrik-es added >non-issue :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:StorageEngine labels Mar 26, 2024
@kkrik-es kkrik-es self-assigned this Mar 26, 2024
@kkrik-es
Contributor Author

@elasticsearchmachine run elasticsearch-ci/part-1

@kkrik-es
Contributor Author

@elasticsearchmachine test this

@kkrik-es kkrik-es requested a review from martijnvg March 26, 2024 15:39
@kkrik-es kkrik-es marked this pull request as ready for review March 26, 2024 15:39
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@@ -173,6 +177,12 @@ public TransportDownsampleAction(
this.threadContext = threadPool.getThreadContext();
this.taskQueue = clusterService.createTaskQueue("downsample", Priority.URGENT, STATE_UPDATE_TASK_EXECUTOR);
this.persistentTasksService = persistentTasksService;
this.downsampleMetrics = downsampleMetrics;
this.startTime = client.threadPool().relativeTimeInMillis();
Contributor

@lkts lkts Mar 26, 2024

Just for my education: do we actually create a new instance of the action for every invocation?

Contributor Author

Good question. My understanding is that actions are initialized once in the plugin, then their methods (masterOperation) are called per incoming request.
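If the action is indeed created once and reused, a start time captured in the constructor is shared by every request, so a duration computed from it measures time since the action was created rather than the request. The sketch below contrasts the two ways of taking the timestamp, using a fake millisecond clock. SingletonAction, handleRequest and durationFromConstructor are hypothetical names for illustration, not code from this PR.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch, not TransportDownsampleAction: shows why a long-lived
// action should take its start timestamp per request, not in its constructor.
public class PerRequestTimingSketch {
    static class SingletonAction {
        private final LongSupplier relativeTimeMillis;
        private final long constructorStartTime; // captured once, shared by every request

        SingletonAction(LongSupplier relativeTimeMillis) {
            this.relativeTimeMillis = relativeTimeMillis;
            this.constructorStartTime = relativeTimeMillis.getAsLong();
        }

        // Duration measured from construction: grows with the action's age.
        long durationFromConstructor() {
            return relativeTimeMillis.getAsLong() - constructorStartTime;
        }

        // Per-request alternative: capture the start when the request arrives.
        long handleRequest(Runnable work) {
            final long startTime = relativeTimeMillis.getAsLong();
            work.run();
            return relativeTimeMillis.getAsLong() - startTime;
        }
    }

    public static void main(String[] args) {
        AtomicLong clock = new AtomicLong(1_000); // fake clock in milliseconds
        SingletonAction action = new SingletonAction(clock::get);

        clock.addAndGet(60_000); // the action sits idle for a minute
        long perRequest = action.handleRequest(() -> clock.addAndGet(250)); // request takes 250 ms
        long fromConstructor = action.durationFromConstructor();

        System.out.println(perRequest);      // 250
        System.out.println(fromConstructor); // 60250
    }
}
```

In an asynchronous flow the per-request start time would have to be carried along to wherever the latency is finally recorded, for example captured in the listener chain.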

@@ -30,6 +30,7 @@
public class DownsampleMetrics extends AbstractLifecycleComponent {

public static final String LATENCY_SHARD = "es.tsdb.downsample.latency.shard.histogram";
public static final String LATENCY_MASTER = "es.tsdb.downsample.latency.master.histogram";
Member

The "master" part leaks an implementation detail that may change in the future. I think "api latency" or "operation latency" might be a better name here. WDYT?

Contributor Author

Switched to total.

@@ -459,12 +470,16 @@ public void onResponse(PersistentTasksCustomMetadata.PersistentTask<PersistentTa
if (countDown.decrementAndGet() == 0) {
logger.info("All downsampling tasks completed [" + numberOfShards + "]");
updateTargetIndexSettingStep(request, listener, sourceIndexMetadata, downsampleIndexName, parentTask);
downsampleMetrics.recordLatencyMaster(getDurationInMillis(), DownsampleMetrics.ActionStatus.SUCCESS);
Member

Do we want to include the time it takes to perform the refresh and force merge? If so, we should move this to ForceMergeActionListener.

Contributor Author

Moved it, and also added it to the failure handlers. PTAL.
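A minimal sketch of the pattern discussed here: wrap the final listener so the total latency is recorded in both the success and the failure path, after the last step (such as a force merge) has completed. The Listener interface, the timed helper and the FAILED status are hypothetical stand-ins, not Elasticsearch's ActionListener or DownsampleMetrics API; only ActionStatus.SUCCESS appears in the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch: records the elapsed time on whichever of
// onResponse/onFailure is called, tagging each sample with a status.
public class TimedListenerSketch {
    enum ActionStatus { SUCCESS, FAILED } // FAILED is an assumed name

    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    record Sample(long millis, ActionStatus status) {}

    static <T> Listener<T> timed(Listener<T> delegate, long startTime, LongSupplier clock, List<Sample> sink) {
        return new Listener<>() {
            @Override
            public void onResponse(T response) {
                // Success path: time measured up to the final step's completion.
                sink.add(new Sample(clock.getAsLong() - startTime, ActionStatus.SUCCESS));
                delegate.onResponse(response);
            }

            @Override
            public void onFailure(Exception e) {
                // Failure path: timed as well, with a different status.
                sink.add(new Sample(clock.getAsLong() - startTime, ActionStatus.FAILED));
                delegate.onFailure(e);
            }
        };
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock in milliseconds
        List<Sample> samples = new ArrayList<>();
        Listener<String> noop = new Listener<>() {
            public void onResponse(String r) {}
            public void onFailure(Exception e) {}
        };

        Listener<String> ok = timed(noop, now[0], () -> now[0], samples);
        now[0] = 400;
        ok.onResponse("done");

        Listener<String> failing = timed(noop, now[0], () -> now[0], samples);
        now[0] = 450;
        failing.onFailure(new RuntimeException("force merge failed"));

        System.out.println(samples); // [Sample[millis=400, status=SUCCESS], Sample[millis=50, status=FAILED]]
    }
}
```

Recording in the wrapper rather than at each call site keeps the success and failure paths from drifting apart as steps are added.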

Member

@martijnvg martijnvg left a comment

👍

@kkrik-es kkrik-es merged commit a2af99c into elastic:main Mar 27, 2024
14 checks passed
@kkrik-es kkrik-es deleted the metrics/downsample-latency branch June 27, 2024 08:21
Labels
>non-issue :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:StorageEngine v8.14.0
4 participants