[Remote Store] Fix stuck segments upload issue #11021

Closed

@ashking94 ashking94 commented Oct 31, 2023

Description

When segments are being uploaded to the remote store and a flush or force merge happens, the segment tracker's data can be erased while an async refresh retry is in progress. As a result, the latch is never counted down.

for (String src : filteredFiles) {
    // Initializing listener here to ensure that the stats increment operations are thread-safe
    UploadListener statsListener = createUploadListener();
    ActionListener<Void> aggregatedListener = ActionListener.wrap(resp -> {
        statsListener.onSuccess(src);
        batchUploadListener.onResponse(resp);
    }, ex -> {
        logger.warn(() -> new ParameterizedMessage("Exception: [{}] while uploading segment files", ex), ex);
        if (ex instanceof CorruptIndexException) {
            indexShard.failShard(ex.getMessage(), ex);
        }
        statsListener.onFailure(src);
        batchUploadListener.onFailure(ex);
    });
    statsListener.beforeUpload(src);
    remoteDirectory.copyFrom(storeDirectory, src, IOContext.DEFAULT, aggregatedListener);
}
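
The failure mode described above, where a completion path depends on tracker state that another thread may clear, can be illustrated with a minimal, self-contained sketch. This is not the OpenSearch code: `StuckLatchSketch`, `trackerState`, `fragileOnComplete`, and `safeOnComplete` are hypothetical names, and the actual code path in the remote store refresh logic may differ. The sketch only shows how a latch can stay uncounted when a callback fails before reaching `countDown()`, and how moving `countDown()` into a `finally` block avoids that.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class StuckLatchSketch {
    // Hypothetical stand-in for per-file tracker state (assumption, not the real segment tracker).
    static final Map<String, Long> trackerState = new ConcurrentHashMap<>();

    // Fragile completion: throws before countDown() if the tracker entry is gone.
    static void fragileOnComplete(String file, CountDownLatch latch) {
        Long size = trackerState.remove(file);
        if (size == null) {
            throw new IllegalStateException("no tracker entry for " + file);
        }
        latch.countDown();
    }

    // Safer completion: the latch is released even if the tracker lookup fails.
    static void safeOnComplete(String file, CountDownLatch latch) {
        try {
            Long size = trackerState.remove(file);
            if (size == null) {
                throw new IllegalStateException("no tracker entry for " + file);
            }
        } finally {
            latch.countDown();
        }
    }

    // Returns true if the latch was released within the timeout.
    static boolean run(boolean safe) throws InterruptedException {
        String file = "_0.cfs";
        trackerState.put(file, 42L);
        CountDownLatch latch = new CountDownLatch(1);

        // Simulates a flush/force merge wiping tracker data while the upload is in flight.
        trackerState.clear();

        Thread uploader = new Thread(() -> {
            try {
                if (safe) {
                    safeOnComplete(file, latch);
                } else {
                    fragileOnComplete(file, latch);
                }
            } catch (IllegalStateException e) {
                // Swallowed here, as an async callback failure might be in practice.
            }
        });
        uploader.start();
        uploader.join();

        // A bounded wait keeps the sketch from hanging when the latch is stuck.
        return latch.await(100, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fragile released: " + run(false)); // prints "fragile released: false"
        System.out.println("safe released: " + run(true));     // prints "safe released: true"
    }
}
```

In the fragile variant the waiting thread would block indefinitely if it used an unbounded `await()`, which matches the "stuck upload" symptom in the linked issue title.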

Related Issues

Resolves #11020

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the bug, Storage:Durability, and Storage:Remote labels Oct 31, 2023
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Oct 31, 2023

Compatibility status:

Checks if related components are compatible with change ee651e7

Incompatible components

Incompatible components: [https://github.com/opensearch-project/performance-analyzer.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git]

@ashking94 ashking94 self-assigned this Nov 1, 2023
Contributor

github-actions bot commented Nov 1, 2023

Gradle Check (Jenkins) Run Completed with:

@ashking94
Member Author

> Gradle Check (Jenkins) Run Completed with:

Known flaky tests - #10193, #9499

Contributor

github-actions bot commented Nov 2, 2023

Gradle Check (Jenkins) Run Completed with:

@dblock
Member

dblock commented Nov 21, 2023

@ashking94 Could you please look into the Gradle check failures above?

Member

@dblock dblock left a comment

There should be a test that reproduces the bug as described and fails without the fix, please.

@ashking94 ashking94 marked this pull request as draft November 28, 2023 03:07
@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled label Jan 3, 2024
@ticheng-aws
Contributor

Hi @ashking94, the PR is stalled. Do we have any updates?

@ashking94
Member Author

> Hi @ashking94, the PR is stalled. Do we have any updates?

I plan to take it up real soon.

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled label Jan 6, 2024
@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled label Feb 6, 2024
@ashking94
Member Author

ashking94 commented Mar 8, 2024

Part of the problem that this PR addresses has already been fixed by #11896. Closing this PR.

@ashking94 ashking94 closed this Mar 8, 2024
Labels
bug, skip-changelog, stalled, Storage:Durability, Storage:Remote
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

[BUG] Stuck segments upload leads to high refresh lag
4 participants