Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delegate Flint index vacuum operation to Spark #2985

Merged

Conversation

dai-chen
Copy link
Collaborator

@dai-chen dai-chen commented Sep 5, 2024

Description

Currently, Flint index vacuum operation was handled by the SQL plugin to avoid launching a Spark job. This was introduced in #2557 to:

  1. Minimize the latency and cost of submitting to Spark.
  2. Offer flexibility to vacuum refreshing/active indexes directly (though this decision was ultimately reversed).

As a result, the vacuum logic in the SQL plugin became a duplicate of that in Spark. With the need for checkpoint cleanup, handling this in the SQL plugin became impractical. This PR removes the redundant logic from the SQL plugin and fully delegates the vacuum operation to Flint Spark.

Testing

Deployed OpenSearch and SQL 3.0 to an EC2 instance and tested with Flint Spark 0.6 in EMR-S as below:

$ curl --request POST \
  --url localhost:9200/_plugins/_async_query \
  --header 'content-type: application/x-ndjson' \
  --data '{"datasource": "mys3","lang": "sql","query": "CREATE MATERIALIZED VIEW mys3.default.vacuum_checkpoint_test_1 AS SELECT clientip FROM mys3.default.http_logs WITH (auto_refresh = true, checkpoint_location = '\''s3://checkpoint/vacuum-checkpoint-test-1'\'')"}'

$ curl localhost:9200/_cat/indices
yellow open flint_mys3_default_vacuum_checkpoint_test_1 iZQ9igSzS8SC5rfRcCnGvA 1 1 12291133 0  644mb  644mb

$ curl --request POST \
  --url localhost:9200/_plugins/_async_query \
  --header 'content-type: application/x-ndjson' \
  --data '{"datasource": "mys3","lang": "sql","query": "DROP MATERIALIZED VIEW mys3.default.vacuum_checkpoint_test_1"}'

$ aws s3 ls s3://checkpoint/ | grep vacuum-checkpoint-test
                           PRE vacuum-checkpoint-test-1/
2024-09-04 18:14:43          0 vacuum-checkpoint-test-1_$folder$

$ curl --request POST \
  --url localhost:9200/_plugins/_async_query \
  --header 'content-type: application/x-ndjson' \
  --data '{"datasource": "mys3","lang": "sql","query": "VACUUM MATERIALIZED VIEW mys3.default.vacuum_checkpoint_test_1"}'

$ aws s3 ls s3://checkpoint/ | grep vacuum-checkpoint-test
(empty)

Related Issues

Resolves opensearch-project/opensearch-spark#580

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen added the enhancement New feature or request label Sep 5, 2024
@dai-chen dai-chen self-assigned this Sep 5, 2024
Comment on lines -55 to -57
if (flintIndexMetadata.getFlintIndexOptions().isExternalScheduler()) {
asyncQueryScheduler.removeJob(flintIndexMetadata.getOpensearchIndexName());
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed with @noCharger that this will be added to Flint Spark later.

Copy link
Collaborator

@noCharger noCharger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's make sure all code path related to VACUUM removed

.thenReturn(acknowledgedResponse);
when(acknowledgedResponse.isAcknowledged()).thenReturn(true);

openSearchFlintIndexClient.deleteIndex("test-index");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are some use case for FlintIndexClient and its deleteIndex now that we don't have vacuum handled over here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure when we added this FlintIndexClient abstraction. It seems useful in future. Let me know if we want to remove it now. cc: @ykmr1224

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove it now.

@dai-chen
Copy link
Collaborator Author

dai-chen commented Sep 5, 2024

LGTM, let's make sure all code path related to VACUUM removed

Sure, will double check. Thanks!

@dai-chen
Copy link
Collaborator Author

dai-chen commented Sep 5, 2024

Only BWC failed. Same as main branch.

@dai-chen dai-chen merged commit 83e89fb into opensearch-project:main Sep 6, 2024
13 of 14 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 6, 2024
* Remove vacuum dispatch and update UT

Signed-off-by: Chen Dai <[email protected]>

* Remove unused code and test

Signed-off-by: Chen Dai <[email protected]>

* Fix jacoco test

Signed-off-by: Chen Dai <[email protected]>

---------

Signed-off-by: Chen Dai <[email protected]>
(cherry picked from commit 83e89fb)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.17 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.17 2.17
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.17
# Create a new branch
git switch --create backport/backport-2985-to-2.17
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 83e89fb0f6a659b6cf5877d14ef260438b459c61
# Push it to GitHub
git push --set-upstream origin backport/backport-2985-to-2.17
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.17

Then, create a pull request where the base branch is 2.17 and the compare/head branch is backport/backport-2985-to-2.17.

dai-chen added a commit to dai-chen/sql-1 that referenced this pull request Sep 6, 2024
* Remove vacuum dispatch and update UT

Signed-off-by: Chen Dai <[email protected]>

* Remove unused code and test

Signed-off-by: Chen Dai <[email protected]>

* Fix jacoco test

Signed-off-by: Chen Dai <[email protected]>

---------

Signed-off-by: Chen Dai <[email protected]>
dai-chen pushed a commit that referenced this pull request Sep 6, 2024
* Remove vacuum dispatch and update UT

* Remove unused code and test

* Fix jacoco test

---------


(cherry picked from commit 83e89fb)

Signed-off-by: Chen Dai <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
dai-chen added a commit that referenced this pull request Sep 6, 2024
…#2985) (#2995)

* Remove vacuum dispatch and update UT

* Remove unused code and test

* Fix jacoco test

---------

Signed-off-by: Chen Dai <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] VACCUM index statement should delete checkpoint data
5 participants