Add Flint Index Purging Logic #2372

Merged: 2 commits merged into opensearch-project:main on Oct 27, 2023

kaituo (Contributor) commented Oct 26, 2023

Description

  • Introduce dynamic settings for enabling/disabling purging and controlling index TTL.
  • Reuse default result index name as a common prefix for all result indices.
  • Change result index to a non-hidden index for better user experience.
  • Allow custom result index specification in the data source.
  • Move default result index name from spark to core package to avoid cross-package references.
  • Add validation for provided result index name in the data source.
  • Use pattern prefix + data source name for default result index naming.
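
To make the naming and validation bullets concrete, here is a minimal sketch rather than the code in this PR. The prefix value, the `_` separator, the length limit, the allowed character set, and every class and method name below are assumptions chosen for illustration.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Illustrative only: names and rules are assumptions, not the PR's actual code. */
final class ResultIndexNamingSketch {
  // Assumed common prefix shared by every result index.
  static final String PREFIX = "query_execution_result";
  // Assumed upper bound on the index name length.
  static final int MAX_LENGTH = 255;
  // Assumed set of characters allowed in a result index name.
  private static final Pattern ALLOWED = Pattern.compile("[a-z0-9_-]+");

  /** Default name: prefix, an assumed "_" separator, then the sanitized data source name. */
  static String defaultIndexFor(String dataSourceName) {
    String suffix =
        dataSourceName.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_-]", "");
    return PREFIX + "_" + suffix;
  }

  /** Returns null when the name is acceptable, otherwise an error message. */
  static String validate(String resultIndex) {
    if (resultIndex == null) {
      return null; // nothing supplied; the caller falls back to defaultIndexFor(...)
    }
    if (resultIndex.length() > MAX_LENGTH) {
      return "Result index name exceeds " + MAX_LENGTH + " characters";
    }
    if (!ALLOWED.matcher(resultIndex).matches()) {
      return "Result index name may only contain lowercase letters, digits, '_' and '-'";
    }
    if (!resultIndex.startsWith(PREFIX)) {
      return "Result index name must start with " + PREFIX;
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(defaultIndexFor("My-S3"));
    System.out.println(validate("other_index"));
  }
}
```

Returning an error message instead of throwing lets the data source constructor decide how to surface the problem; that choice is also an assumption of this sketch.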

Testing:

  • Verified old documents are purged in a cluster setup.
  • Checked result index naming with and without custom names, ensuring validation is applied.

Note: Tests will be added in a subsequent PR.

Issues Resolved

#2331

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Setting.Property.NodeScope,
Setting.Property.Dynamic);

public static final Setting<Boolean> AUTO_INDEX_MANAGEMENT_ENABLED_SETTING =

Member

Is this for both the indices?

Contributor Author

yes
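
For readers following along, a rough sketch of how one enable/disable flag plus a TTL could gate purging. It does not use the OpenSearch `Setting` API; the class name, method names, and the 60-day default are hypothetical, and a per-index TTL would follow the same shape.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.OptionalLong;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative only: a plain-Java stand-in for dynamically updatable settings. */
final class PurgeSettingsSketch {
  // One flag shared by all purged indices, mirroring the answer above.
  private final AtomicBoolean enabled = new AtomicBoolean(true);
  // Assumed default TTL; the real default is not stated in this thread.
  private final AtomicReference<Duration> ttl =
      new AtomicReference<>(Duration.ofDays(60));

  // These would be registered as update listeners for the dynamic settings.
  void onEnabledChanged(boolean newValue) {
    enabled.set(newValue);
  }

  void onTtlChanged(Duration newValue) {
    ttl.set(newValue);
  }

  /** Documents older than the returned epoch millis may be purged; empty when disabled. */
  OptionalLong purgeCutoffMillis(Instant now) {
    if (!enabled.get()) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(now.minus(ttl.get()).toEpochMilli());
  }

  public static void main(String[] args) {
    PurgeSettingsSketch settings = new PurgeSettingsSketch();
    System.out.println(settings.purgeCutoffMillis(Instant.now()));
    settings.onEnabledChanged(false);
    System.out.println(settings.purgeCutoffMillis(Instant.now()));
  }
}
```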

this.properties = properties;
this.allowedRoles = allowedRoles;
this.resultIndex = resultIndex;

if (errorMessage != null) {

Member

Minor nit: can we move this up, in case there is a new revision?

Contributor Author

yes, will do

codecov bot commented Oct 26, 2023

Codecov Report

Merging #2372 (003fe92) into main (88b1f03) will decrease coverage by 0.91%.
The diff coverage is 14.00%.

@@             Coverage Diff              @@
##               main    #2372      +/-   ##
============================================
- Coverage     96.46%   95.55%   -0.91%     
  Complexity     4918     4918              
============================================
  Files           465      468       +3     
  Lines         13522    13668     +146     
  Branches        913      915       +2     
============================================
+ Hits          13044    13061      +17     
- Misses          458      587     +129     
  Partials         20       20              
Flag Coverage Δ
sql-engine 95.55% <14.00%> (-0.91%) ⬇️

Files Coverage Δ
...rch/sql/opensearch/setting/OpenSearchSettings.java 100.00% <100.00%> (ø)
...org/opensearch/sql/spark/client/EmrClientImpl.java 100.00% <ø> (ø)
...arch/sql/spark/client/EmrServerlessClientImpl.java 100.00% <100.00%> (ø)
...earch/sql/spark/data/constants/SparkConstants.java 0.00% <ø> (ø)
...sql/spark/response/JobExecutionResponseReader.java 100.00% <100.00%> (ø)
...g/opensearch/sql/spark/response/SparkResponse.java 100.00% <100.00%> (ø)
...org/opensearch/sql/spark/cluster/IndexCleanup.java 0.00% <0.00%> (ø)
...nsearch/sql/spark/cluster/FlintIndexRetention.java 0.00% <0.00%> (ø)
...sql/spark/cluster/ClusterManagerEventListener.java 0.00% <0.00%> (ø)

penghuo (Collaborator) commented Oct 26, 2023

related to #2331

* @param queryForDeleteByQueryRequest query request
* @param listener action listener
*/
public void deleteDocsBasedOnShardSize(

Member

Are we using this?

Contributor Author

we don't, will remove

kaituo (Contributor Author) commented Oct 26, 2023

> related to #2331

Added the issue to the PR description.

kaituo force-pushed the purge branch 2 times, most recently from ed77c86 to 1ee8c03 (October 26, 2023 16:11)
this::handleSessionPurgeError);
}

private void handleSessionPurgeResponse(Long response) {

Collaborator

purgeStatementIndex() is independent of purgeSessionIndex(), right?

Contributor Author

Right. I run them in sequence because delete-by-query is not a cheap operation and our purging is not time-sensitive. I want to purge without too much performance impact.
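
The chaining described in this reply can be sketched as follows. This is a simplified illustration, not the PR's code: `DocDeleter` is a hypothetical stand-in for an asynchronous delete-by-query call, the index names are placeholders, and continuing to the statement purge after a session purge failure is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative only: runs two expensive purges one after the other. */
final class SequentialPurgeSketch {
  /** Hypothetical async abstraction over a delete-by-query request. */
  interface DocDeleter {
    void deleteOlderThan(
        String index, long cutoffMillis, Consumer<Long> onResponse, Consumer<Exception> onFailure);
  }

  private final DocDeleter deleter;
  private final String sessionIndex;
  private final String statementIndex;
  private final long cutoffMillis;
  final List<String> log = new ArrayList<>();

  SequentialPurgeSketch(
      DocDeleter deleter, String sessionIndex, String statementIndex, long cutoffMillis) {
    this.deleter = deleter;
    this.sessionIndex = sessionIndex;
    this.statementIndex = statementIndex;
    this.cutoffMillis = cutoffMillis;
  }

  void run() {
    purgeSessionIndex();
  }

  private void purgeSessionIndex() {
    deleter.deleteOlderThan(
        sessionIndex, cutoffMillis, this::handleSessionPurgeResponse, this::handleSessionPurgeError);
  }

  // The statement purge starts only after the session purge has finished,
  // so the two delete-by-query requests never run at the same time.
  private void handleSessionPurgeResponse(Long deleted) {
    log.add("session purged: " + deleted);
    purgeStatementIndex();
  }

  // Assumption: a failed session purge should not block the statement purge.
  private void handleSessionPurgeError(Exception e) {
    log.add("session purge failed: " + e.getMessage());
    purgeStatementIndex();
  }

  private void purgeStatementIndex() {
    deleter.deleteOlderThan(
        statementIndex,
        cutoffMillis,
        deleted -> log.add("statement purged: " + deleted),
        e -> log.add("statement purge failed: " + e.getMessage()));
  }
}
```

Starting both purges at once would finish sooner, but the sequential form keeps at most one delete-by-query in flight, which matches the stated goal of limiting performance impact.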

penghuo merged commit 1bcacd1 into opensearch-project:main on Oct 27, 2023 (19 of 21 checks passed)
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 27, 2023
* Add Flint Index Purging Logic

- Introduce dynamic settings for enabling/disabling purging and controlling index TTL.
- Reuse default result index name as a common prefix for all result indices.
- Change result index to a non-hidden index for better user experience.
- Allow custom result index specification in the data source.
- Move default result index name from spark to core package to avoid cross-package references.
- Add validation for provided result index name in the data source.
- Use pattern prefix + data source name for default result index naming.

Testing:
- Verified old documents are purged in a cluster setup.
- Checked result index naming with and without custom names, ensuring validation is applied.

Note: Tests will be added in a subsequent PR.

Signed-off-by: Kaituo Li <[email protected]>

* address comments

Signed-off-by: Kaituo Li <[email protected]>

---------

Signed-off-by: Kaituo Li <[email protected]>
(cherry picked from commit 1bcacd1)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
penghuo pushed a commit that referenced this pull request Oct 28, 2023
* Add Flint Index Purging Logic

- Introduce dynamic settings for enabling/disabling purging and controlling index TTL.
- Reuse default result index name as a common prefix for all result indices.
- Change result index to a non-hidden index for better user experience.
- Allow custom result index specification in the data source.
- Move default result index name from spark to core package to avoid cross-package references.
- Add validation for provided result index name in the data source.
- Use pattern prefix + data source name for default result index naming.

Testing:
- Verified old documents are purged in a cluster setup.
- Checked result index naming with and without custom names, ensuring validation is applied.

Note: Tests will be added in a subsequent PR.

* address comments

---------

(cherry picked from commit 1bcacd1)
Signed-off-by: Kaituo Li <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
mengweieric added a commit to mengweieric/sql that referenced this pull request Nov 8, 2023
vamsimanohar added a commit to mengweieric/sql that referenced this pull request Nov 13, 2023
vamsimanohar added a commit that referenced this pull request Nov 13, 2023
* Revert "Add more metrics and handle emr exception message (#2422) (#2426)"

This reverts commit b57f7cc.

* Revert "Block settings in sql query settings API and add more unit tests (#2407) (#2412)"

This reverts commit 3024737.

* Revert "Added session, statement, emrjob metrics to sql stats api (#2398) (#2400)"

This reverts commit 6e17ae6.

* Revert "Redefine Drop Index as logical delete (#2386) (#2397)"

This reverts commit e939bb6.

* Revert "add concurrent limit on datasource and sessions (#2390) (#2395)"

This reverts commit deb3ccf.

* Revert "Add Flint Index Purging Logic (#2372) (#2389)"

This reverts commit dd48b9b.

* Revert "Refactoring for tags usage in test files and also added explicit denly list setting. (#2383) (#2385)"

This reverts commit 37e010f.

* Revert "Enable session by default (#2373) (#2375)"

This reverts commit 7d95e4c.

* Revert "Create new session if client provided session is invalid (#2368) (#2371)"

This reverts commit 5ab7858.

* Revert "Add where clause support in create statement (#2366) (#2370)"

This reverts commit b620a56.

* Revert "create new session if current session not ready (#2363) (#2365)"

This reverts commit 5d07281.

* Revert "Handle Describe,Refresh and Show Queries Properly (#2357) (#2362)"

This reverts commit 16e2f30.

* Revert "Add Session limitation (#2354) (#2359)"

This reverts commit 0f334f8.

* Revert "Bug Fix, support cancel query in running state (#2351) (#2353)"

This reverts commit 9a40591.

* Revert "Fix bug, using basic instead of basicauth (#2342) (#2355)"

This reverts commit e4827a5.

* Revert "Add missing tags and MV support (#2336) (#2346)"

This reverts commit 8791bb0.

* Revert "[Backport 2.x] deprecated job-metadata-index (#2340) (#2343)"

This reverts commit bea432c.

* Revert "Integration with REPL Spark job (#2327) (#2338)"

This reverts commit 58a5ae5.

* Revert "Implement patch API for datasources (#2273) (#2329)"

This reverts commit 4c151fe.

* Revert "Add sessionId parameters for create async query API (#2312) (#2324)"

This reverts commit 3d1a376.

* Revert "Add Statement (#2294) (#2318) (#2319)"

This reverts commit b3c2e94.

* Revert "Upgrade json (#2307) (#2314)"

This reverts commit 6c65bb4.

* Revert "Minor Refactoring (#2308) (#2317)"

This reverts commit 051cc4f.

* Revert "add InteractiveSession and SessionManager (#2290) (#2293) (#2315)"

This reverts commit 6ac197b.

---------

Co-authored-by: Vamsi Manohar <[email protected]>