Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flint query scheduler part 2 #2961

Merged

Conversation

noCharger
Copy link
Collaborator

@noCharger noCharger commented Sep 3, 2024

Description

  1. Abstract AsyncQueryScheduler interface, with AsyncQuerySchedulerRequest model
  2. Inject AsyncQueryScheduler to FlintIndexOpFactory to support DROP, VACUUM, and ALTER ... WITH (auto_refresh = false)
  3. Register OpenSearch scheduler index as system index
  4. Refactor refresh job to support any scheduled query, decoupled scheduler job from AsyncQueryScheduler
  5. FlintStreamingJobHouseKeeperTask uses flintIndexOpFactory so it integrates seamlessly

Local sanity test

Drop should unschedule

% curl -X POST -H "Content-Type: application/json" -d '{
  "datasource": "myglue_test",
  "lang": "sql",
  "query": "DROP MATERIALIZED VIEW myglue_test.default.count_by_status"
}' http://localhost:9200/_plugins/_async_query
{
  "queryId": "UDVuUE1jdDFqVm15Z2x1ZV90ZXN0"
}% 

                        
[2024-09-02T15:24:19,262][INFO ][o.o.s.s.a.AsyncQueryExecutorServiceImpl] CreateAsyncQueryRequest: DROP MATERIALIZED VIEW myglue_test.default.count_by_status
[2024-09-02T15:24:19,307][INFO ][o.o.j.s.JobScheduler     ] Descheduling jobId: flint_myglue_test_default_count_by_status
[2024-09-02T15:24:19,348][INFO ][o.o.s.s.s.OpenSearchAsyncQueryScheduler]  Unscheduled job for jobId: flint_myglue_test_default_count_by_status     

Vacuum should remove schedule

% curl -X POST -H "Content-Type: application/json" -d '{
  "datasource": "myglue_test",
  "lang": "sql",
  "query": "VACUUM MATERIALIZED VIEW myglue_test.default.count_by_status"
}' http://localhost:9200/_plugins/_async_query
{
  "queryId": "Zk05NXdZdFNET215Z2x1ZV90ZXN0"
}%

[2024-09-02T15:56:22,959][INFO ][o.o.s.s.a.AsyncQueryExecutorServiceImpl] CreateAsyncQueryRequest: VACUUM MATERIALIZED VIEW myglue_test.default.count_by_status
[2024-09-02T15:56:23,027][INFO ][o.o.s.s.f.o.FlintIndexOpVacuum] Vacuuming Flint index flint_myglue_test_default_count_by_status
[2024-09-02T15:56:23,095][INFO ][o.o.c.m.MetadataDeleteIndexService] [flint_myglue_test_default_count_by_status/50Au8FsdRPeLJxqLIIFnDA] deleting index                                    
[2024-09-02T15:56:23,151][INFO ][o.o.s.s.f.OpenSearchFlintIndexClient]  OpenSearch index delete result: true

green open .async-query-scheduler ooAqua8HSqyFGhfa1R_NUA 1 0 1 0 7.8kb 7.8kb
green open .async-query-scheduler ooAqua8HSqyFGhfa1R_NUA 1 0 0 1 11kb 11kb

Related Issues

#2833

Follow ups

  1. The curreent RefreshQueryHandler doesn't check whether an index is in refreshing state before sending EMR job, with scheduler this can be optimized

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Louis Chu <[email protected]>
Signed-off-by: Louis Chu <[email protected]>
Comment on lines +37 to +38
public boolean isExternalScheduler() {
// Default is false, which means using internal scheduler to refresh the index.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal/External sounds ambiguous to me. Can we rephrase or add documentation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does internal means within spark? I think it is difficult to find that documentation from this option defined in SQL plugin. And I would imagine internal is internal to OpenSearch...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah internal means within spark and external could be anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to create a blocker now, but @ykmr1224 raised a good question.

We should have the name of this flag from the perspective of spark. Probably use_spark_scheduler = True | False would make more sense because we actually don't care about what an external scheduler is?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from spark perspective, we want to have extensibility among external Schedulers in case they have difference behavior

"^(\\d+)\\s*(years?|months?|weeks?|days?|hours?|minutes?|minute|mins?|seconds?|secs?|milliseconds?|millis?|micros?|nanoseconds?|nanos?)$",
Pattern.CASE_INSENSITIVE);

public static IntervalSchedule parse(String intervalStr, Instant startTime) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a right decision to implement it by ourselves? Is there common format for time duration and library for that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a right decision to implement it by ourselves? Is there common format for time duration and library for that?

The problem here is to converting Spark CalendarInterval to OpenSearch job scheduler interval.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use Spark CalendarInterval?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark CalendarInterval is the interface for spark scheduler, we will need this when convert stream job to external scheduler

Signed-off-by: Louis Chu <[email protected]>
Signed-off-by: Louis Chu <[email protected]>
vamsimanohar
vamsimanohar previously approved these changes Sep 4, 2024
Copy link
Member

@vamsimanohar vamsimanohar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Are we tracking the optimization of using spark's scheduler in case of lower refresh intervals as a separate issue?

Signed-off-by: Louis Chu <[email protected]>
@noCharger noCharger force-pushed the feature-query-scheduler-part-2 branch from c85ed20 to 1e489b5 Compare September 4, 2024 16:27
@noCharger
Copy link
Collaborator Author

LGTM. Are we tracking the optimization of using spark's scheduler in case of lower refresh intervals as a separate issue?

opensearch-project/opensearch-spark#622

LOGGER.error(throwable);
}
};
threadPool.generic().submit(runnable);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: Is it considered best practice to submit tasks to a generic thread pool across all job scheduler clients?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in f9fe064

ykmr1224
ykmr1224 previously approved these changes Sep 4, 2024
Signed-off-by: Louis Chu <[email protected]>
@noCharger noCharger force-pushed the feature-query-scheduler-part-2 branch from c1f0054 to f9fe064 Compare September 4, 2024 19:04
@seankao-az seankao-az merged commit 729bb13 into opensearch-project:main Sep 4, 2024
12 of 16 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 4, 2024
* Flint query scheduler part 2

Signed-off-by: Louis Chu <[email protected]>

* spotless apply

Signed-off-by: Louis Chu <[email protected]>

* Add UT

Signed-off-by: Louis Chu <[email protected]>

* Resolve comments

Signed-off-by: Louis Chu <[email protected]>

* Add more UTs

Signed-off-by: Louis Chu <[email protected]>

* Resolve comments

Signed-off-by: Louis Chu <[email protected]>

* Use SQL thread pool

Signed-off-by: Louis Chu <[email protected]>

---------

Signed-off-by: Louis Chu <[email protected]>
(cherry picked from commit 729bb13)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 4, 2024
* Flint query scheduler part 2

Signed-off-by: Louis Chu <[email protected]>

* spotless apply

Signed-off-by: Louis Chu <[email protected]>

* Add UT

Signed-off-by: Louis Chu <[email protected]>

* Resolve comments

Signed-off-by: Louis Chu <[email protected]>

* Add more UTs

Signed-off-by: Louis Chu <[email protected]>

* Resolve comments

Signed-off-by: Louis Chu <[email protected]>

* Use SQL thread pool

Signed-off-by: Louis Chu <[email protected]>

---------

Signed-off-by: Louis Chu <[email protected]>
(cherry picked from commit 729bb13)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ykmr1224 pushed a commit that referenced this pull request Sep 5, 2024
* Flint query scheduler part 2



* spotless apply



* Add UT



* Resolve comments



* Add more UTs



* Resolve comments



* Use SQL thread pool



---------


(cherry picked from commit 729bb13)

Signed-off-by: Louis Chu <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@seankao-az seankao-az mentioned this pull request Sep 5, 2024
7 tasks
@seankao-az
Copy link
Collaborator

seankao-az commented Sep 5, 2024

Found some bug after merge..
in AsyncQueryCoreIntegTest.java
some verifyStoreJobMetadataCalled only use 1 argument, when 2 is expected
This is a change introduced in #2955

Edit: fixed in #2973

seankao-az pushed a commit that referenced this pull request Sep 5, 2024
* Flint query scheduler part 2



* spotless apply



* Add UT



* Resolve comments



* Add more UTs



* Resolve comments



* Use SQL thread pool



---------


(cherry picked from commit 729bb13)

Signed-off-by: Louis Chu <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants