Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disttask: task executor slot manager #49136

Merged
merged 28 commits into from
Dec 20, 2023

Conversation

okJiang
Copy link
Member

@okJiang okJiang commented Dec 4, 2023

What problem does this PR solve?

Issue Number: close #49135

Problem Summary:

In distributed framework, we introduced slot manager to control resource.

What changed and how does it work?

in scheduler(task executor)

  • check the available slot before open scheduler
  • count the slot by slot manager

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Copy link

ti-chi-bot bot commented Dec 4, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-triage-completed labels Dec 4, 2023
Copy link

tiprow bot commented Dec 4, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Dec 4, 2023
@okJiang okJiang marked this pull request as ready for review December 5, 2023 06:50
@okJiang
Copy link
Member Author

okJiang commented Dec 5, 2023

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Dec 5, 2023
@okJiang okJiang changed the title [wip]dist/task: scheduler slot manager dist/task: scheduler slot manager Dec 5, 2023
@ti-chi-bot ti-chi-bot bot removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-triage-completed labels Dec 5, 2023
@okJiang okJiang changed the title dist/task: scheduler slot manager disttask: scheduler slot manager Dec 5, 2023
@@ -179,7 +185,7 @@ func TestManager(t *testing.T) {
mockTaskTable := mock.NewMockTaskTable(ctrl)
mockInternalExecutor := mock.NewMockTaskExecutor(ctrl)
mockPool := mock.NewMockPool(ctrl)
b := NewManagerBuilder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add manager test with slot_manager

@@ -79,6 +80,7 @@ const (
exec_id varchar(256),
exec_expired timestamp,
state varchar(64) not null,
priority int,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upgrade

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is useless, I will remove it

Comment on lines 208 to 212
priorityQueue := make(proto.TaskPriorityQueue, 0)
heap.Init(&priorityQueue)
for _, task := range tasks {
heap.Push(&priorityQueue, proto.WrapPriorityQueue(task))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not considering handling tasks

Copy link
Member Author

@okJiang okJiang Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has been filtered

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has been filtered

But we will take running tasks into consideration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has been filtered

But we should take running tasks into consideration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean after TiDB restarts?


type slotInfo struct {
taskID int
priority int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is reserved, I can add a comment for it.

@@ -45,6 +47,7 @@ var (
// ManagerBuilder is used to build a Manager.
type ManagerBuilder struct {
newPool func(name string, size int32, component util.Component, options ...spool.Option) (Pool, error)
// slotAlloctor alloctor.Alloctor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What for?

Comment on lines 49 to 52
func NewTestManagerBuilder(ctrl *gomock.Controller) *ManagerBuilder {
b := NewManagerBuilder()
return b
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add this function?

Comment on lines 18 to 21
type MockAlloctor struct {
ctrl *gomock.Controller
recorder *MockAlloctorMockRecorder
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems not used for now

@@ -3,7 +3,7 @@ load("@io_bazel_rules_go//go:def.bzl", "go_library", "go_test")
go_library(
name = "resourcemanager",
srcs = [
"rm.go",
"resource_manager.go",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert the change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resource_manager.go is more meaningful

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resource_manager.go is more meaningful

Not the code of disttask

// See the License for the specific language governing permissions and
// limitations under the License.

package taskexecutor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slot_manager.go. ----> slots.go

// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slot_manager_test.go ----> slots_test.go

This commit refactors the slot management in the task executor module. It introduces a new `slotManager` struct that handles the reservation and release of slots for executing tasks. The `slotManager` keeps track of the available slots and the tasks currently occupying the slots.

The changes include renaming some files and functions for better clarity. The `slot_manager.go` file is renamed to `slot.go`, and the `slot_manager_test.go` file is renamed to `slot_test.go`. The `slotManager` struct now has a `reserve` method to reserve slots for a task and an `unReserve` method to release the slots when the task is completed.

These changes improve the readability and maintainability of the code, making it easier to understand and modify the slot management logic in the task executor.
@okJiang okJiang changed the title disttask: scheduler slot manager disttask: task executor slot manager Dec 15, 2023
Copy link
Contributor

@ywqzzy ywqzzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Dec 18, 2023
@okJiang
Copy link
Member Author

okJiang commented Dec 18, 2023

/retest


// canReserve is used to check whether the instance has enough slots to run the task.
func (sm *slotManager) canReserve(task *proto.Task) bool {
return sm.available >= task.Concurrency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sm.RLock?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg/disttask/framework/taskexecutor/manager.go Outdated Show resolved Hide resolved
pkg/disttask/framework/taskexecutor/manager.go Outdated Show resolved Hide resolved
pkg/disttask/framework/taskexecutor/manager.go Outdated Show resolved Hide resolved
Comment on lines 24 to 28
const (
managerID = "test"
taskID = int64(1)
taskID2 = int64(2)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those simple var can just duplicate in test where it's used

defer func() {
failpoint.Disable("github.com/pingcap/tidb/pkg/disttask/framework/taskexecutor/taskTick")
}()
go m.fetchAndHandleRunnableTasksLoop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can call onRunnableTasks directly to simplify mock, and we will not need failpoint anymore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not use failpoint, how can we check the status of slotManager?🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can call onRunnableTasks with 1 task and check in this case

Comment on lines 426 to 429

failpoint.Inject("taskTick", func() {
<-onRunnableTaskTick
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed, just let executor ext block on Run using mock

type slotInfo struct {
taskID int
// priority will be used in future
priority int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better store task itself, priority itself cannot determine task order. ok to do it later

@wuhuizuo
Copy link
Contributor

/remove-label needs-rebase

Copy link

ti-chi-bot bot commented Dec 19, 2023

@wuhuizuo: The label(s) needs-rebase cannot be applied. These labels are supported: fuzz/sqlancer, challenge-program, compatibility-breaker, first-time-contributor, contribution, good first issue, correctness, duplicate, proposal, security, ok-to-test, needs-ok-to-test, needs-more-info, needs-cherry-pick-release-5.4, needs-cherry-pick-release-6.1, needs-cherry-pick-release-6.5, needs-cherry-pick-release-7.1, needs-cherry-pick-release-7.5, affects-5.4, affects-6.1, affects-6.5, affects-7.1, affects-7.5, may-affects-5.4, may-affects-6.1, may-affects-6.5, may-affects-7.1, may-affects-7.5.

In response to this:

/remove-label needs-rebase

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@wuhuizuo wuhuizuo removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 19, 2023
continue
}
m.addHandlingTask(task.ID)
m.slotManager.alloc(task)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's moved out, we have to free on error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

tiprow bot commented Dec 20, 2023

@okJiang: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
tiprow_fast_test 06422f4 link true /test tiprow_fast_test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@D3Hunter
Copy link
Contributor

/lgtm

Copy link

ti-chi-bot bot commented Dec 20, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, ywqzzy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added approved lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Dec 20, 2023
Copy link

ti-chi-bot bot commented Dec 20, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-12-18 02:08:04.107448508 +0000 UTC m=+840375.144675437: ☑️ agreed by ywqzzy.
  • 2023-12-20 06:47:32.680304098 +0000 UTC m=+1029943.717531027: ☑️ agreed by D3Hunter.

@ti-chi-bot ti-chi-bot bot merged commit 27aed31 into pingcap:master Dec 20, 2023
17 of 18 checks passed
@okJiang okJiang deleted the disttask-rc/scheduler-slot-manager branch December 20, 2023 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

implement scheduler slot manager
5 participants