Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(portal): 将文件直接提交为sbatch命令 #891

Merged
merged 15 commits into from
Nov 11, 2023

Conversation

piccaSun
Copy link
Contributor

@piccaSun piccaSun commented Oct 13, 2023

做了什么

需求
在文件管理中增加对所有文件对象增加”提交“按钮直接sbatch执行
暂时只考虑提交的文本作为作业直接执行
执行后也应该展示在正在运行的作业和已结束的作业中
提交文件大小限制为1M

实现
此pr完成上述需求
增加文件是否为文本文件的判断
增加此操作在审计系统中的操作日志
image
image
image

提交失败时
fileSize-error
not-text
sbatchfailed
操作日志追加
image

调度器适配器接口仓库分支暂时变更为临时分支,测试结束后合并调度器适配器接口分支进主分支,修改复原SCOW中调度器适配器仓库

@changeset-bot
Copy link

changeset-bot bot commented Oct 13, 2023

🦋 Changeset detected

Latest commit: e4c1bdf

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages
Name Type
@scow/grpc-api Minor
@scow/scheduler-adapter-protos Minor
@scow/lib-scheduler-adapter Minor
@scow/portal-server Minor
@scow/test-adapter Minor
@scow/portal-web Minor
@scow/mis-web Minor
@scow/utils Minor
@scow/protos Patch
@scow/mis-server Minor
@scow/lib-web Patch
@scow/audit-server Patch
@scow/auth Minor
@scow/gateway Minor
@scow/lib-hook Patch
@scow/lib-operation-log Patch
@scow/cli Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@piccaSun piccaSun changed the title impl feat(portal): 将文件直接提交为sbatch命令 Oct 13, 2023
@codecov
Copy link

codecov bot commented Oct 13, 2023

Codecov Report

Attention: 15 lines in your changes are missing coverage. Please review.

Comparison is base (e1e3230) 68.89% compared to head (e4c1bdf) 68.67%.
Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #891      +/-   ##
==========================================
- Coverage   68.89%   68.67%   -0.23%     
==========================================
  Files         132      132              
  Lines        3926     3946      +20     
  Branches      523      528       +5     
==========================================
+ Hits         2705     2710       +5     
- Misses       1123     1138      +15     
  Partials       98       98              
Files Coverage Δ
apps/portal-server/src/utils/clusters.ts 42.85% <25.00%> (-44.65%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@piccaSun piccaSun marked this pull request as ready for review October 19, 2023 09:06
@piccaSun piccaSun requested a review from ddadaal October 19, 2023 09:06
Copy link
Member

@ddadaal ddadaal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个功能可以不增加新的调度器接口,而完全由SCOW完全使用已有的调度器接口实现吗?

"@scow/grpc-api": patch
---

新增 submitFileAsJob 接口,连接 slurm 调度器,直接提交文件 sbatch 执行
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

修改说法,后端不一定是slurm调度器。改成“直接把文件作为作业提交调度器执行"

"@scow/mis-web": patch
---

新增文件管理下提交任意文件直接作为作业文本 sbatch 执行的功能
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

注意说法,”直接作为作业文本提交调度器执行“

@@ -5,7 +5,7 @@
"main": "build/index.js",
"private": true,
"scripts": {
"generate": "rimraf generated && buf generate --template buf.gen.yaml https://github.com/PKUHPC/scow-scheduler-adapter-interface.git",
"generate": "rimraf generated && buf generate --template buf.gen.yaml https://github.com/PKUHPC/scow-scheduler-adapter-interface.git#branch=add_job_submit_file_as_job_proto",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SCOW主线代码里不能出现这种使用调度器接口仓库的非master分支的情况。这个PR应该等新的调度器接口发布后才合并。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的,这个是打算全部review、测试完没有问题再把scow-slurm-interface那边合并到主分支,然后再修改scow这边generate的调度器适配器仓库

@piccaSun
Copy link
Contributor Author

这个功能可以不增加新的调度器接口,而完全由SCOW完全使用已有的调度器接口实现吗?

因为原有调度器接口是在提交作业时,获得所有提交作业的参数,然后调度器自己执行类似下方的脚本拼接

	scriptString += "#SBATCH " + "-A " + in.Account + "\n"
	scriptString += "#SBATCH " + "--partition=" + in.Partition + "\n"

所以如果想使用旧的接口,需要scow逐句解析出任意文件的对应各个参数,跟杨老师一起商量了一下最后新增了调度器的接口

@piccaSun
Copy link
Contributor Author

changeset文字已修改,请确认 @ddadaal

@piccaSun piccaSun requested a review from ddadaal October 19, 2023 16:46
@ddadaal
Copy link
Member

ddadaal commented Oct 20, 2023

OK,那可以保留这个API。如果功能测试没有问题了,去接口的仓库提交PR,先把接口合并后再来合并这个。

@piccaSun
Copy link
Contributor Author

OK,那可以保留这个API。如果功能测试没有问题了,去接口的仓库提交PR,先把接口合并后再来合并这个。

好的,那请 @lyl-available 先测试,我在测试的时候先更新一下测试用的调度器适配器

@lyl-available
Copy link
Contributor

测试通过

@piccaSun
Copy link
Contributor Author

修改提交成功后的返回值,使其包含提交成功的作业ID,文字与提交作业成功的显示文字保持一致
62集群下测试效果如下:
image

@piccaSun
Copy link
Contributor Author

@ddadaal
功能测试已结束,调度器适配器接口已通过pr合并到主分支,已经修改回scow下面的调度器适配器接口仓库地址为主分支地址。
但是slurm调度器这边忘了通知杨老师,调度器还没有将这个新增接口合并到调度器的主分支。

@piccaSun
Copy link
Contributor Author

piccaSun commented Nov 3, 2023

调度器适配器接口修改后62部署环境与slurm联调没有问题,等待调度器适配器接口仓库版本更新后修改buf-generate
提交成功时
image
image
提交失败时
image

@piccaSun
Copy link
Contributor Author

piccaSun commented Nov 6, 2023

@ddadaal 已更新scow-scheduler-adapter-interface版本号至v1.2.0

Comment on lines +264 to +266
const reply = await asyncClientCall(client.job, "submitScriptAsJob", {
userId, script,
}).catch((e) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const errors = parseErrorDetails(ex.metadata);
// 如果接口不存在
if (errors[0] && errors[0].$type === "google.rpc.ErrorInfo" && errors[0].reason === "UNIMPLEMENTED") {
scheduleApiVersion = { major: 1, minor: 1, patch: 0 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

接口不存在默认是1.0.0,因为1.1.0是要求实现这个接口的。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改为1.0.0

* @param comparedVersion
*/
export async function checkSchedulerApiVersion(client: SchedulerAdapterClient,
comparedVersion: ApiVersion): Promise<void> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个函数可以通用非常好,但是还是设计复杂了,目前只需要确认获取到的版本大于等于要求的版本即可。第二个参数可以命名为minVersion,更贴切

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改,请确认

Comment on lines 103 to 106
scheduleApiVersion = null;
logger.warn(
"The scheduler API version can not be confirmed. Some functionalities may not operate as expected.");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果甚至都获取不到版本,报错。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改为报错。

scheduleApiVersion = { major: 1, minor: 1, patch: 0 };
// 如果接口服务不存在
} else if ((e as any).code === status.UNIMPLEMENTED) {
scheduleApiVersion = { major: 1, minor: 1, patch: 0 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

接口服务不存在是指VersionService不存在吗?那也是认为是1.0.0。不需要区分

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的。已修改为服务或接口不存在都认为版本为1.0.0,请确认

apps/portal-server/src/utils/clusters.ts Show resolved Hide resolved
}

if (scheduleApiVersion) {
const compareResult = compareSemVersion(comparedVersion, scheduleApiVersion);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

现在只要判断版本是否>=就可以了。以后需要的话以后再说吧

Copy link
Contributor Author

@piccaSun piccaSun Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改为在本函数下直接判断调度器api版本是否大于等于最低要求版本,只比较major和minor不再比较patch版本,请确认

Comment on lines 34 to 39
// /**
// * 判断当前集群下的API版本与调度器API版本
// * @param client
// * @returns
// */
// export async function checkScowSchedulerApiVersion(client: SchedulerAdapterClient): Promise<void> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

没太看懂这个函数是在做什么……SCOW只需要管要调用的集群的调度器版本,不需要管SCOW自己的版本

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已删除注释掉的代码部分

@piccaSun
Copy link
Contributor Author

修改上述意见(调度器暂未实现VersionService服务暂未联调)
判断版本不符合要求时前端报错如下
提交失败
后台提示详细错误信息
后台报错

@piccaSun piccaSun requested a review from ddadaal November 10, 2023 03:58
@ddadaal ddadaal merged commit 135f2b1 into master Nov 11, 2023
9 checks passed
@ddadaal ddadaal deleted the feat-add-submit-file branch November 11, 2023 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants