refactor(mis,portal): refactor the SCOW backend to integrate with the scheduler adapter interface #632

Merged
merged 80 commits into from
Jul 11, 2023


qhqhqhq
Contributor

@qhqhqhq qhqhqhq commented May 10, 2023

1. Deploy the scheduler adapter

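First, make sure the corresponding scheduler adapter is deployed on your cluster, and note the address and port number used to access it.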

For deploying the adapter, see the documentation:

2. Modify the SCOW configuration files

First, make sure you are using the latest SCOW image (check the imageTag field in install.yaml).

In the scow-deployment folder used to deploy SCOW, modify the configuration files:

  • First, modify the cluster configuration file

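    The main changes: the slurm configuration item is removed, and loginNodes becomes a standalone item of its own. A new adapterUrl item is added to specify the adapter address.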

# Display name of the cluster
displayName: hpc01

# Address of the scheduler adapter
adapterUrl: "192.168.88.101:8972"

# Login nodes
loginNodes:
  - "192.168.88.102"
  • Modify the management system configuration file

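    The db item under the fetchJobs configuration item has been removed: the source job information database is no longer used, and job information is synchronized through the adapter.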
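
The cluster configuration change described above can be sanity-checked with a small script. The following is only a sketch in TypeScript: `parseAdapterUrl` and `checkClusterConfig` are hypothetical helpers written for this illustration, not part of SCOW, and the sketch assumes the config has already been parsed from YAML into a plain object and that `adapterUrl` uses the `host:port` form shown in the sample.

```typescript
// Hypothetical helpers for illustration only; they are not part of SCOW.

// Split a "host:port" adapter address and check that the port is a valid TCP port.
function parseAdapterUrl(adapterUrl: string): { host: string; port: number } {
  const idx = adapterUrl.lastIndexOf(":");
  if (idx <= 0) {
    throw new Error(`adapterUrl must look like host:port, got "${adapterUrl}"`);
  }
  const host = adapterUrl.slice(0, idx);
  const portText = adapterUrl.slice(idx + 1);
  const port = Number(portText);
  if (!/^\d+$/.test(portText) || port < 1 || port > 65535) {
    throw new Error(`adapterUrl has an invalid port: "${portText}"`);
  }
  return { host, port };
}

// Return a list of problems found in an already-parsed cluster config object.
function checkClusterConfig(raw: Record<string, unknown>): string[] {
  const problems: string[] = [];

  // The slurm item is removed in the new layout.
  if ("slurm" in raw) {
    problems.push("slurm item is still present and should be removed");
  }

  // adapterUrl is the new required item.
  const adapterUrl = raw["adapterUrl"];
  if (typeof adapterUrl !== "string") {
    problems.push("adapterUrl is missing");
  } else {
    try {
      parseAdapterUrl(adapterUrl);
    } catch (error) {
      problems.push((error as Error).message);
    }
  }

  // loginNodes is now a standalone, top-level list.
  const loginNodes = raw["loginNodes"];
  if (!Array.isArray(loginNodes) || loginNodes.length === 0) {
    problems.push("loginNodes must be a non-empty top-level list");
  }

  return problems;
}
```

For the sample config above, `checkClusterConfig` returns an empty list; a config that still has a `slurm` item, lacks `adapterUrl`, or has an empty `loginNodes` list would be reported with one message per problem.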

3. The source job information database is no longer used

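Once the adapter is deployed and in use, the export-jobs project no longer needs to be deployed; job information synchronization is handled by the adapter.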

@changeset-bot

changeset-bot bot commented May 10, 2023

🦋 Changeset detected

Latest commit: 912eff9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 18 packages
Name Type
@scow/scheduler-adapter-protos Minor
@scow/lib-scheduler-adapter Minor
@scow/portal-server Minor
@scow/test-adapter Minor
@scow/protos Minor
@scow/mis-server Minor
@scow/portal-web Minor
@scow/demo-vagrant Minor
@scow/mis-web Minor
@scow/config Minor
@scow/auth Minor
@scow/cli Minor
@scow/lib-ssh Minor
@scow/grpc-api Minor
@scow/docs Minor
@scow/lib-hook Patch
@scow/lib-slurm Patch
@scow/gateway Minor


@qhqhqhq
Contributor Author

qhqhqhq commented May 10, 2023

Correctness cannot be tested yet; posting this mainly so it can be looked over.

@lyl-available
Contributor

lyl-available commented Jun 27, 2023

Syncing more than a thousand jobs to the management system for billing at the same time caused a problem (now fixed).

@lyl-available
Contributor

The optimized fetchjob feature has passed testing.

Comment on lines +110 to +115
    em.persist(pricedJob);
    await em.flush();

    pricedJobs.push(pricedJob);
  } catch (error) {
    logger.warn("invalid job. cluster: %s, jobId: %s, error: %s", job.cluster, job.jobId, error);
Contributor Author

@qhqhqhq qhqhqhq Jul 5, 2023

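To skip problematic jobs, flush is attempted for each job individually; if it fails (for example, because a field value overflows), that job is skipped.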
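
The per-job flush pattern described in this comment can be sketched in isolation as follows. This is a minimal illustration, not the code from the PR: `FakeEntityManager`, `PricedJob`, `MAX_PRICE`, and `saveJobsOneByOne` are hypothetical stand-ins, and the fake `flush` simply throws when a price exceeds an assumed limit to mimic a column overflow.

```typescript
// Hypothetical stand-ins for illustration; these are not the MikroORM or SCOW types.
interface PricedJob {
  cluster: string;
  jobId: number;
  price: number;
}

const MAX_PRICE = 1000000; // assumed column limit, chosen only for this sketch

class FakeEntityManager {
  private pending: PricedJob[] = [];
  readonly stored: PricedJob[] = [];

  persist(job: PricedJob): void {
    this.pending.push(job);
  }

  // Writes everything pending; throws if any pending row is out of range.
  async flush(): Promise<void> {
    const batch = this.pending;
    // The fake drops the batch either way, so a bad row cannot block later flushes.
    this.pending = [];
    for (const job of batch) {
      if (job.price > MAX_PRICE) {
        throw new Error(`price overflow for job ${job.jobId}`);
      }
    }
    this.stored.push(...batch);
  }
}

// Persist and flush each job on its own, so one invalid job only loses itself.
async function saveJobsOneByOne(
  em: FakeEntityManager,
  jobs: PricedJob[],
): Promise<{ saved: PricedJob[]; skipped: PricedJob[]; warnings: string[] }> {
  const saved: PricedJob[] = [];
  const skipped: PricedJob[] = [];
  const warnings: string[] = [];

  for (const job of jobs) {
    try {
      em.persist(job);
      await em.flush();
      saved.push(job);
    } catch (error) {
      // Record the failure and move on to the next job.
      warnings.push(`invalid job. cluster: ${job.cluster}, jobId: ${job.jobId}, error: ${String(error)}`);
      skipped.push(job);
    }
  }

  return { saved, skipped, warnings };
}
```

The trade-off in this sketch is one flush per job instead of one per batch: a single bad record no longer fails the whole batch, at the cost of more round trips. How a real ORM treats a failed entity after a rejected flush may differ from this fake and should be checked against its documentation.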

@qhqhqhq qhqhqhq requested a review from ddadaal July 5, 2023 06:40
@ddadaal ddadaal merged commit 5b7f0e8 into master Jul 11, 2023
@ddadaal ddadaal deleted the refactor_use_scheduler-adapter-interface branch July 11, 2023 05:38
4 participants