Delete the sync Deployment after deploy complete. #70

Closed
caoxianfei1 opened this issue Jun 9, 2023 · 5 comments · Fixed by #78
Assignees
Labels
good first issue Good for newcomers

Comments

@caoxianfei1
Collaborator

caoxianfei1 commented Jun 9, 2023

We start a Deployment called curve-sync-config as a server to synchronize the config files of the various services. This service is no longer needed once deployment is complete, so the Deployment should be deleted after we confirm that synchronization succeeded, or after the whole cluster has been deployed. Otherwise it keeps occupying system resources for no benefit.

@lizonglingo
Contributor

Can I have a try? Already submitted an application at https://summercoding-curve.app.codewave.163.com/apply.

@caoxianfei1
Collaborator Author

caoxianfei1 commented Jul 31, 2023

Can I have a try? Already submitted an application at https://summercoding-curve.app.codewave.163.com/apply.

Of course. Welcome!

@caoxianfei1
Collaborator Author

I have looked at your application. It's all fine.

@lizonglingo
Contributor

lizonglingo commented Aug 6, 2023

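Below are my current implementation approach, the development and test environment and method, and the preliminary code.

Implementation approach

The curve-sync-config deployment synchronizes the configuration of each component when curvefs / curvebs starts on k8s. My rough idea: once all Pods related to curve-sync-config are detected as Ready/Completed, delete the curve-sync-config deployment.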

Development environment

  • Locally deployed Kubernetes version 1.20.13, 1 master node, 3 worker nodes
  • No new packages added to vendor
  • Go version 1.18

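Test method

  • Build the image with the Dockerfile in the repository and push it to a private registry, then change the curve-operator version in the manifest that deploys the operator
  • Deploy with the sample cluster configuration files in the repository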

Curve cluster component configurations tested:

Cluster type  etcd  mds  chunkserver  metaserver  snapshotclone  monitor
bs            √     √    √            ×           ×              ×
bs            √     √    √            ×           √              ×
bs            √     √    √            ×           ×              √
bs            √     √    √            ×           √              √
fs            √     √    ×            √           ×              ×
fs            √     √    ×            √           ×              √

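Code

The latest version of the code has been pushed to my fork.

First, checkClusterDeployedInfo records how many instances of each cluster component are currently ready in the namespace.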

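// checkClusterDeployedInfo counts, per component, the deployments that are
// Ready and the jobs that are Complete in the namespace.
type checkClusterDeployedInfo struct {
	mdsReady  int
	etcdReady int

	// metaServer is used by fs clusters; chunkServer and snapShotClone by bs clusters.
	metaServerReady    int
	chunkServerReady   int
	snapShotCloneReady int

	// Optional monitoring components.
	grafanaReady      int
	prometheusReady   int
	nodeExporterReady int

	// Jobs that have completed.
	jobPreChunkFileCompleted    int
	jobProLogicPoolCompleted    int
	jobProPhysicalPoolCompleted int
}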

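In func deleteSyncConfigDeployment(c *daemon.Cluster, syncConfigDeployment string), the program periodically lists the deployments and jobs in the namespace, locates each component's deployments and jobs by using the component's AppName as a name prefix, checks whether each one is Ready or Complete, and compares the counts with the expected numbers to decide whether sync-config-deployment can be deleted.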
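The prefix-matching step described above can be sketched roughly as follows. This is only an illustration, not the code in the fork: the workload type and countReadyByPrefix are hypothetical names, and a plain struct stands in for the Deployment and Job objects a real list call would return.

```go
package main

import "strings"

// workload is a hypothetical, simplified stand-in for a Deployment or Job
// returned by a list call. It is not a client-go type.
type workload struct {
	Name  string
	Ready bool // true when a Deployment is Ready or a Job is Complete
}

// countReadyByPrefix returns how many workloads whose name starts with
// prefix (for example a component's AppName) are ready.
func countReadyByPrefix(items []workload, prefix string) int {
	n := 0
	for _, w := range items {
		if strings.HasPrefix(w.Name, prefix) && w.Ready {
			n++
		}
	}
	return n
}
```

One thing to watch with prefix matching: if one component's name is itself a prefix of another's, the two counts overlap, so matching on a label or on the prefix plus a separator may be safer.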

The expected number of deployments or jobs for each component:

Component           Count                                                            Enabled for
mds                 len(cluster.Nodes)
etcd                len(cluster.Nodes)
chunkserver         len(cluster.Chunkserver.Devices)*len(cluster.Chunkserver.Nodes)  bs
metaserver          len(cluster.Nodes)                                               fs
snapshotclone       len(cluster.Nodes)                                               bs (optional)
grafana             1                                                                optional
prometheus          1                                                                optional
nodeExporter        len(cluster.Nodes)                                               optional
jobPreChunkFile     len(cluster.Chunkserver.Devices)*len(cluster.Chunkserver.Nodes)  bs
jobProLogicPool     1                                                                bs and fs
jobProPhysicalPool  1                                                                bs
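As a rough sketch of how the table above could be turned into a check, the snippet below computes the expected count per component and compares it with the observed ready counts. The clusterSpec type, its field names, and the functions expectedCounts and allReady are assumptions made for this example; in particular, it assumes a single Monitor switch enables grafana, prometheus and nodeExporter together, which the table does not state.

```go
package main

// clusterSpec is a hypothetical, flattened view of the cluster settings
// that the expected counts depend on.
type clusterSpec struct {
	Kind               string // "bs" or "fs"
	Nodes              int    // len(cluster.Nodes)
	ChunkserverNodes   int    // len(cluster.Chunkserver.Nodes)
	ChunkserverDevices int    // len(cluster.Chunkserver.Devices)
	SnapshotClone      bool   // optional, bs only
	Monitor            bool   // assumed to enable all three monitoring components
}

// expectedCounts maps each enabled component to the number of deployments
// or jobs the table expects for it.
func expectedCounts(s clusterSpec) map[string]int {
	want := map[string]int{
		"mds":             s.Nodes,
		"etcd":            s.Nodes,
		"jobProLogicPool": 1,
	}
	switch s.Kind {
	case "bs":
		perDevice := s.ChunkserverDevices * s.ChunkserverNodes
		want["chunkserver"] = perDevice
		want["jobPreChunkFile"] = perDevice
		want["jobProPhysicalPool"] = 1
		if s.SnapshotClone {
			want["snapshotclone"] = s.Nodes
		}
	case "fs":
		want["metaserver"] = s.Nodes
	}
	if s.Monitor {
		want["grafana"] = 1
		want["prometheus"] = 1
		want["nodeExporter"] = s.Nodes
	}
	return want
}

// allReady reports whether every expected component has reached its count.
func allReady(want, ready map[string]int) bool {
	for name, n := range want {
		if ready[name] < n {
			return false
		}
	}
	return true
}
```

Keeping the expectation in a map means a disabled component simply has no entry, so the comparison never waits on something that was not deployed.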

In func createSyncDeployment(c *daemon.Cluster) error, a goroutine is started to call func deleteSyncConfigDeployment(c *daemon.Cluster, syncConfigDeployment string): the statement go deleteSyncConfigDeployment(c, newDeployment.GetName()) runs it in a new goroutine.

@caoxianfei1
Collaborator Author

Good job! You can submit the PR when you are ready.
