Add juicefsruntime dataload function #1539

Merged 87 commits, Mar 22, 2022
Changes from 84 commits

Commits
1964301
Add helm lint check in Github workflow (#1384)
TrafalgarZZZ Jan 18, 2022
323f39e
add juicefsruntime dataload
ldd91 Feb 18, 2022
c57c026
fix test
zwwhdls Feb 21, 2022
581852d
Add documents for debuging (#1387)
cheyang Jan 18, 2022
2e58404
not check mount ready in stageVolume (#1390)
zwwhdls Jan 18, 2022
e69c034
Build docker image for csi plugin, To #38946668 (#1391)
cheyang Jan 19, 2022
d25d80b
Add remount check for hostpath mount during sync (#1340)
Nizifan Jan 19, 2022
a5611de
Jindo fuse recover support and upgrade jindofs version (#1385)
frankleaf Jan 19, 2022
d10384c
unify the file format under cmd, alluxio jindo dataset (#1388)
allenhaozi Jan 20, 2022
e02a0e4
Docker image build for alluxio master restart (#1393)
cheyang Jan 21, 2022
197a643
Refactor Fluid CSI Plugin (#1395)
TrafalgarZZZ Jan 26, 2022
f352d7e
Build docker images for refactoring csi plugin, To #38946668 (#1397)
cheyang Jan 26, 2022
248c461
fix diagnose (#1402)
ssz1997 Jan 30, 2022
9e73667
Support fuse sidecar injection (#1401)
cheyang Feb 2, 2022
fb4be87
Fix building docker image for init user (#1408)
cheyang Feb 2, 2022
4aae2c4
Build docker image (#1409)
cheyang Feb 3, 2022
a4bb7da
Make AlluxioRuntime support serverless (#1411)
cheyang Feb 3, 2022
6c316c9
Support pod's namespace is empty in webhook (#1413)
cheyang Feb 4, 2022
3c94ac7
Support namespace is empty, To #37688693 (#1415)
cheyang Feb 4, 2022
c7063d8
Fix redundant type from array, slice, or map composite literal (#1417)
cheyang Feb 5, 2022
6f8d5e3
Fix dataload cannot be cleaned up bug (#1421)
TrafalgarZZZ Feb 8, 2022
49024f8
Add testcase, To #37688693 (#1423)
cheyang Feb 8, 2022
f2df89a
fix goosefs error (#1422)
xieydd Feb 9, 2022
3810f2c
Build docker image for goose, To #37688693 (#1425)
cheyang Feb 9, 2022
fffba9f
support juicefs in serverless (#1427)
zwwhdls Feb 10, 2022
bcd6bc9
Build docker image for juicefs on serverless, To #37688693 (#1432)
cheyang Feb 10, 2022
4ffa771
(GooseFS) [Bug Fix] Fix clean cache linux version error (#1428)
xieydd Feb 11, 2022
b74f652
Build docker image for Goosefs bug, To #37688693 (#1434)
cheyang Feb 11, 2022
747ef31
Fix update dataload status error (#1433)
abowloflrf Feb 12, 2022
be3ca71
Fix update dataload status error, To #37688693 (#1437)
cheyang Feb 13, 2022
53de3dc
disable fuse.shared.caching.reader.enabled for alluxio and goosefs, s…
Nizifan Feb 14, 2022
24c05b5
Update ADOPTERS.md (#1440)
peterchenhc Feb 14, 2022
54b5d4a
Remove the removed watermark configure key (#1439)
maobaolong Feb 15, 2022
3998bad
add feature gate to csi (#1444)
zwwhdls Feb 16, 2022
772273c
Make controllers handle deprecated runtime workers (#1447)
TrafalgarZZZ Feb 16, 2022
65afbf7
Docs for serverless (#1443)
cheyang Feb 18, 2022
aa09642
Build docker image, To #39482462 (#1449)
cheyang Feb 18, 2022
42c0bde
Use less job worker threads (#1450)
ssz1997 Feb 19, 2022
c8a8fd3
Build for GROMACS, To #26045127 (#1452)
cheyang Feb 20, 2022
eefc46a
Fix csi daemonset template typo (#1454)
TrafalgarZZZ Feb 22, 2022
b3554af
Update Fluid documents (#1455)
TrafalgarZZZ Feb 22, 2022
cbcacf5
fix gen_sdk.sh (#1457)
zwwhdls Feb 23, 2022
afddbbe
Archive deprecated Fluid documents (#1458)
TrafalgarZZZ Feb 23, 2022
2709928
Remove cache dir from sidecar pod (#1462)
cheyang Feb 25, 2022
309c942
Add Docker image for fuse without cachedir, To #37688693 (#1464)
cheyang Feb 25, 2022
d44a69a
Disable timezone hostpath for jindoruntime, To #37688693 (#1466)
cheyang Feb 25, 2022
a25bd08
Build docker image for disabling timezone hostpath for jindoruntime, …
cheyang Feb 25, 2022
23b5c8d
remove duplicates in Makefile (#1472)
haoeeeee Feb 27, 2022
6db2060
Add community meeting information (#1471)
RongGu Feb 27, 2022
1be5ed0
Update docs for enabling cache dir (#1474)
cheyang Feb 27, 2022
b37e587
Update install document (#1475)
TrafalgarZZZ Mar 1, 2022
45e5d4d
enable run jindo dataload synchronously (#1477)
frankleaf Mar 1, 2022
0f23ad9
fix jindo some parameter to bool (#1480)
frankleaf Mar 1, 2022
62f60bf
Build docker image for release v0.7.0, To #39482692 (#1484)
cheyang Mar 2, 2022
0c38f4a
Fix typo in install document (#1486)
TrafalgarZZZ Mar 2, 2022
8fd42c4
modify bug : clean cache in Alluxio due to unknown linux release vers…
haoeeeee Mar 2, 2022
0e493da
Build docker image for 0.7.0 release, To #37688693 (#1488)
cheyang Mar 2, 2022
cac1fec
Added what's new for v0.7 release (#1489)
RongGu Mar 2, 2022
c852a78
Branch v0.8.0 (#1491)
cheyang Mar 3, 2022
74a88d5
Update CHANGELOG.md for 0.7 (#1493)
TrafalgarZZZ Mar 4, 2022
e2ab343
modify userguide faq zh & en documents (#1502)
javyxu Mar 8, 2022
ef5ca12
Default csi fuse recovery to false (#1501)
TrafalgarZZZ Mar 8, 2022
acb662b
add s3 example (#1494)
ssz1997 Mar 9, 2022
531c54b
Fix misleading logging in csi plugin (#1499)
cheyang Mar 9, 2022
500f2a6
add dmetasoul.com in adopters documents (#1506)
javyxu Mar 9, 2022
4d402bf
Updated the Chinese version of the S3 configuration document and adde…
javyxu Mar 9, 2022
0f76196
Build docker image for env in csi plugin, To #37688693 (#1509)
cheyang Mar 10, 2022
ccc2b2d
Build docker image for env in csi plugin, To #37688693 (#1510)
cheyang Mar 10, 2022
d17c5cd
fix update JindoRuntime doc link (#1512)
frankleaf Mar 11, 2022
4ebe2b5
update data_warm_up's document (#1513)
javyxu Mar 11, 2022
7976904
add fluid app controller (#1481)
zwwhdls Mar 13, 2022
7bbb015
Update committer name list. (#1517)
RongGu Mar 14, 2022
a5ceee1
fix pod/exec rbac in application controller (#1518)
zwwhdls Mar 14, 2022
c8bfb12
Build docker image for fluid app controller (#1516)
cheyang Mar 14, 2022
0b471a0
fix application controller image (#1519)
zwwhdls Mar 14, 2022
b2d9f14
warmup in each fuse pod
zwwhdls Mar 17, 2022
67667cb
fix test
zwwhdls Mar 18, 2022
3f758b3
update default fuse image
zwwhdls Mar 18, 2022
fde4b07
add label in job
zwwhdls Mar 18, 2022
ba97ed9
fix DCO
zwwhdls Mar 18, 2022
17500b7
fix conflict
zwwhdls Mar 18, 2022
51a3e6d
update typo
zwwhdls Mar 18, 2022
8d61be8
update dataload rbac to juicefs controller
zwwhdls Mar 18, 2022
e22f938
update warmup to worker
zwwhdls Mar 19, 2022
d40a378
add timeout & deal with error
zwwhdls Mar 19, 2022
0914e31
set timeout in dataload options
zwwhdls Mar 21, 2022
c254c0a
update configmap volume mode 0755 & add unit test
zwwhdls Mar 22, 2022
4 changes: 4 additions & 0 deletions charts/fluid-dataloader/juicefs/CHANGELOG.md
@@ -0,0 +1,4 @@
### 0.1.0

- Support parallel prefetch job
- Support configurations by setting values
23 changes: 23 additions & 0 deletions charts/fluid-dataloader/juicefs/Chart.yaml
@@ -0,0 +1,23 @@
apiVersion: v2
name: fluid-dataloader
description: A Helm chart for Fluid to prefetch data

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
appVersion: 0.1.0
33 changes: 33 additions & 0 deletions charts/fluid-dataloader/juicefs/templates/configmap.yaml
@@ -0,0 +1,33 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ printf "%s-data-load-script" .Release.Name }}
  labels:
    release: {{ .Release.Name }}
    role: dataload-job
data:
  dataloader.distributedLoad: |
    #!/usr/bin/env bash
    set -xe

    function main() {
        paths="$DATA_PATH"
        paths=(${paths//:/ })

        podNames="$POD_NAMES"
        podNames=(${podNames//:/ })

        ns="$POD_NAMESPACE"
        for((i=0;i<${#podNames[@]};i++)) do
            local pod="${podNames[i]}"

            for((j=0;j<${#paths[@]};j++)) do
                echo -e "juicefs warmup on $pod ${paths[j]} starts"
                /usr/local/bin/kubectl -n $ns exec -it $pod -- $COMMAND
[Review comment, Collaborator] Will you add a timeout to avoid the job hanging?
[Reply, Member] done
                /usr/local/bin/kubectl -n $ns exec -it $pod -- juicefs warmup $MOUNTPATH${paths[j]}
                /usr/local/bin/kubectl -n $ns exec -it $pod -- umount $MOUNTPATH
                echo -e "juicefs warmup on $pod ${paths[j]} ends"
            done
        done
    }
    main "$@"
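The review question above about a timeout on the `kubectl exec` calls can be illustrated with a small wrapper. This is a minimal sketch, not the change that was merged: `run_with_timeout` is a hypothetical helper name, the limit is given in seconds, and it assumes the coreutils `timeout` command is available in the dataloader image.

```shell
# Hypothetical helper: run a command, but give up once the deadline passes.
# Assumes the coreutils `timeout` binary is on PATH.
run_with_timeout() {
  limit=$1
  shift
  rc=0
  # `timeout` exits with status 124 when it had to stop the command.
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$rc"
}

run_with_timeout 5 true && echo "finished in time"
```

Inside the warmup loop, a call could look like `run_with_timeout 1800 /usr/local/bin/kubectl -n "$ns" exec "$pod" -- juicefs warmup "$MOUNTPATH${paths[j]}"`, where 1800 is an arbitrary example value. Because the script runs under `set -e`, a non-zero return from the wrapper would stop the job instead of letting it hang.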
146 changes: 146 additions & 0 deletions charts/fluid-dataloader/juicefs/templates/dataloader.yaml
@@ -0,0 +1,146 @@
# .Release.Name will be used to decide which dataset will be preload
# .Release.Name should be like `<pvc-name>-load`(e.g. hbase-load for a PersistentVolumeClaim named `hbase`)
# TODO: the length of .Release.Name won't exceed 53(limited by Helm), which means length of `<pvc-name>` can't exceed 48. This might be a problem.
{{/* {{ $datasetName := "" -}}*/}}
{{/* {{- $randomSuffix := "" -}}*/}}
{{/* {{- if regexMatch "^[A-Za-z0-9._-]+-load-[A-Za-z0-9]{5}$" .Release.Name -}}*/}}
{{/* {{- $arr := regexSplit "-load-" .Release.Name -1 -}}*/}}
{{/* {{- $datasetName = first $arr -}}*/}}
{{/* {{- $randomSuffix = last $arr -}}*/}}
{{/* {{- else -}}*/}}
{{/* {{- printf "Illegal release name. Should be like <dataset-name>-load-<suffix-length-5>. Current name: %s" .Release.Name | fail -}}*/}}
{{/* {{- end }}*/}}
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ printf "%s-job" .Release.Name }}
  labels:
    release: {{ .Release.Name }}
    role: dataload-job
    app: juicefs
    targetDataset: {{ required "targetDataset should be set" .Values.dataloader.targetDataset }}
spec:
  backoffLimit: {{ .Values.dataloader.backoffLimit | default "3" }}
  completions: 1
  parallelism: 1
  template:
    metadata:
      name: {{ printf "%s-loader" .Release.Name }}
      labels:
        release: {{ .Release.Name }}
        role: dataload-pod
        app: juicefs
        targetDataset: {{ required "targetDataset should be set" .Values.dataloader.targetDataset }}
    spec:
      restartPolicy: OnFailure
      {{- range $key, $val := .Values.dataloader.options }}
      {{- if eq $key "runtimeName" }}
      serviceAccountName: {{ printf "%s-loader" $val | quote }}
      {{- end }}
      {{- end }}
      containers:
        - name: dataloader
          image: {{ required "Dataloader image should be set" .Values.dataloader.image }}
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh", "-c"]
          args: ["/scripts/juicefs_dataload.sh"]
          {{- $targetPaths := "" }}
          {{- range .Values.dataloader.targetPaths }}
          {{- $targetPaths = cat $targetPaths (required "Path must be set" .path) ":" }}
          {{- end }}
          {{- $targetPaths = $targetPaths | nospace | trimSuffix ":" }}

          {{- $pathReplicas := ""}}
          {{- range .Values.dataloader.targetPaths }}
          {{- $pathReplicas = cat $pathReplicas ( default 1 .replicas ) ":"}}
          {{- end }}
          {{- $pathReplicas = $pathReplicas | nospace | trimSuffix ":"}}

          env:
            - name: STORAGE_ADDRESS
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: CACHEDIR2
              value: /test
            {{- range $key, $val := .Values.dataloader.options }}
            {{- if eq $key "cachedir" }}
            - name: CACHEDIR
              value: {{ $val | quote }}
            {{- end }}
            {{- if eq $key "mountpath" }}
            - name: MOUNTPATH
              value: {{ $val | quote }}
            {{- end }}
            {{- if eq $key "command" }}
            - name: COMMAND
              value: {{ $val | quote }}
            {{- end }}
            {{- end }}
            - name: DATA_PATH
              value: {{ $targetPaths | quote }}
            - name: PATH_REPLICAS
              value: {{ $pathReplicas | quote }}
            {{- range $key, $val := .Values.dataloader.options }}
            {{- if eq $key "podNames" }}
            - name: POD_NAMES
              value: {{ $val | quote }}
            {{- end }}
            {{- end }}
            - name: POD_NAMESPACE
              value: {{ .Release.Namespace | quote }}
          envFrom:
            - configMapRef:
                name: {{ required "targetDataset should be set" .Values.dataloader.targetDataset }}-juicefs-values
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /scripts
              name: data-load-script
            {{- range .Values.dataloader.targetPaths }}
            {{- if .fluidNative }}
            - mountPath: {{ .path | trimAll "/" | replace "/" "-" | printf "/data/%s"}}
              name: {{ .path | trimAll "/" | replace "/" "-" | printf "native-%s"}}
            {{- end }}
            {{- end }}
            {{- range $key, $val := .Values.dataloader.options }}
            {{- if eq $key "cachedir" }}
            - mountPath: {{ $val | quote }}
            {{- end }}
            {{- end }}
              name: cachedir
            {{- range $key, $val := .Values.dataloader.options }}
            {{- if eq $key "mountpath" }}
            - mountPath: {{ $val | quote }}
            {{- end }}
            {{- end }}
              name: mountpath1
      volumes:
        - name: data-load-script
          configMap:
            name: {{ printf "%s-data-load-script" .Release.Name }}
            items:
              - key: dataloader.distributedLoad
                path: juicefs_dataload.sh
                mode: 365
        {{- range .Values.dataloader.targetPaths }}
        {{- if .fluidNative }}
        - name: {{ .path | trimAll "/" | replace "/" "-" | printf "native-%s"}}
          hostPath:
            path: {{ .path }}
        {{- end }}
        {{- end }}
        - name: cachedir
          {{- range $key, $val := .Values.dataloader.options }}
          {{- if eq $key "cachedir" }}
          hostPath:
            path: {{ $val | quote }}
          {{- end }}
          {{- end }}
        - name: mountpath1
          {{- range $key, $val := .Values.dataloader.options }}
          {{- if eq $key "mountpath" }}
          hostPath:
            path: {{ $val | quote }}
          {{- end }}
          {{- end }}
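The `mode: 365` on the script item above is a decimal integer. Kubernetes accepts volume file modes either as octal (0000 to 0777) or as decimal (0 to 511), so the quick arithmetic below shows which permission bits it corresponds to. It is only a check, not part of the chart, and the helper names are made up for illustration.

```shell
# Hypothetical helpers: convert between the decimal and octal spellings of a file mode.
to_octal() { printf '%o\n' "$1"; }
to_decimal() { printf '%d\n' "$1"; }   # a leading 0 makes printf read the argument as octal

to_octal 365     # prints 555, i.e. r-xr-xr-x
to_decimal 0755  # prints 493
to_decimal 0777  # prints 511
```

So `mode: 365` mounts the script read/execute-only for everyone, and the `0755` named in the last commit title would be `493` if written in decimal.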
34 changes: 34 additions & 0 deletions charts/fluid-dataloader/juicefs/values.yaml
@@ -0,0 +1,34 @@
# Default values for fluid-dataloader.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

dataloader:
  # Optional
  # Default: 3
  # Description: how many times the prefetch job can fail, i.e. `Job.spec.backoffLimit`
  backoffLimit: 3

  # Required
  # Description: the dataset that this DataLoad targets
  targetDataset: #imagenet

  # Optional
  # Default: false
  # Description: should load metadata from UFS when doing data load
  loadMetadata: false

  # Optional
  # Default: (path: "/", replicas: 1, fluidNative: false)
  # Description: which paths should the DataLoad load
  targetPaths:
    - path: "/"
      replicas: 1
      fluidNative: false

  # Required
  # Description: the image that the DataLoad job uses
  image: #<juicefs-image>

  # Optional
  # Description: optional parameter DataLoad job uses
  options:
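In dataloader.yaml the `targetPaths` entries are joined with `:` into `DATA_PATH`, and the script splits them again with `${paths//:/ }`. The sketch below shows an alternative split, on the assumption that a path might contain spaces or glob characters, which the unquoted expansion in the script would break apart or expand. `split_colon_list` is a hypothetical name and is not part of the chart.

```shell
# Hypothetical helper: print each element of a colon-separated list on its own line.
split_colon_list() {
  old_ifs=$IFS
  IFS=':'
  set -f                 # disable globbing so "*" in a path stays literal
  for item in $1; do     # unquoted on purpose: fields are split on ":" only
    printf '%s\n' "$item"
  done
  set +f
  IFS=$old_ifs
}

split_colon_list "/:/train data:/val"
```

With the sample input the helper emits three lines (`/`, `/train data`, `/val`), keeping the embedded space intact.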
6 changes: 0 additions & 6 deletions charts/fluid/fluid/templates/role/dataset/rbac.yaml
@@ -46,12 +46,6 @@ rules:
       - juicefsruntimes/status
     verbs:
       - '*'
-  - apiGroups:
-      - ""
-    resources:
-      - events
[Review comment, Collaborator] Why is events missing?
[Reply, Member] events is already covered at line 22.
-    verbs:
-      - '*'
   - apiGroups:
       - apps
     resources:
19 changes: 19 additions & 0 deletions charts/fluid/fluid/templates/role/juicefs/rbac.yaml
@@ -29,6 +29,25 @@ rules:
       - datasets/status
     verbs:
       - '*'
+  - apiGroups:
+      - ""
+    resources:
+      - serviceaccounts
+    verbs:
+      - create
+      - list
+      - get
+      - delete
+  - apiGroups:
+      - rbac.authorization.k8s.io
+    resources:
+      - clusterroles
+      - clusterrolebindings
+    verbs:
+      - create
+      - list
+      - get
+      - delete
   - apiGroups:
       - apps
     resources:
2 changes: 1 addition & 1 deletion charts/fluid/fluid/values.yaml
@@ -66,7 +66,7 @@ runtime:
     controller:
       image: fluidcloudnative/juicefsruntime-controller:v0.8.0-b491ba8
     fuse:
-      image: juicedata/juicefs-csi-driver:v0.11.0
+      image: registry.cn-hangzhou.aliyuncs.com/juicefs/juicefs-fuse:v1.0.0-beta2

 webhook:
   enabled: true
34 changes: 34 additions & 0 deletions charts/juicefs/templates/role/rbac.yaml
@@ -0,0 +1,34 @@
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ printf "%s-loader" .Release.Name }}
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - pods/exec
    verbs:
      - get
      - create
      - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ printf "%s-loader" .Release.Name }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ printf "%s-loader" .Release.Name }}
subjects:
  - kind: ServiceAccount
    name: {{ printf "%s-loader" .Release.Name }}
    namespace: {{ .Release.Namespace | quote }}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ printf "%s-loader" .Release.Name }}
  namespace: {{ .Release.Namespace | quote }}
39 changes: 34 additions & 5 deletions charts/juicefs/templates/worker/statefuleset.yaml
@@ -59,24 +59,53 @@ spec:
           {{- end }}
           {{- end }}
           command: ["sh", "-c", "sleep infinity"]
+          securityContext:
+            privileged: true
           {{- if .Values.worker.ports }}
           ports:
 {{ toYaml .Values.worker.ports | trim | indent 10 }}
           {{- end }}
-          {{- if .Values.worker.envs }}
           env:
+          {{- if .Values.worker.envs }}
 {{ toYaml .Values.worker.envs | trim | indent 10 }}
           {{- end }}
-          {{- if .Values.worker.cacheDir }}
+          {{- if .Values.fuse.metaurlSecret }}
+          - name: METAURL
+            valueFrom:
+              secretKeyRef:
+                name: {{ .Values.fuse.metaurlSecret }}
+                key: metaurl
+          {{- end }}
+          {{- if .Values.fuse.accesskeySecret }}
+          - name: ACCESS_KEY
+            valueFrom:
+              secretKeyRef:
+                name: {{ .Values.fuse.accesskeySecret }}
+                key: access-key
+          {{- end }}
+          {{- if .Values.fuse.secretkeySecret }}
+          - name: SECRET_KEY
+            valueFrom:
+              secretKeyRef:
+                name: {{ .Values.fuse.secretkeySecret }}
+                key: secret-key
+          {{- end }}
           volumeMounts:
+            - mountPath: /root/script
+              name: script
+          {{- if .Values.worker.cacheDir }}
             - name: cache-dir
               mountPath: {{ .Values.worker.cacheDir }}
-          {{- end }}
+          {{- end }}
       restartPolicy: Always
-      {{- if .Values.worker.cacheDir }}
       volumes:
+        {{- if .Values.worker.cacheDir }}
         - name: cache-dir
           hostPath:
             path: {{ .Values.worker.cacheDir }}
             type: DirectoryOrCreate
-      {{- end }}
+        {{- end }}
+        - name: script
+          configMap:
+            name: {{ template "juicefs.fullname" . }}-script
+            defaultMode: 0777
[Review comment, Collaborator] Why not set 0755?
[Reply, Member] 0755 is ok, done

4 changes: 2 additions & 2 deletions pkg/common/juicefs.go
@@ -28,9 +28,9 @@ const (

     JuiceFSFuseImageEnv = "JUICEFS_FUSE_IMAGE_ENV"

-    DefaultJuiceFSFuseImage = "juicedata/juicefs-csi-driver:v0.10.5"
+    DefaultJuiceFSFuseImage = "registry.cn-hangzhou.aliyuncs.com/juicefs/juicefs-fuse:v1.0.0-beta2"

-    DefaultJuiceFSRuntimeImage = "juicedata/juicefs-csi-driver:v0.10.5"
+    DefaultJuiceFSRuntimeImage = "registry.cn-hangzhou.aliyuncs.com/juicefs/juicefs-fuse:v1.0.0-beta2"

     JuiceFSMountPath = "/bin/mount.juicefs"
