add feature with JindoRuntime dataloader #764
Codecov Report
@@ Coverage Diff @@
## master #764 +/- ##
==========================================
- Coverage 12.65% 12.52% -0.13%
==========================================
Files 93 93
Lines 4236 4279 +43
==========================================
Hits 536 536
- Misses 3640 3683 +43
Partials 60 60
	fileUtils := alluxioOperations.NewAlluxioFileUtils(podName, containerName, targetDataset.Namespace, ctx.Log)
	ready = fileUtils.Ready()
case common.JINDO_RUNTIME:
	podName := fmt.Sprintf("%s-jindofs-master-0", targetDataset.Name)
Suggest putting the %s-jindofs-master-0 format string in one unified place.
Thanks. This is the same situation as with Alluxio; I will resolve it in the next PR.
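One way to act on this suggestion is sketched below. The format string comes from the diff under review; the constant name `jindoMasterPodFormat` and the helper `JindoMasterPodName` are hypothetical and do not appear in the PR.

```go
package main

import "fmt"

// jindoMasterPodFormat is a hypothetical constant name. The format string
// itself is the one hardcoded in the diff under review.
const jindoMasterPodFormat = "%s-jindofs-master-0"

// JindoMasterPodName (hypothetical helper) builds the master pod name for a
// dataset so that callers do not repeat the format string.
func JindoMasterPodName(datasetName string) string {
	return fmt.Sprintf(jindoMasterPodFormat, datasetName)
}

func main() {
	// "imagenet" is only an example dataset name.
	fmt.Println(JindoMasterPodName("imagenet"))
}
```

With a helper like this, the readiness check would call `JindoMasterPodName(targetDataset.Name)` instead of formatting the name inline.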
if err != nil {
	log.Error(err, "failed to generate dataload chart's value file")
	return utils.RequeueIfError(err)
}
chartName := utils.GetChartsDirectory() + "/" + cdataload.DATALOAD_CHART
chartName := ""
if boundedRuntimeType == "alluxio" {
Avoid hardcoding alluxio; for example:
if boundedRuntimeType == common.DataRuntimeAlluxio.String() {}
Thanks, replaced with the common constant.
pkg/dataload/constants.go
Outdated
@@ -21,4 +21,6 @@ const (
	DATALOAD_DEFAULT_IMAGE = "registry.cn-hangzhou.aliyuncs.com/fluid/fluid-dataloader"
	DATALOAD_SUFFIX_LENGTH = 5
	ENV_DATALOADER_IMG = "DATALOADER_IMG"
	DATALOAD_ALLUXIO_CHART = "alluxio"
alluxio appears many times; can it be unified into one place?
Thanks, removed and replaced with the common constant.
// Check if the Alluxio is ready by running `alluxio fsadmin report` command
func (a JindoFileUtils) Ready() (ready bool) {
	var (
		command = []string{"/sdk/bin/jindo", "jfs", "-report"}
This command is hardcoded.
Thanks. This is JindoFS's fixed shell command, so it is hardcoded.
You can define a constant in the common package.
This is the same as many other commands; I will replace "/sdk/bin/jindo" with "jindo".
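A minimal sketch of the reviewer's suggestion follows, assuming the binary is resolved from PATH as "jindo" as proposed in the reply. The names `JindoBinary` and `JindoReportCommand` are hypothetical; only the arguments "jfs" and "-report" are taken from the diff.

```go
package main

import "fmt"

// JindoBinary is a hypothetical constant name. The value "jindo" follows the
// author's reply, which proposes replacing "/sdk/bin/jindo" with "jindo".
const JindoBinary = "jindo"

// JindoReportCommand (hypothetical helper) returns the readiness-check
// command. A new slice is built on each call so that callers cannot mutate
// shared state.
func JindoReportCommand() []string {
	return []string{JindoBinary, "jfs", "-report"}
}

func main() {
	fmt.Println(JindoReportCommand())
}
```

Keeping the command in one place means a later change of binary path touches a single constant rather than every call site.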
for _, mount := range dataset.Spec.Mounts {
	jfsNamespace = jfsNamespace + mount.Name + ","
	//jfsNamespace = jfsNamespace + mount.Name + ","
	if !strings.HasSuffix(mount.MountPoint, "/") {
		mount.MountPoint = mount.MountPoint + "/"
	}
	// transform mountpoint for oss or hdfs format
	if strings.HasPrefix(mount.MountPoint, "hdfs://") {
Determining which protocol a mount point uses could be encapsulated as a util function.
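The sketch below shows what such a util could look like. It covers only the two steps visible in the diff (appending a trailing slash and detecting the scheme prefix); the actual OSS and HDFS transformation is not shown in the hunk and is not reproduced here. The function names `MountScheme` and `EnsureTrailingSlash` and the example mount points are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// MountScheme (hypothetical helper) returns the scheme in front of "://",
// for example "hdfs" or "oss". It returns "" when no scheme is present.
func MountScheme(mountPoint string) string {
	if i := strings.Index(mountPoint, "://"); i > 0 {
		return mountPoint[:i]
	}
	return ""
}

// EnsureTrailingSlash (hypothetical helper) mirrors the HasSuffix check in
// the diff: it appends "/" only when it is missing.
func EnsureTrailingSlash(mountPoint string) string {
	if strings.HasSuffix(mountPoint, "/") {
		return mountPoint
	}
	return mountPoint + "/"
}

func main() {
	// Example mount points, not taken from the PR.
	for _, mp := range []string{"hdfs://namenode:8020/data", "oss://bucket/path/"} {
		fmt.Println(MountScheme(mp), EnsureTrailingSlash(mp))
	}
}
```

The loop over `dataset.Spec.Mounts` could then switch on the result of `MountScheme` instead of chaining `strings.HasPrefix` checks.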
api/v1alpha1/dataload_types.go
Outdated
LoadMemoryData bool `json:"loadMemoryData,omitempty"`
// add HdfsConfig for JindoRuntime
HdfsConfig string `json:"hdfsConfig,omitempty"`
Should the user define this HDFSConfig?
Removed.
@@ -33,8 +33,8 @@ runtime:
portRange: 18000-19999
enabled: false
smartdata:
image: registry.cn-shanghai.aliyuncs.com/jindofs/smartdata:3.5.2
image: registry.cn-shanghai.aliyuncs.com/jindofs/smartdata:3.5.0
3.5.0 is older than 3.5.2. Is that what you expect?
Thanks. The image version will match the JindoFS version in the future, so 3.5.0 is expected.
api/v1alpha1/dataload_types.go
Outdated
@@ -53,6 +53,12 @@ type DataLoadSpec struct {
	// Target defines target paths that needs to be loaded
	Target []TargetPath `json:"target,omitempty"`
	// LoadMemoryData specifies if the dataload job should load memory or not
	LoadMemoryData bool `json:"loadMemoryData,omitempty"`
What does loadMemoryData mean?
Already removed.
OssKey string `yaml:"osskey,omitempty"`
OssSecret string `yaml:"osssecret,omitempty"`
I think it's better to use ossKey and ossSecret for consistency.
Thanks. The capitalization is for external reference.
@@ -0,0 +1,86 @@
# fluid-dataloader
The READMEs in both fluid-dataloader/alluxio and fluid-dataloader/jindo are deprecated, so I suggest deleting them to avoid misleading information.
These are two different dataload types; I think it's better to keep them separate.
# Required
# Description: the dataset that this DataLoad targets
targetDataset: #imagenet
Add default values and comments for hdfsConfig and loadMemoryData.
Thanks, I have already removed these two unnecessary parameters.
@@ -68,10 +68,20 @@ func GetWorkerImage(client client.Client, datasetName string, runtimeType string
}
if imageName == "" {
	imageName = "registry.cn-huhehaote.aliyuncs.com/alluxio/alluxio"
	if runtimeType == common.ALLUXIO_RUNTIME {
I suggest using a switch here.
@XIAO-HOU and I are trying to unify the get-image methods. You may need to align with us in the future.
pkg/dataload/value.go
Outdated
@@ -21,6 +21,12 @@ type DataLoadInfo struct {
	// Image specifies the image that the DataLoad job uses
	Image string `yaml:"image,omitempty"`
	// LoadMemoryData specifies if the dataload job should load memory or not
	LoadMemoryData bool `yaml:"loadMemoryData,omitempty"`
Do we need to define this in the struct for the common DataLoadInfo object?
Thanks. I defined JindoOptions to hold these two parameters instead of putting them in the common DataLoadInfo.
if err != nil {
	log.Error(err, "failed to generate dataload chart's value file")
	return utils.RequeueIfError(err)
}
chartName := utils.GetChartsDirectory() + "/" + cdataload.DATALOAD_CHART
chartName := ""
if boundedRuntimeType == common.ALLUXIO_RUNTIME {
Suggest changing this to a switch, thanks.
Done.
image := fmt.Sprintf("%s:%s", imageName, imageTag)
runtime, err := utils.GetJindoRuntime(r.Client, boundedRuntime.Name, boundedRuntime.Namespace)
Will this change break dataload logic for alluxio?
Thanks. To avoid breaking the logic for Alluxio, I added a separate generateJindoDataLoadValueFile function.
pkg/dataload/value.go
Outdated
@@ -21,6 +21,17 @@ type DataLoadInfo struct {
	// Image specifies the image that the DataLoad job uses
	Image string `yaml:"image,omitempty"`
	// JindoOptions specifies the options that jindoruntime uses
	JindoOptions JindoOptions `yaml:"jindoOptions,omitempty"`
I think JindoOptions shouldn't be part of DataLoadInfo, just as a carOption is not part of a vehicle.
Agreed. It may be better to use a separate chart value struct for each DDC engine, with shared properties like Image and TargetDataset put together in a common struct.
We could build a field to pass options; which options take effect would be decided by the runtime type, just like CSI options.
Maybe you can provide a CSI sample for reference.
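The options-field idea from this thread can be sketched roughly as follows. This is not the CSI code the reviewers refer to and not what the PR implements; it only illustrates a generic options map filtered by runtime type. The `Options` field, the `supportedOptions` table, and `EffectiveOptions` are hypothetical, while the keys loadMemoryData and hdfsConfig are the two Jindo parameters discussed earlier in the review.

```go
package main

import "fmt"

// DataLoadInfo here is a simplified stand-in for the struct in
// pkg/dataload/value.go, with a generic Options map (an assumption) instead
// of a runtime-specific JindoOptions field.
type DataLoadInfo struct {
	TargetDataset string
	Image         string
	Options       map[string]string
}

// supportedOptions is a hypothetical table of option keys that each runtime
// type understands.
var supportedOptions = map[string][]string{
	"alluxio": {},
	"jindo":   {"loadMemoryData", "hdfsConfig"},
}

// EffectiveOptions keeps only the options that the given runtime type
// supports and silently drops the rest.
func EffectiveOptions(runtimeType string, opts map[string]string) map[string]string {
	effective := make(map[string]string)
	for _, key := range supportedOptions[runtimeType] {
		if value, ok := opts[key]; ok {
			effective[key] = value
		}
	}
	return effective
}

func main() {
	info := DataLoadInfo{
		TargetDataset: "imagenet", // example value
		Options:       map[string]string{"loadMemoryData": "true", "unknown": "x"},
	}
	fmt.Println(EffectiveOptions("jindo", info.Options))
	fmt.Println(EffectiveOptions("alluxio", info.Options))
}
```

The trade-off is that a string map loses type checking, whereas the separate per-engine value structs proposed earlier in the thread keep it.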
	log.Error(err, "failed to generate dataload chart's value file")
	return utils.RequeueIfError(err)
}
chartName = utils.GetChartsDirectory() + "/" + cdataload.DATALOAD_CHART + "/" + common.ALLUXIO_RUNTIME
How about making it a function that returns chartName according to the runtime name?
Thanks, I will refactor this in the next PR.
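The refactor proposed here, combined with the earlier request to use a switch, might look like the sketch below. The constant values are assumptions based on the chart directories fluid-dataloader/alluxio and fluid-dataloader/jindo mentioned in the review; `DataLoadChartPath` and the "/charts" directory are hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// Assumed values: the real constants live in the project's common and
// dataload packages, and their values are not shown in this PR.
const (
	dataloadChart  = "fluid-dataloader"
	alluxioRuntime = "alluxio"
	jindoRuntime   = "jindo"
)

// DataLoadChartPath (hypothetical helper) returns the chart directory for a
// runtime type and an error for runtime types it does not know.
func DataLoadChartPath(chartsDir, runtimeType string) (string, error) {
	switch runtimeType {
	case alluxioRuntime, jindoRuntime:
		return path.Join(chartsDir, dataloadChart, runtimeType), nil
	default:
		return "", fmt.Errorf("unsupported runtime type %q", runtimeType)
	}
}

func main() {
	// "/charts" stands in for whatever utils.GetChartsDirectory() returns.
	chartName, err := DataLoadChartPath("/charts", jindoRuntime)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(chartName)
}
```

Returning an error for an unknown runtime type lets the reconciler requeue instead of continuing with an empty chart name.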
* add ns with the cache path
* fix err unused
* fix err unused
* fix directory dir
* add dataloader crd for JindoRuntime
* fix dataloader runtime
* fix constant ns to jindo
* add jindofs client conf yaml
* fix configmap yaml
* add support with s3 and global fuse
* upgrade jindo version to 3.5.0
* refine dataloader config
* change default nsname
* add core-site dataloader
* add jindo dataset status judge
* add load data to memory type
* change default ns to jindo
* fix ufs mode
* change dataload
* change dataload to alluxio and jindo
* refine dataload
* refine dataload
* refine dataload
* refine dataload
* refine dataload function
* delete readme file
* add generateJindoDataLoadValueFile to separate from alluxioRuntime
* dataload options define
* refine
Signed-off-by: xieydd <[email protected]>
Ⅰ. Describe what this PR does
Ⅱ. Does this pull request fix one issue?
Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews