Search before asking

Description

DolphinScheduler allows parameter transfer between tasks: https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/parameter/context.html

However, it does not allow file transfer between tasks. For example, I have two Python scripts that do some analysis work: the second script processes data that comes from the first, so I have to pass a path variable as a parameter.
Parameter passing does not work as expected when the two tasks run on different workers, because the path produced on the first worker does not exist on the second.
I think if DolphinScheduler supported this feature, it would be a handy boost for scenarios such as data analysis and machine learning.
Use case
I think we can use the resource center as a file transfer store if the user has enabled it. For example, in the task plugin we can agree on a new path specification:
use $from_remote(remote_path, local_path) to download a file from remote_path to local_path before the task starts;
use $to_remote(remote_path, local_path) to upload a file from local_path to remote_path after the task finishes. A rough sketch of how a worker might expand these markers follows.
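As a minimal sketch, and not actual DolphinScheduler code: a worker-side wrapper could scan the task command for the proposed markers, record the pending transfers, and hand the task a plain local path. The function name and return shape here are illustrative only; the actual download/upload would go through whatever client the resource center exposes.

```python
# A minimal sketch, not actual DolphinScheduler code: scan the task command
# for the proposed markers, record the transfers, and substitute plain
# local paths so the task itself never sees the markers.
import re

FROM_RE = re.compile(r"\$from_remote\('([^']+)',\s*'([^']+)'\)")
TO_RE = re.compile(r"\$to_remote\('([^']+)',\s*'([^']+)'\)")

def prepare_command(command: str):
    """Expand markers; return (real_command, downloads, uploads)."""
    downloads, uploads = [], []

    def on_from(match):
        # remote -> local, to be performed before the task starts
        downloads.append((match.group(1), match.group(2)))
        return match.group(2)  # the task only ever sees the local path

    def on_to(match):
        # local -> remote, to be performed after the task finishes
        uploads.append((match.group(1), match.group(2)))
        return match.group(2)

    command = FROM_RE.sub(on_from, command)
    command = TO_RE.sub(on_to, command)
    return command, downloads, uploads
```

Feeding it the analysis command from the example further below would return python analysis.py --input=data/demo.csv together with one pending download of bucket1/demo.csv, which the worker would perform before starting the task.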
base_uri=f"s3://{default_bucket}/abalone"input_data_uri=sagemaker.s3.S3Uploader.upload(
local_path=local_path,
desired_s3_uri=base_uri,
)
input_data=ParameterString(
name="InputData",
default_value=input_data_uri,
)
# This is the path to use directlyProcessingInput(source=input_data, destination="/opt/ml/processing/input")
Above is the SageMaker example. If DolphinScheduler supported this feature, it should be even easier to use. For example:
```shell
# Process the data, save the output to the local path output/demo.csv, and
# upload it to bucket1/demo.csv in the resource center after the task is done.
python process_data.py --output=$to_remote('bucket1/demo.csv', 'output/demo.csv')

# Download bucket1/demo.csv from the resource center to the local path
# data/demo.csv before the task starts, so the following command actually executes:
#   python analysis.py --input=data/demo.csv
python analysis.py --input=$from_remote('bucket1/demo.csv', 'data/demo.csv')
```
I think this feature depends on the configuration center #10283. Otherwise, it is impossible to determine which storage backend should hold the files during uploading and downloading.
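To make that dependency concrete, here is a hedged sketch of how the transfer helper might pick its storage client from that configuration. The key name mirrors the resource.storage.type setting in DolphinScheduler's common.properties, but the function and wiring are illustrative only, not actual scheduler code:

```python
# Illustrative sketch only -- not DolphinScheduler code. The worker needs
# the resource-center configuration (see #10283) before it can transfer files.
def storage_client(config: dict):
    """Choose a storage client from resource-center configuration."""
    storage_type = config.get("resource.storage.type", "NONE")
    if storage_type == "S3":
        import boto3  # assumes an S3-backed resource center
        return boto3.client("s3")
    if storage_type == "HDFS":
        raise NotImplementedError("HDFS client wiring omitted from this sketch")
    raise ValueError("resource center storage is not configured")
```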
Related issues
No response
Are you willing to submit a PR?
Code of Conduct