[Feature][Task] Transfer files between tasks #12552

Merged (5 commits) on Nov 3, 2022


@jieguangzhou (Member) commented Oct 26, 2022

Purpose of the pull request

FILE Parameter

Use the FILE parameter to pass files (or folders; both are referred to as "files" below) from the working directory of an upstream task to a downstream task in the same workflow instance. Typical scenarios include:

  • In the ETL scenario, pass the data files processed by multiple upstream tasks to a specific downstream task.
  • In the machine learning scenario, pass the data set file of the upstream data preparation task to the downstream model training task.

Usage

Configure file parameter

To configure a file parameter, click the plus sign to the right of "Custom Parameters" on the task definition page.

Output files to a downstream task

A custom parameter has four fields:

  • Parameter name: the identifier used when passing the file between tasks, such as KEY1 and KEY2 in the figure below
  • Direction: OUT, meaning the file is output to downstream tasks
  • Parameter type: FILE, indicating a file parameter
  • Parameter value: the path of the output file, such as data and data/test2/text.txt in the figure below
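To make the four fields concrete, here is a minimal sketch that models one FILE custom parameter as a Python dataclass and fills it with the two OUT parameters of the task named output (dir-data and file-text). The class and field names (FileParam, Direction, data_type) are hypothetical and chosen for illustration only; they are not taken from the DolphinScheduler code base.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    IN = "IN"    # receive a file from an upstream task
    OUT = "OUT"  # output a file to downstream tasks


@dataclass(frozen=True)
class FileParam:
    """Hypothetical model of one custom parameter row of type FILE."""
    name: str             # Parameter name: the identifier (KEY) for OUT
    direction: Direction  # Direction: IN or OUT
    value: str            # Parameter value: a path relative to the working directory for OUT
    data_type: str = "FILE"  # Parameter type: always FILE for file parameters


# The two OUT parameters of the task named "output".
OUTPUT_PARAMS = [
    FileParam(name="dir-data", direction=Direction.OUT, value="data"),
    FileParam(name="file-text", direction=Direction.OUT, value="data/test2/text.txt"),
]
```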

The configuration in the figure below indicates that the task named output passes two files to downstream tasks:

  • It passes out the folder data, identified as dir-data. Downstream tasks can get this folder through output.dir-data
  • It passes out the file data/test2/text.txt, identified as file-text. Downstream tasks can get this file through output.file-text
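The naming rule above (downstream tasks refer to a file as taskName.KEY) can be sketched as a small mapping from downstream identifiers to upstream paths. The helper name exported_files is hypothetical; this only illustrates how the identifiers are formed, not how the feature is implemented.

```python
def exported_files(task_name: str, out_params: dict) -> dict:
    """Map each downstream identifier (taskName.KEY) to its upstream path.

    out_params maps an OUT parameter name (KEY) to a path relative to the
    upstream task's working directory. Hypothetical helper for illustration.
    """
    return {f"{task_name}.{key}": path for key, path in out_params.items()}


files = exported_files("output", {"dir-data": "data", "file-text": "data/test2/text.txt"})
for identifier, path in files.items():
    print(identifier, "->", path)
# output.dir-data -> data
# output.file-text -> data/test2/text.txt
```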

Get files from an upstream task

A custom parameter has four fields:

  • Parameter name: the location where the upstream file is saved after it is received, such as input_dir in the figure below
  • Direction: IN, meaning the file is fetched from an upstream task
  • Parameter type: FILE, indicating a file parameter
  • Parameter value: the identifier of the upstream file, in the format taskName.KEY. For example, in output.dir-data in the figure below, output is the name of the upstream task and dir-data is the file identifier output by that task
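An IN parameter value has to be split back into the upstream task name and the KEY. The sketch below does this under one stated assumption: the KEY itself contains no ".", so splitting at the last dot also tolerates dots inside the task name. The function and type names are hypothetical and this is not the project's parser.

```python
from typing import NamedTuple


class UpstreamRef(NamedTuple):
    task_name: str  # name of the upstream task, e.g. "output"
    key: str        # file identifier output by that task, e.g. "dir-data"


def parse_upstream_ref(value: str) -> UpstreamRef:
    """Split an IN parameter value of the form taskName.KEY.

    Assumes the KEY contains no '.', so the split is made at the last dot.
    """
    task_name, sep, key = value.rpartition(".")
    if not sep or not task_name or not key:
        raise ValueError(f"expected taskName.KEY, got {value!r}")
    return UpstreamRef(task_name, key)


print(parse_upstream_ref("output.dir-data"))
# UpstreamRef(task_name='output', key='dir-data')
```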

The configuration in the figure below indicates that the task gets the folder identified by dir-data from the upstream task output and saves it as input_dir.

The configuration in the figure below indicates that the task gets the file identified by file-text from the upstream task output and saves it as input.txt.
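The effect of the two IN configurations can be illustrated with a local copy from the upstream working directory into the downstream one, saved under the IN parameter name. This is a deliberate simplification: the hypothetical transfer helper copies directly on one machine, while the later discussion in this pull request suggests the real feature stores the transferred data in the resource center.

```python
import shutil
import tempfile
from pathlib import Path


def transfer(upstream_dir: Path, upstream_path: str,
             downstream_dir: Path, save_as: str) -> Path:
    """Copy a file or folder from the upstream working directory into the
    downstream working directory under the IN parameter name (hypothetical,
    local-only sketch)."""
    src = upstream_dir / upstream_path
    dst = downstream_dir / save_as
    dst.parent.mkdir(parents=True, exist_ok=True)
    if src.is_dir():
        shutil.copytree(src, dst)  # folder, e.g. output.dir-data -> input_dir
    elif src.is_file():
        shutil.copy2(src, dst)     # single file, e.g. output.file-text -> input.txt
    else:
        raise FileNotFoundError(src)
    return dst


# Usage: recreate the upstream layout in temporary directories.
with tempfile.TemporaryDirectory() as up_str, tempfile.TemporaryDirectory() as down_str:
    up, down = Path(up_str), Path(down_str)
    (up / "data" / "test2").mkdir(parents=True)
    (up / "data" / "test2" / "text.txt").write_text("hello")
    transfer(up, "data", down, "input_dir")
    transfer(up, "data/test2/text.txt", down, "input.txt")
    print((down / "input_dir" / "test2" / "text.txt").read_text())  # hello
    print((down / "input.txt").read_text())  # hello
```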

Closes #12479

Brief change log

Verify this pull request

This pull request is code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(or)

If your pull request contains an incompatible change, you should also add it to docs/docs/en/guide/upgrede/incompatible.md

@jieguangzhou force-pushed the data-transfer branch 2 times, most recently from 64ac2f6 to 27e4d74, October 26, 2022 11:43
@jieguangzhou changed the title from "[Feature][Task] Transfer files between tasks #12479" to "[Feature][Task] Transfer files between tasks" on Oct 26, 2022
@jieguangzhou self-assigned this on Oct 26, 2022
@jieguangzhou added the backend and UI (ui and front end related) labels on Oct 26, 2022
@jieguangzhou added this to the 3.2.0 milestone on Oct 26, 2022
@zhongjiajie (Member) commented:

The bandwidth pressure should be shown in the documentation.

@jieguangzhou (Member Author) replied to the comment above:

I have added the related information to the documentation.

@jieguangzhou marked this pull request as draft October 27, 2022 03:06
@jieguangzhou (Member Author) commented:

I have marked this PR as a draft because I want to add a cleanup mechanism that clears the transferred data in the resource center.

@jieguangzhou force-pushed the data-transfer branch 3 times, most recently from ef7c1a9 to a1783a2, October 27, 2022 10:54

@codecov-commenter commented Oct 27, 2022

Codecov Report

Merging #12552 (832f9f9) into dev (3ff328c) will increase coverage by 0.10%.
The diff coverage is 62.43%.

❗ Current head 832f9f9 differs from pull request most recent head 31545db. Consider uploading reports for the commit 31545db to get more accurate results

@@             Coverage Diff              @@
##                dev   #12552      +/-   ##
============================================
+ Coverage     39.05%   39.15%   +0.10%     
- Complexity     4186     4206      +20     
============================================
  Files          1043     1044       +1     
  Lines         39506    39673     +167     
  Branches       4539     4564      +25     
============================================
+ Hits          15430    15535     +105     
- Misses        22318    22372      +54     
- Partials       1758     1766       +8     
Impacted Files | Coverage Δ
...nscheduler/api/controller/ResourcesController.java | 54.71% <0.00%> (-1.06%) ⬇️
...duler/common/lifecycle/ServerLifeCycleManager.java | 0.00% <0.00%> (ø)
.../apache/dolphinscheduler/common/utils/OSUtils.java | 33.72% <0.00%> (-0.61%) ⬇️
.../dao/repository/impl/ProcessDefinitionDaoImpl.java | 4.34% <0.00%> (-5.66%) ⬇️
.../server/master/registry/MasterWaitingStrategy.java | 0.00% <ø> (ø)
...erver/worker/runner/WorkerTaskExecuteRunnable.java | 39.73% <0.00%> (-0.54%) ⬇️
...permission/ResourcePermissionCheckServiceImpl.java | 66.15% <33.33%> (-0.77%) ⬇️
...heduler/api/service/impl/ResourcesServiceImpl.java | 38.61% <62.50%> (+0.92%) ⬆️
...er/server/worker/utils/TaskFilesTransferUtils.java | 79.00% <79.00%> (ø)
...inscheduler/api/service/impl/UsersServiceImpl.java | 70.49% <100.00%> (ø)
... and 3 more
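The totals in the coverage diff can be cross-checked with a little arithmetic. The sketch below assumes coverage is computed as hits / (hits + misses + partials) and that the report truncates to two decimals; both are inferences from the numbers in the report, not documented behavior. Under those assumptions it reproduces 39.05%, 39.15% and the +0.10% change.

```python
def coverage_bp(hits: int, misses: int, partials: int) -> int:
    """Coverage in basis points (hundredths of a percent), truncated.

    Assumes coverage = hits / (hits + misses + partials).
    """
    lines = hits + misses + partials
    return hits * 10000 // lines


before = coverage_bp(15430, 22318, 1758)  # dev: 39506 lines
after = coverage_bp(15535, 22372, 1766)   # #12552: 39673 lines
print(before / 100, after / 100, (after - before) / 100)
# 39.05 39.15 0.1
```

The same numbers are internally consistent: hits, misses and partials sum to the Lines row on both sides, and the per-row changes (+105, +54, +8) add up to the +167 lines.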


@Tianqi-Dotes (Member) previously approved these changes on Nov 1, 2022:

lgtm

add delete DATA_TRANSFER API
@jieguangzhou (Member Author) commented:

@caishunfeng @Tianqi-Dotes PTAL, thanks.

@caishunfeng (Contributor) previously approved these changes on Nov 2, 2022:

LGTM

@sonarcloud bot commented Nov 3, 2022

Kudos, SonarCloud Quality Gate passed!

Bug: A (0 Bugs)
Vulnerability: A (0 Vulnerabilities)
Security Hotspot: A (0 Security Hotspots)
Code Smell: A (13 Code Smells)

Coverage: 78.6%
Duplication: 2.1%

Labels: backend, document, feature (new feature), UI (ui and front end related)

Development: successfully merging this pull request may close these issues:
[Feature][Task] Transfer files between tasks

6 participants