-
Select the modified local branch and the branch to merge past to create a pull request.
diff --git a/README.md b/README.md
index a2e6c3c2fdd7..a14bce1bc6a7 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
Dolphin Scheduler Official Website
[dolphinscheduler.apache.org](https://dolphinscheduler.apache.org)
-============
+==================================================================
[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
[![codecov](https://codecov.io/gh/apache/dolphinscheduler/branch/dev/graph/badge.svg)](https://codecov.io/gh/apache/dolphinscheduler/branch/dev)
@@ -8,9 +8,6 @@ Dolphin Scheduler Official Website
[![Twitter Follow](https://img.shields.io/twitter/follow/dolphinschedule.svg?style=social&label=Follow)](https://twitter.com/dolphinschedule)
[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://s.apache.org/dolphinscheduler-slack)
-
-
-
[![Stargazers over time](https://starchart.cc/apache/dolphinscheduler.svg)](https://starchart.cc/apache/dolphinscheduler)
[![EN doc](https://img.shields.io/badge/document-English-blue.svg)](README.md)
@@ -21,35 +18,35 @@ Dolphin Scheduler Official Website
DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs available `out of the box`.
Its main objectives are as follows:
- - Highly Reliable,
+- Highly Reliable,
DolphinScheduler adopts a decentralized multi-master and multi-worker architecture design, which naturally supports easy expansion and high availability (not restricted by a single point of bottleneck), and its performance increases linearly with the increase of machines
- - High performance, supporting tens of millions of tasks every day
- - Support multi-tenant.
- - Cloud Native, DolphinScheduler supports multi-cloud/data center workflow management, also
+- High performance, supporting tens of millions of tasks every day
+- Support multi-tenant.
+- Cloud Native, DolphinScheduler supports multi-cloud/data center workflow management, also
supports Kubernetes, Docker deployment and custom task types, distributed
scheduling, with overall scheduling capability increased linearly with the
scale of the cluster
- - Support various task types: Shell, MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Sub_Process, Procedure, etc.
- - Support scheduling of workflows and dependencies, manual scheduling to pause/stop/recover task, support failure task retry/alarm, recover specified nodes from failure, kill task, etc.
- - Associate the tasks according to the dependencies of the tasks in a DAG graph, which can visualize the running state of the task in real-time.
- - WYSIWYG online editing tasks
- - Support the priority of workflows & tasks, task failover, and task timeout alarm or failure.
- - Support workflow global parameters and node customized parameter settings.
- - Support online upload/download/management of resource files, etc. Support online file creation and editing.
- - Support task log online viewing and scrolling and downloading, etc.
- - Support the viewing of Master/Worker CPU load, memory, and CPU usage metrics.
- - Support displaying workflow history in tree/Gantt chart, as well as statistical analysis on the task status & process status in each workflow.
- - Support back-filling data.
- - Support internationalization.
- - More features waiting for partners to explore...
+- Support various task types: Shell, MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Sub_Process, Procedure, etc.
+- Support scheduling of workflows and dependencies, manual scheduling to pause/stop/recover task, support failure task retry/alarm, recover specified nodes from failure, kill task, etc.
+- Associate the tasks according to the dependencies of the tasks in a DAG graph, which can visualize the running state of the task in real-time.
+- WYSIWYG online editing tasks
+- Support the priority of workflows & tasks, task failover, and task timeout alarm or failure.
+- Support workflow global parameters and node customized parameter settings.
+- Support online upload/download/management of resource files, etc. Support online file creation and editing.
+- Support task log online viewing and scrolling and downloading, etc.
+- Support the viewing of Master/Worker CPU load, memory, and CPU usage metrics.
+- Support displaying workflow history in tree/Gantt chart, as well as statistical analysis on the task status & process status in each workflow.
+- Support back-filling data.
+- Support internationalization.
+- More features waiting for partners to explore...
## What's in DolphinScheduler
- Stability | Accessibility | Features | Scalability |
- --------- | ------------- | -------- | ------------|
-Decentralized multi-master and multi-worker | Visualization of workflow key information, such as task status, task type, retry times, task operation machine information, visual variables, and so on at a glance. | Support pause, recover operation | Support customized task types
-support HA | Visualization of all workflow operations, dragging tasks to draw DAGs, configuring data sources and resources. At the same time, for third-party systems, provide API mode operations. | Users on DolphinScheduler can achieve many-to-one or one-to-one mapping relationship through tenants and Hadoop users, which is very important for scheduling large data jobs. | The scheduler supports distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster. Master and Worker support dynamic adjustment.
-Overload processing: By using the task queue mechanism, the number of schedulable tasks on a single machine can be flexibly configured. Machine jam can be avoided with high tolerance to numbers of tasks cached in task queue. | One-click deployment | Support traditional shell tasks, and big data platform task scheduling: MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Procedure, Sub_Process | |
+| Stability | Accessibility | Features | Scalability |
+|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Decentralized multi-master and multi-worker | Visualization of workflow key information, such as task status, task type, retry times, task operation machine information, visual variables, and so on at a glance. | Support pause, recover operation | Support customized task types |
+| support HA | Visualization of all workflow operations, dragging tasks to draw DAGs, configuring data sources and resources. At the same time, for third-party systems, provide API mode operations. | Users on DolphinScheduler can achieve many-to-one or one-to-one mapping relationship through tenants and Hadoop users, which is very important for scheduling large data jobs. | The scheduler supports distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster. Master and Worker support dynamic adjustment. |
+| Overload processing: By using the task queue mechanism, the number of schedulable tasks on a single machine can be flexibly configured. Machine jam can be avoided with high tolerance to numbers of tasks cached in task queue. | One-click deployment | Support traditional shell tasks, and big data platform task scheduling: MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Procedure, Sub_Process | |
## User Interface Screenshots
diff --git a/README_zh_CN.md b/README_zh_CN.md
index c5058eac1524..2226b9edbaf0 100644
--- a/README_zh_CN.md
+++ b/README_zh_CN.md
@@ -1,12 +1,11 @@
Dolphin Scheduler Official Website
[dolphinscheduler.apache.org](https://dolphinscheduler.apache.org)
-============
+==================================================================
[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
[![codecov](https://codecov.io/gh/apache/dolphinscheduler/branch/dev/graph/badge.svg)](https://codecov.io/gh/apache/dolphinscheduler/branch/dev)
[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=apache-dolphinscheduler&metric=alert_status)](https://sonarcloud.io/dashboard?id=apache-dolphinscheduler)
-
[![Stargazers over time](https://starchart.cc/apache/dolphinscheduler.svg)](https://starchart.cc/apache/dolphinscheduler)
[![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](README_zh_CN.md)
@@ -18,20 +17,20 @@ Dolphin Scheduler Official Website
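Its main objectives are as follows:
- - Associates tasks in a DAG according to their dependencies, with real-time visual monitoring of task running status
- - Supports a rich set of task types: Shell, MR, Spark, SQL (mysql, postgresql, hive, sparksql), Python, Sub_Process, Procedure, etc.
- - Supports timed scheduling, dependency scheduling, and manual scheduling of workflows, manual pause/stop/recovery, as well as failure retry/alarm, recovery from failure at a specified node, killing tasks, and other operations
- - Supports workflow priority, task priority, task failover, and task timeout alarm/failure
- - Supports workflow global parameters and node-level custom parameter settings
- - Supports online upload/download and management of resource files, as well as online file creation and editing
- - Supports online viewing and scrolling of task logs, online log download, etc.
- - Implements cluster HA; Master and Worker clusters are decentralized through Zookeeper
- - Supports online viewing of `Master/Worker` cpu load, memory, and cpu
- - Supports tree/Gantt chart display of workflow run history, plus task status statistics and process status statistics
- - Supports back-filling data
- - Supports multi-tenancy
- - Supports internationalization
- - More features waiting for partners to explore
+- Associates tasks in a DAG according to their dependencies, with real-time visual monitoring of task running status
+- Supports a rich set of task types: Shell, MR, Spark, SQL (mysql, postgresql, hive, sparksql), Python, Sub_Process, Procedure, etc.
+- Supports timed scheduling, dependency scheduling, and manual scheduling of workflows, manual pause/stop/recovery, as well as failure retry/alarm, recovery from failure at a specified node, killing tasks, and other operations
+- Supports workflow priority, task priority, task failover, and task timeout alarm/failure
+- Supports workflow global parameters and node-level custom parameter settings
+- Supports online upload/download and management of resource files, as well as online file creation and editing
+- Supports online viewing and scrolling of task logs, online log download, etc.
+- Implements cluster HA; Master and Worker clusters are decentralized through Zookeeper
+- Supports online viewing of `Master/Worker` cpu load, memory, and cpu
+- Supports tree/Gantt chart display of workflow run history, plus task status statistics and process status statistics
+- Supports back-filling data
+- Supports multi-tenancy
+- Supports internationalization
+- More features waiting for partners to explore
## Selected System Screenshots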
diff --git a/deploy/README.md b/deploy/README.md
index c1b8fa543403..925c40530c8b 100644
--- a/deploy/README.md
+++ b/deploy/README.md
@@ -2,3 +2,4 @@
* [Start Up DolphinScheduler with Docker](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/start/docker.html)
* [Start Up DolphinScheduler with Kubernetes](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/installation/kubernetes.html)
+
diff --git a/docs/docs/en/DSIP.md b/docs/docs/en/DSIP.md
index 07d617875e86..69475f0804de 100644
--- a/docs/docs/en/DSIP.md
+++ b/docs/docs/en/DSIP.md
@@ -55,11 +55,11 @@ Here is the template for mail
```text
Hi community,
-
+
-
+
I already add a GitHub Issue for my proposal, which you could see in .
-
+
Looking forward to any feedback on this thread.
```
@@ -89,3 +89,4 @@ closed and transfer from [current DSIPs][current-DSIPs] to [past DSIPs][past-DSI
[github-issue-choose]: https://github.com/apache/dolphinscheduler/issues/new/choose
[mail-to-dev]: mailto:dev@dolphinscheduler.apache.org
[DSIP-1]: https://github.com/apache/dolphinscheduler/issues/6407
+
diff --git a/docs/docs/en/about/features.md b/docs/docs/en/about/features.md
index 75393ce142d4..e45f75d565a1 100644
--- a/docs/docs/en/about/features.md
+++ b/docs/docs/en/about/features.md
@@ -16,4 +16,5 @@
## High Scalability
-- **Scalability**: Supports multitenancy and online resource management. Stable operation of 100,000 data tasks per day is supported.
\ No newline at end of file
+- **Scalability**: Supports multitenancy and online resource management. Stable operation of 100,000 data tasks per day is supported.
+
diff --git a/docs/docs/en/about/glossary.md b/docs/docs/en/about/glossary.md
index f8ad9355bcb4..dc3df7bb5c9c 100644
--- a/docs/docs/en/about/glossary.md
+++ b/docs/docs/en/about/glossary.md
@@ -71,4 +71,3 @@ process fails and ends
From the perspective of scheduling, this article preliminarily introduces the architecture principles and implementation
ideas of the big data distributed workflow scheduling system-DolphinScheduler. To be continued
-
diff --git a/docs/docs/en/about/hardware.md b/docs/docs/en/about/hardware.md
index f67066e8c9cf..b10a0b688096 100644
--- a/docs/docs/en/about/hardware.md
+++ b/docs/docs/en/about/hardware.md
@@ -6,15 +6,15 @@ This section briefs about the hardware requirements for DolphinScheduler. Dolphi
The Linux operating systems specified below can run on physical servers and mainstream virtualization environments such as VMware, KVM, and XEN.
-| Operating System | Version |
-| :----------------------- | :----------: |
-| Red Hat Enterprise Linux | 7.0 and above |
-| CentOS | 7.0 and above |
-| Oracle Enterprise Linux | 7.0 and above |
+| Operating System | Version |
+|:-------------------------|:---------------:|
+| Red Hat Enterprise Linux | 7.0 and above |
+| CentOS | 7.0 and above |
+| Oracle Enterprise Linux | 7.0 and above |
| Ubuntu LTS | 16.04 and above |
> **Note:**
->The above Linux operating systems can run on physical servers and mainstream virtualization environments such as VMware, KVM, and XEN.
+> The above Linux operating systems can run on physical servers and mainstream virtualization environments such as VMware, KVM, and XEN.
## Server Configuration
@@ -23,8 +23,8 @@ DolphinScheduler supports 64-bit hardware platforms with Intel x86-64 architectu
### Production Environment
| **CPU** | **MEM** | **HD** | **NIC** | **Num** |
-| --- | --- | --- | --- | --- |
-| 4 core+ | 8 GB+ | SAS | GbE | 1+ |
+|---------|---------|--------|---------|---------|
+| 4 core+ | 8 GB+ | SAS | GbE | 1+ |
> **Note:**
> - The above recommended configuration is the minimum configuration for deploying DolphinScheduler. Higher configuration is strongly recommended for production environments.
@@ -34,11 +34,11 @@ DolphinScheduler supports 64-bit hardware platforms with Intel x86-64 architectu
DolphinScheduler provides the following network port configurations for normal operation:
-| Server | Port | Desc |
-| --- | --- | --- |
-| MasterServer | 5678 | not the communication port, require the native ports do not conflict |
-| WorkerServer | 1234 | not the communication port, require the native ports do not conflict |
-| ApiApplicationServer | 12345 | backend communication port |
+| Server | Port | Desc |
+|----------------------|-------|----------------------------------------------------------------------|
+| MasterServer | 5678 | not the communication port, require the native ports do not conflict |
+| WorkerServer | 1234 | not the communication port, require the native ports do not conflict |
+| ApiApplicationServer | 12345 | backend communication port |
> **Note:**
> - MasterServer and WorkerServer do not need to enable communication between the networks, as long as the local ports do not conflict.
@@ -46,4 +46,4 @@ DolphinScheduler provides the following network port configurations for normal o
## Browser Requirements
-The minimum supported version of Google Chrome is version 85, but version 90 or above is recommended.
\ No newline at end of file
+The minimum supported version of Google Chrome is version 85, but version 90 or above is recommended.
diff --git a/docs/docs/en/about/introduction.md b/docs/docs/en/about/introduction.md
index 059401a4ac6c..4bc7ee49af0a 100644
--- a/docs/docs/en/about/introduction.md
+++ b/docs/docs/en/about/introduction.md
@@ -4,4 +4,4 @@ Apache DolphinScheduler provides a distributed and easy to expand visual workflo
Apache DolphinScheduler aims to solve complex big data task dependencies and trigger relationships in data OPS orchestration for various big data applications. It untangles the intricate dependencies of data R&D ETL and addresses the inability to monitor the health status of tasks. DolphinScheduler assembles tasks in the Directed Acyclic Graph (DAG) streaming mode, which allows the execution status of tasks to be monitored in time, and supports operations such as retry, recovery from failure at specified nodes, pause, resume, and kill tasks.
-![Apache DolphinScheduler](../../../img/introduction_ui.png)
\ No newline at end of file
+![Apache DolphinScheduler](../../../img/introduction_ui.png)
diff --git a/docs/docs/en/architecture/cache.md b/docs/docs/en/architecture/cache.md
index 3885dddd2447..6084a5cc6569 100644
--- a/docs/docs/en/architecture/cache.md
+++ b/docs/docs/en/architecture/cache.md
@@ -39,4 +39,4 @@ Note: the final strategy for cache update comes from the expiration strategy con
The sequence diagram is shown below:
-
\ No newline at end of file
+
diff --git a/docs/docs/en/architecture/configuration.md b/docs/docs/en/architecture/configuration.md
index cfa853c8960b..279411ef75b2 100644
--- a/docs/docs/en/architecture/configuration.md
+++ b/docs/docs/en/architecture/configuration.md
@@ -101,8 +101,6 @@ The directory structure of DolphinScheduler is as follows:
## Configurations in Details
-
-
### dolphinscheduler-daemon.sh [startup or shutdown DolphinScheduler application]
dolphinscheduler-daemon.sh is responsible for DolphinScheduler startup and shutdown.
@@ -110,6 +108,7 @@ Essentially, start-all.sh or stop-all.sh startup and shutdown the cluster via do
Currently, DolphinScheduler ships only a basic configuration; remember to configure further JVM options based on your available resources.
Default simplified parameters are:
+
```bash
export DOLPHINSCHEDULER_OPTS="
-server
@@ -157,8 +156,8 @@ The default configuration is as follows:
Note that DolphinScheduler also supports database configuration through `bin/env/dolphinscheduler_env.sh`.
-
### Zookeeper related configuration
+
DolphinScheduler uses Zookeeper for cluster management, fault tolerance, event monitoring and other functions. Configuration file location:
|Service| Configuration file |
|--|--|
@@ -226,8 +225,8 @@ The default configuration is as follows:
|alert.rpc.port | 50052 | the RPC port of Alert Server|
|zeppelin.rest.url | http://localhost:8080 | the RESTful API url of zeppelin|
-
### Api-server related configuration
+
Location: `api-server/conf/application.yaml`
|Parameters | Default value| Description|
@@ -257,6 +256,7 @@ Location: `api-server/conf/application.yaml`
|traffic.control.customize-tenant-qps-rate||customize tenant max request number per second|
### Master Server related configuration
+
Location: `master-server/conf/application.yaml`
|Parameters | Default value| Description|
@@ -278,8 +278,8 @@ Location: `master-server/conf/application.yaml`
|master.registry-disconnect-strategy.strategy|stop|Used when the master disconnects from the registry, default value: stop. Optional values include stop, waiting|
|master.registry-disconnect-strategy.max-waiting-time|100s|Used when the master disconnects from the registry and the disconnect strategy is waiting. The master waits up to this long to reconnect to the registry; if it still cannot connect after that time, it stops itself. If the value is 0s, the master waits indefinitely|
-
### Worker Server related configuration
+
Location: `worker-server/conf/application.yaml`
|Parameters | Default value| Description|
@@ -298,6 +298,7 @@ Location: `worker-server/conf/application.yaml`
|worker.registry-disconnect-strategy.max-waiting-time|100s|Used when the worker disconnects from the registry and the disconnect strategy is waiting. The worker waits up to this long to reconnect to the registry; if it still cannot connect after that time, it stops itself. If the value is 0s, the worker waits indefinitely |
### Alert Server related configuration
+
Location: `alert-server/conf/application.yaml`
|Parameters | Default value| Description|
@@ -305,7 +306,6 @@ Location: `alert-server/conf/application.yaml`
|server.port|50053|the port of Alert Server|
|alert.port|50052|the port of alert|
-
### Quartz related configuration
This part describes the Quartz configs; configure them based on your practical situation and resources.
@@ -335,7 +335,6 @@ The default configuration is as follows:
|spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate|
|spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000|
-
### dolphinscheduler_env.sh [load environment variables configs]
When using shell to commit tasks, DolphinScheduler will export environment variables from `bin/env/dolphinscheduler_env.sh`. The
diff --git a/docs/docs/en/architecture/design.md b/docs/docs/en/architecture/design.md
index 9e09e15948c0..9579ab36517f 100644
--- a/docs/docs/en/architecture/design.md
+++ b/docs/docs/en/architecture/design.md
@@ -22,58 +22,58 @@
### Architecture Description
-* **MasterServer**
+* **MasterServer**
- MasterServer adopts a distributed and decentralized design concept. MasterServer is mainly responsible for DAG task segmentation, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer at the same time.
- When the MasterServer service starts, register a temporary node with ZooKeeper, and perform fault tolerance by monitoring changes in the temporary node of ZooKeeper.
- MasterServer provides monitoring services based on netty.
+ MasterServer adopts a distributed and decentralized design concept. MasterServer is mainly responsible for DAG task segmentation, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer at the same time.
+ When the MasterServer service starts, register a temporary node with ZooKeeper, and perform fault tolerance by monitoring changes in the temporary node of ZooKeeper.
+ MasterServer provides monitoring services based on netty.
- #### The Service Mainly Includes:
-
- - **DistributedQuartz** distributed scheduling component, which is mainly responsible for the start and stop operations of scheduled tasks. When quartz start the task, there will be a thread pool inside the Master responsible for the follow-up operation of the processing task;
+ #### The Service Mainly Includes:
- - **MasterSchedulerService** is a scanning thread that regularly scans the `t_ds_command` table in the database, runs different business operations according to different **command types**;
+ - **DistributedQuartz** distributed scheduling component, which is mainly responsible for the start and stop operations of scheduled tasks. When quartz start the task, there will be a thread pool inside the Master responsible for the follow-up operation of the processing task;
- - **WorkflowExecuteRunnable** is mainly responsible for DAG task segmentation, task submission monitoring, and logical processing of different event types;
+ - **MasterSchedulerService** is a scanning thread that regularly scans the `t_ds_command` table in the database, runs different business operations according to different **command types**;
- - **TaskExecuteRunnable** is mainly responsible for the processing and persistence of tasks, and generates task events and submits them to the event queue of the process instance;
+ - **WorkflowExecuteRunnable** is mainly responsible for DAG task segmentation, task submission monitoring, and logical processing of different event types;
- - **EventExecuteService** is mainly responsible for the polling of the event queue of the process instances;
+ - **TaskExecuteRunnable** is mainly responsible for the processing and persistence of tasks, and generates task events and submits them to the event queue of the process instance;
- - **StateWheelExecuteThread** is mainly responsible for process instance and task timeout, task retry, task-dependent polling, and generates the corresponding process instance or task event and submits it to the event queue of the process instance;
+ - **EventExecuteService** is mainly responsible for the polling of the event queue of the process instances;
- - **FailoverExecuteThread** is mainly responsible for the logic of Master fault tolerance and Worker fault tolerance;
+ - **StateWheelExecuteThread** is mainly responsible for process instance and task timeout, task retry, task-dependent polling, and generates the corresponding process instance or task event and submits it to the event queue of the process instance;
-* **WorkerServer**
+ - **FailoverExecuteThread** is mainly responsible for the logic of Master fault tolerance and Worker fault tolerance;
- WorkerServer also adopts a distributed and decentralized design concept. WorkerServer is mainly responsible for task execution and providing log services.
+* **WorkerServer**
- When the WorkerServer service starts, register a temporary node with ZooKeeper and maintain a heartbeat.
- WorkerServer provides monitoring services based on netty.
-
- #### The Service Mainly Includes:
+ WorkerServer also adopts a distributed and decentralized design concept. WorkerServer is mainly responsible for task execution and providing log services.
- - **WorkerManagerThread** is mainly responsible for the submission of the task queue, continuously receives tasks from the task queue, and submits them to the thread pool for processing;
+ When the WorkerServer service starts, register a temporary node with ZooKeeper and maintain a heartbeat.
+ WorkerServer provides monitoring services based on netty.
- - **TaskExecuteThread** is mainly responsible for the process of task execution, and the actual processing of tasks according to different task types;
+ #### The Service Mainly Includes:
- - **RetryReportTaskStatusThread** is mainly responsible for regularly polling to report the task status to the Master until the Master replies to the status ack to avoid the loss of the task status;
+ - **WorkerManagerThread** is mainly responsible for the submission of the task queue, continuously receives tasks from the task queue, and submits them to the thread pool for processing;
-* **ZooKeeper**
+ - **TaskExecuteThread** is mainly responsible for the process of task execution, and the actual processing of tasks according to different task types;
- ZooKeeper service, MasterServer and WorkerServer nodes in the system all use ZooKeeper for cluster management and fault tolerance. In addition, the system implements event monitoring and distributed locks based on ZooKeeper.
+ - **RetryReportTaskStatusThread** is mainly responsible for regularly polling to report the task status to the Master until the Master replies to the status ack to avoid the loss of the task status;
- We have also implemented queues based on Redis, but we hope DolphinScheduler depends on as few components as possible, so we finally removed the Redis implementation.
+* **ZooKeeper**
-* **AlertServer**
+ ZooKeeper service, MasterServer and WorkerServer nodes in the system all use ZooKeeper for cluster management and fault tolerance. In addition, the system implements event monitoring and distributed locks based on ZooKeeper.
+
+ We have also implemented queues based on Redis, but we hope DolphinScheduler depends on as few components as possible, so we finally removed the Redis implementation.
+
+* **AlertServer**
Provides alarm services, and implements rich alarm methods through alarm plugins.
-* **API**
+* **API**
- The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service uniformly provides RESTful APIs to provide request services to external.
+ The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service uniformly provides RESTful APIs to provide request services to external.
-* **UI**
+* **UI**
The front-end page of the system provides various visual operation interfaces of the system, see more at [Introduction to Functions](../guide/homepage.md) section.
@@ -84,6 +84,7 @@
##### Centralized Thinking
The centralized design concept is relatively simple. The nodes in the distributed cluster are roughly divided into two roles according to responsibilities:
+
@@ -120,8 +121,6 @@ The service fault-tolerance design relies on ZooKeeper's Watcher mechanism, and
Among them, the Master monitors the directories of other Masters and Workers. If the remove event is triggered, perform fault tolerance of the process instance or task instance according to the specific business logic.
-
-
- Master fault tolerance:
@@ -146,7 +145,7 @@ Fault-tolerant content: When sending the remove event of the Worker node, the Ma
Fault-tolerant post-processing: Once the Master Scheduler thread finds that the task instance is in the "fault-tolerant" state, it takes over the task and resubmits it.
- Note: Due to "network jitter", the node may lose heartbeat with ZooKeeper in a short period of time, and the node's remove event may occur. For this situation, we use the simplest way, that is, once the node and ZooKeeper timeout connection occurs, then directly stop the Master or Worker service.
+Note: Due to "network jitter", the node may lose heartbeat with ZooKeeper in a short period of time, and the node's remove event may occur. For this situation, we use the simplest way, that is, once the node and ZooKeeper timeout connection occurs, then directly stop the Master or Worker service.
##### Task Failed and Try Again
@@ -170,26 +169,26 @@ If there is a task failure in the workflow that reaches the maximum retry times,
In the early scheduling design, there was no priority design and fair scheduling was used, so a task submitted first could complete at the same time as a task submitted later, which made process or task priority ineffective. We therefore redesigned this, and the following is our current design:
-- According to **the priority of different process instances** prior over **priority of the same process instance** prior over **priority of tasks within the same process** prior over **tasks within the same process**, process task submission order from highest to Lowest.
- - The specific implementation is to parse the priority according to the JSON of the task instance, and then save the **process instance priority_process instance id_task priority_task id** information to the ZooKeeper task queue. When obtain from the task queue, we can get the highest priority task by comparing string.
+- According to **the priority of different process instances** prior over **priority of the same process instance** prior over **priority of tasks within the same process** prior over **tasks within the same process**, process task submission order from highest to Lowest.
+ - The specific implementation is to parse the priority according to the JSON of the task instance, and then save the **process instance priority_process instance id_task priority_task id** information to the ZooKeeper task queue. When obtain from the task queue, we can get the highest priority task by comparing string.
+ - The priority of the process definition is to consider that some processes need to process before other processes. Configure the priority when the process starts or schedules. There are 5 levels in total, which are HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below
- - The priority of the process definition is to consider that some processes need to process before other processes. Configure the priority when the process starts or schedules. There are 5 levels in total, which are HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below
-
-
-
+
+
+
- - The priority of the task is also divides into 5 levels, ordered by HIGHEST, HIGH, MEDIUM, LOW, LOWEST. As shown below:
-
-
-
+ - The priority of the task is also divides into 5 levels, ordered by HIGHEST, HIGH, MEDIUM, LOW, LOWEST. As shown below:
-#### Logback and Netty Implement Log Access
+
+
+
-- Since Web (UI) and Worker are not always on the same machine, to view the log cannot be like querying a local file. There are two options:
- - Put logs on the ES search engine.
- - Obtain remote log information through netty communication.
+#### Logback and Netty Implement Log Access
-- In consideration of the lightness of DolphinScheduler as much as possible, so choose gRPC to achieve remote access to log information.
+- Since the Web (UI) and the Worker are not always on the same machine, viewing a log is not as simple as querying a local file. There are two options:
+- Put logs on the ES search engine.
+- Obtain remote log information through Netty communication.
+- To keep DolphinScheduler as lightweight as possible, gRPC was chosen for remote access to log information.
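The queue key described in the task priority notes above can be sketched in Python as follows. This is a minimal illustration, not DolphinScheduler's implementation: the numeric values assigned to the five levels and the helper names `queue_key` and `next_task` are assumptions made for the example.

```python
# Assumed mapping: a smaller number means a higher priority.
# The text above names the five levels but not their stored values.
PRIORITY = {"HIGHEST": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3, "LOWEST": 4}


def queue_key(process_priority, process_instance_id, task_priority, task_id):
    """Build 'process instance priority_process instance id_task priority_task id'."""
    return "_".join([
        str(PRIORITY[process_priority]),
        str(process_instance_id),
        str(PRIORITY[task_priority]),
        str(task_id),
    ])


def next_task(keys):
    """Pick the key that sorts first under plain string comparison."""
    return min(keys)


# Hypothetical queue contents, for illustration only.
keys = [
    queue_key("MEDIUM", 11, "HIGH", 101),
    queue_key("HIGH", 12, "LOW", 102),
    queue_key("HIGH", 12, "HIGHEST", 103),
]
print(next_task(keys))
```

Because the comparison is on strings, ids with different digit counts compare lexicographically (`"10"` sorts before `"9"`), so a real queue would likely need fixed-width or otherwise comparable ids.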
diff --git a/docs/docs/en/architecture/load-balance.md b/docs/docs/en/architecture/load-balance.md
index 5cf2d4e34ad3..f58f864128e1 100644
--- a/docs/docs/en/architecture/load-balance.md
+++ b/docs/docs/en/architecture/load-balance.md
@@ -57,3 +57,4 @@ You can customise the configuration by changing the following properties in work
- worker.max.cpuload.avg=-1 (worker max cpuload avg, only higher than the system cpu load average, worker server can be dispatched tasks. default value -1: the number of cpu cores * 2)
- worker.reserved.memory=0.3 (worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G)
+
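The two worker properties above can be read as a dispatch check, sketched here in Python. The function name `can_dispatch` and its arguments are hypothetical; only the two property meanings and their defaults come from the text, and the actual worker code may differ.

```python
def can_dispatch(system_load_avg, available_memory_gb, cpu_cores,
                 max_cpuload_avg=-1, reserved_memory_gb=0.3):
    """Return True if a worker may be dispatched tasks.

    max_cpuload_avg mirrors worker.max.cpuload.avg (-1 means cpu cores * 2).
    reserved_memory_gb mirrors worker.reserved.memory (unit: G).
    """
    if max_cpuload_avg == -1:
        max_cpuload_avg = cpu_cores * 2
    # The limit must be higher than the current system load average.
    load_ok = max_cpuload_avg > system_load_avg
    # The reserved memory must be lower than the available system memory.
    memory_ok = reserved_memory_gb < available_memory_gb
    return load_ok and memory_ok


# Hypothetical readings from a 4-core worker.
print(can_dispatch(system_load_avg=3.0, available_memory_gb=1.0, cpu_cores=4))
```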
diff --git a/docs/docs/en/architecture/metadata.md b/docs/docs/en/architecture/metadata.md
index 2e55e1d9258f..b4633707f532 100644
--- a/docs/docs/en/architecture/metadata.md
+++ b/docs/docs/en/architecture/metadata.md
@@ -1,8 +1,8 @@
# MetaData
## Table Schema
-see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
+See the SQL files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
---
@@ -15,7 +15,7 @@ see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
- One tenant can own Multiple users.
- The queue field in the `t_ds_user` table stores the `queue_name` information in the `t_ds_queue` table, `t_ds_tenant` stores queue information using `queue_id` column. During the execution of the process definition, the user queue has the highest priority. If the user queue is null, use the tenant queue.
- The `user_id` field in the `t_ds_datasource` table shows the user who create the data source. The user_id in `t_ds_relation_datasource_user` shows the user who has permission to the data source.
-
+
### Project Resource Alert
![image.png](../../../img/metadata-erd/project-resource-alert.png)
@@ -26,6 +26,7 @@ see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
- The `user_id` in the `t_ds_udfs` table represents the user who create the UDF, and the `user_id` in the `t_ds_relation_udfs_user` table represents a user who has permission to the UDF.
### Project - Tenant - ProcessDefinition - Schedule
+
![image.png](../../../img/metadata-erd/project_tenant_process_definition_schedule.png)
- A project can have multiple process definitions, and each process definition belongs to only one project.
@@ -33,8 +34,10 @@ see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql`
- A workflow definition can have one or more schedules.
### Process Definition Execution
+
![image.png](../../../img/metadata-erd/process_definition.png)
- A process definition corresponds to multiple task definitions, which are associated through `t_ds_process_task_relation` and the associated key is `code + version`. When the pre-task of the task is empty, the corresponding `pre_task_node` and `pre_task_version` are 0.
- A process definition can have multiple process instances `t_ds_process_instance`, one process instance corresponds to one or more task instances `t_ds_task_instance`.
-- The data stored in the `t_ds_relation_process_instance` table is used to handle the case that the process definition contains sub-processes. `parent_process_instance_id` represents the id of the main process instance containing the sub-process, `process_instance_id` represents the id of the sub-process instance, `parent_task_instance_id` represents the task instance id of the sub-process node. The process instance table and the task instance table correspond to the `t_ds_process_instance` table and the `t_ds_task_instance` table, respectively.
\ No newline at end of file
+- The data stored in the `t_ds_relation_process_instance` table is used to handle the case that the process definition contains sub-processes. `parent_process_instance_id` represents the id of the main process instance containing the sub-process, `process_instance_id` represents the id of the sub-process instance, `parent_task_instance_id` represents the task instance id of the sub-process node. The process instance table and the task instance table correspond to the `t_ds_process_instance` table and the `t_ds_task_instance` table, respectively.
+
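The queue rule stated for `t_ds_user` and `t_ds_tenant` (the user queue wins, and the tenant queue is used when the user queue is null) can be sketched as below. The function name `resolve_queue` and the sample queue names are hypothetical; this only restates the rule, not the project's code.

```python
def resolve_queue(user_queue, tenant_queue):
    """Return the queue to run under: the user queue if set, else the tenant queue."""
    if user_queue is not None:
        return user_queue
    return tenant_queue


# Hypothetical values standing in for t_ds_user.queue and the tenant's queue.
print(resolve_queue("etl_user_queue", "default"))
print(resolve_queue(None, "default"))
```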
diff --git a/docs/docs/en/architecture/task-structure.md b/docs/docs/en/architecture/task-structure.md
index bd6482401695..73042e3dccbf 100644
--- a/docs/docs/en/architecture/task-structure.md
+++ b/docs/docs/en/architecture/task-structure.md
@@ -6,28 +6,28 @@ All tasks in DolphinScheduler are saved in the `t_ds_process_definition` table.
The following shows the `t_ds_process_definition` table structure:
-No. | field | type | description
--------- | ---------| -------- | ---------
-1|id|int(11)|primary key
-2|name|varchar(255)|process definition name
-3|version|int(11)|process definition version
-4|release_state|tinyint(4)|release status of process definition: 0 not released, 1 released
-5|project_id|int(11)|project id
-6|user_id|int(11)|user id of the process definition
-7|process_definition_json|longtext|process definition JSON
-8|description|text|process definition description
-9|global_params|text|global parameters
-10|flag|tinyint(4)|specify whether the process is available: 0 is not available, 1 is available
-11|locations|text|node location information
-12|connects|text|node connectivity info
-13|receivers|text|receivers
-14|receivers_cc|text|CC receivers
-15|create_time|datetime|create time
-16|timeout|int(11) |timeout
-17|tenant_id|int(11) |tenant id
-18|update_time|datetime|update time
-19|modify_by|varchar(36)|specify the user that made the modification
-20|resource_ids|varchar(255)|resource ids
+| No. | field | type | description |
+|-----|-------------------------|--------------|------------------------------------------------------------------------------|
+| 1 | id | int(11) | primary key |
+| 2 | name | varchar(255) | process definition name |
+| 3 | version | int(11) | process definition version |
+| 4 | release_state | tinyint(4) | release status of process definition: 0 not released, 1 released |
+| 5 | project_id | int(11) | project id |
+| 6 | user_id | int(11) | user id of the process definition |
+| 7 | process_definition_json | longtext | process definition JSON |
+| 8 | description | text | process definition description |
+| 9 | global_params | text | global parameters |
+| 10 | flag | tinyint(4) | specify whether the process is available: 0 is not available, 1 is available |
+| 11 | locations | text | node location information |
+| 12 | connects | text | node connectivity info |
+| 13 | receivers | text | receivers |
+| 14 | receivers_cc | text | CC receivers |
+| 15 | create_time | datetime | create time |
+| 16 | timeout | int(11) | timeout |
+| 17 | tenant_id | int(11) | tenant id |
+| 18 | update_time | datetime | update time |
+| 19 | modify_by | varchar(36) | specify the user that made the modification |
+| 20 | resource_ids | varchar(255) | resource ids |
The `process_definition_json` field is the core field, which defines the task information in the DAG diagram, and it is stored in JSON format.
@@ -40,6 +40,7 @@ No. | field | type | description
4|timeout|int|timeout
Data example:
+
```bash
{
"globalParams":[
@@ -74,7 +75,7 @@ No.|parameter name||type|description |notes
9|runFlag | |String |execution flag| |
10|conditionResult | |Object|condition branch | |
11| | successNode| Array|jump to node if success| |
-12| | failedNode|Array|jump to node if failure|
+12| | failedNode|Array|jump to node if failure|
13| dependence| |Object |task dependency |mutual exclusion with params
14|maxRetryTimes | |String|max retry times | |
15|retryInterval | |String |retry interval| |
@@ -159,7 +160,7 @@ No.|parameter name||type|description |note
19|runFlag | |String |execution flag| |
20|conditionResult | |Object|condition branch | |
21| | successNode| Array|jump to node if success| |
-22| | failedNode|Array|jump to node if failure|
+22| | failedNode|Array|jump to node if failure|
23| dependence| |Object |task dependency |mutual exclusion with params
24|maxRetryTimes | |String|max retry times | |
25|retryInterval | |String |retry interval| |
@@ -238,38 +239,38 @@ No.|parameter name||type|description |note
**The following shows the node data structure:**
-No.|parameter name||type|description |notes
--------- | ---------| ---------| -------- | --------- | ---------
-1|id | |String| task Id|
-2|type ||String |task type |SPARK
-3| name| |String|task name |
-4| params| |Object|customized parameters |JSON format
-5| |mainClass |String | main class
-6| |mainArgs | String| execution arguments
-7| |others | String| other arguments
-8| |mainJar |Object | application jar package
-9| |deployMode |String |deployment mode |local,client,cluster
-10| |driverCores | String| driver cores
-11| |driverMemory | String| driver memory
-12| |numExecutors |String | executor count
-13| |executorMemory |String | executor memory
-14| |executorCores |String | executor cores
-15| |programType | String| program type|JAVA,SCALA,PYTHON
-16| | sparkVersion| String| Spark version| SPARK1 , SPARK2
-17| | localParams| Array|customized local parameters
-18| | resourceList| Array|resource files
-19|description | |String|description | |
-20|runFlag | |String |execution flag| |
-21|conditionResult | |Object|condition branch| |
-22| | successNode| Array|jump to node if success| |
-23| | failedNode|Array|jump to node if failure|
-24| dependence| |Object |task dependency |mutual exclusion with params
-25|maxRetryTimes | |String|max retry times | |
-26|retryInterval | |String |retry interval| |
-27|timeout | |Object|timeout | |
-28| taskInstancePriority| |String|task priority | |
-29|workerGroup | |String |Worker group| |
-30|preTasks | |Array|preposition tasks| |
+| No. | parameter name || type | description | notes |
+|-----|----------------------|----------------|--------|-----------------------------|------------------------------|
+| 1 | id | | String | task Id |
+| 2 | type || String | task type | SPARK |
+| 3 | name | | String | task name |
+| 4 | params | | Object | customized parameters | JSON format |
+| 5 | | mainClass | String | main class |
+| 6 | | mainArgs | String | execution arguments |
+| 7 | | others | String | other arguments |
+| 8 | | mainJar | Object | application jar package |
+| 9 | | deployMode | String | deployment mode | local,client,cluster |
+| 10 | | driverCores | String | driver cores |
+| 11 | | driverMemory | String | driver memory |
+| 12 | | numExecutors | String | executor count |
+| 13 | | executorMemory | String | executor memory |
+| 14 | | executorCores | String | executor cores |
+| 15 | | programType | String | program type | JAVA,SCALA,PYTHON |
+| 16 | | sparkVersion | String | Spark version | SPARK1 , SPARK2 |
+| 17 | | localParams | Array | customized local parameters |
+| 18 | | resourceList | Array | resource files |
+| 19 | description | | String | description | |
+| 20 | runFlag | | String | execution flag | |
+| 21 | conditionResult | | Object | condition branch | |
+| 22 | | successNode | Array | jump to node if success | |
+| 23 | | failedNode | Array | jump to node if failure |
+| 24 | dependence | | Object | task dependency | mutual exclusion with params |
+| 25 | maxRetryTimes | | String | max retry times | |
+| 26 | retryInterval | | String | retry interval | |
+| 27 | timeout | | Object | timeout | |
+| 28 | taskInstancePriority | | String | task priority | |
+| 29 | workerGroup | | String | Worker group | |
+| 30 | preTasks | | Array | preposition tasks | |
**Node data example:**
@@ -336,31 +337,31 @@ No.|parameter name||type|description |notes
**The following shows the node data structure:**
-No.|parameter name||type|description |notes
--------- | ---------| ---------| -------- | --------- | ---------
-1|id | |String| task Id|
-2|type ||String |task type |MR
-3| name| |String|task name |
-4| params| |Object|customized parameters |JSON format
-5| |mainClass |String | main class
-6| |mainArgs | String|execution arguments
-7| |others | String|other arguments
-8| |mainJar |Object | application jar package
-9| |programType | String|program type|JAVA,PYTHON
-10| | localParams| Array|customized local parameters
-11| | resourceList| Array|resource files
-12|description | |String|description | |
-13|runFlag | |String |execution flag| |
-14|conditionResult | |Object|condition branch| |
-15| | successNode| Array|jump to node if success| |
-16| | failedNode|Array|jump to node if failure|
-17| dependence| |Object |task dependency |mutual exclusion with params
-18|maxRetryTimes | |String|max retry times | |
-19|retryInterval | |String |retry interval| |
-20|timeout | |Object|timeout | |
-21| taskInstancePriority| |String|task priority| |
-22|workerGroup | |String |Worker group| |
-23|preTasks | |Array|preposition tasks| |
+| No. | parameter name || type | description | notes |
+|-----|----------------------|--------------|--------|-----------------------------|------------------------------|
+| 1 | id | | String | task Id |
+| 2 | type || String | task type | MR |
+| 3 | name | | String | task name |
+| 4 | params | | Object | customized parameters | JSON format |
+| 5 | | mainClass | String | main class |
+| 6 | | mainArgs | String | execution arguments |
+| 7 | | others | String | other arguments |
+| 8 | | mainJar | Object | application jar package |
+| 9 | | programType | String | program type | JAVA,PYTHON |
+| 10 | | localParams | Array | customized local parameters |
+| 11 | | resourceList | Array | resource files |
+| 12 | description | | String | description | |
+| 13 | runFlag | | String | execution flag | |
+| 14 | conditionResult | | Object | condition branch | |
+| 15 | | successNode | Array | jump to node if success | |
+| 16 | | failedNode | Array | jump to node if failure |
+| 17 | dependence | | Object | task dependency | mutual exclusion with params |
+| 18 | maxRetryTimes | | String | max retry times | |
+| 19 | retryInterval | | String | retry interval | |
+| 20 | timeout | | Object | timeout | |
+| 21 | taskInstancePriority | | String | task priority | |
+| 22 | workerGroup | | String | Worker group | |
+| 23 | preTasks | | Array | preposition tasks | |
**Node data example:**
@@ -432,7 +433,7 @@ No.|parameter name||type|description |notes
9|runFlag | |String |execution flag| |
10|conditionResult | |Object|condition branch| |
11| | successNode| Array|jump to node if success| |
-12| | failedNode|Array|jump to node if failure |
+12| | failedNode|Array|jump to node if failure |
13| dependence| |Object |task dependency |mutual exclusion with params
14|maxRetryTimes | |String|max retry times | |
15|retryInterval | |String |retry interval| |
@@ -493,36 +494,36 @@ No.|parameter name||type|description |notes
**The following shows the node data structure:**
-No.|parameter name||type|description |notes
--------- | ---------| ---------| -------- | --------- | ---------
-1|id | |String|task Id|
-2|type ||String |task type|FLINK
-3| name| |String|task name|
-4| params| |Object|customized parameters |JSON format
-5| |mainClass |String |main class
-6| |mainArgs | String|execution arguments
-7| |others | String|other arguments
-8| |mainJar |Object |application jar package
-9| |deployMode |String |deployment mode |local,client,cluster
-10| |slot | String| slot count
-11| |taskManager |String | taskManager count
-12| |taskManagerMemory |String |taskManager memory size
-13| |jobManagerMemory |String | jobManager memory size
-14| |programType | String| program type|JAVA,SCALA,PYTHON
-15| | localParams| Array|local parameters
-16| | resourceList| Array|resource files
-17|description | |String|description | |
-18|runFlag | |String |execution flag| |
-19|conditionResult | |Object|condition branch| |
-20| | successNode| Array|jump node if success| |
-21| | failedNode|Array|jump node if failure|
-22| dependence| |Object |task dependency |mutual exclusion with params
-23|maxRetryTimes | |String|max retry times| |
-24|retryInterval | |String |retry interval| |
-25|timeout | |Object|timeout | |
-26| taskInstancePriority| |String|task priority| |
-27|workerGroup | |String |Worker group| |
-38|preTasks | |Array|preposition tasks| |
+| No. | parameter name || type | description | notes |
+|-----|----------------------|-------------------|--------|-------------------------|------------------------------|
+| 1 | id | | String | task Id |
+| 2 | type || String | task type | FLINK |
+| 3 | name | | String | task name |
+| 4 | params | | Object | customized parameters | JSON format |
+| 5 | | mainClass | String | main class |
+| 6 | | mainArgs | String | execution arguments |
+| 7 | | others | String | other arguments |
+| 8 | | mainJar | Object | application jar package |
+| 9 | | deployMode | String | deployment mode | local,client,cluster |
+| 10 | | slot | String | slot count |
+| 11 | | taskManager | String | taskManager count |
+| 12 | | taskManagerMemory | String | taskManager memory size |
+| 13 | | jobManagerMemory | String | jobManager memory size |
+| 14 | | programType | String | program type | JAVA,SCALA,PYTHON |
+| 15 | | localParams | Array | local parameters |
+| 16 | | resourceList | Array | resource files |
+| 17 | description | | String | description | |
+| 18 | runFlag | | String | execution flag | |
+| 19 | conditionResult | | Object | condition branch | |
+| 20 | | successNode | Array | jump to node if success | |
+| 21 | | failedNode | Array | jump to node if failure |
+| 22 | dependence | | Object | task dependency | mutual exclusion with params |
+| 23 | maxRetryTimes | | String | max retry times | |
+| 24 | retryInterval | | String | retry interval | |
+| 25 | timeout | | Object | timeout | |
+| 26 | taskInstancePriority | | String | task priority | |
+| 27 | workerGroup | | String | Worker group | |
+| 28 | preTasks | | Array | preposition tasks | |
**Node data example:**
@@ -588,30 +589,30 @@ No.|parameter name||type|description |notes
**The following shows the node data structure:**
-No.|parameter name||type|description |notes
--------- | ---------| ---------| -------- | --------- | ---------
-1|id | |String|task Id|
-2|type ||String |task type|HTTP
-3| name| |String|task name|
-4| params| |Object|customized parameters |JSON format
-5| |url |String |request url
-6| |httpMethod | String|http method|GET,POST,HEAD,PUT,DELETE
-7| | httpParams| Array|http parameters
-8| |httpCheckCondition | String|validation of HTTP code status|default code 200
-9| |condition |String |validation conditions
-10| | localParams| Array|customized local parameters
-11|description | |String|description| |
-12|runFlag | |String |execution flag| |
-13|conditionResult | |Object|condition branch| |
-14| | successNode| Array|jump node if success| |
-15| | failedNode|Array|jump node if failure|
-16| dependence| |Object |task dependency |mutual exclusion with params
-17|maxRetryTimes | |String|max retry times | |
-18|retryInterval | |String |retry interval| |
-19|timeout | |Object|timeout | |
-20| taskInstancePriority| |String|task priority| |
-21|workerGroup | |String |Worker group| |
-22|preTasks | |Array|preposition tasks| |
+| No. | parameter name || type | description | notes |
+|-----|----------------------|--------------------|--------|--------------------------------|------------------------------|
+| 1 | id | | String | task Id |
+| 2 | type || String | task type | HTTP |
+| 3 | name | | String | task name |
+| 4 | params | | Object | customized parameters | JSON format |
+| 5 | | url | String | request url |
+| 6 | | httpMethod | String | http method | GET,POST,HEAD,PUT,DELETE |
+| 7 | | httpParams | Array | http parameters |
+| 8 | | httpCheckCondition | String | validation of HTTP code status | default code 200 |
+| 9 | | condition | String | validation conditions |
+| 10 | | localParams | Array | customized local parameters |
+| 11 | description | | String | description | |
+| 12 | runFlag | | String | execution flag | |
+| 13 | conditionResult | | Object | condition branch | |
+| 14 | | successNode | Array | jump to node if success | |
+| 15 | | failedNode | Array | jump to node if failure |
+| 16 | dependence | | Object | task dependency | mutual exclusion with params |
+| 17 | maxRetryTimes | | String | max retry times | |
+| 18 | retryInterval | | String | retry interval | |
+| 19 | timeout | | Object | timeout | |
+| 20 | taskInstancePriority | | String | task priority | |
+| 21 | workerGroup | | String | Worker group | |
+| 22 | preTasks | | Array | preposition tasks | |
**Node data example:**
@@ -682,7 +683,7 @@ No.|parameter name||type|description |notes
6| |dsType |String | datasource type
7| |dataSource |Int | datasource ID
8| |dtType | String|target database type
-9| |dataTarget | Int|target database ID
+9| |dataTarget | Int|target database ID
10| |sql |String | SQL statements
11| |targetTable |String |target table
12| |jobSpeedByte |Int |job speed limiting(bytes)
@@ -695,7 +696,7 @@ No.|parameter name||type|description |notes
19|runFlag | |String |execution flag| |
20|conditionResult | |Object|condition branch| |
21| | successNode| Array|jump node if success| |
-22| | failedNode|Array|jump node if failure|
+22| | failedNode|Array|jump node if failure|
23| dependence| |Object |task dependency |mutual exclusion with params
24|maxRetryTimes | |String|max retry times| |
25|retryInterval | |String |retry interval| |
@@ -776,7 +777,7 @@ No.|parameter name||type|description |notes
13|runFlag | |String |execution flag| |
14|conditionResult | |Object|condition branch| |
15| | successNode| Array|jump node if success| |
-16| | failedNode|Array|jump node if failure|
+16| | failedNode|Array|jump node if failure|
17| dependence| |Object |task dependency |mutual exclusion with params
18|maxRetryTimes | |String|max retry times| |
19|retryInterval | |String |retry interval| |
@@ -844,7 +845,7 @@ No.|parameter name||type|description |notes
6|runFlag | |String |execution flag| |
7|conditionResult | |Object|condition branch | |
8| | successNode| Array|jump to node if success| |
-9| | failedNode|Array|jump to node if failure|
+9| | failedNode|Array|jump to node if failure|
10| dependence| |Object |task dependency |mutual exclusion with params
11|maxRetryTimes | |String|max retry times | |
12|retryInterval | |String |retry interval| |
@@ -909,7 +910,7 @@ No.|parameter name||type|description |notes
7|runFlag | |String |execution flag| |
8|conditionResult | |Object|condition branch | |
9| | successNode| Array|jump to node if success| |
-10| | failedNode|Array|jump to node if failure|
+10| | failedNode|Array|jump to node if failure|
11| dependence| |Object |task dependency |mutual exclusion with params
12|maxRetryTimes | |String|max retry times| |
13|retryInterval | |String |retry interval| |
@@ -970,7 +971,7 @@ No.|parameter name||type|description |notes
9|runFlag | |String |execution flag| |
10|conditionResult | |Object|condition branch| |
11| | successNode| Array|jump to node if success| |
-12| | failedNode|Array|jump to node if failure|
+12| | failedNode|Array|jump to node if failure|
13| dependence| |Object |task dependency |mutual exclusion with params
14| | relation|String |relation|AND,OR
15| | dependTaskList|Array |dependent task list|
@@ -1111,4 +1112,5 @@ No.|parameter name||type|description |notes
]
}
-```
\ No newline at end of file
+```
+
diff --git a/docs/docs/en/contribute/api-standard.md b/docs/docs/en/contribute/api-standard.md
index 61d6622165c3..cebde7f3a751 100644
--- a/docs/docs/en/contribute/api-standard.md
+++ b/docs/docs/en/contribute/api-standard.md
@@ -1,9 +1,11 @@
# API design standard
+
A standardized and unified API is the cornerstone of project design.The API of DolphinScheduler follows the REST ful standard. REST ful is currently the most popular Internet software architecture. It has a clear structure, conforms to standards, is easy to understand and extend.
This article uses the DolphinScheduler API as an example to explain how to construct a Restful API.
## 1. URI design
+
REST is "Representational State Transfer".The design of Restful URI is based on resources.The resource corresponds to an entity on the network, for example: a piece of text, a picture, and a service. And each resource corresponds to a URI.
+ One Kind of Resource: expressed in the plural, such as `task-instances`、`groups` ;
@@ -12,36 +14,43 @@ REST is "Representational State Transfer".The design of Restful URI is based on
+ A Sub Resource:`/instances/{instanceId}/tasks/{taskId}`;
## 2. Method design
+
We need to locate a certain resource by URI, and then use Method or declare actions in the path suffix to reflect the operation of the resource.
### ① Query - GET
+
Use URI to locate the resource, and use GET to indicate query.
+ When the URI is a type of resource, it means to query a type of resource. For example, the following example indicates paging query `alter-groups`.
+
```
Method: GET
/dolphinscheduler/alert-groups
```
+ When the URI is a single resource, it means to query this resource. For example, the following example means to query the specified `alter-group`.
+
```
Method: GET
/dolphinscheduler/alter-groups/{id}
```
+ In addition, we can also express query sub-resources based on URI, as follows:
+
```
Method: GET
/dolphinscheduler/projects/{projectId}/tasks
```
**The above examples all represent paging query. If we need to query all data, we need to add `/list` after the URI to distinguish. Do not mix the same API for both paged query and query.**
+
```
Method: GET
/dolphinscheduler/alert-groups/list
```
### ② Create - POST
+
Use URI to locate the resource, use POST to indicate create, and then return the created id to requester.
+ create an `alter-group`:
@@ -52,35 +61,42 @@ Method: POST
```
+ create sub-resources is also the same as above.
+
```
Method: POST
/dolphinscheduler/alter-groups/{alterGroupId}/tasks
```
### ③ Modify - PUT
+
Use URI to locate the resource, use PUT to indicate modify.
+ modify an `alert-group`
+
```
Method: PUT
/dolphinscheduler/alter-groups/{alterGroupId}
```
### ④ Delete -DELETE
+
Use URI to locate the resource, use DELETE to indicate delete.
+ delete an `alert-group`
+
```
Method: DELETE
/dolphinscheduler/alter-groups/{alterGroupId}
```
+ batch deletion: batch delete the id array,we should use POST. **(Do not use the DELETE method, because the body of the DELETE request has no semantic meaning, and it is possible that some gateways, proxies, and firewalls will directly strip off the request body after receiving the DELETE request.)**
+
```
Method: POST
/dolphinscheduler/alter-groups/batch-delete
```
### ⑤ Partial Modifications -PATCH
+
Use URI to locate the resource, use PATCH to partial modifications.
```
@@ -89,20 +105,27 @@ Method: PATCH
```
### ⑥ Others
+
In addition to creating, deleting, modifying and quering, we also locate the corresponding resource through url, and then append operations to it after the path, such as:
+
```
/dolphinscheduler/alert-groups/verify-name
/dolphinscheduler/projects/{projectCode}/process-instances/{code}/view-gantt
```
## 3. Parameter design
+
There are two types of parameters, one is request parameter and the other is path parameter. And the parameter must use small hump.
In the case of paging, if the parameter entered by the user is less than 1, the front end needs to automatically turn to 1, indicating that the first page is requested; When the backend finds that the parameter entered by the user is greater than the total number of pages, it should directly return to the last page.
## 4. Others design
+
### base URL
+
The URI of the project needs to use `/` as the base path, so as to identify that these APIs are under this project.
+
```
/dolphinscheduler
-```
\ No newline at end of file
+```
+
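The method conventions and the paging rule in this standard can be sketched in Python as follows. The helper names `route` and `clamp_page` are hypothetical, and the sketch covers only the cases whose paths are spelled out above. It also folds the front-end rule (below 1 becomes 1) and the back-end rule (past the end becomes the last page) into one function for brevity.

```python
BASE_URL = "/dolphinscheduler"  # base path named in the "base URL" section


def route(action, resource, resource_id=None):
    """Return (HTTP method, path) for an action on a plural resource name."""
    collection = f"{BASE_URL}/{resource}"
    if action == "page":            # paged query of one kind of resource
        return "GET", collection
    if action == "list":            # query all data: append /list
        return "GET", f"{collection}/list"
    if action == "batch-delete":    # ids travel in the body, so POST is used
        return "POST", f"{collection}/batch-delete"
    if resource_id is None:
        raise ValueError(f"{action} needs a resource id")
    methods = {"get": "GET", "update": "PUT", "delete": "DELETE"}
    return methods[action], f"{collection}/{resource_id}"


def clamp_page(page_no, total_pages):
    """Below 1 becomes page 1; beyond the total becomes the last page."""
    last_page = max(total_pages, 1)
    return min(max(page_no, 1), last_page)


print(route("batch-delete", "alert-groups"))
print(clamp_page(0, 5), clamp_page(9, 5))
```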
diff --git a/docs/docs/en/contribute/api-test.md b/docs/docs/en/contribute/api-test.md
index 7953e9dbd86c..c7005e954077 100644
--- a/docs/docs/en/contribute/api-test.md
+++ b/docs/docs/en/contribute/api-test.md
@@ -10,7 +10,6 @@ In contrast, API testing focuses on whether a complete operation chain can be co
For example, the API test of the tenant management interface focuses on whether users can log in normally; If the login fails, whether the error message can be displayed correctly. After logging in, you can perform tenant management operations through the sessionid you carry.
-
## API Test
### API-Pages
@@ -49,7 +48,6 @@ In addition, during the testing process, the interface are not requested directl
On the login page, only the input parameter specification of the interface request is defined. For the output parameter of the interface request, only the unified basic response structure is defined. The data actually returned by the interface is tested in the actual test case. Whether the input and output of main test interfaces can meet the requirements of test cases.
-
### API-Cases
The following is an example of a tenant management test. As explained earlier, we use docker-compose for deployment, so for each test case, we need to import the corresponding file in the form of an annotation.
@@ -86,7 +84,7 @@ https://github.com/apache/dolphinscheduler/tree/dev/dolphinscheduler-api-test/do
## Supplements
-When running API tests locally, First, you need to start the local service, you can refer to this page:
+When running API tests locally, you first need to start the local service. For details, refer to this page:
[development-environment-setup](./development-environment-setup.md)
When running API tests locally, the `-Dlocal=true` parameter can be configured to connect locally and facilitate changes to the UI.
diff --git a/docs/docs/en/contribute/architecture-design.md b/docs/docs/en/contribute/architecture-design.md
index a46bfb285932..1e50f2592d0c 100644
--- a/docs/docs/en/contribute/architecture-design.md
+++ b/docs/docs/en/contribute/architecture-design.md
@@ -1,4 +1,5 @@
## Architecture Design
+
Before explaining the architecture of the schedule system, let us first understand the common nouns of the schedule system.
### 1.Noun Interpretation
@@ -12,7 +13,7 @@ Before explaining the architecture of the schedule system, let us first understa
-**Process definition**: Visualization **DAG** by dragging task nodes and establishing associations of task nodes
+**Process definition**: A visual **DAG** built by dragging task nodes and establishing associations between them
**Process instance**: A process instance is an instantiation of a process definition, which can be generated by manual startup or scheduling. The process definition runs once, a new process instance is generated
@@ -34,11 +35,10 @@ Before explaining the architecture of the schedule system, let us first understa
**Complement**: Complement historical data, support **interval parallel and serial** two complement methods
-
-
### 2.System architecture
#### 2.1 System Architecture Diagram
+
@@ -46,60 +46,51 @@ Before explaining the architecture of the schedule system, let us first understa
-
-
#### 2.2 Architectural description
-* **MasterServer**
+* **MasterServer**
- MasterServer adopts the distributed non-central design concept. MasterServer is mainly responsible for DAG task split, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer.
- When the MasterServer service starts, it registers a temporary node with Zookeeper, and listens to the Zookeeper temporary node state change for fault tolerance processing.
+ MasterServer adopts the distributed non-central design concept. MasterServer is mainly responsible for DAG task split, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer.
+ When the MasterServer service starts, it registers a temporary node with Zookeeper, and listens to the Zookeeper temporary node state change for fault tolerance processing.
-
+ ##### The service mainly contains:
- ##### The service mainly contains:
+ - **Distributed Quartz** distributed scheduling component, mainly responsible for the start and stop operation of the scheduled task. When the quartz picks up the task, the master internally has a thread pool to be responsible for the subsequent operations of the task.
- - **Distributed Quartz** distributed scheduling component, mainly responsible for the start and stop operation of the scheduled task. When the quartz picks up the task, the master internally has a thread pool to be responsible for the subsequent operations of the task.
+ - **MasterSchedulerThread** is a scan thread that periodically scans the **command** table in the database for different business operations based on different **command types**
- - **MasterSchedulerThread** is a scan thread that periodically scans the **command** table in the database for different business operations based on different **command types**
+ - **MasterExecThread** is mainly responsible for DAG task segmentation, task submission monitoring, logic processing of various command types
- - **MasterExecThread** is mainly responsible for DAG task segmentation, task submission monitoring, logic processing of various command types
+ - **MasterTaskExecThread** is mainly responsible for task persistence
- - **MasterTaskExecThread** is mainly responsible for task persistence
+* **WorkerServer**
-
+ - WorkerServer also adopts a distributed, non-central design concept. WorkerServer is mainly responsible for task execution and providing log services. When the WorkerServer service starts, it registers the temporary node with Zookeeper and maintains the heartbeat.
-* **WorkerServer**
+ ##### This service contains:
- - WorkerServer also adopts a distributed, non-central design concept. WorkerServer is mainly responsible for task execution and providing log services. When the WorkerServer service starts, it registers the temporary node with Zookeeper and maintains the heartbeat.
+ - **FetchTaskThread** is mainly responsible for continuously receiving tasks from **Task Queue** and calling **TaskScheduleThread** corresponding executors according to different task types.
+ - **ZooKeeper**
- ##### This service contains:
+ The ZooKeeper service, the MasterServer and the WorkerServer nodes in the system all use the ZooKeeper for cluster management and fault tolerance. In addition, the system also performs event monitoring and distributed locking based on ZooKeeper.
+ We have also implemented queues based on Redis, but we hope that DolphinScheduler relies on as few components as possible, so we finally removed the Redis implementation.
- - **FetchTaskThread** is mainly responsible for continuously receiving tasks from **Task Queue** and calling **TaskScheduleThread** corresponding executors according to different task types.
+ - **Task Queue**
- - **ZooKeeper**
+ The task queue operation is provided. Currently, the queue is also implemented based on Zookeeper. Since there is less information stored in the queue, there is no need to worry about too much data in the queue. In fact, we have over-measured a million-level data storage queue, which has no effect on system stability and performance.
- The ZooKeeper service, the MasterServer and the WorkerServer nodes in the system all use the ZooKeeper for cluster management and fault tolerance. In addition, the system also performs event monitoring and distributed locking based on ZooKeeper.
- We have also implemented queues based on Redis, but we hope that DolphinScheduler relies on as few components as possible, so we finally removed the Redis implementation.
+ - **Alert**
- - **Task Queue**
+ Provides alarm-related interfaces. The interfaces mainly include **Alarms**. The storage, query, and notification functions of the two types of alarm data. The notification function has two types: **mail notification** and **SNMP (not yet implemented)**.
- The task queue operation is provided. Currently, the queue is also implemented based on Zookeeper. Since there is less information stored in the queue, there is no need to worry about too much data in the queue. In fact, we have over-measured a million-level data storage queue, which has no effect on system stability and performance.
+ - **API**
- - **Alert**
+ The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service provides a RESTful api to provide request services externally.
+ Interfaces include workflow creation, definition, query, modification, release, offline, manual start, stop, pause, resume, start execution from this node, and more.
- Provides alarm-related interfaces. The interfaces mainly include **Alarms**. The storage, query, and notification functions of the two types of alarm data. The notification function has two types: **mail notification** and **SNMP (not yet implemented)**.
+ - **UI**
- - **API**
-
- The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service provides a RESTful api to provide request services externally.
- Interfaces include workflow creation, definition, query, modification, release, offline, manual start, stop, pause, resume, start execution from this node, and more.
-
- - **UI**
-
- The front-end page of the system provides various visual operation interfaces of the system. For details, see the [quick start](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/about/introduction.html) section.
-
-
+ The front-end page of the system provides various visual operation interfaces of the system. For details, see the [quick start](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/about/introduction.html) section.
#### 2.3 Architectural Design Ideas
@@ -130,10 +121,9 @@ Problems in the design of centralized :
- In a decentralized design there is usually no Master/Slave concept: all roles are the same and all nodes have equal status. The global Internet is a typical decentralized distributed system; if any networked node goes down, only a small range of features is affected.
- The core of a decentralized design is that no "manager" in the distributed system differs from the other nodes, so there is no single point of failure. However, since there is no "manager" node, each node needs to communicate with other nodes to get the necessary machine information, and the unreliability of communication in a distributed system greatly increases the difficulty of implementing the above functions.
- In fact, truly decentralized distributed systems are rare. Instead, dynamically centralized distributed systems keep emerging. Under this architecture, the managers in the cluster are elected dynamically rather than preset, and when the cluster fails, its nodes spontaneously hold "meetings" to elect new "managers" to preside over the work. The most typical cases are ZooKeeper and Etcd, which is implemented in Go.
-
- Decentralization of DolphinScheduler is the registration of Master/Worker to ZooKeeper. The Master Cluster and the Worker Cluster are not centered, and the Zookeeper distributed lock is used to elect one Master or Worker as the “manager” to perform the task.
-##### II. Distributed lock practice
+##### II. Distributed lock practice
DolphinScheduler uses ZooKeeper distributed locks to implement only one Master to execute the Scheduler at the same time, or only one Worker to perform task submission.
@@ -184,8 +174,6 @@ Service fault tolerance design relies on ZooKeeper's Watcher mechanism. The impl
The Master monitors the directories of other Masters and Workers. If the remove event is detected, the process instance is fault-tolerant or the task instance is fault-tolerant according to the specific business logic.
-
-
- Master fault tolerance flow chart:
@@ -194,8 +182,6 @@ The Master monitors the directories of other Masters and Workers. If the remove
After the ZooKeeper Master is fault-tolerant, it is rescheduled by the Scheduler thread in DolphinScheduler. It traverses the DAG to find the "Running" and "Submit Successful" tasks, and monitors the status of its task instance for the "Running" task. You need to determine whether the Task Queue already exists. If it exists, monitor the status of the task instance. If it does not exist, resubmit the task instance.
-
-
- Worker fault tolerance flow chart:
@@ -204,7 +190,7 @@ After the ZooKeeper Master is fault-tolerant, it is rescheduled by the Scheduler
Once the Master Scheduler thread finds the task instance as "need to be fault tolerant", it takes over the task and resubmits.
- Note: Because "network jitter" may cause a node to briefly lose its heartbeat with ZooKeeper, a remove event for the node can occur. In this case we take the simplest approach: once a node's connection to ZooKeeper times out, the Master or Worker service is stopped directly.
+Note: Because "network jitter" may cause a node to briefly lose its heartbeat with ZooKeeper, a remove event for the node can occur. In this case we take the simplest approach: once a node's connection to ZooKeeper times out, the Master or Worker service is stopped directly.
###### 2. Task failure retry
@@ -214,8 +200,6 @@ Here we must first distinguish between the concept of task failure retry, proces
- Process failure recovery is process level, is done manually, recovery can only be performed **from the failed node** or **from the current node**
- Process failure rerun is also process level, is done manually, rerun is from the start node
-
-
Next, let's talk about the topic, we divided the task nodes in the workflow into two types.
- One is a business node, which corresponds to an actual script or processing statement, such as a Shell node, an MR node, a Spark node, a dependent node, and so on.
@@ -225,16 +209,12 @@ Each **service node** can configure the number of failed retries. When the task
If there is a task failure in the workflow that reaches the maximum number of retries, the workflow will fail to stop, and the failed workflow can be manually rerun or process resumed.
-
-
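The retry rule described above (a task is retried until it succeeds or the configured number of retries is used up) can be sketched as follows. This is a minimal illustration, not DolphinScheduler's implementation: `TaskRetrySketch`, `runWithRetries`, and the flaky task are hypothetical names, and the retry interval is left out.

```java
import java.util.function.BooleanSupplier;

final class TaskRetrySketch {

    // Runs the task once, then retries up to maxRetries times while it keeps failing.
    // Returns true as soon as one attempt succeeds, false once the retries are exhausted.
    static boolean runWithRetries(BooleanSupplier task, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (task.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical task that fails twice and succeeds on the third attempt.
        BooleanSupplier flaky = () -> ++calls[0] >= 3;

        boolean ok = runWithRetries(flaky, 2);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

A `false` result corresponds to the case in which the workflow fails and has to be rerun or resumed manually.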
##### V. Task priority design
In the early scheduling design, if there is no priority design and fair scheduling design, it will encounter the situation that the task submitted first may be completed simultaneously with the task submitted subsequently, but the priority of the process or task cannot be set. We have redesigned this, and we are currently designing it as follows:
- **Different process instance priority** takes precedence over **same process instance priority**, which takes precedence over **task priority within the same process**, which takes precedence over **commit order within the same process**; tasks are processed from high to low.
-
- The specific implementation resolves the priority from the json of the task instance and then saves the **process instance priority _ process instance id_task priority _ task id** information in the ZooKeeper task queue. When tasks are obtained from the task queue, a string comparison yields the task that needs to be executed first.
-
- The priority of the process definition exists because some processes need to be handled before others. It can be configured when the process is started manually or by schedule. There are 5 levels, in order: HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below
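The key layout and the string comparison described above can be sketched roughly as follows. The class and method names are hypothetical, and encoding each level as its ordinal (0 for HIGHEST through 4 for LOWEST) is an assumption made so that the smallest string is the most urgent task; the source does not state the actual encoding.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TaskPriorityKeySketch {

    // Assumed encoding: a lower ordinal means a higher priority.
    enum Priority { HIGHEST, HIGH, MEDIUM, LOW, LOWEST }

    // Builds "processPriority_processInstanceId_taskPriority_taskId".
    static String key(Priority processPriority, int processInstanceId,
                      Priority taskPriority, int taskId) {
        return processPriority.ordinal() + "_" + processInstanceId
                + "_" + taskPriority.ordinal() + "_" + taskId;
    }

    // Returns the key that sorts first under plain string comparison.
    static String next(List<String> queue) {
        return Collections.min(queue);
    }

    public static void main(String[] args) {
        List<String> queue = new ArrayList<>();
        queue.add(key(Priority.MEDIUM, 1, Priority.HIGH, 7));
        queue.add(key(Priority.HIGHEST, 2, Priority.LOW, 3));
        queue.add(key(Priority.HIGHEST, 2, Priority.HIGH, 4));
        System.out.println(next(queue));
    }
}
```

Note that plain string comparison orders ids digit by digit ("10" sorts before "9"), so a sketch like this only behaves numerically when the ids have the same width.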
@@ -308,8 +288,6 @@ Public class TaskLogFilter extends Filter {
}
```
-
-
### summary
Starting from the scheduling, this paper introduces the architecture principle and implementation ideas of the big data distributed workflow scheduling system-DolphinScheduler. To be continued
diff --git a/docs/docs/en/contribute/backend/mechanism/global-parameter.md b/docs/docs/en/contribute/backend/mechanism/global-parameter.md
index 53b73747d86d..b7e1f0897df4 100644
--- a/docs/docs/en/contribute/backend/mechanism/global-parameter.md
+++ b/docs/docs/en/contribute/backend/mechanism/global-parameter.md
@@ -59,3 +59,4 @@ Assign the parameters with matching values to varPool (List, which contains the
* Format the varPool as json and pass it to master.
* The parameters that are OUT would be written into the localParam after the master has received the varPool.
+
diff --git a/docs/docs/en/contribute/backend/mechanism/overview.md b/docs/docs/en/contribute/backend/mechanism/overview.md
index 4f0d592c46da..2054f283da91 100644
--- a/docs/docs/en/contribute/backend/mechanism/overview.md
+++ b/docs/docs/en/contribute/backend/mechanism/overview.md
@@ -1,6 +1,6 @@
# Overview
-
* [Global Parameter](global-parameter.md)
* [Switch Task type](task/switch.md)
+
diff --git a/docs/docs/en/contribute/backend/mechanism/task/switch.md b/docs/docs/en/contribute/backend/mechanism/task/switch.md
index 490510405ee8..fcff2643d629 100644
--- a/docs/docs/en/contribute/backend/mechanism/task/switch.md
+++ b/docs/docs/en/contribute/backend/mechanism/task/switch.md
@@ -6,3 +6,4 @@ Switch task workflow step as follows
* `SwitchTaskExecThread` processes the expressions defined in `switch` from top to bottom, obtains the value of the variable from `varPool`, and parses the expression through `javascript`. If the expression returns true, it stops checking and records the position of the expression, referred to here as `resultConditionLocation`. The task of `SwitchTaskExecThread` is then over.
* After the `switch` task runs, if there is no error (more commonly, the user-defined expression is out of specification or there is a problem with the parameter name), then `MasterExecThread.submitPostNode` will obtain the downstream node of the `DAG` to continue execution.
* If it is found in `DagHelper.parsePostNodes` that the current node (the node that has just completed the work) is a `switch` node, the `resultConditionLocation` will be obtained, and all branches except `resultConditionLocation` in the SwitchParameters will be skipped. In this way, only the branches that need to be executed are left
+
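The branch selection described in the switch steps above can be sketched as below. This is only an illustration under assumptions: it uses plain Java predicates in place of the `javascript` expression parsing, `Branch`, `skippedNodes`, and the node names are hypothetical, and the case where no expression matches is returned as -1 without further handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class SwitchBranchSketch {

    // Hypothetical branch: a condition over varPool values plus the downstream node it leads to.
    static final class Branch {
        final Predicate<Map<String, String>> condition;
        final String nextNode;

        Branch(Predicate<Map<String, String>> condition, String nextNode) {
            this.condition = condition;
            this.nextNode = nextNode;
        }
    }

    // Checks the branches from top to bottom and returns the index of the first
    // condition that holds, or -1 when none holds.
    static int resultConditionLocation(List<Branch> branches, Map<String, String> varPool) {
        for (int i = 0; i < branches.size(); i++) {
            if (branches.get(i).condition.test(varPool)) {
                return i;
            }
        }
        return -1;
    }

    // Collects the downstream nodes of every branch except the matched one.
    static List<String> skippedNodes(List<Branch> branches, int location) {
        List<String> skipped = new ArrayList<>();
        for (int i = 0; i < branches.size(); i++) {
            if (i != location) {
                skipped.add(branches.get(i).nextNode);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        Map<String, String> varPool = new HashMap<>();
        varPool.put("status", "warn");

        List<Branch> branches = Arrays.asList(
                new Branch(v -> "ok".equals(v.get("status")), "nodeA"),
                new Branch(v -> "warn".equals(v.get("status")), "nodeB"),
                new Branch(v -> v.containsKey("status"), "nodeC"));

        int location = resultConditionLocation(branches, varPool);
        System.out.println(location + " " + skippedNodes(branches, location));
    }
}
```

The third branch would also match, but it is never evaluated because checking stops at the first expression that returns true.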
diff --git a/docs/docs/en/contribute/backend/spi/alert.md b/docs/docs/en/contribute/backend/spi/alert.md
index 9b6c45e54753..b7934242e196 100644
--- a/docs/docs/en/contribute/backend/spi/alert.md
+++ b/docs/docs/en/contribute/backend/spi/alert.md
@@ -6,7 +6,7 @@ DolphinScheduler is undergoing a microkernel + plug-in architecture change. All
For alarm-related code, please refer to the `dolphinscheduler-alert-api` module. This module defines the extension interface of the alarm plug-in and some basic code. When you need to turn related functionality into a plug-in, it is recommended to read the code of this module first. Reading the documentation is also recommended, since it saves a lot of time, but the documentation lags behind to some degree. When documentation is missing, take the source code as the standard (if you are interested, we also welcome you to submit related documents). In addition, we rarely change the extension interface (apart from new additions) unless there is a major structural adjustment with an incompatible upgrade version, so the existing documents are generally sufficient.
-We use the native Java SPI. When you need to extend it, you only need to pay attention to implementing the `org.apache.dolphinscheduler.alert.api.AlertChannelFactory` interface; the underlying logic, such as plug-in loading, has already been implemented by the kernel, which makes development more focused and simple.
+We use the native Java SPI. When you need to extend it, you only need to pay attention to implementing the `org.apache.dolphinscheduler.alert.api.AlertChannelFactory` interface; the underlying logic, such as plug-in loading, has already been implemented by the kernel, which makes development more focused and simple.
In addition, `AlertChannelFactory` extends `PrioritySPI`, which means you can set the plugin priority. When two plugins have the same name, you can customize the priority by overriding the `getIdentify` method. The higher-priority plugin will be loaded, but if two plugins have the same name and the same priority, the server will throw an `IllegalArgumentException` when loading the plugin.
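The name and priority rule can be illustrated with a small stand-alone sketch. It does not use the real `PrioritySPI` or `getIdentify` types: `Plugin`, `resolve`, and the assumption that a larger number means a higher priority are all hypothetical and only mirror the behaviour described in this paragraph.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PluginPrioritySketch {

    // Hypothetical plugin descriptor: a name, a priority, and a label for the implementation.
    static final class Plugin {
        final String name;
        final int priority;
        final String impl;

        Plugin(String name, int priority, String impl) {
            this.name = name;
            this.priority = priority;
            this.impl = impl;
        }
    }

    // Keeps the highest-priority plugin per name (assumption: larger number wins).
    // Two plugins with the same name and the same priority are rejected.
    static Map<String, Plugin> resolve(List<Plugin> candidates) {
        Set<String> seen = new HashSet<>();
        Map<String, Plugin> loaded = new HashMap<>();
        for (Plugin candidate : candidates) {
            if (!seen.add(candidate.name + "#" + candidate.priority)) {
                throw new IllegalArgumentException(
                        "Duplicate plugin name and priority: " + candidate.name);
            }
            Plugin current = loaded.get(candidate.name);
            if (current == null || candidate.priority > current.priority) {
                loaded.put(candidate.name, candidate);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        Map<String, Plugin> loaded = resolve(Arrays.asList(
                new Plugin("Email", 1, "built-in"),
                new Plugin("Email", 5, "custom")));
        System.out.println(loaded.get("Email").impl);
    }
}
```

Checking duplicates before comparing priorities keeps the outcome independent of the order in which the plugins are discovered.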
@@ -26,8 +26,8 @@ If you don't care about its internal design, but simply want to know how to deve
This module is currently a plug-in provided by us, and now we have supported dozens of plug-ins, such as Email, DingTalk, Script, etc.
-
#### Alert SPI Main class information.
+
AlertChannelFactory
Alarm plug-in factory interface. All alarm plug-ins need to implement this interface. This interface is used to define the name of the alarm plug-in and the required parameters. The create method is used to create a specific alarm plug-in instance.
@@ -56,36 +56,40 @@ The specific design of alert_spi can be seen in the issue: [Alert Plugin Design]
* Email
- Email alert notification
+ Email alert notification
* DingTalk
- Alert for DingTalk group chat bots
-
- Related parameter configuration can refer to the DingTalk robot document.
+ Alert for DingTalk group chat bots
+
+ Related parameter configuration can refer to the DingTalk robot document.
* EnterpriseWeChat
- EnterpriseWeChat alert notifications
+ EnterpriseWeChat alert notifications
- Related parameter configuration can refer to the EnterpriseWeChat robot document.
+ Related parameter configuration can refer to the EnterpriseWeChat robot document.
* Script
- We have implemented a shell script for alerting. We will pass the relevant alert parameters to the script and you can implement your alert logic in the shell. This is a good way to interface with internal alerting applications.
+ We have implemented a shell script for alerting. We will pass the relevant alert parameters to the script and you can implement your alert logic in the shell. This is a good way to interface with internal alerting applications.
* SMS
- SMS alerts
+ SMS alerts
+
* FeiShu
FeiShu alert notification
+
* Slack
Slack alert notification
+
* PagerDuty
PagerDuty alert notification
+
* WebexTeams
WebexTeams alert notification
@@ -95,9 +99,10 @@ The specific design of alert_spi can be seen in the issue: [Alert Plugin Design]
* Telegram
Telegram alert notification
-
+
Related parameter configuration can refer to the Telegram document.
* Http
We have implemented an Http script for alerting. Since most alerting plug-ins end up making Http requests, if we do not support your alert plug-in yet, you can use Http to implement your alert logic. You are also welcome to contribute your common plug-ins to the community :)
+
diff --git a/docs/docs/en/contribute/backend/spi/datasource.md b/docs/docs/en/contribute/backend/spi/datasource.md
index 9738e073309e..caf8a5be46ec 100644
--- a/docs/docs/en/contribute/backend/spi/datasource.md
+++ b/docs/docs/en/contribute/backend/spi/datasource.md
@@ -22,4 +22,4 @@ In additional, the `DataSourceChannelFactory` extends from `PrioritySPI`, this m
#### **Future plan**
-Support data sources such as kafka, http, files, sparkSQL, FlinkSQL, etc.
\ No newline at end of file
+Support data sources such as kafka, http, files, sparkSQL, FlinkSQL, etc.
diff --git a/docs/docs/en/contribute/backend/spi/registry.md b/docs/docs/en/contribute/backend/spi/registry.md
index 0957ff3cdd26..b612ba5dcda2 100644
--- a/docs/docs/en/contribute/backend/spi/registry.md
+++ b/docs/docs/en/contribute/backend/spi/registry.md
@@ -6,9 +6,10 @@ Make the following configuration (take zookeeper as an example)
* Registry plug-in configuration, take Zookeeper as an example (registry.properties)
dolphinscheduler-service/src/main/resources/registry.properties
+
```registry.properties
- registry.plugin.name=zookeeper
- registry.servers=127.0.0.1:2181
+ registry.plugin.name=zookeeper
+ registry.servers=127.0.0.1:2181
```
For specific configuration information, please refer to the parameter information provided by the specific plug-in, for example zk: `org/apache/dolphinscheduler/plugin/registry/zookeeper/ZookeeperConfiguration.java`
diff --git a/docs/docs/en/contribute/backend/spi/task.md b/docs/docs/en/contribute/backend/spi/task.md
index f909d42fa8c9..91ee108bad3f 100644
--- a/docs/docs/en/contribute/backend/spi/task.md
+++ b/docs/docs/en/contribute/backend/spi/task.md
@@ -14,4 +14,4 @@ In additional, the `TaskChannelFactory` extends from `PrioritySPI`, this means y
Since the task plug-in involves the front-end page, the front-end SPI has not yet been implemented, so you need to implement the front-end page corresponding to the plug-in separately.
-If there is a class conflict in the task plugin, you can use [Shade-Relocating Classes](https://maven.apache.org/plugins/maven-shade-plugin/) to solve this problem.
\ No newline at end of file
+If there is a class conflict in the task plugin, you can use [Shade-Relocating Classes](https://maven.apache.org/plugins/maven-shade-plugin/) to solve this problem.
diff --git a/docs/docs/en/contribute/e2e-test.md b/docs/docs/en/contribute/e2e-test.md
index 82affec552a2..c6c49168a839 100644
--- a/docs/docs/en/contribute/e2e-test.md
+++ b/docs/docs/en/contribute/e2e-test.md
@@ -77,31 +77,31 @@ In addition, during the testing process, the elements are not manipulated direct
The SecurityPage provides goToTab methods to test the corresponding sidebar jumps, mainly including TenantPage, UserPage, WorkerGroupPage and QueuePage. These pages are implemented in the same way, mainly to test whether the input, add and delete buttons of the form can return to the corresponding page.
```java
- public T goToTab(Class tab) {
- if (tab == TenantPage.class) {
- WebElement menuTenantManageElement = new WebDriverWait(driver, 60)
- .until(ExpectedConditions.elementToBeClickable(menuTenantManage));
- ((JavascriptExecutor)driver).executeScript("arguments[0].click();", menuTenantManageElement);
- return tab.cast(new TenantPage(driver));
- }
- if (tab == UserPage.class) {
- WebElement menUserManageElement = new WebDriverWait(driver, 60)
- .until(ExpectedConditions.elementToBeClickable(menUserManage));
- ((JavascriptExecutor)driver).executeScript("arguments[0].click();", menUserManageElement);
- return tab.cast(new UserPage(driver));
- }
- if (tab == WorkerGroupPage.class) {
- WebElement menWorkerGroupManageElement = new WebDriverWait(driver, 60)
- .until(ExpectedConditions.elementToBeClickable(menWorkerGroupManage));
- ((JavascriptExecutor)driver).executeScript("arguments[0].click();", menWorkerGroupManageElement);
- return tab.cast(new WorkerGroupPage(driver));
- }
- if (tab == QueuePage.class) {
- menuQueueManage().click();
- return tab.cast(new QueuePage(driver));
- }
- throw new UnsupportedOperationException("Unknown tab: " + tab.getName());
- }
+public T goToTab(Class tab) {
+ if (tab == TenantPage.class) {
+ WebElement menuTenantManageElement = new WebDriverWait(driver, 60)
+ .until(ExpectedConditions.elementToBeClickable(menuTenantManage));
+ ((JavascriptExecutor)driver).executeScript("arguments[0].click();", menuTenantManageElement);
+ return tab.cast(new TenantPage(driver));
+ }
+ if (tab == UserPage.class) {
+ WebElement menUserManageElement = new WebDriverWait(driver, 60)
+ .until(ExpectedConditions.elementToBeClickable(menUserManage));
+ ((JavascriptExecutor)driver).executeScript("arguments[0].click();", menUserManageElement);
+ return tab.cast(new UserPage(driver));
+ }
+ if (tab == WorkerGroupPage.class) {
+ WebElement menWorkerGroupManageElement = new WebDriverWait(driver, 60)
+ .until(ExpectedConditions.elementToBeClickable(menWorkerGroupManage));
+ ((JavascriptExecutor)driver).executeScript("arguments[0].click();", menWorkerGroupManageElement);
+ return tab.cast(new WorkerGroupPage(driver));
+ }
+ if (tab == QueuePage.class) {
+ menuQueueManage().click();
+ return tab.cast(new QueuePage(driver));
+ }
+ throw new UnsupportedOperationException("Unknown tab: " + tab.getName());
+ }
```
![SecurityPage](../../../img/e2e-test/SecurityPage.png)
@@ -146,14 +146,14 @@ The following is an example of a tenant management test. As explained earlier, w
The browser is loaded using the RemoteWebDriver provided with Selenium. Before each test case is started there is some preparation work that needs to be done. For example: logging in the user, jumping to the corresponding page (depending on the specific test case).
```java
- @BeforeAll
- public static void setup() {
- new LoginPage(browser)
- .login("admin", "dolphinscheduler123")
- .goToNav(SecurityPage.class)
- .goToTab(TenantPage.class)
- ;
- }
+@BeforeAll
+public static void setup() {
+ new LoginPage(browser)
+ .login("admin", "dolphinscheduler123")
+ .goToNav(SecurityPage.class)
+ .goToTab(TenantPage.class)
+ ;
+}
```
When the preparation is complete, it is time for the formal test case writing. We use a form of @Order() annotation for modularity, to confirm the order of the tests. After the tests have been run, assertions are used to determine if the tests were successful, and if the assertion returns true, the tenant creation was successful. The following code can be used as a reference:
@@ -176,14 +176,14 @@ The rest are similar cases and can be understood by referring to the specific so
https://github.com/apache/dolphinscheduler/tree/dev/dolphinscheduler-e2e/dolphinscheduler-e2e-case/src/test/java/org/apache/dolphinscheduler/e2e/cases
-## III. Supplements
+## III. Supplements
-When running E2E tests locally, you first need to start the local service; refer to this page:
+When running E2E tests locally, you first need to start the local service; refer to this page:
[development-environment-setup](./development-environment-setup.md)
When running E2E tests locally, the `-Dlocal=true` parameter can be configured to connect locally and facilitate changes to the UI.
-When running E2E tests with `M1` chip, you can use `-Dm1_chip=true` parameter to configure containers supported by
+When running E2E tests with `M1` chip, you can use `-Dm1_chip=true` parameter to configure containers supported by
`ARM64`.
![Dlocal](../../../img/e2e-test/Dlocal.png)
diff --git a/docs/docs/en/contribute/frontend-development.md b/docs/docs/en/contribute/frontend-development.md
index 297a7ccee0da..9ab23cc5be29 100644
--- a/docs/docs/en/contribute/frontend-development.md
+++ b/docs/docs/en/contribute/frontend-development.md
@@ -1,6 +1,7 @@
# Front-end development documentation
### Technical selection
+
```
Vue mvvm framework
@@ -17,10 +18,16 @@ Lodash high performance JavaScript utility library
### Development environment
-- #### Node installation
-Node package download (note version v12.20.2) `https://nodejs.org/download/release/v12.20.2/`
+-
+
+#### Node installation
+
+Node package download (note version v12.20.2) `https://nodejs.org/download/release/v12.20.2/`
+
+-
+
+#### Front-end project construction
-- #### Front-end project construction
From the command line, `cd` into the `dolphinscheduler-ui` project directory and run `npm install` to pull the project dependencies.
> If `npm install` is very slow, you can set the taobao mirror
@@ -36,13 +43,16 @@ npm config set registry http://registry.npm.taobao.org/
API_BASE = http://127.0.0.1:12345
```
-> ##### ! ! ! Special attention here. If the project reports a "node-sass error" error while pulling the dependency package, execute the following command again after execution.
+##### ! ! ! Special attention here. If the project reports a "node-sass error" error while pulling the dependency package, execute the following command again after execution.
```bash
npm install node-sass --unsafe-perm #Install node-sass dependency separately
```
-- #### Development environment operation
+-
+
+#### Development environment operation
+
- `npm start` project development environment (after startup address http://localhost:8888)
#### Front-end project release
@@ -140,6 +150,7 @@ Public module and utill `src/js/module`
Home => `http://localhost:8888/#/home`
Project Management => `http://localhost:8888/#/projects/list`
+
```
| Project Home
| Workflow
@@ -149,6 +160,7 @@ Project Management => `http://localhost:8888/#/projects/list`
```
Resource Management => `http://localhost:8888/#/resource/file`
+
```
| File Management
| udf Management
@@ -159,6 +171,7 @@ Resource Management => `http://localhost:8888/#/resource/file`
Data Source Management => `http://localhost:8888/#/datasource/list`
Security Center => `http://localhost:8888/#/security/tenant`
+
```
| Tenant Management
| User Management
@@ -174,16 +187,19 @@ User Center => `http://localhost:8888/#/user/account`
The project `src/js/conf/home` is divided into
`pages` => route to page directory
+
```
- The page file corresponding to the routing address
+The page file corresponding to the routing address
```
`router` => route management
+
```
vue router, the entry file index.js in each page will be registered. Specific operations: https://router.vuejs.org/zh/
```
`store` => status management
+
```
The page corresponding to each route has a state management file divided into:
@@ -201,9 +217,13 @@ Specific action:https://vuex.vuejs.org/zh/
```
## specification
+
## Vue specification
+
##### 1.Component name
+
Component names consist of multiple words joined by hyphens (-), which avoids conflicts with HTML tags and gives a clearer structure.
+
```
// positive example
export default {
@@ -212,7 +232,9 @@ export default {
```
##### 2.Component files
+
The internal common component of the `src/js/module/components` project writes the folder name with the same name as the file name. The subcomponents and util tools that are split inside the common component are placed in the internal `_source` folder of the component.
+
```
└── components
├── header
@@ -228,6 +250,7 @@ The internal common component of the `src/js/module/components` project writes t
```
##### 3.Prop
+
When you define a Prop, always name it in camel case (camelCase), and use the hyphenated form (kebab-case) when assigning values in the parent component.
This follows the conventions of each language: HTML tags are case-insensitive, so hyphenated names are friendlier there, while camel case is more natural in JavaScript.
@@ -270,7 +293,9 @@ props: {
```
##### 4.v-for
+
When performing v-for traversal, you should always bring a key value to make rendering more efficient when updating the DOM.
+
```
@@ -280,6 +305,7 @@ When performing v-for traversal, you should always bring a key value to make ren
```
v-for should be avoided on the same element as v-if (`for example:
`) because v-for has a higher priority than v-if. To avoid invalid calculations and rendering, try to move the v-if up to the container's parent element.
+
```
@@ -289,7 +315,9 @@ v-for should be avoided on the same element as v-if (`for example:
`) becaus
```
##### 5.v-if / v-else-if / v-else
+
If the elements in the same set of v-if logic control are logically identical, Vue reuses the same part for more efficient element switching, `such as: value`. In order to avoid the unreasonable effect of multiplexing, you should add key to the same element for identification.
+
```
{{ mazeyData }}
@@ -300,12 +328,15 @@ If the elements in the same set of v-if logic control are logically identical, V
```
##### 6.Instruction abbreviation
+
In order to unify the specification, the instruction abbreviation is always used. Using `v-bind`, `v-on` is not bad. Here is only a unified specification.
+
```
```
##### 7.Top-level element order of single file components
+
Styles are bundled into a single file, so styles defined in one vue file also apply to same-named selectors in other files. Give every component a top-level class name when you create it.
Note: The sass plugin has been added to the project, so sass syntax can be written directly in a single vue file.
For uniformity and ease of reading, they should be placed in the order of ``、`
```
##### 2.Naming
+
Class and ID names should be semantic, so that the name alone tells you what the element does; join multiple words with a hyphen (-).
+
```
// positive example
.test-header{
@@ -426,6 +468,7 @@ The naming of Class and ID should be semantic, and you can see what you are doin
```
##### 3.Attribute abbreviation
+
Use CSS shorthand properties wherever possible to keep the code compact and easy to understand.
```
@@ -439,6 +482,7 @@ border: 1px solid #ccc;
```
##### 4.Document type
+
The HTML5 standard should always be used.
```
@@ -446,7 +490,9 @@ The HTML5 standard should always be used.
```
##### 5.Notes
+
Each module file should begin with a block comment.
+
```
/**
* @module mazey/api
@@ -457,7 +503,8 @@ A block comment should be written to a module file.
## interface
-##### All interfaces are returned as Promise
+##### All interfaces are returned as Promise
+
Note that a non-zero code indicates an error and is handled in catch.
```
@@ -477,6 +524,7 @@ test.then(res => {
```
Normal return
+
```
{
code:0,
@@ -486,6 +534,7 @@ Normal return
```
Error return
+
```
{
code:10000,
@@ -493,8 +542,10 @@ Error return
msg:'failed'
}
```
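The return convention described here (code 0 for a normal return, a non-zero code for an error that ends up in catch) can be sketched as follows. `isSuccess` and `unwrap` are hypothetical names used only for illustration, and the `'success'` message is an assumed value; the project's actual `io` wrapper may behave differently.

```javascript
// Hypothetical sketch, not the project's implementation.
// code 0 means a normal return; any other code is treated as an error.
function isSuccess (response) {
  return response.code === 0
}

// Resolve on success, reject otherwise, so callers can use then / catch.
function unwrap (response) {
  return isSuccess(response) ? Promise.resolve(response) : Promise.reject(response)
}

// Normal return: handled in then (the msg value is assumed for the example).
Promise.resolve({ code: 0, msg: 'success' })
  .then(unwrap)
  .then(res => console.log('ok:', res.msg))

// Error return: skipped by then, handled in catch.
Promise.resolve({ code: 10000, msg: 'failed' })
  .then(unwrap)
  .catch(e => console.log('error:', e.code, e.msg))
```

Keeping the check in one place means every caller sees the same rule for what counts as a failure.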
+
If the interface is a POST request, the Content-Type defaults to application/x-www-form-urlencoded; if the Content-Type is changed to application/json,
the parameters must be passed in the following way:
+
```
io.post('url', payload, null, null, { emulateJSON: false }, res => {
resolve(res)
@@ -524,6 +575,7 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
(1) First place the node's icon in the `src/js/conf/home/pages/dag/img` folder, naming it `toolbar_${English name of the node type defined in the backend, for example: SHELL}.png`.
(2) Find the `tasksType` object in `src/js/conf/home/pages/dag/_source/config.js` and add the new node type to it.
+
```
'DEPENDENT': { // The background definition node type English name is used as the key value
desc: 'DEPENDENT', // tooltip desc
@@ -532,6 +584,7 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
```
(3) Add a `${node type (lowercase)}`.vue file in `src/js/conf/home/pages/dag/_source/formModel/tasks`. The contents of the component for the current node are written here. Every node component must have a `_verification()` function; after verification succeeds, it passes the current component's data up to the parent component.
+
```
/**
* Verification
@@ -566,6 +619,7 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
(4) Common components used inside the node component are under `_source`, and `commcon.js` is used to configure public data.
##### 2.Increase the status type
+
(1) Find the `tasksState` object in `src/js/conf/home/pages/dag/_source/config.js` and add it to it.
```
@@ -579,7 +633,9 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
```
##### 3.Add the action bar tool
+
(1) Find the `toolOper` object in `src/js/conf/home/pages/dag/_source/config.js` and add it to it.
+
```
{
code: 'pointer', // tool identifier
@@ -599,13 +655,12 @@ User Center Related Interfaces `src/js/conf/home/store/user/actions.js`
`util.js` => belongs to the `plugIn` tool class
-
The operation is handled in the `src/js/conf/home/pages/dag/_source/dag.js` => `toolbarEvent` event.
-
##### 3.Add a routing page
(1) First add a route in the route management file `src/js/conf/home/router/index.js`
+
```
routing address{
path: '/test', // routing address
@@ -619,12 +674,12 @@ routing address{
(2) Create a `test` folder in `src/js/conf/home/pages` and create an `index.vue` entry file in the folder.
- This will give you direct access to`http://localhost:8888/#/test`
-
+ This will give you direct access to`http://localhost:8888/#/test`
##### 4.Increase the preset mailbox
Find `src/lib/localData/email.js`. The addresses listed there are offered as drop-down suggestions in the email address inputs for startup and timed runs.
+
```
export default ["test@analysys.com.cn","test1@analysys.com.cn","test3@analysys.com.cn"]
```
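How a preset list like this could drive the drop-down matching can be sketched as below. `matchEmails` is a hypothetical function and prefix matching is an assumption; the project's real matching logic is not shown in this document.

```javascript
// Preset list in the same shape as src/lib/localData/email.js.
const presetEmails = ['test@analysys.com.cn', 'test1@analysys.com.cn', 'test3@analysys.com.cn']

// Hypothetical matcher: returns the presets that start with what the user
// has typed so far, ignoring case. Empty input yields no suggestions.
function matchEmails (input, presets) {
  const typed = input.trim().toLowerCase()
  if (typed === '') {
    return []
  }
  return presets.filter(email => email.toLowerCase().startsWith(typed))
}

// Suggestions for a partially typed address.
console.log(matchEmails('test1', presetEmails))
```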
diff --git a/docs/docs/en/contribute/have-questions.md b/docs/docs/en/contribute/have-questions.md
index 8497d033cd09..72e49988f101 100644
--- a/docs/docs/en/contribute/have-questions.md
+++ b/docs/docs/en/contribute/have-questions.md
@@ -21,8 +21,9 @@ Some quick tips when using email:
- Tagging the subject line of your email will help you get a faster response, e.g. [api-server]: How to get open api interface?
- Tags may help identify a topic by:
+
- Component: MasterServer,ApiServer,WorkerServer,AlertServer, etc
- Level: Beginner, Intermediate, Advanced
- Scenario: Debug, How-to
-
- For error logs or long code examples, please use [GitHub gist](https://gist.github.com/) and include only a few lines of the pertinent code / log within the email.
+
diff --git a/docs/docs/en/contribute/join/DS-License.md b/docs/docs/en/contribute/join/DS-License.md
index c3f13d7bfbcd..f365c65d31d2 100644
--- a/docs/docs/en/contribute/join/DS-License.md
+++ b/docs/docs/en/contribute/join/DS-License.md
@@ -20,7 +20,6 @@ Moreover, when we intend to refer a new software ( not limited to 3rd party jar,
* [COMMUNITY-LED DEVELOPMENT "THE APACHE WAY"](https://apache.org/dev/licensing-howto.html)
-
For example, we should contain the NOTICE file (every open-source project has NOTICE file, generally under root directory) of ZooKeeper in our project when we are using ZooKeeper. As the Apache explains, "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work.
We are not going to dive into every 3rd party open-source license policy; you may look them up if interested.
@@ -40,3 +39,4 @@ We need to follow the following steps when we need to add new jars or external r
* [COMMUNITY-LED DEVELOPMENT "THE APACHE WAY"](https://apache.org/dev/licensing-howto.html)
* [ASF 3RD PARTY LICENSE POLICY](https://apache.org/legal/resolved.html)
+
diff --git a/docs/docs/en/contribute/join/become-a-committer.md b/docs/docs/en/contribute/join/become-a-committer.md
index deac7d863b1a..f17b6b99a621 100644
--- a/docs/docs/en/contribute/join/become-a-committer.md
+++ b/docs/docs/en/contribute/join/become-a-committer.md
@@ -8,4 +8,4 @@ In Dolphinscheduler community, if a committer who have earned even more merit, c
One thing that is sometimes hard to understand when you are new to the open development process used at the ASF, is that we value the community more than the code. A strong and healthy community will be respectful and be a fun and rewarding place. More importantly, a diverse and healthy community can continue to support the code over the longer term, even as individual companies come and go from the field.
-More details could be found [here](https://community.apache.org/contributors/).
\ No newline at end of file
+More details could be found [here](https://community.apache.org/contributors/).
diff --git a/docs/docs/en/contribute/join/code-conduct.md b/docs/docs/en/contribute/join/code-conduct.md
index 5505e95852b8..4a6c20b89c18 100644
--- a/docs/docs/en/contribute/join/code-conduct.md
+++ b/docs/docs/en/contribute/join/code-conduct.md
@@ -3,66 +3,67 @@
The following Code of Conduct is based on full compliance with the [Apache Software Foundation Code of Conduct](https://www.apache.org/foundation/policies/conduct.html).
## Development philosophy
- - **Consistent** code style, naming, and usage are consistent.
- - **Easy to read** code is obvious, easy to read and understand, when debugging one knows the intent of the code.
- - **Neat** agree with the concepts of《Refactoring》and《Code Cleanliness》and pursue clean and elegant code.
- - **Abstract** hierarchy is clear and the concepts are refined and reasonable. Keep methods, classes, packages, and modules at the same level of abstraction.
- - **Heart** Maintain a sense of responsibility and continue to be carved in the spirit of artisans.
-
+
+- **Consistent** code style, naming, and usage are consistent.
+- **Easy to read** code is obvious, easy to read and understand, when debugging one knows the intent of the code.
+- **Neat** agree with the concepts of《Refactoring》and《Code Cleanliness》and pursue clean and elegant code.
+- **Abstract** hierarchy is clear and the concepts are refined and reasonable. Keep methods, classes, packages, and modules at the same level of abstraction.
+- **Heart** Maintain a sense of responsibility and continue to be carved in the spirit of artisans.
+
## Development specifications
- - Executing `mvn -U clean package -Prelease` can compile and test through all test cases.
- - The test coverage tool checks for no less than dev branch coverage.
- - In the root directory, use Checkstyle to check your code for special reasons for violating validation rules. The template location is located at ds_check_style.xml.
- - Follow the coding specifications.
+- Executing `mvn -U clean package -Prelease` can compile and test through all test cases.
+- The test coverage tool checks for no less than dev branch coverage.
+- In the root directory, use Checkstyle to check your code for special reasons for violating validation rules. The template location is located at ds_check_style.xml.
+- Follow the coding specifications.
## Coding specifications
- - Use linux line breaks.
- - Indentation (including empty lines) is consistent with the last line.
- - An empty line is required between the class declaration and the following variable or method.
- - There should be no meaningless empty lines.
- - Classes, methods, and variables should be named as the name implies and abbreviations should be avoided.
- - Return value variables are named after `result`; `each` is used in loops to name loop variables; and `entry` is used in map instead of `each`.
- - The cached exception is called `e`; Catch the exception and do nothing, and the exception is named `ignored`.
- - Configuration Files are named in camelCase, and file names are lowercase with uppercase initial/starting letter.
- - Code that requires comment interpretation should be as small as possible and interpreted by method name.
- - `equals` and `==` In a conditional expression, the constant is left, the variable is on the right, and in the expression greater than less than condition, the variable is left and the constant is right.
- - In addition to the abstract classes used for inheritance, try to design the class as `final`.
- - Nested loops are as much a method as possible.
- - The order in which member variables are defined and the order in which parameters are passed is consistent across classes and methods.
- - Priority is given to the use of guard statements.
- - Classes and methods have minimal access control.
- - The private method used by the method should follow the method, and if there are multiple private methods, the writing private method should appear in the same order as the private method in the original method.
- - Method entry and return values are not allowed to be `null`.
- - The return and assignment statements of if else are preferred with the tri-objective operator.
- - Priority is given to `LinkedList` and only use `ArrayList` if you need to get element values in the collection through the index.
- - Collection types such as `ArrayList`,`HashMap` that may produce expansion must specify the initial size of the collection to avoid expansion.
- - Logs and notes are always in English.
- - Comments can only contain `javadoc`, `todo` and `fixme`.
- - Exposed classes and methods must have javadoc, other classes and methods and methods that override the parent class do not require javadoc.
+- Use linux line breaks.
+- Indentation (including empty lines) is consistent with the last line.
+- An empty line is required between the class declaration and the following variable or method.
+- There should be no meaningless empty lines.
+- Classes, methods, and variables should be named as the name implies and abbreviations should be avoided.
+- Return value variables are named after `result`; `each` is used in loops to name loop variables; and `entry` is used in map instead of `each`.
+- The cached exception is called `e`; Catch the exception and do nothing, and the exception is named `ignored`.
+- Configuration Files are named in camelCase, and file names are lowercase with uppercase initial/starting letter.
+- Code that requires comment interpretation should be as small as possible and interpreted by method name.
+- `equals` and `==` In a conditional expression, the constant is left, the variable is on the right, and in the expression greater than less than condition, the variable is left and the constant is right.
+- In addition to the abstract classes used for inheritance, try to design the class as `final`.
+- Nested loops are as much a method as possible.
+- The order in which member variables are defined and the order in which parameters are passed is consistent across classes and methods.
+- Priority is given to the use of guard statements.
+- Classes and methods have minimal access control.
+- The private method used by the method should follow the method, and if there are multiple private methods, the writing private method should appear in the same order as the private method in the original method.
+- Method entry and return values are not allowed to be `null`.
+- The return and assignment statements of if else are preferred with the tri-objective operator.
+- Priority is given to `LinkedList` and only use `ArrayList` if you need to get element values in the collection through the index.
+- Collection types such as `ArrayList`,`HashMap` that may produce expansion must specify the initial size of the collection to avoid expansion.
+- Logs and notes are always in English.
+- Comments can only contain `javadoc`, `todo` and `fixme`.
+- Exposed classes and methods must have javadoc, other classes and methods and methods that override the parent class do not require javadoc.
## Unit test specifications
- - Test code and production code are subject to the same code specifications.
- - Unit tests are subject to AIR (Automatic, Independent, Repeatable) Design concept.
- - Automatic: Unit tests should be fully automated, not interactive. Manual checking of output results is prohibited, `System.out`, `log`, etc. are not allowed, and must be verified with assertions.
- - Independent: It is prohibited to call each other between unit test cases and to rely on the order of execution. Each unit test can be run independently.
- - Repeatable: Unit tests cannot be affected by the external environment and can be repeated.
- - Unit tests are subject to BCDE(Border, Correct, Design, Error) Design principles.
- - Border (Boundary value test): The expected results are obtained by entering the boundaries of loop boundaries, special values, data order, etc.
- - Correct (Correctness test): The expected results are obtained with the correct input.
- - Design (Rationality Design): Design high-quality unit tests in combination with production code design.
- - Error (Fault tolerance test): The expected results are obtained through incorrect input such as illegal data, abnormal flow, etc.
- - If there is no special reason, the test needs to be fully covered.
- - Each test case needs to be accurately asserted.
- - Prepare the environment for code separation from the test code.
- - Only jUnit `Assert`,hamcrest `CoreMatchers`,Mockito Correlation can use static import.
- - Single-data assertions should use `assertTrue`,`assertFalse`,`assertNull` and `assertNotNull`.
- - Multi-data assertions should use `assertThat`.
- - Accurate assertion, try not to use `not`,`containsString` assertion.
- - The true value of the test case should be named actualXXX, and the expected value should be named expectedXXX.
- - Classes and Methods with `@Test` labels do not require javadoc.
+- Test code and production code are subject to the same code specifications.
+- Unit tests are subject to AIR (Automatic, Independent, Repeatable) Design concept.
+ - Automatic: Unit tests should be fully automated, not interactive. Manual checking of output results is prohibited, `System.out`, `log`, etc. are not allowed, and must be verified with assertions.
+ - Independent: It is prohibited to call each other between unit test cases and to rely on the order of execution. Each unit test can be run independently.
+ - Repeatable: Unit tests cannot be affected by the external environment and can be repeated.
+- Unit tests are subject to BCDE(Border, Correct, Design, Error) Design principles.
+ - Border (Boundary value test): The expected results are obtained by entering the boundaries of loop boundaries, special values, data order, etc.
+ - Correct (Correctness test): The expected results are obtained with the correct input.
+ - Design (Rationality Design): Design high-quality unit tests in combination with production code design.
+ - Error (Fault tolerance test): The expected results are obtained through incorrect input such as illegal data, abnormal flow, etc.
+- If there is no special reason, the test needs to be fully covered.
+- Each test case needs to be accurately asserted.
+- Prepare the environment for code separation from the test code.
+- Only jUnit `Assert`,hamcrest `CoreMatchers`,Mockito Correlation can use static import.
+- Single-data assertions should use `assertTrue`,`assertFalse`,`assertNull` and `assertNotNull`.
+- Multi-data assertions should use `assertThat`.
+- Accurate assertion, try not to use `not`,`containsString` assertion.
+- The true value of the test case should be named actualXXX, and the expected value should be named expectedXXX.
+- Classes and Methods with `@Test` labels do not require javadoc.
+- Public specifications.
+ - Each line is no longer than `200` in length, ensuring that each line is semantically complete for easy understanding.
- - Public specifications.
- - Each line is no longer than `200` in length, ensuring that each line is semantically complete for easy understanding.
diff --git a/docs/docs/en/contribute/join/contribute.md b/docs/docs/en/contribute/join/contribute.md
index ea8959604622..9a7cf2ff544b 100644
--- a/docs/docs/en/contribute/join/contribute.md
+++ b/docs/docs/en/contribute/join/contribute.md
@@ -13,8 +13,8 @@ We encourage any form of participation in the community that will eventually bec
* Help promote DolphinScheduler, participate in technical conferences or meetup, sharing and more.
Welcome to the contributing team and join open source starting with submitting your first PR.
- - For example, add code comments or find "easy to fix" tags or some very simple issue (misspellings, etc.) and so on, first familiarize yourself with the submission process through the first simple PR.
-
+- For example, add code comments or find "easy to fix" tags or some very simple issue (misspellings, etc.) and so on, first familiarize yourself with the submission process through the first simple PR.
+
Note: Contributions are not limited to PRs; anything that helps the development of the project counts.
I'm sure you'll benefit from open source by participating in DolphinScheduler!
@@ -37,4 +37,4 @@ If you want to implement a Feature or fix a Bug. Please refer to the following:
* You should create a new branch to start your work, to get the name of the branch refer to the [Submit Guide-Pull Request Notice](./pull-request.md). For example, if you want to complete the feature and submit Issue 111, your branch name should be feature-111. The feature name can be determined after discussion with the instructor.
* When you're done, send a Pull Request to dolphinscheduler, please refer to the《[Submit Guide-Submit Pull Request Process](./submit-code.md)》
-If you want to submit a Pull Request to complete a Feature or fix a Bug, it is recommended that you start with the `good first issue`, `easy-to-fix` issues, complete a small function to submit, do not change too many files at a time, changing too many files will also put a lot of pressure on Reviewers, it is recommended to submit them through multiple Pull Requests, not all at once.
\ No newline at end of file
+If you want to submit a Pull Request to complete a Feature or fix a Bug, it is recommended that you start with the `good first issue`, `easy-to-fix` issues, complete a small function to submit, do not change too many files at a time, changing too many files will also put a lot of pressure on Reviewers, it is recommended to submit them through multiple Pull Requests, not all at once.
diff --git a/docs/docs/en/contribute/join/document.md b/docs/docs/en/contribute/join/document.md
index f2fd83140c63..16ed650b9f8f 100644
--- a/docs/docs/en/contribute/join/document.md
+++ b/docs/docs/en/contribute/join/document.md
@@ -2,7 +2,7 @@
Good documentation is critical for any type of software. Any contribution that can improve the DolphinScheduler documentation is welcome.
-### Get the document project
+### Get the document project
Documentation for the DolphinScheduler project is maintained in a separate [git repository](https://github.com/apache/dolphinscheduler-website).
@@ -52,8 +52,8 @@ Now you can run and build the website in your local environment.
2. Simply push the changed files, for example:
- * `*.md`
- * `blog.js or docs.js or site.js`
+* `*.md`
+* `blog.js or docs.js or site.js`
3. Submit the Pull Request to the **master** branch.
diff --git a/docs/docs/en/contribute/join/issue.md b/docs/docs/en/contribute/join/issue.md
index 376b06598096..b7a763ddf8f2 100644
--- a/docs/docs/en/contribute/join/issue.md
+++ b/docs/docs/en/contribute/join/issue.md
@@ -1,6 +1,7 @@
# Issue Notice
## Preface
+
Issues function is used to track various Features, Bugs, Functions, etc. The project maintainer can organize the tasks to be completed through issues.
Issue is an important step in drawing out a feature or bug,
@@ -129,8 +130,8 @@ The main purpose of this is to avoid wasting time caused by different opinions o
- How to deal with a user who raises an issue but does not know which module it belongs to.
- It is true that most users when raising issue do not know which module the issue belongs to.
- In fact, this is very common in many open source communities. In this case, the committer / contributor actually knows the module affected by the issue.
- If the issue is really valuable after being approved by committer and contributor, then the committer can modify the issue title according to the specific module involved in the issue,
- or leave a message to the user who raises the issue to modify it into the corresponding title.
+ It is true that most users when raising issue do not know which module the issue belongs to.
+ In fact, this is very common in many open source communities. In this case, the committer / contributor actually knows the module affected by the issue.
+ If the issue is really valuable after being approved by committer and contributor, then the committer can modify the issue title according to the specific module involved in the issue,
+ or leave a message to the user who raises the issue to modify it into the corresponding title.
diff --git a/docs/docs/en/contribute/join/pull-request.md b/docs/docs/en/contribute/join/pull-request.md
index 5127845e3a41..c3eccff663e8 100644
--- a/docs/docs/en/contribute/join/pull-request.md
+++ b/docs/docs/en/contribute/join/pull-request.md
@@ -1,6 +1,7 @@
# Pull Request Notice
## Preface
+
Pull Request is a way of software cooperation, which is a process of bringing code involving different functions into the trunk. During this process, the code can be discussed, reviewed, and modified.
In Pull Request, we try not to discuss the implementation of the code. The general implementation of the code and its logic should be determined in Issue. In the Pull Request, we only focus on the code format and code specification, so as to avoid wasting time caused by different opinions on implementation.
@@ -62,8 +63,8 @@ Please refer to the commit message section.
### Pull Request Code Style
-DolphinScheduler uses `Spotless` to automatically fix code style and formatting errors,
-see [Code Style](../development-environment-setup.md#code-style) for details.
+DolphinScheduler uses `Spotless` to automatically fix code style and formatting errors,
+see [Code Style](../development-environment-setup.md#code-style) for details.
### Question
@@ -74,4 +75,5 @@ see [Code Style](../development-environment-setup.md#code-style) for details.
Usually, there are two solutions to this scenario: the first is to merge the multiple issues into the same issue, and then close the other issues;
the second applies when the issues have subtle differences.
In this scenario, the responsibilities of each issue can be clearly divided. The type of each issue is marked as Sub-Task, and then these sub task type issues are associated with one issue.
- And each Pull Request is submitted should be associated with only one issue of a sub task.
\ No newline at end of file
+ And each Pull Request is submitted should be associated with only one issue of a sub task.
+
diff --git a/docs/docs/en/contribute/join/review.md b/docs/docs/en/contribute/join/review.md
index 40c8a23a7af6..cdfb01d653b6 100644
--- a/docs/docs/en/contribute/join/review.md
+++ b/docs/docs/en/contribute/join/review.md
@@ -10,7 +10,7 @@ from the community to review them. You could see detail in [mail][mail-review-wa
in [GitHub Discussion][discussion-result-review-wanted].
> Note: It is not only the users mentioned in the [GitHub Discussion][discussion-result-review-wanted] who can review Issues or Pull
-> Requests, Community advocates **Anyone is encouraged to review Issues and Pull Requests**. Users in
+> Requests, Community advocates **Anyone is encouraged to review Issues and Pull Requests**. Users in
> [GitHub Discussion][discussion-result-review-wanted] showed their willingness to review when we collected responses in the mail thread.
> The advantage of this list is that when the community has a discussion, in addition to mentioning the Members in [team](/us-en/community/community.html),
> you can also ask the people in [GitHub Discussion][discussion-result-review-wanted] for help. If you want to join the
@@ -27,43 +27,43 @@ go to section [review Pull Requests](#pull-requests).
Reviewing Issues means discussing [Issues][all-issues] on GitHub and giving suggestions on them, including but not limited to the following situations:
-| Situation | Reason | Label | Action |
-| ------ | ------ | ------ | ------ |
-| wont fix | Has been fixed in dev branch | [wontfix][label-wontfix] | Close Issue, inform creator the fixed version if it already release |
-| duplicate issue | Had the same problem before | [duplicate][label-duplicate] | Close issue, inform creator the link of same issue |
-| Description not clearly | Without detail reproduce step | [need more information][label-need-more-information] | Inform creator add more description |
+| Situation | Reason | Label | Action |
+|-------------------------|-------------------------------|------------------------------------------------------|---------------------------------------------------------------------|
+| wont fix | Has been fixed in dev branch | [wontfix][label-wontfix] | Close Issue, inform creator the fixed version if it already release |
+| duplicate issue | Had the same problem before | [duplicate][label-duplicate] | Close issue, inform creator the link of same issue |
+| Description not clearly | Without detail reproduce step | [need more information][label-need-more-information] | Inform creator add more description |
In addition to giving suggestions, adding labels to issues is also important during review. Labeled issues are easier to
retrieve, which is convenient for further processing. An issue can have more than one label. Common issue categories are:
-| Label | Meaning |
-| ------ | ------ |
-| [UI][label-UI] | UI and front-end related |
-| [security][label-security] | Security Issue |
-| [user experience][label-user-experience] | User experience Issue |
-| [development][label-development] | Development Issue |
-| [Python][label-Python] | Python Issue |
-| [plug-in][label-plug-in] | Plug-in Issue |
-| [document][label-document] | Document Issue |
-| [docker][label-docker] | Docker Issue |
-| [need verify][label-need-verify] | Need verify Issue |
-| [e2e][label-e2e] | E2E Issue |
-| [win-os][label-win-os] | windows operating system Issue |
-| [suggestion][label-suggestion] | Give suggestion to us |
-
+| Label | Meaning |
+|------------------------------------------|--------------------------------|
+| [UI][label-UI] | UI and front-end related |
+| [security][label-security] | Security Issue |
+| [user experience][label-user-experience] | User experience Issue |
+| [development][label-development] | Development Issue |
+| [Python][label-Python] | Python Issue |
+| [plug-in][label-plug-in] | Plug-in Issue |
+| [document][label-document] | Document Issue |
+| [docker][label-docker] | Docker Issue |
+| [need verify][label-need-verify] | Need verify Issue |
+| [e2e][label-e2e] | E2E Issue |
+| [win-os][label-win-os] | windows operating system Issue |
+| [suggestion][label-suggestion] | Give suggestion to us |
+
Besides classification, labels can also set the priority of Issues. The higher the priority, the more attention the Issue gets
in the community, and the more likely it is to be fixed or implemented. The priority labels are as follows:
-| Label | priority |
-| ------ | ------ |
-| [priority:high][label-priority-high] | High priority |
+| Label | priority |
+|------------------------------------------|-----------------|
+| [priority:high][label-priority-high] | High priority |
| [priority:middle][label-priority-middle] | Middle priority |
-| [priority:low][label-priority-low] | Low priority |
+| [priority:low][label-priority-low] | Low priority |
All the labels above are common labels. For all labels in this project, see the [full label list][label-all-list].
Before reading the following content, please make sure you have labeled the Issue.
-
+
* Remove label [Waiting for reply][label-waiting-for-reply] after replying: the label [Waiting for reply][label-waiting-for-reply]
is added when [creating an Issue][issue-choose]. It makes unanswered issues easier to find, and you should remove
this label after you have reviewed the issue. Otherwise, others will waste time looking at the same issue.
@@ -74,12 +74,12 @@ Before reading following content, please make sure you have labeled the Issue.
When an Issue requires a Pull Request, you can also label it with one of the following.
-| Label | Mean |
-| ------ | ------ |
-| [Chore][label-Chore] | Chore for project |
-| [Good first issue][label-good-first-issue] | Good first issue for new contributor |
-| [easy to fix][label-easy-to-fix] | Easy to fix, harder than `Good first issue` |
-| [help wanted][label-help-wanted] | Help wanted |
+| Label | Mean |
+|--------------------------------------------|---------------------------------------------|
+| [Chore][label-Chore] | Chore for project |
+| [Good first issue][label-good-first-issue] | Good first issue for new contributor |
+| [easy to fix][label-easy-to-fix] | Easy to fix, harder than `Good first issue` |
+| [help wanted][label-help-wanted] | Help wanted |
> Note: Only members have permission to add or delete labels. When you need to add or remove labels but are not a member,
> you can `@` members to do that. But as long as you have a GitHub account, you can comment on issues and give suggestions.
@@ -90,14 +90,14 @@ When an Issue need to create Pull Requests, you could also labeled it from below
Reviewing Pull Requests means discussing [Pull Requests][all-PRs] on GitHub and giving suggestions on them. DolphinScheduler's
Pull Request reviews work the same way as [GitHub's reviewing changes in pull requests][gh-review-pr]. You can give your
-suggestions in Pull Requests
-
+suggestions in Pull Requests
* When you think the Pull Request is OK to be merged, you can agree to the Pull Request according to the "Approve" process
in [GitHub's reviewing changes in pull requests][gh-review-pr].
-* When you think Pull Request needs to be changed, you can comment it according to the "Comment" process in
+* When you think Pull Request needs to be changed, you can comment it according to the "Comment" process in
  [GitHub's reviewing changes in pull requests][gh-review-pr]. When you find issues that must be fixed before the Pull Request
  is merged, follow "Request changes" in [GitHub's reviewing changes in pull requests][gh-review-pr] to ask contributors
  to modify it.
+
Labeling Pull Requests is an important part. Reasonable classification can save a lot of time for reviewers. The good news
@@ -107,11 +107,11 @@ and [priority:high][label-priority-high].
Pull Requests have some unique labels of their own
-| Label | Mean |
-| ------ | ------ |
-| [miss document][label-miss-document] | Pull Requests miss document, and should be add |
-| [first time contributor][label-first-time-contributor] | Pull Requests submit by first time contributor |
-| [don't merge][label-do-not-merge] | Pull Requests have some problem and should not be merged |
+| Label | Mean |
+|--------------------------------------------------------|----------------------------------------------------------|
+| [miss document][label-miss-document] | Pull Requests miss document, and should be add |
+| [first time contributor][label-first-time-contributor] | Pull Requests submit by first time contributor |
+| [don't merge][label-do-not-merge] | Pull Requests have some problem and should not be merged |
> Note: Only members have permission to add or delete labels. When you need to add or remove labels but are not a member,
> you can `@` members to do that. But as long as you have a GitHub account, you can comment on Pull Requests and give suggestions.
@@ -151,3 +151,4 @@ Pull Requests have some unique labels of it own
[all-issues]: https://github.com/apache/dolphinscheduler/issues
[all-PRs]: https://github.com/apache/dolphinscheduler/pulls
[gh-review-pr]: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/about-pull-request-reviews
+
diff --git a/docs/docs/en/contribute/join/submit-code.md b/docs/docs/en/contribute/join/submit-code.md
index ac8795032047..b1e249d338ba 100644
--- a/docs/docs/en/contribute/join/submit-code.md
+++ b/docs/docs/en/contribute/join/submit-code.md
@@ -3,26 +3,23 @@
* First, fork a copy of the code from the remote repository *https://github.com/apache/dolphinscheduler.git* into your own repository
* There are currently three branches in the remote repository:
- * master normal delivery branch
- After the stable release, merge the code from the stable branch into the master.
-
- * dev daily development branch
- Every day dev development branch, newly submitted code can pull request to this branch.
-
+ * master normal delivery branch
+ After the stable release, merge the code from the stable branch into the master.
+
+ * dev daily development branch
+ Every day dev development branch, newly submitted code can pull request to this branch.
* Clone your repository to your local
- `git clone https://github.com/apache/dolphinscheduler.git`
-
+ `git clone https://github.com/apache/dolphinscheduler.git`
* Add remote repository address, named upstream
- `git remote add upstream https://github.com/apache/dolphinscheduler.git`
-
+ `git remote add upstream https://github.com/apache/dolphinscheduler.git`
* View repository
- `git remote -v`
+ `git remote -v`
->At this time, there will be two repositories: origin (your own repository) and upstream (remote repository)
+> At this time, there will be two repositories: origin (your own repository) and upstream (remote repository)
* Get/Update remote repository code
- `git fetch upstream`
+ `git fetch upstream`
* Synchronize remote repository code to local repository
@@ -32,22 +29,23 @@ git merge --no-ff upstream/dev
```
If the remote repository has a new branch such as `dev-1.0`, you need to synchronize this branch to the local repository
-
+
```
git checkout -b dev-1.0 upstream/dev-1.0
git push --set-upstream origin dev-1.0
```
* Create new branch
+
```
git checkout -b xxx origin/dev
```
Make sure that the branch `xxx` is building successfully on the latest code of the official dev branch
* After modifying the code locally in the new branch, submit it to your own repository:
-
+
`git commit -m 'commit content'`
-
+
`git push origin xxx --set-upstream`
* Submit changes to the remote repository
@@ -60,4 +58,3 @@ Make sure that the branch `xxx` is building successfully on the latest code of t
* Finally, congratulations, you have become an official contributor to dolphinscheduler!
-
diff --git a/docs/docs/en/contribute/join/subscribe.md b/docs/docs/en/contribute/join/subscribe.md
index f6e8a749348c..31d670fceb65 100644
--- a/docs/docs/en/contribute/join/subscribe.md
+++ b/docs/docs/en/contribute/join/subscribe.md
@@ -21,3 +21,4 @@ Unsubscribe from the mailing list steps are as follows:
2. Receive confirmation email and reply. After completing step 1, you will receive a confirmation email from dev-help@dolphinscheduler.apache.org (if not received, please confirm whether the email is automatically classified as spam, promotion email, subscription email, etc.). Then reply directly to the email, or click on the link in the email to reply quickly; the subject and content are arbitrary.
3. Receive a goodbye email. After completing the above steps, you will receive a goodbye email with the subject GOODBYE from dev@dolphinscheduler.apache.org. You have then successfully unsubscribed from the Apache DolphinScheduler mailing list, and you will no longer receive emails from dev@dolphinscheduler.apache.org.
+
diff --git a/docs/docs/en/contribute/join/unit-test.md b/docs/docs/en/contribute/join/unit-test.md
index 796cf59e891c..932a0bf64a1c 100644
--- a/docs/docs/en/contribute/join/unit-test.md
+++ b/docs/docs/en/contribute/join/unit-test.md
@@ -2,25 +2,27 @@
### 1. The Benefits of Writing Unit Tests
-- Unit tests help everyone to get into the details of the code and understand how it works.
-- Through test cases we can find bugs and submit robust code.
-- The test case is also a demo usage of the code.
+- Unit tests help everyone to get into the details of the code and understand how it works.
+- Through test cases we can find bugs and submit robust code.
+- The test case is also a demo usage of the code.
### 2. Some design principles for unit test cases
-- The steps, granularity and combination of conditions should be carefully designed.
-- Pay attention to boundary conditions.
-- Unit tests should be well designed as well as avoiding useless code.
-- When you find a `method` is difficult to write unit test, and if you confirm that the `method` is `bad code`, then refactor it with the developer.
+- The steps, granularity and combination of conditions should be carefully designed.
+- Pay attention to boundary conditions.
+- Unit tests should be well designed as well as avoiding useless code.
+- When you find a `method` is difficult to write unit test, and if you confirm that the `method` is `bad code`, then refactor it with the developer.
+
-- DolphinScheduler: [mockito](http://site.mockito.org/). Here are some development guides: [mockito tutorial](http://www.baeldung.com/bdd-mockito), [mockito refcard](https://dzone.com/refcardz/mockito)
+- DolphinScheduler: [mockito](http://site.mockito.org/). Here are some development guides: [mockito tutorial](http://www.baeldung.com/bdd-mockito), [mockito refcard](https://dzone.com/refcardz/mockito)
+
-- TDD(option): When you start writing a new feature, you can try writing test cases first.
+- TDD(option): When you start writing a new feature, you can try writing test cases first.
### 3. Test coverage setpoint
-- At this stage, the default value for test coverage of Delta change codes is >= 60%, the higher the better.
-- We can see the test reports on this page: https://codecov.io/gh/apache/dolphinscheduler
+- At this stage, the default value for test coverage of Delta change codes is >= 60%, the higher the better.
+- We can see the test reports on this page: https://codecov.io/gh/apache/dolphinscheduler
## Fundamental guidelines for unit test
@@ -64,13 +66,13 @@ Invalid assertions make the test itself meaningless, it has little to do with wh
There are several types of invalid assertions:
-1. Different types of comparisons.
+1. Different types of comparisons.
-2. Determines that an object or variable with a default value is not null.
+2. Determines that an object or variable with a default value is not null.
- This seems meaningless. Therefore, when making the relevant judgements you should pay attention to whether it contains a default value itself.
+ This seems meaningless. Therefore, when making the relevant judgements you should pay attention to whether it contains a default value itself.
-3. Assertions should be affirmative rather than negative if possible. Assertions should be within a range of predicted results, or exact values, whenever possible (otherwise you may end up with something that doesn't match your actual expectations but passes the assertion) unless your code only cares about whether it is empty or not.
+3. Assertions should be affirmative rather than negative if possible. Assertions should be within a range of predicted results, or exact values, whenever possible (otherwise you may end up with something that doesn't match your actual expectations but passes the assertion) unless your code only cares about whether it is empty or not.
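
The three kinds of invalid assertion listed above can be illustrated with a short sketch. It uses plain Java checks rather than the project's test framework so that it runs on its own; `AssertionSketch`, `evens` and `check` are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AssertionSketch {

    // Hypothetical method under test: keeps only the even numbers.
    static List<Integer> evens(List<Integer> input) {
        List<Integer> result = new ArrayList<>();
        for (int n : input) {
            if (n % 2 == 0) {
                result.add(n);
            }
        }
        return result;
    }

    // Stand-in for a test framework's assertion method.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        List<Integer> result = evens(Arrays.asList(1, 2, 3, 4));

        // Invalid (1): comparing different types. A List never equals a String,
        // so this comparison is false no matter what evens() returns.
        boolean typeMismatch = result.equals("[2, 4]");
        check(!typeMismatch, "a List never equals a String");

        // Invalid (2): evens() always returns a list, so this check cannot fail.
        check(result != null, "always passes, proves nothing");

        // Valid (3): affirmative assertion on the exact expected value.
        check(result.equals(Arrays.asList(2, 4)), "expected [2, 4]");
    }
}
```

Only the last check would catch a regression in `evens`; the first two pass regardless of what the method computes.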
### 8. Some points to note for unit tests
@@ -90,17 +92,18 @@ For example @Ignore("see #1").
The test will fail when the code in the unit test throws an exception. Therefore, there is no need to use try-catch to catch exceptions.
- ```java
- @Test
- public void testMethod() {
- try {
- // Some code
- } catch (MyException e) {
- Assert.fail(e.getMessage()); // Noncompliant
- }
- }
- ```
-You should this:
+ ```java
+ @Test
+ public void testMethod() {
+ try {
+ // Some code
+ } catch (MyException e) {
+ Assert.fail(e.getMessage()); // Noncompliant
+ }
+ }
+ ```
+
+You should do this:
```java
@Test
diff --git a/docs/docs/en/contribute/log-specification.md b/docs/docs/en/contribute/log-specification.md
index 69692495e1c8..9746b05ca474 100644
--- a/docs/docs/en/contribute/log-specification.md
+++ b/docs/docs/en/contribute/log-specification.md
@@ -35,7 +35,7 @@ The content of the logs determines whether the logs can completely restore the s
### Log format specification
-The logs of Master module and Worker module are printed using the following format.
+The logs of Master module and Worker module are printed using the following format.
```xml
[%level] %date{yyyy-MM-dd HH:mm:ss.SSS Z} %logger{96}:[%line] - [WorkflowInstance-%X{workflowInstanceId:-0}][TaskInstance-%X{taskInstanceId:-0}] - %msg%n
@@ -49,4 +49,5 @@ That is, the workflow instance ID and task instance ID are injected in the print
- The use of printStackTrace() is prohibited for exception handling. This method prints the exception stack information to the standard error output.
- Splitting one log message across multiple lines is prohibited. The log content must stay associated with the information in the log format; printing it on separate lines detaches the content from its timestamp and other fields, and lets entries interleave in high-volume log environments, which makes log retrieval more difficult.
- The use of the "+" operator for splicing log content is prohibited. Use placeholders for formatting logs for printing to improve memory usage efficiency.
-- When the log content includes object instances, you need to make sure to override the toString() method to prevent printing meaningless hashcode.
\ No newline at end of file
+- When the log content includes object instances, you need to make sure to override the toString() method to prevent printing meaningless hashcode.
+
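
As a rough illustration of the last two rules, the sketch below logs an object through a placeholder and overrides `toString()`. It assumes nothing about the project's logging setup: it uses `java.util.logging` from the JDK purely as a dependency-free stand-in (its placeholder syntax is `{0}`), and `TaskInfo` and `LogSketch` are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class TaskInfo {
    private final int taskInstanceId;
    private final String name;

    TaskInfo(int taskInstanceId, String name) {
        this.taskInstanceId = taskInstanceId;
        this.name = name;
    }

    // Without this override the log would show the class name and a hash code.
    @Override
    public String toString() {
        return "TaskInfo{taskInstanceId=" + taskInstanceId + ", name='" + name + "'}";
    }
}

class LogSketch {
    private static final Logger LOGGER = Logger.getLogger(LogSketch.class.getName());

    public static void main(String[] args) {
        TaskInfo task = new TaskInfo(42, "demo");

        // Discouraged: the string is concatenated even when INFO is disabled.
        // LOGGER.info("Task finished: " + task);

        // Preferred: a placeholder; the message is formatted only when
        // the record is actually published by a handler.
        LOGGER.log(Level.INFO, "Task finished: {0}", task);
    }
}
```

The same idea carries over to other logging APIs that support placeholders; only the placeholder syntax differs.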
diff --git a/docs/docs/en/contribute/release/release-prepare.md b/docs/docs/en/contribute/release/release-prepare.md
index fe51973f1aba..74fc22672f5a 100644
--- a/docs/docs/en/contribute/release/release-prepare.md
+++ b/docs/docs/en/contribute/release/release-prepare.md
@@ -4,9 +4,9 @@
Compared with the last release, the `release-docs` of the current release need to be updated if any dependencies or their versions have changed:
- - `dolphinscheduler-dist/release-docs/LICENSE`
- - `dolphinscheduler-dist/release-docs/NOTICE`
- - `dolphinscheduler-dist/release-docs/licenses`
+- `dolphinscheduler-dist/release-docs/LICENSE`
+- `dolphinscheduler-dist/release-docs/NOTICE`
+- `dolphinscheduler-dist/release-docs/licenses`
## Update Version
@@ -29,3 +29,4 @@ For example, to release `x.y.z`, the following updates are required:
- Add new history version
- `docs/docs/en/history-versions.md` and `docs/docs/zh/history-versions.md`: Add the new version and link for `x.y.z`
- `docs/configs/docsdev.js`: change `/dev/` to `/x.y.z/`
+
diff --git a/docs/docs/en/contribute/release/release.md b/docs/docs/en/contribute/release/release.md
index 8451e7a4f923..2b21c342cf6f 100644
--- a/docs/docs/en/contribute/release/release.md
+++ b/docs/docs/en/contribute/release/release.md
@@ -210,7 +210,7 @@ git push origin --tags
> Note1: In this step, you should use a GitHub token as the password because native passwords are no longer supported. See
> https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token for more
> detail about how to create a token.
-
+>
> Note2: After the command finishes, it will have auto-created a `release.properties` file and `*.Backup` files. They are needed
> by the following commands, so DO NOT DELETE THEM.
@@ -293,6 +293,7 @@ cd ~/ds_svn/dev/dolphinscheduler
svn add *
svn --username="${A_USERNAME}" commit -m "release ${VERSION}"
```
+
## Check Release
### Check sha512 hash
@@ -353,14 +354,14 @@ cd ../
Decompress `apache-dolphinscheduler--src.tar.gz` and `python/apache-dolphinscheduler-python-.tar.gz` then check the following items:
-* Check whether source tarball is oversized for including nonessential files
-* `LICENSE` and `NOTICE` files exist
-* Correct year in `NOTICE` file
-* There is only text files but no binary files
-* All source files have ASF headers
-* Codes can be compiled and pass the unit tests (mvn install)
-* The contents of the release match with what's tagged in version control (diff -r a verify_dir tag_dir)
-* Check if there is any extra files or folders, empty folders for example
+* Check whether source tarball is oversized for including nonessential files
+* `LICENSE` and `NOTICE` files exist
+* Correct year in `NOTICE` file
+* There is only text files but no binary files
+* All source files have ASF headers
+* Codes can be compiled and pass the unit tests (mvn install)
+* The contents of the release match with what's tagged in version control (diff -r a verify_dir tag_dir)
+* Check if there is any extra files or folders, empty folders for example
#### Check binary packages
@@ -387,8 +388,8 @@ maybe not correct, you should filter them by yourself) and classify them and pas
### Vote procedure
1. DolphinScheduler community vote: send the vote e-mail to `dev@dolphinscheduler.apache.org`.
-PMC needs to check the rightness of the version according to the document before they vote.
-After at least 72 hours and with at least 3 `+1 and no -1 PMC member` votes, it can come to the next stage of the vote.
+ PMC needs to check the rightness of the version according to the document before they vote.
+ After at least 72 hours and with at least 3 `+1 and no -1 PMC member` votes, it can come to the next stage of the vote.
2. Announce the vote result: send the result vote e-mail to `dev@dolphinscheduler.apache.org`.
@@ -538,3 +539,4 @@ DolphinScheduler Resources:
- Mailing list: dev@dolphinscheduler.apache.org
- Documents: https://dolphinscheduler.apache.org/zh-cn/docs//user_doc/about/introduction.html
```
+
diff --git a/docs/docs/en/guide/alert/alert_plugin_user_guide.md b/docs/docs/en/guide/alert/alert_plugin_user_guide.md
index d8ceebefea78..330a7e3f7f9e 100644
--- a/docs/docs/en/guide/alert/alert_plugin_user_guide.md
+++ b/docs/docs/en/guide/alert/alert_plugin_user_guide.md
@@ -9,7 +9,7 @@ The alarm module supports the following scenarios:
Steps to be used are as follows:
-- Go to `Security -> Alarm Group Management -> Alarm Instance Management -> Alarm Instance`.
+- Go to `Security -> Alarm Group Management -> Alarm Instance Management -> Alarm Instance`.
- Select the corresponding alarm plug-in and fill in the relevant alarm parameters.
- Select `Alarm Group Management`, create an alarm group, and choose the corresponding alarm instance.
@@ -19,4 +19,4 @@ Steps to be used are as follows:
![alert-instance03](../../../../img/new_ui/dev/alert/alert_instance03.png)
-![alert-instance04](../../../../img/new_ui/dev/alert/alert_instance04.png)
\ No newline at end of file
+![alert-instance04](../../../../img/new_ui/dev/alert/alert_instance04.png)
diff --git a/docs/docs/en/guide/alert/dingtalk.md b/docs/docs/en/guide/alert/dingtalk.md
index f0b9196386c0..422811bd8811 100644
--- a/docs/docs/en/guide/alert/dingtalk.md
+++ b/docs/docs/en/guide/alert/dingtalk.md
@@ -8,20 +8,21 @@ The following shows the `DingTalk` configuration example:
## Parameter Configuration
-| **Parameter** | **Description** |
-| --- | --- |
-| Warning Type | Alert on success or failure or both. |
-| WebHook | The format is: [https://oapi.dingtalk.com/robot/send?access\_token=XXXXXX](https://oapi.dingtalk.com/robot/send?access_token=XXXXXX) |
-| Keyword | Custom keywords for security settings. |
-| Secret | Signature of security settings |
-| Msg Type | Message parse type (support txt, markdown, markdownV2, html). |
+| **Parameter** | **Description** |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Warning Type | Alert on success or failure or both. |
+| WebHook | The format is: [https://oapi.dingtalk.com/robot/send?access\_token=XXXXXX](https://oapi.dingtalk.com/robot/send?access_token=XXXXXX) |
+| Keyword | Custom keywords for security settings. |
+| Secret | Signature of security settings |
+| Msg Type | Message parse type (support txt, markdown, markdownV2, html). |
| At User Mobile | When a custom bot sends a message, you can specify the "@person list" by their mobile phone number. When the selected people in the "@people list" receive the message, there will be a `@` message reminder. `No disturb` mode always receives reminders, and "someone @ you" appears in the message. The "At User Mobile" represents mobile phone number of the "@person" |
-| At User Ids | The user ID by "@person" |
-| Proxy | The proxy address of the proxy server. |
-| Port | The proxy port of Proxy-Server. |
-| User | Authentication(Username) for the proxy server. |
-| Password | Authentication(Password) for the proxy server. |
+| At User Ids | The user ID by "@person" |
+| Proxy | The proxy address of the proxy server. |
+| Port | The proxy port of Proxy-Server. |
+| User | Authentication(Username) for the proxy server. |
+| Password | Authentication(Password) for the proxy server. |
## Reference
-- [DingTalk Custom Robot Access Development Documentation](https://open.dingtalk.com/document/robots/custom-robot-access)
\ No newline at end of file
+- [DingTalk Custom Robot Access Development Documentation](https://open.dingtalk.com/document/robots/custom-robot-access)
+
diff --git a/docs/docs/en/guide/alert/email.md b/docs/docs/en/guide/alert/email.md
index d87b2964f1c7..09761733b592 100644
--- a/docs/docs/en/guide/alert/email.md
+++ b/docs/docs/en/guide/alert/email.md
@@ -1,4 +1,5 @@
-# Email
+# Email
+
If you need to use `Email` for alerting, create an alert instance in the alert instance management and select the Email plugin.
The following shows the `Email` configuration example:
@@ -7,4 +8,4 @@ The following shows the `Email` configuration example:
![alert-email](../../../../img/alert/email-alter-setup2-en.png)
-![alert-email](../../../../img/alert/email-alter-setup3-en.png)
\ No newline at end of file
+![alert-email](../../../../img/alert/email-alter-setup3-en.png)
diff --git a/docs/docs/en/guide/alert/enterprise-webexteams.md b/docs/docs/en/guide/alert/enterprise-webexteams.md
index 2427b1c62600..cc7070ce0d1f 100644
--- a/docs/docs/en/guide/alert/enterprise-webexteams.md
+++ b/docs/docs/en/guide/alert/enterprise-webexteams.md
@@ -7,14 +7,14 @@ The following is the `WebexTeams` configuration example:
## Parameter Configuration
-| **Parameter** | **Description** |
-| --- | --- |
-| botAccessToken | The access token of robot. |
-| roomID | The ID of the room that receives message (only support one room ID). |
-| toPersonId | The person ID of the recipient when sending a private 1:1 message. |
-| toPersonEmail | The email address of the recipient when sending a private 1:1 message. |
+| **Parameter** | **Description** |
+|-----------------|-------------------------------------------------------------------------------------------------------------------------|
+| botAccessToken | The access token of robot. |
+| roomID | The ID of the room that receives message (only support one room ID). |
+| toPersonId | The person ID of the recipient when sending a private 1:1 message. |
+| toPersonEmail | The email address of the recipient when sending a private 1:1 message. |
| atSomeoneInRoom | If the message destination is room, the emails of the person being @, use `,` (eng commas) to separate multiple emails. |
-| destination |The destination of the message (one message only support one destination). |
+| destination | The destination of the message (one message only support one destination). |
## Create Bot
@@ -58,4 +58,5 @@ The `Room ID` we can acquire it from the `id` of creating a new group chat room
## References:
- [WebexTeams Application Bot Guide](https://developer.webex.com/docs/bots)
-- [WebexTeams Message Guide](https://developer.webex.com/docs/api/v1/messages/create-a-message)
\ No newline at end of file
+- [WebexTeams Message Guide](https://developer.webex.com/docs/api/v1/messages/create-a-message)
+
diff --git a/docs/docs/en/guide/alert/enterprise-wechat.md b/docs/docs/en/guide/alert/enterprise-wechat.md
index a4fdb84bd552..baa2eb092083 100644
--- a/docs/docs/en/guide/alert/enterprise-wechat.md
+++ b/docs/docs/en/guide/alert/enterprise-wechat.md
@@ -40,7 +40,6 @@ The following is the `query userId` API example:
APP: https://work.weixin.qq.com/api/doc/90000/90135/90236
-
### Group Chat
The Group Chat send type notifies the alert results via a group chat created by the Enterprise WeChat API. Messages are sent to all members of the group; sending to specified users is not supported.
@@ -68,4 +67,5 @@ The following is the `create new group chat` API and `query userId` API example:
## Reference
-- Group Chat:https://work.weixin.qq.com/api/doc/90000/90135/90248
\ No newline at end of file
+- Group Chat: https://work.weixin.qq.com/api/doc/90000/90135/90248
+
diff --git a/docs/docs/en/guide/alert/feishu.md b/docs/docs/en/guide/alert/feishu.md
index bb0e94675cb9..93a4e6ac293f 100644
--- a/docs/docs/en/guide/alert/feishu.md
+++ b/docs/docs/en/guide/alert/feishu.md
@@ -10,6 +10,7 @@ The following shows the `Feishu` configuration example:
## Parameter Configuration
* Webhook
+
> Copy the robot webhook URL shown below:
![alert-feishu-webhook](../../../../img/new_ui/dev/alert/alert_feishu_webhook.png)
diff --git a/docs/docs/en/guide/alert/http.md b/docs/docs/en/guide/alert/http.md
index 3725b516f73a..ba5c00921124 100644
--- a/docs/docs/en/guide/alert/http.md
+++ b/docs/docs/en/guide/alert/http.md
@@ -4,13 +4,13 @@ If you need to use `Http script` for alerting, create an alert instance in the a
## Parameter Configuration
-| **Parameter** | **Description** |
-| --- | --- |
-| URL | The `Http` request URL needs to contain protocol, host, path and parameters if the method is `GET`. |
-| Request Type | Select the request type from `POST` or `GET`. |
-| Headers | The headers of the `Http` request in JSON format. |
-| Body | The request body of the `Http` request in JSON format, when using `POST` method to alert. |
-| Content Field | The field name to place the alert information. |
+| **Parameter** | **Description** |
+|---------------|-----------------------------------------------------------------------------------------------------|
+| URL | The `Http` request URL needs to contain protocol, host, path and parameters if the method is `GET`. |
+| Request Type | Select the request type from `POST` or `GET`. |
+| Headers | The headers of the `Http` request in JSON format. |
+| Body | The request body of the `Http` request in JSON format, when using `POST` method to alert. |
+| Content Field | The field name to place the alert information. |
## Send Type
@@ -28,4 +28,4 @@ The following shows the `GET` configuration example:
Send alert information inside `Http` body by `Http` POST method.
The following shows the `POST` configuration example:
-![enterprise-wechat-app-msg-config](../../../../img/alert/http-post-example.png)
\ No newline at end of file
+![enterprise-wechat-app-msg-config](../../../../img/alert/http-post-example.png)
diff --git a/docs/docs/en/guide/alert/script.md b/docs/docs/en/guide/alert/script.md
index b87b4f5f82d3..0f0e3a300b39 100644
--- a/docs/docs/en/guide/alert/script.md
+++ b/docs/docs/en/guide/alert/script.md
@@ -1,18 +1,18 @@
# Script
-If you need to use `Shell script` for alerting, create an alert instance in the alert instance management and select the `Script` plugin.
+If you need to use `Shell script` for alerting, create an alert instance in the alert instance management and select the `Script` plugin.
The following shows the `Script` configuration example:
![dingtalk-plugin](../../../../img/alert/script-plugin.png)
## Parameter Configuration
-| **Parameter** | **Description** |
-| --- | --- |
-| User Params | User defined parameters will pass to the script. |
-| Script Path |The file location path in the server. |
-| Type | Support `Shell` script. |
+| **Parameter** | **Description** |
+|---------------|--------------------------------------------------|
+| User Params | User defined parameters will pass to the script. |
+| Script Path | The file location path in the server. |
+| Type | Support `Shell` script. |
### Note
-Consider the script file access privileges with the executing tenant.
\ No newline at end of file
+Consider the script file access privileges with the executing tenant.
diff --git a/docs/docs/en/guide/alert/telegram.md b/docs/docs/en/guide/alert/telegram.md
index cdfb026ed4ab..5138b6772dfe 100644
--- a/docs/docs/en/guide/alert/telegram.md
+++ b/docs/docs/en/guide/alert/telegram.md
@@ -7,17 +7,17 @@ The following shows the `Telegram` configuration example:
## Parameter Configuration
-| **Parameter** | **Description** |
-| --- | --- |
-| WebHook | The WebHook of Telegram when use robot to send message. |
-| botToken | The access token of robot. |
-| chatId | Sub Telegram Channel. |
-| parseMode | Message parse type (support txt, markdown, markdownV2, html). |
-| EnableProxy | Enable proxy sever. |
-| Proxy | The proxy address of the proxy server. |
-| Port | The proxy port of proxy server. |
-| User | Authentication(Username) for the proxy server. |
-| Password | Authentication(Password) for the proxy server. |
+| **Parameter** | **Description** |
+|---------------|---------------------------------------------------------------|
+| WebHook | The WebHook of Telegram when use robot to send message. |
+| botToken | The access token of robot. |
+| chatId | Sub Telegram Channel. |
+| parseMode | Message parse type (support txt, markdown, markdownV2, html). |
+| EnableProxy | Enable proxy server. |
+| Proxy | The proxy address of the proxy server. |
+| Port | The proxy port of proxy server. |
+| User | Authentication(Username) for the proxy server. |
+| Password | Authentication(Password) for the proxy server. |
### NOTE
@@ -34,4 +34,5 @@ The webhook needs to be able to receive and use the same JSON body of HTTP POST
- [Telegram Application Bot Guide](https://core.telegram.org/bots)
- [Telegram Bots Api](https://core.telegram.org/bots/api)
-- [Telegram SendMessage Api](https://core.telegram.org/bots/api#sendmessage)
\ No newline at end of file
+- [Telegram SendMessage Api](https://core.telegram.org/bots/api#sendmessage)
+
diff --git a/docs/docs/en/guide/data-quality.md b/docs/docs/en/guide/data-quality.md
index ec308df3bb8f..ef0d5e65b46f 100644
--- a/docs/docs/en/guide/data-quality.md
+++ b/docs/docs/en/guide/data-quality.md
@@ -1,4 +1,5 @@
# Data Quality
+
## Introduction
The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The data quality task runs on Spark 2.4.0; other versions have not been verified, so users need to verify them on their own.
@@ -7,9 +8,9 @@ The execution logic of the data quality task is as follows:
- The user defines the task in the interface, and the user input value is stored in `TaskParam`.
- When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
-- Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine.
+- Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine.
- The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
-`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user.
+ `Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user.
- If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
- Add config : `/conf/common.properties`
@@ -27,14 +28,14 @@ data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
## Detailed Inspection Logic
-| **Parameter** | **Description** |
-| ----- | ---- |
-| CheckMethod | [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed. |
-| CheckFormula | <ul><li>Expected-Actual</li><li>Actual-Expected</li><li>(Actual/Expected)x100%</li><li>(Expected-Actual)/Expected x100%</li></ul> |
-| Operator | =, >, >=, <, <=, != |
+| **Parameter** | **Description** |
+|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| CheckMethod | [CheckFormula][Operator][Threshold], if the result is true, it indicates that the data does not meet expectations, and the failure strategy is executed. |
+| CheckFormula | Expected-Actual; Actual-Expected; (Actual/Expected)x100%; (Expected-Actual)/Expected x100% |
+
In the example, assuming that the check formula is Actual-Expected, the actual value is 10, the expected value is 9, the operator is >, and the threshold is 0, the result 10 - 9 > 0 is true. This means that the number of rows with empty values in the column has exceeded the threshold, and the task is judged to have failed.
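
As a rough illustration of the `[CheckFormula][Operator][Threshold]` rule, the following minimal Python sketch evaluates a check. The formula names follow the check methods listed in this guide; the function `check_fails` and the two lookup tables are hypothetical names used only for illustration, not DolphinScheduler code.

```python
import operator

# Formula names mirror the check methods named in this guide (illustrative mapping).
FORMULAS = {
    "Expected-Actual": lambda actual, expected: expected - actual,
    "Actual-Expected": lambda actual, expected: actual - expected,
    "(Actual/Expected)x100%": lambda actual, expected: actual / expected * 100,
    "(Expected-Actual)/Expected x100%": lambda actual, expected: (expected - actual) / expected * 100,
}

OPERATORS = {
    "=": operator.eq, ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le, "!=": operator.ne,
}

def check_fails(formula, op, threshold, actual, expected):
    """Return True when [CheckFormula][Operator][Threshold] holds, i.e. the data misses expectations."""
    value = FORMULAS[formula](actual, expected)
    return OPERATORS[op](value, threshold)

# The example from the text: actual 10, expected 9, operator >, threshold 0.
print(check_fails("Actual-Expected", ">", 0, actual=10, expected=9))  # True
```

A `True` result corresponds to the case where the failure strategy would be executed.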
# Task Operation Guide
@@ -50,7 +51,6 @@ The goal of the null value check is to check the number of empty rows in the spe
```sql
SELECT COUNT(*) AS miss FROM ${src_table} WHERE (${src_field} is null or ${src_field} = '') AND (${src_filter})
```
-
- The SQL to calculate the total number of rows in the table is as follows:
```sql
@@ -61,155 +61,163 @@ The goal of the null value check is to check the number of empty rows in the spe
![dataquality_null_check](../../../img/tasks/demo/null_check.png)
-| **Parameter** | **Description** |
-| ----- | ---- |
-| Source data type | Select MySQL, PostgreSQL, etc. |
-| Source data source | The corresponding data source under the source data type. |
-| Source data table | Drop-down to select the table where the validation data is located. |
-| Src filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
-| Src table check column | Drop-down to select the check column name. |
-| Check method |
[Expected-Actual]
[Actual-Expected]
[Actual/Expected]x100%
[(Expected-Actual)/Expected]x100%
|
-| Check operators | =, >, >=, <, <=, ! = |
-| Threshold | The value used in the formula for comparison. |
-| Failure strategy |
Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.
Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.
|
-| Expected value type | Select the desired type from the drop-down menu. |
+| **Parameter** | **Description** |
+|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source data type | Select MySQL, PostgreSQL, etc. |
+| Source data source | The corresponding data source under the source data type. |
+| Source data table | Drop-down to select the table where the validation data is located. |
+| Src filter conditions | As the name suggests; also applied when counting the total number of rows in the table. Optional. |
+| Src table check column | Drop-down to select the check column name. |
+| Check method | [Expected-Actual]; [Actual-Expected]; [Actual/Expected]x100%; [(Expected-Actual)/Expected]x100% |
+| Check operators | =, >, >=, <, <=, != |
+| Threshold | The value used in the formula for comparison. |
+| Failure strategy | Alert: the data quality task fails, the DolphinScheduler task result is successful, and an alert is sent. Blocking: the data quality task fails, the DolphinScheduler task result is failed, and an alert is sent. |
+| Expected value type | Select the desired type from the drop-down menu. |
## Timeliness Check of Single Table Check
+
### Inspection Introduction
+
The timeliness check verifies whether the data is processed within the expected time. A start time and an end time define the time range. If the amount of data within the time range does not reach the set threshold, the check task is judged as failed.
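
The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the sample timestamps, and the rule "fail when fewer than `threshold` rows fall inside the window" are hypothetical and do not reproduce the SQL that DolphinScheduler generates.

```python
from datetime import datetime

def rows_in_range(timestamps, start, end, time_format="%Y-%m-%d %H:%M:%S"):
    """Count timestamp strings that fall inside the closed range [start, end]."""
    lo = datetime.strptime(start, time_format)
    hi = datetime.strptime(end, time_format)
    return sum(lo <= datetime.strptime(ts, time_format) <= hi for ts in timestamps)

def timeliness_fails(timestamps, start, end, threshold):
    # Assumed rule: the check fails when fewer than `threshold` rows arrived in the window.
    return rows_in_range(timestamps, start, end) < threshold

sample = ["2022-01-01 00:10:00", "2022-01-01 00:50:00", "2022-01-01 02:00:00"]
print(timeliness_fails(sample, "2022-01-01 00:00:00", "2022-01-01 01:00:00", threshold=3))  # True
```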
### Interface Operation Guide
![dataquality_timeliness_check](../../../img/tasks/demo/timeliness_check.png)
-| **Parameter** | **Description** |
-| ----- | ---- |
-| Source data type | Select MySQL, PostgreSQL, etc.
-| Source data source | The corresponding data source under the source data type.
-| Source data table | Drop-down to select the table where the validation data is located. |
-| Src filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
-| Src table check column | Drop-down to select check column name. |
-| Start time | The start time of a time range. |
-| end time | The end time of a time range. |
-| Time Format | Set the corresponding time format. |
-| Check method |
[Expected-Actual]
[Actual-Expected]
[Actual/Expected]x100%
[(Expected-Actual)/Expected]x100%
|
-| Check operators | =, >, >=, <, <=, ! = |
-| Threshold | The value used in the formula for comparison. |
-| Failure strategy |
Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.
Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.
|
-| Expected value type | Select the desired type from the drop-down menu. |
+| **Parameter** | **Description** |
+|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source data type | Select MySQL, PostgreSQL, etc. |
+| Source data source | The corresponding data source under the source data type. |
+| Source data table | Drop-down to select the table where the validation data is located. |
+| Src filter conditions | As the name suggests; also applied when counting the total number of rows in the table. Optional. |
+| Src table check column | Drop-down to select the check column name. |
+| Start time | The start time of a time range. |
+| end time | The end time of a time range. |
+| Time Format | Set the corresponding time format. |
+| Check method | [Expected-Actual]; [Actual-Expected]; [Actual/Expected]x100%; [(Expected-Actual)/Expected]x100% |
+| Check operators | =, >, >=, <, <=, != |
+| Threshold | The value used in the formula for comparison. |
+| Failure strategy | Alert: the data quality task fails, the DolphinScheduler task result is successful, and an alert is sent. Blocking: the data quality task fails, the DolphinScheduler task result is failed, and an alert is sent. |
+| Expected value type | Select the desired type from the drop-down menu. |
## Field Length Check for Single Table Check
### Inspection Introduction
+
The goal of field length verification is to check whether the length of the selected field meets the expectations. If there is data that does not meet the requirements, and the number of rows exceeds the threshold, the task will be judged to fail.
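
A minimal Python sketch of the counting step follows. It assumes the logical operator and the length limit describe the condition a valid value must satisfy, which the text does not state explicitly; `count_length_violations` is a hypothetical helper, not part of DolphinScheduler.

```python
import operator

OPS = {
    "=": operator.eq, ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le, "!=": operator.ne,
}

def count_length_violations(values, op, limit):
    """Count values whose length does NOT satisfy `len(value) <op> limit`."""
    satisfies = OPS[op]
    return sum(not satisfies(len(v), limit) for v in values)

names = ["ann", "bob", "alexander", "li"]
print(count_length_violations(names, "<=", 5))  # 1 ("alexander" is longer than 5)
```

The resulting count would then be the actual value compared against the threshold.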
### Interface Operation Guide
![dataquality_length_check](../../../img/tasks/demo/field_length_check.png)
-| **Parameter** | **Description** |
-| ----- | ---- |
-| Source data type | Select MySQL, PostgreSQL, etc. |
-| Source data source | The corresponding data source under the source data type. |
-| Source data table | Drop-down to select the table where the validation data is located. |
-| Src filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
-| Src table check column | Drop-down to select the check column name. |
-| Logical operators | =, >, >=, <, <=, ! = |
-| Field length limit | Like the title. |
-| Check method |
[Expected-Actual]
[Actual-Expected]
[Actual/Expected]x100%
[(Expected-Actual)/Expected]x100%
|
-| Check operators | =, >, >=, <, <=, ! = |
-| Threshold | The value used in the formula for comparison. |
-| Failure strategy |
Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.
Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.
|
-| Expected value type | Select the desired type from the drop-down menu. |
+| **Parameter** | **Description** |
+|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source data type | Select MySQL, PostgreSQL, etc. |
+| Source data source | The corresponding data source under the source data type. |
+| Source data table | Drop-down to select the table where the validation data is located. |
+| Src filter conditions | As the name suggests; also applied when counting the total number of rows in the table. Optional. |
+| Src table check column | Drop-down to select the check column name. |
+| Logical operators | =, >, >=, <, <=, != |
+| Field length limit | The length limit that the field length is compared against. |
+| Check method | [Expected-Actual]; [Actual-Expected]; [Actual/Expected]x100%; [(Expected-Actual)/Expected]x100% |
+| Check operators | =, >, >=, <, <=, != |
+| Threshold | The value used in the formula for comparison. |
+| Failure strategy | Alert: the data quality task fails, the DolphinScheduler task result is successful, and an alert is sent. Blocking: the data quality task fails, the DolphinScheduler task result is failed, and an alert is sent. |
+| Expected value type | Select the desired type from the drop-down menu. |
## Uniqueness Check for Single Table Check
### Inspection Introduction
+
The goal of the uniqueness check is to check whether the fields are duplicated. It is generally used to check whether the primary key is duplicated. If there are duplicates and the threshold is reached, the check task will be judged to be failed.
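
As a sketch of what "duplicated" can mean here, the Python snippet below counts distinct values that occur more than once. How DolphinScheduler counts duplicates internally is not stated in this guide, so the counting rule and the name `count_duplicated_values` are assumptions.

```python
from collections import Counter

def count_duplicated_values(values):
    """Return how many distinct values occur more than once."""
    return sum(1 for n in Counter(values).values() if n > 1)

ids = [1, 2, 2, 3, 3, 3, 4]
print(count_duplicated_values(ids))  # 2 (the values 2 and 3 are duplicated)
```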
### Interface Operation Guide
![dataquality_uniqueness_check](../../../img/tasks/demo/uniqueness_check.png)
-| **Parameter** | **Description** |
-| ----- | ---- |
-| Source data type | Select MySQL, PostgreSQL, etc. |
-| Source data source | The corresponding data source under the source data type. |
-| Source data table | Drop-down to select the table where the validation data is located. |
-| Src filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
-| Src table check column | Drop-down to select the check column name. |
-| Check method |
[Expected-Actual]
[Actual-Expected]
[Actual/Expected]x100%
[(Expected-Actual)/Expected]x100%
|
-| Check operators | =, >, >=, <, <=, ! = |
-| Threshold | The value used in the formula for comparison. |
-| Failure strategy |
Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.
Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.
|
-| Expected value type | Select the desired type from the drop-down menu. |
+| **Parameter** | **Description** |
+|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source data type | Select MySQL, PostgreSQL, etc. |
+| Source data source | The corresponding data source under the source data type. |
+| Source data table | Drop-down to select the table where the validation data is located. |
+| Src filter conditions | As the name suggests; also applied when counting the total number of rows in the table. Optional. |
+| Src table check column | Drop-down to select the check column name. |
+| Check method | [Expected-Actual]; [Actual-Expected]; [Actual/Expected]x100%; [(Expected-Actual)/Expected]x100% |
+| Check operators | =, >, >=, <, <=, != |
+| Threshold | The value used in the formula for comparison. |
+| Failure strategy | Alert: the data quality task fails, the DolphinScheduler task result is successful, and an alert is sent. Blocking: the data quality task fails, the DolphinScheduler task result is failed, and an alert is sent. |
+| Expected value type | Select the desired type from the drop-down menu. |
## Regular Expression Check for Single Table Check
### Inspection Introduction
+
The goal of regular expression verification is to check whether the format of the value of a field meets the requirements, such as time format, email format, ID card format, etc. If there is data that does not meet the format and exceeds the threshold, the task will be judged as failed.
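
The following Python sketch counts values that do not match a pattern. It assumes the whole value must match (full match); whether the underlying engine uses full or partial matching is not stated here, and `count_format_violations` is a hypothetical helper.

```python
import re

def count_format_violations(values, pattern):
    """Count values that do not fully match the regular expression."""
    regex = re.compile(pattern)
    return sum(regex.fullmatch(v) is None for v in values)

dates = ["2022-01-01", "2022/01/02", "2022-13-40", "n/a"]
# The pattern only checks the shape (digits and dashes), not calendar validity.
print(count_format_violations(dates, r"\d{4}-\d{2}-\d{2}"))  # 2
```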
### Interface Operation Guide
![dataquality_regex_check](../../../img/tasks/demo/regexp_check.png)
-| **Parameter** | **Description** |
-| ----- | ---- |
-| Source data type | Select MySQL, PostgreSQL, etc. |
-| Source data source | The corresponding data source under the source data type. |
-| Source data table | Drop-down to select the table where the validation data is located. |
-| Src filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
-| Src table check column | Drop-down to select check column name. |
-| Regular expression | As title. |
-| Check method |
[Expected-Actual]
[Actual-Expected]
[Actual/Expected]x100%
[(Expected-Actual)/Expected]x100%
|
-| Check operators | =, >, >=, <, <=, ! = |
-| Threshold | The value used in the formula for comparison. |
-| Failure strategy |
Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.
Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.
|
-| Expected value type | Select the desired type from the drop-down menu. |
+| **Parameter** | **Description** |
+|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source data type | Select MySQL, PostgreSQL, etc. |
+| Source data source | The corresponding data source under the source data type. |
+| Source data table | Drop-down to select the table where the validation data is located. |
+| Src filter conditions | As the name suggests; also applied when counting the total number of rows in the table. Optional. |
+| Src table check column | Drop-down to select the check column name. |
+| Regular expression | The regular expression used to validate the check column. |
+| Check method | [Expected-Actual]; [Actual-Expected]; [Actual/Expected]x100%; [(Expected-Actual)/Expected]x100% |
+| Check operators | =, >, >=, <, <=, != |
+| Threshold | The value used in the formula for comparison. |
+| Failure strategy | Alert: the data quality task fails, the DolphinScheduler task result is successful, and an alert is sent. Blocking: the data quality task fails, the DolphinScheduler task result is failed, and an alert is sent. |
+| Expected value type | Select the desired type from the drop-down menu. |
## Enumeration Value Validation for Single Table Check
+
### Inspection Introduction
+
The goal of enumeration value verification is to check whether the value of a field is within the range of the enumeration value. If there is data that is not in the range of the enumeration value and exceeds the threshold, the task will be judged to fail.
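
A minimal Python sketch of the enumeration check, assuming the list of enumeration values is given as a comma-separated string (as the parameter description in this guide indicates). The helper name and sample data are hypothetical.

```python
def count_out_of_enum(values, enum_list):
    """Count values that are not in the comma-separated enumeration list."""
    allowed = {item.strip() for item in enum_list.split(",")}
    return sum(v not in allowed for v in values)

statuses = ["new", "done", "pending", "archived", "done"]
print(count_out_of_enum(statuses, "new, pending, done"))  # 1 ("archived" is not allowed)
```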
### Interface Operation Guide
![dataquality_enum_check](../../../img/tasks/demo/enumeration_check.png)
-| **Parameter** | **Description** |
-| ----- | ---- |
-| Source data type | Select MySQL, PostgreSQL, etc. |
-| Source data source | The corresponding data source under the source data type. |
-| Source data table | Drop-down to select the table where the validation data is located. |
-| Src table filter conditions | Such as title, also used when counting the total number of rows in the table, optional. |
-| Src table check column | Drop-down to select the check column name. |
-| List of enumeration values | Separated by commas. |
-| Check method |
[Expected-Actual]
[Actual-Expected]
[Actual/Expected]x100%
[(Expected-Actual)/Expected]x100%
|
-| Check operators | =, >, >=, <, <=, ! = |
-| Threshold | The value used in the formula for comparison. |
-| Failure strategy |
Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.
Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.
|
-| Expected value type | Select the desired type from the drop-down menu. |
+| **Parameter** | **Description** |
+|-----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source data type | Select MySQL, PostgreSQL, etc. |
+| Source data source | The corresponding data source under the source data type. |
+| Source data table | Drop-down to select the table where the validation data is located. |
+| Src table filter conditions | As the name suggests; also applied when counting the total number of rows in the table. Optional. |
+| Src table check column | Drop-down to select the check column name. |
+| List of enumeration values | Separated by commas. |
+| Check method | [Expected-Actual]; [Actual-Expected]; [Actual/Expected]x100%; [(Expected-Actual)/Expected]x100% |
+| Check operators | =, >, >=, <, <=, != |
+| Threshold | The value used in the formula for comparison. |
+| Failure strategy | Alert: the data quality task fails, the DolphinScheduler task result is successful, and an alert is sent. Blocking: the data quality task fails, the DolphinScheduler task result is failed, and an alert is sent. |
+| Expected value type | Select the desired type from the drop-down menu. |
## Table Row Number Verification for Single Table Check
### Inspection Introduction
+
The goal of table row number verification is to check whether the number of rows in the table reaches the expected value. If the number of rows does not meet the standard, the task will be judged as failed.
### Interface Operation Guide
![dataquality_count_check](../../../img/tasks/demo/table_count_check.png)
-| **Parameter** | **Description** |
-| ----- | ---- |
-| Source data type | Select MySQL, PostgreSQL, etc. |
-| Source data source | The corresponding data source under the source data type. |
-| Source data table | Drop-down to select the table where the validation data is located. |
-| Src filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
-| Src table check column | Drop-down to select the check column name. |
-| Check method |
[Expected-Actual]
[Actual-Expected]
[Actual/Expected]x100%
[(Expected-Actual)/Expected]x100%
|
-| Check operators | =, >, >=, <, <=, ! = |
-| Threshold | The value used in the formula for comparison. |
-| Failure strategy |
Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.
Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.
|
-| Expected value type | Select the desired type from the drop-down menu. |
+| **Parameter** | **Description** |
+|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source data type | Select MySQL, PostgreSQL, etc. |
+| Source data source | The corresponding data source under the source data type. |
+| Source data table | Drop-down to select the table where the validation data is located. |
+| Src filter conditions | As the name suggests; also applied when counting the total number of rows in the table. Optional. |
+| Src table check column | Drop-down to select the check column name. |
+| Check method | [Expected-Actual]; [Actual-Expected]; [Actual/Expected]x100%; [(Expected-Actual)/Expected]x100% |
+| Check operators | =, >, >=, <, <=, != |
+| Threshold | The value used in the formula for comparison. |
+| Failure strategy | Alert: the data quality task fails, the DolphinScheduler task result is successful, and an alert is sent. Blocking: the data quality task fails, the DolphinScheduler task result is failed, and an alert is sent. |
+| Expected value type | Select the desired type from the drop-down menu. |
## Custom SQL Check for Single Table Check
@@ -217,36 +225,38 @@ The goal of table row number verification is to check whether the number of rows
![dataquality_custom_sql_check](../../../img/tasks/demo/custom_sql_check.png)
-| **Parameter** | **Description** |
-| ----- | ---- |
-| Source data type | Select MySQL, PostgreSQL, etc. |
-| Source data source | The corresponding data source under the source data type. |
-| Source data table | Drop-down to select the table where the data to be verified is located. |
-| Actual value name | Alias in SQL for statistical value calculation, such as max_num. |
-|Actual value calculation SQL | SQL for outputting actual values. Note:
The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.
Select max(a) as max_num from ${src_table}, the table name must be filled like this.
|
-| Src filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
-| Check method |
[Expected-Actual]
[Actual-Expected]
[Actual/Expected]x100%
[(Expected-Actual)/Expected]x100%
|
-| Check operators | =, >, >=, <, <=, ! = |
-| Threshold | The value used in the formula for comparison. |
-| Failure strategy |
Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.
Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.
|
-| Expected value type | Select the desired type from the drop-down menu. |
+| **Parameter** | **Description** |
+|------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source data type | Select MySQL, PostgreSQL, etc. |
+| Source data source | The corresponding data source under the source data type. |
+| Source data table | Drop-down to select the table where the data to be verified is located. |
+| Actual value name | Alias in SQL for statistical value calculation, such as max_num. |
+| Actual value calculation SQL | SQL for outputting the actual value. Note: the SQL must be a statistical SQL statement, such as one that counts rows or calculates a maximum or minimum value. Example: `select max(a) as max_num from ${src_table}`; the table name must be written as `${src_table}`. |
+| Src filter conditions | As the name suggests; also applied when counting the total number of rows in the table. Optional. |
+| Check method | [Expected-Actual]; [Actual-Expected]; [Actual/Expected]x100%; [(Expected-Actual)/Expected]x100% |
+| Check operators | =, >, >=, <, <=, != |
+| Threshold | The value used in the formula for comparison. |
+| Failure strategy | Alert: the data quality task fails, the DolphinScheduler task result is successful, and an alert is sent. Blocking: the data quality task fails, the DolphinScheduler task result is failed, and an alert is sent. |
+| Expected value type | Select the desired type from the drop-down menu. |
## Accuracy Check of Multi-table
+
### Inspection Introduction
+
An accuracy check compares the data records of selected fields between two tables. For example:
- table test1
| c1 | c2 |
-| :---: | :---: |
-| a | 1 |
-| b | 2 |
+|:--:|:--:|
+| a | 1 |
+| b | 2 |
- table test2
| c21 | c22 |
-| :---: | :---: |
-| a | 1 |
-| b | 3 |
+|:---:|:---:|
+| a | 1 |
+| b | 3 |
If you compare the data in c1 and c21, the tables test1 and test2 are exactly the same. If you compare c2 and c22, the data in table test1 and table test2 are inconsistent.
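
The comparison on `test1` and `test2` can be sketched in Python as follows. The matching rule used here (count source rows that have no target row equal on the chosen column pairs) is an assumption for illustration; `count_unmatched` is a hypothetical helper, not the query DolphinScheduler runs.

```python
test1 = [("a", 1), ("b", 2)]  # columns c1, c2
test2 = [("a", 1), ("b", 3)]  # columns c21, c22

def count_unmatched(source, target, column_pairs):
    """Count source rows with no target row equal on every (src_idx, tgt_idx) pair."""
    target_keys = {tuple(row[t] for _, t in column_pairs) for row in target}
    return sum(tuple(row[s] for s, _ in column_pairs) not in target_keys for row in source)

print(count_unmatched(test1, test2, [(0, 0)]))  # 0: c1 and c21 agree
print(count_unmatched(test1, test2, [(1, 1)]))  # 1: c2 value 2 has no match in c22
```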
@@ -254,45 +264,47 @@ If you compare the data in c1 and c21, the tables test1 and test2 are exactly th
![dataquality_multi_table_accuracy_check](../../../img/tasks/demo/multi_table_accuracy_check.png)
-| **Parameter** | **Description** |
-| ----- | ---- |
-| Source data type | Select MySQL, PostgreSQL, etc. |
-| Source data source | The corresponding data source under the source data type. |
-| Source data table | Drop-down to select the table where the data to be verified is located. |
-| Src filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
-| Target data type | Choose MySQL, PostgreSQL, etc. |
-| Target data source | The corresponding data source under the source data type. |
-| Target data table | Drop-down to select the table where the data to be verified is located. |
-| Target filter conditions | Such as the title, it will also be used when counting the total number of rows in the table, optional. |
-| Check column | Fill in the source data column, operator and target data column respectively. |
-| Verification method | Select the desired verification method. |
-| Operators | =, >, >=, <, <=, ! = |
-| Failure strategy |
Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent.
Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent.
|
-| Expected value type | Select the desired type in the drop-down menu, only `SrcTableTotalRow`, `TargetTableTotalRow` and fixed value are suitable for selection here. |
+| **Parameter** | **Description** |
+|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source data type | Select MySQL, PostgreSQL, etc. |
+| Source data source | The corresponding data source under the source data type. |
+| Source data table | Drop-down to select the table where the data to be verified is located. |
+| Src filter conditions | As the name suggests; also applied when counting the total number of rows in the table. Optional. |
+| Target data type | Choose MySQL, PostgreSQL, etc. |
+| Target data source | The corresponding data source under the target data type. |
+| Target data table | Drop-down to select the table where the data to be verified is located. |
+| Target filter conditions | As the name suggests; also applied when counting the total number of rows in the table. Optional. |
+| Check column | Fill in the source data column, operator and target data column respectively. |
+| Verification method | Select the desired verification method. |
+| Operators | =, >, >=, <, <=, != |
+| Failure strategy | Alert: the data quality task fails, the DolphinScheduler task result is successful, and an alert is sent. Blocking: the data quality task fails, the DolphinScheduler task result is failed, and an alert is sent. |
+| Expected value type | Select the desired type in the drop-down menu, only `SrcTableTotalRow`, `TargetTableTotalRow` and fixed value are suitable for selection here. |
## Comparison of the values checked by the two tables
+
### Inspection Introduction
+
Two-table value comparison allows users to define a separate statistical SQL statement for each of two tables and compare the resulting values. For example, compute the sum of a column in the source table as sum1, compute the sum of a column in the target table as sum2, and compare sum1 with sum2 to determine the check result.
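
To make the sum1/sum2 comparison concrete, here is a small Python sketch using an in-memory SQLite database. The table names `table_a` and `table_b`, the `amount` column, and the sample rows are hypothetical; in DolphinScheduler the statements would be written against `${src_table}` and `${target_table}`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (amount INTEGER);
    CREATE TABLE table_b (amount INTEGER);
    INSERT INTO table_a VALUES (10), (20), (30);
    INSERT INTO table_b VALUES (10), (20), (25);
""")

def statistic(sql):
    """Run a statistical SQL statement and return its single value."""
    return conn.execute(sql).fetchone()[0]

sum1 = statistic("SELECT SUM(amount) AS sum1 FROM table_a")  # actual value
sum2 = statistic("SELECT SUM(amount) AS sum2 FROM table_b")  # expected value
values_match = (sum1 - sum2) == 0
print(sum1, sum2, values_match)  # 60 55 False
```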
### Interface Operation Guide
![dataquality_multi_table_comparison_check](../../../img/tasks/demo/multi_table_comparison_check.png)
-| **Parameter** | **Description** |
-| ----- | ---- |
-| Source data type | Select MySQL, PostgreSQL, etc. |
-| Source data source | The corresponding data source under the source data type. |
-| Source data table | The table where the data is to be verified. |
-| Actual value name | Calculate the alias in SQL for the actual value, such as max_age1. |
-| Actual value calculation SQL | SQL for outputting actual values. Note:
The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.
Select max(age) as max_age1 from ${src_table} The table name must be filled like this.
|
-| Target data type | Choose MySQL, PostgreSQL, etc. |
-| Target data source | The corresponding data source under the source data type. |
-| Target data table | The table where the data is to be verified. |
-| Expected value name | Calculate the alias in SQL for the expected value, such as max_age2. |
+| **Parameter** | **Description** |
+|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Source data type | Select MySQL, PostgreSQL, etc. |
+| Source data source | The corresponding data source under the source data type. |
+| Source data table | The table where the data is to be verified. |
+| Actual value name | The alias used in the SQL for the actual value, such as max_age1. |
+| Actual value calculation SQL | SQL for outputting the actual value. Note: the SQL must be a statistical SQL statement, such as one that counts rows or calculates a maximum or minimum value. Example: `select max(age) as max_age1 from ${src_table}`; the table name must be written as `${src_table}`. |
+| Target data type | Choose MySQL, PostgreSQL, etc. |
+| Target data source | The corresponding data source under the target data type. |
+| Target data table | The table where the data is to be verified. |
+| Expected value name | The alias used in the SQL for the expected value, such as max_age2. |
| Expected value calculation SQL | SQL for outputting the expected value. Note: the SQL must be a statistical SQL statement, such as one that counts rows or calculates a maximum or minimum value. Example: `select max(age) as max_age2 from ${target_table}`; the table name must be written as `${target_table}`. |
| Failure strategy | Alert: the data quality task fails, the DolphinScheduler task result is successful, and an alert is sent. Blocking: the data quality task fails, the DolphinScheduler task result is failed, and an alert is sent. |
## Task result view
@@ -306,4 +318,4 @@ Two-table value comparison allows users to customize different SQL statistics fo
### Rules Details
-![dataquality_rule_detail](../../../img/tasks/demo/rule_detail.png)
\ No newline at end of file
+![dataquality_rule_detail](../../../img/tasks/demo/rule_detail.png)
diff --git a/docs/docs/en/guide/datasource/athena.md b/docs/docs/en/guide/datasource/athena.md
index ab92e06238ec..035c8f7c4bd5 100644
--- a/docs/docs/en/guide/datasource/athena.md
+++ b/docs/docs/en/guide/datasource/athena.md
@@ -4,15 +4,15 @@
## Datasource Parameters
-| **Datasource** | **Description** |
-| --- | --- |
-| Datasource | Select ATHENA. |
-| Datasource name | Enter the name of the DataSource. |
-| Description | Enter a description of the DataSource. |
-| Username | Set the AWS access key. |
-| Password | Set the AWS secret access key. |
-| AwsRegion | Set the AWS region. |
-| Database name | Enter the database name of the ATHENA connection. |
+| **Datasource** | **Description** |
+|----------------------------|-----------------------------------------------------------|
+| Datasource | Select ATHENA. |
+| Datasource name | Enter the name of the DataSource. |
+| Description | Enter a description of the DataSource. |
+| Username | Set the AWS access key. |
+| Password | Set the AWS secret access key. |
+| AwsRegion | Set the AWS region. |
+| Database name | Enter the database name of the ATHENA connection. |
| Jdbc connection parameters | Parameter settings for ATHENA connection, in JSON format. |
## Native Supported
@@ -20,3 +20,4 @@
- No. See the example in the `DataSource Center` section of [datasource-setting](../howto/datasource-setting.md) to activate this datasource.
- JDBC driver configuration reference document [athena-connect-with-jdbc](https://docs.amazonaws.cn/athena/latest/ug/connect-with-jdbc.html)
- Driver download link [SimbaAthenaJDBC-2.0.31.1000/AthenaJDBC42.jar](https://s3.cn-north-1.amazonaws.com.cn/athena-downloads-cn/drivers/JDBC/SimbaAthenaJDBC-2.0.31.1000/AthenaJDBC42.jar)
+
diff --git a/docs/docs/en/guide/datasource/clickhouse.md b/docs/docs/en/guide/datasource/clickhouse.md
index 0fb78366cd54..8de091a93884 100644
--- a/docs/docs/en/guide/datasource/clickhouse.md
+++ b/docs/docs/en/guide/datasource/clickhouse.md
@@ -4,18 +4,18 @@
## Datasource Parameters
-| **Datasource** | **Description** |
-| --- | --- |
-| Datasource | Select CLICKHOUSE. |
-| Datasource Name | Enter the name of the datasource. |
-| Description | Enter a description of the datasource. |
-| IP/Host Name | Enter the CLICKHOUSE service IP. |
-| Port | Enter the CLICKHOUSE service port. |
-| Username | Set the username for CLICKHOUSE connection. |
-| Password | Set the password for CLICKHOUSE connection. |
-| Database Name | Enter the database name of the CLICKHOUSE connection. |
+| **Datasource** | **Description** |
+|-------------------------|---------------------------------------------------------------|
+| Datasource | Select CLICKHOUSE. |
+| Datasource Name | Enter the name of the datasource. |
+| Description | Enter a description of the datasource. |
+| IP/Host Name | Enter the CLICKHOUSE service IP. |
+| Port | Enter the CLICKHOUSE service port. |
+| Username | Set the username for CLICKHOUSE connection. |
+| Password | Set the password for CLICKHOUSE connection. |
+| Database Name | Enter the database name of the CLICKHOUSE connection. |
| jdbc connect parameters | Parameter settings for CLICKHOUSE connection, in JSON format. |
## Native Supported
-Yes, could use this datasource by default.
\ No newline at end of file
+Yes, could use this datasource by default.
diff --git a/docs/docs/en/guide/datasource/db2.md b/docs/docs/en/guide/datasource/db2.md
index 33d459a0a581..ef839e38c91d 100644
--- a/docs/docs/en/guide/datasource/db2.md
+++ b/docs/docs/en/guide/datasource/db2.md
@@ -4,18 +4,18 @@
## Datasource Parameters
-| **Datasource** | **Description** |
-| --- | --- |
-| Datasource | Select DB2. |
-| Datasource Name | Enter the name of the datasource. |
-| Description | Enter a description of the datasource. |
-| IP/Host Name | Enter the DB2 service IP. |
-| Port | Enter the DB2 service port. |
-| Username | Set the username for DB2 connection. |
-| Password | Set the password for DB2 connection. |
-| Database Name | Enter the database name of the DB2 connection. |
+| **Datasource** | **Description** |
+|-------------------------|--------------------------------------------------------|
+| Datasource | Select DB2. |
+| Datasource Name | Enter the name of the datasource. |
+| Description | Enter a description of the datasource. |
+| IP/Host Name | Enter the DB2 service IP. |
+| Port | Enter the DB2 service port. |
+| Username | Set the username for DB2 connection. |
+| Password | Set the password for DB2 connection. |
+| Database Name | Enter the database name of the DB2 connection. |
| jdbc connect parameters | Parameter settings for DB2 connection, in JSON format. |
## Native Supported
-Yes, could use this datasource by default.
\ No newline at end of file
+Yes, could use this datasource by default.
diff --git a/docs/docs/en/guide/datasource/hive.md b/docs/docs/en/guide/datasource/hive.md
index 2a29ccb31917..4af38ac0eabe 100644
--- a/docs/docs/en/guide/datasource/hive.md
+++ b/docs/docs/en/guide/datasource/hive.md
@@ -6,27 +6,27 @@
## Datasource Parameters
-| **Datasource** | **Description** |
-| --- | --- |
-| Datasource | Select HIVE. |
-| Datasource name | Enter the name of the DataSource. |
-| Description | Enter a description of the DataSource. |
-| IP/Host Name | Enter the HIVE service IP. |
-| Port | Enter the HIVE service port. |
-| Username | Set the username for HIVE connection. |
-| Password | Set the password for HIVE connection. |
-| Database name | Enter the database name of the HIVE connection. |
+| **Datasource** | **Description** |
+|----------------------------|---------------------------------------------------------|
+| Datasource | Select HIVE. |
+| Datasource name | Enter the name of the DataSource. |
+| Description | Enter a description of the DataSource. |
+| IP/Host Name | Enter the HIVE service IP. |
+| Port | Enter the HIVE service port. |
+| Username | Set the username for HIVE connection. |
+| Password | Set the password for HIVE connection. |
+| Database name | Enter the database name of the HIVE connection. |
| Jdbc connection parameters | Parameter settings for HIVE connection, in JSON format. |
-> NOTICE: If you wish to execute multiple HIVE SQL in the same session, you could set `support.hive.oneSession = true` in `common.properties`.
+> NOTICE: If you wish to execute multiple HIVE SQL in the same session, you could set `support.hive.oneSession = true` in `common.properties`.
> It is helpful when you try to set env variables before running HIVE SQL. Default value of `support.hive.oneSession` is `false` and multi-SQLs run in different sessions.
## Use HiveServer2 HA ZooKeeper
![hive-server2](../../../../img/new_ui/dev/datasource/hiveserver2.png)
-NOTICE: If Kerberos is disabled, ensure the parameter `hadoop.security.authentication.startup.state` is false, and parameter `java.security.krb5.conf.path` value sets null.
-If **Kerberos** is enabled, needs to set the following parameters in `common.properties`:
+NOTICE: If Kerberos is disabled, ensure that the parameter `hadoop.security.authentication.startup.state` is `false` and that `java.security.krb5.conf.path` is set to null.
+If **Kerberos** is enabled, set the following parameters in `common.properties`:
```conf
# whether to startup kerberos
@@ -44,4 +44,4 @@ login.user.keytab.path=/opt/hdfs.headless.keytab
## Native Supported
-Yes, could use this datasource by default.
+Yes, could use this datasource by default.
diff --git a/docs/docs/en/guide/datasource/mysql.md b/docs/docs/en/guide/datasource/mysql.md
index 5b9c4642f59f..e4d430fb0dcb 100644
--- a/docs/docs/en/guide/datasource/mysql.md
+++ b/docs/docs/en/guide/datasource/mysql.md
@@ -4,16 +4,16 @@
## Datasource Parameters
-| **Datasource** | **Description** |
-| --- | --- |
-| Datasource | Select MYSQL. |
-| Datasource name | Enter the name of the DataSource. |
-| Description | Enter a description of the DataSource. |
-| IP/Host Name | Enter the MYSQL service IP. |
-| Port | Enter the MYSQL service port. |
-| Username | Set the username for MYSQL connection. |
-| Password | Set the password for MYSQL connection. |
-| Database name | Enter the database name of the MYSQL connection. |
+| **Datasource** | **Description** |
+|----------------------------|----------------------------------------------------------|
+| Datasource | Select MYSQL. |
+| Datasource name | Enter the name of the DataSource. |
+| Description | Enter a description of the DataSource. |
+| IP/Host Name | Enter the MYSQL service IP. |
+| Port | Enter the MYSQL service port. |
+| Username | Set the username for MYSQL connection. |
+| Password | Set the password for MYSQL connection. |
+| Database name | Enter the database name of the MYSQL connection. |
| Jdbc connection parameters | Parameter settings for MYSQL connection, in JSON format. |
## Native Supported
diff --git a/docs/docs/en/guide/datasource/oracle.md b/docs/docs/en/guide/datasource/oracle.md
index c7d217ad5118..4fdaf941952f 100644
--- a/docs/docs/en/guide/datasource/oracle.md
+++ b/docs/docs/en/guide/datasource/oracle.md
@@ -4,18 +4,18 @@
## Datasource Parameters
-| **Datasource** | **Description** |
-| --- | --- |
-| Datasource | Select Oracle. |
-| Datasource Name | Enter the name of the datasource. |
-| Description | Enter a description of the datasource. |
-| IP/Host Name | Enter the Oracle service IP. |
-| Port | Enter the Oracle service port. |
-| Username | Set the username for Oracle connection. |
-| Password | Set the password for Oracle connection. |
-| Database Name | Enter the database name of the Oracle connection. |
+| **Datasource** | **Description** |
+|-------------------------|-----------------------------------------------------------|
+| Datasource | Select Oracle. |
+| Datasource Name | Enter the name of the datasource. |
+| Description | Enter a description of the datasource. |
+| IP/Host Name | Enter the Oracle service IP. |
+| Port | Enter the Oracle service port. |
+| Username | Set the username for Oracle connection. |
+| Password | Set the password for Oracle connection. |
+| Database Name | Enter the database name of the Oracle connection. |
| jdbc connect parameters | Parameter settings for Oracle connection, in JSON format. |
## Native Supported
-Yes, could use this datasource by default.
\ No newline at end of file
+Yes, could use this datasource by default.
diff --git a/docs/docs/en/guide/datasource/postgresql.md b/docs/docs/en/guide/datasource/postgresql.md
index 08d92edd317c..cb3daf41c4d6 100644
--- a/docs/docs/en/guide/datasource/postgresql.md
+++ b/docs/docs/en/guide/datasource/postgresql.md
@@ -4,18 +4,18 @@
## Datasource Parameters
-| **Datasource** | **Description** |
-| --- | --- |
-| Datasource | Select POSTGRESQL. |
-| Datasource name | Enter the name of the DataSource. |
-| Description | Enter a description of the DataSource. |
-| IP/Host Name | Enter the PostgreSQL service IP. |
-| Port | Enter the PostgreSQL service port. |
-| Username | Set the username for PostgreSQL connection. |
-| Password | Set the password for PostgreSQL connection. |
-| Database name | Enter the database name of the PostgreSQL connection. |
+| **Datasource** | **Description** |
+|----------------------------|---------------------------------------------------------------|
+| Datasource | Select POSTGRESQL. |
+| Datasource name | Enter the name of the DataSource. |
+| Description | Enter a description of the DataSource. |
+| IP/Host Name | Enter the PostgreSQL service IP. |
+| Port | Enter the PostgreSQL service port. |
+| Username | Set the username for PostgreSQL connection. |
+| Password | Set the password for PostgreSQL connection. |
+| Database name | Enter the database name of the PostgreSQL connection. |
| Jdbc connection parameters | Parameter settings for PostgreSQL connection, in JSON format. |
## Native Supported
-Yes, could use this datasource by default.
+Yes, could use this datasource by default.
diff --git a/docs/docs/en/guide/datasource/presto.md b/docs/docs/en/guide/datasource/presto.md
index 6302954056e6..70b7fb90d9e0 100644
--- a/docs/docs/en/guide/datasource/presto.md
+++ b/docs/docs/en/guide/datasource/presto.md
@@ -4,19 +4,18 @@
## Datasource Parameters
-| **Datasource** | **Description** |
-| --- | --- |
-| Datasource | Select Presto. |
-| Datasource Name | Enter the name of the datasource. |
-| Description | Enter a description of the datasource. |
-| IP/Host Name | Enter the Presto service IP. |
-| Port | Enter the Presto service port. |
-| Username | Set the username for Presto connection. |
-| Password | Set the password for Presto connection. |
-| Database Name | Enter the database name of the Presto connection. |
+| **Datasource** | **Description** |
+|-------------------------|-----------------------------------------------------------|
+| Datasource | Select Presto. |
+| Datasource Name | Enter the name of the datasource. |
+| Description | Enter a description of the datasource. |
+| IP/Host Name | Enter the Presto service IP. |
+| Port | Enter the Presto service port. |
+| Username | Set the username for Presto connection. |
+| Password | Set the password for Presto connection. |
+| Database Name | Enter the database name of the Presto connection. |
| jdbc connect parameters | Parameter settings for Presto connection, in JSON format. |
-
## Native Supported
-Yes, could use this datasource by default.
\ No newline at end of file
+Yes, could use this datasource by default.
diff --git a/docs/docs/en/guide/datasource/redshift.md b/docs/docs/en/guide/datasource/redshift.md
index 3dbae981d111..60dd982492dd 100644
--- a/docs/docs/en/guide/datasource/redshift.md
+++ b/docs/docs/en/guide/datasource/redshift.md
@@ -4,18 +4,18 @@
## Datasource Parameters
-| **Datasource** | **Description** |
-| --- | --- |
-| Datasource | Select Redshift. |
-| Datasource Name | Enter the name of the datasource. |
-| Description | Enter a description of the datasource. |
-| IP/Host Name | Enter the Redshift service IP. |
-| Port | Enter the Redshift service port. |
-| Username | Set the username for Redshift connection. |
-| Password | Set the password for Redshift connection. |
-| Database Name | Enter the database name of the Redshift connection. |
+| **Datasource** | **Description** |
+|-------------------------|-------------------------------------------------------------|
+| Datasource | Select Redshift. |
+| Datasource Name | Enter the name of the datasource. |
+| Description | Enter a description of the datasource. |
+| IP/Host Name | Enter the Redshift service IP. |
+| Port | Enter the Redshift service port. |
+| Username | Set the username for Redshift connection. |
+| Password | Set the password for Redshift connection. |
+| Database Name | Enter the database name of the Redshift connection. |
| jdbc connect parameters | Parameter settings for Redshift connection, in JSON format. |
## Native Supported
-Yes, could use this datasource by default.
\ No newline at end of file
+Yes, could use this datasource by default.
diff --git a/docs/docs/en/guide/datasource/spark.md b/docs/docs/en/guide/datasource/spark.md
index e3ea0acac7c3..bbf1075dc169 100644
--- a/docs/docs/en/guide/datasource/spark.md
+++ b/docs/docs/en/guide/datasource/spark.md
@@ -4,18 +4,18 @@
## Datasource Parameters
-| **Datasource** | **Description** |
-| --- | --- |
-| Datasource | Select Spark. |
-| Datasource name | Enter the name of the DataSource. |
-| Description | Enter a description of the DataSource. |
-| IP/Host Name | Enter the Spark service IP. |
-| Port | Enter the Spark service port. |
-| Username | Set the username for Spark connection. |
-| Password | Set the password for Spark connection. |
-| Database name | Enter the database name of the Spark connection. |
+| **Datasource** | **Description** |
+|----------------------------|----------------------------------------------------------|
+| Datasource | Select Spark. |
+| Datasource name | Enter the name of the DataSource. |
+| Description | Enter a description of the DataSource. |
+| IP/Host Name | Enter the Spark service IP. |
+| Port | Enter the Spark service port. |
+| Username | Set the username for Spark connection. |
+| Password | Set the password for Spark connection. |
+| Database name | Enter the database name of the Spark connection. |
| Jdbc connection parameters | Parameter settings for Spark connection, in JSON format. |
## Native Supported
-Yes, could use this datasource by default.
+Yes, could use this datasource by default.
diff --git a/docs/docs/en/guide/datasource/sqlserver.md b/docs/docs/en/guide/datasource/sqlserver.md
index be0addf991fc..788d1b477fb1 100644
--- a/docs/docs/en/guide/datasource/sqlserver.md
+++ b/docs/docs/en/guide/datasource/sqlserver.md
@@ -4,18 +4,18 @@
## Datasource Parameters
-| **Datasource** | **Description** |
-| --- | --- |
-| Datasource | Select SQLSERVER. |
-| Datasource Name | Enter the name of the datasource. |
-| Description | Enter a description of the datasource. |
-| IP/Host Name | Enter the SQLSERVER service IP. |
-| Port | Enter the SQLSERVER service port. |
-| Username | Set the username for SQLSERVER connection. |
-| Password | Set the password for SQLSERVER connection. |
-| Database Name | Enter the database name of the SQLSERVER connection. |
+| **Datasource** | **Description** |
+|-------------------------|--------------------------------------------------------------|
+| Datasource | Select SQLSERVER. |
+| Datasource Name | Enter the name of the datasource. |
+| Description | Enter a description of the datasource. |
+| IP/Host Name | Enter the SQLSERVER service IP. |
+| Port | Enter the SQLSERVER service port. |
+| Username | Set the username for SQLSERVER connection. |
+| Password | Set the password for SQLSERVER connection. |
+| Database Name | Enter the database name of the SQLSERVER connection. |
| jdbc connect parameters | Parameter settings for SQLSERVER connection, in JSON format. |
## Native Supported
-Yes, could use this datasource by default.
\ No newline at end of file
+Yes, could use this datasource by default.
diff --git a/docs/docs/en/guide/expansion-reduction.md b/docs/docs/en/guide/expansion-reduction.md
index f7cd12e5895f..c58a85e9e272 100644
--- a/docs/docs/en/guide/expansion-reduction.md
+++ b/docs/docs/en/guide/expansion-reduction.md
@@ -1,12 +1,12 @@
# DolphinScheduler Expansion and Reduction
-## Expansion
+## Expansion
This article describes how to add a new master service or worker service to an existing DolphinScheduler cluster.
```
- Attention: There cannot be more than one master service process or worker service process on a physical machine.
- If the physical machine which locate the expansion master or worker node has already installed the scheduled service, check the [1.4 Modify configuration] and edit the configuration file `conf/config/install_config.conf` on ** all ** nodes, add masters or workers parameter, and restart the scheduling cluster.
+Attention: There cannot be more than one master service process or worker service process on a physical machine.
+ If the physical machine hosting the new master or worker node already has the scheduling service installed, see [1.4 Modify configuration], edit the configuration file `conf/config/install_config.conf` on **all** nodes to add the node to the masters or workers parameter, and restart the scheduling cluster.
```
### Basic software installation
@@ -14,16 +14,15 @@ This article describes how to add a new master service or worker service to an e
* [required] [JDK](https://www.oracle.com/technetwork/java/javase/downloads/index.html) (version 1.8+): must install, install and configure `JAVA_HOME` and `PATH` variables under `/etc/profile`
* [optional] If the expansion is a worker node, you need to consider whether to install an external client, such as Hadoop, Hive, Spark Client.
-
```markdown
- Attention: DolphinScheduler itself does not depend on Hadoop, Hive, Spark, but will only call their Client for the corresponding task submission.
+Attention: DolphinScheduler itself does not depend on Hadoop, Hive, Spark, but will only call their Client for the corresponding task submission.
```
### Get Installation Package
- Check the version of DolphinScheduler used in your existing environment, and get the installation package of the corresponding version, if the versions are different, there may be compatibility problems.
- Confirm the unified installation directory of other nodes, this article assumes that DolphinScheduler is installed in `/opt/` directory, and the full path is `/opt/dolphinscheduler`.
-- Please download the corresponding version of the installation package to the server installation directory, uncompress it and rename it to `dolphinscheduler` and store it in the `/opt` directory.
+- Please download the corresponding version of the installation package to the server installation directory, uncompress it and rename it to `dolphinscheduler` and store it in the `/opt` directory.
- Add database dependency package, this document uses Mysql database, add `mysql-connector-java` driver package to `/opt/dolphinscheduler/lib` directory.
```shell
@@ -37,7 +36,7 @@ mv apache-dolphinscheduler--bin dolphinscheduler
```
```markdown
- Attention: You can copy the installation package directly from an existing environment to an expanded physical machine.
+Attention: You can copy the installation package directly from an existing environment to an expanded physical machine.
```
### Create Deployment Users
@@ -58,53 +57,49 @@ sed -i 's/Defaults requirett/#Defaults requirett/g' /etc/sudoers
```
```markdown
- Attention:
- - Since it is `sudo -u {linux-user}` to switch between different Linux users to run multi-tenant jobs, the deploying user needs to have sudo privileges and be password free.
- - If you find the line `Default requiretty` in the `/etc/sudoers` file, please also comment it out.
- - If have needs to use resource uploads, you also need to assign read and write permissions to the deployment user on `HDFS or MinIO`.
+Attention:
+- Because `sudo -u {linux-user}` is used to switch between Linux users when running multi-tenant jobs, the deployment user needs passwordless sudo privileges.
+- If you find the line `Defaults requiretty` in the `/etc/sudoers` file, please also comment it out.
+- If you need to use resource uploads, you also need to assign read and write permissions to the deployment user on `HDFS or MinIO`.
```
### Modify Configuration
- From an existing node such as `Master/Worker`, copy the configuration directory directly to replace the configuration directory in the new node. After finishing the file copy, check whether the configuration items are correct.
-
- ```markdown
- Highlights:
- datasource.properties: database connection information
- zookeeper.properties: information for connecting zk
- common.properties: Configuration information about the resource store (if hadoop is set up, please check if the core-site.xml and hdfs-site.xml configuration files exist).
- dolphinscheduler_env.sh: environment Variables
- ````
+ ```markdown
+ Highlights:
+ datasource.properties: database connection information
+ zookeeper.properties: information for connecting zk
+ common.properties: Configuration information about the resource store (if hadoop is set up, please check if the core-site.xml and hdfs-site.xml configuration files exist).
+ dolphinscheduler_env.sh: environment Variables
+ ```
- Modify the `dolphinscheduler_env.sh` environment variable in the `bin/env/dolphinscheduler_env.sh` directory according to the machine configuration (the following is the example that all the used software install under `/opt/soft`)
- ```shell
- export HADOOP_HOME=/opt/soft/hadoop
- export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
- # export SPARK_HOME1=/opt/soft/spark1
- export SPARK_HOME2=/opt/soft/spark2
- export PYTHON_HOME=/opt/soft/python
- export JAVA_HOME=/opt/soft/jav
- export HIVE_HOME=/opt/soft/hive
- export FLINK_HOME=/opt/soft/flink
- export DATAX_HOME=/opt/soft/datax/bin/datax.py
- export PATH=$HADOOP_HOME/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH:$FLINK_HOME/bin:$DATAX_HOME:$PATH
-
- ```
+ ```shell
+ export HADOOP_HOME=/opt/soft/hadoop
+ export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
+ # export SPARK_HOME1=/opt/soft/spark1
+ export SPARK_HOME2=/opt/soft/spark2
+ export PYTHON_HOME=/opt/soft/python
+ export JAVA_HOME=/opt/soft/java
+ export HIVE_HOME=/opt/soft/hive
+ export FLINK_HOME=/opt/soft/flink
+ export DATAX_HOME=/opt/soft/datax/bin/datax.py
+ export PATH=$HADOOP_HOME/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH:$FLINK_HOME/bin:$DATAX_HOME:$PATH
- `Attention: This step is very important, such as `JAVA_HOME` and `PATH` is necessary to configure if haven not used just ignore or comment out`
+ ```
+ Attention: This step is very important. `JAVA_HOME` and `PATH` must be configured; variables for software you do not use can be ignored or commented out.
- Soft link the `JDK` to `/usr/bin/java` (still using `JAVA_HOME=/opt/soft/java` as an example)
- ```shell
- sudo ln -s /opt/soft/java/bin/java /usr/bin/java
- ```
-
- - Modify the configuration file `conf/config/install_config.conf` on the **all** nodes, synchronizing the following configuration.
-
- * To add a new master node, you need to modify the IPs and masters parameters.
- * To add a new worker node, modify the IPs and workers parameters.
+ ```shell
+ sudo ln -s /opt/soft/java/bin/java /usr/bin/java
+ ```
+- Modify the configuration file `conf/config/install_config.conf` on **all** nodes, synchronizing the following configuration.
+ * To add a new master node, you need to modify the IPs and masters parameters.
+ * To add a new worker node, modify the IPs and workers parameters.
```shell
# which machines to deploy DS services on, separated by commas between multiple physical machines
@@ -120,6 +115,7 @@ masters="existing master01,existing master02,ds1,ds2"
workers="existing worker01:default,existing worker02:default,ds3:default,ds4:default"
```
+
- If the expansion is for worker nodes, you need to set the worker group, refer to the security of the [Worker grouping](./security.md)
- On all new nodes, change the directory permissions so that the deployment user has access to the DolphinScheduler directory
@@ -154,26 +150,26 @@ bash bin/dolphinscheduler-daemon.sh start alert-server # start alert service
```
```
- Attention: When using `stop-all.sh` or `stop-all.sh`, if the physical machine execute the command is not configured to be ssh-free on all machines, it will prompt to enter the password
+Attention: When using `start-all.sh` or `stop-all.sh`, if the physical machine executing the command is not configured for passwordless SSH to all machines, it will prompt for a password
```
- After completing the script, use the `jps` command to see if every node service is started (`jps` comes with the `Java JDK`)
```
- MasterServer ----- master service
- WorkerServer ----- worker service
- ApiApplicationServer ----- api service
- AlertServer ----- alert service
+MasterServer ----- master service
+WorkerServer ----- worker service
+ApiApplicationServer ----- api service
+AlertServer ----- alert service
```
After successful startup, you can view the logs, which are stored in the `logs` folder.
```Log Path
- logs/
- ├── dolphinscheduler-alert-server.log
- ├── dolphinscheduler-master-server.log
- ├── dolphinscheduler-worker-server.log
- ├── dolphinscheduler-api-server.log
+logs/
+ ├── dolphinscheduler-alert-server.log
+ ├── dolphinscheduler-master-server.log
+ ├── dolphinscheduler-worker-server.log
+ ├── dolphinscheduler-api-server.log
```
If the above services start normally and the scheduling system page is normal, check whether there is an expanded Master or Worker service in the [Monitor] of the web system. If it exists, the expansion is complete.
@@ -187,9 +183,9 @@ There are two steps for shrinking. After performing the following two steps, the
### Stop the Service on the Scaled-Down Node
- * If you are scaling down the master node, identify the physical machine where the master service is located, and stop the master service on the physical machine.
- * If scale down the worker node, determine the physical machine where the worker service scale down and stop the worker services on the physical machine.
-
+* If you are scaling down the master node, identify the physical machine where the master service is located, and stop the master service on the physical machine.
+* If you are scaling down the worker node, identify the physical machine where the worker service is located, and stop the worker service on the physical machine.
+
```shell
# stop command:
bin/stop-all.sh # stop all services
@@ -211,26 +207,25 @@ bash bin/dolphinscheduler-daemon.sh start alert-server # start alert service
```
```
- Attention: When using `stop-all.sh` or `stop-all.sh`, if the machine without the command is not configured to be ssh-free for all machines, it will prompt to enter the password
+Attention: When using `start-all.sh` or `stop-all.sh`, if the machine executing the command is not configured for passwordless SSH to all machines, it will prompt for a password
```
- After the script is completed, use the `jps` command to see if every node service was successfully shut down (`jps` comes with the `Java JDK`)
```
- MasterServer ----- master service
- WorkerServer ----- worker service
- ApiApplicationServer ----- api service
- AlertServer ----- alert service
+MasterServer ----- master service
+WorkerServer ----- worker service
+ApiApplicationServer ----- api service
+AlertServer ----- alert service
```
-If the corresponding master service or worker service does not exist, then the master or worker service is successfully shut down.
+If the corresponding master service or worker service does not exist, then the master or worker service is successfully shut down.
### Modify the Configuration File
- - modify the configuration file `conf/config/install_config.conf` on the **all** nodes, synchronizing the following configuration.
-
- * to scale down the master node, modify the IPs and masters parameters.
- * to scale down worker nodes, modify the IPs and workers parameters.
+- Modify the configuration file `conf/config/install_config.conf` on **all** nodes, synchronizing the following configuration.
+ * to scale down the master node, modify the IPs and masters parameters.
+ * to scale down worker nodes, modify the IPs and workers parameters.
```shell
# which machines to deploy DS services on, "localhost" for this machine
@@ -246,3 +241,4 @@ masters="existing master01,existing master02,ds1,ds2"
workers="existing worker01:default,existing worker02:default,ds3:default,ds4:default"
```
+
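The `masters` and `workers` values edited in `conf/config/install_config.conf` are plain comma-separated lists, so the expansion and reduction edits can be sketched as two small shell helpers. This is only an illustration: `add_node` and `remove_node` are hypothetical names that are not part of DolphinScheduler, and the sketch assumes entries contain no commas or glob characters.

```shell
# Hypothetical helpers for editing comma-separated node lists;
# they are NOT shipped with DolphinScheduler.

# add_node LIST ENTRY: print LIST with ENTRY appended, unless already present.
add_node() {
  local list="$1" entry="$2"
  case ",${list}," in
    *",${entry},"*) printf '%s\n' "${list}" ;;          # already listed
    *)
      if [ -z "${list}" ]; then
        printf '%s\n' "${entry}"                        # first entry
      else
        printf '%s\n' "${list},${entry}"                # append
      fi
      ;;
  esac
}

# remove_node LIST ENTRY: print LIST without ENTRY.
remove_node() {
  local list="$1" entry="$2" out="" item
  local IFS=','                                         # split LIST on commas
  for item in ${list}; do
    [ "${item}" = "${entry}" ] && continue              # skip the removed node
    out="${out:+${out},}${item}"
  done
  printf '%s\n' "${out}"
}
```

For example, `masters="$(add_node "${masters}" ds1)"` would cover the expansion case and `workers="$(remove_node "${workers}" ds3:default)"` the reduction case; the result still has to be written to the configuration file on every node by hand or by your own tooling.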
diff --git a/docs/docs/en/guide/healthcheck.md b/docs/docs/en/guide/healthcheck.md
index fdb6efd45657..d80c683daf89 100644
--- a/docs/docs/en/guide/healthcheck.md
+++ b/docs/docs/en/guide/healthcheck.md
@@ -39,3 +39,4 @@ curl --request GET 'http://localhost:50053/actuator/health'
```
> Notice: If you modify the default service port and address, you need to modify the IP+Port to the modified value.
+
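A minimal sketch of scripting the `curl` check above, assuming the endpoint returns the usual Spring Boot Actuator JSON body with a top-level `"status":"UP"` field. The function names `health_is_up` and `check_health` are hypothetical and chosen only for this example.

```shell
# Hypothetical helpers; not part of DolphinScheduler.

# health_is_up BODY: succeed when the response body contains "status":"UP".
health_is_up() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"UP"'
}

# check_health URL: fetch the endpoint and test the body.
# Requires curl and a running server.
check_health() {
  local body
  body="$(curl --silent --fail --max-time 5 --request GET "$1")" || return 1
  health_is_up "${body}"
}
```

It could be called as `check_health 'http://localhost:50053/actuator/health' && echo healthy`. The pattern match is deliberately simple: if the response also lists per-component statuses, a JSON parser would be needed to read only the top-level field.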
diff --git a/docs/docs/en/guide/howto/datasource-setting.md b/docs/docs/en/guide/howto/datasource-setting.md
index 4dddb9a7a29b..5fc4e4be096c 100644
--- a/docs/docs/en/guide/howto/datasource-setting.md
+++ b/docs/docs/en/guide/howto/datasource-setting.md
@@ -5,7 +5,7 @@
We here use MySQL as an example to illustrate how to configure an external database:
> NOTE: If you use MySQL, you need to manually download [mysql-connector-java driver][mysql] (8.0.16) and move it to the libs directory of DolphinScheduler
-which is `api-server/libs` and `alert-server/libs` and `master-server/libs` and `worker-server/libs`.
+> which is `api-server/libs` and `alert-server/libs` and `master-server/libs` and `worker-server/libs`.
* First of all, follow the instructions in [datasource-setting](datasource-setting.md) `Pseudo-Cluster/Cluster Initialize the Database` section to create and initialize database
* Set the following environment variables in your terminal or modify the `bin/env/dolphinscheduler_env.sh` with your database username and password for `{user}` and `{password}`:
@@ -26,7 +26,6 @@ DolphinScheduler stores metadata in `relational database`. Currently, we support
> If you use MySQL, you need to manually download [mysql-connector-java driver][mysql] (8.0.16) and move it to the libs directory of DolphinScheduler which is `api-server/libs` and `alert-server/libs` and `master-server/libs` and `worker-server/libs`.
-
For mysql 5.6 / 5.7
```shell
@@ -54,9 +53,10 @@ mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%';
mysql> CREATE USER '{user}'@'localhost' IDENTIFIED BY '{password}';
mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'localhost';
mysql> FLUSH PRIVILEGES;
-```
+```
+
+For PostgreSQL:
-For PostgreSQL:
```shell
# Use psql-tools to login PostgreSQL
psql
@@ -75,6 +75,7 @@ pg_ctl reload
Then, modify `./bin/env/dolphinscheduler_env.sh`, change {user} and {password} to what you set in the previous step.
For MySQL:
+
```shell
# for mysql
export DATABASE=${DATABASE:-mysql}
@@ -85,6 +86,7 @@ export SPRING_DATASOURCE_PASSWORD={password}
```
For PostgreSQL:
+
```shell
# for postgresql
export DATABASE=${DATABASE:-postgresql}
@@ -125,3 +127,4 @@ like Docker.
> But if you want to use MySQL as the metabase of DolphinScheduler, it only supports [8.0.16 and above](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar) version.
[mysql]: https://downloads.MySQL.com/archives/c-j/
+
diff --git a/docs/docs/en/guide/howto/general-setting.md b/docs/docs/en/guide/howto/general-setting.md
index e8a1a5f0119a..7a12460681c8 100644
--- a/docs/docs/en/guide/howto/general-setting.md
+++ b/docs/docs/en/guide/howto/general-setting.md
@@ -14,7 +14,7 @@ of to [language](#language) control button.
## Time Zone
-DolphinScheduler support time zone setting.
+DolphinScheduler supports time zone setting.
Server Time Zone
diff --git a/docs/docs/en/guide/installation/cluster.md b/docs/docs/en/guide/installation/cluster.md
index d7054ac72126..14ae58a47978 100644
--- a/docs/docs/en/guide/installation/cluster.md
+++ b/docs/docs/en/guide/installation/cluster.md
@@ -36,4 +36,4 @@ Same as [pseudo-cluster](pseudo-cluster.md)
## Start and Stop Server
-Same as [pseudo-cluster](pseudo-cluster.md)
\ No newline at end of file
+Same as [pseudo-cluster](pseudo-cluster.md)
diff --git a/docs/docs/en/guide/installation/pseudo-cluster.md b/docs/docs/en/guide/installation/pseudo-cluster.md
index b01602e1dd50..23fbe341d20b 100644
--- a/docs/docs/en/guide/installation/pseudo-cluster.md
+++ b/docs/docs/en/guide/installation/pseudo-cluster.md
@@ -154,7 +154,7 @@ bash ./bin/install.sh
```
> **_Note:_** On the first deployment, the message `sh: bin/dolphinscheduler-daemon.sh: No such file or directory` may appear five times in the terminal,
- this is non-important information that you can ignore.
+> this is non-important information that you can ignore.
## Login DolphinScheduler
@@ -190,11 +190,12 @@ bash ./bin/dolphinscheduler-daemon.sh stop alert-server
> for micro-services need. It means that you could start all servers by command `/bin/start.sh` with different
> environment variable from `/conf/dolphinscheduler_env.sh`. But it will use file `bin/env/dolphinscheduler_env.sh` overwrite
> `/conf/dolphinscheduler_env.sh` if you start server with command `/bin/dolphinscheduler-daemon.sh start `.
-
+>
> **_Note2:_** Please refer to the section of "System Architecture Design" for service usage. Python gateway service is
> started along with the api-server, and if you do not want to start Python gateway service please disable it by changing
-> the yaml config `python-gateway.enabled : false` in api-server's configuration path `api-server/conf/application.yaml`
+> the yaml config `python-gateway.enabled : false` in api-server's configuration path `api-server/conf/application.yaml`
[jdk]: https://www.oracle.com/technetwork/java/javase/downloads/index.html
[zookeeper]: https://zookeeper.apache.org/releases.html
[issue]: https://github.com/apache/dolphinscheduler/issues/6597
+
diff --git a/docs/docs/en/guide/installation/standalone.md b/docs/docs/en/guide/installation/standalone.md
index 1c3028f23879..98bb1f555a35 100644
--- a/docs/docs/en/guide/installation/standalone.md
+++ b/docs/docs/en/guide/installation/standalone.md
@@ -5,7 +5,7 @@ Standalone only for quick experience for DolphinScheduler.
If you are new to DolphinScheduler and want to try its features, we recommend the Standalone deployment. If you want to experience more complete functions and schedule massive tasks, we recommend the [pseudo-cluster deployment](pseudo-cluster.md). If you want to deploy DolphinScheduler in production, we recommend the [cluster deployment](cluster.md) or [Kubernetes deployment](kubernetes.md).
> **_Note:_** Standalone is recommended only for fewer than 20 workflows, because by default it uses an in-memory H2 database and a ZooKeeper testing server; too many tasks may cause instability.
-> When Standalone stops or restarts, in-memory H2 database will clear up. To use Standalone with external databases like mysql or postgresql, please see [`Database Configuration`](#database-configuration).
+> When Standalone stops or restarts, the in-memory H2 database is cleared. To use Standalone with external databases like MySQL or PostgreSQL, please see [`Database Configuration`](#database-configuration).
## Preparation
diff --git a/docs/docs/en/guide/integration/rainbond.md b/docs/docs/en/guide/integration/rainbond.md
index 32b71f217d88..25d45dacbe3b 100644
--- a/docs/docs/en/guide/integration/rainbond.md
+++ b/docs/docs/en/guide/integration/rainbond.md
@@ -6,7 +6,7 @@ This section describes the one-click deployment of high availability DolphinSche
* An available Rainbond cloud native application management platform is a prerequisite; please refer to the official `Rainbond` documentation [Rainbond Quick install](https://www.rainbond.com/docs/quick-start/quick-install)
-## DolphinScheduler Cluster One-click Deployment
+## DolphinScheduler Cluster One-click Deployment
* Log in, open the built-in open source app store, and search for the keyword `dolphinscheduler` to find the DolphinScheduler App.
@@ -14,12 +14,12 @@ This section describes the one-click deployment of high availability DolphinSche
* Click `install` on the right side of DolphinScheduler to go to the installation page. Fill in the corresponding information and click `OK` to start the installation. You will get automatically redirected to the application view.
-| Select item | Description |
-| ------------ | ------------------------------------ |
+| Select item | Description |
+|--------------|-------------------------------------|
| Team name | user workspace, isolated by namespace |
-| Cluster name | select kubernetes cluster |
-| Select app | select application |
-| app version | select DolphinScheduler version |
+| Cluster name | select kubernetes cluster |
+| Select app | select application |
+| app version | select DolphinScheduler version |
![](../../../../img/rainbond/install-dolphinscheduler.png)
@@ -42,6 +42,7 @@ Take `worker` as an example: enter the `component -> Telescopic` page, and set t
To verify `worker` node, enter `DolphinScheduler UI -> Monitoring -> Worker` page to view detailed node information.
![](../../../../img/rainbond/monitor-dolphinscheduler.png)
+
## Configuration file
API and Worker Services share the configuration file `/opt/dolphinscheduler/conf/common.properties`. To modify the configurations, you only need to modify that of the API service.
@@ -60,5 +61,7 @@ Take `DataX` as an example:
* FILE_PATH:/opt/soft
* LOCK_PATH:/opt/soft
3. Update the component; the `DataX` plug-in will be downloaded automatically and decompressed to `/opt/soft`
-![](../../../../img/rainbond/plugin.png)
+ ![](../../../../img/rainbond/plugin.png)
+
---
+
diff --git a/docs/docs/en/guide/metrics/metrics.md b/docs/docs/en/guide/metrics/metrics.md
index 6e2730af6791..fa0f07bf3fa9 100644
--- a/docs/docs/en/guide/metrics/metrics.md
+++ b/docs/docs/en/guide/metrics/metrics.md
@@ -3,13 +3,13 @@
Apache DolphinScheduler exports metrics for system observability. We use [Micrometer](https://micrometer.io/) as application metrics facade.
Currently, we only support `Prometheus Exporter` but more are coming soon.
-## Quick Start
+## Quick Start
-- We enable Apache DolphinScheduler to export metrics in `standalone` mode to help users get hands dirty easily.
+- We enable Apache DolphinScheduler to export metrics in `standalone` mode to help users get their hands dirty easily.
- After triggering tasks in `standalone` mode, you could access metrics list by visiting url `http://localhost:12345/dolphinscheduler/actuator/metrics`.
- After triggering tasks in `standalone` mode, you could access `prometheus-format` metrics by visiting url `http://localhost:12345/dolphinscheduler/actuator/prometheus`.
- For a better experience with `Prometheus` and `Grafana`, we have prepared the out-of-the-box `Grafana` configurations for you, you could find the `Grafana` dashboards
-at `dolphinscheduler-meter/resources/grafana` and directly import these dashboards to your `Grafana` instance.
+ at `dolphinscheduler-meter/resources/grafana` and directly import these dashboards to your `Grafana` instance.
- If you want to try with `docker`, you can use the following command to start the out-of-the-box `Prometheus` and `Grafana`:
```shell
@@ -17,12 +17,12 @@ cd dolphinscheduler-meter/src/main/resources/grafana-demo
docker compose up
```
-then access the `Grafana` by the url: `http://localhost/3001` for dashboards.
+then access `Grafana` at the URL `http://localhost:3001` for dashboards.
![image.png](../../../../img/metrics/metrics-master.png)
![image.png](../../../../img/metrics/metrics-worker.png)
![image.png](../../../../img/metrics/metrics-datasource.png)
-
+
- If you prefer to have some experiments in `cluster` mode, please refer to the [Configuration](#configuration) section below:
## Configuration
@@ -48,7 +48,7 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
### Prometheus
- all dots mapped to underscores
-- metric name starting with number added with prefix `m_`
+- metric names starting with a number are prefixed with `m_`
- COUNTER: add `_total` suffix if not ending with it
- LONG_TASK_TIMER: `_timer_seconds` suffix added if not ending with them
- GAUGE: `_baseUnit` suffix added if not ending with it
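
The naming rules above can be sketched in a few lines of Python. This is only an illustration of the rules as listed here, not Micrometer's actual implementation; the function name `to_prometheus_name` and the `base_unit` parameter are hypothetical.

```python
def to_prometheus_name(name: str, meter_type: str, base_unit: str = "") -> str:
    """Apply the naming rules listed above to a dotted metric name.

    Hypothetical helper for illustration; not part of DolphinScheduler or Micrometer.
    """
    # Rule 1: all dots are mapped to underscores.
    result = name.replace(".", "_")
    # Rule 2: a name starting with a number gets the prefix `m_`.
    if result[:1].isdigit():
        result = "m_" + result
    # Type-specific suffixes, added only when the name does not already end with them.
    if meter_type == "COUNTER":
        suffix = "_total"
    elif meter_type == "LONG_TASK_TIMER":
        suffix = "_timer_seconds"
    elif meter_type == "GAUGE" and base_unit:
        suffix = "_" + base_unit
    else:
        suffix = ""
    if suffix and not result.endswith(suffix):
        result += suffix
    return result


print(to_prometheus_name("ds.task.dispatch.count", "COUNTER"))
print(to_prometheus_name("ds.task.running", "GAUGE"))
```

Under these rules, a counter such as `ds.task.dispatch.count` would be queried in Prometheus with underscores and a `_total` suffix, while a gauge without a base unit only has its dots replaced.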
@@ -56,7 +56,7 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
## Dolphin Scheduler Metrics Cheatsheet
- We categorize metrics by dolphin scheduler components such as `master server`, `worker server`, `api server` and `alert server`.
-- Although task / workflow related metrics exported by `master server` and `worker server`, we categorize them separately for users to query them more conveniently.
+- Although task / workflow related metrics are exported by `master server` and `worker server`, we categorize them separately for users to query them more conveniently.
### Task Related Metrics
@@ -66,19 +66,18 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
- success: the number of successful tasks
- fail: the number of failed tasks
- stop: the number of stopped tasks
- - retry: the number of retried tasks
+ - retry: the number of retried tasks
- submit: the number of submitted tasks
- failover: the number of task fail-overs
- ds.task.dispatch.count: (counter) the number of tasks dispatched to worker
- ds.task.dispatch.failure.count: (counter) the number of tasks failed to dispatch, retry failure included
- ds.task.dispatch.error.count: (counter) the number of task dispatch errors
- ds.task.execution.count.by.type: (counter) the number of task executions grouped by tag `task_type`
-- ds.task.running: (gauge) the number of running tasks
-- ds.task.prepared: (gauge) the number of tasks prepared for task queue
-- ds.task.execution.count: (counter) the number of executed tasks
+- ds.task.running: (gauge) the number of running tasks
+- ds.task.prepared: (gauge) the number of tasks prepared for task queue
+- ds.task.execution.count: (counter) the number of executed tasks
- ds.task.execution.duration: (histogram) duration of task executions
-
### Workflow Related Metrics
- ds.workflow.create.command.count: (counter) the number of commands created and inserted by workflows
@@ -88,14 +87,14 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
- timeout: the number of timeout workflow instances
- finish: the number of finished workflow instances, both successes and failures included
- success: the number of successful workflow instances
- - fail: the number of failed workflow instances
- - stop: the number of stopped workflow instances
+ - fail: the number of failed workflow instances
+ - stop: the number of stopped workflow instances
- failover: the number of workflow instance fail-overs
### Master Server Metrics
- ds.master.overload.count: (counter) the number of times the master overloaded
-- ds.master.consume.command.count: (counter) the number of commands consumed by master
+- ds.master.consume.command.count: (counter) the number of commands consumed by master
- ds.master.scheduler.failover.check.count: (counter) the number of scheduler (master) fail-over checks
- ds.master.scheduler.failover.check.time: (histogram) the total time cost of scheduler (master) fail-over checks
- ds.master.quartz.job.executed: the total number of quartz jobs executed
@@ -111,7 +110,7 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
### Api Server Metrics
-- Currently, we have not embedded any metrics in Api Server.
+- Currently, we have not embedded any metrics in Api Server.
### Alert Server Related
@@ -124,7 +123,7 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
- hikaricp.connections: the total number of connections
- hikaricp.connections.creation: connection creation time (max, count, sum included)
-- hikaricp.connections.acquire: connection acquirement time (max, count, sum included)
+- hikaricp.connections.acquire: connection acquirement time (max, count, sum included)
- hikaricp.connections.usage: connection usage time (max, count, sum included)
- hikaricp.connections.max: the max number of connections
- hikaricp.connections.min: the min number of connections
@@ -175,3 +174,4 @@ For example, you can get the master metrics by `curl http://localhost:5679/actua
- system.load.average.1m: the total number of runnable entities queued to available processors and runnable entities running on the available processors averaged over a period
- logback.events: the number of events that made it to the logs grouped by the tag `level`
- http.server.requests: total number of http requests
+
diff --git a/docs/docs/en/guide/monitor.md b/docs/docs/en/guide/monitor.md
index 327945310d49..eb8600d8b7da 100644
--- a/docs/docs/en/guide/monitor.md
+++ b/docs/docs/en/guide/monitor.md
@@ -28,16 +28,16 @@
![statistics](../../../img/new_ui/dev/monitor/statistics.png)
-| **Parameter** | **Description** |
-| ----- | ----- |
-| Number of commands wait to be executed | Statistics of the `t_ds_command` table data. |
-| The number of failed commands | Statistics of the `t_ds_error_command` table data. |
-| Number of tasks wait to run | Count the data of `task_queue` in the ZooKeeper. |
-| Number of tasks wait to be killed | Count the data of `task_kill` in the ZooKeeper. |
+| **Parameter** | **Description** |
+|----------------------------------------|----------------------------------------------------|
+| Number of commands wait to be executed | Statistics of the `t_ds_command` table data. |
+| The number of failed commands | Statistics of the `t_ds_error_command` table data. |
+| Number of tasks wait to run | Count the data of `task_queue` in the ZooKeeper. |
+| Number of tasks wait to be killed | Count the data of `task_kill` in the ZooKeeper. |
### Audit Log
The audit log records who accessed the system, which operations they performed, and when,
which strengthens the security and maintainability of the system.
-![audit-log](../../../img/new_ui/dev/monitor/audit-log.jpg)
\ No newline at end of file
+![audit-log](../../../img/new_ui/dev/monitor/audit-log.jpg)
diff --git a/docs/docs/en/guide/parameter/built-in.md b/docs/docs/en/guide/parameter/built-in.md
index bdc7ea8f8d76..a19e0903bf94 100644
--- a/docs/docs/en/guide/parameter/built-in.md
+++ b/docs/docs/en/guide/parameter/built-in.md
@@ -2,11 +2,11 @@
## Basic Built-in Parameter
-| Variable | Declaration Method | Meaning |
-| ---- | ---- | -----------------------------|
-| system.biz.date | `${system.biz.date}` | The day before the schedule time of the daily scheduling instance, the format is `yyyyMMdd` |
-| system.biz.curdate | `${system.biz.curdate}` | The schedule time of the daily scheduling instance, the format is `yyyyMMdd` |
-| system.datetime | `${system.datetime}` | The schedule time of the daily scheduling instance, the format is `yyyyMMddHHmmss` |
+| Variable | Declaration Method | Meaning |
+|--------------------|-------------------------|---------------------------------------------------------------------------------------------|
+| system.biz.date | `${system.biz.date}` | The day before the schedule time of the daily scheduling instance, the format is `yyyyMMdd` |
+| system.biz.curdate | `${system.biz.curdate}` | The schedule time of the daily scheduling instance, the format is `yyyyMMdd` |
+| system.datetime | `${system.datetime}` | The schedule time of the daily scheduling instance, the format is `yyyyMMddHHmmss` |
## Extended Built-in Parameter
@@ -16,19 +16,19 @@
- Or define by the following two ways:
- 1. Use add_month(yyyyMMdd, offset) function to add or minus number of months.
- The first parameter of this function is [yyyyMMdd], represents the time format and the second parameter is offset, represents the number of months the user wants to add or minus.
- - Next N years:`$[add_months(yyyyMMdd,12*N)]`
- - N years before:`$[add_months(yyyyMMdd,-12*N)]`
- - Next N months:`$[add_months(yyyyMMdd,N)]`
- - N months before:`$[add_months(yyyyMMdd,-N)]`
-
- 2. Add or minus numbers directly after the time format.
- - Next N weeks:`$[yyyyMMdd+7*N]`
- - First N weeks:`$[yyyyMMdd-7*N]`
- - Next N days:`$[yyyyMMdd+N]`
- - N days before:`$[yyyyMMdd-N]`
- - Next N hours:`$[HHmmss+N/24]`
- - First N hours:`$[HHmmss-N/24]`
- - Next N minutes:`$[HHmmss+N/24/60]`
- - First N minutes:`$[HHmmss-N/24/60]`
\ No newline at end of file
+ 1. Use the `add_months(yyyyMMdd, offset)` function to add or subtract a number of months.
+ The first parameter of this function is `yyyyMMdd`, which represents the time format, and the second parameter is `offset`, which represents the number of months the user wants to add or subtract.
+ - Next N years: `$[add_months(yyyyMMdd,12*N)]`
+ - N years before: `$[add_months(yyyyMMdd,-12*N)]`
+ - Next N months: `$[add_months(yyyyMMdd,N)]`
+ - N months before: `$[add_months(yyyyMMdd,-N)]`
+ 2. Add or subtract numbers directly after the time format.
+ - Next N weeks: `$[yyyyMMdd+7*N]`
+ - N weeks before: `$[yyyyMMdd-7*N]`
+ - Next N days: `$[yyyyMMdd+N]`
+ - N days before: `$[yyyyMMdd-N]`
+ - Next N hours: `$[HHmmss+N/24]`
+ - N hours before: `$[HHmmss-N/24]`
+ - Next N minutes: `$[HHmmss+N/24/60]`
+ - N minutes before: `$[HHmmss-N/24/60]`
+
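
To make the offsets above concrete, here is a small Python stand-in that computes what the expressions are described as producing for a given schedule time. It is a sketch only, not DolphinScheduler's parser: the helper names are hypothetical, the sample schedule time is invented, and clamping to the last day of a shorter month is an assumption.

```python
import calendar
from datetime import datetime, timedelta


def add_months(base: datetime, offset: int, fmt: str = "%Y%m%d") -> str:
    """Mimic $[add_months(yyyyMMdd, offset)] (hypothetical helper)."""
    # Work on a zero-based month index so negative offsets roll over years correctly.
    month_index = base.year * 12 + (base.month - 1) + offset
    year, month0 = divmod(month_index, 12)
    # Assumption: clamp the day to the last day of the target month.
    day = min(base.day, calendar.monthrange(year, month0 + 1)[1])
    return base.replace(year=year, month=month0 + 1, day=day).strftime(fmt)


def day_offset(base: datetime, days: int) -> str:
    """Mimic $[yyyyMMdd+N] and $[yyyyMMdd-N]; use days=7*N for weeks."""
    return (base + timedelta(days=days)).strftime("%Y%m%d")


def time_offset(base: datetime, hours: int = 0, minutes: int = 0) -> str:
    """Mimic $[HHmmss+N/24] (hours) and $[HHmmss+N/24/60] (minutes)."""
    return (base + timedelta(hours=hours, minutes=minutes)).strftime("%H%M%S")


# Invented schedule time used only for this illustration.
schedule = datetime(2022, 7, 10, 8, 30, 0)
print(day_offset(schedule, -1))        # same value as ${system.biz.date}: the day before
print(add_months(schedule, -1))        # one month before
print(time_offset(schedule, hours=2))  # two hours later
```

Dividing by 24 or by 24/60 in the original expressions reflects that the base unit of an offset is one day; the sketch expresses the same idea with explicit `hours` and `minutes` arguments.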
diff --git a/docs/docs/en/guide/parameter/context.md b/docs/docs/en/guide/parameter/context.md
index 482a1cd8df74..2869ae3347df 100644
--- a/docs/docs/en/guide/parameter/context.md
+++ b/docs/docs/en/guide/parameter/context.md
@@ -49,7 +49,7 @@ When the SHELL task is completed, we can use the output passed upstream as the q
> Note: If the result of the SQL node has only one row with one or more fields, the name of the `prop` must match the field name. Any data type except `LIST` can be chosen. The parameter takes its value from the column of the same name in the SQL query result.
>
->If the result of the SQL node has multiple rows, one or more fields, the name of the `prop` needs to be the same as the field name. Choose the data type structure as `LIST`, and the SQL query result will be converted to `LIST`, and forward to convert to JSON as the parameter value.
+> If the result of the SQL node has multiple rows with one or more fields, the name of the `prop` must match the field name. Choose `LIST` as the data type; the SQL query result is converted to a `LIST` and then to JSON as the parameter value.
#### Save the workflow and set the global parameters
diff --git a/docs/docs/en/guide/parameter/local.md b/docs/docs/en/guide/parameter/local.md
index 29a377e8e506..2dcae8d433eb 100644
--- a/docs/docs/en/guide/parameter/local.md
+++ b/docs/docs/en/guide/parameter/local.md
@@ -61,7 +61,7 @@ You could get this value in downstream task using syntax `echo '${set_val_param}
If you want to export parameters from bash variables instead of constant values, and then use them in downstream tasks,
you could use `setValue` in your task, which is more flexible: for example, you can derive the value from an existing local or HTTP resource.
-You can use syntax like
+You can use syntax like
```shell
lines_num=$(wget https://raw.githubusercontent.com/apache/dolphinscheduler/dev/README.md -q -O - | wc -l | xargs)
diff --git a/docs/docs/en/guide/project/project-list.md b/docs/docs/en/guide/project/project-list.md
index 96c046981a26..fb7df9053b5f 100644
--- a/docs/docs/en/guide/project/project-list.md
+++ b/docs/docs/en/guide/project/project-list.md
@@ -1,15 +1,15 @@
-# Project
+# Project
This page describes details regarding Project screen in Apache DolphinScheduler. Here, you will see all the functions which can be handled in this screen. The following table explains commonly used terms in Apache DolphinScheduler:
-| Glossary | description |
-| ------ |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| DAG | Tasks in a workflow are assembled in form of Directed Acyclic Graph (DAG). A topological traversal is performed from nodes with zero degrees of entry until there are no subsequent nodes. |
-| Workflow Definition | Visualization formed by dragging task nodes and establishing task node associations (DAG). |
-| Workflow Instance | Instantiation of the workflow definition, which can be generated by manual start or scheduled scheduling. Each time the process definition runs, a workflow instance is generated. |
-| Workflow Relation | Shows dynamic status of all the workflows in a project. |
-| Task | Task is a discrete action in a Workflow. Apache DolphinScheduler supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, DEPENDENT ( depends), and plans to support dynamic plug-in expansion, (SUB_PROCESS). It is also a separate process definition that can be started and executed separately. |
-| Task Instance | Instantiation of the task node in the process definition, which identifies the specific task execution status. |
+| Glossary | description |
+|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| DAG | Tasks in a workflow are assembled in form of Directed Acyclic Graph (DAG). A topological traversal is performed from nodes with zero degrees of entry until there are no subsequent nodes. |
+| Workflow Definition | Visualization formed by dragging task nodes and establishing task node associations (DAG). |
+| Workflow Instance | Instantiation of the workflow definition, which can be generated by manual start or scheduled scheduling. Each time the process definition runs, a workflow instance is generated. |
+| Workflow Relation | Shows dynamic status of all the workflows in a project. |
+| Task | Task is a discrete action in a Workflow. Apache DolphinScheduler supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, and DEPENDENT (depends), and plans to support dynamic plug-in expansion. Note that a SUB_PROCESS is also a separate process definition that can be started and executed separately. |
+| Task Instance | Instantiation of the task node in the process definition, which identifies the specific task execution status. |
## Project List
diff --git a/docs/docs/en/guide/project/task-definition.md b/docs/docs/en/guide/project/task-definition.md
index a08d3a3da88b..a48cc1a848ca 100644
--- a/docs/docs/en/guide/project/task-definition.md
+++ b/docs/docs/en/guide/project/task-definition.md
@@ -1,6 +1,7 @@
# Task Definition
## Batch Task Definition
+
Task definition allows you to modify or operate on tasks at the task level rather than modifying them in the workflow definition.
We already have a workflow-level task editor in [workflow definition](workflow-definition.md), where you can click a specific
workflow and then edit its task definition. It is frustrating when you want to edit the task definition but do not remember
@@ -14,10 +15,11 @@ name but forget which workflow it belongs to. It is also supported query by the
`Workflow Name`
## Stream Task Definition
+
Stream task definitions are created in the workflow definition, and can be modified and executed.
![task-definition](../../../../img/new_ui/dev/project/stream-task-definition.png)
Click the execute button, check the execution parameters and click Confirm to submit the stream task.
-![task-definition](../../../../img/new_ui/dev/project/stream-task-execute.png)
\ No newline at end of file
+![task-definition](../../../../img/new_ui/dev/project/stream-task-execute.png)
diff --git a/docs/docs/en/guide/project/task-instance.md b/docs/docs/en/guide/project/task-instance.md
index 0371ed9d7925..e5532bba1ae8 100644
--- a/docs/docs/en/guide/project/task-instance.md
+++ b/docs/docs/en/guide/project/task-instance.md
@@ -1,6 +1,7 @@
# Task Instance
## Batch Task Instance
+
### Create Task Instance
Click `Project Management -> Workflow -> Task Instance` to enter the task instance page, as shown in the figure below, click the name of the workflow instance to jump to the DAG diagram of the workflow instance to view the task status.
@@ -21,3 +22,4 @@ Click the `View Log` button in the operation column to view the log of the task
- SavePoint: Click the `SavePoint` button in the operation column to do stream task savepoint.
- Stop: Click the `Stop` button in the operation column to stop the stream task.
+
diff --git a/docs/docs/en/guide/project/workflow-definition.md b/docs/docs/en/guide/project/workflow-definition.md
index f81669cbd44e..a19dc8d756b9 100644
--- a/docs/docs/en/guide/project/workflow-definition.md
+++ b/docs/docs/en/guide/project/workflow-definition.md
@@ -29,15 +29,16 @@ Drag from the toolbar , delete dependencies between tasks.
@@ -57,15 +58,15 @@ Click `Project Management -> Workflow -> Workflow Definition` to enter the workf
Workflow running parameter description:
-* **Failure strategy**: When a task node fails to execute, other parallel task nodes need to execute the strategy. "Continue" means: After a task fails, other task nodes execute normally; "End" means: Terminate all tasks being executed, and terminate the entire process.
-* **Notification strategy**: When the process ends, send process execution information notification emails according to the process status, including no status, success, failure, success or failure.
-* **Process priority**: the priority of process operation, divided into five levels: the highest (HIGHEST), high (HIGH), medium (MEDIUM), low (LOW), the lowest (LOWEST). When the number of master threads is insufficient, processes with higher levels will be executed first in the execution queue, and processes with the same priority will be executed in the order of first-in, first-out.
-* **Worker grouping**: This process can only be executed in the specified worker machine group. The default is Default, which can be executed on any worker.
-* **Notification Group**: Select Notification Policy||Timeout Alarm||When fault tolerance occurs, process information or emails will be sent to all members in the notification group.
-* **Recipient**: Select Notification Policy||Timeout Alarm||When fault tolerance occurs, process information or alarm email will be sent to the recipient list.
-* **Cc**: Select Notification Policy||Timeout Alarm||When fault tolerance occurs, the process information or alarm email will be copied to the Cc list.
-* **Startup parameters**: Set or override the value of global parameters when starting a new process instance.
-* **Complement**: There are 2 modes of serial complement and parallel complement. Serial complement: within the specified time range, perform complements in sequence from the start date to the end date, and generate N process instances in turn; parallel complement: within the specified time range, perform multiple complements at the same time, and generate N process instances at the same time .
+* **Failure strategy**: When a task node fails to execute, other parallel task nodes need to execute the strategy. "Continue" means: After a task fails, other task nodes execute normally; "End" means: Terminate all tasks being executed, and terminate the entire process.
+* **Notification strategy**: When the process ends, send process execution information notification emails according to the process status, including no status, success, failure, success or failure.
+* **Process priority**: the priority of process operation, divided into five levels: the highest (HIGHEST), high (HIGH), medium (MEDIUM), low (LOW), the lowest (LOWEST). When the number of master threads is insufficient, processes with higher levels will be executed first in the execution queue, and processes with the same priority will be executed in the order of first-in, first-out.
+* **Worker grouping**: This process can only be executed in the specified worker machine group. The default is Default, which can be executed on any worker.
+* **Notification Group**: When a notification strategy is selected, a timeout alarm is triggered, or fault tolerance occurs, process information or emails will be sent to all members in the notification group.
+* **Recipient**: When a notification strategy is selected, a timeout alarm is triggered, or fault tolerance occurs, process information or alarm emails will be sent to the recipient list.
+* **Cc**: When a notification strategy is selected, a timeout alarm is triggered, or fault tolerance occurs, the process information or alarm emails will be copied to the Cc list.
+* **Startup parameters**: Set or override the value of global parameters when starting a new process instance.
+* **Complement**: There are two modes: serial complement and parallel complement. Serial complement: within the specified time range, perform complements in sequence from the start date to the end date, and generate N process instances in turn. Parallel complement: within the specified time range, perform multiple complements at the same time, and generate N process instances at the same time.
* **Complement**: Execute the workflow definition of the specified date, you can select the time range of the supplement (currently only supports the supplement for consecutive days), for example, the data from May 1st to May 10th needs to be supplemented, as shown in the following figure:
The following are the operation functions of the workflow definition list:
@@ -91,58 +92,58 @@ The following are the operation functions of the workflow definition list:
- Click the `Run` button to pop up the startup parameter setting window, as shown in the figure below, set the startup parameters, click the `Run` button in the pop-up box, the workflow starts running, and the workflow instance page generates a workflow instance.
![workflow-run](../../../../img/new_ui/dev/project/workflow-run.png)
-
- Description of workflow operating parameters:
-
- * Failure strategy: When a task node fails to execute, other parallel task nodes need to execute this strategy. "Continue" means: after a certain task fails, other task nodes execute normally; "End" means: terminate all tasks execution, and terminate the entire process.
- * Notification strategy: When the process is over, send the process execution result notification email according to the process status, options including no send, send if sucess, send of failure, send whatever result.
- * Process priority: The priority of process operation, divide into five levels: highest (HIGHEST), high (HIGH), medium (MEDIUM), low (LOW), and lowest (LOWEST). When the number of master threads is insufficient, high priority processes will execute first in the execution queue, and processes with the same priority will execute in the order of first in, first out.
- * Worker group: The process can only be executed in the specified worker machine group. The default is `Default`, which can execute on any worker.
- * Notification group: select notification strategy||timeout alarm||when fault tolerance occurs, process result information or email will send to all members in the notification group.
- * Recipient: select notification policy||timeout alarm||when fault tolerance occurs, process result information or alarm email will be sent to the recipient list.
- * Cc: select notification policy||timeout alarm||when fault tolerance occurs, the process result information or warning email will be copied to the CC list.
- * Startup parameter: Set or overwrite global parameter values when starting a new process instance.
- * Complement: refers to running the workflow definition within the specified date range and generating the corresponding workflow instance according to the complement policy. The complement policy includes two modes: **serial complement** and **parallel complement**. The date can be selected on the page or entered manually.
-
- * Serial complement: within the specified time range, complement is executed from the start date to the end date, and multiple process instances are generated in turn; Click Run workflow and select the serial complement mode: for example, from July 9 to July 10, execute in sequence, and generate two process instances in sequence on the process instance page.
-
- ![workflow-serial](../../../../img/new_ui/dev/project/workflow-serial.png)
-
- * Parallel Replenishment: within the specified time range, replenishment is performed simultaneously for multiple days, and multiple process instances are generated at the same time. Enter date manually: manually enter a date in the comma separated date format of 'yyyy MM DD hh:mm:ss'.Click Run workflow and select the parallel complement mode: for example, execute the workflow definition from July 9 to July 10 at the same time, and generate two process instances on the process instance page at the same time.
-
- ![workflow-parallel](../../../../img/new_ui/dev/project/workflow-parallel.png)
-
- * Concurrency: refers to the maximum number of instances executed in parallel in the parallel complement mode.For example, if tasks from July 6 to July 10 are executed at the same time, and the concurrency is 2, then the process instance is:
-
- ![workflow-concurrency-from](../../../../img/new_ui/dev/project/workflow-concurrency-from.png)
-
- ![workflow-concurrency](../../../../img/new_ui/dev/project/workflow-concurrency.png)
-
- * Dependency mode: whether to trigger the replenishment of workflow instances that downstream dependent nodes depend on the current workflow (the timing status of workflow instances that require the current replenishment is online, which will only trigger the replenishment of downstream directly dependent on the current workflow).
-
- ![workflow-dependency](../../../../img/new_ui/dev/project/workflow-dependency.png)
-
- * Date selection:
-
- 1. Select the date through the page:
-
- ![workflow-pageSelection](../../../../img/new_ui/dev/project/workflow-pageSelection.png)
-
- 2. Manual input:
-
- ![workflow-input](../../../../img/new_ui/dev/project/workflow-input.png)
-
- * Relationship between complement and timing configuration:
-
- 1. Unconfigured timing: When there is no timing configuration, the daily replenishment will be performed by default according to the selected time range. For example, the workflow scheduling date is July 7 to July 10. If timing is not configured, the process instance is:
-
- ![workflow-unconfiguredTimingResult](../../../../img/new_ui/dev/project/workflow-unconfiguredTimingResult.png)
-
- 2. Configured timing: If there is a timing configuration, it will be supplemented according to the selected time range in combination with the timing configuration. For example, the workflow scheduling date is July 7 to July 10, and the timing is configured (running every 5 a.m.). The process example is:
-
- ![workflow-configuredTiming](../../../../img/new_ui/dev/project/workflow-configuredTiming.png)
-
- ![workflow-configuredTimingResult](../../../../img/new_ui/dev/project/workflow-configuredTimingResult.png)
+
+Description of workflow operating parameters:
+
+* Failure strategy: The strategy that other parallel task nodes follow when a task node fails to execute. "Continue" means that after a task fails, the other task nodes execute normally; "End" means that all running tasks are terminated and the entire process is terminated.
+* Notification strategy: When the process is over, send a notification email with the process execution result according to the process status. The options are: do not send, send on success, send on failure, and send regardless of the result.
+* Process priority: The priority of process operation, divided into five levels: highest (HIGHEST), high (HIGH), medium (MEDIUM), low (LOW), and lowest (LOWEST). When the number of master threads is insufficient, high-priority processes execute first in the execution queue, and processes with the same priority execute in first-in, first-out order.
+* Worker group: The process can only be executed in the specified worker machine group. The default is `Default`, which can execute on any worker.
+* Notification group: When a notification strategy is selected, a timeout alarm fires, or fault tolerance occurs, the process result information or email is sent to all members of the notification group.
+* Recipient: When a notification strategy is selected, a timeout alarm fires, or fault tolerance occurs, the process result information or alarm email is sent to the recipient list.
+* Cc: When a notification strategy is selected, a timeout alarm fires, or fault tolerance occurs, the process result information or warning email is copied to the CC list.
+* Startup parameter: Set or overwrite global parameter values when starting a new process instance.
+* Complement: refers to running the workflow definition within the specified date range and generating the corresponding workflow instance according to the complement policy. The complement policy includes two modes: **serial complement** and **parallel complement**. The date can be selected on the page or entered manually.
+ * Serial complement: within the specified time range, complement is executed from the start date to the end date, and multiple process instances are generated in turn; Click Run workflow and select the serial complement mode: for example, from July 9 to July 10, execute in sequence, and generate two process instances in sequence on the process instance page.
+
+ ![workflow-serial](../../../../img/new_ui/dev/project/workflow-serial.png)
+
+ * Parallel complement: within the specified time range, complement is performed for multiple days simultaneously, and multiple process instances are generated at the same time. To enter dates manually, enter them in the comma-separated date format 'yyyy MM DD hh:mm:ss'. Click Run workflow and select the parallel complement mode: for example, execute the workflow definition from July 9 to July 10 at the same time, and generate two process instances on the process instance page at the same time.
+
+ ![workflow-parallel](../../../../img/new_ui/dev/project/workflow-parallel.png)
+
+ * Concurrency: refers to the maximum number of instances executed in parallel in the parallel complement mode. For example, if tasks from July 6 to July 10 are executed at the same time, and the concurrency is 2, then the process instances are:
+
+ ![workflow-concurrency-from](../../../../img/new_ui/dev/project/workflow-concurrency-from.png)
+
+ ![workflow-concurrency](../../../../img/new_ui/dev/project/workflow-concurrency.png)
+
+ * Dependency mode: whether to also trigger the complement of downstream workflow instances that depend on the current workflow (the timing status of the workflow instances to be complemented must be online, and only downstream workflows that directly depend on the current workflow are triggered).
+
+ ![workflow-dependency](../../../../img/new_ui/dev/project/workflow-dependency.png)
+
+ * Date selection:
+
+ 1. Select the date through the page:
+
+ ![workflow-pageSelection](../../../../img/new_ui/dev/project/workflow-pageSelection.png)
+
+ 2. Manual input:
+
+ ![workflow-input](../../../../img/new_ui/dev/project/workflow-input.png)
+
+ * Relationship between complement and timing configuration:
+
+ 1. Unconfigured timing: When there is no timing configuration, the complement is performed once per day by default over the selected time range. For example, the workflow scheduling date is July 7 to July 10. If timing is not configured, the process instances are:
+
+ ![workflow-unconfiguredTimingResult](../../../../img/new_ui/dev/project/workflow-unconfiguredTimingResult.png)
+
+ 2. Configured timing: If there is a timing configuration, the complement is performed over the selected time range in combination with the timing configuration. For example, the workflow scheduling date is July 7 to July 10, and the timing is configured (running at 5 a.m. every day). The process instances are:
+
+ ![workflow-configuredTiming](../../../../img/new_ui/dev/project/workflow-configuredTiming.png)
+
+ ![workflow-configuredTimingResult](../../../../img/new_ui/dev/project/workflow-configuredTimingResult.png)
+
## Run the task alone
- Right-click the task and click the `Start` button (only online tasks can be clicked to run).
@@ -160,12 +161,15 @@ The following are the operation functions of the workflow definition list:
![workflow-time01](../../../../img/new_ui/dev/project/workflow-time01.png)
- Select a start and end time. Within the start and end time range, the workflow is run regularly; outside the start and end time range, no timed workflow instance will be generated.
+
- Add a timing that execute 5 minutes once, as shown in the following figure:
![workflow-time02](../../../../img/new_ui/dev/project/workflow-time02.png)
- Failure strategy, notification strategy, process priority, worker group, notification group, recipient, and CC are the same as workflow running parameters.
+
- Click the "Create" button to create the timing. Now the timing status is "**Offline**" and the timing needs to be **Online** to make effect.
+
- Schedule online: Click the `Timing Management` button , enter the timing management page, click the `online` button, the timing status will change to `online`, as shown in the below figure, the workflow makes effect regularly.
![workflow-time03](../../../../img/new_ui/dev/project/workflow-time03.png)
diff --git a/docs/docs/en/guide/project/workflow-instance.md b/docs/docs/en/guide/project/workflow-instance.md
index d9bffa239b42..6ef391ee6722 100644
--- a/docs/docs/en/guide/project/workflow-instance.md
+++ b/docs/docs/en/guide/project/workflow-instance.md
@@ -30,7 +30,7 @@ Double-click the task node, click `View History` to jump to the task instance pa
## View Running Parameters
-Click `Project Management -> Workflow -> Workflow Instance` to enter the workflow instance page, click the workflow name to enter the workflow DAG page;
+Click `Project Management -> Workflow -> Workflow Instance` to enter the workflow instance page, click the workflow name to enter the workflow DAG page;
Click the icon in the upper left corner to view the startup parameters of the workflow instance; click the icon to view the global parameters and local parameters of the workflow instance, as shown in the following figure:
@@ -43,15 +43,23 @@ Click `Project Management -> Workflow -> Workflow Instance`, enter the workflow
![workflow-instance](../../../../img/new_ui/dev/project/workflow-instance.png)
- **Edit:** Only processes with success/failed/stop status can be edited. Click the "Edit" button or the workflow instance name to enter the DAG edit page. After the edit, click the "Save" button to confirm, as shown in the figure below. In the pop-up box, check "Whether to update the workflow definition", after saving, the information modified by the instance will be updated to the workflow definition; if not checked, the workflow definition would not be updated.
+
+
- **Rerun:** Re-execute the terminated process
+
- **Recovery Failed:** For failed processes, you can perform failure recovery operations, starting from the failed node
+
- **Stop:** **Stop** the running process, the background code will first `kill` the worker process, and then execute `kill -9` operation
+
- **Pause:** **Pause** the running process, the system status will change to **waiting for execution**, it will wait for the task to finish, and pause the next sequence task.
+
- **Resume pause:** Resume the paused process, start running directly from the **paused node**
+
- **Delete:** Delete the workflow instance and the task instance under the workflow instance
+
- **Gantt Chart:** The vertical axis of the Gantt chart is the topological sorting of task instances of the workflow instance, and the horizontal axis is the running time of the task instances, as shown in the figure:
![instance-gantt](../../../../img/new_ui/dev/project/instance-gantt.png)
diff --git a/docs/docs/en/guide/project/workflow-relation.md b/docs/docs/en/guide/project/workflow-relation.md
index e386af38018c..e5ba10720b2a 100644
--- a/docs/docs/en/guide/project/workflow-relation.md
+++ b/docs/docs/en/guide/project/workflow-relation.md
@@ -1,3 +1,3 @@
Workflow Relation screen shows all the existing workflows in a project and their status.
-![](../../../../img/new_ui/dev/project/work-relation.png)
\ No newline at end of file
+![](../../../../img/new_ui/dev/project/work-relation.png)
diff --git a/docs/docs/en/guide/resource/file-manage.md b/docs/docs/en/guide/resource/file-manage.md
index 53a737166d41..992fbb2ed40d 100644
--- a/docs/docs/en/guide/resource/file-manage.md
+++ b/docs/docs/en/guide/resource/file-manage.md
@@ -6,11 +6,11 @@ When the third-party jar needs to be used in the scheduling process or the user
> **_Note:_**
>
-> * When you manage files as `admin`, remember to set up `tenant` for `admin` first.
+> * When you manage files as `admin`, remember to set up `tenant` for `admin` first.
## Basic Operations
-### Create File
+### Create File
The file format supports the following types: txt, log, sh, conf, cfg, py, java, sql, xml, hql, properties.
@@ -65,6 +65,7 @@ In the workflow definition module of project Manage, create a new workflow using
- Script: 'sh hello.sh'
- Resource: Select 'hello.sh'
+
> Notice: When using a resource file in the script, the file name needs to be the same as the full path of the selected resource:
> For example: if the resource path is `/resource/hello.sh`, you need to use the full path of `/resource/hello.sh` to use it in the script.
diff --git a/docs/docs/en/guide/resource/intro.md b/docs/docs/en/guide/resource/intro.md
index f4d70c2a3b0b..d067da673532 100644
--- a/docs/docs/en/guide/resource/intro.md
+++ b/docs/docs/en/guide/resource/intro.md
@@ -2,4 +2,4 @@
The Resource Center is typically used for uploading files, UDF functions, and task group management. For a stand-alone
environment, you can select the local file directory as the upload folder (**this operation does not require Hadoop or HDFS deployment**).
-Of course, you can also choose to upload to Hadoop or MinIO cluster. In this case, you need to have Hadoop (2.6+) or MinIOn and other related environments.
\ No newline at end of file
+Of course, you can also choose to upload to a Hadoop or MinIO cluster. In this case, you need to have Hadoop (2.6+), MinIO, or another related environment.
diff --git a/docs/docs/en/guide/resource/task-group.md b/docs/docs/en/guide/resource/task-group.md
index b8f62f0757c9..87e04b4cb0f8 100644
--- a/docs/docs/en/guide/resource/task-group.md
+++ b/docs/docs/en/guide/resource/task-group.md
@@ -1,16 +1,16 @@
# Task Group Settings
-The task group is mainly used to control the concurrency of task instances, and is designed to control the pressure of other resources (it can also control the pressure of the Hadoop cluster, the cluster will have queue control it). When creating a new task definition, you can configure the corresponding task group and configure the priority of the task running in the task group.
+The task group is mainly used to control the concurrency of task instances, and is designed to limit the pressure on other resources (it can also limit the pressure on the Hadoop cluster, although the cluster has its own queue control). When creating a new task definition, you can configure the corresponding task group and the priority of the task running in the task group.
-## Task Group Configuration
+## Task Group Configuration
-### Create Task Group
+### Create Task Group
![create-taskGroup](../../../../img/new_ui/dev/resource/create-taskGroup.png)
-The user clicks `Resources -> Task Group Management -> Task Group option -> Create Task Group`
+The user clicks `Resources -> Task Group Management -> Task Group option -> Create Task Group`
-![create-taskGroup](../../../../img/new_ui/dev/resource/create-taskGroup.png)
+![create-taskGroup](../../../../img/new_ui/dev/resource/create-taskGroup.png)
You need to enter the information inside the picture:
@@ -18,39 +18,39 @@ You need to enter the information inside the picture:
- **Project name**: The project that the task group functions, this item is optional, if not selected, all the projects in the whole system can use this task group.
- **Resource pool size**: The maximum number of concurrent task instances allowed.
-### View Task Group Queue
+### View Task Group Queue
-![view-queue](../../../../img/new_ui/dev/resource/view-queue.png)
+![view-queue](../../../../img/new_ui/dev/resource/view-queue.png)
Click the button to view task group usage information:
-![view-queue](../../../../img/new_ui/dev/resource/view-groupQueue.png)
+![view-queue](../../../../img/new_ui/dev/resource/view-groupQueue.png)
-### Use of Task Groups
+### Use of Task Groups
**Note**: The use of task groups is applicable to tasks executed by workers, such as `switch` nodes, `condition` nodes, `sub_process` and other node types executed by the master are not controlled by the task group.
Let's take the shell node as an example:
-![use-queue](../../../../img/new_ui/dev/resource/use-queue.png)
+![use-queue](../../../../img/new_ui/dev/resource/use-queue.png)
Regarding the configuration of the task group, all you need to do is to configure these parts in the red box:
- Task group name: The task group name is displayed on the task group configuration page. Here you can only see the task group that the project has permission to access (the project is selected when creating a task group) or the task group that scope globally (no project is selected when creating a task group).
-- Priority: When there is a waiting resource, the task with high priority will be distributed to the worker by the master first. The larger the value of this part, the higher the priority.
+- Priority: When there is a waiting resource, the task with high priority will be distributed to the worker by the master first. The larger the value of this part, the higher the priority.
-## Implementation Logic of Task Group
+## Implementation Logic of Task Group
### Get Task Group Resources
-The master judges whether the task is configured with a task group when distributing the task. If the task is not configured, it is normally thrown to the worker to run; if a task group is configured, it checks whether the remaining size of the task group resource pool meets the current task operation before throwing it to the worker for execution. , if the resource pool -1 is satisfied, continue to run; if not, exit the task distribution and wait for other tasks to wake up.
+When distributing a task, the master checks whether the task is configured with a task group. If it is not, the task is sent to a worker to run as usual. If a task group is configured, the master checks whether the remaining size of the task group resource pool can accommodate the current task before sending it to a worker for execution: if it can, the resource pool is decreased by 1 and the task continues to run; if not, the master exits the task distribution and waits for other tasks to wake it up.
### Release and Wake Up
-When the task that has occupied the task group resource is finished, the task group resource will be released. After the release, it will check whether there is a task waiting in the current task group. If there is, mark the task with the best priority to run, and create a new executable event. The event stores the task ID that is marked to acquire the resource, and then the task obtains the task group resource and run.
+When a task that occupies a task group resource finishes, the task group resource is released. After the release, the master checks whether any task is waiting in the current task group. If there is one, it marks the task with the highest priority to run and creates a new executable event. The event stores the ID of the task that is marked to acquire the resource, and then the task obtains the task group resource and runs.
-#### Task Group Flowchart
+#### Task Group Flowchart
![task_group](../../../../img/task_group_process.png)
-
+
diff --git a/docs/docs/en/guide/security.md b/docs/docs/en/guide/security.md
index b9484a1e7214..eaeb12d8a006 100644
--- a/docs/docs/en/guide/security.md
+++ b/docs/docs/en/guide/security.md
@@ -138,8 +138,8 @@ worker:
......
```
-- You can add new worker groups for the workers during runtime regardless of the configurations in `application.yaml` as below:
-`Security Center` -> `Worker Group Manage` -> `Create Worker Group` -> fill in `Group Name` and `Worker Addresses` -> click `confirm`.
+- You can add new worker groups for the workers during runtime regardless of the configurations in `application.yaml` as below:
+ `Security Center` -> `Worker Group Manage` -> `Create Worker Group` -> fill in `Group Name` and `Worker Addresses` -> click `confirm`.
## Environmental Management
@@ -164,10 +164,10 @@ Create a task node in the workflow definition, select the worker group and the e
## Cluster Management
> Add or update cluster
-- Each process can be related to zero or several clusters to support multiple environment, now just support k8s.
-
+> - Each process can be related to zero or several clusters to support multiple environments; currently only k8s is supported.
+>
> Usage cluster
-- After creation and authorization, k8s namespaces and processes will associate clusters. Each cluster will have separate workflows and task instances running independently.
+> - After creation and authorization, k8s namespaces and processes will be associated with clusters. Each cluster will have separate workflows and task instances running independently.
![create-cluster](../../../img/new_ui/dev/security/create-cluster.png)
@@ -183,4 +183,3 @@ Create a task node in the workflow definition, select the worker group and the e
![create-environment](../../../img/new_ui/dev/security/create-namespace.png)
-
diff --git a/docs/docs/en/guide/start/docker.md b/docs/docs/en/guide/start/docker.md
index b09d63b0f412..5120382f83e8 100644
--- a/docs/docs/en/guide/start/docker.md
+++ b/docs/docs/en/guide/start/docker.md
@@ -42,8 +42,8 @@ modify docker-compose's free memory up to 4 GB.
- Mac:Click `Docker Desktop -> Preferences -> Resources -> Memory` modified it
- Windows Docker Desktop:
- - Hyper-V mode: Click `Docker Desktop -> Settings -> Resources -> Memory` modified it
- - WSL 2 mode: see [WSL 2 utility VM](https://docs.microsoft.com/zh-cn/windows/wsl/wsl-config#configure-global-options-with-wslconfig) for more detail.
+ - Hyper-V mode: Click `Docker Desktop -> Settings -> Resources -> Memory` to modify it
+ - WSL 2 mode: see [WSL 2 utility VM](https://docs.microsoft.com/zh-cn/windows/wsl/wsl-config#configure-global-options-with-wslconfig) for more detail.
After complete the configuration, we can get the `docker-compose.yaml` file from [download page](/en-us/download/download.html)
form its source package, and make sure you get the right version. After download the package, you can run the commands as below.
@@ -71,7 +71,6 @@ $ docker-compose --profile all up -d
[Using docker-compose to start server](#using-docker-compose-to-start-server) will create new a database and the ZooKeeper
container when it up. You could start DolphinScheduler server separately if you want to reuse your exists services.
-
```shell
$ DOLPHINSCHEDULER_VERSION=
# Initialize the database, make sure database already exists
diff --git a/docs/docs/en/guide/start/quick-start.md b/docs/docs/en/guide/start/quick-start.md
index ff6c19749f42..0549f12152d5 100644
--- a/docs/docs/en/guide/start/quick-start.md
+++ b/docs/docs/en/guide/start/quick-start.md
@@ -42,7 +42,7 @@ This is a Quick Start guide to help you get a basic idea of working with Apache
![create-environment](../../../../img/new_ui/dev/quick-start/create-environment.png)
## Create a token
-
+
![create-token](../../../../img/new_ui/dev/quick-start/create-token.png)
## Login with regular users
diff --git a/docs/docs/en/guide/task/java.md b/docs/docs/en/guide/task/java.md
new file mode 100644
index 000000000000..c0a1a1cd3539
--- /dev/null
+++ b/docs/docs/en/guide/task/java.md
@@ -0,0 +1,48 @@
+# Overview
+
+This node is for executing java-type tasks and supports using files and jar packages as program entries.
+
+# Create Tasks
+
+- Click on `Project Management` -> `Project Name` -> `Workflow Definition`, click on the “Create workflow” button, go to the DAG edit page:
+
+- Drag the toolbar's Java task node to the palette.
+
+# Task Parameters
+
+| **Parameter** | **Description** |
+|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Node Name | The name of the set task. The node name in a workflow definition is unique. |
+| Run Flag | Indicates whether the node is scheduled normally. If the node does not need to run, turn on the prohibit-execution switch. |
+| Description | Describes the functionality of the node. |
+| Task Priority | When the number of worker threads is insufficient, the worker executes tasks according to the priority. When the priority is the same, the worker executes tasks by order. |
+| Worker Group | The group of machines who execute the tasks. If selecting `Default`, DolphinScheduler will randomly choose a worker machine to execute the task. |
+| Environment Name | Configure the environment in which the task runs. |
+| Number Of Failed Retries | Number of resubmitted tasks that failed. You can choose the number in the drop-down menu or fill it manually. |
+| Failed Retry Interval | The interval between the failure and resubmission of a task. You can choose the number in the drop-down menu or fill it manually. |
+| Delayed Execution Time | The amount of time a task is delayed, in units. |
+| Timeout Alarm | Check the timeout warning and timeout failure boxes. When the task runs longer than the "Timeout length", a warning message is sent and the task execution fails. |
+| Module Path | Enable this to use the modularity feature of Java 9+. All resources are then put on `--module-path`, which requires that the JDK version on your worker supports modularity. |
+| Main Parameter | Java program main method entry parameter. |
+| Java VM Parameters | JVM startup parameters. |
+| Script | You need to write Java code if you use the Java run type. The public class must exist in the code without writing a package statement. |
+| Resources | External JAR packages or other resource files that are added to the classpath or module path and can be easily retrieved in your Java code. |
+| Custom parameter | A user-defined local parameter that replaces `${ variable }` in the script. |
+| Pre Tasks | Selects a pre-task for the current task and sets the pre-task as the upstream of the current task. |
+
+## Example
+
+Java type tasks have two modes of execution; this example demonstrates executing a task in Java mode.
+
+The main configuration parameters are as follows:
+- Run Type
+- Module Path
+- Main Parameters
+- Java VM Parameters
+- Script
+
+![java_task](../../../../img/tasks/demo/java_task02.png)
+
+## Note
+
+When you run the task in Java execution mode, a public class must exist in the code, and you can omit the package statement.
diff --git a/docs/docs/en/guide/upgrade/incompatible.md b/docs/docs/en/guide/upgrade/incompatible.md
index d1043983c994..fcdd7dd19922 100644
--- a/docs/docs/en/guide/upgrade/incompatible.md
+++ b/docs/docs/en/guide/upgrade/incompatible.md
@@ -1,9 +1,10 @@
# Incompatible
-This document records the incompatible updates between each version. You need to check this document before you upgrade to related version.
+This document records the incompatible updates between each version. You need to check this document before you upgrade to related version.
## dev
## 3.0.0
-* Copy and import workflow without 'copy' suffix [#10607](https://github.com/apache/dolphinscheduler/pull/10607)
\ No newline at end of file
+* Copy and import workflow without 'copy' suffix [#10607](https://github.com/apache/dolphinscheduler/pull/10607)
+
diff --git a/docs/docs/en/guide/upgrade/upgrade.md b/docs/docs/en/guide/upgrade/upgrade.md
index f1a518e64402..b2b302117ae6 100644
--- a/docs/docs/en/guide/upgrade/upgrade.md
+++ b/docs/docs/en/guide/upgrade/upgrade.md
@@ -28,13 +28,13 @@ Change configuration in `./bin/env/dolphinscheduler_env.sh` ({user} and {passwor
Using MySQL as an example, change the value if you use other databases. Please manually download the [mysql-connector-java driver jar](https://downloads.MySQL.com/archives/c-j/)
jar package and add it to the `./tools/libs` directory, then change `./bin/ env/dolphinscheduler_env.sh` file
- ```shell
- export DATABASE=${DATABASE:-mysql}
- export SPRING_PROFILES_ACTIVE=${DATABASE}
- export SPRING_DATASOURCE_URL="jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false"
- export SPRING_DATASOURCE_USERNAME={user}
- export SPRING_DATASOURCE_PASSWORD={password}
- ```
+ ```shell
+ export DATABASE=${DATABASE:-mysql}
+ export SPRING_PROFILES_ACTIVE=${DATABASE}
+ export SPRING_DATASOURCE_URL="jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false"
+ export SPRING_DATASOURCE_USERNAME={user}
+ export SPRING_DATASOURCE_PASSWORD={password}
+ ```
Execute database upgrade script: `sh ./tools/bin/upgrade-schema.sh`
@@ -45,7 +45,7 @@ Execute database upgrade script: `sh ./tools/bin/upgrade-schema.sh`
- If you deploy with Pseudo-Cluster deployment, change it according to [Pseudo-Cluster](../installation/pseudo-cluster.md) section "Modify Configuration".
- If you deploy with Cluster deployment, change it according to [Cluster](../installation/cluster.md) section "Modify Configuration".
-And them run command `sh ./bin/start-all.sh` to start all services.
+Then run the command `sh ./bin/start-all.sh` to start all services.
## Notice
@@ -54,26 +54,26 @@ And them run command `sh ./bin/start-all.sh` to start all services.
The architecture of worker group is different between version before version 1.3.1 until version 2.0.0
- Before version 1.3.1(include itself) worker group can be created through UI interface.
-- Since version 1.3.1 and before version 2.0.0, worker group can be created by modifying the worker configuration.
+- Since version 1.3.1 and before version 2.0.0, worker group can be created by modifying the worker configuration.
#### How Can I Do When I Upgrade from 1.3.1 to version before 2.0.0
* Check the backup database, search records in table `t_ds_worker_group` table and mainly focus on three columns: `id, name and IP`.
-| id | name | ip_list |
-| :--- | :---: | ---: |
-| 1 | service1 | 192.168.xx.10 |
-| 2 | service2 | 192.168.xx.11,192.168.xx.12 |
+| id | name | ip_list |
+|:---|:--------:|----------------------------:|
+| 1 | service1 | 192.168.xx.10 |
+| 2 | service2 | 192.168.xx.11,192.168.xx.12 |
* Modify worker related configuration in `bin/env/install_config.conf`.
Assume bellow are the machine worker service to be deployed:
-| hostname | ip |
-| :--- | :---: |
-| ds1 | 192.168.xx.10 |
-| ds2 | 192.168.xx.11 |
-| ds3 | 192.168.xx.12 |
+| hostname | ip |
+|:---------|:-------------:|
+| ds1 | 192.168.xx.10 |
+| ds2 | 192.168.xx.11 |
+| ds3 | 192.168.xx.12 |
To keep the worker group configuration consistent with the previous version, modify the `workers` configuration as follows:
@@ -84,7 +84,7 @@ workers="ds1:service1,ds2:service2,ds3:service2"
#### The Worker Group has Been Enhanced in Version 1.3.2
-Workers in 1.3.1 can only belong to one worker group, but after version 1.3.2 and before version 2.0.0 worker support more than one worker group.
+Workers in 1.3.1 can only belong to one worker group, but after version 1.3.2 and before version 2.0.0 worker support more than one worker group.
```sh
workers="ds1:service1,ds1:service2"
diff --git a/docs/docs/en/history-versions.md b/docs/docs/en/history-versions.md
index f879431286f2..2ece5358cd7b 100644
--- a/docs/docs/en/history-versions.md
+++ b/docs/docs/en/history-versions.md
@@ -79,3 +79,4 @@
### Versions:Dev
#### Links:[Dev Document](../dev/user_doc/about/introduction.md)
+
diff --git a/docs/docs/zh/DSIP.md b/docs/docs/zh/DSIP.md
index 520ffc11b47e..a24a9041593e 100644
--- a/docs/docs/zh/DSIP.md
+++ b/docs/docs/zh/DSIP.md
@@ -52,11 +52,11 @@ integer in [All DSIPs][all-DSIPs] issues.
```text
Hi community,
-
+
-
+
I have already added a GitHub issue for my proposal, which you can see in .
-
+
Looking forward to any feedback on this thread.
```
@@ -83,3 +83,4 @@ integer in [All DSIPs][all-DSIPs] issues.
[github-issue-choose]: https://github.com/apache/dolphinscheduler/issues/new/choose
[mail-to-dev]: mailto:dev@dolphinscheduler.apache.org
[DSIP-1]: https://github.com/apache/dolphinscheduler/issues/6407
+
diff --git a/docs/docs/zh/about/features.md b/docs/docs/zh/about/features.md
index 1348a545809f..25aa47915eb4 100644
--- a/docs/docs/zh/about/features.md
+++ b/docs/docs/zh/about/features.md
@@ -17,3 +17,4 @@
## High Scalability
- **高扩展性**: 支持多租户和在线资源管理。支持每天10万个数据任务的稳定运行。
+
diff --git a/docs/docs/zh/about/glossary.md b/docs/docs/zh/about/glossary.md
index 2b9f967661aa..7642a4a4c1d3 100644
--- a/docs/docs/zh/about/glossary.md
+++ b/docs/docs/zh/about/glossary.md
@@ -50,4 +50,3 @@
- dolphinscheduler-ui 前端模块
-
diff --git a/docs/docs/zh/about/hardware.md b/docs/docs/zh/about/hardware.md
index 1ec2f2477519..ce5d3269e6ce 100644
--- a/docs/docs/zh/about/hardware.md
+++ b/docs/docs/zh/about/hardware.md
@@ -4,39 +4,39 @@ DolphinScheduler 作为一款开源分布式工作流任务调度系统,可以
## 1. Linux 操作系统版本要求
-| 操作系统 | 版本 |
-| :----------------------- | :----------: |
-| Red Hat Enterprise Linux | 7.0 及以上 |
-| CentOS | 7.0 及以上 |
-| Oracle Enterprise Linux | 7.0 及以上 |
+| 操作系统 | 版本 |
+|:-------------------------|:---------:|
+| Red Hat Enterprise Linux | 7.0 及以上 |
+| CentOS | 7.0 及以上 |
+| Oracle Enterprise Linux | 7.0 及以上 |
| Ubuntu LTS | 16.04 及以上 |
> **注意:**
->以上 Linux 操作系统可运行在物理服务器以及 VMware、KVM、XEN 主流虚拟化环境上
+> 以上 Linux 操作系统可运行在物理服务器以及 VMware、KVM、XEN 主流虚拟化环境上
## 2. 服务器建议配置
+
DolphinScheduler 支持运行在 Intel x86-64 架构的 64 位通用硬件服务器平台。对生产环境的服务器硬件配置有以下建议:
+
### 生产环境
| **CPU** | **内存** | **硬盘类型** | **网络** | **实例数量** |
-| --- | --- | --- | --- | --- |
-| 4核+ | 8 GB+ | SAS | 千兆网卡 | 1+ |
+|---------|--------|----------|--------|----------|
+| 4核+ | 8 GB+ | SAS | 千兆网卡 | 1+ |
> **注意:**
> - 以上建议配置为部署 DolphinScheduler 的最低配置,生产环境强烈推荐使用更高的配置
> - 硬盘大小配置建议 50GB+ ,系统盘和数据盘分开
-
## 3. 网络要求
DolphinScheduler正常运行提供如下的网络端口配置:
-| 组件 | 默认端口 | 说明 |
-| --- | --- | --- |
-| MasterServer | 5678 | 非通信端口,只需本机端口不冲突即可 |
-| WorkerServer | 1234 | 非通信端口,只需本机端口不冲突即可 |
-| ApiApplicationServer | 12345 | 提供后端通信端口 |
-
+| 组件 | 默认端口 | 说明 |
+|----------------------|-------|-------------------|
+| MasterServer | 5678 | 非通信端口,只需本机端口不冲突即可 |
+| WorkerServer | 1234 | 非通信端口,只需本机端口不冲突即可 |
+| ApiApplicationServer | 12345 | 提供后端通信端口 |
> **注意:**
> - MasterServer 和 WorkerServer 不需要开启网络间通信,只需本机端口不冲突即可
@@ -44,4 +44,4 @@ DolphinScheduler正常运行提供如下的网络端口配置:
## 4. 客户端 Web 浏览器要求
-DolphinScheduler 推荐 Chrome 以及使用 Chromium 内核的较新版本浏览器访问前端可视化操作界面
\ No newline at end of file
+DolphinScheduler 推荐 Chrome 以及使用 Chromium 内核的较新版本浏览器访问前端可视化操作界面
diff --git a/docs/docs/zh/about/introduction.md b/docs/docs/zh/about/introduction.md
index f4e9ab0ddd0c..250f72e82dec 100644
--- a/docs/docs/zh/about/introduction.md
+++ b/docs/docs/zh/about/introduction.md
@@ -5,4 +5,4 @@ Apache DolphinScheduler 是一个分布式易扩展的可视化DAG工作流任
Apache DolphinScheduler 旨在解决复杂的大数据任务依赖关系,并为应用程序提供数据和各种 OPS 编排中的关系。 解决数据研发ETL依赖错综复杂,无法监控任务健康状态的问题。
DolphinScheduler 以 DAG(Directed Acyclic Graph,DAG)流式方式组装任务,可以及时监控任务的执行状态,支持重试、指定节点恢复失败、暂停、恢复、终止任务等操作。
-![Apache DolphinScheduler](../../../img/introduction_ui.png)
\ No newline at end of file
+![Apache DolphinScheduler](../../../img/introduction_ui.png)
diff --git a/docs/docs/zh/architecture/cache.md b/docs/docs/zh/architecture/cache.md
index e5a55842c46e..6926eddfa107 100644
--- a/docs/docs/zh/architecture/cache.md
+++ b/docs/docs/zh/architecture/cache.md
@@ -39,4 +39,4 @@ spring:
时序图如下图所示:
-
\ No newline at end of file
+
diff --git a/docs/docs/zh/architecture/configuration.md b/docs/docs/zh/architecture/configuration.md
index fb985a96d38c..be822c89a25c 100644
--- a/docs/docs/zh/architecture/configuration.md
+++ b/docs/docs/zh/architecture/configuration.md
@@ -1,9 +1,11 @@
# 前言
+
本文档为dolphinscheduler配置文件说明文档。
# 目录结构
+
DolphinScheduler的目录结构如下:
```
@@ -98,11 +100,13 @@ DolphinScheduler的目录结构如下:
# 配置文件详解
## dolphinscheduler-daemon.sh [启动/关闭DolphinScheduler服务脚本]
+
dolphinscheduler-daemon.sh脚本负责DolphinScheduler的启动&关闭.
start-all.sh/stop-all.sh最终也是通过dolphinscheduler-daemon.sh对集群进行启动/关闭操作.
目前DolphinScheduler只是做了一个基本的设置,JVM参数请根据各自资源的实际情况自行设置.
默认简化参数如下:
+
```bash
export DOLPHINSCHEDULER_OPTS="
-server
@@ -120,6 +124,7 @@ export DOLPHINSCHEDULER_OPTS="
> 不建议设置"-XX:DisableExplicitGC" , DolphinScheduler使用Netty进行通讯,设置该参数,可能会导致内存泄漏.
## 数据库连接相关配置
+
在DolphinScheduler中使用Spring Hikari对数据库连接进行管理,配置文件位置:
|服务名称| 配置文件 |
@@ -149,8 +154,8 @@ export DOLPHINSCHEDULER_OPTS="
DolphinScheduler同样可以通过`bin/env/dolphinscheduler_env.sh`进行数据库连接相关的配置。
-
## Zookeeper相关配置
+
DolphinScheduler使用Zookeeper进行集群管理、容错、事件监听等功能,配置文件位置:
|服务名称| 配置文件 |
|--|--|
@@ -175,6 +180,7 @@ DolphinScheduler使用Zookeeper进行集群管理、容错、事件监听等功
DolphinScheduler同样可以通过`bin/env/dolphinscheduler_env.sh`进行Zookeeper相关的配置。
## common.properties [hadoop、s3、yarn配置]
+
common.properties配置文件目前主要是配置hadoop/s3/yarn相关的配置,配置文件位置:
|服务名称| 配置文件 |
|--|--|
@@ -217,6 +223,7 @@ common.properties配置文件目前主要是配置hadoop/s3/yarn相关的配置
|zeppelin.rest.url | http://localhost:8080 | zeppelin RESTful API 接口地址|
## Api-server相关配置
+
位置:`api-server/conf/application.yaml`
|参数 |默认值| 描述|
|--|--|--|
@@ -245,6 +252,7 @@ common.properties配置文件目前主要是配置hadoop/s3/yarn相关的配置
|traffic.control.customize-tenant-qps-rate||自定义租户最大请求数/秒限制|
## Master Server相关配置
+
位置:`master-server/conf/application.yaml`
|参数 |默认值| 描述|
|--|--|--|
@@ -266,6 +274,7 @@ common.properties配置文件目前主要是配置hadoop/s3/yarn相关的配置
|master.registry-disconnect-strategy.max-waiting-time|100s|当Master与注册中心失联之后重连时间, 之后当strategy为waiting时,该值生效。 该值表示当Master与注册中心失联时会在给定时间之内进行重连, 在给定时间之内重连失败将会停止自己,在重连时,Master会丢弃目前正在执行的工作流,值为0表示会无限期等待 |
## Worker Server相关配置
+
位置:`worker-server/conf/application.yaml`
|参数 |默认值| 描述|
|--|--|--|
@@ -282,16 +291,16 @@ common.properties配置文件目前主要是配置hadoop/s3/yarn相关的配置
|worker.registry-disconnect-strategy.strategy|stop|当Worker与注册中心失联之后采取的策略, 默认值是: stop. 可选值包括: stop, waiting|
|worker.registry-disconnect-strategy.max-waiting-time|100s|当Worker与注册中心失联之后重连时间, 之后当strategy为waiting时,该值生效。 该值表示当Worker与注册中心失联时会在给定时间之内进行重连, 在给定时间之内重连失败将会停止自己,在重连时,Worker会丢弃kill正在执行的任务。值为0表示会无限期等待 |
-
## Alert Server相关配置
+
位置:`alert-server/conf/application.yaml`
|参数 |默认值| 描述|
|--|--|--|
|server.port|50053|Alert Server监听端口|
|alert.port|50052|alert监听端口|
-
## Quartz相关配置
+
这里面主要是quartz配置,请结合实际业务场景&资源进行配置,本文暂时不做展开,配置文件位置:
|服务名称| 配置文件 |
@@ -319,7 +328,6 @@ common.properties配置文件目前主要是配置hadoop/s3/yarn相关的配置
|spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate|
|spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000|
-
## dolphinscheduler_env.sh [环境变量配置]
通过类似shell方式提交任务的的时候,会加载该配置文件中的环境变量到主机中。涉及到的 `JAVA_HOME`、元数据库、注册中心和任务类型配置,其中任务类型主要有: Shell任务、Python任务、Spark任务、Flink任务、Datax任务等等。
@@ -358,6 +366,7 @@ export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:
```
## 日志相关配置
+
|服务名称| 配置文件 |
|--|--|
|Master Server | `master-server/conf/logback-spring.xml`|
diff --git a/docs/docs/zh/architecture/design.md b/docs/docs/zh/architecture/design.md
index c8910e642aab..f3368a7609f6 100644
--- a/docs/docs/zh/architecture/design.md
+++ b/docs/docs/zh/architecture/design.md
@@ -3,6 +3,7 @@
## 系统架构
### 系统架构图
+