From 0d16d7b323589fc2fcf049a55d702b80d3b39f21 Mon Sep 17 00:00:00 2001
From: Eric Gao
Date: Thu, 22 Sep 2022 14:11:43 +0800
Subject: [PATCH] [Cherry-Pick][3.1.0-Prepare] Cherry pick docs format fix for 3.1.0 (#12095)

* Cherry pick docs format fix for 3.1.0

* Fix doc dead link

* Fix sphinx dead link

---
 .github/PULL_REQUEST_TEMPLATE.md | 10 +-
 CONTRIBUTING.md | 12 +-
 README.md | 49 ++-
 README_zh_CN.md | 31 +-
 deploy/README.md | 1 +
 docs/docs/en/DSIP.md | 7 +-
 docs/docs/en/about/features.md | 3 +-
 docs/docs/en/about/glossary.md | 1 -
 docs/docs/en/about/hardware.md | 28 +-
 docs/docs/en/about/introduction.md | 2 +-
 docs/docs/en/architecture/cache.md | 2 +-
 docs/docs/en/architecture/configuration.md | 13 +-
 docs/docs/en/architecture/design.md | 93 +++--
 docs/docs/en/architecture/load-balance.md | 1 +
 docs/docs/en/architecture/metadata.md | 9 +-
 docs/docs/en/architecture/task-structure.md | 288 +++++++--------
 docs/docs/en/contribute/api-standard.md | 25 +-
 docs/docs/en/contribute/api-test.md | 4 +-
 .../docs/en/contribute/architecture-design.md | 80 ++---
 .../backend/mechanism/global-parameter.md | 1 +
 .../contribute/backend/mechanism/overview.md | 2 +-
 .../backend/mechanism/task/switch.md | 1 +
 docs/docs/en/contribute/backend/spi/alert.md | 27 +-
 .../en/contribute/backend/spi/datasource.md | 2 +-
 .../en/contribute/backend/spi/registry.md | 5 +-
 docs/docs/en/contribute/backend/spi/task.md | 2 +-
 docs/docs/en/contribute/e2e-test.md | 72 ++--
 .../en/contribute/frontend-development.md | 77 +++-
 docs/docs/en/contribute/have-questions.md | 3 +-
 docs/docs/en/contribute/join/DS-License.md | 2 +-
 .../en/contribute/join/become-a-committer.md | 2 +-
 docs/docs/en/contribute/join/code-conduct.md | 109 +++---
 docs/docs/en/contribute/join/contribute.md | 6 +-
 docs/docs/en/contribute/join/document.md | 6 +-
 docs/docs/en/contribute/join/issue.md | 9 +-
 docs/docs/en/contribute/join/pull-request.md | 8 +-
 docs/docs/en/contribute/join/review.md | 81 ++---
 docs/docs/en/contribute/join/submit-code.md | 31 +-
 docs/docs/en/contribute/join/subscribe.md | 1 +
 docs/docs/en/contribute/join/unit-test.md | 55 +--
 docs/docs/en/contribute/log-specification.md | 5 +-
 .../en/contribute/release/release-prepare.md | 7 +-
 docs/docs/en/contribute/release/release.md | 24 +-
 .../en/guide/alert/alert_plugin_user_guide.md | 4 +-
 docs/docs/en/guide/alert/dingtalk.md | 27 +-
 docs/docs/en/guide/alert/email.md | 5 +-
 .../en/guide/alert/enterprise-webexteams.md | 17 +-
 docs/docs/en/guide/alert/enterprise-wechat.md | 4 +-
 docs/docs/en/guide/alert/feishu.md | 1 +
 docs/docs/en/guide/alert/http.md | 16 +-
 docs/docs/en/guide/alert/script.md | 14 +-
 docs/docs/en/guide/alert/telegram.md | 25 +-
 docs/docs/en/guide/data-quality.md | 312 +++++++++--------
 docs/docs/en/guide/datasource/athena.md | 19 +-
 docs/docs/en/guide/datasource/clickhouse.md | 22 +-
 docs/docs/en/guide/datasource/db2.md | 22 +-
 docs/docs/en/guide/datasource/hive.md | 28 +-
 docs/docs/en/guide/datasource/mysql.md | 20 +-
 docs/docs/en/guide/datasource/oracle.md | 22 +-
 docs/docs/en/guide/datasource/postgresql.md | 22 +-
 docs/docs/en/guide/datasource/presto.md | 23 +-
 docs/docs/en/guide/datasource/redshift.md | 22 +-
 docs/docs/en/guide/datasource/spark.md | 22 +-
 docs/docs/en/guide/datasource/sqlserver.md | 22 +-
 docs/docs/en/guide/expansion-reduction.md | 124 ++++---
 docs/docs/en/guide/healthcheck.md | 1 +
 .../docs/en/guide/howto/datasource-setting.md | 11 +-
 docs/docs/en/guide/howto/general-setting.md | 2 +-
 docs/docs/en/guide/installation/cluster.md | 2 +-
 .../en/guide/installation/pseudo-cluster.md | 7 +-
 docs/docs/en/guide/installation/standalone.md | 2 +-
 docs/docs/en/guide/integration/rainbond.md | 17 +-
 docs/docs/en/guide/metrics/metrics.md | 34 +-
 docs/docs/en/guide/monitor.md | 14 +-
 docs/docs/en/guide/parameter/built-in.md | 42 +--
 docs/docs/en/guide/parameter/context.md | 2 +-
 docs/docs/en/guide/parameter/local.md | 2 +-
 docs/docs/en/guide/project/project-list.md | 18 +-
 docs/docs/en/guide/project/task-definition.md | 4 +-
 docs/docs/en/guide/project/task-instance.md | 2 +
 .../en/guide/project/workflow-definition.md | 130 +++---
 .../en/guide/project/workflow-instance.md | 10 +-
 .../en/guide/project/workflow-relation.md | 2 +-
 docs/docs/en/guide/resource/file-manage.md | 5 +-
 docs/docs/en/guide/resource/intro.md | 2 +-
 docs/docs/en/guide/resource/task-group.md | 32 +-
 docs/docs/en/guide/security.md | 11 +-
 docs/docs/en/guide/start/docker.md | 5 +-
 docs/docs/en/guide/start/quick-start.md | 2 +-
 docs/docs/en/guide/task/java.md | 48 +++
 docs/docs/en/guide/upgrade/incompatible.md | 5 +-
 docs/docs/en/guide/upgrade/upgrade.md | 38 +-
 docs/docs/en/history-versions.md | 1 +
 docs/docs/zh/DSIP.md | 7 +-
 docs/docs/zh/about/features.md | 1 +
 docs/docs/zh/about/glossary.md | 1 -
 docs/docs/zh/about/hardware.md | 32 +-
 docs/docs/zh/about/introduction.md | 2 +-
 docs/docs/zh/architecture/cache.md | 2 +-
 docs/docs/zh/architecture/configuration.md | 17 +-
 docs/docs/zh/architecture/design.md | 109 +++--
 docs/docs/zh/architecture/load-balance.md | 2 +
 docs/docs/zh/architecture/metadata.md | 6 +-
 docs/docs/zh/architecture/task-structure.md | 329 +++++++++---------
 docs/docs/zh/contribute/api-standard.md | 26 +-
 docs/docs/zh/contribute/api-test.md | 4 +-
 .../docs/zh/contribute/architecture-design.md | 191 +++++-----
 .../contribute/backend/mechanism/overview.md | 2 +-
 .../backend/mechanism/task/switch.md | 1 +
 docs/docs/zh/contribute/backend/spi/alert.md | 9 +-
 .../zh/contribute/backend/spi/registry.md | 6 +-
 docs/docs/zh/contribute/e2e-test.md | 67 ++--
 .../zh/contribute/frontend-development.md | 93 ++++-
 docs/docs/zh/contribute/have-questions.md | 1 +
 docs/docs/zh/contribute/join/DS-License.md | 10 +-
 .../zh/contribute/join/become-a-committer.md | 2 +-
 docs/docs/zh/contribute/join/code-conduct.md | 111 +++---
 .../docs/zh/contribute/join/commit-message.md | 17 +-
 docs/docs/zh/contribute/join/contribute.md | 5 +-
 docs/docs/zh/contribute/join/document.md | 4 +-
 docs/docs/zh/contribute/join/issue.md | 10 +-
 docs/docs/zh/contribute/join/microbench.md | 48 +--
 docs/docs/zh/contribute/join/pull-request.md | 16 +-
 docs/docs/zh/contribute/join/review.md | 69 ++--
 docs/docs/zh/contribute/join/submit-code.md | 56 +--
 docs/docs/zh/contribute/join/subscribe.md | 1 +
 docs/docs/zh/contribute/join/unit-test.md | 16 +-
 docs/docs/zh/contribute/log-specification.md | 3 +-
 .../zh/contribute/release/release-post.md | 2 +-
 .../zh/contribute/release/release-prepare.md | 9 +-
 docs/docs/zh/contribute/release/release.md | 15 +-
 docs/docs/zh/guide/alert/dingtalk.md | 18 +-
 docs/docs/zh/guide/alert/email.md | 3 +-
 .../zh/guide/alert/enterprise-webexteams.md | 11 +
 docs/docs/zh/guide/alert/enterprise-wechat.md | 2 +-
 docs/docs/zh/guide/alert/feishu.md | 1 +
 docs/docs/zh/guide/alert/http.md | 9 +
 docs/docs/zh/guide/alert/script.md | 5 +
 docs/docs/zh/guide/alert/telegram.md | 37 +-
 docs/docs/zh/guide/data-quality.md | 229 +++++-----
 docs/docs/zh/guide/datasource/athena.md | 2 +-
 docs/docs/zh/guide/expansion-reduction.md | 134 +++---
 docs/docs/zh/guide/healthcheck.md | 1 +
 .../docs/zh/guide/howto/datasource-setting.md | 6 +-
 .../zh/guide/installation/pseudo-cluster.md | 3 +-
 docs/docs/zh/guide/integration/rainbond.md | 14 +-
 docs/docs/zh/guide/metrics/metrics.md | 32 +-
 docs/docs/zh/guide/monitor.md | 4 +-
 docs/docs/zh/guide/parameter/built-in.md | 37 +-
 docs/docs/zh/guide/project/task-definition.md | 4 +-
 docs/docs/zh/guide/project/task-instance.md | 2 +
 .../zh/guide/project/workflow-definition.md | 119 ++++---
 .../zh/guide/project/workflow-instance.md | 22 +-
 docs/docs/zh/guide/resource/file-manage.md | 4 +-
 docs/docs/zh/guide/resource/intro.md | 2 +-
 docs/docs/zh/guide/resource/task-group.md | 10 +-
 docs/docs/zh/guide/resource/udf-manage.md | 9 +-
 docs/docs/zh/guide/security.md | 30 +-
 docs/docs/zh/guide/start/quick-start.md | 21 +-
 docs/docs/zh/guide/upgrade/incompatible.md | 3 +-
 docs/docs/zh/guide/upgrade/upgrade.md | 22 +-
 docs/docs/zh/history-versions.md | 2 +
 docs/img/tasks/demo/java_task02.png | Bin 0 -> 286501 bytes
 dolphinscheduler-api-test/README.md | 1 +
 dolphinscheduler-bom/README.md | 11 +-
 dolphinscheduler-e2e/README.md | 1 +
 .../pydolphinscheduler/DEVELOP.md | 43 ++-
 .../pydolphinscheduler/README.md | 36 +-
 .../pydolphinscheduler/RELEASE.md | 32 +-
 .../pydolphinscheduler/UPDATING.md | 35 +-
 dolphinscheduler-registry/README.md | 24 +-
 .../dolphinscheduler-registry-etcd/README.md | 3 +-
 .../dolphinscheduler-registry-mysql/README.md | 4 +-
 .../README.md | 2 +-
 dolphinscheduler-ui/README.md | 3 +-
 175 files changed, 2556 insertions(+), 2180 deletions(-)
 create mode 100644 docs/docs/en/guide/task/java.md
 create mode 100644 docs/img/tasks/demo/java_task02.png

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 5417822bc666..ea2405e86f94 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,6 +1,5 @@
-
 ## Purpose of the pull request
@@ -8,8 +7,9 @@
 ## Brief change log
+
 ## Verify this pull request
@@ -25,9 +25,9 @@ This pull request is already covered by existing tests, such as *(please describ
 This change added tests and can be verified as follows:
+- *Added dolphinscheduler-dao tests for end-to-end.*
+- *Added CronUtilsTest to verify the change.*
+- *Manually verified the change by testing locally.*
 -->
 (or)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 1ea7469b386c..c5802c5d0161 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -6,11 +6,11 @@ Start by forking the dolphinscheduler GitHub repository, make changes in a branc
 There are three branches in the remote repository currently:
- - `master` : normal delivery branch. After the stable version is released, the code for the stable version branch is merged into the master branch.
-
- - `dev` : daily development branch. The daily development branch, the newly submitted code can pull requests to this branch.
-
- - `x.x.x-release` : the stable release version.
+- `master` : normal delivery branch. After the stable version is released, the code for the stable version branch is merged into the master branch.
+
+- `dev` : daily development branch. New code should be submitted as pull requests against this branch.
+
+- `x.x.x-release` : the stable release version.
 
 So, you should fork the `dev` branch.
 
@@ -40,7 +40,6 @@ There will be two repositories at this time: origin (your own warehouse) and ups
 
 Get/update remote repository code (already the latest code, skip it).
 
-
 ```sh
 git fetch upstream
 ```
 
@@ -91,7 +90,6 @@ After submitting changes to your remote repository, you should click on the new

-
 Select the modified local branch and the branch to merge into, then create a pull request.

diff --git a/README.md b/README.md index a2e6c3c2fdd7..a14bce1bc6a7 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ Dolphin Scheduler Official Website [dolphinscheduler.apache.org](https://dolphinscheduler.apache.org) -============ +================================================================== [![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html) [![codecov](https://codecov.io/gh/apache/dolphinscheduler/branch/dev/graph/badge.svg)](https://codecov.io/gh/apache/dolphinscheduler/branch/dev) @@ -8,9 +8,6 @@ Dolphin Scheduler Official Website [![Twitter Follow](https://img.shields.io/twitter/follow/dolphinschedule.svg?style=social&label=Follow)](https://twitter.com/dolphinschedule) [![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://s.apache.org/dolphinscheduler-slack) - - - [![Stargazers over time](https://starchart.cc/apache/dolphinscheduler.svg)](https://starchart.cc/apache/dolphinscheduler) [![EN doc](https://img.shields.io/badge/document-English-blue.svg)](README.md) @@ -21,35 +18,35 @@ Dolphin Scheduler Official Website DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs available `out of the box`. Its main objectives are as follows: - - Highly Reliable, +- Highly Reliable, DolphinScheduler adopts a decentralized multi-master and multi-worker architecture design, which naturally supports easy expansion and high availability (not restricted by a single point of bottleneck), and its performance increases linearly with the increase of machines - - High performance, supporting tens of millions of tasks every day - - Support multi-tenant. 
- - Cloud Native, DolphinScheduler supports multi-cloud/data center workflow management, also +- High performance, supporting tens of millions of tasks every day +- Support multi-tenant. +- Cloud Native, DolphinScheduler supports multi-cloud/data center workflow management, also supports Kubernetes, Docker deployment and custom task types, distributed scheduling, with overall scheduling capability increased linearly with the scale of the cluster - - Support various task types: Shell, MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Sub_Process, Procedure, etc. - - Support scheduling of workflows and dependencies, manual scheduling to pause/stop/recover task, support failure task retry/alarm, recover specified nodes from failure, kill task, etc. - - Associate the tasks according to the dependencies of the tasks in a DAG graph, which can visualize the running state of the task in real-time. - - WYSIWYG online editing tasks - - Support the priority of workflows & tasks, task failover, and task timeout alarm or failure. - - Support workflow global parameters and node customized parameter settings. - - Support online upload/download/management of resource files, etc. Support online file creation and editing. - - Support task log online viewing and scrolling and downloading, etc. - - Support the viewing of Master/Worker CPU load, memory, and CPU usage metrics. - - Support displaying workflow history in tree/Gantt chart, as well as statistical analysis on the task status & process status in each workflow. - - Support back-filling data. - - Support internationalization. - - More features waiting for partners to explore... +- Support various task types: Shell, MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Sub_Process, Procedure, etc. +- Support scheduling of workflows and dependencies, manual scheduling to pause/stop/recover task, support failure task retry/alarm, recover specified nodes from failure, kill task, etc. 
+- Associate the tasks according to the dependencies of the tasks in a DAG graph, which can visualize the running state of the task in real-time. +- WYSIWYG online editing tasks +- Support the priority of workflows & tasks, task failover, and task timeout alarm or failure. +- Support workflow global parameters and node customized parameter settings. +- Support online upload/download/management of resource files, etc. Support online file creation and editing. +- Support task log online viewing and scrolling and downloading, etc. +- Support the viewing of Master/Worker CPU load, memory, and CPU usage metrics. +- Support displaying workflow history in tree/Gantt chart, as well as statistical analysis on the task status & process status in each workflow. +- Support back-filling data. +- Support internationalization. +- More features waiting for partners to explore... ## What's in DolphinScheduler - Stability | Accessibility | Features | Scalability | - --------- | ------------- | -------- | ------------| -Decentralized multi-master and multi-worker | Visualization of workflow key information, such as task status, task type, retry times, task operation machine information, visual variables, and so on at a glance.  |  Support pause, recover operation | Support customized task types -support HA | Visualization of all workflow operations, dragging tasks to draw DAGs, configuring data sources and resources. At the same time, for third-party systems, provide API mode operations. | Users on DolphinScheduler can achieve many-to-one or one-to-one mapping relationship through tenants and Hadoop users, which is very important for scheduling large data jobs. | The scheduler supports distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster. Master and Worker support dynamic adjustment. -Overload processing: By using the task queue mechanism, the number of schedulable tasks on a single machine can be flexibly configured. 
Machine jam can be avoided with high tolerance to numbers of tasks cached in task queue. | One-click deployment | Support traditional shell tasks, and big data platform task scheduling: MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Procedure, Sub_Process | | +| Stability | Accessibility | Features | Scalability | +|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Decentralized multi-master and multi-worker | Visualization of workflow key information, such as task status, task type, retry times, task operation machine information, visual variables, and so on at a glance.  |  Support pause, recover operation | Support customized task types | +| support HA | Visualization of all workflow operations, dragging tasks to draw DAGs, configuring data sources and resources. At the same time, for third-party systems, provide API mode operations. | Users on DolphinScheduler can achieve many-to-one or one-to-one mapping relationship through tenants and Hadoop users, which is very important for scheduling large data jobs. | The scheduler supports distributed scheduling, and the overall scheduling capability will increase linearly with the scale of the cluster. Master and Worker support dynamic adjustment. 
| +| Overload processing: By using the task queue mechanism, the number of schedulable tasks on a single machine can be flexibly configured. Machine jam can be avoided with high tolerance to numbers of tasks cached in task queue. | One-click deployment | Support traditional shell tasks, and big data platform task scheduling: MR, Spark, SQL (MySQL, PostgreSQL, hive, spark SQL), Python, Procedure, Sub_Process | | ## User Interface Screenshots diff --git a/README_zh_CN.md b/README_zh_CN.md index c5058eac1524..2226b9edbaf0 100644 --- a/README_zh_CN.md +++ b/README_zh_CN.md @@ -1,12 +1,11 @@ Dolphin Scheduler Official Website [dolphinscheduler.apache.org](https://dolphinscheduler.apache.org) -============ +================================================================== [![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html) [![codecov](https://codecov.io/gh/apache/dolphinscheduler/branch/dev/graph/badge.svg)](https://codecov.io/gh/apache/dolphinscheduler/branch/dev) [![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=apache-dolphinscheduler&metric=alert_status)](https://sonarcloud.io/dashboard?id=apache-dolphinscheduler) - [![Stargazers over time](https://starchart.cc/apache/dolphinscheduler.svg)](https://starchart.cc/apache/dolphinscheduler) [![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](README_zh_CN.md) @@ -18,20 +17,20 @@ Dolphin Scheduler Official Website 其主要目标如下: - - 以DAG图的方式将Task按照任务的依赖关系关联起来,可实时可视化监控任务的运行状态 - - 支持丰富的任务类型:Shell、MR、Spark、SQL(mysql、postgresql、hive、sparksql)、Python、Sub_Process、Procedure等 - - 支持工作流定时调度、依赖调度、手动调度、手动暂停/停止/恢复,同时支持失败重试/告警、从指定节点恢复失败、Kill任务等操作 - - 支持工作流优先级、任务优先级及任务的故障转移及任务超时告警/失败 - - 支持工作流全局参数及节点自定义参数设置 - - 支持资源文件的在线上传/下载,管理等,支持在线文件创建、编辑 - - 支持任务日志在线查看及滚动、在线下载日志等 - - 实现集群HA,通过Zookeeper实现Master集群和Worker集群去中心化 - - 支持对`Master/Worker` cpu load,memory,cpu在线查看 - - 支持工作流运行历史树形/甘特图展示、支持任务状态统计、流程状态统计 - - 支持补数 - - 支持多租户 - - 支持国际化 - - 
还有更多等待伙伴们探索 +- 以DAG图的方式将Task按照任务的依赖关系关联起来,可实时可视化监控任务的运行状态 +- 支持丰富的任务类型:Shell、MR、Spark、SQL(mysql、postgresql、hive、sparksql)、Python、Sub_Process、Procedure等 +- 支持工作流定时调度、依赖调度、手动调度、手动暂停/停止/恢复,同时支持失败重试/告警、从指定节点恢复失败、Kill任务等操作 +- 支持工作流优先级、任务优先级及任务的故障转移及任务超时告警/失败 +- 支持工作流全局参数及节点自定义参数设置 +- 支持资源文件的在线上传/下载,管理等,支持在线文件创建、编辑 +- 支持任务日志在线查看及滚动、在线下载日志等 +- 实现集群HA,通过Zookeeper实现Master集群和Worker集群去中心化 +- 支持对`Master/Worker` cpu load,memory,cpu在线查看 +- 支持工作流运行历史树形/甘特图展示、支持任务状态统计、流程状态统计 +- 支持补数 +- 支持多租户 +- 支持国际化 +- 还有更多等待伙伴们探索 ## 系统部分截图 diff --git a/deploy/README.md b/deploy/README.md index c1b8fa543403..925c40530c8b 100644 --- a/deploy/README.md +++ b/deploy/README.md @@ -2,3 +2,4 @@ * [Start Up DolphinScheduler with Docker](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/start/docker.html) * [Start Up DolphinScheduler with Kubernetes](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/installation/kubernetes.html) + diff --git a/docs/docs/en/DSIP.md b/docs/docs/en/DSIP.md index 07d617875e86..69475f0804de 100644 --- a/docs/docs/en/DSIP.md +++ b/docs/docs/en/DSIP.md @@ -55,11 +55,11 @@ Here is the template for mail ```text Hi community, - + - + I already add a GitHub Issue for my proposal, which you could see in . - + Looking forward any feedback for this thread. ``` @@ -89,3 +89,4 @@ closed and transfer from [current DSIPs][current-DSIPs] to [past DSIPs][past-DSI [github-issue-choose]: https://github.com/apache/dolphinscheduler/issues/new/choose [mail-to-dev]: mailto:dev@dolphinscheduler.apache.org [DSIP-1]: https://github.com/apache/dolphinscheduler/issues/6407 + diff --git a/docs/docs/en/about/features.md b/docs/docs/en/about/features.md index 75393ce142d4..e45f75d565a1 100644 --- a/docs/docs/en/about/features.md +++ b/docs/docs/en/about/features.md @@ -16,4 +16,5 @@ ## High Scalability -- **Scalability**: Supports multitenancy and online resource management. Stable operation of 100,000 data tasks per day is supported. 
\ No newline at end of file +- **Scalability**: Supports multitenancy and online resource management. Stable operation of 100,000 data tasks per day is supported. + diff --git a/docs/docs/en/about/glossary.md b/docs/docs/en/about/glossary.md index f8ad9355bcb4..dc3df7bb5c9c 100644 --- a/docs/docs/en/about/glossary.md +++ b/docs/docs/en/about/glossary.md @@ -71,4 +71,3 @@ process fails and ends From the perspective of scheduling, this article preliminarily introduces the architecture principles and implementation ideas of the big data distributed workflow scheduling system-DolphinScheduler. To be continued - diff --git a/docs/docs/en/about/hardware.md b/docs/docs/en/about/hardware.md index f67066e8c9cf..b10a0b688096 100644 --- a/docs/docs/en/about/hardware.md +++ b/docs/docs/en/about/hardware.md @@ -6,15 +6,15 @@ This section briefs about the hardware requirements for DolphinScheduler. Dolphi The Linux operating systems specified below can run on physical servers and mainstream virtualization environments such as VMware, KVM, and XEN. -| Operating System | Version | -| :----------------------- | :----------: | -| Red Hat Enterprise Linux | 7.0 and above | -| CentOS | 7.0 and above | -| Oracle Enterprise Linux | 7.0 and above | +| Operating System | Version | +|:-------------------------|:---------------:| +| Red Hat Enterprise Linux | 7.0 and above | +| CentOS | 7.0 and above | +| Oracle Enterprise Linux | 7.0 and above | | Ubuntu LTS | 16.04 and above | > **Note:** ->The above Linux operating systems can run on physical servers and mainstream virtualization environments such as VMware, KVM, and XEN. +> The above Linux operating systems can run on physical servers and mainstream virtualization environments such as VMware, KVM, and XEN. 
## Server Configuration @@ -23,8 +23,8 @@ DolphinScheduler supports 64-bit hardware platforms with Intel x86-64 architectu ### Production Environment | **CPU** | **MEM** | **HD** | **NIC** | **Num** | -| --- | --- | --- | --- | --- | -| 4 core+ | 8 GB+ | SAS | GbE | 1+ | +|---------|---------|--------|---------|---------| +| 4 core+ | 8 GB+ | SAS | GbE | 1+ | > **Note:** > - The above recommended configuration is the minimum configuration for deploying DolphinScheduler. Higher configuration is strongly recommended for production environments. @@ -34,11 +34,11 @@ DolphinScheduler supports 64-bit hardware platforms with Intel x86-64 architectu DolphinScheduler provides the following network port configurations for normal operation: -| Server | Port | Desc | -| --- | --- | --- | -| MasterServer | 5678 | not the communication port, require the native ports do not conflict | -| WorkerServer | 1234 | not the communication port, require the native ports do not conflict | -| ApiApplicationServer | 12345 | backend communication port | +| Server | Port | Desc | +|----------------------|-------|----------------------------------------------------------------------| +| MasterServer | 5678 | not the communication port, require the native ports do not conflict | +| WorkerServer | 1234 | not the communication port, require the native ports do not conflict | +| ApiApplicationServer | 12345 | backend communication port | > **Note:** > - MasterServer and WorkerServer do not need to enable communication between the networks. As long as the local ports do not conflict. @@ -46,4 +46,4 @@ DolphinScheduler provides the following network port configurations for normal o ## Browser Requirements -The minimum supported version of Google Chrome is version 85, but version 90 or above is recommended. \ No newline at end of file +The minimum supported version of Google Chrome is version 85, but version 90 or above is recommended. 
diff --git a/docs/docs/en/about/introduction.md b/docs/docs/en/about/introduction.md index 059401a4ac6c..4bc7ee49af0a 100644 --- a/docs/docs/en/about/introduction.md +++ b/docs/docs/en/about/introduction.md @@ -4,4 +4,4 @@ Apache DolphinScheduler provides a distributed and easy to expand visual workflo Apache DolphinScheduler aims to solve complex big data task dependencies and to trigger relationships in data OPS orchestration for various big data applications. Solves the intricate dependencies of data R&D ETL and the inability to monitor the health status of tasks. DolphinScheduler assembles tasks in the Directed Acyclic Graph (DAG) streaming mode, which can monitor the execution status of tasks in time, and supports operations like retry, recovery failure from specified nodes, pause, resume, and kill tasks, etc. -![Apache DolphinScheduler](../../../img/introduction_ui.png) \ No newline at end of file +![Apache DolphinScheduler](../../../img/introduction_ui.png) diff --git a/docs/docs/en/architecture/cache.md b/docs/docs/en/architecture/cache.md index 3885dddd2447..6084a5cc6569 100644 --- a/docs/docs/en/architecture/cache.md +++ b/docs/docs/en/architecture/cache.md @@ -39,4 +39,4 @@ Note: the final strategy for cache update comes from the expiration strategy con The sequence diagram shows below: -cache-evict \ No newline at end of file +cache-evict diff --git a/docs/docs/en/architecture/configuration.md b/docs/docs/en/architecture/configuration.md index cfa853c8960b..279411ef75b2 100644 --- a/docs/docs/en/architecture/configuration.md +++ b/docs/docs/en/architecture/configuration.md @@ -101,8 +101,6 @@ The directory structure of DolphinScheduler is as follows: ## Configurations in Details - - ### dolphinscheduler-daemon.sh [startup or shutdown DolphinScheduler application] dolphinscheduler-daemon.sh is responsible for DolphinScheduler startup and shutdown. 
@@ -110,6 +108,7 @@ Essentially, start-all.sh or stop-all.sh startup and shutdown the cluster via do Currently, DolphinScheduler just makes a basic config, remember to config further JVM options based on your practical situation of resources. Default simplified parameters are: + ```bash export DOLPHINSCHEDULER_OPTS=" -server @@ -157,8 +156,8 @@ The default configuration is as follows: Note that DolphinScheduler also supports database configuration through `bin/env/dolphinscheduler_env.sh`. - ### Zookeeper related configuration + DolphinScheduler uses Zookeeper for cluster management, fault tolerance, event monitoring and other functions. Configuration file location: |Service| Configuration file | |--|--| @@ -226,8 +225,8 @@ The default configuration is as follows: |alert.rpc.port | 50052 | the RPC port of Alert Server| |zeppelin.rest.url | http://localhost:8080 | the RESTful API url of zeppelin| - ### Api-server related configuration + Location: `api-server/conf/application.yaml` |Parameters | Default value| Description| @@ -257,6 +256,7 @@ Location: `api-server/conf/application.yaml` |traffic.control.customize-tenant-qps-rate||customize tenant max request number per second| ### Master Server related configuration + Location: `master-server/conf/application.yaml` |Parameters | Default value| Description| @@ -278,8 +278,8 @@ Location: `master-server/conf/application.yaml` |master.registry-disconnect-strategy.strategy|stop|Used when the master disconnect from registry, default value: stop. 
Optional values include stop, waiting| |master.registry-disconnect-strategy.max-waiting-time|100s|Used when the master disconnect from registry, and the disconnect strategy is waiting, this config means the master will waiting to reconnect to registry in given times, and after the waiting times, if the master still cannot connect to registry, will stop itself, if the value is 0s, the Master will waitting infinitely| - ### Worker Server related configuration + Location: `worker-server/conf/application.yaml` |Parameters | Default value| Description| @@ -298,6 +298,7 @@ Location: `worker-server/conf/application.yaml` |worker.registry-disconnect-strategy.max-waiting-time|100s|Used when the worker disconnect from registry, and the disconnect strategy is waiting, this config means the worker will waiting to reconnect to registry in given times, and after the waiting times, if the worker still cannot connect to registry, will stop itself, if the value is 0s, will waitting infinitely | ### Alert Server related configuration + Location: `alert-server/conf/application.yaml` |Parameters | Default value| Description| @@ -305,7 +306,6 @@ Location: `alert-server/conf/application.yaml` |server.port|50053|the port of Alert Server| |alert.port|50052|the port of alert| - ### Quartz related configuration This part describes quartz configs and configure them based on your practical situation and resources. @@ -335,7 +335,6 @@ The default configuration is as follows: |spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate| |spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000| - ### dolphinscheduler_env.sh [load environment variables configs] When using shell to commit tasks, DolphinScheduler will export environment variables from `bin/env/dolphinscheduler_env.sh`. 
The diff --git a/docs/docs/en/architecture/design.md b/docs/docs/en/architecture/design.md index 9e09e15948c0..9579ab36517f 100644 --- a/docs/docs/en/architecture/design.md +++ b/docs/docs/en/architecture/design.md @@ -22,58 +22,58 @@ ### Architecture Description -* **MasterServer** +* **MasterServer** - MasterServer adopts a distributed and decentralized design concept. MasterServer is mainly responsible for DAG task segmentation, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer at the same time. - When the MasterServer service starts, register a temporary node with ZooKeeper, and perform fault tolerance by monitoring changes in the temporary node of ZooKeeper. - MasterServer provides monitoring services based on netty. + MasterServer adopts a distributed and decentralized design concept. MasterServer is mainly responsible for DAG task segmentation, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer at the same time. + When the MasterServer service starts, register a temporary node with ZooKeeper, and perform fault tolerance by monitoring changes in the temporary node of ZooKeeper. + MasterServer provides monitoring services based on netty. - #### The Service Mainly Includes: - - - **DistributedQuartz** distributed scheduling component, which is mainly responsible for the start and stop operations of scheduled tasks. When quartz start the task, there will be a thread pool inside the Master responsible for the follow-up operation of the processing task; + #### The Service Mainly Includes: - - **MasterSchedulerService** is a scanning thread that regularly scans the `t_ds_command` table in the database, runs different business operations according to different **command types**; + - **DistributedQuartz** distributed scheduling component, which is mainly responsible for the start and stop operations of scheduled tasks. 
When quartz starts a task, a thread pool inside the Master is responsible for the follow-up processing of the task; - - **WorkflowExecuteRunnable** is mainly responsible for DAG task segmentation, task submission monitoring, and logical processing of different event types; + - **MasterSchedulerService** is a scanning thread that regularly scans the `t_ds_command` table in the database, runs different business operations according to different **command types**; - - **TaskExecuteRunnable** is mainly responsible for the processing and persistence of tasks, and generates task events and submits them to the event queue of the process instance; + - **WorkflowExecuteRunnable** is mainly responsible for DAG task segmentation, task submission monitoring, and logical processing of different event types; - - **EventExecuteService** is mainly responsible for the polling of the event queue of the process instances; + - **TaskExecuteRunnable** is mainly responsible for the processing and persistence of tasks, and generates task events and submits them to the event queue of the process instance; - - **StateWheelExecuteThread** is mainly responsible for process instance and task timeout, task retry, task-dependent polling, and generates the corresponding process instance or task event and submits it to the event queue of the process instance; + - **EventExecuteService** is mainly responsible for the polling of the event queue of the process instances; - - **FailoverExecuteThread** is mainly responsible for the logic of Master fault tolerance and Worker fault tolerance; + - **StateWheelExecuteThread** is mainly responsible for process instance and task timeout, task retry, task-dependent polling, and generates the corresponding process instance or task event and submits it to the event queue of the process instance; -* **WorkerServer** + - **FailoverExecuteThread** is mainly responsible for the logic of Master fault tolerance and Worker fault tolerance; - 
WorkerServer also adopts a distributed and decentralized design concept. WorkerServer is mainly responsible for task execution and providing log services. +* **WorkerServer** - When the WorkerServer service starts, register a temporary node with ZooKeeper and maintain a heartbeat. - WorkerServer provides monitoring services based on netty. - - #### The Service Mainly Includes: + WorkerServer also adopts a distributed and decentralized design concept. WorkerServer is mainly responsible for task execution and providing log services. - - **WorkerManagerThread** is mainly responsible for the submission of the task queue, continuously receives tasks from the task queue, and submits them to the thread pool for processing; + When the WorkerServer service starts, register a temporary node with ZooKeeper and maintain a heartbeat. + WorkerServer provides monitoring services based on netty. - - **TaskExecuteThread** is mainly responsible for the process of task execution, and the actual processing of tasks according to different task types; + #### The Service Mainly Includes: - - **RetryReportTaskStatusThread** is mainly responsible for regularly polling to report the task status to the Master until the Master replies to the status ack to avoid the loss of the task status; + - **WorkerManagerThread** is mainly responsible for the submission of the task queue, continuously receives tasks from the task queue, and submits them to the thread pool for processing; -* **ZooKeeper** + - **TaskExecuteThread** is mainly responsible for the process of task execution, and the actual processing of tasks according to different task types; - ZooKeeper service, MasterServer and WorkerServer nodes in the system all use ZooKeeper for cluster management and fault tolerance. In addition, the system implements event monitoring and distributed locks based on ZooKeeper. 
+ - **RetryReportTaskStatusThread** is mainly responsible for regularly polling to report the task status to the Master until the Master replies to the status ack to avoid the loss of the task status; - We have also implemented queues based on Redis, but we hope DolphinScheduler depends on as few components as possible, so we finally removed the Redis implementation. +* **ZooKeeper** -* **AlertServer** + ZooKeeper service, MasterServer and WorkerServer nodes in the system all use ZooKeeper for cluster management and fault tolerance. In addition, the system implements event monitoring and distributed locks based on ZooKeeper. + + We have also implemented queues based on Redis, but we hope DolphinScheduler depends on as few components as possible, so we finally removed the Redis implementation. + +* **AlertServer** Provides alarm services, and implements rich alarm methods through alarm plugins. -* **API** +* **API** - The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service uniformly provides RESTful APIs to provide request services to external. + The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service uniformly provides RESTful APIs to provide request services to external. -* **UI** +* **UI** The front-end page of the system provides various visual operation interfaces of the system, see more at [Introduction to Functions](../guide/homepage.md) section. @@ -84,6 +84,7 @@ ##### Centralized Thinking The centralized design concept is relatively simple. The nodes in the distributed cluster are roughly divided into two roles according to responsibilities: +

master-slave character

@@ -120,8 +121,6 @@ The service fault-tolerance design relies on ZooKeeper's Watcher mechanism, and

Among them, the Master monitors the directories of other Masters and Workers. If the remove event is triggered, it performs fault tolerance of the process instance or task instance according to the specific business logic. - - - Master fault tolerance:

@@ -146,7 +145,7 @@ Fault-tolerant content: When sending the remove event of the Worker node, the Ma Fault-tolerant post-processing: Once the Master Scheduler thread finds that the task instance is in the "fault-tolerant" state, it takes over the task and resubmits it. - Note: Due to "network jitter", the node may lose heartbeat with ZooKeeper in a short period of time, and the node's remove event may occur. For this situation, we use the simplest way, that is, once the node and ZooKeeper timeout connection occurs, then directly stop the Master or Worker service. +Note: Due to "network jitter", the node may lose its heartbeat with ZooKeeper for a short period of time, and the node's remove event may occur. For this situation, we use the simplest approach: once a connection timeout occurs between the node and ZooKeeper, the Master or Worker service stops itself directly. ##### Task Failed and Try Again @@ -170,26 +169,26 @@ If there is a task failure in the workflow that reaches the maximum retry times, In the early schedule design, without a priority design and with fair scheduling, a task submitted first might complete at the same time as a task submitted later, invalidating the priority of the process or task. So we re-designed this, and the following is our current design: -- According to **the priority of different process instances** prior over **priority of the same process instance** prior over **priority of tasks within the same process** prior over **tasks within the same process**, process task submission order from highest to Lowest. - - The specific implementation is to parse the priority according to the JSON of the task instance, and then save the **process instance priority_process instance id_task priority_task id** information to the ZooKeeper task queue. When obtain from the task queue, we can get the highest priority task by comparing string.
+- Priority is compared in this order: **the priority of different process instances** takes precedence over **the priority of the same process instance**, which takes precedence over **the priority of tasks within the same process**, which takes precedence over **the submission order of tasks within the same process**; tasks are submitted from the highest priority to the lowest. + - The specific implementation is to parse the priority according to the JSON of the task instance, and then save the **process instance priority_process instance id_task priority_task id** information to the ZooKeeper task queue. When obtaining from the task queue, we can get the highest-priority task by comparing strings. + - The priority of the process definition exists because some processes need to run before other processes. Configure the priority when the process starts or is scheduled. There are 5 levels in total, which are HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below - - The priority of the process definition is to consider that some processes need to process before other processes. Configure the priority when the process starts or schedules. There are 5 levels in total, which are HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below -

- Process priority configuration -

+

+ Process priority configuration +

- - The priority of the task is also divides into 5 levels, ordered by HIGHEST, HIGH, MEDIUM, LOW, LOWEST. As shown below: -

- Task priority configuration -

+ - The priority of the task is also divides into 5 levels, ordered by HIGHEST, HIGH, MEDIUM, LOW, LOWEST. As shown below: -#### Logback and Netty Implement Log Access +

+ Task priority configuration +
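Because each queue entry is the string `process instance priority_process instance id_task priority_task id`, plain lexicographic comparison is enough to pop the highest-priority entry, assuming the levels map to ascending numbers (HIGHEST as the smallest) and the ids have equal width. A sketch with made-up ids:

```shell
# Hypothetical queue entries: <process priority>_<process instance id>_<task priority>_<task id>,
# assuming HIGHEST=0 ... LOWEST=4 and fixed-width ids so string order matches numeric order.
entries="2_101_3_2001
1_102_0_2002
2_101_1_2003"

# The lexicographically smallest entry is the highest-priority task.
head_entry=$(printf '%s\n' "$entries" | sort | head -n 1)
echo "$head_entry"   # prints 1_102_0_2002 -- the higher-priority process instance wins
```

In the real queue the ids are generated by the scheduler; the point is only that string order stands in for priority order.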

-- Since Web (UI) and Worker are not always on the same machine, to view the log cannot be like querying a local file. There are two options: - - Put logs on the ES search engine. - - Obtain remote log information through netty communication. +#### Logback and Netty Implement Log Access -- In consideration of the lightness of DolphinScheduler as much as possible, so choose gRPC to achieve remote access to log information. +- Since Web (UI) and Worker are not always on the same machine, to view the log cannot be like querying a local file. There are two options: +- Put logs on the ES search engine. +- Obtain remote log information through netty communication. +- In consideration of the lightness of DolphinScheduler as much as possible, so choose gRPC to achieve remote access to log information.

grpc remote access diff --git a/docs/docs/en/architecture/load-balance.md b/docs/docs/en/architecture/load-balance.md index 5cf2d4e34ad3..f58f864128e1 100644 --- a/docs/docs/en/architecture/load-balance.md +++ b/docs/docs/en/architecture/load-balance.md @@ -57,3 +57,4 @@ You can customise the configuration by changing the following properties in work - worker.max.cpuload.avg=-1 (worker max cpuload avg; tasks are dispatched to the worker server only when this value is higher than the system cpu load average. Default value -1: the number of cpu cores * 2) - worker.reserved.memory=0.3 (worker reserved memory; tasks are dispatched to the worker server only when this value is lower than the system's available memory. Default value 0.3; the unit is G) + diff --git a/docs/docs/en/architecture/metadata.md b/docs/docs/en/architecture/metadata.md index 2e55e1d9258f..b4633707f532 100644 --- a/docs/docs/en/architecture/metadata.md +++ b/docs/docs/en/architecture/metadata.md @@ -1,8 +1,8 @@ # MetaData ## Table Schema -see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql` +see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql` --- @@ -15,7 +15,7 @@ see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql` - One tenant can own multiple users. - The queue field in the `t_ds_user` table stores the `queue_name` information from the `t_ds_queue` table, while `t_ds_tenant` stores queue information using the `queue_id` column. During the execution of a process definition, the user queue has the highest priority; if the user queue is null, the tenant queue is used. - The `user_id` field in the `t_ds_datasource` table shows the user who created the data source. The `user_id` in `t_ds_relation_datasource_user` shows a user who has permission to the data source.
- + ### Project Resource Alert ![image.png](../../../img/metadata-erd/project-resource-alert.png) @@ -26,6 +26,7 @@ see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql` - The `user_id` in the `t_ds_udfs` table represents the user who created the UDF, and the `user_id` in the `t_ds_relation_udfs_user` table represents a user who has permission to the UDF. ### Project - Tenant - ProcessDefinition - Schedule + ![image.png](../../../img/metadata-erd/project_tenant_process_definition_schedule.png) - A project can have multiple process definitions, and each process definition belongs to only one project. @@ -33,8 +34,10 @@ see sql files in `dolphinscheduler/dolphinscheduler-dao/src/main/resources/sql` - A workflow definition can have one or more schedules. ### Process Definition Execution + ![image.png](../../../img/metadata-erd/process_definition.png) - A process definition corresponds to multiple task definitions, which are associated through `t_ds_process_task_relation` and the associated key is `code + version`. When the pre-task of the task is empty, the corresponding `pre_task_node` and `pre_task_version` are 0. - A process definition can have multiple process instances `t_ds_process_instance`, one process instance corresponds to one or more task instances `t_ds_task_instance`.
\ No newline at end of file +- The data stored in the `t_ds_relation_process_instance` table is used to handle the case that the process definition contains sub-processes. `parent_process_instance_id` represents the id of the main process instance containing the sub-process, `process_instance_id` represents the id of the sub-process instance, `parent_task_instance_id` represents the task instance id of the sub-process node. The process instance table and the task instance table correspond to the `t_ds_process_instance` table and the `t_ds_task_instance` table, respectively. + diff --git a/docs/docs/en/architecture/task-structure.md b/docs/docs/en/architecture/task-structure.md index bd6482401695..73042e3dccbf 100644 --- a/docs/docs/en/architecture/task-structure.md +++ b/docs/docs/en/architecture/task-structure.md @@ -6,28 +6,28 @@ All tasks in DolphinScheduler are saved in the `t_ds_process_definition` table. The following shows the `t_ds_process_definition` table structure: -No. | field | type | description --------- | ---------| -------- | --------- -1|id|int(11)|primary key -2|name|varchar(255)|process definition name -3|version|int(11)|process definition version -4|release_state|tinyint(4)|release status of process definition: 0 not released, 1 released -5|project_id|int(11)|project id -6|user_id|int(11)|user id of the process definition -7|process_definition_json|longtext|process definition JSON -8|description|text|process definition description -9|global_params|text|global parameters -10|flag|tinyint(4)|specify whether the process is available: 0 is not available, 1 is available -11|locations|text|node location information -12|connects|text|node connectivity info -13|receivers|text|receivers -14|receivers_cc|text|CC receivers -15|create_time|datetime|create time -16|timeout|int(11) |timeout -17|tenant_id|int(11) |tenant id -18|update_time|datetime|update time -19|modify_by|varchar(36)|specify the user that made the modification 
-20|resource_ids|varchar(255)|resource ids +| No. | field | type | description | +|-----|-------------------------|--------------|------------------------------------------------------------------------------| +| 1 | id | int(11) | primary key | +| 2 | name | varchar(255) | process definition name | +| 3 | version | int(11) | process definition version | +| 4 | release_state | tinyint(4) | release status of process definition: 0 not released, 1 released | +| 5 | project_id | int(11) | project id | +| 6 | user_id | int(11) | user id of the process definition | +| 7 | process_definition_json | longtext | process definition JSON | +| 8 | description | text | process definition description | +| 9 | global_params | text | global parameters | +| 10 | flag | tinyint(4) | specify whether the process is available: 0 is not available, 1 is available | +| 11 | locations | text | node location information | +| 12 | connects | text | node connectivity info | +| 13 | receivers | text | receivers | +| 14 | receivers_cc | text | CC receivers | +| 15 | create_time | datetime | create time | +| 16 | timeout | int(11) | timeout | +| 17 | tenant_id | int(11) | tenant id | +| 18 | update_time | datetime | update time | +| 19 | modify_by | varchar(36) | specify the user that made the modification | +| 20 | resource_ids | varchar(255) | resource ids | The `process_definition_json` field is the core field, which defines the task information in the DAG diagram, and it is stored in JSON format. @@ -40,6 +40,7 @@ No. 
| field | type | description 4|timeout|int|timeout Data example: + ```bash { "globalParams":[ @@ -74,7 +75,7 @@ No.|parameter name||type|description |notes 9|runFlag | |String |execution flag| | 10|conditionResult | |Object|condition branch | | 11| | successNode| Array|jump to node if success| | -12| | failedNode|Array|jump to node if failure| +12| | failedNode|Array|jump to node if failure| 13| dependence| |Object |task dependency |mutual exclusion with params 14|maxRetryTimes | |String|max retry times | | 15|retryInterval | |String |retry interval| | @@ -159,7 +160,7 @@ No.|parameter name||type|description |note 19|runFlag | |String |execution flag| | 20|conditionResult | |Object|condition branch | | 21| | successNode| Array|jump to node if success| | -22| | failedNode|Array|jump to node if failure| +22| | failedNode|Array|jump to node if failure| 23| dependence| |Object |task dependency |mutual exclusion with params 24|maxRetryTimes | |String|max retry times | | 25|retryInterval | |String |retry interval| | @@ -238,38 +239,38 @@ No.|parameter name||type|description |note **The following shows the node data structure:** -No.|parameter name||type|description |notes --------- | ---------| ---------| -------- | --------- | --------- -1|id | |String| task Id| -2|type ||String |task type |SPARK -3| name| |String|task name | -4| params| |Object|customized parameters |JSON format -5| |mainClass |String | main class -6| |mainArgs | String| execution arguments -7| |others | String| other arguments -8| |mainJar |Object | application jar package -9| |deployMode |String |deployment mode |local,client,cluster -10| |driverCores | String| driver cores -11| |driverMemory | String| driver memory -12| |numExecutors |String | executor count -13| |executorMemory |String | executor memory -14| |executorCores |String | executor cores -15| |programType | String| program type|JAVA,SCALA,PYTHON -16| | sparkVersion| String| Spark version| SPARK1 , SPARK2 -17| | localParams| 
Array|customized local parameters -18| | resourceList| Array|resource files -19|description | |String|description | | -20|runFlag | |String |execution flag| | -21|conditionResult | |Object|condition branch| | -22| | successNode| Array|jump to node if success| | -23| | failedNode|Array|jump to node if failure| -24| dependence| |Object |task dependency |mutual exclusion with params -25|maxRetryTimes | |String|max retry times | | -26|retryInterval | |String |retry interval| | -27|timeout | |Object|timeout | | -28| taskInstancePriority| |String|task priority | | -29|workerGroup | |String |Worker group| | -30|preTasks | |Array|preposition tasks| | +| No. | parameter name || type | description | notes | +|-----|----------------------|----------------|--------|-----------------------------|------------------------------| +| 1 | id | | String | task Id | +| 2 | type || String | task type | SPARK | +| 3 | name | | String | task name | +| 4 | params | | Object | customized parameters | JSON format | +| 5 | | mainClass | String | main class | +| 6 | | mainArgs | String | execution arguments | +| 7 | | others | String | other arguments | +| 8 | | mainJar | Object | application jar package | +| 9 | | deployMode | String | deployment mode | local,client,cluster | +| 10 | | driverCores | String | driver cores | +| 11 | | driverMemory | String | driver memory | +| 12 | | numExecutors | String | executor count | +| 13 | | executorMemory | String | executor memory | +| 14 | | executorCores | String | executor cores | +| 15 | | programType | String | program type | JAVA,SCALA,PYTHON | +| 16 | | sparkVersion | String | Spark version | SPARK1 , SPARK2 | +| 17 | | localParams | Array | customized local parameters | +| 18 | | resourceList | Array | resource files | +| 19 | description | | String | description | | +| 20 | runFlag | | String | execution flag | | +| 21 | conditionResult | | Object | condition branch | | +| 22 | | successNode | Array | jump to node if success | | +| 23 | | 
failedNode | Array | jump to node if failure | +| 24 | dependence | | Object | task dependency | mutual exclusion with params | +| 25 | maxRetryTimes | | String | max retry times | | +| 26 | retryInterval | | String | retry interval | | +| 27 | timeout | | Object | timeout | | +| 28 | taskInstancePriority | | String | task priority | | +| 29 | workerGroup | | String | Worker group | | +| 30 | preTasks | | Array | preposition tasks | | **Node data example:** @@ -336,31 +337,31 @@ No.|parameter name||type|description |notes **The following shows the node data structure:** -No.|parameter name||type|description |notes --------- | ---------| ---------| -------- | --------- | --------- -1|id | |String| task Id| -2|type ||String |task type |MR -3| name| |String|task name | -4| params| |Object|customized parameters |JSON format -5| |mainClass |String | main class -6| |mainArgs | String|execution arguments -7| |others | String|other arguments -8| |mainJar |Object | application jar package -9| |programType | String|program type|JAVA,PYTHON -10| | localParams| Array|customized local parameters -11| | resourceList| Array|resource files -12|description | |String|description | | -13|runFlag | |String |execution flag| | -14|conditionResult | |Object|condition branch| | -15| | successNode| Array|jump to node if success| | -16| | failedNode|Array|jump to node if failure| -17| dependence| |Object |task dependency |mutual exclusion with params -18|maxRetryTimes | |String|max retry times | | -19|retryInterval | |String |retry interval| | -20|timeout | |Object|timeout | | -21| taskInstancePriority| |String|task priority| | -22|workerGroup | |String |Worker group| | -23|preTasks | |Array|preposition tasks| | +| No. 
| parameter name || type | description | notes | +|-----|----------------------|--------------|--------|-----------------------------|------------------------------| +| 1 | id | | String | task Id | +| 2 | type || String | task type | MR | +| 3 | name | | String | task name | +| 4 | params | | Object | customized parameters | JSON format | +| 5 | | mainClass | String | main class | +| 6 | | mainArgs | String | execution arguments | +| 7 | | others | String | other arguments | +| 8 | | mainJar | Object | application jar package | +| 9 | | programType | String | program type | JAVA,PYTHON | +| 10 | | localParams | Array | customized local parameters | +| 11 | | resourceList | Array | resource files | +| 12 | description | | String | description | | +| 13 | runFlag | | String | execution flag | | +| 14 | conditionResult | | Object | condition branch | | +| 15 | | successNode | Array | jump to node if success | | +| 16 | | failedNode | Array | jump to node if failure | +| 17 | dependence | | Object | task dependency | mutual exclusion with params | +| 18 | maxRetryTimes | | String | max retry times | | +| 19 | retryInterval | | String | retry interval | | +| 20 | timeout | | Object | timeout | | +| 21 | taskInstancePriority | | String | task priority | | +| 22 | workerGroup | | String | Worker group | | +| 23 | preTasks | | Array | preposition tasks | | **Node data example:** @@ -432,7 +433,7 @@ No.|parameter name||type|description |notes 9|runFlag | |String |execution flag| | 10|conditionResult | |Object|condition branch| | 11| | successNode| Array|jump to node if success| | -12| | failedNode|Array|jump to node if failure | +12| | failedNode|Array|jump to node if failure | 13| dependence| |Object |task dependency |mutual exclusion with params 14|maxRetryTimes | |String|max retry times | | 15|retryInterval | |String |retry interval| | @@ -493,36 +494,36 @@ No.|parameter name||type|description |notes **The following shows the node data structure:** -No.|parameter 
name||type|description |notes --------- | ---------| ---------| -------- | --------- | --------- -1|id | |String|task Id| -2|type ||String |task type|FLINK -3| name| |String|task name| -4| params| |Object|customized parameters |JSON format -5| |mainClass |String |main class -6| |mainArgs | String|execution arguments -7| |others | String|other arguments -8| |mainJar |Object |application jar package -9| |deployMode |String |deployment mode |local,client,cluster -10| |slot | String| slot count -11| |taskManager |String | taskManager count -12| |taskManagerMemory |String |taskManager memory size -13| |jobManagerMemory |String | jobManager memory size -14| |programType | String| program type|JAVA,SCALA,PYTHON -15| | localParams| Array|local parameters -16| | resourceList| Array|resource files -17|description | |String|description | | -18|runFlag | |String |execution flag| | -19|conditionResult | |Object|condition branch| | -20| | successNode| Array|jump node if success| | -21| | failedNode|Array|jump node if failure| -22| dependence| |Object |task dependency |mutual exclusion with params -23|maxRetryTimes | |String|max retry times| | -24|retryInterval | |String |retry interval| | -25|timeout | |Object|timeout | | -26| taskInstancePriority| |String|task priority| | -27|workerGroup | |String |Worker group| | -38|preTasks | |Array|preposition tasks| | +| No. 
| parameter name || type | description | notes | +|-----|----------------------|-------------------|--------|-------------------------|------------------------------| +| 1 | id | | String | task Id | +| 2 | type || String | task type | FLINK | +| 3 | name | | String | task name | +| 4 | params | | Object | customized parameters | JSON format | +| 5 | | mainClass | String | main class | +| 6 | | mainArgs | String | execution arguments | +| 7 | | others | String | other arguments | +| 8 | | mainJar | Object | application jar package | +| 9 | | deployMode | String | deployment mode | local,client,cluster | +| 10 | | slot | String | slot count | +| 11 | | taskManager | String | taskManager count | +| 12 | | taskManagerMemory | String | taskManager memory size | +| 13 | | jobManagerMemory | String | jobManager memory size | +| 14 | | programType | String | program type | JAVA,SCALA,PYTHON | +| 15 | | localParams | Array | local parameters | +| 16 | | resourceList | Array | resource files | +| 17 | description | | String | description | | +| 18 | runFlag | | String | execution flag | | +| 19 | conditionResult | | Object | condition branch | | +| 20 | | successNode | Array | jump node if success | | +| 21 | | failedNode | Array | jump node if failure | +| 22 | dependence | | Object | task dependency | mutual exclusion with params | +| 23 | maxRetryTimes | | String | max retry times | | +| 24 | retryInterval | | String | retry interval | | +| 25 | timeout | | Object | timeout | | +| 26 | taskInstancePriority | | String | task priority | | +| 27 | workerGroup | | String | Worker group | | +| 38 | preTasks | | Array | preposition tasks | | **Node data example:** @@ -588,30 +589,30 @@ No.|parameter name||type|description |notes **The following shows the node data structure:** -No.|parameter name||type|description |notes --------- | ---------| ---------| -------- | --------- | --------- -1|id | |String|task Id| -2|type ||String |task type|HTTP -3| name| |String|task name| -4| 
params| |Object|customized parameters |JSON format -5| |url |String |request url -6| |httpMethod | String|http method|GET,POST,HEAD,PUT,DELETE -7| | httpParams| Array|http parameters -8| |httpCheckCondition | String|validation of HTTP code status|default code 200 -9| |condition |String |validation conditions -10| | localParams| Array|customized local parameters -11|description | |String|description| | -12|runFlag | |String |execution flag| | -13|conditionResult | |Object|condition branch| | -14| | successNode| Array|jump node if success| | -15| | failedNode|Array|jump node if failure| -16| dependence| |Object |task dependency |mutual exclusion with params -17|maxRetryTimes | |String|max retry times | | -18|retryInterval | |String |retry interval| | -19|timeout | |Object|timeout | | -20| taskInstancePriority| |String|task priority| | -21|workerGroup | |String |Worker group| | -22|preTasks | |Array|preposition tasks| | +| No. | parameter name || type | description | notes | +|-----|----------------------|--------------------|--------|--------------------------------|------------------------------| +| 1 | id | | String | task Id | +| 2 | type || String | task type | HTTP | +| 3 | name | | String | task name | +| 4 | params | | Object | customized parameters | JSON format | +| 5 | | url | String | request url | +| 6 | | httpMethod | String | http method | GET,POST,HEAD,PUT,DELETE | +| 7 | | httpParams | Array | http parameters | +| 8 | | httpCheckCondition | String | validation of HTTP code status | default code 200 | +| 9 | | condition | String | validation conditions | +| 10 | | localParams | Array | customized local parameters | +| 11 | description | | String | description | | +| 12 | runFlag | | String | execution flag | | +| 13 | conditionResult | | Object | condition branch | | +| 14 | | successNode | Array | jump node if success | | +| 15 | | failedNode | Array | jump node if failure | +| 16 | dependence | | Object | task dependency | mutual exclusion with 
params | +| 17 | maxRetryTimes | | String | max retry times | | +| 18 | retryInterval | | String | retry interval | | +| 19 | timeout | | Object | timeout | | +| 20 | taskInstancePriority | | String | task priority | | +| 21 | workerGroup | | String | Worker group | | +| 22 | preTasks | | Array | preposition tasks | | **Node data example:** @@ -682,7 +683,7 @@ No.|parameter name||type|description |notes 6| |dsType |String | datasource type 7| |dataSource |Int | datasource ID 8| |dtType | String|target database type -9| |dataTarget | Int|target database ID +9| |dataTarget | Int|target database ID 10| |sql |String | SQL statements 11| |targetTable |String |target table 12| |jobSpeedByte |Int |job speed limiting(bytes) @@ -695,7 +696,7 @@ No.|parameter name||type|description |notes 19|runFlag | |String |execution flag| | 20|conditionResult | |Object|condition branch| | 21| | successNode| Array|jump node if success| | -22| | failedNode|Array|jump node if failure| +22| | failedNode|Array|jump node if failure| 23| dependence| |Object |task dependency |mutual exclusion with params 24|maxRetryTimes | |String|max retry times| | 25|retryInterval | |String |retry interval| | @@ -776,7 +777,7 @@ No.|parameter name||type|description |notes 13|runFlag | |String |execution flag| | 14|conditionResult | |Object|condition branch| | 15| | successNode| Array|jump node if success| | -16| | failedNode|Array|jump node if failure| +16| | failedNode|Array|jump node if failure| 17| dependence| |Object |task dependency |mutual exclusion with params 18|maxRetryTimes | |String|max retry times| | 19|retryInterval | |String |retry interval| | @@ -844,7 +845,7 @@ No.|parameter name||type|description |notes 6|runFlag | |String |execution flag| | 7|conditionResult | |Object|condition branch | | 8| | successNode| Array|jump to node if success| | -9| | failedNode|Array|jump to node if failure| +9| | failedNode|Array|jump to node if failure| 10| dependence| |Object |task dependency |mutual exclusion 
with params 11|maxRetryTimes | |String|max retry times | | 12|retryInterval | |String |retry interval| | @@ -909,7 +910,7 @@ No.|parameter name||type|description |notes 7|runFlag | |String |execution flag| | 8|conditionResult | |Object|condition branch | | 9| | successNode| Array|jump to node if success| | -10| | failedNode|Array|jump to node if failure| +10| | failedNode|Array|jump to node if failure| 11| dependence| |Object |task dependency |mutual exclusion with params 12|maxRetryTimes | |String|max retry times| | 13|retryInterval | |String |retry interval| | @@ -970,7 +971,7 @@ No.|parameter name||type|description |notes 9|runFlag | |String |execution flag| | 10|conditionResult | |Object|condition branch| | 11| | successNode| Array|jump to node if success| | -12| | failedNode|Array|jump to node if failure| +12| | failedNode|Array|jump to node if failure| 13| dependence| |Object |task dependency |mutual exclusion with params 14| | relation|String |relation|AND,OR 15| | dependTaskList|Array |dependent task list| @@ -1111,4 +1112,5 @@ No.|parameter name||type|description |notes ] } -``` \ No newline at end of file +``` + diff --git a/docs/docs/en/contribute/api-standard.md b/docs/docs/en/contribute/api-standard.md index 61d6622165c3..cebde7f3a751 100644 --- a/docs/docs/en/contribute/api-standard.md +++ b/docs/docs/en/contribute/api-standard.md @@ -1,9 +1,11 @@ # API design standard + A standardized and unified API is the cornerstone of project design. The API of DolphinScheduler follows the RESTful standard. RESTful is currently the most popular Internet software architecture. It has a clear structure, conforms to standards, and is easy to understand and extend. This article uses the DolphinScheduler API as an example to explain how to construct a RESTful API. ## 1.
URI design + REST stands for "Representational State Transfer". The design of a RESTful URI is based on resources. A resource corresponds to an entity on the network, for example: a piece of text, a picture, or a service. Each resource corresponds to a URI. + One Kind of Resource: expressed in the plural, such as `task-instances`, `groups`; @@ -12,36 +14,43 @@ REST is "Representational State Transfer".The design of Restful URI is based on + A Sub Resource:`/instances/{instanceId}/tasks/{taskId}`; ## 2. Method design + We need to locate a certain resource by URI, and then use the HTTP method, or actions declared in the path suffix, to reflect the operation on the resource. ### ① Query - GET + Use URI to locate the resource, and use GET to indicate query. + When the URI is a type of resource, it means to query a type of resource. For example, the following example indicates a paging query of `alert-groups`. + ``` Method: GET /dolphinscheduler/alert-groups ``` + When the URI is a single resource, it means to query this resource. For example, the following example means to query the specified `alert-group`. + ``` Method: GET /dolphinscheduler/alert-groups/{id} ``` + In addition, we can also express querying sub-resources based on the URI, as follows: + ``` Method: GET /dolphinscheduler/projects/{projectId}/tasks ``` **The above examples all represent paging queries. If we need to query all data, we need to add `/list` after the URI to distinguish. Do not use the same API for both paged queries and full queries.** + ``` Method: GET /dolphinscheduler/alert-groups/list ``` ### ② Create - POST + Use URI to locate the resource, use POST to indicate create, and then return the created id to the requester. + create an `alert-group`: @@ -52,35 +61,42 @@ Method: POST ``` + creating sub-resources works the same way. + ``` Method: POST /dolphinscheduler/alert-groups/{alertGroupId}/tasks ``` ### ③ Modify - PUT + Use URI to locate the resource, use PUT to indicate modify.
+ modify an `alert-group` + ``` Method: PUT /dolphinscheduler/alert-groups/{alertGroupId} ``` ### ④ Delete -DELETE + Use URI to locate the resource, use DELETE to indicate delete. + delete an `alert-group` + ``` Method: DELETE /dolphinscheduler/alert-groups/{alertGroupId} ``` + batch deletion: to batch delete an array of ids, we should use POST. **(Do not use the DELETE method, because the body of the DELETE request has no semantic meaning, and it is possible that some gateways, proxies, and firewalls will directly strip off the request body after receiving the DELETE request.)** + ``` Method: POST /dolphinscheduler/alert-groups/batch-delete ``` ### ⑤ Partial Modifications -PATCH + Use URI to locate the resource, use PATCH for partial modifications. ``` @@ -89,20 +105,27 @@ Method: PATCH ``` ### ⑥ Others + In addition to creating, deleting, modifying and querying, we can also locate the corresponding resource through the URL, and then append operations to it after the path, such as: + ``` /dolphinscheduler/alert-groups/verify-name /dolphinscheduler/projects/{projectCode}/process-instances/{code}/view-gantt ``` ## 3. Parameter design + There are two types of parameters: one is the request parameter and the other is the path parameter. Parameters must use lower camel case. In the case of paging, if the parameter entered by the user is less than 1, the front end needs to automatically set it to 1, indicating that the first page is requested; when the backend finds that the parameter entered by the user is greater than the total number of pages, it should directly return the last page. ## 4. Others design + ### base URL + The URI of the project needs to use `/` as the base path, so as to identify that these APIs are under this project.
+ ``` /dolphinscheduler -``` \ No newline at end of file +``` + diff --git a/docs/docs/en/contribute/api-test.md b/docs/docs/en/contribute/api-test.md index 7953e9dbd86c..c7005e954077 100644 --- a/docs/docs/en/contribute/api-test.md +++ b/docs/docs/en/contribute/api-test.md @@ -10,7 +10,6 @@ In contrast, API testing focuses on whether a complete operation chain can be co For example, the API test of the tenant management interface focuses on whether users can log in normally; if the login fails, whether the error message can be displayed correctly. After logging in, you can perform tenant management operations through the sessionid you carry. - ## API Test ### API-Pages @@ -49,7 +48,6 @@ In addition, during the testing process, the interface are not requested directly On the login page, only the input parameter specification of the interface request is defined. For the output parameter of the interface request, only the unified basic response structure is defined. The data actually returned by the interface is tested in the actual test cases, which check whether the input and output of the main interfaces meet the requirements. - ### API-Cases The following is an example of a tenant management test. As explained earlier, we use docker-compose for deployment, so for each test case, we need to import the corresponding file in the form of an annotation. @@ -86,7 +84,7 @@ https://github.com/apache/dolphinscheduler/tree/dev/dolphinscheduler-api-test/do ## Supplements -When running API tests locally, First, you need to start the local service, you can refer to this page: +When running API tests locally, you first need to start the local service; you can refer to this page: [development-environment-setup](./development-environment-setup.md) When running API tests locally, the `-Dlocal=true` parameter can be configured to connect locally and facilitate changes to the UI.
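The URI conventions from the API design standard above (collection, single resource, sub-resource, the `/list` suffix for unpaged queries, and `batch-delete` via POST) can be sketched as a small helper. This is an illustrative sketch only; the `ApiUri` class and its method names are invented for the example and are not part of the DolphinScheduler codebase.

```java
// Illustrative sketch of the RESTful URI conventions described above.
// The class and method names are hypothetical, not DolphinScheduler APIs.
public class ApiUri {
    // The project base path, as described in the "base URL" section.
    private static final String BASE = "/dolphinscheduler";

    // Paged query of a collection: GET /dolphinscheduler/alert-groups
    public static String collection(String resource) {
        return BASE + "/" + resource;
    }

    // Single resource: GET /dolphinscheduler/alert-groups/{id}
    public static String single(String resource, long id) {
        return collection(resource) + "/" + id;
    }

    // Sub-resource: GET /dolphinscheduler/projects/{projectId}/tasks
    public static String sub(String resource, long id, String subResource) {
        return single(resource, id) + "/" + subResource;
    }

    // Query all data: GET /dolphinscheduler/alert-groups/list
    public static String listAll(String resource) {
        return collection(resource) + "/list";
    }

    // Batch deletion uses POST: POST /dolphinscheduler/alert-groups/batch-delete
    public static String batchDelete(String resource) {
        return collection(resource) + "/batch-delete";
    }

    public static void main(String[] args) {
        System.out.println(collection("alert-groups")); // /dolphinscheduler/alert-groups
        System.out.println(single("alert-groups", 1));  // /dolphinscheduler/alert-groups/1
        System.out.println(sub("projects", 7, "tasks")); // /dolphinscheduler/projects/7/tasks
    }
}
```

Keeping the base path in one place, as sketched here, also makes the "base URL" rule from section 4 hard to violate.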
diff --git a/docs/docs/en/contribute/architecture-design.md b/docs/docs/en/contribute/architecture-design.md index a46bfb285932..1e50f2592d0c 100644 --- a/docs/docs/en/contribute/architecture-design.md +++ b/docs/docs/en/contribute/architecture-design.md @@ -1,4 +1,5 @@ ## Architecture Design + Before explaining the architecture of the schedule system, let us first understand the common nouns of the schedule system. ### 1.Noun Interpretation @@ -12,7 +13,7 @@ Before explaining the architecture of the schedule system, let us first understa

-**Process definition**: Visualization **DAG** by dragging task nodes and establishing associations of task nodes +**Process definition**: Visualization **DAG** by dragging task nodes and establishing associations of task nodes **Process instance**: A process instance is an instantiation of a process definition, which can be generated by manual startup or scheduling. The process definition runs once, a new process instance is generated @@ -34,11 +35,10 @@ Before explaining the architecture of the schedule system, let us first understa **Complement**: Complement historical data, support **interval parallel and serial** two complement methods - - ### 2.System architecture #### 2.1 System Architecture Diagram +

System Architecture Diagram

@@ -46,60 +46,51 @@ Before explaining the architecture of the schedule system, let us first understa

- - #### 2.2 Architectural description -* **MasterServer** +* **MasterServer** - MasterServer adopts the distributed non-central design concept. MasterServer is mainly responsible for DAG task split, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer. - When the MasterServer service starts, it registers a temporary node with Zookeeper, and listens to the Zookeeper temporary node state change for fault tolerance processing. + MasterServer adopts the distributed non-central design concept. MasterServer is mainly responsible for DAG task split, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer. + When the MasterServer service starts, it registers a temporary node with Zookeeper, and listens to the Zookeeper temporary node state change for fault tolerance processing. - + ##### The service mainly contains: - ##### The service mainly contains: + - **Distributed Quartz** distributed scheduling component, mainly responsible for the start and stop operation of the scheduled task. When the quartz picks up the task, the master internally has a thread pool to be responsible for the subsequent operations of the task. - - **Distributed Quartz** distributed scheduling component, mainly responsible for the start and stop operation of the scheduled task. When the quartz picks up the task, the master internally has a thread pool to be responsible for the subsequent operations of the task. 
+ - **MasterSchedulerThread** is a scan thread that periodically scans the **command** table in the database for different business operations based on different **command types** - - **MasterSchedulerThread** is a scan thread that periodically scans the **command** table in the database for different business operations based on different **command types** + - **MasterExecThread** is mainly responsible for DAG task segmentation, task submission monitoring, logic processing of various command types - - **MasterExecThread** is mainly responsible for DAG task segmentation, task submission monitoring, logic processing of various command types + - **MasterTaskExecThread** is mainly responsible for task persistence - - **MasterTaskExecThread** is mainly responsible for task persistence +* **WorkerServer** - + - WorkerServer also adopts a distributed, non-central design concept. WorkerServer is mainly responsible for task execution and providing log services. When the WorkerServer service starts, it registers the temporary node with Zookeeper and maintains the heartbeat. -* **WorkerServer** + ##### This service contains: - - WorkerServer also adopts a distributed, non-central design concept. WorkerServer is mainly responsible for task execution and providing log services. When the WorkerServer service starts, it registers the temporary node with Zookeeper and maintains the heartbeat. + - **FetchTaskThread** is mainly responsible for continuously receiving tasks from **Task Queue** and calling **TaskScheduleThread** corresponding executors according to different task types. + - **ZooKeeper** - ##### This service contains: + The ZooKeeper service, the MasterServer and the WorkerServer nodes in the system all use the ZooKeeper for cluster management and fault tolerance. In addition, the system also performs event monitoring and distributed locking based on ZooKeeper. 
+ We have also implemented queues based on Redis, but we hope that DolphinScheduler relies on as few components as possible, so we finally removed the Redis implementation. - - **FetchTaskThread** is mainly responsible for continuously receiving tasks from **Task Queue** and calling **TaskScheduleThread** corresponding executors according to different task types. + - **Task Queue** - - **ZooKeeper** + The task queue operation is provided. Currently, the queue is also implemented based on Zookeeper. Since there is less information stored in the queue, there is no need to worry about too much data in the queue. In fact, we have over-measured a million-level data storage queue, which has no effect on system stability and performance. - The ZooKeeper service, the MasterServer and the WorkerServer nodes in the system all use the ZooKeeper for cluster management and fault tolerance. In addition, the system also performs event monitoring and distributed locking based on ZooKeeper. - We have also implemented queues based on Redis, but we hope that DolphinScheduler relies on as few components as possible, so we finally removed the Redis implementation. + - **Alert** - - **Task Queue** + Provides alarm-related interfaces. The interfaces mainly include **Alarms**. The storage, query, and notification functions of the two types of alarm data. The notification function has two types: **mail notification** and **SNMP (not yet implemented)**. - The task queue operation is provided. Currently, the queue is also implemented based on Zookeeper. Since there is less information stored in the queue, there is no need to worry about too much data in the queue. In fact, we have over-measured a million-level data storage queue, which has no effect on system stability and performance. + - **API** - - **Alert** + The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service provides a RESTful api to provide request services externally. 
+ Interfaces include workflow creation, definition, query, modification, release, offline, manual start, stop, pause, resume, start execution from this node, and more. - Provides alarm-related interfaces. The interfaces mainly include **Alarms**. The storage, query, and notification functions of the two types of alarm data. The notification function has two types: **mail notification** and **SNMP (not yet implemented)**. + - **UI** - - **API** - - The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service provides a RESTful api to provide request services externally. - Interfaces include workflow creation, definition, query, modification, release, offline, manual start, stop, pause, resume, start execution from this node, and more. - - - **UI** - - The front-end page of the system provides various visual operation interfaces of the system. For details, see the [quick start](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/about/introduction.html) section. - - + The front-end page of the system provides various visual operation interfaces of the system. For details, see the [quick start](https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/about/introduction.html) section. #### 2.3 Architectural Design Ideas @@ -130,10 +121,9 @@ Problems in the design of centralized : - In the decentralized design, there is usually no Master/Slave concept, all roles are the same, the status is equal, the global Internet is a typical decentralized distributed system, networked arbitrary node equipment down machine , all will only affect a small range of features. - The core design of decentralized design is that there is no "manager" that is different from other nodes in the entire distributed system, so there is no single point of failure problem. 
However, since there is no "manager" node, each node needs to communicate with other nodes to get the necessary machine information, and the unreliability of distributed system communication greatly increases the difficulty of implementing the above functions. - In fact, truly decentralized distributed systems are rare. Instead, dynamically centralized distributed systems are constantly emerging. Under this architecture, the managers in the cluster are dynamically selected, rather than preset, and when the cluster fails, the nodes of the cluster will spontaneously hold "meetings" to elect new "managers" to preside over the work. The most typical cases are ZooKeeper and Etcd, which is implemented in Go. - - The decentralization of DolphinScheduler is the registration of Master/Worker in ZooKeeper. The Master Cluster and the Worker Cluster have no center, and a Zookeeper distributed lock is used to elect one Master or Worker as the “manager” to perform the task. -##### 二、Distributed lock practice +##### II. Distributed lock practice DolphinScheduler uses ZooKeeper distributed locks to ensure that only one Master executes the Scheduler at a time, and that only one Worker performs task submission. @@ -184,8 +174,6 @@ Service fault tolerance design relies on ZooKeeper's Watcher mechanism. The impl The Master monitors the directories of other Masters and Workers. If the remove event is detected, the process instance is fault-tolerant or the task instance is fault-tolerant according to the specific business logic. - - - Master fault tolerance flow chart:

@@ -194,8 +182,6 @@ The Master monitors the directories of other Masters and Workers. If the remove After the ZooKeeper Master is fault-tolerant, it is rescheduled by the Scheduler thread in DolphinScheduler. It traverses the DAG to find the "Running" and "Submit Successful" tasks, and monitors the status of its task instance for the "Running" task. You need to determine whether the Task Queue already exists. If it exists, monitor the status of the task instance. If it does not exist, resubmit the task instance. - - - Worker fault tolerance flow chart:

@@ -204,7 +190,7 @@ After the ZooKeeper Master is fault-tolerant, it is rescheduled by the Scheduler Once the Master Scheduler thread finds the task instance as "need to be fault tolerant", it takes over the task and resubmits. - Note: Because the "network jitter" may cause the node to lose the heartbeat of ZooKeeper in a short time, the node's remove event occurs. In this case, we use the easiest way, that is, once the node has timeout connection with ZooKeeper, it will directly stop the Master or Worker service. +Note: Because the "network jitter" may cause the node to lose the heartbeat of ZooKeeper in a short time, the node's remove event occurs. In this case, we use the easiest way, that is, once the node has timeout connection with ZooKeeper, it will directly stop the Master or Worker service. ###### 2. Task failure retry @@ -214,8 +200,6 @@ Here we must first distinguish between the concept of task failure retry, proces - Process failure recovery is process level, is done manually, recovery can only be performed **from the failed node** or **from the current node** - Process failure rerun is also process level, is done manually, rerun is from the start node - - Next, let's talk about the topic, we divided the task nodes in the workflow into two types. - One is a business node, which corresponds to an actual script or processing statement, such as a Shell node, an MR node, a Spark node, a dependent node, and so on. @@ -225,16 +209,12 @@ Each **service node** can configure the number of failed retries. When the task If there is a task failure in the workflow that reaches the maximum number of retries, the workflow will fail to stop, and the failed workflow can be manually rerun or process resumed. - - ##### V. 
Task priority design In the early scheduling design, if there is no priority design and fair scheduling design, it will encounter the situation that the task submitted first may be completed simultaneously with the task submitted subsequently, but the priority of the process or task cannot be set. We have redesigned this, and we are currently designing it as follows: - According to **different process instance priority** prioritizes **same process instance priority** prioritizes **task priority within the same process** takes precedence over **same process** commit order from high Go to low for task processing. - - The specific implementation is to resolve the priority according to the json of the task instance, and then save the **process instance priority _ process instance id_task priority _ task id** information in the ZooKeeper task queue, when obtained from the task queue, Through string comparison, you can get the task that needs to be executed first. - - The priority of the process definition is that some processes need to be processed before other processes. This can be configured at the start of the process or at the time of scheduled start. There are 5 levels, followed by HIGHEST, HIGH, MEDIUM, LOW, and LOWEST. As shown below

@@ -308,8 +288,6 @@ Public class TaskLogFilter extends Filter { } ``` - - ### summary Starting from the scheduling, this paper introduces the architecture principle and implementation ideas of the big data distributed workflow scheduling system-DolphinScheduler. To be continued diff --git a/docs/docs/en/contribute/backend/mechanism/global-parameter.md b/docs/docs/en/contribute/backend/mechanism/global-parameter.md index 53b73747d86d..b7e1f0897df4 100644 --- a/docs/docs/en/contribute/backend/mechanism/global-parameter.md +++ b/docs/docs/en/contribute/backend/mechanism/global-parameter.md @@ -59,3 +59,4 @@ Assign the parameters with matching values to varPool (List, which contains the * Format the varPool as json and pass it to master. * The parameters that are OUT would be written into the localParam after the master has received the varPool. + diff --git a/docs/docs/en/contribute/backend/mechanism/overview.md b/docs/docs/en/contribute/backend/mechanism/overview.md index 4f0d592c46da..2054f283da91 100644 --- a/docs/docs/en/contribute/backend/mechanism/overview.md +++ b/docs/docs/en/contribute/backend/mechanism/overview.md @@ -1,6 +1,6 @@ # Overview - * [Global Parameter](global-parameter.md) * [Switch Task type](task/switch.md) + diff --git a/docs/docs/en/contribute/backend/mechanism/task/switch.md b/docs/docs/en/contribute/backend/mechanism/task/switch.md index 490510405ee8..fcff2643d629 100644 --- a/docs/docs/en/contribute/backend/mechanism/task/switch.md +++ b/docs/docs/en/contribute/backend/mechanism/task/switch.md @@ -6,3 +6,4 @@ Switch task workflow step as follows * `SwitchTaskExecThread` processes the expressions defined in `switch` from top to bottom, obtains the value of the variable from `varPool`, and parses the expression through `javascript`. If the expression returns true, stop checking and record The order of the expression, here we record as resultConditionLocation. 
The `SwitchTaskExecThread` task is then over. * After the `switch` task runs, if there is no error (more commonly, the user-defined expression is out of specification or there is a problem with the parameter name), then `MasterExecThread.submitPostNode` will obtain the downstream node of the `DAG` to continue execution. * If it is found in `DagHelper.parsePostNodes` that the current node (the node that has just completed the work) is a `switch` node, the `resultConditionLocation` will be obtained, and all branches except `resultConditionLocation` in the SwitchParameters will be skipped. In this way, only the branches that need to be executed are left + diff --git a/docs/docs/en/contribute/backend/spi/alert.md b/docs/docs/en/contribute/backend/spi/alert.md index 9b6c45e54753..b7934242e196 100644 --- a/docs/docs/en/contribute/backend/spi/alert.md +++ b/docs/docs/en/contribute/backend/spi/alert.md @@ -6,7 +6,7 @@ DolphinScheduler is undergoing a microkernel + plug-in architecture change. All For alarm-related codes, please refer to the `dolphinscheduler-alert-api` module. This module defines the extension interface of the alarm plug-in and some basic codes. When we need to make related functions pluggable, it is recommended to read the code of this module first. Of course, it is also recommended that you read the documentation: this will save a lot of time, but the documentation has a certain degree of lag. When the documentation is missing, it is recommended to take the source code as the standard (if you are interested, we also welcome you to submit related documents). In addition, we will hardly make changes to the extended interfaces (excluding new additions), unless there is a major structural adjustment or an incompatible upgrade version, so the existing documents are generally sufficient.
-We use the native JAVA-SPI, when you need to extend, in fact, you only need to pay attention to the extension of the `org.apache.dolphinscheduler.alert.api.AlertChannelFactory` interface, the underlying logic such as plug-in loading, and other kernels have been implemented, Which makes our development more focused and simple. +We use the native Java SPI. When you need to extend, you only need to pay attention to the extension of the `org.apache.dolphinscheduler.alert.api.AlertChannelFactory` interface; the underlying logic, such as plug-in loading, has already been implemented by the kernel, which makes our development more focused and simple. In addition, `AlertChannelFactory` extends from `PrioritySPI`, which means you can set the plugin priority. When two plugins have the same name, you can customize the priority by overriding the `getIdentify` method. The higher-priority plugin will be loaded, but if two plugins have the same name and the same priority, the server will throw an `IllegalArgumentException` when loading the plugin. @@ -26,8 +26,8 @@ If you don't care about its internal design, but simply want to know how to deve This module is currently a plug-in provided by us, and now we have supported dozens of plug-ins, such as Email, DingTalk, Script, etc. - #### Alert SPI Main class information. + AlertChannelFactory Alarm plug-in factory interface. All alarm plug-ins need to implement this interface. This interface is used to define the name of the alarm plug-in and the required parameters. The create method is used to create a specific alarm plug-in instance. @@ -56,36 +56,40 @@ The specific design of alert_spi can be seen in the issue: [Alert Plugin Design] * Email - Email alert notification + Email alert notification * DingTalk - Alert for DingTalk group chat bots - - Related parameter configuration can refer to the DingTalk robot document.
+ Alert for DingTalk group chat bots + + Related parameter configuration can refer to the DingTalk robot document. * EnterpriseWeChat - EnterpriseWeChat alert notifications + EnterpriseWeChat alert notifications - Related parameter configuration can refer to the EnterpriseWeChat robot document. + Related parameter configuration can refer to the EnterpriseWeChat robot document. * Script - We have implemented a shell script for alerting. We will pass the relevant alert parameters to the script and you can implement your alert logic in the shell. This is a good way to interface with internal alerting applications. + We have implemented a shell script for alerting. We will pass the relevant alert parameters to the script and you can implement your alert logic in the shell. This is a good way to interface with internal alerting applications. * SMS - SMS alerts + SMS alerts + * FeiShu FeiShu alert notification + * Slack Slack alert notification + * PagerDuty PagerDuty alert notification + * WebexTeams WebexTeams alert notification @@ -95,9 +99,10 @@ The specific design of alert_spi can be seen in the issue: [Alert Plugin Design] * Telegram Telegram alert notification - + Related parameter configuration can refer to the Telegram document. * Http We have implemented an Http plugin for alerting. Calls to most alerting plug-ins end up being Http requests, so if we do not support your alert plug-in yet, you can use Http to realize your alert logic. Also welcome to contribute your common plug-ins to the community :) + diff --git a/docs/docs/en/contribute/backend/spi/datasource.md b/docs/docs/en/contribute/backend/spi/datasource.md index 9738e073309e..caf8a5be46ec 100644 --- a/docs/docs/en/contribute/backend/spi/datasource.md +++ b/docs/docs/en/contribute/backend/spi/datasource.md @@ -22,4 +22,4 @@ In additional, the `DataSourceChannelFactory` extends from `PrioritySPI`, this m #### **Future plan** -Support data sources such as kafka, http, files, sparkSQL, FlinkSQL, etc.
\ No newline at end of file +Support data sources such as kafka, http, files, sparkSQL, FlinkSQL, etc. diff --git a/docs/docs/en/contribute/backend/spi/registry.md b/docs/docs/en/contribute/backend/spi/registry.md index 0957ff3cdd26..b612ba5dcda2 100644 --- a/docs/docs/en/contribute/backend/spi/registry.md +++ b/docs/docs/en/contribute/backend/spi/registry.md @@ -6,9 +6,10 @@ Make the following configuration (take zookeeper as an example) * Registry plug-in configuration, take Zookeeper as an example (registry.properties) dolphinscheduler-service/src/main/resources/registry.properties + ```registry.properties - registry.plugin.name=zookeeper - registry.servers=127.0.0.1:2181 + registry.plugin.name=zookeeper + registry.servers=127.0.0.1:2181 ``` For specific configuration information, please refer to the parameter information provided by the specific plug-in, for example zk: `org/apache/dolphinscheduler/plugin/registry/zookeeper/ZookeeperConfiguration.java` diff --git a/docs/docs/en/contribute/backend/spi/task.md b/docs/docs/en/contribute/backend/spi/task.md index f909d42fa8c9..91ee108bad3f 100644 --- a/docs/docs/en/contribute/backend/spi/task.md +++ b/docs/docs/en/contribute/backend/spi/task.md @@ -14,4 +14,4 @@ In additional, the `TaskChannelFactory` extends from `PrioritySPI`, this means y Since the task plug-in involves the front-end page, the front-end SPI has not yet been implemented, so you need to implement the front-end page corresponding to the plug-in separately. -If there is a class conflict in the task plugin, you can use [Shade-Relocating Classes](https://maven.apache.org/plugins/maven-shade-plugin/) to solve this problem. \ No newline at end of file +If there is a class conflict in the task plugin, you can use [Shade-Relocating Classes](https://maven.apache.org/plugins/maven-shade-plugin/) to solve this problem. 
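The `PrioritySPI` loading rule described for the alert, datasource, and task plugins — when two plugins share a name the higher-priority one wins, and equal name plus equal priority throws `IllegalArgumentException` — can be modeled as follows. This is an illustrative sketch only: `PluginRegistry` and `Plugin` are invented names, not the actual DolphinScheduler classes.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the PrioritySPI conflict rule described above:
// a higher-priority plugin replaces a same-named one, a lower-priority
// candidate is ignored, and a same-name same-priority pair is an error.
public class PluginRegistry {
    public record Plugin(String name, int priority) {}

    private final Map<String, Plugin> loaded = new HashMap<>();

    public void register(Plugin candidate) {
        Plugin existing = loaded.get(candidate.name());
        if (existing == null || candidate.priority() > existing.priority()) {
            // No conflict, or the candidate outranks the loaded plugin.
            loaded.put(candidate.name(), candidate);
        } else if (candidate.priority() == existing.priority()) {
            throw new IllegalArgumentException(
                "Duplicate plugin with same name and priority: " + candidate.name());
        }
        // A lower-priority candidate is silently ignored.
    }

    public Plugin get(String name) {
        return loaded.get(name);
    }
}
```

The real implementation lives behind `getIdentify` in the `PrioritySPI` interface; this sketch only captures the selection semantics the documentation states.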
diff --git a/docs/docs/en/contribute/e2e-test.md b/docs/docs/en/contribute/e2e-test.md
index 82affec552a2..c6c49168a839 100644
--- a/docs/docs/en/contribute/e2e-test.md
+++ b/docs/docs/en/contribute/e2e-test.md
@@ -77,31 +77,31 @@ In addition, during the testing process, the elements are not manipulated direct

The SecurityPage provides goToTab methods to test navigation to the corresponding sidebar tabs, mainly TenantPage, UserPage, WorkerGroupPage and QueuePage. These pages are implemented in the same way, mainly testing whether the form's input, add and delete buttons work and return to the corresponding page.

```java
-    public T goToTab(Class tab) {
-        if (tab == TenantPage.class) {
-            WebElement menuTenantManageElement = new WebDriverWait(driver, 60)
-                .until(ExpectedConditions.elementToBeClickable(menuTenantManage));
-            ((JavascriptExecutor)driver).executeScript("arguments[0].click();", menuTenantManageElement);
-            return tab.cast(new TenantPage(driver));
-        }
-        if (tab == UserPage.class) {
-            WebElement menUserManageElement = new WebDriverWait(driver, 60)
-                .until(ExpectedConditions.elementToBeClickable(menUserManage));
-            ((JavascriptExecutor)driver).executeScript("arguments[0].click();", menUserManageElement);
-            return tab.cast(new UserPage(driver));
-        }
-        if (tab == WorkerGroupPage.class) {
-            WebElement menWorkerGroupManageElement = new WebDriverWait(driver, 60)
-                .until(ExpectedConditions.elementToBeClickable(menWorkerGroupManage));
-            ((JavascriptExecutor)driver).executeScript("arguments[0].click();", menWorkerGroupManageElement);
-            return tab.cast(new WorkerGroupPage(driver));
-        }
-        if (tab == QueuePage.class) {
-            menuQueueManage().click();
-            return tab.cast(new QueuePage(driver));
-        }
-        throw new UnsupportedOperationException("Unknown tab: " + tab.getName());
-    }
+public T goToTab(Class tab) {
+    if (tab == TenantPage.class) {
+        WebElement menuTenantManageElement = new WebDriverWait(driver, 60)
+            .until(ExpectedConditions.elementToBeClickable(menuTenantManage));
+        ((JavascriptExecutor)driver).executeScript("arguments[0].click();", menuTenantManageElement);
+        return tab.cast(new TenantPage(driver));
+    }
+    if (tab == UserPage.class) {
+        WebElement menUserManageElement = new WebDriverWait(driver, 60)
+            .until(ExpectedConditions.elementToBeClickable(menUserManage));
+        ((JavascriptExecutor)driver).executeScript("arguments[0].click();", menUserManageElement);
+        return tab.cast(new UserPage(driver));
+    }
+    if (tab == WorkerGroupPage.class) {
+        WebElement menWorkerGroupManageElement = new WebDriverWait(driver, 60)
+            .until(ExpectedConditions.elementToBeClickable(menWorkerGroupManage));
+        ((JavascriptExecutor)driver).executeScript("arguments[0].click();", menWorkerGroupManageElement);
+        return tab.cast(new WorkerGroupPage(driver));
+    }
+    if (tab == QueuePage.class) {
+        menuQueueManage().click();
+        return tab.cast(new QueuePage(driver));
+    }
+    throw new UnsupportedOperationException("Unknown tab: " + tab.getName());
+}
```

![SecurityPage](../../../img/e2e-test/SecurityPage.png)

@@ -146,14 +146,14 @@ The following is an example of a tenant management test. As explained earlier, w

The browser is loaded using the RemoteWebDriver provided by Selenium. Before each test case starts, some preparation work needs to be done, for example: logging in the user and jumping to the corresponding page (depending on the specific test case).

```java
-    @BeforeAll
-    public static void setup() {
-        new LoginPage(browser)
-            .login("admin", "dolphinscheduler123")
-            .goToNav(SecurityPage.class)
-            .goToTab(TenantPage.class)
-            ;
-    }
+@BeforeAll
+public static void setup() {
+    new LoginPage(browser)
+        .login("admin", "dolphinscheduler123")
+        .goToNav(SecurityPage.class)
+        .goToTab(TenantPage.class)
+        ;
+}
```

When the preparation is complete, it is time to write the actual test cases. We use the @Order() annotation to keep the tests modular and to fix their execution order.
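The fluent chain in the setup above relies on the `Class<T>`-based `goToTab` shown earlier. Stripped of Selenium, the pattern reduces to a small, framework-free sketch (the class bodies below are illustrative, not the project's real page objects):

```java
// Minimal sketch of the generic page-object navigation pattern used by
// goToTab: the caller passes the destination page class, and Class.cast
// returns a correctly typed page object, so chained calls need no casts.
class TenantPage {
    String title() {
        return "tenant";
    }
}

class SecurityPage {
    <T> T goToTab(Class<T> tab) {
        if (tab == TenantPage.class) {
            // In the real tests, this branch clicks the sidebar menu first.
            return tab.cast(new TenantPage());
        }
        throw new UnsupportedOperationException("Unknown tab: " + tab.getName());
    }
}

public class NavigationSketch {
    public static void main(String[] args) {
        // Statically typed as TenantPage thanks to the Class<T> parameter.
        TenantPage page = new SecurityPage().goToTab(TenantPage.class);
        System.out.println(page.title());
    }
}
```

The benefit of this design is that each test reads as one typed chain, and an unsupported destination fails fast with a clear exception instead of a silent misnavigation.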
After the test steps have run, assertions are used to determine whether the test was successful; if the assertions pass, the tenant was created successfully. The following code can be used as a reference:

@@ -176,14 +176,14 @@ The rest are similar cases and can be understood by referring to the specific so

https://github.com/apache/dolphinscheduler/tree/dev/dolphinscheduler-e2e/dolphinscheduler-e2e-case/src/test/java/org/apache/dolphinscheduler/e2e/cases

-## III. Supplements
+## III. Supplements

-When running E2E tests locally, First, you need to start the local service, you can refer to this page:
+When running E2E tests locally, you first need to start the local service; you can refer to this page:

[development-environment-setup](./development-environment-setup.md)

When running E2E tests locally, the `-Dlocal=true` parameter can be set to connect to the local service, which makes it easier to iterate on UI changes.

-When running E2E tests with `M1` chip, you can use `-Dm1_chip=true` parameter to configure containers supported by
+When running E2E tests on an `M1` chip, you can use the `-Dm1_chip=true` parameter to configure containers supported by
`ARM64`.
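Both `-Dlocal=true` and `-Dm1_chip=true` are JVM system properties. Inside test code, such flags are typically read with `System.getProperty`; the sketch below illustrates the mechanism (the property names mirror the flags above, but the exact lookup code used by the project is an assumption):

```java
public class E2eFlags {
    // Reads a -D style flag as a boolean, defaulting to false
    // when the property was not passed on the command line.
    static boolean flag(String name) {
        return Boolean.parseBoolean(System.getProperty(name, "false"));
    }

    public static void main(String[] args) {
        // Example invocation: java -Dlocal=true E2eFlags
        System.out.println("local=" + flag("local"));
        System.out.println("m1_chip=" + flag("m1_chip"));
    }
}
```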
![Dlocal](../../../img/e2e-test/Dlocal.png)

diff --git a/docs/docs/en/contribute/frontend-development.md b/docs/docs/en/contribute/frontend-development.md
index 297a7ccee0da..9ab23cc5be29 100644
--- a/docs/docs/en/contribute/frontend-development.md
+++ b/docs/docs/en/contribute/frontend-development.md
@@ -1,6 +1,7 @@
# Front-end development documentation

### Technical selection
+
```
Vue mvvm framework

@@ -17,10 +18,16 @@ Lodash high performance JavaScript utility library

### Development environment

-- #### Node installation
-Node package download (note version v12.20.2) `https://nodejs.org/download/release/v12.20.2/`
+
+-
+
+#### Node installation
+
+Node package download (note version v12.20.2) `https://nodejs.org/download/release/v12.20.2/`
+
+-
+
+#### Front-end project construction

-- #### Front-end project construction
Use the command line to `cd` into the `dolphinscheduler-ui` project directory and execute `npm install` to pull the project dependency packages.

> If `npm install` is very slow, you can set the Taobao mirror

@@ -36,13 +43,16 @@
npm config set registry http://registry.npm.taobao.org/

API_BASE = http://127.0.0.1:12345
```

-> ##### ! ! ! Special attention here. If the project reports a "node-sass error" error while pulling the dependency package, execute the following command again after execution.
+##### ! ! ! Special attention here. If the project reports a `node-sass` error while pulling the dependency packages, run the following command and then execute `npm install` again.
```bash
npm install node-sass --unsafe-perm # Install the node-sass dependency separately
```

-- #### Development environment operation
+
+-
+
+#### Development environment operation
+
- `npm start` runs the project development environment (after startup, visit http://localhost:8888)

#### Front-end project release

@@ -140,6 +150,7 @@ Public module and utill `src/js/module`

Home => `http://localhost:8888/#/home`

Project Management => `http://localhost:8888/#/projects/list`
+
```
| Project Home
| Workflow
@@ -149,6 +160,7 @@ Project Management => `http://localhost:8888/#/projects/list`
```

Resource Management => `http://localhost:8888/#/resource/file`
+
```
| File Management
| udf Management
@@ -159,6 +171,7 @@ Resource Management => `http://localhost:8888/#/resource/file`

Data Source Management => `http://localhost:8888/#/datasource/list`

Security Center => `http://localhost:8888/#/security/tenant`
+
```
| Tenant Management
| User Management
@@ -174,16 +187,19 @@ User Center => `http://localhost:8888/#/user/account`

The project `src/js/conf/home` is divided into

`pages` => route to page directory
+
```
- The page file corresponding to the routing address
+The page file corresponding to the routing address
```

`router` => route management
+
```
vue router, the entry file index.js in each page will be registered. Specific operations: https://router.vuejs.org/zh/
```

`store` => status management
+
```
The page corresponding to each route has a state management file divided into:
@@ -201,9 +217,13 @@ Specific action:https://vuex.vuejs.org/zh/
```

## specification
+
## Vue specification
+
##### 1.Component name
+
Component names should consist of multiple words joined with a hyphen (-), which avoids conflicts with HTML tags and keeps the structure clearer.
+
```
// positive example
export default {
@@ -212,7 +232,9 @@ export default {
```

##### 2.Component files
+
Common components internal to the project live in `src/js/module/components`; each component's folder carries the same name as its file.
The subcomponents and util tools that are split out inside a common component are placed in the component's internal `_source` folder.
+
```
└── components
    ├── header
@@ -228,6 +250,7 @@ The internal common component of the `src/js/module/components` project writes t
```

##### 3.Prop
+
When you define a Prop, always name it in camelCase, and use the hyphenated form (kebab-case) when the parent component passes it a value.

This follows the conventions of each language: HTML attributes are case-insensitive, so hyphenated names are friendlier there, while camelCase is the more natural style in JavaScript.
@@ -270,7 +293,9 @@ props: {
```

##### 4.v-for
+
When using v-for to traverse a list, always provide a key value so that the DOM can be updated more efficiently during re-rendering.
+
```