fix: data quality may fail in docker mode #15563

Merged
merged 14 commits into from
Feb 5, 2024
2 changes: 1 addition & 1 deletion deploy/kubernetes/dolphinscheduler/README.md
@@ -121,7 +121,7 @@ Please refer to the [Quick Start in Kubernetes](../../../docs/docs/en/guide/inst
| conf.common."alert.rpc.port" | int | `50052` | rpc port |
| conf.common."appId.collect" | string | `"log"` | way to collect applicationId: log, aop |
| conf.common."conda.path" | string | `"/opt/anaconda3/etc/profile.d/conda.sh"` | set path of conda.sh |
- | conf.common."data-quality.jar.name" | string | `"dolphinscheduler-data-quality-dev-SNAPSHOT.jar"` | data quality option |
+ | conf.common."data-quality.jar.dir" | string | `nil` | data quality option |
| conf.common."data.basedir.path" | string | `"/tmp/dolphinscheduler"` | user data local directory path, please make sure the directory exists and have read write permissions |
| conf.common."datasource.encryption.enable" | bool | `false` | datasource encryption enable |
| conf.common."datasource.encryption.salt" | string | `"!@#$%^&*"` | datasource encryption salt |
2 changes: 1 addition & 1 deletion deploy/kubernetes/dolphinscheduler/values.yaml
@@ -328,7 +328,7 @@ conf:
datasource.encryption.salt: '!@#$%^&*'

# -- data quality option
- data-quality.jar.name: dolphinscheduler-data-quality-dev-SNAPSHOT.jar
+ data-quality.jar.dir:

# -- Whether hive SQL is executed in the same session
support.hive.oneSession: false
2 changes: 1 addition & 1 deletion docs/docs/en/architecture/configuration.md
@@ -226,7 +226,7 @@ The default configuration is as follows:
| yarn.job.history.status.address | http://ds1:19888/ws/v1/history/mapreduce/jobs/%s | job history status url of yarn |
| datasource.encryption.enable | false | whether to enable datasource encryption |
| datasource.encryption.salt | !@#$%^&* | the salt of the datasource encryption |
- | data-quality.jar.name | dolphinscheduler-data-quality-dev-SNAPSHOT.jar | the jar of data quality |
+ | data-quality.jar.dir | | the directory that contains the data quality jar |
| support.hive.oneSession | false | specify whether hive SQL is executed in the same session |
| sudo.enable | true | whether to enable sudo |
| alert.rpc.port | 50052 | the RPC port of Alert Server |
2 changes: 1 addition & 1 deletion docs/docs/en/guide/data-quality.md
@@ -12,7 +12,7 @@ The execution logic of the data quality task is as follows:
- The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
`Worker` sends the task result to `Master`. After `Master` receives `TaskResponse`, it checks whether the task type is `DataQualityTask`; if so, it reads the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and the result is then judged according to the check mode, operator and threshold configured by the user.
- If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
- - If you package `data-quality` separately, remember to modify the package name to be consistent with `data-quality.jar.name` in `common.properties` with attribute name `data-quality.jar.name`
+ - If you package `data-quality` separately, place the jar in the directory configured by the `data-quality.jar.dir` attribute in `common.properties`, and make sure the jar name starts with `dolphinscheduler-data-quality`.
- If you are upgrading from an older version, execute the `sql` update script to initialize the database before running.
- `dolphinscheduler-data-quality-dev-SNAPSHOT.jar` was built with no dependencies. If a `JDBC` driver is required, you can set the `--jars` parameter in the `node settings` `Option Parameters`, e.g. `--jars /lib/jars/mysql-connector-java-8.0.16.jar`.
- Currently only `MySQL`, `PostgreSQL` and `HIVE` data sources have been tested; other data sources have not been tested yet.
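
The judgement step described in the notes above (check mode, operator, threshold) can be pictured with a small sketch. Everything in it is hypothetical: the class `QualityCheckSketch`, the two `CheckMode` values and the `Operator` names are illustrative stand-ins and are not taken from the DolphinScheduler code base. The sketch only evaluates whether the configured relation holds; how that outcome maps to success or failure is not specified in this diff.

```java
final class QualityCheckSketch {

    // Hypothetical check modes: how the actual and expected values are combined.
    enum CheckMode { EXPECTED_MINUS_ACTUAL, ACTUAL_MINUS_EXPECTED }

    // Hypothetical comparison operators a user could pick.
    enum Operator { EQ, NE, LT, LE, GT, GE }

    /** Combines the actual and expected values according to the check mode. */
    static double statistic(CheckMode mode, double actual, double expected) {
        return mode == CheckMode.EXPECTED_MINUS_ACTUAL ? expected - actual : actual - expected;
    }

    /** Returns true when "value operator threshold" holds. */
    static boolean holds(double value, Operator operator, double threshold) {
        switch (operator) {
            case EQ: return value == threshold;
            case NE: return value != threshold;
            case LT: return value < threshold;
            case LE: return value <= threshold;
            case GT: return value > threshold;
            case GE: return value >= threshold;
            default: throw new IllegalArgumentException("unknown operator: " + operator);
        }
    }

    public static void main(String[] args) {
        // Expected 100 rows, found 90: is the shortfall greater than 5?
        double value = statistic(CheckMode.EXPECTED_MINUS_ACTUAL, 90, 100);
        System.out.println(holds(value, Operator.GT, 5));
    }
}
```

Keeping the statistic and the comparison as two separate functions mirrors the wording of the note, where the check mode and the operator are configured independently.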
7 changes: 4 additions & 3 deletions docs/docs/en/guide/resource/configuration.md
@@ -152,9 +152,10 @@ datasource.encryption.enable=false
# datasource encryption salt
datasource.encryption.salt=!@#$%^&*

- # data quality absolute path, it would auto discovery from libs directory. You can also specific the jar name in libs directory
- # if you re-build it alone, or auto discovery mechanism fail
- data-quality.jar.name=
+ # data quality jar directory path; the data quality jar is auto-discovered from this directory. Keep it empty if you have not changed anything in
+ # data-quality, and dolphinscheduler will discover the jar by itself. Change it only if you want to use your own data-quality jar and it is not in the worker-server
+ # libs directory (but make sure your jar name starts with `dolphinscheduler-data-quality`).
+ data-quality.jar.dir=

#data-quality.error.output.path=/tmp/data-quality-error-data
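
As a rough illustration of the "keep it empty unless you ship your own jar" rule in the comment above, the sketch below reads the key with the standard `java.util.Properties` parser and treats a missing or blank value as "not configured". The class `DataQualityDirConfig` and its method are hypothetical and are not DolphinScheduler's own `PropertyUtils`; the directory `/opt/custom/dq-libs` is a made-up example value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.Properties;

final class DataQualityDirConfig {

    static final String KEY = "data-quality.jar.dir";

    /** Returns the configured directory, or empty when the key is absent or blank. */
    static Optional<String> configuredDir(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(KEY);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty(); // fall back to auto-discovery
        }
        return Optional.of(value.trim());
    }

    public static void main(String[] args) {
        // Default shipped value: the key is present but empty.
        System.out.println(configuredDir("data-quality.jar.dir=\n").isPresent());
        // Hypothetical custom directory.
        System.out.println(configuredDir("data-quality.jar.dir=/opt/custom/dq-libs\n").get());
    }
}
```

Returning an `Optional` makes the "empty means auto-discover" case explicit to the caller instead of passing an empty string around.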

1 change: 1 addition & 0 deletions docs/docs/en/guide/upgrade/incompatible.md
@@ -11,6 +11,7 @@ This document records the incompatible updates between each version. You need to
* Change the default unix shell executor from sh to bash ([#12180](https://github.com/apache/dolphinscheduler/pull/12180)).
* Remove `deleteSource` in `download()` of `StorageOperate` ([#14084](https://github.com/apache/dolphinscheduler/pull/14084))
* Remove default key for attribute `data-quality.jar.name` in `common.properties` ([#15551](https://github.com/apache/dolphinscheduler/pull/15551))
+ * Rename attribute `data-quality.jar.name` to `data-quality.jar.dir` in `common.properties`; the value now represents a directory ([#15563](https://github.com/apache/dolphinscheduler/pull/15563))

## 3.2.0

2 changes: 1 addition & 1 deletion docs/docs/zh/architecture/configuration.md
@@ -226,7 +226,7 @@ The common.properties configuration file currently mainly configures hadoop/s3/yarn/applicationId
| yarn.job.history.status.address | http://ds1:19888/ws/v1/history/mapreduce/jobs/%s | job history status URL of yarn |
| datasource.encryption.enable | false | whether to enable datasource encryption |
| datasource.encryption.salt | !@#$%^&* | the salt used for datasource encryption |
- | data-quality.jar.name | dolphinscheduler-data-quality-dev-SNAPSHOT.jar | the jar used by data quality |
+ | data-quality.jar.dir | | the directory that contains the jar used by data quality |
| support.hive.oneSession | false | whether hive SQL is executed in the same session |
| sudo.enable | true | whether to enable sudo |
| alert.rpc.port | 50052 | the RPC port of Alert Server |
2 changes: 1 addition & 1 deletion docs/docs/zh/guide/data-quality.md
@@ -13,7 +13,7 @@
>
## Notes

- - If you package `data-quality` separately, remember to change the package name to match `data-quality.jar.name`; the setting is `data-quality.jar.name` in `common.properties`
+ - If you package `data-quality` separately, remember to change the package path to match `data-quality.jar.dir`; the setting is `data-quality.jar.dir` in `common.properties`
- If you are upgrading from an older version, execute the `SQL` update script to initialize the database before running.
- The current `dolphinscheduler-data-quality-dev-SNAPSHOT.jar` is a thin jar and does not include any `JDBC` driver.
If a `JDBC` driver is needed, you can set the `--jars` parameter in `Option Parameters` under `Node Settings`,
7 changes: 4 additions & 3 deletions docs/docs/zh/guide/resource/configuration.md
@@ -156,9 +156,10 @@ datasource.encryption.enable=false
# datasource encryption salt
datasource.encryption.salt=!@#$%^&*

- # data quality absolute path, it would auto discovery from libs directory. You can also specific the jar name in libs directory
- # if you re-build it alone, or auto discovery mechanism fail
- data-quality.jar.name=
+ # data quality jar directory path; the data quality jar is auto-discovered from this directory. Keep it empty if you have not changed anything in
+ # data-quality, and dolphinscheduler will discover the jar by itself. Change it only if you want to use your own data-quality jar and it is not in the worker-server
+ # libs directory (but make sure your jar name starts with `dolphinscheduler-data-quality`).
+ data-quality.jar.dir=

#data-quality.error.output.path=/tmp/data-quality-error-data

@@ -84,9 +84,10 @@ datasource.encryption.enable=false
# datasource encryption salt
datasource.encryption.salt=!@#$%^&*

- # data quality absolute path, it would auto discovery from libs directory. You can also specific the jar name in libs directory
- # if you re-build it alone, or auto discovery mechanism fail
- data-quality.jar.name=
+ # data quality jar directory path; the data quality jar is auto-discovered from this directory. Keep it empty if you have not changed anything in
+ # data-quality, and dolphinscheduler will discover the jar by itself. Change it only if you want to use your own data-quality jar and it is not in the worker-server
+ # libs directory (but make sure your jar name starts with `dolphinscheduler-data-quality`).
+ data-quality.jar.dir=

#data-quality.error.output.path=/tmp/data-quality-error-data

@@ -120,9 +120,10 @@ datasource.encryption.enable=false
# datasource encryption salt
datasource.encryption.salt=!@#$%^&*

- # data quality absolute path, it would auto discovery from libs directory. You can also specific the jar name in libs directory
- # if you re-build it alone, or auto discovery mechanism fail
- data-quality.jar.name=
+ # data quality jar directory path; the data quality jar is auto-discovered from this directory. Keep it empty if you have not changed anything in
+ # data-quality, and dolphinscheduler will discover the jar by itself. Change it only if you want to use your own data-quality jar and it is not in the worker-server
+ # libs directory (but make sure your jar name starts with `dolphinscheduler-data-quality`).
+ data-quality.jar.dir=

#data-quality.error.output.path=/tmp/data-quality-error-data

@@ -115,9 +115,10 @@ datasource.encryption.enable=false
# datasource encryption salt
datasource.encryption.salt=!@#$%^&*

- # data quality absolute path, it would auto discovery from libs directory. You can also specific the jar name in libs directory
- # if you re-build it alone, or auto discovery mechanism fail
- data-quality.jar.name=
+ # data quality jar directory path; the data quality jar is auto-discovered from this directory. Keep it empty if you have not changed anything in
+ # data-quality, and dolphinscheduler will discover the jar by itself. Change it only if you want to use your own data-quality jar and it is not in the worker-server
+ # libs directory (but make sure your jar name starts with `dolphinscheduler-data-quality`).
+ data-quality.jar.dir=

#data-quality.error.output.path=/tmp/data-quality-error-data

@@ -18,7 +18,7 @@
package org.apache.dolphinscheduler.plugin.datasource.api.utils;

import static org.apache.dolphinscheduler.common.constants.Constants.RESOURCE_STORAGE_TYPE;
- import static org.apache.dolphinscheduler.plugin.task.api.TaskConstants.DATA_QUALITY_JAR_NAME;
+ import static org.apache.dolphinscheduler.plugin.task.api.TaskConstants.DATA_QUALITY_JAR_DIR;
import static org.apache.dolphinscheduler.plugin.task.api.TaskConstants.HADOOP_SECURITY_AUTHENTICATION;
import static org.apache.dolphinscheduler.plugin.task.api.TaskConstants.HADOOP_SECURITY_AUTHENTICATION_STARTUP_STATE;
import static org.apache.dolphinscheduler.plugin.task.api.TaskConstants.JAVA_SECURITY_KRB5_CONF;
@@ -133,14 +133,28 @@ public static boolean loadKerberosConf(String javaSecurityKrb5Conf, String login
      }

      public static String getDataQualityJarPath() {
-         String dqsJarPath = PropertyUtils.getString(DATA_QUALITY_JAR_NAME);
+         log.info("Trying to get data quality jar in path");
+         String dqJarDir = PropertyUtils.getString(DATA_QUALITY_JAR_DIR);
+
+         if (StringUtils.isNotEmpty(dqJarDir)) {
+             log.info(
+                     "Configuration data-quality.jar.dir is not empty, will try to get data quality jar from directory {}",
+                     dqJarDir);
+             getDataQualityJarPathFromPath(dqJarDir).ifPresent(jarName -> DEFAULT_DATA_QUALITY_JAR_PATH = jarName);
+         }
+
+         if (StringUtils.isEmpty(DEFAULT_DATA_QUALITY_JAR_PATH)) {
+             log.info("data quality jar path is empty, will try to auto discover it from built-in rules.");
+             getDefaultDataQualityJarPath();
+         }

-         if (StringUtils.isEmpty(dqsJarPath)) {
-             log.info("data quality jar path is empty, will try to get it from data quality jar name");
-             return getDefaultDataQualityJarPath();
+         if (StringUtils.isEmpty(DEFAULT_DATA_QUALITY_JAR_PATH)) {
+             log.error(
+                     "Cannot find the data quality jar through configuration or auto-discovery, please check your configuration or report a bug.");
+             throw new RuntimeException("data quality jar path is empty");
          }

-         return dqsJarPath;
+         return DEFAULT_DATA_QUALITY_JAR_PATH;
      }

      private static String getDefaultDataQualityJarPath() {
@@ -173,14 +187,15 @@ private static Optional<String> getDataQualityJarPathFromPath(String path) {
          log.info("Try to get data quality jar from path {}", path);
          File[] jars = new File(path).listFiles();
          if (jars == null) {
-             log.warn("No data quality related jar found from path {}", path);
+             log.warn("No files found in the given path {}", path);
              return Optional.empty();
          }
          for (File jar : jars) {
              if (jar.getName().startsWith("dolphinscheduler-data-quality")) {
                  return Optional.of(jar.getAbsolutePath());
              }
          }
+         log.warn("No data quality related jar found from path {}", path);
          return Optional.empty();
      }
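
For readers who want to try the lookup order outside the project, here is a minimal standalone sketch of the idea in the two hunks above: look in the configured directory first, then in fallback directories, and fail loudly if nothing matches. It is not the project's implementation. The class `DataQualityJarLocator`, the `resolve` signature with explicit fallback directories, and the `tempDirWith` demo helper are assumptions made for illustration, and sorting the matches for a deterministic pick is an addition that the diff does not have.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Optional;

final class DataQualityJarLocator {

    static final String JAR_PREFIX = "dolphinscheduler-data-quality";

    /** Returns the absolute path of a file in dir whose name starts with JAR_PREFIX, if any. */
    static Optional<String> findJarIn(String dir) {
        if (dir == null || dir.isEmpty()) {
            return Optional.empty();
        }
        File[] files = new File(dir).listFiles(); // null when dir is missing or not a directory
        if (files == null) {
            return Optional.empty();
        }
        return Arrays.stream(files)
                .filter(f -> f.getName().startsWith(JAR_PREFIX))
                .map(File::getAbsolutePath)
                .sorted() // assumption: pick deterministically when several files match
                .findFirst();
    }

    /** Tries the configured directory first, then each fallback directory in order. */
    static String resolve(String configuredDir, String... fallbackDirs) {
        Optional<String> found = findJarIn(configuredDir);
        for (String dir : fallbackDirs) {
            if (found.isPresent()) {
                break;
            }
            found = findJarIn(dir);
        }
        return found.orElseThrow(() -> new IllegalStateException("data quality jar not found"));
    }

    /** Demo helper: creates a temp directory containing empty files with the given names. */
    static File tempDirWith(String... fileNames) {
        try {
            Path dir = Files.createTempDirectory("dq-libs");
            for (String name : fileNames) {
                Files.createFile(dir.resolve(name));
            }
            return dir.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File libs = tempDirWith("dolphinscheduler-data-quality-dev-SNAPSHOT.jar", "other.jar");
        // Nothing configured, so the fallback directory is searched.
        System.out.println(resolve("", libs.getPath()));
    }
}
```

Unlike the method in the diff, this sketch keeps no static state: the result is recomputed on each call, which is simpler to test but does repeat the directory scan.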

@@ -95,9 +95,10 @@ datasource.encryption.enable=false
# datasource encryption salt
datasource.encryption.salt=!@#$%^&*

- # data quality option, it would auto discovery from libs directory. You can also specific the jar name in libs directory
- # if you re-build it alone, or auto discovery mechanism fail
- data-quality.jar.name=
+ # data quality jar directory path; the data quality jar is auto-discovered from this directory. Keep it empty if you have not changed anything in
+ # data-quality, and dolphinscheduler will discover the jar by itself. Change it only if you want to use your own data-quality jar and it is not in the worker-server
+ # libs directory (but make sure your jar name starts with `dolphinscheduler-data-quality`).
+ data-quality.jar.dir=

#data-quality.error.output.path=/tmp/data-quality-error-data

@@ -20,6 +20,7 @@ FROM eclipse-temurin:8-jdk
ENV DOCKER true
ENV TZ Asia/Shanghai
ENV DOLPHINSCHEDULER_HOME /opt/dolphinscheduler
+ ENV DATA_QUALITY_JAR_DIR /opt/dolphinscheduler/libs/worker-server

RUN apt update ; \
apt install -y sudo ; \
@@ -358,9 +358,9 @@ private TaskConstants() {
      public static final String RESOURCE_UPLOAD_PATH = "resource.storage.upload.base.path";

      /**
-      * data.quality.jar.name
+      * data-quality.jar.dir
       */
-     public static final String DATA_QUALITY_JAR_NAME = "data-quality.jar.name";
+     public static final String DATA_QUALITY_JAR_DIR = "data-quality.jar.dir";

      public static final String TASK_TYPE_CONDITIONS = "CONDITIONS";
