[Improvement][common] get application id in SHELL scripts #4025
Comments
I think this is a good idea.
There is a public repo that implements this: https://github.com/gabrywu/Aop2YarnClient
This will not be able to fetch the applicationId when the SQL is submitted through HiveServer2. Should we consider storing the appId in public storage? @gabrywu
Do you have any good ideas for resolving it? @xiejiajun
I thought about writing the appId to public storage such as MySQL, but that would introduce additional third-party service configuration such as a JDBC URL, so we still need to think about it carefully.
Yes, that is why the example project just writes it to a local file.
Describe the question
Currently, when a SHELL script runs a YARN job, we find the application IDs by scanning the logs with the regex 'application_\d+_\d+'.
I think this is ugly and has performance issues. So I suggest registering an aspect when the 'yarn jar' command is executed:
we can weave a join point into org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication, capture the submitted application ID and the tracking URL there, and write them to a local file.
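For comparison, the log-scanning approach described above can be sketched as follows. This is only an illustration, not DolphinScheduler's actual implementation; the function name `extract_app_ids` is hypothetical, and `[0-9]` replaces `\d` because POSIX extended regular expressions do not support `\d`.

```shell
# Print every distinct YARN application ID found in the log text on stdin.
# Hypothetical helper name; the whole log has to be read to find the IDs.
extract_app_ids() {
  grep -oE 'application_[0-9]+_[0-9]+' | sort -u
}
```

It would be called as `extract_app_ids < task.log`, where `task.log` stands for whatever file holds the task output. The whole log must be read, and the ID only appears if the YARN client logs it, which is what the aspect-based proposal avoids.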
What are the current deficiencies and the benefits of improvement
It requires the aspectjweaver-1.9.6.jar file, which is about 2 MB in size.
No need to scan the whole log with the regex 'application_\d+_\d+'.
No need to restrict the YARN client log level to INFO.
Which version of DolphinScheduler:
Describe alternatives you've considered
Add the following two environment variables to the global environment:
export YARN_CLIENT_OPTS="-javaagent:/pathto/aspectjweaver-1.9.6.jar"
export YARN_USER_CLASSPATH=/pathto/Aop2YarnClient-1.0-SNAPSHOT.jar
Then, when an application is submitted to the YARN cluster, the aspect in Aop2YarnClient-1.0-SNAPSHOT.jar is registered, and we can get the submitted application ID and the tracking URL.
In this example, I just print the application ID to the console.
Here is the sample code
The solution is suitable for Hive, Spark, Flink, and other tools that run on a YARN cluster. A test with 'hive -e 'hive sql'' passed.
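Once the aspect records IDs in a local file, a SHELL task could read them directly instead of scanning logs. The following is a minimal sketch under stated assumptions: that the aspect writes one application ID per line, and that the file lives at the path shown. The path, the `APP_ID_FILE` variable, and the `read_app_ids` function are all hypothetical and not part of Aop2YarnClient.

```shell
# Hypothetical location; the real path depends on how the aspect is configured.
APP_ID_FILE="${APP_ID_FILE:-/tmp/yarn_app_ids.txt}"

# Print each distinct, well-formed application ID recorded in the given file.
# Assumes one ID per line; lines that are not a bare ID are ignored.
# Returns non-zero if the file is missing or unreadable.
read_app_ids() {
  [ -r "$1" ] || return 1
  grep -xE 'application_[0-9]+_[0-9]+' "$1" | sort -u
}
```

A task would call `read_app_ids "$APP_ID_FILE"` after the job is submitted. Only a small file is read, regardless of log size or log level.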