[Improvement][common] get application id in SHELL scripts #4025

Closed
gabrywu opened this issue Nov 4, 2020 · 7 comments

@gabrywu
Member

gabrywu commented Nov 4, 2020

Describe the question
Currently, when a SHELL script runs a YARN job, we find the application IDs by scanning the logs with the regex 'application_\d+_\d+'.
I think this is ugly and has performance issues. Instead, I suggest registering an aspect when the 'yarn jar' command is executed.
We can weave advice into the join point org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication, where we can get the submitted application ID and the tracking URL, and write them to a local file.
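
For comparison, the log-scanning approach that this proposal wants to replace can be sketched roughly as below. This is only an illustration and not DolphinScheduler's actual code: the class name `AppIdLogScanner` and the sample log lines are assumptions. It shows why the whole log has to be read, and why the ID is only found if the client logs it at all.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the regex-based approach described in the issue.
class AppIdLogScanner {
    // The pattern quoted in the issue: application_<digits>_<digits>.
    private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

    // Returns the distinct application IDs found in the log text,
    // in order of first appearance. Every character of the log is scanned.
    static Set<String> findApplicationIds(String log) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher matcher = APP_ID.matcher(log);
        while (matcher.find()) {
            ids.add(matcher.group());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Illustrative log text only; real task logs will look different.
        String log = "INFO impl.YarnClientImpl: Submitted application application_1604470000000_0001\n"
                + "INFO mapreduce.Job: Running job: job_1604470000000_0001\n";
        System.out.println(findApplicationIds(log));
    }
}
```

If the YARN client log level is raised above INFO, the submission line is never written and the scan finds nothing, which is the restriction mentioned under the benefits below.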

What are the current deficiencies and the benefits of improvement

  • deficiency:
    requires the aspectjweaver-1.9.6.jar file, which is about 2 MB in size
  • benefit:
    no need to scan the whole log with the regex 'application_\d+_\d+'.
    no need to restrict the YARN client log level to INFO

Which version of DolphinScheduler:

  • all versions

Describe alternatives you've considered

Add the following two environment variables to the global environment:
export YARN_CLIENT_OPTS="-javaagent:/pathto/aspectjweaver-1.9.6.jar"

export YARN_USER_CLASSPATH=/pathto/Aop2YarnClient-1.0-SNAPSHOT.jar
Then, when applications are submitted to the YARN cluster, the aspect in Aop2YarnClient-1.0-SNAPSHOT.jar is registered, and we can get the submitted application ID and the tracking URL.
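
As a possible variation on the global exports, the same two variables could be set only on the child process that runs the shell task. The sketch below assumes the task is launched from Java with `ProcessBuilder`. The class `YarnAopEnv` and its method are hypothetical names, and the jar paths are the placeholders from the exports above.

```java
import java.util.Map;

// Hypothetical helper: applies the two variables from the issue to one child process
// instead of exporting them globally.
class YarnAopEnv {
    static ProcessBuilder withAspect(ProcessBuilder pb, String weaverJar, String aspectJar) {
        Map<String, String> env = pb.environment();

        // Keep any YARN_CLIENT_OPTS that are already set and append the agent flag.
        String agent = "-javaagent:" + weaverJar;
        String existing = env.get("YARN_CLIENT_OPTS");
        env.put("YARN_CLIENT_OPTS",
                existing == null || existing.isEmpty() ? agent : existing + " " + agent);

        // As in the issue, the aspect jar is simply assigned to YARN_USER_CLASSPATH
        // (an existing value would be overwritten by this sketch).
        env.put("YARN_USER_CLASSPATH", aspectJar);
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = withAspect(new ProcessBuilder("sh", "-c", "echo $YARN_CLIENT_OPTS"),
                "/pathto/aspectjweaver-1.9.6.jar", "/pathto/Aop2YarnClient-1.0-SNAPSHOT.jar");
        System.out.println(pb.environment().get("YARN_CLIENT_OPTS"));
    }
}
```

Scoping the variables to one process keeps the agent out of unrelated YARN commands on the same host. Whether that matters depends on the deployment.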

This is an example; I just print the application ID to the console:
image

Here is the sample code:
image

The solution is suitable for Hive, Spark, Flink, and other tools that run on a YARN cluster. A test with hive -e 'hive sql' passed.

@gabrywu gabrywu self-assigned this Nov 4, 2020
@gabrywu gabrywu added discussion discussion enhancement New feature or request suggestion labels Nov 4, 2020
@CalvinKirs
Member

I think this is a good idea

@gabrywu gabrywu changed the title [Improvement][common] Improvement title [Improvement][common] get application id in SHELL script Nov 5, 2020
@gabrywu gabrywu changed the title [Improvement][common] get application id in SHELL script [Improvement][common] get application id in SHELL scripts Nov 5, 2020
@gabrywu
Member Author

gabrywu commented Nov 7, 2020

This is a public repo that implements this feature: https://github.com/gabrywu/Aop2YarnClient

@xiejiajun

This will not be able to fetch the applicationId when the SQL is submitted through HiveServer2. Should we consider storing the appId information in public storage? @gabrywu

@gabrywu
Member Author

gabrywu commented Nov 18, 2020

> This will not be able to fetch the applicationId when the SQL is submitted through HiveServer2. Should we consider storing the appId information in public storage? @gabrywu

Do you have any good ideas to resolve it? @xiejiajun

@xiejiajun

> > This will not be able to fetch the applicationId when the SQL is submitted through HiveServer2. Should we consider storing the appId information in public storage? @gabrywu
>
> Do you have any good ideas to resolve it? @xiejiajun

I thought about writing the appId to public storage such as MySQL, but that would introduce additional third-party service configuration such as a JdbcUrl, so we still need to think about it carefully.

@gabrywu
Member Author

gabrywu commented Nov 29, 2020

> > > This will not be able to fetch the applicationId when the SQL is submitted through HiveServer2. Should we consider storing the appId information in public storage? @gabrywu
> >
> > Do you have any good ideas to resolve it? @xiejiajun
>
> I thought about writing the appId to public storage such as MySQL, but that would introduce additional third-party service configuration such as a JdbcUrl, so we still need to think about it carefully.

Yes, so the example project just writes it to a local file.
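
To make the local-file idea concrete, here is a minimal sketch of how a scheduler-side reader might pick up the IDs afterwards. The file format is an assumption (one application ID per line), since the thread does not say how the example project lays out its output, and `AppIdFileReader` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for a local file that the aspect is assumed to write.
// Assumed format: one application ID per line; other lines are ignored.
class AppIdFileReader {
    static List<String> readApplicationIds(Path file) {
        List<String> ids = new ArrayList<>();
        if (!Files.exists(file)) {
            // Nothing was submitted to YARN, or the aspect was not loaded.
            return ids;
        }
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String id = line.trim();
                if (id.startsWith("application_")) {
                    ids.add(id);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ids;
    }
}
```

A reader like this only works when the job is submitted from the same host as the scheduler worker, which is why the HiveServer2 case raised above remains open.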

@ruanwenjun
Member

@caishunfeng
