[Improvement][Task] Improved way to collect yarn job's appIds #11262
Comments
Thank you for your feedback. We have received your issue; please wait patiently for a reply.
Do you have any good ideas? AFAIK, we can use the xx task SDK to submit the task and get the appId from the SDK, so we don't need to parse it from the log. Or we can optimize the currently
@ruanwenjun Yeah, that may be a practicable solution; we can talk it through. Before submitting a yarn job, the client first requests the application context from the RM and gets the appId, which is then written into the NM's environment variables. We can use a Java agent to read it before the yarn job's JAR file is executed; the agent program can also take the taskInstanceId as input. However, where to store this mapping relationship needs further consideration. Please let me know if you have any good suggestions!
In fact, there is already an issue (#4025) that talks about using an agent to collect the appId, but I don't think it's a good way 😢: we would need to maintain an agent, and we might need to maintain different agent versions.
I think there's no need to maintain different agent versions. For example, we can parse the appId from some environment variables such as So, it seems that yarn jobs submitted by shell command can all get the appId in this way. Anyway, there are some other design problems, like where to store the mapping relationship, as mentioned in issue (#4025). I'll think about that carefully.
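As a rough illustration of the environment-variable idea in the comment above: the comment does not say which variable was meant, so this sketch assumes the `CONTAINER_ID` variable that YARN sets inside a container, whose value embeds the cluster timestamp and the application sequence number. The class and method names are hypothetical and not taken from DolphinScheduler.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derives a YARN application id from a container id string.
final class ContainerEnvAppId {

    // container_[e<epoch>_]<clusterTimestamp>_<appSequence>_<attempt>_<containerNo>
    private static final Pattern CONTAINER_ID =
            Pattern.compile("^container_(?:e\\d+_)?(\\d+)_(\\d+)_\\d+_\\d+$");

    // Returns application_<clusterTimestamp>_<appSequence>, or empty if the input does not match.
    static Optional<String> fromContainerId(String containerId) {
        if (containerId == null) {
            return Optional.empty();
        }
        Matcher m = CONTAINER_ID.matcher(containerId.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of("application_" + m.group(1) + "_" + m.group(2));
    }

    // Assumption: the code runs inside a YARN container where CONTAINER_ID is set.
    static Optional<String> fromEnvironment() {
        return fromContainerId(System.getenv("CONTAINER_ID"));
    }
}
```

An agent along the lines discussed here could call `fromEnvironment()` at startup and report the result together with the taskInstanceId; where that mapping is stored is still the open question raised in this thread.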
@ruanwenjun Sorry, I don't have a production environment, so I'm not sure whether it's a bug or whether I've misunderstood it.
This is a historical issue: in the past, a LogServer was deployed on the worker's machine.
You need to make sure the agent can work for all yarn clients.
@ruanwenjun I think most yarn clients can share the same agent, cuz in these clients, AOP will intercept func
Hi, @Radeity @ruanwenjun I agree that the current way of getting the yarn application id from the log is not elegant.
What do you think? Any comments or discussions are welcome.
Hi, @rickchengx First, thanks for your idea! However, I think this approach has two problems, as follows:
@Radeity, thanks for the reply. Here is more info about the tagging approach:
In addition, as for the AOP way, in general I feel that an additional jar package is required, and outputting the application id to a separate file is kind of odd. Is there a more elegant way to implement it? (Ideally one that keeps the extra things DS needs to maintain as light as possible.) After all, we just need to get the
Thanks for your detailed explanation. Compared with the tag way, AOP can handle shell tasks and, in addition, does not invade the DS task definition code. You're right that an additional jar package is required. However, the temporary appInfo log file is only for fetching the applicationId in time; when the task is done, the appId will be written into TaskExecutionContext, the same as in the original way.

Moreover, extra maintenance is only needed when compute engines change the way they accept configuration such as java-opts, or when the yarn client changes its submit function, which I really don't think is a big deal, cuz these have remained unchanged for many years. Think of WeChat Pay, for example: it has been used for many years, we can scan a QR code to pay for something, it's already in wide use and will not suffer a sudden change. Anyway, I have to say, the yarn client may update and new compute engines will come out, but for this AOP way in DS, the cost of potential maintenance is much smaller than for other parts of the code, such as the generated command line that submits a spark task.

For the last point, I agree with you, stability is worth considering. For a smooth transition, my opinion is to keep both the original and the new AOP way, and provide extra configuration for the user to choose how to fetch the applicationId. If the AOP way is stable enough, we can then consider whether to completely replace the original way. What do you think? Any more elegant idea would be appreciated!
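The "keep both ways and let the user choose" proposal could be sketched roughly as below. The property name `appId.collect` comes from the commit that closed this issue; the accepted values `log` and `aop`, the default of `log`, and the class name are assumptions made only for illustration.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical switch between the original log-parsing way and the new AOP way.
final class AppIdCollectMode {

    enum Mode { LOG, AOP }

    // Reads the user property; falls back to the original log-parsing way when it is unset.
    static Mode resolve(Map<String, String> properties) {
        String raw = properties.get("appId.collect");
        if (raw == null || raw.trim().isEmpty()) {
            return Mode.LOG;
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "aop":
                return Mode.AOP;
            case "log":
                return Mode.LOG;
            default:
                throw new IllegalArgumentException("Unsupported appId.collect value: " + raw);
        }
    }
}
```

Defaulting to the original way keeps existing workflows unchanged until a user opts in, which matches the smooth-transition goal stated above.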
import aop way to collect yarn job's applicationId
add new environment configuration for each type of yarn tasks to support aop
add user property `appId.collect` for user to decide how to collect applicationId
This closes apache#11262
Description
The current way to collect appIds is to scan log files and parse them. This is inefficient and will cause an OOM if a log file is large, as mentioned in issue #11214. The problem can only be solved permanently by switching to a new way of collecting appIds that avoids reading log files.
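To make the memory concern concrete, here is a minimal sketch of a line-by-line scan, which is not the DolphinScheduler implementation. It relies on the standard YARN id shape `application_<clusterTimestamp>_<sequence>`; the class name is hypothetical. Reading one line at a time bounds memory by the longest line rather than the file size, but it still reads the whole log, so it only mitigates the problem described above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner: collects distinct YARN application ids from a log stream.
final class StreamingAppIdScanner {

    private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

    // Streams the source line by line; ids are returned in first-seen order without duplicates.
    static Set<String> scan(Reader source) {
        Set<String> ids = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = APP_ID.matcher(line);
                while (m.find()) {
                    ids.add(m.group());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ids;
    }
}
```

For a real log file, the reader could be obtained with `java.nio.file.Files.newBufferedReader(path)`.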