Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor AppBase to use common AppMetaData between Q/P tools #983

Merged
merged 3 commits into from
May 2, 2024

Conversation

amahussein
Copy link
Collaborator

@amahussein amahussein commented May 1, 2024

Signed-off-by: Ahmed Hussein (amahussein) [email protected]

Contributes to #980

  • this code change aims at using common logic to create and update the application info inside Tools.

Changes

  • No changes in unit tests was needed.
  • Created a new class AppMetaData that holds the basic information of a spark application: startTime, user, appId, appName, endTime, duration and flag to indicate whether duration is estimated.
  • Handling ApplicationStart became common between the EventParser classes: this means that the following functionswere removed:
    • QualificationEventProcessor.doSparkListenerApplicationStart
    • EventsProcessor.doSparkListenerApplicationStart
    • EventsProcessor.doSparkListenerApplicationEnd
    • EventsProcessor.doSparkListenerTaskStart
  • Removed unused case classes
    • case class ApplicationCase
    • case class PlanNodeAccumCase
    • case QualSQLExecutionInfo and
    • case QualApplicationInfo
  • Moved the ML flags mlEventLogType and pysparkLogFlag to the trait. This goes along having all the predicates in one place such as "isHive, isGpu...etc"
  • For AppBase:
    • added appMetaData field
    • appId becomes a method. thevalue can be retrieved from the AppMetadata
    • removed appEndTime because it is redundant as we hold the endTime inside appMetaData
    • Created estimateAppEndTime which is used between Qualification and Profiling to estimate the endTime if it is missing.
    • Temporary, I moved checkMLOps to the QualificationAppInfo because it is not being used by the profiling and we need to reconsider how to etract ML anyway.
    • There is a bunch of methods defined inside AppBase but they do not have anything to do with its fields or methods. I moved those to the object for now until we find a btter class member for them.
      • getPlanMetaWithSchema
      • getPlanInfoWithHiveScan
      • trimSchema

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>

Contributes to NVIDIA#980

- this code change aims at using common logic to create and update the application
  info inside Tools.
Copy link
Collaborator

@nartal1 nartal1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @amahussein ! I am half done with the review. Need to go over the callback functions and EventProcessor refactor.


// Returns a boolean true/false. This is used to check whether processing an eventlog was
// successful.
def isAppMetaDefined: Boolean = appMetaData.isDefined
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the appMetaData honestly seems a bit weird to me. I know we had ApplicationCase before, but if you are accessing things from AppBase should those just exist in the AppBase, we have this weird split of things where some are in both but we seem to wrap but we can still access appMetaData directly. I think its fine to leave for now but its confusing and we should better define the encapsulation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tgravescs

My thought about the AppMetadata that those fields were modified and updated in different places
Do you think an inner class would better? or just getting rid of the AppMetadata by encapsulating operations on fields inside AppBase methods?

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
Copy link
Collaborator

@nartal1 nartal1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @amahussein !

@amahussein amahussein merged commit 377103e into NVIDIA:dev May 2, 2024
15 checks passed
@amahussein amahussein deleted the spark-rapids-tools-980-appinfo branch May 2, 2024 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core_tools Scope the core module (scala) feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants