[BUG] Missing Metrics in Photon Event Logs Affecting QualX Predictions #1388

Closed
Tracked by #251
parthosa opened this issue Oct 18, 2024 · 0 comments · Fixed by #1390
Labels: bug, core_tools


Describe the bug

Photon event logs do not store certain metrics, such as scan time, shuffle write time, and peak execution memory, in the same format as CPU Spark event logs. QualX uses these metrics as input features for its predictions.

Missing Metrics/Features

| Feature | Type |
| --- | --- |
| scan_time | Spark Metric |
| sw_writeTime_mean | Spark Metric |
| peakExecutionMemory_max | Spark Metric |
| sqlOp_SubqueryBroadcast | Exec |
| sqlOp_RunningWindowFunction | Exec |
| sqlOp_Expand | Exec |

Solution

After investigation, we found alternative ways to calculate some of these metrics:

  1. PhotonScan nodes provide a cumulative time metric that can be used as a replacement for the scan time metric.
  2. Shuffle write time can be reconstructed from the following metrics:
    1. time taken waiting on file write IO (part of shuffle file write)
    2. time taken to sort rows by partition ID (part of shuffle file write)
    3. time taken to convert columns to rows (part of shuffle file write)
  3. Photon nodes provide a peak memory usage metric, which can be used for the peak execution memory metric.
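
The three substitutions above could be sketched roughly as follows. This is only an illustration, not the change made in the fix: the metric name strings are taken from this issue and may not match the event log exactly, the helper names and the per-node `dict` of metric values are hypothetical, and summing the three shuffle components (all assumed to be in the same time unit) is an assumption about how the write time would be reconstructed.

```python
from typing import Dict, Optional

# Metric names as quoted in this issue; the exact strings found in a real
# Photon event log are an assumption.
SHUFFLE_WRITE_COMPONENTS = (
    "time taken waiting on file write IO (part of shuffle file write)",
    "time taken to sort rows by partition ID (part of shuffle file write)",
    "time taken to convert columns to rows (part of shuffle file write)",
)


def photon_shuffle_write_time(metrics: Dict[str, int]) -> Optional[int]:
    """Sum whichever shuffle-write components are present; None if none are.

    Assumes all components are reported in the same time unit.
    """
    present = [metrics[name] for name in SHUFFLE_WRITE_COMPONENTS if name in metrics]
    return sum(present) if present else None


def photon_substitutes(node_name: str, metrics: Dict[str, int]) -> Dict[str, int]:
    """Map one Photon node's metrics onto CPU-style metric names (hypothetical mapping)."""
    out: Dict[str, int] = {}
    # 1. PhotonScan cumulative time stands in for scan time.
    if node_name.startswith("PhotonScan") and "cumulative time" in metrics:
        out["scan time"] = metrics["cumulative time"]
    # 2. Shuffle write time is rebuilt from its component timings.
    write_time = photon_shuffle_write_time(metrics)
    if write_time is not None:
        out["shuffle write time"] = write_time
    # 3. Peak memory usage stands in for peak execution memory.
    if "peak memory usage" in metrics:
        out["peak execution memory"] = metrics["peak memory usage"]
    return out
```

Returning `None` when no shuffle component is present keeps "metric absent" distinct from a genuine zero, so the feature is not silently filled with 0.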

cc: @amahussein @leewyang
