
Display Spark read metrics on Spark SQL UI #7447

Merged · 14 commits · Jul 20, 2023

Conversation

karuppayya (Contributor)

(Screenshot: Spark SQL UI showing the new Iceberg read metrics, Screen Shot 2023-04-27 at 6:42:38 AM)

seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
Map<String, SQLMetric> metrics = sparkPlans.get(0).metrics();

Assert.assertTrue(metrics.contains(TotalFileSize.name));
Contributor

Suggested change
Assert.assertTrue(metrics.contains(TotalFileSize.name));
Assertions.assertThat(metrics).contains(TotalFileSize.name);

Contributor

That way the content of the map will be shown in case the assertion ever fails.

public CustomTaskMetric[] reportDriverMetrics() {
List<CustomTaskMetric> customTaskMetrics = Lists.newArrayList();
MetricsReport metricsReport = sparkReadMetricReporter.getMetricsReport();
ScanReport scanReport = (ScanReport) metricsReport;
Contributor

Rather than casting here, I think it would be better to expose the required type of metrics report from the SparkReadMetricsReporter itself, because report(..) doesn't guarantee that you will only get a ScanReport.
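
A minimal sketch of a reporter that avoids the cast by only exposing the report when it really is a ScanReport (class and accessor names mirror the ones discussed in this thread; the class is later renamed to InMemoryMetricsReporter in this PR):

import org.apache.iceberg.metrics.MetricsReport;
import org.apache.iceberg.metrics.MetricsReporter;
import org.apache.iceberg.metrics.ScanReport;

public class SparkReadMetricsReporter implements MetricsReporter {

  private volatile MetricsReport metricsReport;

  @Override
  public void report(MetricsReport report) {
    // keep the report as-is; report(..) may also be called with non-scan reports
    this.metricsReport = report;
  }

  public ScanReport scanReport() {
    // expose the report only when it actually is a ScanReport
    return metricsReport instanceof ScanReport ? (ScanReport) metricsReport : null;
  }
}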


public class TotalPlanningDuration implements CustomMetric {

static final String name = "planningDuration";
Contributor

should this be totalPlanningDuration instead? Same in the description

}

public Optional<ScanReport> getScanReport() {
return Optional.ofNullable((ScanReport) metricsReport);
Contributor

Suggested change
return Optional.ofNullable((ScanReport) metricsReport);
return metricsReport instanceof ScanReport
? Optional.of((ScanReport) metricsReport)
: Optional.empty();

@@ -256,4 +275,56 @@ public String toString() {
runtimeFilterExpressions,
caseSensitive());
}

@Override
public CustomTaskMetric[] reportDriverMetrics() {
Contributor

Why implement this only in SparkBatchQueryScan? Can we do this in SparkScan?

Contributor Author

done

@@ -65,6 +82,7 @@ class SparkBatchQueryScan extends SparkPartitioningAwareScan<PartitionScanTask>
private final Long endSnapshotId;
private final Long asOfTimestamp;
private final String tag;
private final SparkReadMetricReporter sparkReadMetricReporter;
Contributor

What about using Supplier<ScanReport> scanReportSupplier to make this independent of a particular metrics reporter? We can use a metricsReporter::scanReport closure to construct it.

InMemoryMetricsReporter metricsReporter = new InMemoryMetricsReporter();
...
scan.metricsReporter(metricsReporter)
...
return new SparkBatchQueryScan(..., metricsReporter::scanReport);

Contributor Author

done

@Override
public CustomTaskMetric[] reportDriverMetrics() {
List<CustomTaskMetric> customTaskMetrics = Lists.newArrayList();
Optional<ScanReport> scanReportOptional = sparkReadMetricReporter.getScanReport();
Contributor

Iceberg historically uses null instead of Optional and I think we should continue to follow that. Also, Spotless makes these closures really hard to read.

What about something like this?

@Override
public CustomTaskMetric[] reportDriverMetrics() {
  ScanReport scanReport = scanReportSupplier.get();

  if (scanReport == null) {
    return new CustomTaskMetric[0];
  }

  List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
  driverMetrics.add(TaskTotalFileSize.from(scanReport));
  ...
  return driverMetrics.toArray(new CustomTaskMetric[0]);
}

Where TaskTotalFileSize would be defined as follows:

public class TaskTotalFileSize implements CustomTaskMetric {

  private final long value;

  public static TaskTotalFileSize from(ScanReport scanReport) {
    CounterResult counter = scanReport.scanMetrics().totalFileSizeInBytes();
    long value = counter != null ? counter.value() : -1;
    return new TaskTotalFileSize(value);
  }

  private TaskTotalFileSize(long value) {
    this.value = value;
  }

  @Override
  public String name() {
    return TotalFileSize.NAME;
  }

  @Override
  public long value() {
    return value;
  }
}

Contributor Author

done

}

@Override
public CustomMetric[] supportedCustomMetrics() {
Contributor

This overrides supportedCustomMetrics() in SparkScan and breaks it. Can we move this logic there and combine with existing metrics?

Contributor Author

done
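
Roughly, combining the metrics in SparkScan could look like the sketch below (assuming NumSplits and NumDeletes are the metrics SparkScan already reported before this change; the other metric classes are the ones introduced in this PR):

@Override
public CustomMetric[] supportedCustomMetrics() {
  return new CustomMetric[] {
    // metrics SparkScan already exposed before this change
    new NumSplits(),
    new NumDeletes(),
    // driver-side scan metrics introduced in this PR
    new TotalFileSize(),
    new TotalPlanningDuration(),
    new ScannedDataManifests(),
    new SkippedDataManifests(),
    new ScannedDataFiles(),
    new SkippedDataFiles()
  };
}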

@@ -420,12 +423,15 @@ private Scan buildBatchScan() {
private Scan buildBatchScan(Long snapshotId, Long asOfTimestamp, String branch, String tag) {
Schema expectedSchema = schemaWithMetadataColumns();

sparkReadMetricReporter = new SparkReadMetricReporter();
Contributor

Why init it here again? We should either use local vars or not init it here.

Contributor Author

removed


@Override
public String description() {
return "Result data files";
Contributor

We should follow what Spark does in other places. Specifically, its metric descriptions do not start with capital letters.

Contributor Author

done

import java.util.Arrays;
import org.apache.spark.sql.connector.metric.CustomMetric;

public class ScannedDataManifests implements CustomMetric {
Contributor

Comments above apply to all metrics below.

import org.apache.iceberg.metrics.MetricsReporter;
import org.apache.iceberg.metrics.ScanReport;

public class SparkReadMetricReporter implements MetricsReporter {
Contributor

Is there a better name? Shall we call it InMemoryMetricsReporter and move it to core?

this.metricsReport = report;
}

public Optional<ScanReport> getScanReport() {
Contributor

Let's not use Optional and getXXX prefix.

public ScanReport scanReport() {
  return (ScanReport) metricsReport;
}

We can check if the value is null later.


@Override
public ScanReport get() {
return (ScanReport) metricsReport;
Contributor

there's no guarantee that this will always return a ScanReport. There are also other types of reports

Contributor Author

Added code to check type

@aokolnychyi (Contributor) left a comment

This seems close. I did another round.


super(spark, table, scan, readConf, expectedSchema, filters);
List<Expression> filters,
Supplier<ScanReport> metricsReportSupplier) {
Contributor

What about calling it scanReportSupplier to be a bit more specific?

Contributor

It is actually called scanReportSupplier in other places, let's make it consistent.

List<Expression> filters) {
List<Expression> filters,
Supplier<ScanReport> metricsReportSupplier) {
this.metricsReportSupplier = metricsReportSupplier;
Contributor

Can we add this assignment as the last line in the constructors to follow the existing style?
Also, let's call it scanReportSupplier.

Table table,
SparkReadConf readConf,
Supplier<ScanReport> scanReportSupplier) {
super(spark, table, readConf, table.schema(), ImmutableList.of(), scanReportSupplier);
Contributor

Shall we simply pass null here and remove the supplier from the SparkStagedScan constructor? We know there would be no metrics report available as it is a staged scan.

@@ -39,6 +40,7 @@ class SparkStagedScanBuilder implements ScanBuilder {

@Override
public Scan build() {
return new SparkStagedScan(spark, table, readConf);
return new SparkStagedScan(
spark, table, readConf, (new InMemoryReadMetricReporter())::scanReport);
Contributor

This change would not be needed if we pass null to parent constructor in SparkStagedScan.
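
A minimal sketch of that simplification (constructor shape abridged from the diff above; this is illustrative, not the final code):

import org.apache.iceberg.Table;
import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
import org.apache.iceberg.spark.SparkReadConf;
import org.apache.spark.sql.SparkSession;

class SparkStagedScan extends SparkScan {

  SparkStagedScan(SparkSession spark, Table table, SparkReadConf readConf) {
    // a staged scan never produces a scan report, so pass null instead of a supplier
    super(spark, table, readConf, table.schema(), ImmutableList.of(), null /* scanReportSupplier */);
  }
}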

import java.util.Arrays;
import org.apache.spark.sql.connector.metric.CustomMetric;

public class TotalFileSize implements CustomMetric {
Contributor

I am not sure whether I already asked, can we extend CustomSumMetric and rely on built-in method for aggregating the result? It applies to all our CustomMetric implementations.
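
For reference, a sketch of a metric extending CustomSumMetric so the built-in sum aggregation can be reused (TotalFileSize is used as an example here; the metric name string and description are illustrative):

import org.apache.spark.sql.connector.metric.CustomSumMetric;

public class TotalFileSize extends CustomSumMetric {

  static final String NAME = "totalFileSize";

  @Override
  public String name() {
    return NAME;
  }

  @Override
  public String description() {
    // CustomSumMetric already provides aggregateTaskMetrics, so no override is needed
    return "total file size (bytes)";
  }
}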

}

List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
driverMetrics.add(TaskTotalFileSize.from(scanReport));
Contributor

Is there a reason why we don't include all metrics from ScanMetricsResult?

Contributor Author

I had added all metrics from ScanMetricsResult at the time this change was written.
Should we add the remaining ones now, or take them up in a different PR (since this change is already mostly reviewed)?

Contributor

Here is a list of the metrics:

TimerResult totalPlanningDuration(); // DONE
CounterResult resultDataFiles(); // DONE
CounterResult resultDeleteFiles(); // MISSING
CounterResult totalDataManifests(); // MISSING
CounterResult totalDeleteManifests(); // MISSING
CounterResult scannedDataManifests(); // DONE
CounterResult skippedDataManifests(); // DONE
CounterResult totalFileSizeInBytes(); // DONE
CounterResult totalDeleteFileSizeInBytes(); // MISSING
CounterResult skippedDataFiles(); // DONE
CounterResult skippedDeleteFiles(); // MISSING
CounterResult scannedDeleteManifests(); // MISSING
CounterResult skippedDeleteManifests(); // MISSING
CounterResult indexedDeleteFiles(); // MISSING
CounterResult equalityDeleteFiles(); // MISSING
CounterResult positionalDeleteFiles(); // MISSING

Contributor

Can be done in a follow-up.

@@ -96,6 +97,7 @@ public class SparkScanBuilder
private boolean caseSensitive;
private List<Expression> filterExpressions = null;
private Filter[] pushedFilters = NO_FILTERS;
private final InMemoryReadMetricReporter metricsReporter;
Contributor

Minor: This should go into the block with final variables above; you may init it in the definition (up to you).

private final SparkReadConf readConf;
private final List<String> metaColumns = Lists.newArrayList();
private final InMemoryMetricsReporter metricsReporter = new InMemoryMetricsReporter();

Contributor

I think this still applies.

}

@Test
public void testReadMetrics() throws NoSuchTableException {
Contributor

Can we add proper tests? I think the approach is correct, so we can add tests now.
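
A rough sketch of the kind of test being asked for, assuming it lives in a class extending SparkTestBaseWithCatalog (as the test file in this PR does) and that the expected values correspond to a single append on a small unpartitioned table:

@Test
public void testReadMetricsOnUnpartitionedTable() {
  sql("CREATE TABLE %s (id BIGINT) USING iceberg", tableName);
  sql("INSERT INTO %s VALUES (1), (2)", tableName);

  Dataset<Row> df = spark.sql(String.format("SELECT * FROM %s", tableName));
  df.collectAsList();

  List<SparkPlan> sparkPlans =
      JavaConverters.seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
  Map<String, SQLMetric> metricsMap =
      JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();

  // a single append should produce exactly one scanned data manifest and no skipped files
  Assertions.assertThat(metricsMap.get("scannedDataManifests").value()).isEqualTo(1);
  Assertions.assertThat(metricsMap.get("skippedDataFiles").value()).isEqualTo(0);
}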


@Override
public void report(MetricsReport report) {
this.metricsReport = (ScanReport) report;
Contributor

I think this cast needs to be removed

Contributor

+1, this seems to be a generic container allowing to intercept a report.

I'd also call it InMemoryMetricsReporter, there is nothing specific to reads right now.

Contributor

We also should use Metrics vs Metric in the name to match the interface.

Contributor

+1 to both suggestions


public static TaskScannedDataManifests from(ScanReport scanReport) {
CounterResult counter = scanReport.scanMetrics().scannedDataManifests();
long value = counter != null ? counter.value() : -1;
@nastra (Contributor), Jul 6, 2023

is -1 typical for Spark metrics to indicate that no data was available?

Contributor Author

I don't see any CustomTaskMetric implementation in Spark handling the no-data case.
Should we send 0 here?
(I think "N/A" might not be a good idea; it might convey that this scanned-data-manifests metric was not relevant. WDYT?)

@aokolnychyi (Contributor), Jul 13, 2023

We would probably need to check built-in Spark metrics as a reference. It will probably be 0 in other cases as Spark passes an empty array of task values and then does sum on it.

Contributor Author

Making it 0 based on this

public class TaskScannedDataManifests implements CustomTaskMetric {
private final long value;

public TaskScannedDataManifests(long value) {
@nastra (Contributor), Jul 6, 2023

This should be private, I would assume, because there's the from(ScanReport) factory method? The same applies to all the other constructors.
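
Putting both comments together (private constructor plus the 0 default agreed on above), the task metric could end up looking roughly like this, assuming a NAME constant on the corresponding CustomMetric class:

import org.apache.iceberg.metrics.CounterResult;
import org.apache.iceberg.metrics.ScanReport;
import org.apache.spark.sql.connector.metric.CustomTaskMetric;

public class TaskScannedDataManifests implements CustomTaskMetric {

  private final long value;

  // constructed only through the factory method below
  private TaskScannedDataManifests(long value) {
    this.value = value;
  }

  public static TaskScannedDataManifests from(ScanReport scanReport) {
    CounterResult counter = scanReport.scanMetrics().scannedDataManifests();
    // 0 when the counter is missing, matching how Spark sums an empty set of task values
    long value = counter != null ? counter.value() : 0;
    return new TaskScannedDataManifests(value);
  }

  @Override
  public String name() {
    return ScannedDataManifests.NAME;
  }

  @Override
  public long value() {
    return value;
  }
}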

seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
Map<String, SQLMetric> metricsMap =
JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
Assertions.assertEquals(1, metricsMap.get("scannedDataManifests").value());
Contributor

We're currently moving away from JUnit assertions to AssertJ; can you please update this to Assertions.assertThat(...).isEqualTo(1)?

Contributor

also would it make sense to add some more metric checks here?

Contributor Author

Changed it to use org.assertj.core.api.Assertions
Added more checks

@@ -74,9 +76,10 @@ abstract class SparkPartitioningAwareScan<T extends PartitionScanTask> extends S
Scan<?, ? extends ScanTask, ? extends ScanTaskGroup<?>> scan,
SparkReadConf readConf,
Schema expectedSchema,
List<Expression> filters) {
List<Expression> filters,
Supplier<ScanReport> metricsReportSupplier) {
Contributor

Minor: metricsReportSupplier -> scanReportSupplier

@@ -67,7 +84,8 @@ abstract class SparkScan implements Scan, SupportsReportStatistics {
Table table,
SparkReadConf readConf,
Schema expectedSchema,
List<Expression> filters) {
List<Expression> filters,
Supplier<ScanReport> metricsReportSupplier) {
Contributor

Minor: metricsReportSupplier -> scanReportSupplier


@Override
public String description() {
return "result data files";
Contributor

Would it be easier to understand if we show number of scanned data files in the UI?


@Override
public String description() {
return "num scanned data manifests";
Contributor

Is Spark using number of ... for output records? If so, can we match whatever Spark does?

Contributor Author

based on DataSourceScanExec, changing the description

import org.junit.jupiter.api.Assertions;
import scala.collection.JavaConverters;

public class TestSparkReadMetrics extends SparkTestBaseWithCatalog {
Contributor Author

Added tests for v1 and v2 tables.
Metadata tables (apart from the positional deletes table) don't update org.apache.iceberg.SnapshotScan#scanMetrics.
Since we don't have a delete manifest metric in this PR, skipping the test for the positional deletes metadata table.

@aokolnychyi (Contributor) left a comment

Final minor comments and should be good to go. We will need to add tests for CoW and MoR plans in a separate PR.

@@ -74,9 +76,10 @@ abstract class SparkPartitioningAwareScan<T extends PartitionScanTask> extends S
Scan<?, ? extends ScanTask, ? extends ScanTaskGroup<?>> scan,
SparkReadConf readConf,
Schema expectedSchema,
List<Expression> filters) {
List<Expression> filters,
Supplier<ScanReport> scanReportSupplier) {

Contributor

Same comment about the empty line here.

}

@Override
public String aggregateTaskMetrics(long[] taskMetrics) {
Contributor

This seems redundant as we extend CustomSumMetric?

Contributor Author

Added for some debugging, removed.


import org.apache.spark.sql.connector.metric.CustomSumMetric;

public class scannedDataFiles extends CustomSumMetric {
Contributor

Seems like a typo? Should start with a capital letter?


public static TaskTotalPlanningDuration from(ScanReport scanReport) {
TimerResult timerResult = scanReport.scanMetrics().totalPlanningDuration();
long value = timerResult != null ? timerResult.count() : -1;
Contributor

Shouldn't this be totalDuration().toMillis()? Shall we also use 0 as the default value?

Contributor Author

Keeping the default at -1, based on the Spark default.

Contributor

I think it would be helpful to add a comment on why this particular default value was chosen, with a reference to the Spark default (here and at the other place).
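
A sketch of how that could look with the default documented inline (class names follow the ones used in this PR; the NAME constant is assumed to exist on the corresponding CustomMetric class):

import org.apache.iceberg.metrics.ScanReport;
import org.apache.iceberg.metrics.TimerResult;
import org.apache.spark.sql.connector.metric.CustomTaskMetric;

public class TaskTotalPlanningDuration implements CustomTaskMetric {

  private final long value;

  private TaskTotalPlanningDuration(long value) {
    this.value = value;
  }

  public static TaskTotalPlanningDuration from(ScanReport scanReport) {
    TimerResult timer = scanReport.scanMetrics().totalPlanningDuration();
    // report the planning time in milliseconds; -1 mirrors Spark's convention
    // for metrics that have no value available
    long value = timer != null ? timer.totalDuration().toMillis() : -1;
    return new TaskTotalPlanningDuration(value);
  }

  @Override
  public String name() {
    return TotalPlanningDuration.NAME;
  }

  @Override
  public long value() {
    return value;
  }
}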


@Override
public String description() {
return "total planning duration";
Contributor

What about adding info on what values we show?

total planning duration (ms)


@Override
public String description() {
return "total file size";
Contributor

What about total file size (bytes)?

JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
Assertions.assertThat(metricsMap.get("skippedDataFiles").value()).isEqualTo(1);
Assertions.assertThat(metricsMap.get("scannedDataManifests").value()).isEqualTo(2);
Assertions.assertThat(metricsMap.get("resultDataFiles").value()).isEqualTo(1);
Contributor

I guess some of these were renamed.

@karuppayya (Contributor Author)

The test failures don't seem to be related.

public ScanReport scanReport() {
Preconditions.checkArgument(
metricsReport == null || metricsReport instanceof ScanReport,
"Metric report is not a scan report");
Contributor

Suggested change
"Metric report is not a scan report");
"Metrics report is not a scan report");

JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
Assertions.assertThat(metricsMap.get("skippedDataFiles").value()).isEqualTo(1);
Assertions.assertThat(metricsMap.get("scannedDataManifests").value()).isEqualTo(2);
Assertions.assertThat(metricsMap.get("scannedDataFiles").value()).isEqualTo(1);
Contributor

can you add checks for all the other metrics here as well (even if they are 0)?
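
For example, the remaining metrics could be asserted along these lines (metric names as discussed above; the expected values depend on the table layout and are illustrative only):

Assertions.assertThat(metricsMap.get("totalFileSize").value()).isGreaterThan(0);
Assertions.assertThat(metricsMap.get("totalPlanningDuration").value()).isGreaterThanOrEqualTo(0);
Assertions.assertThat(metricsMap.get("skippedDataManifests").value()).isEqualTo(0);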

JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
Assertions.assertThat(metricsMap.get("skippedDataFiles").value()).isEqualTo(1);
Assertions.assertThat(metricsMap.get("scannedDataManifests").value()).isEqualTo(2);
Assertions.assertThat(metricsMap.get("scannedDataFiles").value()).isEqualTo(1);
Contributor

same as above

@nastra (Contributor) commented on Jul 20, 2023

This LGTM once CI passes, @karuppayya could you rebase onto latest master please?

@aokolnychyi (Contributor) left a comment

LGTM. @karuppayya, could you rebase to fix CI?

Can we also follow up to add missing metrics discussed here and add tests for row-level operations?

@@ -449,8 +453,14 @@ private Scan buildBatchScan(Long snapshotId, Long asOfTimestamp, String branch,
}

scan = configureSplitPlanning(scan);

Contributor

Can we keep this empty line?

@aokolnychyi merged commit 8e89f6b into apache:master on Jul 20, 2023
@aokolnychyi (Contributor)

It is awesome to get this done, @karuppayya! Thanks for reviewing, @nastra!

@puchengy (Contributor)

@karuppayya Thanks for the contributions. Would you mind porting these to lower Spark versions? (I'm particularly interested in Spark 3.2.) Thanks.

@karuppayya (Contributor Author)

@puchengy This cannot be ported as it depends on Spark changes to support driver metrics, which are available only from Spark 3.4.

@puchengy (Contributor)

@karuppayya Thanks for sharing! It seems like too much work to first backport the Spark changes to Spark 3.2, make a release, and then backport this change to the Spark 3.2 module in the Iceberg repo.

@frankliee (Contributor)

Thanks for this PR. Could I backport this to Spark 3.3, @karuppayya?
