Skip to content

Commit

Permalink
PartitionEvaluator et al.
Browse files Browse the repository at this point in the history
  • Loading branch information
jaceklaskowski committed Apr 16, 2024
1 parent d139926 commit 5b5f1e7
Show file tree
Hide file tree
Showing 6 changed files with 117 additions and 0 deletions.
28 changes: 28 additions & 0 deletions docs/PartitionEvaluator.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
---
tags:
- DeveloperApi
---

# PartitionEvaluator

`PartitionEvaluator[T, U]` is an [abstraction](#contract) of [partition evaluators](#implementations) that can [compute (_evaluate_) one or more RDD partitions](#eval).

## Contract

### Evaluate Partitions { #eval }

```scala
eval(
partitionIndex: Int,
inputs: Iterator[T]*): Iterator[U]
```

Used when:

* `MapPartitionsWithEvaluatorRDD` is requested to [compute a partition](rdd/MapPartitionsWithEvaluatorRDD.md#compute)
* `ZippedPartitionsWithEvaluatorRDD` is requested to [compute a partition](rdd/ZippedPartitionsWithEvaluatorRDD.md#compute)

## Implementations

!!! note
No built-in implementations available in Spark Core (but [Spark SQL]({{ book.spark_sql }})).
30 changes: 30 additions & 0 deletions docs/PartitionEvaluatorFactory.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
---
tags:
- DeveloperApi
---

# PartitionEvaluatorFactory

`PartitionEvaluatorFactory[T, U]` is an [abstraction](#contract) of [PartitionEvaluator factories](#implementations).

`PartitionEvaluatorFactory` is a `Serializable` ([Java]({{ java.api }}/java/io/Serializable.html)).

## Contract

### Creating PartitionEvaluator { #createEvaluator }

```scala
createEvaluator(): PartitionEvaluator[T, U]
```

Creates a [PartitionEvaluator](PartitionEvaluator.md)

Used when:

* `MapPartitionsWithEvaluatorRDD` is requested to [compute a partition](rdd/MapPartitionsWithEvaluatorRDD.md#compute)
* `ZippedPartitionsWithEvaluatorRDD` is requested to [compute a partition](rdd/ZippedPartitionsWithEvaluatorRDD.md#compute)

## Implementations

!!! note
No built-in implementations available in Spark Core (but [Spark SQL]({{ book.spark_sql }})).
33 changes: 33 additions & 0 deletions docs/rdd/MapPartitionsWithEvaluatorRDD.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# MapPartitionsWithEvaluatorRDD

`MapPartitionsWithEvaluatorRDD` is an [RDD](RDD.md).

## Creating Instance

`MapPartitionsWithEvaluatorRDD` takes the following to be created:

* <span id="prev"> Previous [RDD](RDD.md)
* <span id="evaluatorFactory"> [PartitionEvaluatorFactory](../PartitionEvaluatorFactory.md)

`MapPartitionsWithEvaluatorRDD` is created when:

* [RDD.mapPartitionsWithEvaluator](RDD.md#mapPartitionsWithEvaluator) operator is used
* [RDDBarrier.mapPartitionsWithEvaluator](../barrier-execution-mode/RDDBarrier.md#mapPartitionsWithEvaluator) operator is used

## Computing Partition { #compute }

??? note "RDD"

```scala
compute(
split: Partition,
context: TaskContext): Iterator[U]
```

`compute` is part of the [RDD](RDD.md#compute) abstraction.

`compute` requests the [PartitionEvaluatorFactory](#evaluatorFactory) to [create a PartitionEvaluator](../PartitionEvaluatorFactory.md#createEvaluator).

`compute` requests the [first parent RDD](RDD.md#firstParent) to [iterator](RDD.md#iterator).

In the end, `compute` requests the [PartitionEvaluator](../PartitionEvaluator.md) to [evaluate the partition](../PartitionEvaluator.md#eval).
19 changes: 19 additions & 0 deletions docs/rdd/RDD.md
Original file line number Diff line number Diff line change
Expand Up @@ -445,6 +445,25 @@ withScope[U](
!!! note
`withScope` is used for most (if not all) `RDD` API operators.

## mapPartitionsWithEvaluator { #mapPartitionsWithEvaluator }

```scala
mapPartitionsWithEvaluator[U: ClassTag](
evaluatorFactory: PartitionEvaluatorFactory[T, U]): RDD[U]
```

`mapPartitionsWithEvaluator` creates a [MapPartitionsWithEvaluatorRDD](MapPartitionsWithEvaluatorRDD.md) for this `RDD` and the given [PartitionEvaluatorFactory](../PartitionEvaluatorFactory.md).

## zipPartitionsWithEvaluator { #zipPartitionsWithEvaluator }

```scala
zipPartitionsWithEvaluator[U: ClassTag](
rdd2: RDD[T],
evaluatorFactory: PartitionEvaluatorFactory[T, U]): RDD[U]
```

`zipPartitionsWithEvaluator` creates a [ZippedPartitionsWithEvaluatorRDD](ZippedPartitionsWithEvaluatorRDD.md) for this `RDD` and the given `RDD` and the [PartitionEvaluatorFactory](../PartitionEvaluatorFactory.md).

<!---
## Review Me
Expand Down
3 changes: 3 additions & 0 deletions docs/rdd/ZippedPartitionsWithEvaluatorRDD.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# ZippedPartitionsWithEvaluatorRDD

`ZippedPartitionsWithEvaluatorRDD` is...FIXME
4 changes: 4 additions & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -416,6 +416,8 @@ nav:
- ExecutorDeadException: ExecutorDeadException.md
- HeartbeatReceiver: HeartbeatReceiver.md
- InterruptibleIterator: InterruptibleIterator.md
- PartitionEvaluatorFactory: PartitionEvaluatorFactory.md
- PartitionEvaluator: PartitionEvaluator.md
- Utils: Utils.md
- Spark Tips and Tricks:
- Spark Tips and Tricks: spark-tips-and-tricks.md
Expand Down Expand Up @@ -548,7 +550,9 @@ nav:
- RDD Checkpointing: rdd/checkpointing.md
- RDDCheckpointData: rdd/RDDCheckpointData.md
- LocalRDDCheckpointData: rdd/LocalRDDCheckpointData.md
- MapPartitionsWithEvaluatorRDD: rdd/MapPartitionsWithEvaluatorRDD.md
- ReliableRDDCheckpointData: rdd/ReliableRDDCheckpointData.md
- ZippedPartitionsWithEvaluatorRDD: rdd/ZippedPartitionsWithEvaluatorRDD.md
- Aggregator: rdd/Aggregator.md
- Demos:
- demo/index.md
Expand Down

0 comments on commit 5b5f1e7

Please sign in to comment.