- How did changing values on the SparkSession property parameters affect the throughput and latency of the data?
processedRowsPerSecond
is affected.
- What were the 2-3 most efficient SparkSession property key/value pairs? Through testing multiple variations on values, how can you tell these were the most optimal?
-
spark.default.parallelism
-
spark.streaming.kafka.maxRatePerPartition
Based on the processed rows per second.