Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21236] Make the threshold of using HighlyCompressedStatus configurable. #18446

Closed
wants to merge 2 commits into from

Conversation

jinxing64
Copy link

What changes were proposed in this pull request?

Currently the threshold of using HighlyCompressedMapStatus is hardcoded 2000.
We could make this configurable. Thus users having enough memory on driver can configure the threshold to be larger thus to save the size of blocks more accurately in CompressedMapStatus.

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change LGTM, should we also add a new test case in MapStatusSuite

@jinxing64
Copy link
Author

Sure, thanks for review :)

@SparkQA
Copy link

SparkQA commented Jun 28, 2017

Test build #78761 has finished for PR 18446 at commit e219690.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jinxing64
Copy link
Author

Jenkins, retest this please.

@srowen
Copy link
Member

srowen commented Jun 28, 2017

It's probably OK, but what's the use case for configuring it? when would a caller know to set it higher or lower? just trying to figure out if this is a meaningful knob.

@jiangxb1987
Copy link
Contributor

I guess the size of blocks are not very accurately stored in HighlyCompressedMapStatus, so if we have enough memory in the driver, we prefer to store them in CompressedMapStatus .

@jinxing64
Copy link
Author

Yes, this is discussed ever in #16989 . Only average size of blocks are stored in HighlyCompressedMapStatus. If driver has enough memory, we can store more accurately in CompressedMapStatus. This is pretty useful for example controlling shuffle blocks.

@SparkQA
Copy link

SparkQA commented Jun 28, 2017

Test build #78768 has finished for PR 18446 at commit 7b78df5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 28, 2017

Test build #78774 has finished for PR 18446 at commit 7b78df5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jinxing64
Copy link
Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Jun 28, 2017

Test build #78790 has finished for PR 18446 at commit 7b78df5.

  • This patch fails due to an unknown error code, -10.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jinxing64
Copy link
Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Jun 28, 2017

Test build #78801 has finished for PR 18446 at commit 7b78df5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

is this still useful after we have #18031 ?

I think users can just set spark.shuffle.accurateBlockThreshold to a lower value if they want to esitimate shuffle block size more accurate.

@jinxing64
Copy link
Author

True. I just try to make it more complete and refine the hardcode.
spark.shuffle.accurateBlockThreshold is useful, but sometimes it's hard for user to tune it("how low is enough? and at the same time I want to control the size of MapStatus within a fixed bound). This is corner case anyway.
I can close this pr if you think it too trivial.

@cloud-fan
Copy link
Contributor

every new config comes with a cost that users have to learn about it. For this case users already have a config(spark.shuffle.accurateBlockThreshold) to tune it, and this config is also hard to tune IMO.

Let's close it first. If we get a real use case that needs this config, we can reopen.

@jinxing64
Copy link
Author

Sure :)

@jinxing64 jinxing64 closed this Jun 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants