Adds iterator and client unit tests, and prepares for more fetch failure handling #331

Merged
merged 5 commits into from
Jul 10, 2020

Conversation

@abellina (Collaborator) commented Jul 8, 2020

This is an initial PR that introduces RapidsShuffleFetchFailedException, along with some usage of it, to try to address some of the issues raised in #326.

It also adds initial unit tests for the client side.

Note that the write side has a change squeezed in there. When a task fails, it calls the writer with stop(false) to signal a failure during its processing. Prior to this PR, we would shut down the storage altogether in that case, which would cause all peers of this executor to fail in the future, given that the executor's JVM is now reused. This change tracks these "uncommitted" ShuffleBufferIds at the writer level. @jlowe FYI, happy to split that off or handle it differently, but adding it here for discussion.
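
For discussion, here is a rough sketch of the idea of tracking uncommitted buffer ids per writer, so that a failed task cleans up only its own buffers rather than shutting down the shared storage. This is not the code in this PR: `BufferId`, `BufferStore`, and `SketchWriter` are hypothetical stand-ins, and removing the tracked ids on `stop(false)` is my assumption about how the tracking would be used. The real `stop` in Spark's `ShuffleWriter` also returns an `Option[MapStatus]`, which is left out here.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a shuffle buffer id; not the plugin's real class.
final case class BufferId(shuffleId: Int, mapId: Long, partitionId: Int)

// Hypothetical stand-in for the executor-wide storage shared by all tasks.
class BufferStore {
  private val buffers = mutable.Set.empty[BufferId]
  def add(id: BufferId): Unit = synchronized { buffers += id }
  def remove(id: BufferId): Unit = synchronized { buffers -= id }
  def contains(id: BufferId): Boolean = synchronized { buffers.contains(id) }
  def size: Int = synchronized { buffers.size }
}

// A writer that remembers which buffers it registered but has not committed.
class SketchWriter(store: BufferStore, shuffleId: Int, mapId: Long) {
  private val uncommitted = mutable.ArrayBuffer.empty[BufferId]

  def write(partitionId: Int): Unit = {
    val id = BufferId(shuffleId, mapId, partitionId)
    store.add(id)
    uncommitted += id
  }

  // On failure, drop only this writer's buffers; the store itself stays up
  // so other tasks in the same JVM can keep serving their peers.
  def stop(success: Boolean): Unit = {
    if (!success) {
      uncommitted.foreach(id => store.remove(id))
    }
    uncommitted.clear()
  }
}
```

With this shape, a writer that ends with `stop(true)` leaves its buffers in the store, while one that ends with `stop(false)` removes only the ids it registered itself.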

@abellina abellina added the shuffle things that impact the shuffle plugin label Jul 8, 2020
@abellina abellina changed the title from "Adds iterator and client unit tests, and prepares the read side for" to "Adds iterator and client unit tests, and prepares for more fetch failure handling" Jul 8, 2020
@abellina (Collaborator, Author) commented Jul 9, 2020

build

@revans2 revans2 merged commit 33b40fd into NVIDIA:branch-0.2 Jul 10, 2020
@abellina abellina deleted the shuffle_tests_1 branch July 10, 2020 21:34
@sameerz sameerz added this to the Jul 6 - Jul 17 milestone Jul 17, 2020
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
…ure handling (NVIDIA#331)

* Adds iterator and client unit tests, and prepares the read side for
fetch failure handling.

* Remove synchronization that was added by mistake

* Changes to address code review comments

* Add a comment stating the thread safety expectations of removeBuffer

* Add a comment on mapId vs mapIndex
pxLi pushed a commit to pxLi/spark-rapids that referenced this pull request May 12, 2022
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…ilding framework (NVIDIA#331)

* Added nvbench support to repo

* Added Google Test framework to repo

* Updating test framework to run in docker container

Signed-off-by: Mike Wilson <[email protected]>