Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revert nio sam file merger #2333

Closed
wants to merge 3 commits into from
Closed

Conversation

lbergelson
Copy link
Member

@tomwhite @droazen This fixes the problem I was seeing, but it's insanely slow for some reason. A 30gb file on a heavy duty cluster was taking between 30-50 minutes depending on how I sharded it. Most of that time is spent on the merging step where the master concatenates the chunks.

I think the refactoring of the writing that I've done isn't a bad thing to have. I added some extra log statements that it much easier to understand what's going on when it's slow.

tomwhite and others added 3 commits January 10, 2017 17:08
it seems like the issues related to some sort of race condition with caching and deleting/renaming files in the gcs hdfs adaptor
changing the behavior so that temporary files are first written to a temporary directory and then concatenated into the final file
previously it wrote them to a directory with the same name as the final file, renamed that, and then copied them into the renamed file

improving exception message when failing to write a bam on spark
@droazen
Copy link
Contributor

droazen commented Jan 10, 2017

I think we should wait for the NIO-based fix rather than merging this, which would be useless for our performance work.

@tomwhite
Copy link
Contributor

@droazen I added a comment on #2287 about what's needed from the GCS NIO side (see also googleapis/google-cloud-java#1519).

@lbergelson
Copy link
Member Author

closing this then in favor of waiting for nio updates

@lbergelson lbergelson closed this Jan 11, 2017
@lbergelson lbergelson deleted the lb_revert_nio_sam_file_merger branch February 8, 2017 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants