Add checks that the number of matrix elements does not exceed max int in CreateReadCountPanelOfNormals. #4734
samuelklee added a commit that referenced this issue on Feb 15, 2019: …xceeds Integer.MAX_VALUE in CreateReadCountPanelOfNormals. (#4734)
samuelklee added a commit that referenced this issue on Mar 12, 2019:
* Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typos. (#5382)
* Added output of MAD values as floats in somatic CNV WDL. (#5591)
* Exposed boot disk space for Oncotator in somatic CNV WDL. (#3566)
* Added check to skip outlier truncation if number of matrix elements exceeds Integer.MAX_VALUE in CreateReadCountPanelOfNormals. (#4734)
* Miscellaneous boy scout activities.
* Fixed some issues concerning intervals in DetermineGermlineContigPloidy documentation.
* Fixed non-kebab-case argument in CollectAllelicCountsSpark and other minor issues.
* Improved consistency of style and input/output validation across CNV tools. (#4825)
I ran into this when trying to create a PoN with 100 samples × 100-bp bins, which gives a counts matrix with about 1.27 × max int elements.
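To make the overflow concrete, here is a minimal Java sketch of the arithmetic. The bin count (27,273,042) is hypothetical, picked only to reproduce the ~1.27 × max int figure above, and the class and method names are illustrative rather than taken from GATK.

```java
/**
 * Illustrative only: shows why a 100-sample counts matrix with ~27.3M bins
 * cannot be flattened into a single Java array.
 */
final class ElementCountOverflow {
    /** Element count computed in 64-bit arithmetic (no overflow). */
    static long exactElementCount(int numSamples, int numIntervals) {
        return (long) numSamples * numIntervals;
    }

    /** Element count computed in 32-bit int arithmetic, which silently wraps on overflow. */
    static int wrappedElementCount(int numSamples, int numIntervals) {
        return numSamples * numIntervals;
    }

    public static void main(String[] args) {
        int numSamples = 100;
        int numIntervals = 27_273_042; // hypothetical bin count, ~1.27 * Integer.MAX_VALUE / 100

        long exact = exactElementCount(numSamples, numIntervals);
        int wrapped = wrappedElementCount(numSamples, numIntervals);
        System.out.println("exact element count: " + exact);
        System.out.println("ratio to max int:    " + (double) exact / Integer.MAX_VALUE);
        System.out.println("int element count:   " + wrapped); // negative after wrapping

        try {
            double[] flattened = new double[wrapped]; // what a naive flattening would attempt
            System.out.println("allocated " + flattened.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("caught " + e);
        }
    }
}
```

Any count between 2^31 and 2^32 - 1 wraps to a negative int, which is exactly the range a 1.27 × max int matrix falls into.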
This currently causes problems when truncating outliers, at which point all elements are loaded into a single array so that percentiles can be computed naively; since the int-valued element count overflows, this results in a java.lang.NegativeArraySizeException.
Solutions include:
1) simply throwing an informative error and failing early if the counts matrix is too large (perhaps recommending scattering by contig; see #4728);
2) changing the outlier truncation procedure to be more robust.
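A minimal sketch of the first option, assuming the check runs before anything is flattened. The class name, method names, and message text are hypothetical, not GATK's actual implementation; the product is computed in 64-bit arithmetic so the check itself cannot overflow.

```java
/**
 * Hypothetical sketch: validate the counts-matrix size up front and fail early
 * with an informative message instead of hitting NegativeArraySizeException later.
 */
final class MatrixSizeCheck {
    private MatrixSizeCheck() {}

    /** Returns true if a numRows x numColumns matrix fits in one int-indexed Java array. */
    static boolean fitsInSingleArray(int numRows, int numColumns) {
        if (numRows < 0 || numColumns < 0) {
            throw new IllegalArgumentException("Dimensions must be non-negative.");
        }
        // Widen to long before multiplying so the product cannot wrap.
        return (long) numRows * numColumns <= Integer.MAX_VALUE;
    }

    /** Throws with a suggested workaround if the matrix is too large. */
    static void validateOrFail(int numSamples, int numIntervals) {
        if (!fitsInSingleArray(numSamples, numIntervals)) {
            throw new IllegalArgumentException(String.format(
                    "Counts matrix has %d x %d = %d elements, which exceeds Integer.MAX_VALUE (%d). "
                            + "Consider scattering by contig (see #4728) or disabling outlier truncation.",
                    numSamples, numIntervals, (long) numSamples * numIntervals, Integer.MAX_VALUE));
        }
    }
}
```

Some JVMs cap array length slightly below Integer.MAX_VALUE, so a somewhat stricter bound may be safer in practice.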
I'm not sure how important outlier truncation is to the SVD, which remains to be evaluated, but for now we should be able to get around this with no code changes by simply disabling it (i.e., setting the relevant truncation percentile to 0).
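For the second option, one possibility (a swapped-in technique, not necessarily what we'd adopt) is to avoid the single flattened array entirely: keep the counts row by row and find each percentile approximately by bisection on the value, counting elements with a long counter. This sketch assumes truncation clamps values below the p-th percentile and above the (100 - p)-th percentile, that values are finite, and that p = 0 disables truncation as in the workaround above; all names are hypothetical.

```java
/**
 * Hypothetical sketch: percentile-based outlier truncation over a row-major double[][]
 * without ever building one array of all elements. Percentiles are approximated to
 * within a tolerance by bisection on the value, not by sorting.
 */
final class ChunkedOutlierTruncation {
    private ChunkedOutlierTruncation() {}

    /** Counts elements <= threshold across all rows, using a long so the count cannot overflow. */
    static long countAtOrBelow(double[][] rows, double threshold) {
        long count = 0;
        for (double[] row : rows) {
            for (double v : row) {
                if (v <= threshold) count++;
            }
        }
        return count;
    }

    /** Approximates the smallest value x with at least a fraction q of elements <= x, to within tolerance. */
    static double approximateQuantile(double[][] rows, double q, double tolerance) {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        long total = 0;
        for (double[] row : rows) {
            for (double v : row) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            total += row.length;
        }
        long target = (long) Math.ceil(q * total);
        // Invariant: countAtOrBelow(rows, hi) >= target; the iteration cap guards against stalls.
        for (int iter = 0; iter < 200 && hi - lo > tolerance; iter++) {
            double mid = lo + (hi - lo) / 2;
            if (countAtOrBelow(rows, mid) >= target) hi = mid; else lo = mid;
        }
        return hi;
    }

    /** Clamps values in place; percentile must be in [0, 50), and 0 disables truncation. */
    static void truncate(double[][] rows, double percentile, double tolerance) {
        if (percentile < 0 || percentile >= 50) throw new IllegalArgumentException("percentile must be in [0, 50).");
        if (tolerance <= 0) throw new IllegalArgumentException("tolerance must be positive.");
        if (percentile == 0) return; // truncation disabled
        double low = approximateQuantile(rows, percentile / 100, tolerance);
        double high = approximateQuantile(rows, 1 - percentile / 100, tolerance);
        for (double[] row : rows) {
            for (int j = 0; j < row.length; j++) {
                row[j] = Math.min(Math.max(row[j], low), high);
            }
        }
    }
}
```

The trade-off is time for memory: every bisection step is a full pass over the matrix, so the number of passes grows with log2(range / tolerance), but no array larger than one existing row is ever needed.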
Note that file I/O alone takes about an hour for this case. This is also probably at the extreme end of what we should expect to support on a single machine with all counts held in memory: the SVD is probably already good enough with 100 samples, and 100 bp is on the order of the read length. Scattering by contig (#4728) will get around this and also let downstream tasks complete faster in parallel, at the very small cost of turning a few global parameters into per-contig parameters in the modeling step.
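Scattering by contig can be sketched as grouping the bins by contig so that each shard's matrix is only numSamples × (bins on that contig). The Interval class and method names below are hypothetical stand-ins, not the GATK API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: split bins into per-contig shards and size each shard's counts matrix. */
final class ContigScatter {
    private ContigScatter() {}

    /** Minimal stand-in for a genomic interval (1-based, closed). */
    static final class Interval {
        final String contig;
        final int start;
        final int end;

        Interval(String contig, int start, int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }
    }

    /** Groups intervals by contig, preserving first-seen contig order. */
    static Map<String, List<Interval>> scatterByContig(List<Interval> intervals) {
        Map<String, List<Interval>> shards = new LinkedHashMap<>();
        for (Interval interval : intervals) {
            shards.computeIfAbsent(interval.contig, c -> new ArrayList<>()).add(interval);
        }
        return shards;
    }

    /** Returns the number of counts-matrix elements per shard, computed in 64-bit arithmetic. */
    static Map<String, Long> elementsPerShard(Map<String, List<Interval>> shards, int numSamples) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<String, List<Interval>> entry : shards.entrySet()) {
            counts.put(entry.getKey(), (long) numSamples * entry.getValue().size());
        }
        return counts;
    }
}
```

For example, GRCh38 chr1 (about 249 Mb) gives roughly 2.5 million 100-bp bins, or about 2.5 × 10^8 elements for 100 samples, which is less than an eighth of Integer.MAX_VALUE.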