Add checks that the number of matrix elements does not exceed max int in CreateReadCountPanelOfNormals. #4734

Closed
samuelklee opened this issue May 3, 2018 · 0 comments

samuelklee commented May 3, 2018

Ran into this when trying to create a PoN (panel of normals) with 100 samples x 100 bp bins, which gives about 1.27 * max int matrix elements.

This currently causes issues when truncating outliers, at which point all elements are loaded into an array so that Percentiles can be naively computed, resulting in a java.lang.NegativeArraySizeException.
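To illustrate the likely mechanism, here is a minimal sketch of how an element count of this size wraps to a negative value in `int` arithmetic and then fails on allocation. This is not the GATK code: the class and method names are hypothetical, and the bin count of 27,272,727 is an assumed value chosen only so that 100 samples give roughly 1.27 * max int elements.

```java
public class ElementCountOverflow {
    // Multiplies in int arithmetic, which silently wraps around on overflow.
    static int elementCountAsInt(int numSamples, int numBins) {
        return numSamples * numBins;
    }

    // Multiplies in long arithmetic, which is wide enough for this case.
    static long elementCountAsLong(int numSamples, int numBins) {
        return (long) numSamples * numBins;
    }

    // Returns true if allocating a double[] of the given size throws
    // NegativeArraySizeException. Only intended for small or negative sizes.
    static boolean allocationFails(int size) {
        try {
            double[] values = new double[size];
            return values.length != size; // never true; keeps the allocation live
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int numSamples = 100;
        int numBins = 27_272_727; // assumed bin count, for illustration only

        long trueCount = elementCountAsLong(numSamples, numBins);
        int wrappedCount = elementCountAsInt(numSamples, numBins);

        System.out.println("true element count:    " + trueCount);
        System.out.println("wrapped int count:     " + wrappedCount);
        System.out.println("ratio to max int:      " + (double) trueCount / Integer.MAX_VALUE);
        System.out.println("allocation fails:      " + allocationFails(wrappedCount));
    }
}
```

Even with a correct `long` count, a single Java array cannot hold more than `Integer.MAX_VALUE` elements, so flattening the whole matrix into one array is not possible at this size.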

Solutions include: 1) simply failing early with a clear error message if the counts matrix is too large (perhaps recommending scattering by contig, see #4728), 2) changing the outlier truncation procedure to be more robust.
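A rough sketch of what option 1 could look like, assuming the check runs before any flattening of the matrix. The class name, method names, and the use of `IllegalArgumentException` are assumptions for illustration; the actual tool would presumably use its own exception types and messages.

```java
public class MatrixSizeCheck {
    // Computes the element count in long arithmetic so it cannot wrap.
    static long numElements(int numSamples, int numIntervals) {
        if (numSamples < 0 || numIntervals < 0) {
            throw new IllegalArgumentException("Matrix dimensions must be non-negative.");
        }
        return (long) numSamples * numIntervals;
    }

    // True if the matrix could not be flattened into a single Java array.
    static boolean exceedsMaxArraySize(int numSamples, int numIntervals) {
        return numElements(numSamples, numIntervals) > Integer.MAX_VALUE;
    }

    // Fails early with a descriptive message instead of letting a later
    // allocation throw NegativeArraySizeException.
    static void validateMatrixSize(int numSamples, int numIntervals) {
        if (exceedsMaxArraySize(numSamples, numIntervals)) {
            throw new IllegalArgumentException(String.format(
                    "Counts matrix has %d elements (%d samples x %d intervals), " +
                    "which exceeds Integer.MAX_VALUE (%d). Consider scattering by contig " +
                    "or disabling outlier truncation.",
                    numElements(numSamples, numIntervals), numSamples, numIntervals,
                    Integer.MAX_VALUE));
        }
    }

    public static void main(String[] args) {
        validateMatrixSize(100, 1_000_000); // fine: 1e8 elements
        try {
            validateMatrixSize(100, 27_272_727); // assumed bin count, for illustration only
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same predicate could instead be used to skip the truncation step rather than fail, which keeps the rest of the run usable when the matrix is too large.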

I'm not sure how important outlier truncation is to the SVD, as it remains to be evaluated, but for now we should be able to get around this with no code changes by simply disabling it (i.e., setting the relevant truncation percentile to 0).

Note that file I/O takes about an hour for this case. Also note that this is probably on the extreme end of what we should expect to support on a single machine with all counts in memory, as the SVD is probably sufficiently good with 100 samples and 100 bp is on the order of the read length. #4728 will get around this and also make downstream tasks complete faster in parallel, at the very small expense of reducing a few global parameters to per-contig parameters in the modeling step.

@samuelklee samuelklee self-assigned this May 3, 2018
samuelklee added a commit that referenced this issue Feb 15, 2019
…xceeds Integer.MAX_VALUE in CreateReadCountPanelOfNormals. (#4734)
samuelklee added a commit that referenced this issue Mar 12, 2019
* Cleaned up intermediate files in gCNV WDL and fixed miscellaneous typos. (#5382)

* Added output of MAD values as floats in somatic CNV WDL. (#5591)

* Exposed boot disk space for Oncotator in somatic CNV WDL. (#3566)

* Added check to skip outlier truncation if number of matrix elements exceeds Integer.MAX_VALUE in CreateReadCountPanelOfNormals. (#4734)

* Miscellaneous boy scout activities.

* Fixed some issues concerning intervals in DetermineGermlineContigPloidy documentation.

* Fixed non-kebab-case argument in CollectAllelicCountsSpark and other minor issues.

* Improved consistency of style and input/output validation across CNV tools. (#4825)