ScoreMatrix incorrectly removes valid windows #195

Open
balwierz opened this issue Jan 27, 2021 · 2 comments

@balwierz

Example code:

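library(GenomicRanges)  # provides GRanges()
library(genomation)     # provides ScoreMatrix()

gr3 <- GRanges("a:4-6")   # target: a single "read"
gr4 <- GRanges("a:1-10")  # window that fully contains the read
ScoreMatrix(gr3, gr4)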

Expected result:
1x10 ScoreMatrix with data: 0 0 0 1 1 1 0 0 0 0

Observed result: crash

Error in constrainRanges(target, windows) : 
  All windows fell have coordinates outside windows boundaries

The problem arises from the line

win.list.chr = suppressWarnings(subsetByOverlaps(windows, 
        constraint, type = "within", ignore.strand = TRUE))

in constrainRanges, specifically from type = "within"
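
The effect of type = "within" can be sketched with GenomicRanges alone. This is a minimal illustration, not the genomation source: it assumes that `constraint` spans position 1 to the length of the target's coverage vector on each chromosome, which is not shown in the quoted line. The names `target`, `chr_len`, `within_hits` and `any_hits` are made up for the example.

```r
library(GenomicRanges)

target  <- GRanges("a:4-6")
windows <- GRanges("a:1-10")

# With no seqlengths set, the coverage vector for "a" stops at the
# last covered base, so its length is 6 rather than 10.
chr_len <- lengths(coverage(target))

# ASSUMPTION: the constraint is 1..coverage length per chromosome.
constraint <- GRanges(names(chr_len), IRanges(1, as.integer(chr_len)))

# "within" keeps a window only if it lies entirely inside the constraint.
within_hits <- subsetByOverlaps(windows, constraint,
                                type = "within", ignore.strand = TRUE)

# "any" keeps a window if it shares at least one base with the constraint.
any_hits <- subsetByOverlaps(windows, constraint,
                             type = "any", ignore.strand = TRUE)

length(within_hits)  # 0: window 1-10 is not inside 1-6
length(any_hits)     # 1: window 1-10 overlaps 1-6
```

Under that assumption, the window is dropped only because the coverage vector ends at the last read, not because the window misses the target.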

@katwre
Contributor

katwre commented Jan 27, 2021

Hi @balwierz,
OK, I see where your error comes from. If I remember correctly, @al2na and @frenkiboy implemented constrainRanges with the type="within" argument to exclude windows that fall outside the target object, and in your example dataset you have only one window, and it lies outside the ranges of the target object. So, if I am not completely mistaken, it is a feature, not a bug :) I remember having a few conversations with @al2na and @frenkiboy about it a few years ago, and I was told to just trim windows that fall outside the target ranges. Would that work for you? It looks like you expect 0s in place of missing values, but that might not work well with ScoreMatrixBin(), so NAs are the other option, which doesn't sound good either.
Hope it helps,
Kasia
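
One way to read "trim windows outside of target ranges" is to clip each window to the extent of the target's coverage vector. The sketch below does that with `restrict()` from GenomicRanges. It is only an illustration of the idea, not the procedure the maintainers had in mind, and `covered_end` and `trimmed` are hypothetical names.

```r
library(GenomicRanges)

target  <- GRanges("a:4-6")
windows <- GRanges("a:1-10")

# Last position present in the coverage vector for chromosome "a".
covered_end <- lengths(coverage(target))[["a"]]

# Clip the window so that it ends where the coverage vector ends.
trimmed <- restrict(windows, start = 1L, end = covered_end)

start(trimmed)  # 1
end(trimmed)    # 6
width(trimmed)  # 6, down from the original width of 10
```

Note that trimming changes the window width, so trimmed windows are no longer guaranteed to share one width.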

@balwierz
Author

In the example above, the window is not outside the target. There is a perfectly fitting read in the window. There could be cases like this all over the genome, and all of them would be included in the output except the "rightmost" one.

Compare this code:

> gr3 <- GRanges("a", IRanges(c(4, 14), c(6, 19)))
> gr4 <- GRanges("a", IRanges(c(1, 11), c(10, 20)))
> ScoreMatrix(gr3, gr4)@.Data
      [,1]
 [1,]    0
 [2,]    0
 [3,]    0
 [4,]    1
 [5,]    1
 [6,]    1
 [7,]    0
 [8,]    0
 [9,]    0
[10,]    0

The overlap between the first "read" gr3[1] and the first window gr4[1] is the same as in the previous example, but this time the region is included in the output. The second read and the second window are not included.
Chromosomal orientation (which strand is called "+" and which is called "-") is an arbitrary convention. If you were to swap the strands but otherwise keep the biological correspondence to the windows and map the same reads, you would get a different score matrix: gr3[2] overlapping gr4[2] would be kept, and the first read and window would be discarded. Thus the output of ScoreMatrix is not invariant under the chromosome orientation convention.
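
The dependence on the rightmost read can be reproduced with GenomicRanges alone. The sketch below assumes, as above, that the constraint is 1..length of the coverage vector per chromosome. It also shows that declaring a chromosome length on the reads extends the coverage vector to that length. The length of 20 is made up for the example, and whether ScoreMatrix itself picks up seqlengths this way is not verified here.

```r
library(GenomicRanges)

reads   <- GRanges("a", IRanges(c(4, 14), c(6, 19)))
windows <- GRanges("a", IRanges(c(1, 11), c(10, 20)))

# Helper (hypothetical name): constraint spanning 1..coverage length.
coverage_extent <- function(gr) {
  len <- lengths(coverage(gr))
  GRanges(names(len), IRanges(1, as.integer(len)))
}

# Implicit extent: the coverage vector stops at the last covered base (19),
# so the window 11-20 is not "within" it and is dropped.
kept_implicit <- subsetByOverlaps(windows, coverage_extent(reads),
                                  type = "within", ignore.strand = TRUE)

# Declared extent: with a chromosome length of 20 (made up for this example),
# the coverage vector has length 20 and both windows fit.
seqlengths(reads) <- c(a = 20)
kept_declared <- subsetByOverlaps(windows, coverage_extent(reads),
                                  type = "within", ignore.strand = TRUE)

start(kept_implicit)   # 1: only the first window survives
length(kept_declared)  # 2: both windows survive
```

With a declared length, which windows are kept depends on the chromosome length rather than on where the last read happens to end.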
