Find a way to ensure alignment of two interval tables. #37
Comments
Yeah, effectively it's about "re-indexing" the two tables to have the same "index", where the "index" is chrom, start, end. That means it is equivalent to a join which could be inner, outer, left or right (but without actually merging into one table). The most common case here would probably be outer. If we forget about the left and right options, it could also be cast as an n-ary operation:
Two ideas for implementation:
I think it would have to return the dataframes with a common (reset) index, and chrom, start and end values, determined by some kind of join on the intervals in the inputs, e.g.:
df1 = pd.DataFrame([
['chr1', 0, 1000, 'a'],
['chr1', 1000, 2000, 'b'],
], columns=['chrom', 'start', 'end', 'foo'])
df2 = pd.DataFrame([
['chr1', 0, 1000, 'c'],
['chr1', 1000, 2000, 'd'],
['chr1', 2000, 3000, 'e'],
], columns=['chrom', 'start', 'end', 'bar'])
df3 = pd.DataFrame([
['chr1', 0, 1000, 'f'],
['chr1', 1000, 2000, 'g'],
['chrX', 0, 1000, 'x'],
['chrX', 1000, 2000, 'y'],
], columns=['chrom', 'start', 'end', 'baz'])
>>> df1, df2, df3 = align_tables([df1, df2, df3], how='outer')
>>> df1
  chrom  start   end  foo
0  chr1      0  1000    a
1  chr1   1000  2000    b
2  chr1   2000  3000  NaN
3  chrX      0  1000  NaN
4  chrX   1000  2000  NaN
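One way the proposed call could behave is sketched below. This is only an illustration under stated assumptions, not an existing API: `align_tables` is the hypothetical name used in the example above, the `keys` parameter is added here for illustration, and only `how='outer'` and `how='inner'` are handled. It relies on plain pandas index operations (`set_index`, `union`/`intersection`, `reindex`).

```python
import pandas as pd

KEYS = ["chrom", "start", "end"]


def align_tables(dfs, how="outer", keys=KEYS):
    """Hypothetical sketch: reindex every table onto a shared set of intervals."""
    if how not in ("outer", "inner"):
        raise ValueError("this sketch only supports how='outer' or how='inner'")

    indexed = []
    for df in dfs:
        # reindex() needs unique keys, so reject duplicated intervals up front.
        if df.duplicated(subset=list(keys)).any():
            raise ValueError("interval keys must be unique within each table")
        indexed.append(df.set_index(list(keys)))

    # Union of the key sets for an outer join, intersection for an inner one.
    common = indexed[0].index
    for table in indexed[1:]:
        if how == "outer":
            common = common.union(table.index)
        else:
            common = common.intersection(table.index)
    common = common.sort_values()

    # Rows absent from a table are filled with NaN by reindex().
    return [table.reindex(common).reset_index() for table in indexed]
```

With the three example frames above, `align_tables([df1, df2, df3], how='outer')` should return three frames that share the same five (chrom, start, end) rows, with NaN wherever a table had no matching interval.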
So it seems that a requirement for alignability is that any interval in df1 can overlap 1 or 0 intervals in df2 (and vice versa). Anything else? Your example has uniform-width bins. Were there use cases where non-uniform bins would make sense? If not, this sounds like a function. Also, what was the use case where you'd want to return individual 'aligned' dfs, rather than a df with joined columns?
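The overlap requirement can be checked directly. The sketch below is a minimal illustration, assuming half-open `[start, end)` intervals and tables small enough for a per-chromosome pairwise comparison. The names `max_overlaps` and `is_alignable` are hypothetical helpers, not part of any library.

```python
import pandas as pd


def max_overlaps(df1, df2):
    """Largest number of df2 intervals overlapped by any single df1 interval."""
    left = df1[["chrom", "start", "end"]].copy()
    left["row"] = range(len(left))  # label each df1 interval
    right = df2[["chrom", "start", "end"]]

    # All same-chromosome pairs (quadratic per chromosome; fine for a sketch).
    pairs = left.merge(right, on="chrom", suffixes=("_1", "_2"))

    # Half-open intervals overlap when each one starts before the other ends.
    hits = pairs[(pairs["start_1"] < pairs["end_2"]) & (pairs["start_2"] < pairs["end_1"])]
    counts = hits.groupby("row").size()
    return int(counts.max()) if len(counts) else 0


def is_alignable(df1, df2):
    """True if every interval overlaps at most one interval in the other table."""
    return max_overlaps(df1, df2) <= 1 and max_overlaps(df2, df1) <= 1
```

Note that this only tests the "1 or 0 overlaps" condition. It does not require the overlapping intervals to have identical coordinates, which a join on (chrom, start, end) would additionally need.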
For this to work, the alignment multiindex ('chrom', 'start', 'end') + potentially others would have to be unique, i.e. no duplicates, so this would have to be checked first.
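The uniqueness check could look something like the following. It is a small sketch with a hypothetical helper name, `find_duplicate_keys`, built on `DataFrame.duplicated`. It returns the offending rows so that the caller can decide whether to raise or deduplicate.

```python
import pandas as pd


def find_duplicate_keys(df, keys=("chrom", "start", "end")):
    """Return every row whose key columns also appear in another row."""
    # keep=False marks all members of a duplicated group, not only the repeats.
    mask = df.duplicated(subset=list(keys), keep=False)
    return df[mask]
```

An empty result means the table is safe to use as an alignment index.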
We need a function to synchronize the indices of two tables with almost identical intervals. This is typically needed to enable safe transferring of columns between these tables. Alternatively, we can have a function that transfers columns between two tables of almost identical intervals.
@nvictus, is this a good summary of your request?
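For the alternative mentioned above, transferring columns directly, a left join on the interval keys may be enough. The sketch below assumes both tables use the same column names for the keys. `transfer_columns` is a hypothetical name. `validate="one_to_one"` asks pandas to raise if the keys are duplicated in either table.

```python
import pandas as pd

KEYS = ["chrom", "start", "end"]


def transfer_columns(target, source, columns, keys=KEYS):
    """Copy `columns` from `source` onto `target`, matching rows by interval."""
    return target.merge(
        source[list(keys) + list(columns)],
        on=list(keys),
        how="left",             # keep every target row, in its original order
        validate="one_to_one",  # raise MergeError on duplicated interval keys
    )
```

Intervals present in `target` but missing from `source` end up with NaN in the transferred columns.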