Loading pairs with # in read IDs #192

Closed
Phlya opened this issue Apr 13, 2020 · 6 comments

Member

Phlya commented Apr 13, 2020

Following pandas' handling of comments, `cooler cload pairs` truncates every line at `#`, which makes pairs files with `#` in read names unloadable.

I'm not sure what can be done in the general case, but I don't think we really expect comments in the body of a pairs file, so perhaps the comment character could be unset for the rest of the file once the header has been read?
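
To illustrate the truncation, here is a minimal sketch that reproduces it with `pandas.read_csv` directly rather than through cooler. The column names and the two sample rows are made up for the example; only the `comment="#"` argument mirrors what is described above.

```python
import io

import pandas as pd

# Hypothetical pairs-like input: one header line and two body rows.
# The second body row has a '#' inside its read ID.
text = (
    "## pairs format v1.0\n"
    "readB\tchr1\t150\tchr3\t300\t-\t+\n"
    "readA#1\tchr1\t100\tchr2\t200\t+\t-\n"
)
# Illustrative column names, not necessarily the ones cooler uses.
names = ["read_id", "chrom1", "pos1", "chrom2", "pos2", "strand1", "strand2"]

# With comment="#", the header line is skipped entirely, but everything
# after the '#' in the second body row is dropped as well.
df = pd.read_csv(io.StringIO(text), sep="\t", names=names, comment="#")

# The row for "readA#1" keeps only "readA"; its other columns are missing.
print(df[["read_id", "chrom1", "pos1"]])
```

The row is not rejected outright; it is silently cut short, so the failure only shows up later when the chromosome and position columns turn out to be empty.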

Or open2c/pairtools#82 needs to be finalized and merged with some default transform that would prevent this?

I didn't even know the files I am using had this problem until trying to load them (and didn't know this was a limitation).

Member Author

Phlya commented Apr 14, 2020

OK, following what we discussed at the meeting, I propose to replace

https://github.com/mirnylab/cooler/blob/8c515d043daaf7bd1d24977e4a966de9b0da5978/cooler/cli/cload.py#L485-L488

with pairtools._fileio.auto_open()

https://github.com/mirnylab/pairtools/blob/775afe656aa1b2ed81882a92573de23f0e6dc33a/pairtools/_fileio.py#L8

like this, accounting for the `-` convention for stdin

https://github.com/mirnylab/pairtools/blob/775afe656aa1b2ed81882a92573de23f0e6dc33a/pairtools/pairtools_dedup.py#L180-L183

passed into

pairtools._headerops.get_header(f_in)[1]

https://github.com/mirnylab/pairtools/blob/775afe656aa1b2ed81882a92573de23f0e6dc33a/pairtools/_headerops.py#L13
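
For readers who do not follow the permalinks, the `-` convention for stdin can be sketched roughly as below. `open_input` is a hypothetical stand-in written for this example, not the pairtools `auto_open` function, and it ignores compressed inputs.

```python
import sys


def open_input(path, mode="r"):
    """Return sys.stdin when path is '-', otherwise open the named file.

    Hypothetical helper for illustration only; it does not handle
    compressed files.
    """
    if path == "-":
        return sys.stdin
    return open(path, mode)
```

The caller then treats the returned object as an ordinary text stream, whichever source it came from.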

Then this stream, with the header already consumed, can be parsed by pandas.read_csv as it is now, but without the comment argument (which defaults to None).
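
A minimal sketch of that idea, assuming a seekable text stream: `split_header` below is a hypothetical helper that plays the role of `get_header` for this example and is not the pairtools implementation. A real stdin pipe is not seekable and would need a different way of handing back the first body line. The column names and sample data are also made up.

```python
import io

import pandas as pd


def split_header(stream, comment_char="#"):
    """Consume leading comment lines and leave the stream at the body.

    Hypothetical helper; relies on tell()/seek(), so it only works on
    seekable streams.
    """
    header = []
    while True:
        pos = stream.tell()
        line = stream.readline()
        # An empty string (EOF) or a non-comment line ends the header.
        if not line.startswith(comment_char):
            stream.seek(pos)  # rewind so the body line is not lost
            break
        header.append(line.rstrip("\n"))
    return header, stream


text = (
    "## pairs format v1.0\n"
    "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\n"
    "readA#1\tchr1\t100\tchr2\t200\t+\t-\n"
)
names = ["read_id", "chrom1", "pos1", "chrom2", "pos2", "strand1", "strand2"]

header, body = split_header(io.StringIO(text))

# No comment argument here, so '#' in the body is treated as ordinary data.
df = pd.read_csv(body, sep="\t", names=names)
print(header)
print(df)
```

Because the header is removed before pandas sees the stream, the read ID keeps its `#` and the remaining columns are parsed normally.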

Shall I implement this? @nvictus

(Turns out permalinks to lines in other repos are not rendered as nice previews!)

Member

nvictus commented Apr 14, 2020

Actually, check out get_handle in pandas: https://github.com/pandas-dev/pandas/blob/master/pandas/io/common.py#L329

It should handle file paths the same way read_csv does.

We could combine that with get_header.

Member Author

Phlya commented Apr 20, 2020

get_handle was private until 4 months ago...
pandas-dev/pandas@0df8858#diff-0d7b5a2c72b4dfc11d80afe159d45ff8L341

Member

nvictus commented Apr 20, 2020

Looks like it became public as of the 1.0 release. We should probably pin the pandas dependency to >=1.0 anyway.

Member Author

Phlya commented Apr 20, 2020

Done in #193, would appreciate input there.

Member Author

Phlya commented Apr 24, 2020

Fixed in #193!

Phlya closed this as completed Apr 24, 2020