Implement comment.char argument in fread #856

Open
Tracked by #3189
arunsrinivasan opened this issue Oct 3, 2014 · 29 comments
Labels
feature request fread top request One of our most-requested issues

Comments

@arunsrinivasan
Member

Similar to read.table.

@mattdowle
Member

We need to ignore whole lines (those starting with the comment character) as well as trailing comments after valid lines.
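
Until fread supports this natively, both cases can be approximated by pre-processing the lines in R. The sketch below is only an illustration: `fread_strip_comments` and its `comment_char` argument are hypothetical names, not part of data.table, and the approach is deliberately naive (a `#` inside a quoted field would also be cut off).

```r
library(data.table)

# Hypothetical helper, not part of data.table: strip comments, then parse.
fread_strip_comments <- function(path, comment_char = "#", ...) {
  lines <- readLines(path)
  # Find the first occurrence of the comment character on each line.
  pos <- regexpr(comment_char, lines, fixed = TRUE)
  has_comment <- pos > 0
  # Keep only the text before the comment character.
  lines[has_comment] <- substr(lines[has_comment], 1L, pos[has_comment] - 1L)
  # Trim trailing whitespace left behind by removed trailing comments.
  lines <- trimws(lines, which = "right")
  # Whole-line comments are now empty strings, so drop them.
  lines <- lines[nzchar(lines)]
  fread(text = lines, ...)
}

# Made-up sample file with a whole-line comment and a trailing comment.
path <- tempfile(fileext = ".csv")
writeLines(c("# header comment", "a,b", "1,2 # trailing note", "3,4"), path)
dt <- fread_strip_comments(path)
```

Reading the whole file with `readLines` gives up much of fread's speed on large inputs, so this is a stopgap rather than a substitute for a native argument.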

@mattdowle
Member

Then update this question: http://stackoverflow.com/questions/18920777/fread-data-table-in-r/

@eddelbuettel
Contributor

Bump. Needing this right now.

@DavidArenburg
Member

DavidArenburg commented Feb 14, 2018

Actually, fread seems to assume this is already implemented, since it mentions comment.char in its warnings. Here is a warning I recently saw (using the dev 1.10.5 version):

Warning in fread(file_x, skip = startind - 1L, header = TRUE, fill = TRUE) :
Stopped early on line 2. Expected 168 fields but found 235. Consider fill=TRUE and comment.char=.

Also, I'm not sure why this isn't an error. That makes it harder to catch with tryCatch.
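
For readers who want the early stop treated as a failure, one possible approach is to promote warnings to errors around the call. This is a base R sketch; `warnings_as_errors` is a hypothetical helper name, and it converts every warning raised by the expression, not only the "Stopped early" one.

```r
# Hypothetical helper: evaluate an expression and turn any warning into an error.
warnings_as_errors <- function(expr) {
  tryCatch(
    expr,
    warning = function(w) stop(conditionMessage(w), call. = FALSE)
  )
}

# Expressions that do not warn return their value unchanged.
ok <- warnings_as_errors(1 + 1)

# Expressions that warn now fail, so an outer tryCatch(error = ...) can handle them.
res <- tryCatch(
  warnings_as_errors(as.integer("x")),  # as.integer("x") warns about NAs
  error = function(e) "caught"
)

# Intended use with fread (file name is a placeholder):
# dt <- warnings_as_errors(data.table::fread("table.csv"))
```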

@eddelbuettel
Contributor

Interesting. Can you chase the comment and find an author and commit per git blame?
I am mostly using the CRAN version so I work around the issue (when I have to, which is not that often).

@DavidArenburg
Member

DavidArenburg commented Feb 14, 2018

@eddelbuettel
Contributor

Thanks. Which one can click on for git blame to yield ...

Better skip= and nrow= (#2623)

by Matt just one day ago (!!)

@mattdowle mattdowle removed this from the Candidate milestone May 10, 2018
@ggirelli

ggirelli commented Feb 7, 2019

Bumping too. And if you only want to skip lines that begin with a hash, use cmd='grep -v "^#" table.csv'.
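
As a rough sketch of that workaround, the snippet below writes a small made-up file and reads it two ways: through `grep` via fread's `cmd` argument, and by filtering the lines in R for systems where `grep` is not on the PATH. The file contents are invented for illustration.

```r
library(data.table)

# Made-up sample file: comment lines start with '#'.
path <- tempfile(fileext = ".csv")
writeLines(c("# generated by some tool", "a,b", "1,2", "# mid-file note", "3,4"), path)

# Option 1: let grep drop the comment lines before fread sees the data.
# Assumes a Unix-like shell with grep available.
dt_cmd <- fread(cmd = paste("grep -v '^#'", shQuote(path)))

# Option 2: filter in R and pass the remaining lines as text.
lines <- readLines(path)
dt_txt <- fread(text = lines[!startsWith(lines, "#")])
```

Both options only remove lines that begin with `#`; trailing comments after valid fields are left in place.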

@Atrebas

Atrebas commented May 9, 2019

@ArthurPERE, from ?fread: skip="string" searches for "string" in the file (e.g. a substring of the column names row) and starts on that line, so I think you get the expected result.
You should consider using the following command:

fread("https://github.com/Rdatatable/data.table/files/3162861/Proteome_spodo.as.pfam31.txt", skip = 29, fill = TRUE)

with verbose = TRUE for more details, or the grep method mentioned above.
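
To illustrate the skip="string" behaviour quoted from ?fread, here is a minimal sketch on a made-up file whose preamble lines start with `##`. The file contents and column names are assumptions for the example only.

```r
library(data.table)

# Made-up file: two preamble lines, then the column-names row and the data.
path <- tempfile(fileext = ".txt")
writeLines(c("## produced by some tool", "## version 1", "id,value", "1,10", "2,20"), path)

# skip = "id,value" searches for that substring and starts reading on that line,
# so the preamble is ignored and the matching line is used as the header.
dt <- fread(path, skip = "id,value")
```

This only helps when the comments sit above the table; comment lines interleaved with the data still need one of the filtering approaches.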

@jangorecki
Member

@ArthurPERE it is best to use the documentation as the reference for defined behaviour. There you will also find that there is no comment.char parameter. You can find the fread manual at https://rdatatable.gitlab.io/data.table/library/data.table/html/fread.html

AFAIR the status of this FR and any work on it are well reflected in the comments. Be sure to upvote this FR, as that will likely speed up its implementation, or at least raise its priority. You are also welcome to submit a patch introducing this feature.

@MichaelChirico MichaelChirico added top request One of our most-requested issues and removed High labels Jun 7, 2020
@mjsteinbaugh

Following up on this, I'm open to helping develop and/or test this functionality in a future update.

@jangorecki
Member

jangorecki commented Jan 13, 2021

@mjsteinbaugh you are very welcome; please submit a PR.

@dvg-p4
Contributor

dvg-p4 commented Nov 14, 2024

There is still that error message that references comment.char:

Warning in fread("vep_output") :
  Stopped early on line 4. Expected 6 fields but found 5. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<## Using cache in /home/dgealow/.vep/homo_sapiens/113_GRCh38>>

This is incredibly confusing since comment.char is not a parameter to fread. If there are no plans to actually implement it any time soon (I notice this ticket has been open for 10 years) I'd suggest removing the reference from that error message.

@MichaelChirico
Member

We have a pending PR: #4486. It requires some love to get it over the line :)
