loadtable with missing values given by an empty cell #147

Closed
MaximilianJHuber opened this issue Mar 7, 2018 · 3 comments
MaximilianJHuber commented Mar 7, 2018

Hi,

I am reading a couple of .csv files with separator ';' and 36 columns. I specify the column types correctly. The first two lines are read successfully:

100;20030701;20150831;1234567890;20070131; <...>
200;19991018;20150831;1324567432;20070131; <...>

The third line is not:

300;20070101;20070624;;20070131;

although I set "" as a nastring:

loadtable(path * file * "/", output="bin", chunks=8, delim=';', distributed=false, header_exists=false, colnames = names, colparsers = [Int64, String, String, Int64, <...>], nastrings=[""])

The error message is:
MethodError: Cannot convert an object of type DataValues.DataValue{Int64} to an object of type TextParse.StrRange
and
CSV parsing error in C:\data.csv at line 4 char 22: ...300;20070101;20070624;;20070131; <...>

One bug is that with header_exists=false the line number in the error is wrong: it is the third line, not the fourth.
And my main question is: how do I tell loadtable to treat an empty cell as Missing?

MaximilianJHuber (Author) commented

Leaving out colparsers and nastrings and increasing type_detect_rows instead was successful. I now understand that an integer column that may contain missing values needs to be specified as DataValues.DataValue{Int64}.
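
For later readers, here is a minimal sketch of what that could look like when the parsers are given explicitly. It assumes the JuliaDB and DataValues versions in use at the time of this issue; the five-column sample file, its path, and the column names are made up for illustration and are not the real 36-column data.

```julia
using JuliaDB, DataValues

# Hypothetical five-column sample mirroring the rows quoted above;
# the fourth field of the last row is empty.
path = joinpath(mktempdir(), "data.csv")
write(path, """
100;20030701;20150831;1234567890;20070131
200;19991018;20150831;1324567432;20070131
300;20070101;20070624;;20070131
""")

t = loadtable(path;
    delim = ';',
    header_exists = false,
    colnames = ["id", "from", "to", "ref", "asof"],  # hypothetical names
    # The fourth column may be empty, so it is declared as DataValue{Int64}
    # instead of plain Int64.
    colparsers = [Int64, String, String, DataValue{Int64}, String])
```

The only difference from the failing call is the type given for the column that can be empty; whether nastrings also needs to be passed may depend on the installed version.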

shashi (Collaborator) commented Mar 9, 2018

So what's the issue here now? Would you say it's the lack of documentation for this case?

MaximilianJHuber (Author) commented

Yes, the lack of documentation on handling missing values. Also, if header_exists=false, the line number reported for an error should be reduced by one.
