Refactor $scan_csv() and $read_csv() #455
…ept multiple paths
…arams, accept multiple paths [skip ci]
#' @rdname IO_read_csv
#' @param path Path to a file or URL. It is possible to provide multiple paths
Is it possible to avoid writing the same description twice?
I tried using the inheritParams tag but it doesn't seem to work.
Why not avoid duplication by combining them into the same Rd file?
> I tried using the inheritParams tag but it doesn't seem to work.
Yeah, I don't know why, that's annoying.
> Why not avoid duplication by combining them into the same Rd file?
I'd rather have separate files, but I think I saw somewhere that we could store the roxygen docs as a "template" and reuse them, so I'll try to find this again.
We can include Rmd files in roxygen docs but not in the parameters section, so I don't have a solution for this. I'd still like to keep separate docs so that scan_csv and read_csv appear separately in the sidebar of "Reference" on the website.
I'm blocked on the panic! related to a wrong encoding value:

    let encoding = match encoding {
        "utf8" => pl::CsvEncoding::Utf8,
        "utf8-lossy" => pl::CsvEncoding::LossyUtf8,
        e => {
            // panic!("encoding {} not implemented.", e);
            let result = Err(format!("encoding {} not implemented.", e)).unwrap();
            let out = Ok(result)
                .map_err(polars_to_rpolars_err)
                .map(LazyFrame);
            return out
        }
    };

@eitsupi can you take a look?
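For context, calling `.unwrap()` on an `Err` value always panics, so the attempt above still aborts instead of returning an error. A minimal, self-contained sketch of the usual alternative is to have the match itself produce a `Result` and let the caller propagate it with `?`. The `CsvEncoding` enum, `parse_encoding`, and `describe_reader` below are hypothetical stand-ins written for illustration; they are not the polars or r-polars API, and the real fix would still need to convert the error into the package's own error type.

```rust
/// Hypothetical stand-in for the encoding enum used in the snippet above.
#[derive(Debug, PartialEq)]
pub enum CsvEncoding {
    Utf8,
    LossyUtf8,
}

/// Map a user-supplied string to an encoding.
/// Unsupported values yield an `Err` instead of a panic.
pub fn parse_encoding(encoding: &str) -> Result<CsvEncoding, String> {
    match encoding {
        "utf8" => Ok(CsvEncoding::Utf8),
        "utf8-lossy" => Ok(CsvEncoding::LossyUtf8),
        e => Err(format!("encoding {} not implemented.", e)),
    }
}

/// Hypothetical caller: `?` returns early with the error
/// when the encoding is not supported.
pub fn describe_reader(encoding: &str) -> Result<String, String> {
    let encoding = parse_encoding(encoding)?;
    Ok(format!("reader configured with {:?}", encoding))
}
```

With this shape, `describe_reader("latin1")` returns an `Err` carrying the message rather than unwinding, which is the behaviour the commented-out `panic!` was meant to be replaced with.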
I will look at it tomorrow if I have time.
Done.
You may want to scan/read a CSV, apply a bunch of operations to reduce its size and then convert it to an R data.frame to use other packages on it. So I think this is quite standard.
It needs to be loaded, not only installed, so this is annoying
I tried searching, but perhaps there is currently no place to check the
On the R side, we could check that there are i64 columns in the data. If there are some, we check that
Maybe you mean
Yes, I misread your previous message
Since a certain range of int64 can be expressed as double, DuckDB seems to cast it to double by default.
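To make the "certain range" concrete: an IEEE 754 double has a 53-bit significand, so every integer with magnitude up to 2^53 converts to a double without loss, while larger values may be rounded. The sketch below illustrates that boundary in plain Rust. The function names are hypothetical and this is not how DuckDB or polars implements the cast; it only demonstrates the arithmetic.

```rust
/// Every integer with magnitude up to 2^53 is exactly representable as an f64.
pub const MAX_EXACT_F64_INT: u64 = 1 << 53;

/// True if `x` lies in the contiguous range where the i64 -> f64 cast is lossless.
/// `unsigned_abs` avoids overflow for `i64::MIN`.
pub fn in_exact_f64_range(x: i64) -> bool {
    x.unsigned_abs() <= MAX_EXACT_F64_INT
}

/// Convert to f64 only when the value is inside the lossless range.
pub fn i64_to_f64_checked(x: i64) -> Option<f64> {
    if in_exact_f64_range(x) {
        Some(x as f64)
    } else {
        None
    }
}
```

Outside that range a plain `as f64` cast still succeeds but rounds to the nearest representable double, which is why a check of this kind (or a separate 64-bit integer type on the R side) is needed to detect precision loss.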
In any case, these are not directly related to the CSV reader and should be discussed in a separate issue.
I have created an issue #465 about int64 handling.
Thanks, there's just the problem with
I don't understand why
I could not see either why
I'm sorry, but I'm on a business trip and won't be able to run it locally for a few days.
@sorhawell yes it works on Python. I decided to remove this argument for now since we know it doesn't work.
@eitsupi no problem, do you have time to release 0.10 after this is merged? Otherwise I can do it if it's just
Close #444
This PR:
TODO:
- row_count_ params
- dtypes and null_values
- panic! related to wrong encoding value
- integers are parsed as i64 which can't be converted to R, see Improve handling of Polars Int64 to R #465
- Multiple URLs aren't implemented yet in Python, this fails: