I'm not sure whether this is considered a bug, design flaw, or intentional feature so I present the following as simply an "issue".
The following code demonstrates a simplified and synthetic equivalent to our problem. The unconventional separator in this example is intentional, because the failure can't be worked around by falling back on auto-detection (which fails in this case), but the problem applies equally to any separator.
The intent here is to demonstrate that explicitly setting the "sep" to any value is unsafe.
In this notional example, we would like to implement some arbitrary functionality which operates on CSV files containing one or more columns.
As a slightly more concrete example, imagine a notional script which takes an arbitrary-length list of CSV files and inner joins them all together. One of these files might contain just a single column, acting as a row filter on the other files.
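A minimal sketch of the kind of script described above, to show how the `sep` value ends up fixed far from the person running it. The helper name `join_all`, the `id` key column and the file contents are all hypothetical, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical helper: read every file with one fixed separator,
# then inner-join the tables on a shared key column.
join_all <- function(paths, sep, key = "id") {
  tables <- lapply(paths, function(p) fread(p, sep = sep, header = TRUE))
  Reduce(function(x, y) merge(x, y, by = key), tables)
}

# Two made-up multi-column files that use "|" as the separator.
file_a <- tempfile(fileext = ".csv")
file_b <- tempfile(fileext = ".csv")
writeLines(c("id|x", "a|1", "b|2", "c|3"), file_a)
writeLines(c("id|y", "a|10", "c|30"), file_b)

joined <- join_all(c(file_a, file_b), sep = "|")
print(joined)
```

A caller of `join_all()` only supplies file paths. If one of those files holds just the `id` column, the `sep` passed to `fread()` is the same one used for every other file, and the caller has no way to change it for that file alone.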
There are any number of other cases where we might want to support CSV files that contain a single column.
fread() clearly does support this use case, because the sep auto-detection happily identifies the single-column case and continues without emitting a warning.
However, should we attempt to enforce a specific separator with an explicit sep param, the single-column case fails with an error, even though the CSV content in this case is completely legally formatted.
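A minimal sketch of the two calls being compared, not the original reproduction. It assumes the data.table package is installed; the file contents and the `|` separator are hypothetical stand-ins. The explicit-sep call is wrapped in `tryCatch()` so the script runs to completion whether or not the installed version raises the error.

```r
library(data.table)

# A legal single-column file: a header line and three values.
path <- tempfile(fileext = ".csv")
writeLines(c("id", "a1", "b2", "c3"), path)

# Auto-detection: loads the file as a single column.
auto <- fread(path, header = TRUE)

# Explicit sep: capture the error message instead of stopping,
# since the outcome may depend on the data.table version in use.
explicit <- tryCatch(
  fread(path, sep = "|", header = TRUE),
  error = function(e) conditionMessage(e)
)

print(auto)
print(explicit)
```

If `explicit` comes back as a character string, that string is the error message discussed here; if it comes back as a data.table, the installed version accepted the single-column file.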
The two situations appear to be strangely symmetrically "helpful". When you use auto-detection the fread() function will helpfully continue (silently with no warning) using a guess that it is a single column (even when that guess is wrong, as above). When you supply a specific sep the fread() function will helpfully stop you loading a legal single column CSV because you might have specified the sep wrong.
Moreover, the advice provided in the error is impossible to follow except in the case where the R session is interactive and a human is directly in control of the fread() call or some parent function that supports overriding the sep value by passing it down.
When the R session is not interactive, or when the fread() call is buried under several layers of higher-level functionality, changing the sep to "\n" to disable the helpful error is not only something we can't do, it is also functionally incorrect: the file does not actually have newline separators, and it is still a legal CSV file compliant with the original sep value.
To resolve, I recommend the removal of the Error at minimum, since it provides advice that cannot possibly be followed in anything other than the most trivial case.
If a function that is calling fread() is sufficiently confident in the sep value they want as to provide it explicitly, then fread() should obey that choice at face value and allow the loading of single column CSV files without complaint, whether they are correctly single column or not.
As things stand, it is impossible to implement anything with explicit sep, as it will error during parse for the single column case. And the only way to avoid this problem is to never use sep at all.