Melt corrupts data by using column position instead of name #3487
If you want the columns linked up in a particular order, the simplest option for now might be the list syntax:
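A minimal sketch of the list syntax, assuming an illustrative table modeled on the `id, brand1, brand2, price2, price1` layout described later in this thread (not the reporter's actual data). Each list element names the columns that feed one output column, in the order they should be paired:

```r
library(data.table)

# Illustrative data: the price columns are stored out of order.
dt <- data.table(
  id     = 1:2,
  brand1 = c(1L, 0L),   # 1 = this user bought brand 1
  brand2 = c(0L, 1L),
  price2 = c(0.7, 0.7),
  price1 = c(0.8, 0.8)
)

# Spell out the pairing by name: brand1 with price1, brand2 with price2.
long <- melt(
  dt,
  id.vars      = "id",
  measure.vars = list(c("brand1", "brand2"), c("price1", "price2")),
  value.name   = c("brand", "price")
)
print(long)
```

Because the pairing is given by name, the physical column order of `dt` should not matter here.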
There's a comment about making sure names match up, btw: #3396 (comment). Regarding starting from 0 vs 1 in the variable column, there's an open issue about passing custom values: #2551.
Thanks @franknarf1, I think that covers it. @jsams please follow up on the open issues (even reacting can help), or re-open this issue with some clarification if you disagree this is a duplicate. Thanks!
I don't understand why this was closed. This is unambiguously incorrect behavior. I appreciate that there is an alternative way to get the correct output, and I had already noted a different workaround in the original report. That's not the point. The point is that there is data corruption due to a mis-implementation of melt/patterns.
The referenced feature request is related to naming, but the central point here is that data is being corrupted. A relatively high-priority data corruption bug and a convenience feature seem sufficiently different to warrant keeping this open, or upgrading the other to "bug" in recognition that this function is not behaving appropriately.
It was closed because it's a duplicate. From #3396 (comment):
I was taking the other issue as the mapping-of-variable-names annoyance. If you want to treat it as a duplicate, that's fine, but the other should be raised to a bug. Having a dev say "oh, the function ignores variable names and just uses the indices, so it's not a bug" does not make it not a bug; it suggests the dev did not understand the implications of what they were saying. Quoting Arun from the other report:
This is corrupting data. That should absolutely be enough reason. To be clear, in case it wasn't in the OP: if I had the columns id, brand1, brand2, price2, price1, this is mapping brand1 -> price2 and brand2 -> price1. There's no user who wants that to happen. That is a logically incorrect thing to do.
I'm fine with, if sometimes annoyed by, missing features. However, corrupting my data is not ok. If patterns() doesn't work with melt and it's too much work to fix it, then that's completely understandable.* Pull it. But don't leave around functionality that corrupts data and hope people notice it. Then I stop trusting the software.
* A first-thought suggestion for making it feasible is to require a grouping expression in the arguments of patterns, e.g. the above would be "brand(.*)" and "price(.*)". The matching of the measure vars would then be based on the matching of the groups, which is the user's responsibility to get right and not particularly difficult to get right in most circumstances.
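The grouping idea can be approximated today outside of `patterns()`. The following is only a sketch: `measure_by_key()` is a hypothetical helper (not part of data.table), the table is illustrative, and each pattern is assumed to contain exactly one capture group. It orders every group's columns by the captured key and passes the result to `melt()` as a list:

```r
library(data.table)

# Hypothetical helper: for each pattern, find the matching columns and sort
# them by the text captured in the first group, so position i in every group
# refers to the same key.
measure_by_key <- function(cols, patterns) {
  groups <- lapply(patterns, function(p) {
    hit <- grep(p, cols, value = TRUE)
    key <- sub(p, "\\1", hit)
    ord <- order(key)
    setNames(hit[ord], key[ord])
  })
  keys <- lapply(groups, names)
  # Refuse to guess if the groups do not share exactly the same keys.
  stopifnot(all(vapply(keys, identical, logical(1), keys[[1]])))
  lapply(groups, unname)
}

# Illustrative data with the price columns out of order.
dt <- data.table(
  id     = 1:2,
  brand1 = c(1L, 0L),
  brand2 = c(0L, 1L),
  price2 = c(0.7, 0.7),
  price1 = c(0.8, 0.8)
)

mv   <- measure_by_key(names(dt), c("^brand(.*)$", "^price(.*)$"))
long <- melt(dt, id.vars = "id", measure.vars = mv,
             value.name = c("brand", "price"))
print(long)
```

Keys are compared as strings, so this only guarantees a consistent pairing across groups; the `variable` column is still numbered by position rather than by key.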
IMHO, using the convenience function `patterns()` is the problem here. You can get the correct result if you specify the column names directly as a list to `measure.vars`:
This also works perfectly for column names where the naming scheme does not suggest a particular order, e.g.,
or with disordered columns
The result is the same (except that I have renamed the molten value columns). For properly ordered columns, the convenience function `patterns()` can be used as well,
but will fail for disordered columns of course because it has no information about the column order
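To see what `patterns()` hands to `melt()`, it can be called on its own through its `cols` argument. A small sketch on hypothetical column names:

```r
library(data.table)

# Hypothetical column names with the price columns out of order.
cols <- c("id", "brand1", "brand2", "price2", "price1")

# Each pattern yields the indices of its matching columns, in column order.
idx <- patterns("^brand", "^price", cols = cols)

# Translate the indices back to names to see which columns line up.
matched <- lapply(idx, function(i) cols[i])
print(matched)
```

The i-th match of each pattern is then paired up, so any ordering implied by the names themselves is not consulted.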
I have a data.table with an occasionally inconsistent order to the columns of the data. Consider a dataset where users are purchasing one of three brands or the "0th" brand, and for simplicity, keep prices constant.
melt should be using my variable name indexes for the variable.name, but it seems to use the position they appear in the column order. This is simply an error and results in subtle and silent corruption of data:
Notice the brands are now indexed from 1-4 instead of 0-3 as in the original data, annoying, but fixable. However, the columns are mixed and matched based on the order they appear and not based on the names of the columns. Thus, it indicates the first user is purchasing brand 2 at 0.7 instead of brand 1 at 0.8.
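The mis-pairing can be reproduced on a smaller table. This is a sketch on illustrative data (two brands rather than the four in the report), and it assumes the positional matching discussed in this thread:

```r
library(data.table)

# Illustrative data: brand 1 costs 0.8, brand 2 costs 0.7,
# but the price columns are stored in the order price2, price1.
dt <- data.table(
  id     = 1:2,
  brand1 = c(1L, 0L),   # user 1 bought brand 1
  brand2 = c(0L, 1L),   # user 2 bought brand 2
  price2 = c(0.7, 0.7),
  price1 = c(0.8, 0.8)
)

long <- melt(
  dt,
  id.vars      = "id",
  measure.vars = patterns("^brand", "^price"),
  value.name   = c("brand", "price")
)

# With positional matching, the first brand column (brand1) is paired with
# the first price column (price2), so brand 1 appears to cost 0.7.
print(long)
```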
Re-ordering the columns appears to fix things:
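A sketch of that workaround on illustrative data, assuming `setcolorder()` is used to put each group's columns in matching order before calling `melt()`:

```r
library(data.table)

# Illustrative data with the price columns out of order.
dt <- data.table(
  id     = 1:2,
  brand1 = c(1L, 0L),
  brand2 = c(0L, 1L),
  price2 = c(0.7, 0.7),
  price1 = c(0.8, 0.8)
)

# Reorder by reference so brandN and priceN sit at the same position
# within their respective groups.
setcolorder(dt, c("id", "brand1", "brand2", "price1", "price2"))

long <- melt(
  dt,
  id.vars      = "id",
  measure.vars = patterns("^brand", "^price"),
  value.name   = c("brand", "price")
)
print(long)
```

This relies on the caller knowing the correct order up front; it does nothing to detect a mismatch.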
I'm running R 3.5 and data.table 1.12: