I just ran into the same mistake as in this issue: #2094.
Briefly, the current help on joining with a named character vector (argument on=) is confusing. It currently states:
As a named character vector, e.g., X[Y, on=c(x="a", y="b")]. This is useful when column names to join by are different between the two tables.
But the example is ambiguous. Those used to the syntax of merge would expect x="a" to be the equivalent of by.x="a" and y="b" to be the equivalent of by.y="b". In reality, column x in data.table X is matched to column a in data.table Y, and column y in X to column b in Y.
The error message one gets when making this mistake is accurate but not very helpful to the newcomer (Column(s) [x,y] not found in X).
Hence a couple of suggestions:
Use a simpler naming convention in the example and describe the outcome plainly:
As a named character vector, e.g., X[Y, on=c(x1="y1", x2="y2")] to join X and Y by matching the columns named "x1" and "x2" from X with the columns "y1" and "y2" from Y. This is useful when column names to join by are different between the two tables.
This syntax has no example further down in the help page. I suggest providing one, for example by adding to the # joins as subsets section:
DT[X, , on=c(y="v")] # join using column "y" of DT with column "v" of X
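To make the direction of the mapping concrete, here is a minimal sketch of the behaviour described above. The tables and the column names x1, x2, y1, y2, vx and vy are made up for illustration and are not taken from the help page; it only assumes that the data.table package is installed.

```r
library(data.table)

# Hypothetical tables; all column names here are illustrative only.
X <- data.table(x1 = c("a", "b", "c"), x2 = c(1L, 2L, 3L), vx = c(10, 20, 30))
Y <- data.table(y1 = c("a", "c"), y2 = c(1L, 3L), vy = c(100, 300))

# The names of the `on` vector refer to columns of X (the outer table);
# the values refer to columns of Y (the table passed as i).
res <- X[Y, on = c(x1 = "y1", x2 = "y2")]

# Swapping names and values, as someone used to merge() might,
# fails because "y1" and "y2" are not columns of X.
# The error is caught here so that the script keeps running.
err <- tryCatch(
  X[Y, on = c(y1 = "x1", y2 = "x2")],
  error = function(e) conditionMessage(e)
)
```

Here res holds the rows of X that match Y, while err holds the text of the error raised by the swapped call. The exact wording of that message may differ between data.table versions.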