join vignette #2181
Noting for reference: joins vignette would be a good place to have an example of replacing nested |
summarizing the scope
@jangorecki, given that #3453 is being prepared, where a detailed overview of rolling joins is being covered by @Henrik-P, would it make sense to add separate vignettes for equi- and non-equi joins, as I believe the latter are far more relevant for time series analysis? The content of both vignettes will be significant given your scope above.
For Joins vignette: Originally posted by @MichaelChirico in #944 (comment) |
@zeomal better to have 2 bigger vignettes, than 3 smaller IMO. We already have many vignettes. |
@jangorecki, I've created a draft pull request for this vignette. It's a first version, bound to have many changes, but covers the basics of equi-joins. This is my first pull request ever, so if I've done something wrong, please correct me. |
Hello!! I don't know if this is the most recent issue commenting this missing vignette. But I landed on data.table a couple of days ago and I wanted to learn the framework thoroughly following the vignettes in order. I was sad finding out that this vignette does not exist yet. By the way, I wanted to thank you all for this package. I come from Python (I am very used to pandas), and lately I started using R and I was enjoying the tidyverse approach (which is the most usually taught). But when I started with this library I was sooooooo blown away by it. I mean, it's just amazingly good: even if it were not the fastest library ever, I would be using it just for the clever syntax. I cannot emphasize more how much I love it!!! So thanks a lot for this wonderful creation. If I could be of any help with this vignette that would be great. Is there anyone working on it? Is there any branch with a draft of the vignette? I have no idea... |
Thank you for the warm words. My impression of DT was quite similar when I arrived at it :) Top speed and low memory use are just a nice bonus on top of the best syntax. As for learning joins, you can go through the list of join features mentioned in this issue and look them up in the ?data.table manual and on Stack Overflow. There was a draft of the join vignette, or maybe even two, but they were far from complete, so I doubt either one will ultimately make it as the vignette.
Yeah, but to be honest it took me some time to decide to invest in it, because my wrong impression, created by many shared opinions in blogs and discussion forums, was that the syntax was ugly and difficult to understand. And things like the mere existence of I mean, I think it's good that both syntax approaches exist (especially being so orthogonal), and that different people can use R the way they prefer. The only thing that makes me sad is that I feel data.table is underpromoted and has an undeserved aura of obscurity. At least, that was my perception. Thanks a lot for your suggestion. I will start with your approach and hopefully I will be able to understand it.
+1 I found it confusing that this vignette is mentioned in datatable-intro but cannot be found or read. Is there another reference that we can use for teaching people how to do joins?
@jangorecki, Could I use the Taylor Swift Tidytuesday dataset to create the vignette? I can explain what I learnt in the Joining Data with data.table in R Datacamp course |
Cannot believe that this issue has been around for so long. This is actually a bug. It is better not to mention it at all. |
There are already drafts or work in progress for this vignette, IIRC 2 or even 3, so it would probably be better to start from those rather than adding another one.
@AngelFelizR, take a look at #4398 for inspiration and which issues the join vignette could close. |
Thanks @avimallu, I will work to have a first draft by 2023-11-27 |
I've found https://medium.com/analytics-vidhya/r-data-table-joins-48f00b46ce29 to be quite helpful as well. Though since that's on a personal blog you'll probably want to contact the author for permission if you wanted to copy from it for the vignette. |
After reading all the comments related to this issue, I found out that the vignette must be created with simulated data. This approach will demonstrate how to use the package in various situations, from using a short data.table of 5 rows to avoiding unnecessary dependencies. It’s important to keep the story from becoming overwhelming. Here is the basic structure that I will be creating:
Please let me know if I am missing something. |
I used the flights data to explain joins, in my slides for the data.table tutorial at the LatinR meeting last month, https://github.com/tdhock/2023-10-LatinR-data.table#english |
The mergelist PR is ready to merge, so it will probably land in master before the vignette and should be included as well. foverlaps is missing.
I think we should avoid the merge function other than as a side note. One of In addition, the overlap join functions have a separate syntax, it might be worth placing all syntactically similar joins together to have them all in one place. |
I started the vignette with the merge function as it is easier for new users to understand. In my case it is normal to chain many merge calls using the following syntax, as it is the only way to apply a left join.
DtMerged <-
DT1[...
][, merge(.SD, DT2, by = "x", all.x = TRUE)
][, merge(.SD, DT3, by = "y", all.x = TRUE)]
What I could do is move mergelist from point 3 to point 2 to avoid the switch from function to syntax.
I thought that overlap join is an application of non-equi join. |
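For comparison with the chained merge calls discussed above, a left join can also be written with bracket syntax or as an update on join. This is only a minimal sketch, assuming two made-up tables `DT1` and `DT2` that share a column `x`; the data and the result names (`left_merge`, `left_bracket`, `left_update`) are hypothetical and not taken from any vignette draft.

```r
library(data.table)

# Hypothetical toy tables; names and values are invented for illustration.
DT1 <- data.table(x = c("a", "b", "c"), v1 = 1:3)
DT2 <- data.table(x = c("a", "b"), v2 = c(10, 20))

# Left join with merge(): every row of DT1 is kept.
left_merge <- merge(DT1, DT2, by = "x", all.x = TRUE)

# The same left join with bracket syntax: each row of DT1 looks up DT2.
# Columns of DT2 come first in the result, followed by the rest of DT1.
left_bracket <- DT2[DT1, on = "x"]

# Update on join: add v2 to a copy of DT1 by reference.
left_update <- copy(DT1)[DT2, on = "x", v2 := i.v2]
```

All three results keep the unmatched row `x = "c"` with `v2` set to `NA`; they differ in column order and in whether a new table is created.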
overlapping join ( I second @avimallu suggestion about dropping |
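To make the relation between overlap joins and non-equi joins concrete, here is a small sketch that finds overlapping intervals both ways. The tables `x` and `y`, their columns, and the values are assumptions invented for this example; it is one way to express an "any overlap" condition, not necessarily how the vignette will present it.

```r
library(data.table)

# Hypothetical interval tables, invented for this sketch.
x <- data.table(id = c("x1", "x2"), start = c(1L, 10L), end = c(5L, 12L))
y <- data.table(grp = c("y1", "y2"), start = c(4L, 20L), end = c(8L, 25L))

# Overlap as a non-equi join: x.start <= y.end and x.end >= y.start.
# nomatch = NULL drops rows of y that overlap nothing in x.
ov_nonequi <- x[y, on = .(start <= end, end >= start),
                nomatch = NULL, .(id, grp)]

# The same overlaps with foverlaps(), which needs y keyed on its interval.
setkey(y, start, end)
ov_foverlaps <- foverlaps(x, y, type = "any", nomatch = NULL)
```

Both approaches report a single overlapping pair here, `x1` with `y1`.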
Here is the new structure
nb. mergelist supports full join as well, probably much more efficient than merge |
That sounds really good. So this will be the vignette's structure:
Link to |
Moreover, mergelist has more mult options as far as I recall. Aggregation on join is not yet possible by x's columns, only by each i.
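As an illustration of aggregating on join for each row of `i`, here is a minimal sketch using `by = .EACHI`. The tables `sales` and `ids` and their contents are hypothetical, made up only to show the mechanics.

```r
library(data.table)

# Hypothetical data, invented for illustration.
sales <- data.table(id = c("a", "a", "b", "c"), amount = c(1, 2, 3, 4))
ids   <- data.table(id = c("a", "b"))

# For each row of `ids`, sum the matching rows of `sales` during the join,
# without materializing the full joined table first.
totals <- sales[ids, on = "id", .(total = sum(amount)), by = .EACHI]
```

The result has one row per row of `ids`, with `id` and the aggregated `total`.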
Some tweaks. Related: Rdatatable#6478 Rdatatable#2181
fixed by #6478 |
Several places in the available vignettes refer to this mysterious vignette about join and rolling join. When will it be up? Thanks