Coupling of Dictionary with Data causes issues for usage #3097
Comments
To add a bit more clarity: if the pointer in gdf_column pointed only to a dictionary, then whatever we end up doing in cudf with the nvstrings_category type would not require any special code, and order_by would work out of the box. As it stands, I have to add special code for the case where the ordered column is a GDF_NVSTRINGS_CATEGORY and make a new NVCategory from the new indices and the values.
The NVCategory class only really has two member variables. The
So can they share a dictionary? What I don't want to do is perform another gather. I have a list of indices and I just want to say: this is a new nvcategory, a thin wrapper for the SAME dictionary and a NEW list of indices (e.g. say I filter out values less than or equal to love, so I end up with
We discussed having multiple instances of an What we could do:
YUM YUM! That would be amazing
So for now, what I am going to do is make a wrapper function that performs these steps in a less efficient fashion, making a whole new nvcategory. We can still assume it's just an nvcategory, and in the future we will have functions that make its access more efficient. If we are going to be passing around shared_ptrs, I will template these functions so we can change them easily when it comes to that.
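To make the shared_ptr idea above concrete, here is a minimal sketch of a thin wrapper that shares one dictionary across several index lists. `Dictionary`, `CategoryView`, and `with_indices` are hypothetical names used only for illustration; none of them are part of the real NVCategory API, and real dictionaries would live in device memory rather than in a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the dictionary half of an NVCategory:
// the unique string values. Not the real nvstrings type.
struct Dictionary {
  std::vector<std::string> keys;
};

// Hypothetical thin wrapper: a shared, read-only dictionary plus
// one column's indices into it.
struct CategoryView {
  std::shared_ptr<const Dictionary> dict;
  std::vector<std::int32_t> indices;

  // Build a new view over the SAME dictionary with NEW indices.
  // Only the index vector is moved in; no string data is copied or gathered.
  CategoryView with_indices(std::vector<std::int32_t> new_indices) const {
    return CategoryView{dict, std::move(new_indices)};
  }

  // Resolve row i to its string value through the shared dictionary.
  const std::string& at(std::size_t i) const {
    return dict->keys[static_cast<std::size_t>(indices[i])];
  }
};
```

With something like this, a filter or sort would hand its resulting index list to `with_indices`, and the dictionary would stay alive for as long as any view still refers to it.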
Implementing the shared-ptr concept would be my preference, but it is not straightforward with the current implementation. So for now, I'd like to look at just creating a new
Superseded by #3535
In many cases users are going to do things like sort, filter, or group on the indices from an NVCategory without wanting to modify the dictionary itself. Because NVCategory contains both, it is difficult to do something like reorder a column without having to reconstitute a new NVCategory. Is there a way we can separate the two? That way you could filter a column by generating only new indices, knowing that they use the same dictionary. We could do this by modifying the current PR for NVCategory support to store an NVCategory::Dictionary, or something of that nature, instead of the NVCategory itself. This would also probably require a utility function that could create an NVCategory from this data and this dictionary without requiring copies, if we wanted to access its APIs to do things like merge them.
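As a rough illustration of the separation requested here, the sketch below keeps the dictionary and the indices as two independent values, so filtering and sorting only ever produce new index lists. It assumes the dictionary holds sorted, unique keys, so that index order matches string order. All names (`Dictionary`, `Indices`, `filter_greater_than`, `sort_rows`, `materialize`) are hypothetical and host-side only; they are not existing cudf or nvstrings functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumption: the dictionary holds sorted, unique keys.
using Dictionary = std::vector<std::string>;
using Indices = std::vector<std::int32_t>;

// Keep only the rows whose value sorts strictly after `bound`.
// The dictionary is read but never modified or copied.
Indices filter_greater_than(const Dictionary& dict, const Indices& rows,
                            const std::string& bound) {
  // Position of the first key greater than `bound`.
  const auto cut = static_cast<std::int32_t>(
      std::upper_bound(dict.begin(), dict.end(), bound) - dict.begin());
  Indices kept;
  for (std::int32_t i : rows) {
    if (i >= cut) kept.push_back(i);
  }
  return kept;
}

// Because the keys are sorted, ordering rows by index orders them by string.
Indices sort_rows(Indices rows) {
  std::sort(rows.begin(), rows.end());
  return rows;
}

// Utility that pairs an index list with a dictionary only when the
// actual string values are needed.
std::vector<std::string> materialize(const Dictionary& dict,
                                     const Indices& rows) {
  std::vector<std::string> values;
  values.reserve(rows.size());
  for (std::int32_t i : rows) {
    values.push_back(dict[static_cast<std::size_t>(i)]);
  }
  return values;
}
```

The point of the design is that `filter_greater_than` and `sort_rows` return only indices; the one function that touches string data is `materialize`, which plays the role of the utility that rebuilds a full category from "this data and this dictionary".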