-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DGL Dataloader #3181
DGL Dataloader #3181
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all this hard work. I noticed some minor things.
I also wish we didn't have to copy DataFrames for the reverse_edges
code, but I can't think of a more efficient way either.
if self._input_directory is None: | ||
raise dgl.DGLError( | ||
"Please set input directory by calling `set_input_directory` " | ||
"before trying to fetch a sample" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than this check and another method, can the ctor just require an input_directory
arg and do the set_input_directory
steps itself upfront?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So i did not do this because this is set/reset per epoch basis as each epoch has a different directory, Happy to discuss if you have ideas on how to do this cleanly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if this is fine for now please.
|
||
from cugraph.gnn.data_loading import EXPERIMENTAL__BulkSampler | ||
|
||
BulkSampler = experimental_warning_wrapper(EXPERIMENTAL__BulkSampler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, waiting on #3170 to land before i make these changes.
A couple of clarifications:
CC: @galipremsagar , For context we are renaming some columns and adding an extra column but can not modify the input DataFrame. I think your amazing work with COW will address these cases. |
Yes, COW will remove the memory overhead. But we are slipping it to |
…uJawa/cugraph into bulk_sampler_dataloader_dgl
).persist() | ||
empty_df = ( | ||
create_empty_df_with_edge_props(indices_t, weight_t) | ||
if with_edge_properties | ||
else create_empty_df(indices_t, weight_t) | ||
) | ||
ddf = dask_cudf.from_delayed(result, meta=empty_df, verify_meta=False).persist() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be over-writted by : #3170 so please ignore.
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
from cugraph.gnn.data_loading.bulk_sampler import EXPERIMENTAL__BulkSampler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for this file #3170
CC: @rlratzel , Have resolved all changes that were part of this PR , other minor 3 variable name changes will come once 3180 lands. |
Codecov ReportBase: 55.30% // Head: 55.61% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## branch-23.02 #3181 +/- ##
================================================
+ Coverage 55.30% 55.61% +0.31%
================================================
Files 148 150 +2
Lines 9573 9671 +98
================================================
+ Hits 5294 5379 +85
- Misses 4279 4292 +13
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. The use_ddp
additions are pretty exciting.
/merge |
This PR depends upon
#3148(#3170)This PR closes #3154
Tests to add:
Maybe Todo:
- [ ] Test Multi-trainer examples and verification(Will do a follow up)