
[performance] FeatGraph TVM kernels support #2136

Closed

kira-lin wants to merge 165 commits

Conversation


@kira-lin (Contributor) commented Sep 1, 2020

Description

This branch merges the FeatGraph TVM kernels and implements the functions necessary to use them. When completed, it should be able to compile optimal kernels just-in-time through TVM for a given workload, performing graph partitioning when necessary.
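For context, the just-in-time flow through TVM looks roughly like this (a minimal sketch using a trivial vector-add in TVM's te API; the names and the computation are illustrative stand-ins for the real FeatGraph SpMM/SDDMM kernels):

    import numpy as np
    import tvm
    from tvm import te

    # Declare a symbolic computation (a stand-in for a real graph kernel).
    n = te.var('n')
    A = te.placeholder((n,), name='A')
    B = te.placeholder((n,), name='B')
    C = te.compute((n,), lambda i: A[i] + B[i], name='C')

    # Schedule and JIT-compile it for the current target/workload.
    s = te.create_schedule(C.op)
    kernel = tvm.build(s, [A, B, C], target='llvm')

    # Run the compiled kernel on concrete data.
    ctx = tvm.cpu(0)
    a = tvm.nd.array(np.random.rand(1024).astype('float32'), ctx)
    b = tvm.nd.array(np.random.rand(1024).astype('float32'), ctx)
    c = tvm.nd.array(np.zeros(1024, dtype='float32'), ctx)
    kernel(a, b, c)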

Checklist

  • Add tests for graph partitioning (for FeatGraph) and for the different kernels.
  • Add documentation.
  • Implement parameter searching, i.e., finding the optimal number of partitions automatically.
  • Fix the cache for partitioned graphs.
  • Add a switch between the TVM kernels and the traditional ones.

Ubuntu and others added 30 commits July 13, 2020 06:45
* change edge_ids behavior and C++ impl

* fix unittests; remove utils.Index in edge_id

* pass mx and th tests

* pass tf test

* add aten::Scatter_

* Add nonzero; impl CSRGetDataAndIndices/CSRSliceMatrix

* CSRGetData and CSRGetDataAndIndices passed tests

* CSRSliceMatrix basic tests

* fix bug in empty slice

* CUDA CSRHasDuplicate

* has_node; has_edge_between

* predecessors, successors

* deprecate send/recv; fix send_and_recv

* deprecate send/recv; fix send_and_recv

* in_edges; out_edges; all_edges; apply_edges

* in deg/out deg

* subgraph/edge_subgraph

* adj

* in_subgraph/out_subgraph

* sample neighbors

* set/get_n/e_repr

* wip: working on refactoring all idtypes

* pass ndata/edata tests on gpu

* fix

* stash

* workaround nonzero issue

* stash

* nx conversion

* test_hetero_basics except update routines

* test_update_routines

* test_hetero_basics for pytorch

* more fixes

* WIP: flatten graph

* wip: flatten

* test_flatten

* test_to_device

* fix bug in to_homo

* fix bug in CSRSliceMatrix

* pass subgraph test

* fix send_and_recv

* fix filter

* test_heterograph

* passed all pytorch tests

* fix mx unittest

* fix pytorch test_nn

* fix all unittests for PyTorch

* passed all mxnet tests

* lint

* fix tf nn test

* pass all tf tests

* lint

* lint

* change deprecation

* try fix compile

* lint

* update METIDS

* fix utest

* fix

* fix utests

* try debug

* revert

* small fix

* fix utests

* upd

* upd

* upd

* fix

* upd

* upd

* upd

* upd

* upd

* trigger

* +1s

* mutation add_nodes and add_edges

* Add support for remove_edges, remove_nodes, add_selfloop, remove_selfloop

* Fix

Co-authored-by: Ubuntu <[email protected]>

* add nodesy

* All three

* Fix

* lint

* Add some test case

* Fix

* Fix

* Fix

* Fix

* Fix

* Fix

* fix

* trigger

* Fix

* fix

Co-authored-by: Ubuntu <[email protected]>
@yzh119 requested review from yzh119 and jermainewang and removed the review request for yzh119 on September 1, 2020 07:17
@yzh119 (Member) left a comment

Congratulations, and thanks for your contribution to this PR.
I've left some comments; please address these concerns.

I suggest preserving the old kernel code instead of deleting it. When users want to use TVM kernels, the gspmm call would be dispatched to gspmm_featgraph, and likewise for gsddmm.
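A minimal sketch of such a dispatch (gspmm_featgraph is named above; the switch and the built-in fallback name are hypothetical, since the actual toggle is still a checklist item):

    _USE_TVM_KERNELS = False  # hypothetical user-facing switch

    def gspmm(gidx, op, reduce_op, u, e):
        # Route generalized SpMM either to the FeatGraph/TVM path or to
        # the preserved built-in C++ kernels; gsddmm would mirror this.
        if _USE_TVM_KERNELS:
            return gspmm_featgraph(gidx, op, reduce_op, u, e)
        return gspmm_builtin(gidx, op, reduce_op, u, e)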

Resolved review threads: examples/pytorch/gcmc/model.py, python/dgl/sparse.py.
# pass edge_mapping to tvm only when array packing will be used
use_idx = edge_shuffled and num_feat_partitions > 1 and not use_bcast and use_e
f_input = [indptr, indices]
key = (num_rows, num_cols, nnz, op, reduce_op, u_shp, e_shp, use_idx, \
Reviewer comment (Member):
Used for indexing, should find a more elegant way.
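For context, the key presumably memoizes compilation per workload signature, along the lines of this sketch (helper names are illustrative):

    _compiled_kernels = {}  # workload signature -> built TVM module

    def get_or_build_kernel(key, build_fn):
        # JIT-compile on a cache miss; afterwards every call with the same
        # shapes, ops, and partitioning flags reuses the compiled module.
        if key not in _compiled_kernels:
            _compiled_kernels[key] = build_fn()
        return _compiled_kernels[key]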

# num_row_partitions, num_col_partitions = 2, 2
# num_feat_partitions = 2
if target == 'cuda' and (num_row_partitions > 1 or num_col_partitions > 1 or num_feat_partitions > 1):
    print('Partitioning not supported on GPU')
Reviewer comment (Member):
Replace this with logging.
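For instance, a sketch of the suggested change using the standard logging module (logger naming is illustrative):

    import logging

    logger = logging.getLogger(__name__)

    if target == 'cuda' and (num_row_partitions > 1
                             or num_col_partitions > 1
                             or num_feat_partitions > 1):
        # Report through the logging framework instead of printing to stdout.
        logger.warning('Partitioning not supported on GPU')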

Further resolved review threads on python/dgl/sparse.py and python/dgl/tvm/gsddmm.py.
else:
    raise NotImplementedError
# check if used for sampling
generic_shape = nnz == 0 and num_rows == 0 and num_cols == 0
Reviewer comment (Member):

If the user wants to handle a generic shape, do nnz, num_rows, and num_cols all need to be set to zero?

@kira-lin (Contributor, author) replied:
Right. We can use another flag, if you prefer.
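A sketch of what such a flag could look like, using TVM symbolic variables (the helper and parameter names are illustrative):

    from tvm import te

    def shape_params(num_rows, num_cols, nnz, generic_shape=False):
        # With an explicit flag, compile against symbolic sizes so one
        # kernel serves any graph (e.g. for sampling), instead of
        # overloading the all-zeros shape as the signal.
        if generic_shape:
            return te.var('num_rows'), te.var('num_cols'), te.var('nnz')
        return num_rows, num_cols, nnz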

@yzh119 (Member) left a comment

LGTM. @jermainewang, I think this PR is ready to be merged.
Let's finish the remaining stuff in future PRs:

  1. support sampling.
  2. support fp16.
  3. support GE-SpMM.

@yzh119 changed the title from "[WIP] FeatGraph TVM kernels support" to "[performance] FeatGraph TVM kernels support" on Sep 11, 2020
@jermainewang (Member) commented:

Things to do before merging this PR:

  • Move all the related code to dgl/experimental/.
  • Write a one-to-two-page mini user guide on how to use this feature and what the expected results are. You can put it under a new "Experimental Features" section on docs.dgl.ai. @yzh119
  • Benchmark the current implementation against GE-SpMM and the merge-based algorithm. @kira-lin
  • Code review by @jermainewang

@yzh119 (Member) commented Dec 31, 2020

After discussion, we decided to use an AOT solution (see #2367).

@yzh119 closed this on Dec 31, 2020