feature(wgt): enable DI using torch-rpc to support GPU-p2p and RDMA-rpc #562

Open
wants to merge 16 commits into
base: main

Commits on Jan 12, 2023

  1. feature(wgt): enable DI using torch-rpc to support GPU-p2p and RDMA-rpc

    1. Add torchrpc message queue.

    2. Implement a buffer based on CUDA shared tensors to optimize the torchrpc data path.

    3. Add 'bypass_eventloop' argument to Task() and Parallel().

    4. Add a thread lock in distributer.py to prevent contention between sender and receiver.

    5. Add message queue perf tests for torchrpc, nccl, nng, and shm.

    6. Add comm_perf_helper.py to make program timing more convenient.

    7. Modify subscribe() of class MQ, adding 'fn' and 'is_once' parameters.

    8. Add new DummyLock and ConditionLock types in lock_helper.py.

    9. Add message queue perf tests.

    10. Introduce a new self-hosted runner to execute CUDA, multiprocess, and torchrpc related tests.
    SolenoidWGT committed Jan 12, 2023
    5cfc2fb
  2. 2cee01a
  3. fix branch conflict

    SolenoidWGT committed Jan 12, 2023
    6162b81
  4. 97b9bc7
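
Item 7 of the first commit message mentions that subscribe() of class MQ gained 'fn' and 'is_once' parameters. The sketch below is only an illustration of how a callback-based subscription with a once-only flag is commonly structured. `ToyMQ`, its `publish()` method, and the topic-string argument are hypothetical names made up for this example; the semantics assumed here (the callback is dropped after its first message when `is_once` is true) are a guess, not taken from the PR's code.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple


class ToyMQ:
    """Hypothetical in-process stand-in for a message queue (not the PR's MQ class)."""

    def __init__(self) -> None:
        # topic -> list of (callback, is_once) pairs
        self._subs: Dict[str, List[Tuple[Callable[[Any], None], bool]]] = defaultdict(list)

    def subscribe(self, topic: str, fn: Callable[[Any], None], is_once: bool = False) -> None:
        # Assumed semantics: fn is called for every message on the topic,
        # or only for the first one when is_once is True.
        self._subs[topic].append((fn, is_once))

    def publish(self, topic: str, msg: Any) -> None:
        handlers = self._subs.get(topic, [])
        # Drop once-only handlers before dispatching, so they cannot fire
        # a second time even if a callback publishes again.
        self._subs[topic] = [(fn, once) for fn, once in handlers if not once]
        for fn, _ in handlers:
            fn(msg)


received = []
mq = ToyMQ()
mq.subscribe("greeting", received.append)
mq.subscribe("greeting", lambda m: received.append(("once", m)), is_once=True)
mq.publish("greeting", "a")
mq.publish("greeting", "b")
```

After the two publishes, the persistent callback has seen both messages while the once-only callback has seen only the first.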

Commits on Jan 13, 2023

  1. change port for nng perf

    SolenoidWGT committed Jan 13, 2023
    c5119f5
  2. a7a57a6
  3. fdd1bb9

Commits on Jan 17, 2023

  1. c06288e
  2. 3fa5319

Commits on Jan 18, 2023

  1. 345cc92
  2. add pytest timeout

    SolenoidWGT committed Jan 18, 2023
    dcc0a1a
  3. adab7fb

Commits on Feb 2, 2023

  1. ca25b27

Commits on Feb 13, 2023

  1. polish

    wangguoteng.p committed Feb 13, 2023
    44dcf13
  2. test with pytest workers = 1 to avoid timeout

    wangguoteng.p committed Feb 13, 2023
    735e7cc

Commits on Mar 9, 2023

  1. e32055b