IB/RC: Initial implementation of RC transport. #19

Merged 4 commits on Nov 9, 2014

@yosefe yosefe commented Nov 6, 2014

No description provided.

+ Memory mapping/unmapping.
+ Remote key pack/unpack.
+ Implement put_short() for mlx5 device.
+ Add unit test and performance test for put_short()
@yosefe yosefe assigned yosefe and shamisp and unassigned yosefe Nov 6, 2014
ucs_status_t (*query)(uct_pd_h pd, uct_pd_attr_t *pd_attr);

ucs_status_t (*mem_map)(uct_pd_h pd, void *address, size_t length,
uct_lkey_t *lkey_p);
Contributor

Who provides the rkey? I think it has to come out of the registration function. Am I right?

Contributor Author

It comes only from rkey_unpack.
The user does rkey_pack(lkey) -> OOB exchange -> rkey_unpack.
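
The flow described in this reply can be sketched roughly as follows. This is a minimal illustration, not the UCT implementation: the `demo_*` types and functions, the key values, and the packed layout are all hypothetical, and a plain `memcpy` stands in for the out-of-band exchange.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the UCT definitions. */
typedef struct {
    uint32_t ib_lkey;   /* local key, used by the owner of the memory */
    uint32_t ib_rkey;   /* remote key, only exposed through pack/unpack */
} demo_lkey_t;

typedef struct {
    uint32_t ib_rkey;   /* usable only after unpack on the peer */
} demo_rkey_t;

#define DEMO_RKEY_PACKED_SIZE sizeof(uint32_t)

/* Owner side: serialize the remote key into a flat buffer. */
static void demo_rkey_pack(const demo_lkey_t *lkey, void *buffer)
{
    memcpy(buffer, &lkey->ib_rkey, DEMO_RKEY_PACKED_SIZE);
}

/* Peer side: rebuild a usable remote key from the received bytes. */
static void demo_rkey_unpack(const void *buffer, demo_rkey_t *rkey)
{
    memcpy(&rkey->ib_rkey, buffer, DEMO_RKEY_PACKED_SIZE);
}

/* Round trip in one process: memcpy stands in for the OOB channel. */
int demo_roundtrip(void)
{
    demo_lkey_t lkey = { 0x1111u, 0x2222u };
    unsigned char packed[DEMO_RKEY_PACKED_SIZE];
    unsigned char wire[DEMO_RKEY_PACKED_SIZE];
    demo_rkey_t rkey;

    demo_rkey_pack(&lkey, packed);
    memcpy(wire, packed, sizeof(wire));   /* "OOB" exchange */
    demo_rkey_unpack(wire, &rkey);

    return (rkey.ib_rkey == lkey.ib_rkey) ? 0 : -1;
}
```

In this sketch the only way to obtain a `demo_rkey_t` is through `demo_rkey_unpack`, which mirrors the point that the rkey never comes directly out of the mapping call.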

Contributor

If I want to use it locally, do I really want to go through the pack/unpack flow? Shall we provide direct access to the rkey and keep pack/unpack as an option?

Contributor Author

The only reason I see to use it locally is for tests over loopback, and in that case we do go through pack/unpack. I see no reason to add another API just for that.
Also, forcing the user to do pack/unpack prevents the error of sending the rkey "as-is" to the remote peer.

@mellanox-github
Contributor

Merged build triggered.

@mellanox-github
Contributor

Merged build started.

@mellanox-github
Contributor

Merged build finished. Test FAILed.

@mellanox-github
Contributor

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com:8000/jenkins-secure/job/gh-ucx-pr/8/

Build Log
last 50 lines

[...truncated 1721 lines...]

[----------] 6 tests from test_config
[ RUN      ] test_config.parse_default
[       OK ] test_config.parse_default (0 ms)
[ RUN      ] test_config.parse_with_prefix
[       OK ] test_config.parse_with_prefix (0 ms)
[ RUN      ] test_config.clone
[       OK ] test_config.clone (0 ms)
[ RUN      ] test_config.set
[       OK ] test_config.set (0 ms)
[ RUN      ] test_config.performance
[       OK ] test_config.performance (2 ms)
[ RUN      ] test_config.dump
[       OK ] test_config.dump (0 ms)
[----------] 6 tests from test_config (2 ms total)

[----------] 1 test from test_component
[ RUN      ] test_component.init_cleanup
[       OK ] test_component.init_cleanup (1 ms)
[----------] 1 test from test_component (1 ms total)

[----------] 3 tests from test_uct
[ RUN      ] test_uct.query_resources
[       OK ] test_uct.query_resources (28 ms)
[ RUN      ] test_uct.open_iface
[       OK ] test_uct.open_iface (1 ms)
[ RUN      ] test_uct.connect_ep
uct/test_uct_context.cc:86: Failure
Error: No such device
terminate called after throwing an instance of 'ucs::test_abort_exception'
  what():  std::exception
make: *** [test] Aborted (core dumped)
make: Leaving directory `/scrap/jenkins/jenkins/jobs/gh-ucx-pr/workspace/test/gtest'
Build step 'Execute shell' marked build as failure
TAP Reports Processing: START
Looking for TAP results report in workspace using pattern: **/*.tap
Did not find any matching files.
[Valgrind] Files to copy:
[Valgrind] Analysing valgrind results
[Valgrind] Ending the valgrind analysis.
Anchor chain: could not read file with links: /var/lib/jenkins/jobs/gh-ucx-pr/workspace/jenkins_sidelinks.txt (No such file or directory)
[copy-to-slave] The build is taking place on the master node, no copy back to the master will take place.
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Request made to compress build log
Sending email to: [email protected] [email protected] [email protected]
[BFA] Scanning build for known causes...

[BFA] Done. 0s

Test FAILed.

goto err;
}

dev = uct_ib_iface_device(iface);
Contributor

What if NULL == dev?

Contributor Author

This is not an allocator function; it's a convenience inline function to get the IB device from the IB iface.
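
A small sketch of what such an accessor might look like, and why it has no failure path of its own. The `demo_*` structures and the `device` field are hypothetical; how the real `uct_ib_iface_device()` locates the device is not shown in this excerpt, so reading a stored pointer is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout for illustration; not the UCT structures. */
typedef struct demo_ib_device {
    const char *name;
} demo_ib_device_t;

typedef struct demo_ib_iface {
    demo_ib_device_t *device;   /* assumed to be set once when the iface is opened */
} demo_ib_iface_t;

/* Accessor only: it allocates nothing, so it cannot fail by itself. */
static inline demo_ib_device_t *
demo_ib_iface_device(const demo_ib_iface_t *iface)
{
    return iface->device;
}

/* Returns 0 when the accessor hands back the device the iface was opened on. */
int demo_accessor_check(void)
{
    demo_ib_device_t dev   = { "demo0" };
    demo_ib_iface_t  iface = { &dev };

    return (demo_ib_iface_device(&iface) == &dev) ? 0 : -1;
}
```

Under this assumption, a NULL check belongs where the iface is created, not at every call site of the accessor.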

Contributor

ok

@shamisp
Contributor

shamisp commented Nov 6, 2014

Once testing is happy, it is good to go.

* Add destructor error handling convention to CodeStyle
* Change return value of uct_ib_device_port_check() to ucs_status_t
@mellanox-github
Contributor

Merged build triggered.

@mellanox-github
Contributor

Merged build started.

@mellanox-github
Contributor

Merged build finished. Test FAILed.

@mellanox-github
Contributor

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com:8000/jenkins-secure/job/gh-ucx-pr/9/

Build Log
last 50 lines

[...truncated 1721 lines...]

[----------] 6 tests from test_config
[ RUN      ] test_config.parse_default
[       OK ] test_config.parse_default (0 ms)
[ RUN      ] test_config.parse_with_prefix
[       OK ] test_config.parse_with_prefix (0 ms)
[ RUN      ] test_config.clone
[       OK ] test_config.clone (0 ms)
[ RUN      ] test_config.set
[       OK ] test_config.set (0 ms)
[ RUN      ] test_config.performance
[       OK ] test_config.performance (2 ms)
[ RUN      ] test_config.dump
[       OK ] test_config.dump (0 ms)
[----------] 6 tests from test_config (2 ms total)

[----------] 1 test from test_component
[ RUN      ] test_component.init_cleanup
[       OK ] test_component.init_cleanup (0 ms)
[----------] 1 test from test_component (0 ms total)

[----------] 3 tests from test_uct
[ RUN      ] test_uct.query_resources
[       OK ] test_uct.query_resources (3 ms)
[ RUN      ] test_uct.open_iface
[       OK ] test_uct.open_iface (1 ms)
[ RUN      ] test_uct.connect_ep
uct/test_uct_context.cc:86: Failure
Error: No such device
terminate called after throwing an instance of 'ucs::test_abort_exception'
  what():  std::exception
make: *** [test] Aborted (core dumped)
make: Leaving directory `/scrap/jenkins/jenkins/jobs/gh-ucx-pr/workspace/test/gtest'
Build step 'Execute shell' marked build as failure
TAP Reports Processing: START
Looking for TAP results report in workspace using pattern: **/*.tap
Did not find any matching files.
[Valgrind] Files to copy:
[Valgrind] Analysing valgrind results
[Valgrind] Ending the valgrind analysis.
Anchor chain: could not read file with links: /var/lib/jenkins/jobs/gh-ucx-pr/workspace/jenkins_sidelinks.txt (No such file or directory)
[copy-to-slave] The build is taking place on the master node, no copy back to the master will take place.
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Request made to compress build log
Sending email to: [email protected] [email protected] [email protected]
[BFA] Scanning build for known causes...

[BFA] Done. 0s

Test FAILed.

@mellanox-github
Contributor

Merged build started.

@mellanox-github
Contributor

Merged build finished. Test FAILed.

@mellanox-github
Contributor

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com:8000/jenkins-secure/job/gh-ucx-pr/10/

Build Log
last 50 lines

[...truncated 924 lines...]
 /bin/mkdir -p '/scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/include/api'
 /usr/bin/install -c -m 644  ../../../src/uct/api/tl.h ../../../src/uct/api/uct_def.h ../../../src/uct/api/uct.h api/version.h '/scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/include/api'
libtool: install: warning: relinking `libuct.la'
libtool: install: (cd /scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_build/src/uct; /bin/sh /scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_build/libtool  --silent --tag CC --mode=relink gcc -O3 -g -Wall -Werror -ldl -version-info 2:1:0 -libverbs -o libuct.la -rpath /scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/lib tl/libuct_la-context.lo ib/base/libuct_la-ib_context.lo ib/base/libuct_la-ib_device.lo ib/base/libuct_la-ib_iface.lo ib/mlx5/libuct_la-ib_mlx5.lo ib/rc/libuct_la-rc_ep.lo ib/rc/libuct_la-rc_iface.lo ib/rc/libuct_la-rc_mlx5.lo -lm ../ucs/libucs.la -lz -lrt -lbfd -liberty -ldl )
libtool: install: /usr/bin/install -c .libs/libuct.so.2.0.1T /scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/lib/libuct.so.2.0.1
libtool: install: (cd /scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/lib && { ln -s -f libuct.so.2.0.1 libuct.so.2 || { rm -f libuct.so.2 && ln -s libuct.so.2.0.1 libuct.so.2; }; })
libtool: install: (cd /scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/lib && { ln -s -f libuct.so.2.0.1 libuct.so || { rm -f libuct.so && ln -s libuct.so.2.0.1 libuct.so; }; })
libtool: install: /usr/bin/install -c .libs/libuct.lai /scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/lib/libuct.la
libtool: install: /usr/bin/install -c .libs/libuct.a /scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/lib/libuct.a
libtool: install: chmod 644 /scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/lib/libuct.a
libtool: install: ranlib /scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/lib/libuct.a
libtool: finish: PATH="/hpc/local/bin/:/hpc/local/bin/:/hpc/local/bin:/hpc/local/bin/:/hpc/local/bin/:/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/ibutils/bin:/sbin" ldconfig -n /scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/lib
----------------------------------------------------------------------
Libraries have been installed in:
   /scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
make[3]: Leaving directory `/scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_build/src/uct'
make[2]: Leaving directory `/scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_build/src/uct'
Making install in test/perf
make[2]: Entering directory `/scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_build/test/perf'
make[3]: Entering directory `/scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_build/test/perf'
make[3]: Nothing to be done for `install-data-am'.
 /bin/mkdir -p '/scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/bin'
  /bin/sh ../../libtool   --mode=install /usr/bin/install -c ucx_perftest '/scrap/jenkins/jobs/gh-ucx-pr/workspace/ucx-0.1.29/_inst/bin'
Build was aborted
Aborted by Eugene
TAP Reports Processing: START
Looking for TAP results report in workspace using pattern: **/*.tap
Did not find any matching files.
Anchor chain: could not read file with links: /var/lib/jenkins/jobs/gh-ucx-pr/workspace/jenkins_sidelinks.txt (No such file or directory)
[copy-to-slave] The build is taking place on the master node, no copy back to the master will take place.
No emails were triggered.
[BFA] Scanning build for known causes...

[BFA] Done. 0s

Test FAILed.

@mellanox-github
Contributor

Merged build started.

@mellanox-github
Contributor

Merged build finished. Test FAILed.

@mellanox-github
Contributor

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com:8000/jenkins-secure/job/gh-ucx-pr/11/

Build Log
last 50 lines

[...truncated 2175 lines...]
Processing 2 C/C++ errors.

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Processed 2 C/C++ errors.
+ rc=2
+ cov_url=http://bgate.mellanox.com:8000/jenkins-secure/job/gh-ucx-pr//ws/cov_build_11/c/output/errors/index.html
+ rm -f jenkins_sidelinks.txt
+ echo 1..1
+ '[' 2 -gt 0 ']'
+ echo 'not ok 1 Coverity Detected 2 failures # http://bgate.mellanox.com:8000/jenkins-secure/job/gh-ucx-pr//ws/cov_build_11/c/output/errors/index.html'
+ echo Coverity report: http://bgate.mellanox.com:8000/jenkins-secure/job/gh-ucx-pr//ws/cov_build_11/c/output/errors/index.html
Coverity report: http://bgate.mellanox.com:8000/jenkins-secure/job/gh-ucx-pr//ws/cov_build_11/c/output/errors/index.html
+ printf '%s\t%s\n' Coverity http://bgate.mellanox.com:8000/jenkins-secure/job/gh-ucx-pr//ws/cov_build_11/c/output/errors/index.html
+ module unload tools/cov
++ /usr/bin/modulecmd bash unload tools/cov
+ eval LOADEDMODULES=tools/hpc ';export' 'LOADEDMODULES;PATH=/hpc/local/bin/:/hpc/local/bin/:/hpc/local/bin:/hpc/local/bin/:/hpc/local/bin/:/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/ibutils/bin' ';export' 'PATH;_LMFILES_=/hpc/local/etc/modulefiles/tools/hpc' ';export' '_LMFILES_;unset' 'COV_HOME;'
++ LOADEDMODULES=tools/hpc
++ export LOADEDMODULES
++ PATH=/hpc/local/bin/:/hpc/local/bin/:/hpc/local/bin:/hpc/local/bin/:/hpc/local/bin/:/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/ibutils/bin
++ export PATH
++ _LMFILES_=/hpc/local/etc/modulefiles/tools/hpc
++ export _LMFILES_
++ unset COV_HOME
+ exit 2
Build step 'Execute shell' marked build as failure
TAP Reports Processing: START
Looking for TAP results report in workspace using pattern: **/*.tap
Saving reports...
Processing '/var/lib/jenkins/jobs/gh-ucx-pr/builds/2014-11-09_11-34-50/tap-master-files/coverity.tap'
Parsing TAP test result [/var/lib/jenkins/jobs/gh-ucx-pr/builds/2014-11-09_11-34-50/tap-master-files/coverity.tap].
There are failed test cases and the job is configured to mark the build as failure. Marking build as FAILURE
TAP Reports Processing: FINISH
[Valgrind] Files to copy:
[Valgrind] test/gtest/valgrind.xml
[Valgrind] Copying test/gtest/valgrind.xml to /var/lib/jenkins/jobs/gh-ucx-pr/builds/2014-11-09_11-34-50/valgrind-plugin/valgrind-results/test/gtest/valgrind.xml
[Valgrind] Analysing valgrind results
[Valgrind] workspacePath: /var/lib/jenkins/jobs/gh-ucx-pr/workspace/
[Valgrind] Ending the valgrind analysis.
Coverity    http://bgate.mellanox.com:8000/jenkins-secure/job/gh-ucx-pr//ws/cov_build_11/c/output/errors/index.html
[copy-to-slave] The build is taking place on the master node, no copy back to the master will take place.
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Request made to compress build log
Sending email to: [email protected] [email protected] [email protected]
[BFA] Scanning build for known causes...

[BFA] Done. 0s

Test FAILed.

@mellanox-github
Contributor

Merged build triggered.

@mellanox-github
Contributor

Merged build started.

@mellanox-github
Contributor

Merged build finished. Test PASSed.

@mellanox-github
Contributor

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com:8000/jenkins-secure/job/gh-ucx-pr/25/
Test PASSed.

@shamisp
Contributor

shamisp commented Nov 9, 2014

👍

yosefe added a commit that referenced this pull request Nov 9, 2014
IB/RC: Initial implementation of RC transport.
@yosefe yosefe merged commit a4f304f into openucx:master Nov 9, 2014
@yosefe yosefe deleted the topic/add-rc-mlx5-transport branch November 9, 2014 17:59
alinask pushed a commit to alinask/ucx that referenced this pull request Aug 20, 2020
…nfig-max

UCP: set a hard coded value for the worker's ep_config_max value - 64
evgeny-leksikov pushed a commit to evgeny-leksikov/ucx that referenced this pull request Sep 24, 2021
…r-rndv

UCP/TAG/SEND: added eager/rndv flags to tag_send op