[ROCm] fix dcu error in device event base, test=develop #41521

Merged
merged 2 commits into from
Apr 8, 2022

Conversation

qili93
Contributor

@qili93 qili93 commented Apr 7, 2022

PR types

Bug fixes

PR changes

Others

Describe

Bug fix for the DCU device event code that went missing during code migration.

python -c "import paddle; paddle.utils.run_check()"
Running verify PaddlePaddle program ...
W0407 10:49:18.383044 76672 gpu_context.cc:244] Please NOTE: device: 0, GPU Compute Capability: 70.5, Driver API Version: 0.0, Runtime API Version: 22.4
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.7/site-packages/paddle/utils/install_check.py", line 266, in run_check
    _run_static_single(use_cuda, use_xpu, use_npu)
  File "/opt/conda/lib/python3.7/site-packages/paddle/utils/install_check.py", line 173, in _run_static_single
    fetch_list=[out.name, param_grads[1].name])
  File "/opt/conda/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1293, in run
    six.reraise(*sys.exc_info())
  File "/opt/conda/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1289, in run
    return_merged=return_merged)
  File "/opt/conda/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1458, in _run_impl
    return new_exe.run(list(feed.keys()), fetch_list, return_numpy)
  File "/opt/conda/lib/python3.7/site-packages/paddle/fluid/executor.py", line 541, in run
    tensors = self._new_exe.run(feed_names, fetch_list)._move_to_list()
RuntimeError: (Unavailable) event_creator_[1] shall not be nullptr.
  [Hint: event_creator_[type_id_] should not be null.] (at /workspace/Paddle/paddle/fluid/platform/device_event_base.h:70)
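
The failure mode in the traceback can be illustrated with a minimal Python sketch. It assumes the check at `device_event_base.h:70` guards a table of per-device-type creator functions indexed by a type id, which is what the error text suggests. All names below (`event_creator`, `register_event_creator`, `create_event`, and the `GPU = 1` id) are hypothetical stand-ins chosen to mirror the message, not Paddle's actual API.

```python
# Hypothetical model of a per-device-type creator table.
# Names and type ids are illustrative assumptions, not Paddle's real code.
MAX_DEVICE_TYPES = 4
CPU, GPU = 0, 1  # assumed ids; the error message only shows index 1

# One slot per device type; None plays the role of nullptr.
event_creator = [None] * MAX_DEVICE_TYPES


def register_event_creator(type_id, creator):
    """Store the creator callable for the given device type."""
    event_creator[type_id] = creator


def create_event(type_id):
    """Look up the creator and fail loudly if none was registered."""
    creator = event_creator[type_id]
    if creator is None:
        raise RuntimeError(
            f"(Unavailable) event_creator_[{type_id}] shall not be nullptr."
        )
    return creator()


register_event_creator(CPU, lambda: "cpu-event")

# No GPU registration yet: models a build where the device's
# registration code was left out.
try:
    create_event(GPU)
except RuntimeError as err:
    print(err)

# Once the registration is present, the lookup succeeds.
register_event_creator(GPU, lambda: "gpu-event")
print(create_event(GPU))
```

In this model the lookup only fails when a device type is used before anything has been registered for it, which is consistent with the PR description of event code missing for DCU builds.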

@paddle-bot-old

paddle-bot-old bot commented Apr 7, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the [Paddle CI Manual](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/git_guides/paddle_ci_manual_cn.html) for details.

@qili93 qili93 requested a review from Aurelius84 April 8, 2022 04:15
@qili93 qili93 merged commit 14dba63 into PaddlePaddle:develop Apr 8, 2022
@qili93 qili93 deleted the fix_dcu_dev branch April 8, 2022 04:23
douch pushed a commit to douch/Paddle that referenced this pull request Apr 10, 2022
…#41521)

* [ROCm] fix dcu error in device event base, test=develop

* fix, test=develop
Thunderbrook pushed a commit that referenced this pull request Apr 22, 2022
* [cherry-pick2.3]fix compile bug of windows cuda11.5 (#41464)

cherry-pick

fix compile bug of windows cuda11.5 #41433

* fix bug of missing boost when compile cache.cc (#41449)

[cherry-pick #41430] fix bug of random compile failure, due to incorrect compile order of dependencies

* Fix eager try catch (#41438) (#41477)

[Cherry-Pick]Fix eager try catch (#41438)

* Cherry-pick-PR41407, fix device_id bug for final_state op in multiprocess testcase (#41407) (#41475)

Cherry-pick PR #41407

* [BugFix] Add error hint for one_hot gpu version (#41335) (#41495)

* add one_hot gpu hint

* move allow_out_of_range judgement

* delete useless unittest

* fix bugs of reshape double grad infermeta (#41459) (#41493)

* [cherrypick-2.3] modify infer gpu memory strategy (#41427), remove cudnn_deterministic=True (#41341)  (#41491)

Co-authored-by: JingZhuangzhuang <[email protected]>

* [Cherry-pick][ROCm] fix dcu error in device event base, test=develop (#41523)

Cherry-pick of #41521

* [Cherry-Pick]Cherry pick PR41200, PR41474, PR41382 (#41509)

* Use `self` as a parameter of _hash_with_id function to avoid error caused by hash_id reuse (#41200)

* Add fill_constant_batch_size YAML and UT (#41474)

* Switch some dy2st UT to eager mode (#41382)

* Switch some dy2st UT to eager mode

* Fix test_lstm and remove test_transformer

* Run test_resnet_v2 in old dy mode

* Unittest recover (#41431)

* update name

* update name

* fix test

* fix fleet bind

* update name

* update name

* fix test

* fix gpups wrapper

* remove Push/Pull/Load/Save with context in client and wrapper base class

* fix

* fix

* remove some interface

* fix

* remove

* code style

* recover

* fix

* remove code unused

* remove some unused table & accessor & CommonDenseTable => MemoryDenseTable

* fix

* fix

* fix

* recover

* remove unused code

* recover unittest

* fix

* remove

* fix

* remove code unuseful

* remove

* fix

* recover

* remove

Co-authored-by: esythan <[email protected]>

* add ssd sparse table

* fix

* add cache shuffle

* fix

* fix

* fix

* fix

* fix

* fix

* add unit test

* fix

Co-authored-by: Zhou Wei <[email protected]>
Co-authored-by: Sing_chan <[email protected]>
Co-authored-by: 0x45f <[email protected]>
Co-authored-by: pangyoki <[email protected]>
Co-authored-by: Siming Dai <[email protected]>
Co-authored-by: YuanRisheng <[email protected]>
Co-authored-by: Zhang Jun <[email protected]>
Co-authored-by: JingZhuangzhuang <[email protected]>
Co-authored-by: Qi Li <[email protected]>
Co-authored-by: esythan <[email protected]>