
Running a model on an A10 GPU fails after a while with CUDA error(700), an illegal memory access was encountered; the machine shows no visible problem #1964

Closed
bdbaigc opened this issue Aug 18, 2023 · 1 comment

Comments


bdbaigc commented Aug 18, 2023

ERROR 2023-08-18 22:36:56,242 [operator.py:1079] [text_quality] failed to predict. (data_id=517045 log_id=517045) [text_quality|6] Failed to process(batch: [517045]): (External) CUDA error(700), an illegal memory access was encountered.
[Hint: Please search for the error code(700) on website (https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038) to get Nvidia's official solution and advice about CUDA Error.] (at /paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:252)
. Please check the input dict and checkout PipelineServingLogs/pipeline.log for more details.
INFO 2023-08-18 22:36:56,242 [operator.py:1454] prometheus inf count +1
ERROR 2023-08-18 22:36:56,247 [dag.py:420] (data_id=517045 log_id=0) Failed to predict: [text_quality] failed to predict. (data_id=517045 log_id=517045) [text_quality|6] Failed to process(batch: [517045]): (External) CUDA error(700), an illegal memory access was encountered.
[Hint: Please search for the error code(700) on website (https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038) to get Nvidia's official solution and advice about CUDA Error.] (at /paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:252)
. Please check the input dict and checkout PipelineServingLogs/pipeline.log for more details

nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A10                 On   | 00000000:5E:00.0 Off |                    0 |
|  0%   44C    P0    57W / 150W |   6606MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|

I can't see anything wrong here.
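
The hint in the log above points at PipelineServingLogs/pipeline.log. One way to check whether the failures cluster around particular requests is to pull the timestamp, data_id and CUDA error code out of each ERROR line. The following is only a minimal sketch: the regular expression is inferred from the two ERROR lines quoted above, not from any Paddle Serving documentation, and the helper names (find_cuda_errors, summarize, scan_file) are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the two sample ERROR lines in this issue only;
# other log layouts may need a different expression (assumption).
CUDA_ERR = re.compile(
    r"ERROR (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r"\[(?P<src>[\w.]+):\d+\].*?"
    r"\(data_id=(?P<data_id>\d+) log_id=\d+\).*?"
    r"CUDA error\((?P<code>\d+)\)"
)


def find_cuda_errors(lines):
    """Return (timestamp, source_file, data_id, cuda_code) per matching line."""
    hits = []
    for line in lines:
        m = CUDA_ERR.match(line)
        if m:
            hits.append((m["ts"], m["src"], int(m["data_id"]), int(m["code"])))
    return hits


def summarize(hits):
    """Count how many matching lines were seen per CUDA error code."""
    return Counter(code for _, _, _, code in hits)


def scan_file(path):
    """Hypothetical helper: scan one log file and return its CUDA error hits."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_cuda_errors(fh)


# Example call, assuming the log path named in the hint exists locally:
# for ts, src, data_id, code in scan_file("PipelineServingLogs/pipeline.log"):
#     print(ts, src, data_id, code)
```

If the same data_id values keep reappearing, the input is a reasonable first suspect; if they look unrelated, the timestamps at least show how long the service ran before each failure.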


xinj7 commented Oct 7, 2023

Same question here.

paddle-bot bot closed this as completed Oct 8, 2024