[Bug] [AutoScheduler] [OpenCL]: TVM enqueues too many kernel launch commands, causing CL_OUT_OF_HOST_MEMORY error #16276
Labels
needs-triage
type: bug
(For original problem report see https://discuss.tvm.apache.org/t/most-tasks-failed-with-autoscheduler-on-mali-g610-gpu/16139)
By default, TVM runs every task measurement for at least 1000 ms on non-CPU targets. For a fast task, this means the kernel is repeated a very large number of times. On the OpenCL target, each repetition is a kernel launch command enqueued by clEnqueueNDRangeKernel, and enqueuing that many commands causes an out-of-memory error (or sometimes the driver simply hangs, causing a timeout).
Lowering min_repeat_ms from the default (1000) works around the issue. However, there should also be a limit on the maximum repeat count for a single kernel.
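To give a feel for the numbers involved, here is a rough back-of-the-envelope sketch in plain Python. It is not TVM's actual timing loop: it simply assumes that the measurement keeps launching the kernel until min_repeat_ms is filled, and the 5 µs kernel time, the function names, and the max_launches cap are all hypothetical values chosen for illustration.

```python
import math


def launches_to_fill(min_repeat_ms, kernel_time_us):
    """Rough number of back-to-back launches needed to fill min_repeat_ms.

    Simplified model: assumes every launch takes exactly kernel_time_us.
    """
    return math.ceil(min_repeat_ms * 1000 / kernel_time_us)


def capped_launches(min_repeat_ms, kernel_time_us, max_launches):
    """Same estimate, clamped to a hypothetical per-kernel upper bound."""
    return min(launches_to_fill(min_repeat_ms, kernel_time_us), max_launches)


if __name__ == "__main__":
    kernel_time_us = 5  # assumed duration of a very fast kernel

    # Compare the default min_repeat_ms with a lowered value.
    for min_repeat_ms in (1000, 100):
        n = launches_to_fill(min_repeat_ms, kernel_time_us)
        print(f"min_repeat_ms={min_repeat_ms}: about {n} launches")

    # With a (hypothetical) cap, the queue depth stays bounded even at 1000 ms.
    print("capped:", capped_launches(1000, kernel_time_us, max_launches=10_000))
```

Under these assumptions the enqueued command count scales linearly with min_repeat_ms, which is why lowering it helps, while a cap would bound the count regardless of how fast the kernel is.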
Expected behavior
The AutoScheduler runs without problems.
Actual behavior
Every measurement results in this error:
InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-6: CL_OUT_OF_HOST_MEMORY
Environment
RK3588 SoC with Mali-G610 MP4 GPU
ARM vendor GPU driver, OpenCL 3.0
Debian 11
TVM master branch
Steps to reproduce
Triage
backend: opencl
tune: auto_scheduler
cc @echuraev @elvin-n