[Bug] [AutoScheduler] [OpenCL]: TVM enqueues too many kernel launch command causing CL_OUT_OF_HOST_MEMORY error #16276

happyme531 · 2023-12-25T13:03:35Z

(For the original problem report, see https://discuss.tvm.apache.org/t/most-tasks-failed-with-autoscheduler-on-mali-g610-gpu/16139)

By default, TVM runs every task measurement for at least 1000 ms on non-CPU targets. For very fast tasks, this means the kernel is repeated a very large number of times. On the OpenCL target, each repetition is a separate kernel launch enqueued with clEnqueueNDRangeKernel, so too many launch commands pile up and cause an out-of-memory error (or sometimes the driver simply hangs, which shows up as a timeout).

Lowering min_repeat_ms from its default (1000) avoids the issue. However, there should also be a limit on the maximum repeat count for a single kernel.
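To make the scale concrete, here is a rough Python model of how a measurement loop that must fill min_repeat_ms grows the number of back-to-back launches per repeat, together with the kind of per-kernel cap proposed above. This is a simplified sketch, not TVM's code: the growth rule (jump to min_repeat_ms divided by the per-launch time plus one, or multiply by about 1.618, whichever is larger) is an assumption loosely modeled on TVM's time evaluator, and `estimate_launches` and `max_number` are hypothetical names.

```python
GOLDEN_RATIO = 1.618  # assumed minimum growth factor between attempts


def estimate_launches(kernel_time_ms, min_repeat_ms=1000.0, max_number=None):
    """Estimate how many kernel launches one repeat enqueues back to back.

    Simplified model (assumption): each attempt runs `number` launches; if the
    total is shorter than min_repeat_ms, `number` grows and the attempt is retried.
    `max_number` is a hypothetical cap on launches per repeat.
    """
    if kernel_time_ms <= 0:
        raise ValueError("kernel_time_ms must be positive")
    number = 1
    while True:
        duration_ms = kernel_time_ms * number
        if duration_ms >= min_repeat_ms:
            return number  # long enough: this many launches get enqueued
        if max_number is not None and number >= max_number:
            return number  # cap reached: accept a shorter repeat instead
        # Jump to the count expected to fill min_repeat_ms,
        # but always grow by at least the golden ratio.
        grown = max(min_repeat_ms / kernel_time_ms + 1, number * GOLDEN_RATIO)
        number = int(grown)
        if max_number is not None:
            number = min(number, max_number)


if __name__ == "__main__":
    # A hypothetical 0.01 ms kernel measured with the 1000 ms default.
    print(estimate_launches(0.01))                     # roughly 100,000 launches
    print(estimate_launches(0.01, min_repeat_ms=100))  # roughly 10,000 launches
    print(estimate_launches(0.01, max_number=1000))    # capped at 1000
```

Under this model a cap simply lets a very fast kernel's repeat cover less than min_repeat_ms of wall time instead of flooding the OpenCL command queue; the likely trade-off is noisier timings for very short kernels.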

Expected behavior

The AutoScheduler runs without problems.

Actual behavior

Every measurement fails with this error: InternalError: Check failed: (e == CL_SUCCESS) is false: OpenCL Error, code=-6: CL_OUT_OF_HOST_MEMORY

Environment

RK3588 SoC with Mali-G610 MP4 GPU
ARM vendor GPU driver, OpenCL 3.0
Debian 11
TVM master branch

Steps to reproduce

Try the lightglue model from the original forum post, or run the script below:

```python
import tvm
from tvm import auto_scheduler, te


# simple add operator, runs very fast
@auto_scheduler.register_workload
def simple_add(N, A, B):
    a = te.placeholder((N,), name="a")
    b = te.placeholder((N,), name="b")
    c = te.compute(a.shape, lambda i: a[i] + b[i], name="c")
    return [a, b, c]


if __name__ == '__main__':
    target = tvm.target.Target(target='opencl', host='llvm')
    task = auto_scheduler.SearchTask(func=simple_add, args=(1024, 1024, 1024), target=target)
    tune_options = auto_scheduler.TuningOptions(
        num_measure_trials=64,
        num_measures_per_round=16,
        verbose=5,
    )
    task.tune(tune_options)
```

Triage

backend: opencl
tune: auto_scheduler

cc @echuraev @elvin-n


happyme531 added the needs-triage and type: bug labels on Dec 25, 2023.

happyme531 mentioned this issue on Feb 6, 2024 in: [Bug] [OpenCL] [Mali]: Models with too many nodes failed to profile/run with CL_OUT_OF_HOST_MEMORY error #16530 (open)
