(Inf1) Neuron Compilation OOM when model's weight changes #1064

takipipo · 2024-12-11T04:10:02Z

Description

I can compile the pretrained COCO detection weights from Ultralytics (e.g. yolov8l.pt, yolov8x.pt). However, when I load the weights from the Download link at https://github.com/WildChlamydia/MiVOLO?tab=readme-ov-file#demo, the model fails to compile to Neuron because the compiler runs out of memory (OOM).

Environments

pip list

Package                   Version
------------------------- ------------------
absl-py                   2.1.0
aiohappyeyeballs          2.4.3
aiohttp                   3.10.10
aiosignal                 1.3.1
amqp                      5.2.0
annotated-types           0.7.0
ansicolors                1.1.8
antlr4-python3-runtime    4.9.3
anyio                     4.6.2.post1
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
astor                     0.8.1
astroid                   3.3.5
asttokens                 2.4.1
async-lru                 2.0.4
async-timeout             4.0.3
attrs                     24.2.0
Automat                   24.8.1
awscli                    1.35.14
babel                     2.16.0
beautifulsoup4            4.12.3
billiard                  4.2.1
bleach                    6.1.0
boto3                     1.35.48
botocore                  1.35.48
build                     1.2.2.post1
celery                    5.4.0
certifi                   2024.8.30
cffi                      1.17.1
charset-normalizer        3.4.0
click                     8.1.7
click-didyoumean          0.3.1
click-plugins             1.1.1
click-repl                0.3.0
cloudpickle               3.1.0
cmake                     3.30.5
colorama                  0.4.6
comm                      0.2.2
constantly                23.10.4
contourpy                 1.3.0
cryptography              43.0.3
cssselect                 1.2.0
cycler                    0.12.1
dask                      2024.10.0
debugpy                   1.8.7
decorator                 5.1.1
defusedxml                0.7.1
dill                      0.3.9
distlib                   0.3.9
dmlc-nnvm                 1.19.6.0+0
dmlc-topi                 1.19.6.0+0
dmlc-tvm                  1.19.6.0+0
docutils                  0.16
dparse                    0.6.3
entrypoints               0.4
environment-kernels       1.2.0
exceptiongroup            1.2.2
executing                 2.1.0
fastapi                   0.115.3
fastjsonschema            2.20.0
filelock                  3.16.1
fonttools                 4.54.1
fqdn                      1.5.1
frozenlist                1.5.0
fsspec                    2024.10.0
gast                      0.2.2
google-pasta              0.2.0
grpcio                    1.67.0
h11                       0.14.0
h5py                      3.6.0
httpcore                  1.0.6
httpie                    3.2.3
httpx                     0.27.2
hyperlink                 21.0.0
idna                      3.10
imageio                   2.36.0
importlib_metadata        8.5.0
incremental               24.7.2
inferentia-hwm            1.17.6.0+fbcd6c853
iniconfig                 2.0.0
ipykernel                 6.29.5
ipython                   8.28.0
ipywidgets                8.1.5
islpy                     2023.1
isoduration               20.11.0
isort                     5.13.2
itemadapter               0.9.0
itemloaders               1.3.2
jedi                      0.19.1
Jinja2                    3.1.4
jmespath                  1.0.1
joblib                    1.4.2
json5                     0.9.25
jsonpointer               3.0.0
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
jupyter                   1.1.1
jupyter_client            8.6.3
jupyter-console           6.6.3
jupyter_core              5.7.2
jupyter-events            0.10.0
jupyter-lsp               2.2.5
jupyter_server            2.14.2
jupyter_server_terminals  0.5.3
jupyterlab                4.2.5
jupyterlab_pygments       0.3.0
jupyterlab_server         2.27.3
jupyterlab_widgets        3.0.13
Keras-Applications        1.0.8
Keras-Preprocessing       1.1.2
kiwisolver                1.4.7
kombu                     5.4.2
llvmlite                  0.43.0
locket                    1.0.0
lxml                      5.3.0
Markdown                  3.7
markdown-it-py            3.0.0
MarkupSafe                3.0.2
matplotlib                3.9.2
matplotlib-inline         0.1.7
mccabe                    0.7.0
mdurl                     0.1.2
mistune                   3.0.2
multidict                 6.1.0
nbclient                  0.10.0
nbconvert                 7.16.4
nbformat                  5.10.4
nest-asyncio              1.6.0
networkx                  2.6.3
neuron-cc                 1.24.0.0+d58fa6134
notebook                  7.2.2
notebook_shim             0.2.4
numba                     0.60.0
numpy                     1.23.4
nvidia-cublas-cu11        11.10.3.66
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11         8.5.0.96
omegaconf                 2.3.0
opencv-python             4.10.0.84
opt_einsum                3.4.0
overrides                 7.7.0
packaging                 21.3
pandas                    2.2.3
pandocfilters             1.5.1
papermill                 2.6.0
parsel                    1.9.1
parso                     0.8.4
partd                     1.4.2
pexpect                   4.9.0
pillow                    11.0.0
pip                       24.2
pip-tools                 7.4.1
pipenv                    2024.2.0
platformdirs              4.3.6
plotly                    5.24.1
pluggy                    1.5.0
prometheus_client         0.21.0
prompt_toolkit            3.0.48
propcache                 0.2.0
Protego                   0.3.1
protobuf                  3.20.1
psutil                    6.1.0
ptyprocess                0.7.0
pure_eval                 0.2.3
py-cpuinfo                9.0.0
pyasn1                    0.6.1
pyasn1_modules            0.4.1
pycparser                 2.22
pydantic                  2.9.2
pydantic_core             2.23.4
PyDispatcher              2.0.7
Pygments                  2.18.0
pylint                    3.3.1
pyOpenSSL                 24.2.1
pyparsing                 3.2.0
pyproject_hooks           1.2.0
PySocks                   1.7.1
pytest                    8.3.3
python-dateutil           2.9.0.post0
python-json-logger        2.0.7
pytz                      2024.2
PyYAML                    6.0.2
pyzmq                     26.2.0
queuelib                  1.7.0
referencing               0.35.1
requests                  2.31.0
requests-file             2.1.0
requests-toolbelt         1.0.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rich                      13.9.3
rpds-py                   0.20.0
rsa                       4.7.2
ruamel.yaml               0.18.6
ruamel.yaml.clib          0.2.12
s3transfer                0.10.3
safety                    2.3.5
scikit-learn              1.5.2
scipy                     1.11.4
Scrapy                    2.11.2
seaborn                   0.13.2
Send2Trash                1.8.3
service-identity          24.1.0
setuptools                69.5.1
shap                      0.46.0
six                       1.16.0
slicer                    0.0.8
sniffio                   1.3.1
soupsieve                 2.6
stack-data                0.6.3
starlette                 0.41.0
tenacity                  9.0.0
tensorboard               1.15.0
tensorflow                1.15.5.post1
tensorflow-estimator      1.15.1
termcolor                 2.5.0
terminado                 0.18.1
threadpoolctl             3.5.0
tinycss2                  1.4.0
tldextract                5.1.2
tomli                     2.0.2
tomlkit                   0.13.2
toolz                     1.0.0
torch                     1.13.1
torch-neuron              1.13.1.2.11.7.0
torchvision               0.14.1
tornado                   6.4.1
tqdm                      4.66.5
traitlets                 5.14.3
Twisted                   24.7.0
types-python-dateutil     2.9.0.20241003
typing_extensions         4.12.2
tzdata                    2024.2
ultralytics               8.2.48
ultralytics-thop          2.0.12
uri-template              1.3.0
urllib3                   2.2.3
vine                      5.1.0
virtualenv                20.27.0
w3lib                     2.2.1
wcwidth                   0.2.13
webcolors                 24.8.0
webencodings              0.5.1
websocket-client          1.8.0
Werkzeug                  3.0.5
wget                      3.2
wheel                     0.44.0
widgetsnbextension        4.0.13
wrapt                     1.16.0
yarl                      1.16.0
zipp                      3.20.2
zope.interface            7.1.1

neuron-cc -V

Neuron Compiler version 1.24.0.0+d58fa6134

HWM version 1.17.6.0-fbcd6c853
NEFF version Dynamic
TVM version 1.19.6.0+0
NumPy version 1.23.4
MXNet not available
TF not available

Log Output from Neuron Compiler

(aws_neuron_venv_pytorch_1_13_inf1) root@ip-10-104-110-148:/var/snap/amazon-ssm-agent/6312/ultralytics# ipython

Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.28.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ultralytics import NeuronYOLO
   ...: model = NeuronYOLO("yolov8x_person_face.pt")
   ...: model.export(format = "neuron")
   ...: 
Ultralytics YOLOv8.2.48 🚀 Python-3.10.12 torch-1.13.1+cu117 CPU (Intel Xeon Platinum 8275CL 3.00GHz)
Model summary (fused): 268 layers, 68125494 parameters, 0 gradients, 257.4 GFLOPs

PyTorch: starting from 'yolov8x_person_face.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 6, 8400) (130.4 MB)

AWS Neuron: starting export with torch 1.13.1.2.11.7.0...
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 278, fused = 278, percent fused = 100.0%
/opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/dask/dataframe/__init__.py:42: FutureWarning: 
Dask dataframe query planning is disabled because dask-expr is not installed.

You can install it with `pip install dask[dataframe]` or `conda install dask`.
This will raise in a future version.

  warnings.warn(msg, FutureWarning)
INFO:Neuron:Compiling function _NeuronGraph$1070 with neuron-cc
INFO:Neuron:Compiling with command line: '/opt/aws_neuron_venv_pytorch_1_13_inf1/bin/neuron-cc compile /tmp/tmp5ldpdpcf/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp5ldpdpcf/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 640, 640], "float32"]}, "outputs": ["Detect_74/aten_cat_5/concat:0"]} --verbose 35'
............................WARNING:Neuron:The neuron-cc (neuron compiler) process was killed (SIG_KILL).  This typically happens when there is insufficient memory to compile and the linux Out Of Memory (OOM) killer terminates the compiler.  Consider trying compilation on an instance with more memory
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$1070; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/opt/aws_neuron_venv_pytorch_1_13_inf1/bin/neuron-cc compile /tmp/tmp5ldpdpcf/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp5ldpdpcf/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 640, 640], "float32"]}, "outputs": ["Detect_74/aten_cat_5/concat:0"]}' --verbose 35
Traceback (most recent call last):
  File "/opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch_neuron/convert.py", line 413, in op_converter
    neuron_function = self.subgraph_compiler(
  File "/opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch_neuron/decorators.py", line 263, in trace
    raise subprocess.SubprocessError(
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/opt/aws_neuron_venv_pytorch_1_13_inf1/bin/neuron-cc compile /tmp/tmp5ldpdpcf/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp5ldpdpcf/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 640, 640], "float32"]}, "outputs": ["Detect_74/aten_cat_5/concat:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 278, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 7 [supported]
INFO:Neuron: => aten::_convolution: 104 [supported]
INFO:Neuron: => aten::add: 20 [supported]
INFO:Neuron: => aten::cat: 19 [supported]
INFO:Neuron: => aten::chunk: 1 [supported]
INFO:Neuron: => aten::div: 1 [supported]
INFO:Neuron: => aten::max_pool2d: 3 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::sigmoid: 1 [supported]
INFO:Neuron: => aten::silu_: 97 [supported]
INFO:Neuron: => aten::size: 3 [supported]
INFO:Neuron: => aten::softmax: 1 [supported]
INFO:Neuron: => aten::split_with_sizes: 9 [supported]
INFO:Neuron: => aten::sub: 2 [supported]
INFO:Neuron: => aten::transpose: 1 [supported]
INFO:Neuron: => aten::unsqueeze: 1 [supported]
INFO:Neuron: => aten::upsample_nearest2d: 2 [supported]
INFO:Neuron: => aten::view: 5 [supported]
AWS Neuron: export failure ❌ 644.2s: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[1], line 3
      1 from ultralytics import NeuronYOLO
      2 model = NeuronYOLO("yolov8x_person_face.pt")
----> 3 model.export(format = "neuron")

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/neuron_model.py:55, in NeuronModel.export(self, **kwargs)
     43 custom = {
     44     "imgsz": self.model.args["imgsz"],
     45     "batch": 1,
     46     "data": None,
     47     "verbose": False,
     48 }  # method defaults
     49 args = {
     50     **self.overrides,
     51     **custom,
     52     **kwargs,
     53     "mode": "export",
     54 }  # highest priority args on the right
---> 55 return NeuronExporter(overrides=args, _callbacks=self.callbacks)(model=self.model)

File /opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/neuron_exporter.py:319, in NeuronExporter.__call__(self, model)
    317     f[12], _ = self.export_neuronx()
    318 if neuron:  # Neuron
--> 319     f[13], _ = self.export_neuron()
    321 # Finish
    322 f = [str(x) for x in f if x]  # filter out '' and None

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/neuron_exporter.py:130, in try_export.<locals>.outer_func(*args, **kwargs)
    128 except Exception as e:
    129     LOGGER.info(f"{prefix} export failure ❌ {dt.t:.1f}s: {e}")
--> 130     raise e

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/neuron_exporter.py:125, in try_export.<locals>.outer_func(*args, **kwargs)
    123 try:
    124     with Profile() as dt:
--> 125         f, model = inner_func(*args, **kwargs)
    126     LOGGER.info(f"{prefix} export success ✅ {dt.t:.1f}s, saved as '{f}' ({file_size(f):.1f} MB)")
    127     return f, model

File /var/snap/amazon-ssm-agent/6312/ultralytics/ultralytics/engine/neuron_exporter.py:372, in NeuronExporter.export_neuron(self, prefix)
    370 LOGGER.info(f"\n{prefix} starting export with torch {torch_neuron.__version__}...")
    371 f = self.file.with_suffix(".neuron")
--> 372 ts = torch_neuron.trace(self.model, self.im, strict=False)
    373 extra_files = {"config.txt": json.dumps(self.metadata)}
    374 ts.save(str(f), _extra_files=extra_files)

File /opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch_neuron/convert.py:217, in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, single_fusion_ratio_threshold, _neuron_trace, compiler_args, optimizations, separate_weights, verbose, **kwargs)
    215     logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
    216     neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
--> 217 cu.stats_post_compiler(neuron_graph)
    219 # Wrap the compiled version of the model in a script module. Note that this is
    220 # necessary for torch==1.8.1 due to the usage of `torch.classes.model.Model`. The
    221 # custom class must be a submodule of the traced graph.
    222 neuron_graph = AwsNeuronGraphModule(neuron_graph)

File /opt/aws_neuron_venv_pytorch_1_13_inf1/lib/python3.10/site-packages/torch_neuron/convert.py:530, in CompilationUnit.stats_post_compiler(self, neuron_graph)
    526             logger.info(' => {}: {} {}'.format(
    527                 name, remaining_count, supported_string))
    529 if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
--> 530     raise RuntimeError(
    531         "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
    533 if percent_operations_compiled < 50.0:
    534     logger.warning(
    535         "torch.neuron.trace was unable to compile > 50% of the operators in the compiled model!")

RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
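The PyTorch line of the log reports an output shape of (1, 6, 8400). Assuming the standard YOLOv8 Detect head (strides 8, 16 and 32, with 4 box values plus one score per class per prediction), that shape corresponds to a 2-class model. This fits the person/face checkpoint name, whereas the COCO checkpoints that do compile have 80 classes. A small sketch of the arithmetic (the function name is hypothetical, not an Ultralytics API):

```python
def yolov8_detect_output_shape(imgsz, num_classes, batch=1, strides=(8, 16, 32)):
    """Expected (batch, channels, predictions) of a YOLOv8 Detect head for a square input."""
    # One prediction per cell of each feature map: 640/8=80, 640/16=40, 640/32=20.
    num_preds = sum((imgsz // s) ** 2 for s in strides)
    # Each prediction carries 4 box values followed by one score per class.
    return (batch, 4 + num_classes, num_preds)


print(yolov8_detect_output_shape(640, 2))   # shape reported in this log
print(yolov8_detect_output_shape(640, 80))  # shape of an 80-class COCO head
```

This accounts only for the width of the final Detect layer; it does not by itself explain why neuron-cc runs out of memory.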

How to Reproduce

Start an EC2 c5.2xlarge instance with AMI ami-09c4564a5c7fa27d5.
Install the libraries required to compile the model:

source /opt/aws_neuron_venv_pytorch_1_13_inf1/bin/activate
git clone https://github.com/wisesight/ultralytics.git
cd ultralytics
git checkout v8.2.48-aws-neuron
pip install .
pip install numpy==1.23.4
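To confirm that a fresh environment matches the one above before reproducing, the key versions can be compared programmatically. This is a minimal sketch using the standard-library `importlib.metadata`; the helper name is hypothetical, and the expected versions are copied from the pip list in this issue.

```python
from importlib import metadata


def version_mismatches(expected):
    """Return {package: (installed_or_None, expected)} for every package that differs."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed at all
        if have != want:
            mismatches[name] = (have, want)
    return mismatches


# Versions taken from the pip list in this issue.
EXPECTED = {
    "numpy": "1.23.4",
    "torch": "1.13.1",
    "torch-neuron": "1.13.1.2.11.7.0",
    "neuron-cc": "1.24.0.0+d58fa6134",
    "ultralytics": "8.2.48",
}

if __name__ == "__main__":
    for name, (have, want) in version_mismatches(EXPECTED).items():
        print(f"{name}: installed {have}, expected {want}")
```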

Compile the model:

from ultralytics import NeuronYOLO
model = NeuronYOLO("yolov8x_person_face.pt")
model.export(format = "neuron")
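To isolate the failing step, the neuron-cc call shown in the log can be rerun on its own. The `graph_def.pb` in `/tmp/tmp5ldpdpcf` is a temporary file, so this assumes it has been preserved somehow, for example via the `compiler_workdir` argument that appears in the `torch_neuron.trace` signature in the traceback (whether that keeps the graph is an assumption here). The sketch below only rebuilds the argument list with the exact flags from the log; the helper name and the `workdir/` paths are hypothetical.

```python
import json
import shlex


def neuron_cc_argv(graph_def, output_neff, input_shape, output_names,
                   dtype="float32", verbose=35, neuron_cc="neuron-cc"):
    """Rebuild the neuron-cc argument list printed by torch_neuron in the log."""
    io_config = {
        "inputs": {"0:0": [list(input_shape), dtype]},
        "outputs": list(output_names),
    }
    return [
        neuron_cc, "compile", graph_def,
        "--framework", "TENSORFLOW",
        "--pipeline", "compile", "SaveTemps",
        "--output", output_neff,
        "--io-config", json.dumps(io_config),
        "--verbose", str(verbose),
    ]


argv = neuron_cc_argv(
    "workdir/graph_def.pb",    # hypothetical location of a preserved graph
    "workdir/graph_def.neff",  # hypothetical output path
    (1, 3, 640, 640),
    ["Detect_74/aten_cat_5/concat:0"],
)
print(shlex.join(argv))  # shell-quoted, in the same form as the logged command line
```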

What I've Tried

I tried compiling on an instance with 64 GB of memory, but compilation still failed.
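The log only says that neuron-cc was terminated with SIG_KILL, so it does not show how much memory the compiler actually reached. A minimal sketch for measuring that when rerunning the compiler command by hand, assuming Linux (where `ru_maxrss` is reported in kilobytes; other platforms use different units). The helper and script names are hypothetical.

```python
import resource
import signal
import subprocess
import sys


def run_with_memory_report(argv):
    """Run argv to completion; return (returncode, killed_by_sigkill, peak_rss_mib)."""
    result = subprocess.run(argv)
    # On POSIX a negative return code -N means the child was terminated by signal N.
    killed = result.returncode == -signal.SIGKILL
    # On Linux ru_maxrss is in kilobytes and is the maximum over all waited-for
    # children of this process, not necessarily only the last one.
    peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return result.returncode, killed, peak_kib / 1024


if __name__ == "__main__" and len(sys.argv) > 1:
    code, killed, peak = run_with_memory_report(sys.argv[1:])
    print(f"exit={code} sigkill={killed} peak_rss={peak:.0f} MiB")
```

Usage would look like `python measure_rss.py neuron-cc compile ...` with the full command line from the log. If the peak approaches the instance's total memory on both the 16 GB c5.2xlarge and the 64 GB instance, that would point to the compiler's memory use growing with this checkpoint rather than to a fixed limit.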


FThompsonAWS · 2024-12-13T03:03:32Z

Thanks @takipipo for filing this issue. We will take a look and get back to you.

FThompsonAWS added the Inf1 label Dec 12, 2024
