
Race condition writing and loading MLModel in Coremltools 8.1 #2404

Open
kasper0406 opened this issue Nov 21, 2024 · 2 comments
Labels
bug Unexpected behaviour that should be corrected (type)


kasper0406 (Contributor) commented Nov 21, 2024

With a Python debugger attached, I have started seeing a race condition when loading the converted MLModel.
I have not seen this happen without a debugger attached, and I never hit it before the 8.1 release.
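Because the problem only shows up with a debugger attached, it can help to confirm programmatically that one is active while reproducing. This is a heuristic sketch, not a coremltools or pydevd API: `debugger_attached` is a hypothetical helper that checks for a classic trace function and, on Python 3.12+, for a tool registered under `sys.monitoring.DEBUGGER_ID`. That pydevd claims that slot is an assumption, suggested only by the `_pydevd_sys_monitoring` frames in the traceback further down.

```python
import sys


def debugger_attached() -> bool:
    """Heuristically report whether a debugger is active (hypothetical helper)."""
    # Classic debuggers (pdb, older pydevd) install a trace function.
    if sys.gettrace() is not None:
        return True
    # On Python 3.12+, debuggers may use sys.monitoring instead of a trace
    # function. Assumption: they claim the reserved DEBUGGER_ID tool slot.
    monitoring = getattr(sys, "monitoring", None)
    if monitoring is not None:
        return monitoring.get_tool(monitoring.DEBUGGER_ID) is not None
    return False


if __name__ == "__main__":
    print("debugger attached:", debugger_attached())
```

A false result is not proof that no debugger is present; a tool could use a different `sys.monitoring` slot.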

Specifically, it happens in the `load_spec` function in `coremltools/models/utils.py` (note that I modified the code to retry in a loop):

[Screenshot 2024-11-20 at 21:58:27: `load_spec` in `coremltools/models/utils.py`, modified with a retry loop]
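The modified `load_spec` is only visible in the screenshot. A minimal sketch of that kind of retry, assuming the file being waited on is `Manifest.json` at the root of the `.mlpackage` directory: `wait_for_file` is a hypothetical helper, not part of coremltools, and the default timeout and poll interval are arbitrary.

```python
import os
import time


def wait_for_file(path: str, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll until `path` exists or `timeout` seconds elapse (hypothetical helper).

    Returns True if the file appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        # time.sleep releases the GIL, so other Python threads can make progress.
        time.sleep(interval)
```

For example, `wait_for_file(os.path.join(package_path, "Manifest.json"))` could be called before loading, where `package_path` is assumed to point at the package directory. This only works around the race; it does not fix whatever writes the manifest late.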

Looking at the file system, I can indeed see that the file (Manifest.json) isn't there:

[Screenshot 2024-11-20 at 21:59:26: package directory listing without Manifest.json]

However, if I run one iteration of the loop (i.e., execute the loading code and release the Python GIL from the thread), the Manifest.json file gets created:

[Screenshot 2024-11-20 at 22:01:01: package directory listing after one loop iteration, now including Manifest.json]

Note that the timestamp on Manifest.json is 5 minutes later. On the next iteration of the loop, the model usually loads.

I also noticed that if I step through the loop I added using a debugger, I get the following error:

```
an integer is required
  File "/Users/knielsen/Library/Application Support/hatch/env/virtual/stablehlo-coreml-experimental/Ux-esJH6/test.py3.12/lib/python3.12/site-packages/_pydevd_sys_monitoring\\_pydevd_sys_monitoring_cython.pyx", line 1367, in _pydevd_sys_monitoring_cython._jump_event
  File "<stringsource>", line 69, in cfunc.to_py.__Pyx_CFunc_7f6725__29_pydevd_sys_monitoring_cython_object__lParen__etc_to_py_4code_11from_offset_9to_offset.wrap
  File "/Users/knielsen/Library/Application Support/hatch/env/virtual/stablehlo-coreml-experimental/Ux-esJH6/test.py3.12/lib/python3.12/site-packages/coremltools/models/utils.py", line 256, in load_spec
    try:
  File "/Users/knielsen/Library/Application Support/hatch/env/virtual/stablehlo-coreml-experimental/Ux-esJH6/test.py3.12/lib/python3.12/site-packages/coremltools/models/model.py", line 531, in _get_proxy_and_spec
    specification = _load_spec(filename)
                    ^^^^^^^^^^^^^^^^^^^^
  File "/Users/knielsen/Library/Application Support/hatch/env/virtual/stablehlo-coreml-experimental/Ux-esJH6/test.py3.12/lib/python3.12/site-packages/coremltools/models/model.py", line 469, in __init__
    self.__proxy__, self._spec, self._framework_error = self._get_proxy_and_spec(
```

Program to reproduce:

Run the following code with a Python debugger attached using coremltools 8.1:

```python
import coremltools as ct
import numpy as np
from coremltools.converters.mil import Builder as mb

@mb.program(input_specs=[
    mb.TensorSpec(shape=(2, 3, 4, 5)),
    mb.TensorSpec(shape=(2, 4, 3, 5)),
])
def mil_program(arg0, arg1):
    arg0_reshaped = mb.reshape(x=arg0, shape=(1, 120))
    arg1_reshaped = mb.reshape(x=arg1, shape=(1, 120))
    result = mb.matmul(x=arg0_reshaped, y=arg1_reshaped, transpose_x=False, transpose_y=True)
    result = mb.reshape(x=result, shape=(1,))
    return result

cml_model = ct.convert(
    mil_program,
    source="milinternal",
    minimum_deployment_target=ct.target.iOS18,
)

inputs = {
    "arg0": np.random.normal(0.0, 1.0, (2, 3, 4, 5)),
    "arg1": np.random.normal(0.0, 1.0, (2, 4, 3, 5)),
}
predictions = cml_model.predict(inputs)
print(predictions)
```

I briefly tried to find the bug myself, but the diff for the 8.1 release (#2394) is huge, so tracking it down would take more effort than I want to spend.

kasper0406 added the bug (Unexpected behaviour that should be corrected (type)) label Nov 21, 2024
jakesabathia2 (Collaborator) commented:

@cymbalrush you might have more context on this issue ^

cymbalrush (Collaborator) commented:

Investigating

cymbalrush self-assigned this Dec 13, 2024