
1.12.1 incompatible with c++ built for 1.12.0 and vice versa #88301

Open
charliebudd opened this issue Nov 2, 2022 · 9 comments
Labels
module: cpp (Related to C++ API), oncall: jit (Add this issue/PR to JIT oncall triage queue)

Comments

@charliebudd

charliebudd commented Nov 2, 2022

🐛 Describe the bug

I have a C++/CUDA extension which I compile and distribute via PyPI. My build matrix covers all "recent" PyTorch major and minor versions (as well as Python and CUDA versions; if anybody is interested, I host my build system as a GitHub workflow here). So far this has been enough to ensure my code runs, and I have not had to take the patch version into account. However, I have recently hit an issue (described below) with 1.12.1. My understanding is that only critical bug fixes should be released in patches. If this is not the case, then distributing built C++ PyTorch extensions will be even harder than it currently is.

The error occurs when passing a JIT model down to C++. Before 1.12.1, the _c attribute on a JIT-scripted model returned a torch.jit.ScriptModule, which would cast to a torch::jit::Module when passed to a C++ function. However, in 1.12.1, the _c attribute returns a torch.ScriptModule, which no longer casts in the same way, causing an exception to be thrown.

Minimal code example below. To reproduce, compile the C++ for torch 1.12.0 and run the Python script with torch 1.12.1, or vice versa. When the versions match the code runs fine; otherwise an error is raised.

```python
import torch
import myextension

model = torch.jit.load("model.pt")
myextension.test(model._c)
```
```cpp
#include <torch/extension.h>

void test_function(torch::jit::Module model)
{
    py::print("Test function called");
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m)
{
    m.def("test", &test_function);
}
```

See forum discussion on this issue here.

Versions

torch 1.12.0 and 1.12.1

cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel @jbschlosser

@malfet malfet added triage review module: cpp Related to C++ API oncall: jit Add this issue/PR to JIT oncall triage queue labels Nov 2, 2022
@malfet
Contributor

malfet commented Nov 2, 2022

In general, there is no guarantee of binary compatibility even between patch versions (adding a new virtual function to a base class will offset the vptr of all inherited classes).
Also, even between patch revisions one is allowed to change internal APIs (and _c clearly indicates that this attribute is internal).

But the change as you are describing it is a bit unexpected.

@malfet
Contributor

malfet commented Nov 2, 2022

Hmm, is this OS specific, as I see torch.ScriptModule in both 1.12.0 and 1.12.1 on my MacOS:

```shell
% conda run -n torch-1.12.0 python -c "import torch; m=torch.nn.Linear(2, 3); n=torch.jit.script(m); print(torch.__version__, type(n._c))"
1.12.0 <class 'torch.ScriptModule'>

% conda run -n torch-1.12.1 python -c "import torch; m=torch.nn.Linear(2, 3); n=torch.jit.script(m); print(torch.__version__, type(n._c))"
1.12.1 <class 'torch.ScriptModule'>
```

@charliebudd
Author

charliebudd commented Nov 2, 2022

> In general, there is no guarantee of binary compatibility even between patch versions (adding a new virtual function to a base class will offset the vptr of all inherited classes).

The interface between python and the compiled c++ extension module is ultimately an interpreted interface, if I'm not mistaken.

> Also, even between patch revisions one is allowed to change internal APIs (and _c clearly indicates that this attribute is internal)

The use of _c is a bit of a hack to have a way of passing a model down to C++ in the first place. There should really be a way of achieving this with the public API. There is a forum post bemoaning this issue which is reasonably well trafficked (considering its obscure nature).
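For what it's worth, one way to sidestep _c entirely (a hedged sketch; the bytes-based myextension.test call is hypothetical, only torch.jit.save and torch::jit::load are real public APIs) is to serialize the scripted module in Python and rebuild it on the C++ side:

```python
# Serialize a scripted module with the public torch.jit.save API; the
# resulting bytes can be handed to an extension, which can rebuild the
# module with torch::jit::load from an std::istringstream.
import io

import torch

model = torch.jit.script(torch.nn.Linear(2, 3))

buffer = io.BytesIO()
torch.jit.save(model, buffer)  # public serialization API
data = buffer.getvalue()       # raw bytes to hand to the extension

# myextension.test(data)  # hypothetical: the C++ side would call
#                         # torch::jit::load on a stream over `data`
```

The serialized format, rather than the in-memory object layout, then becomes the interface between the two builds.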

> Hmm, is this OS specific, as I see torch.ScriptModule in both 1.12.0 and 1.12.1 on my MacOS:

Yes, this does appear to be specific to Linux. I just tried on Windows with the same result as you found on MacOS. But on Linux your test command gives <class 'torch._C.ScriptModule'> for pre-1.12.1 and <class 'torch.ScriptModule'> for 1.12.1.

@zou3519
Contributor

zou3519 commented Nov 2, 2022

Possibly related #80489?

@charliebudd
Author

charliebudd commented Nov 2, 2022

> But for Linux your test command gives <class 'torch._C.ScriptModule'> for pre 1.12.1 and <class 'torch.ScriptModule'> for 1.12.1.

Sorry, this is incorrect: I get <class 'torch._C.ScriptModule'> for pre-1.12 and <class 'torch.ScriptModule'> for 1.12.0 AND 1.12.1. I still get the error as described in the original post, but your command is not showing any difference. I am currently unable to reproduce the torch.jit.ScriptModule type from _c that I originally reported.

@malfet
Contributor

malfet commented Nov 2, 2022

> Possibly related #80489?

I thought about that, but C++ ABI changes should not result in different namespaces.

@malfet
Contributor

malfet commented Nov 2, 2022

@charliebudd not sure what original post you are referring to, but can you please post somewhere what issue you are seeing.

In the 1.12 builds we migrated to a newer compiler, which resulted in binary incompatibility between the CUDA and CPU wheels, which I fixed by explicitly specifying the C++ ABI version. (One can check the ABI version using the torch._C._PYBIND11_BUILD_ABI property.)

@charliebudd
Author

Sorry, the conditions appear harder to replicate than I thought; I'll try and put together a reproducible example.

@charliebudd
Author

Okay, I've made a gist demonstrating the issue:
https://gist.github.com/charliebudd/fef898d9bf6505ea455a3beed29880d0

Just create a virtual environment and run the run.sh script to see the error. It takes some time as it has to install different torch versions a few times.

I struggled to recreate the issue, as it seems to depend on exactly which wheel you install. As my build system needs torch versions with specific backends, I'm installing torch in the following way:

```shell
pip install -f https://download.pytorch.org/whl/torch_stable.html torch==1.12.1+cu113
```

If you simply pip install torch==1.12.1, it seems to work fine.
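For a build matrix like the one described above, the wheel variant can be tracked by parsing the PEP 440 local version tag (a stdlib-only sketch; parse_torch_version is a hypothetical helper, not part of torch):

```python
import re

def parse_torch_version(version: str):
    """Split a PEP 440-style version like '1.12.1+cu113' into its
    (major, minor, patch) tuple and optional local build tag."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:\+(.+))?", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version}")
    major, minor, patch, local = m.groups()
    return (int(major), int(minor), int(patch)), local

# Unlike keying a matrix on major.minor alone, this keeps the patch
# level and backend tag distinct, so +cu113 and plain wheels differ:
print(parse_torch_version("1.12.1+cu113"))  # ((1, 12, 1), 'cu113')
print(parse_torch_version("1.12.0"))        # ((1, 12, 0), None)
```

A matrix keyed on the full tuple would have caught the wheel-specific behaviour reported above.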
