Random Access Violations on shutdown #1977

rohanjain101 · 2022-10-18T00:54:24Z

Environment

Pythonnet version: 3.0.0.post1
Python version: 3.8.10
Operating System: Windows 11
.NET Runtime: .NET Framework 472

Details

Describe what you were trying to get done.

I am getting random access violations during shutdown in pythonnet. The issue can be re-produced with a C# function as simple as

public static int Add(int a, int b) => a + b;

I get the access violation; however, it is not deterministic at all. Some times, it succeeds as expected, other times I get this. Note that I am upgrading from pythonnet 2.5.2 to pythonnet 3.0.0.post1 so this code used to work. All I am doing is invoking this code from python. I am running this python code that uses pythonnet in pytest. I haven't seen this when we were using pythonnet 2.5.2.

If there was a crash, please include the traceback here.

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at Python.Runtime.Runtime.PyObject_TYPE(BorrowedReference op)
   at Python.Runtime.Runtime.NullGCHandles(IEnumerable`1 objects)
   at Python.Runtime.Runtime.TryCollectingGarbage(Int32 runs, Boolean forceBreakLoops)
   at Python.Runtime.Runtime.Shutdown()
   at Python.Runtime.PythonEngine.Shutdown()
   at Python.Runtime.Loader.Shutdown(IntPtr data, Int32 size)

filmor · 2022-10-18T06:28:10Z

Can you please give the exact example that produces the error for you (Python and C# code)?

rohanjain101 · 2022-10-18T16:34:27Z

We're using pythonnet in many different scenarios, and this issue seems to be happening randomly. This is the simplest example I can find that (occasionally) reproduces the failure

using System;
using System.Linq;

namespace Test.Local
{
    public static partial class Operations
    {
        public static decimal? explicit_cast_float128_Float128_Scalar(decimal value) =>
            (decimal?)value;
    }
}

# Python Code:
import clr
clr.AddReference(r"Test.Local.dll")
from Test.Local import Opertations
from System import Decimal
Operations.explicit_cast_float128_Float128_Scalar(Decimal(123))

Note that we are not able to consistently repro the issue. We are running a series of unit tests using pytest, and every run, we get different tests failing with this same stack trace. Is there something specific we could be doing in our code that may cause this crash in the new release? I ask because we're getting this issue when upgrading to pythonnet3.0.0post1, we're currently on pythonnet2.5.2 and haven't had this issue.

filmor · 2022-10-18T17:14:25Z

One major difference is that older versions of Python.NET didn't even try to properly clean up in the end. How are your tests set up that you get test failures due to shutdowns in them? Are they spinning up Python sub-processes? Can the issue be reproduced by just taking the example that you gave and running it a few thousand times or is this just an example from your test suite?

rohanjain101 · 2022-10-18T20:38:47Z

This is an example from the test suite, I'll try running it a few thousand times to see if I can repro. We're just using pytest, AFAIK, it creates one python process, runs a set of tests sequentially, then terminates the python process. What's interesting is that the tests themselves pass, but when the python process is shutting down, that's where we start getting these random access violations.

Do we need to have certain cleanup code in between each test that runs in the process?

rohanjain101 · 2022-10-19T23:04:06Z

Still working on a repro outside of our unit tests, is there anything from the stack trace that may help you identify if there's something we're doing that may cause this? Pytest runs all the tests in one python process I confirmed that, so the shutdown should only be happening once, shouldn't be spinning up python sub-processes.

donno · 2022-12-01T08:27:59Z

Pythonnet version: 3.0.1
Python: 3.8 and 3.11.

I encountered a crash with the same traceback as the first post when running unittests with Python Development Mode activated (-X dev)

I was able to reproduce this crash outside of my tests with the following sample:

import clr
clr.AddReference("System.Windows.Forms")
from System.Windows.Forms import Form

def example():  
  form = Form()
  form.Show()

example()

Save the above as issue1977.py and run with python -X dev issue1977.py on Windows 10 results in an access violation except.

In my case I am not using System.Windows.Form but a different assembly and once I construct a class from that assembly after importing then I get this issue.

Python 3.7 doesn't have experience the same problem.

rohanjain101 · 2022-12-01T23:50:55Z

Thanks @donno . Are you able to repro this crash deterministically or do you have the same issue as us where it seems to be happening randomly? @filmor is there anything we can do to avoid this crash?

donno · 2022-12-02T11:11:18Z

For me, the above snippet is deterministic., it results in the System.AccessViolationException every time.

filmor · 2022-12-02T11:56:57Z

The actual issue is not the dev mode as such, but that it uses the debug mode for the allocator. If you just want to use all of the other warnings, you can explicitly force the non-debug one by setting the PYTHONMALLOC=default environment variable. I don't see the exact same backtrace, though, so my hunch is that this is a different bug:

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Runtime.InteropServices.Marshal.ReadInt32(IntPtr ptr, Int32 ofs)
   at Python.Runtime.ManagedType.TryFreeGCHandle(BorrowedReference reflectedClrObject, BorrowedReference type)
   at Python.Runtime.Runtime.NullGCHandles(IEnumerable`1 objects)
   at Python.Runtime.Runtime.TryCollectingGarbage(Int32 runs, Boolean forceBreakLoops)
   at Python.Runtime.Runtime.Shutdown()
   at Python.Runtime.PythonEngine.Shutdown()
   at Python.Runtime.Loader.Shutdown(IntPtr data, Int32 size)

donno · 2022-12-03T13:47:49Z

Ah, yes my traceback had the call to ReadInt32 rather than PyObject_TYPE from the first post.

Thanks for narrowing down my specific case and providing an alternative to still get the benefits I was aiming for with the dev mode.

maxcapodi78 · 2022-12-06T16:58:01Z

@filmor I tried disabling the PYTHONMALLOC default but still get it (see below)
Should I do something before the fix is released to avoid the issue?

maxcapodi78 · 2022-12-06T17:13:12Z

@filmor Ok I've realized that settings PYTHONMALLOC=malloc made the error disappear? So, maybe should be used this instead of default?

filmor · 2022-12-06T20:04:08Z

None of these should be necessary, but running Python.NET with anything but the default allocation is undertested (as in, not at all). My guess is that somewhere in one of these debug hooks, some .NET code is evaluated and throws (across FFI boundaries). I'll have some time to debug this this weekend.

rohanjain101 · 2023-01-06T04:42:27Z

@filmor just wondering if there's something we should do in the mean time to mitigate this issue?

jgmarcel · 2023-02-07T18:40:05Z

I confirm I am also experiencing this error:

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at Python.Runtime.Runtime.PyObject_TYPE(BorrowedReference op)
   at Python.Runtime.Runtime.NullGCHandles(IEnumerable`1 objects)
   at Python.Runtime.Runtime.TryCollectingGarbage(Int32 runs, Boolean forceBreakLoops)
   at Python.Runtime.Runtime.Shutdown()
   at Python.Runtime.PythonEngine.Shutdown()
   at Python.Runtime.Loader.Shutdown(IntPtr data, Int32 size)

Unfortunately I cannot provide you with the C# code that triggers the problem because I did not write it myself. My Python code that calls that code is very much similar to what @donno posted above.

I am running python 3.9.16 and pythonnet 3.0.1. The error does not manifest itself in version 2.5.2.

rohanjain101 · 2023-03-10T20:15:29Z

@filmor I was wondering if there's any update on this issue or if there's anything we can do to work around this issue. Would you suggest we force using the default allocator?

jgmarcel · 2023-05-05T19:24:58Z

Should we maintain any hopes that this issue will be addressed anytime soon?

lostmsu · 2023-05-05T19:55:19Z

@jgmarcel we can not fix the issue we can not reproduce. Does your scenario prohibit use of malloc allocator as suggested above?

jgmarcel · 2023-05-05T20:18:02Z

@jgmarcel we can not fix the issue we can not reproduce. Does your scenario prohibit use of malloc allocator as suggested above?

Unfortunately it did not work:

Failed to shutdown pythonnet: System.NullReferenceException: Object reference not set to an instance of an object.
   at Python.Runtime.BorrowedReference.DangerousGetAddress()
   at Python.Runtime.ManagedType.TryFreeGCHandle(BorrowedReference reflectedClrObject, BorrowedReference type)
   at Python.Runtime.Runtime.NullGCHandles(IEnumerable`1 objects)
   at Python.Runtime.Runtime.TryCollectingGarbage(Int32 runs, Boolean forceBreakLoops)
   at Python.Runtime.Runtime.Shutdown()
   at Python.Runtime.PythonEngine.Shutdown()
   at Python.Runtime.Loader.Shutdown(IntPtr data, Int32 size)
   at Python.Runtime.BorrowedReference.DangerousGetAddress()
   at Python.Runtime.ManagedType.TryFreeGCHandle(BorrowedReference reflectedClrObject, BorrowedReference type)
   at Python.Runtime.Runtime.NullGCHandles(IEnumerable`1 objects)
   at Python.Runtime.Runtime.TryCollectingGarbage(Int32 runs, Boolean forceBreakLoops)
   at Python.Runtime.Runtime.Shutdown()
   at Python.Runtime.PythonEngine.Shutdown()
   at Python.Runtime.Loader.Shutdown(IntPtr data, Int32 size)Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "C:\Users\prodbwgi\Anaconda3\envs\bwgi\lib\site-packages\pythonnet\__init__.py", line 158, in unload
    raise RuntimeError("Failed to call Python.NET shutdown")
RuntimeError: Failed to call Python.NET shutdown

jgmarcel · 2023-05-05T20:22:01Z

The error above was preceded by this one:

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at Python.Runtime.Runtime.PyObject_TYPE(BorrowedReference op)
   at Python.Runtime.Runtime.NullGCHandles(IEnumerable`1 objects)
   at Python.Runtime.Runtime.TryCollectingGarbage(Int32 runs, Boolean forceBreakLoops)
   at Python.Runtime.Runtime.Shutdown()
   at Python.Runtime.PythonEngine.Shutdown()
   at Python.Runtime.Loader.Shutdown(IntPtr data, Int32 size)

I guess I am stuck to version 2.5.2?

filmor · 2023-05-06T08:07:30Z

@jgmarcel If you can't provide a reproducible example, we can't debug. It's as easy as that. I could reproduce the example from the start of the thread and figured out that it had something to do with allocator, using the malloc allocator made /that/ problem go away. It's entirely possible that your problem, while exhibiting a similar backtrace, is different.

jgmarcel · 2023-05-08T00:33:58Z

@filmor Sure thing. The C# code I am calling from Python is distributed in a DLL shipped with a commercial product, so I am afraid I cannot include it in my report. My Python code, on the other hand, is very simple and just calls the C# code as instructed by the software vendor. Is there anything else (an output, some profiling results, etc.) I could provide you with that would be useful?

dgrunwald · 2023-05-11T13:20:22Z

This is a use-after-free; and switching the allocator only changes the likelyhood of the problem resulting in a crash.

I've debugged the case with -Xdev a bit, and what's happening is that a Python object is getting freed because its last reference is released by:

Python.Runtime.Runtime.XDecref(Python.Runtime.StolenReference)
Python.Runtime.Finalizer.DisposeAll()
Python.Runtime.Runtime.TryCollectingGarbage(Int32, Boolean)
Python.Runtime.Runtime.Shutdown()
Python.Runtime.PythonEngine.Shutdown()

Due to the use of a debug allocator, this overwrites the ob_type field of the freed object with a 0xdddddddd bit pattern.

After that, NullGCHandles(CLRObject.reflectedObjects); is called. But the reflectedObjects still contain a pointer to the deleted PyObject, so PyObject_TYPE is a use-after-free and retrieves the invalid PyTypeObject* 0xdddddddd. Attempting to dereference that to get at the tp_clr_inst field results in the crash:

   at System.Runtime.InteropServices.Marshal.ReadInt32(IntPtr ptr, Int32 ofs)
   at Python.Runtime.ManagedType.TryFreeGCHandle(BorrowedReference reflectedClrObject, BorrowedReference type)
   at Python.Runtime.Runtime.NullGCHandles(IEnumerable`1 objects)
   at Python.Runtime.Runtime.TryCollectingGarbage(Int32 runs, Boolean forceBreakLoops)
   at Python.Runtime.Runtime.Shutdown()
   at Python.Runtime.PythonEngine.Shutdown()

With the default allocator (without -Xdev; i.e. pymalloc), I've also observed cases where PyObject_TYPE crashes directly, but those are more tricky to reproduce. That likely happens because the freed object was the last in its arena, so pymalloc returned the memory to the system. But it is likely that the same bug is the cause for both.

Switching to PYTHONMALLOC=malloc just means the memory is more likely to stay around long enough to avoid the crash, making the use-after-free somewhat harmless. But with the default pymalloc allocator, it's likely to cause real crashes; and with the debug allocator it's basically guaranteed to crash.

siegfriedpammer · 2023-05-15T09:41:20Z

Would this be an appropriate fix?

diff --git a/src/runtime/Finalizer.cs b/src/runtime/Finalizer.cs
index 713564f..d6bbc3b 100644
--- a/src/runtime/Finalizer.cs
+++ b/src/runtime/Finalizer.cs
@@ -229,6 +229,7 @@ namespace Python.Runtime

                         IntPtr copyForException = obj.PyObj;
                         Runtime.XDecref(StolenReference.Take(ref obj.PyObj));
+                        CLRObject.reflectedObjects.Remove(copyForException);
                         collected++;
                         try
                         {
@@ -236,7 +237,7 @@ namespace Python.Runtime
                         }
                         catch (Exception e)
                         {
-                            HandleFinalizationException(obj.PyObj, e);
+                            HandleFinalizationException(copyForException, e);
                         }
                     }

The fix is simple: remove a possible reference to the currently disposed object from the reflectedObjects set. If the item does not exist the Remove is a no-op.

I am not entirely sure why a variable copyForException is not used in the catch block.

rohanjain101 · 2023-05-18T20:02:19Z

@siegfriedpammer could you create a PR with this fix? We have found a scenario on python 3.11 that hits this crash deterministically and could test that this fix resolves this issue.

siegfriedpammer · 2023-05-19T04:39:09Z

I think, the above fix does only work if a PyObject only has one reference left. For my own use of the library I have just removed the offending call to NullGCHandles because the process using the library is cleaned up by the OS anyway. The .NET runtime dies when the Python runtime dies when the process dies.

But I hope this is fixed at some point by someone more knowledgeable about pythonnet so I don't have to patch the library again if I ever need to update. Thanks!

johnpp143 · 2023-06-02T11:47:55Z

I see the same issue when I am running robot framework tests that use 3.0.1 version of pythonnet. When I downgrade I don't see the issue anymore. Will this be fixed soon?

lostmsu · 2023-06-02T18:51:28Z

But the reflectedObjects still contain a pointer to the deleted PyObject

So this is not a trivial issue, and we need a reproducible example to debug on our side.

The thing is: if PyObject is alive and is referenced from .NET, it is present in reflectedObjects and also it has a GC handle pointing back to .NET, which means that the .NET object can not end up in the finalization queue, because there is a GC handle that references it.

The only exception from this rule I am aware of is Python types that are derived from .NET types. Under certain circumstances they change the GC handle type to weak reference, which allows it to be collected and placed into finalization queue. See

pythonnet/src/runtime/Types/ClassDerived.cs

Line 87 in a1cd73f

GCHandle gc = GCHandle.Alloc(self, GCHandleType.Weak);

There's a force break loops bit in Shutdown, but when it nulls GC handles it also cleans the reflectedObjects collection.

dgrunwald · 2023-06-03T06:13:43Z

The example from #1977 (comment) is perfectly reproducible for us when using the debug allocator (-Xdev).

rohanjain101 · 2023-06-03T18:52:12Z

@lostmsu @filmor is there any reason for a manual shutdown/cleanup in the first place instead of relying on the OS as older versions of pythonnet did?

siegfriedpammer · 2023-06-03T19:11:58Z

Yes, please add an option to disable the shutdown code completely. I have had cases where simply Ctrl+Cing the process caused a .NET stack trace to be printed, which should not be the case. I have tried monkey-patching the pythonnet.unload function, but it made everything worse, that is, even more annoying stack traces printed to the console.

filmor · 2023-06-04T07:51:48Z

PRs are welcome
The problem with not cleaning up explicitly is that Python and .NET still have references on each other and it's completely random whether destructors/finalizers are actually run properly
The slow down is not due to the cleanup as such but because of the serialization that is run to support AppDomain switching, there is a ticket to make it (and thus appdomain support) optional I think and it should also be completely deactivated in case of hosting in Python

agoodwin · 2023-09-06T12:06:28Z

Hi @filmor, I had a look for the AppDomain switching ticket you were referring to in your third point, would I be right to understand that it's these ones: #1781 and #2008?

And just to clarify, would I be right to understand that when those tickets are resolved, we should expect to have an option to disable AppDomain support (or, it would just always be disabled in the use-case of using Python.NET to invoke .NET code from Python), and if this was to be done then we should then expect this "random access violation on shutdown" behaviour to simply disappear as a nice side-benefit?

vreindelESZ · 2023-09-25T12:29:02Z

@filmor I have the same Problem. This code does NOT create any error messages and always works perfectly fine:

def cascade_function_sdata_filepaths(ordered_list_data_paths):
    check_number_of_ports_is_equal = []
    for sdata in ordered_list_data_paths:
        sdata = _Generic.LoadSParamData[_UncNumber](sdata)
        nports = sdata.NPorts
        check_number_of_ports_is_equal.append(nports)
    print(check_number_of_ports_is_equal)
    # Überprüfe, ob alle nports gleich sind
    if all(x == check_number_of_ports_is_equal[0] for x in check_number_of_ports_is_equal):
        nports = check_number_of_ports_is_equal[0]
        print("Alle nports sind gleich. Wert: ", nports)
        if len(ordered_list_data_paths) == 1:
            data_path_one = ordered_list_data_paths[0]
            s1data = _Generic.LoadSParamData[_UncNumber](data_path_one)
            print(s1data)

#from here different code below#######################################

        if len(ordered_list_data_paths) == 2:
            data_path_one = ordered_list_data_paths[0]
            s1data = _Generic.LoadSParamData[_UncNumber](data_path_one)
            print(s1data)

            data_path_two = ordered_list_data_paths[1]
            s2data = _Generic.LoadSParamData[_UncNumber](data_path_two)
            print(s2data)
            result = SParamTools.Cascade[_UncNumber](s1data, s2data)
            return(result)
        
        if len(ordered_list_data_paths) == 3:
            data_path_one = ordered_list_data_paths[0]
            s1data = _Generic.LoadSParamData[_UncNumber](data_path_one)
            print(s1data)

            data_path_two = ordered_list_data_paths[1]
            s2data = _Generic.LoadSParamData[_UncNumber](data_path_two)
            print(s2data)
            result = SParamTools.Cascade[_UncNumber](s1data, s2data)

            data_path_three = ordered_list_data_paths[2]
            s3data = _Generic.LoadSParamData[_UncNumber](data_path_three)
            result = SParamTools.Cascade[_UncNumber](result, s3data)
            return(result)

but this code which is just a loop always produces the error:

def cascade_function_sdata_filepaths(ordered_list_data_paths):
    check_number_of_ports_is_equal = []
    for sdata in ordered_list_data_paths:
        sdata = _Generic.LoadSParamData[_UncNumber](sdata)
        nports = sdata.NPorts
        check_number_of_ports_is_equal.append(nports)
    print(check_number_of_ports_is_equal)
    # Überprüfe, ob alle nports gleich sind
    if all(x == check_number_of_ports_is_equal[0] for x in check_number_of_ports_is_equal):
        nports = check_number_of_ports_is_equal[0]
        print(f"Alle nports sind gleich. Wert: {nports}")
        if len(ordered_list_data_paths) == 1:
            data_path_one = ordered_list_data_paths[0]
            s1data = _Generic.LoadSParamData[_UncNumber](data_path_one)
            print(s1data)

#from here different code below#######################################

        result = None
        for data_path in ordered_list_data_paths:
            sdata = _Generic.LoadSParamData[_UncNumber](data_path)
            print(sdata)
            if result is None:
                result = sdata
            else:
                result = SParamTools.Cascade[_UncNumber](result, sdata)
        result_data = create_dataarray_from_loaded_sdatb(result)
        print(result_data)
        return result

(German) errormessage:
Unbehandelte Ausnahme: System.AccessViolationException: Es wurde versucht, im geschützten Speicher zu lesen oder zu schreiben. Dies ist häufig ein Hinweis darauf, dass anderer Speicher beschädigt ist.
bei Python.Runtime.Runtime.PyObject_TYPE(BorrowedReference op)
bei Python.Runtime.Runtime.NullGCHandles(IEnumerable`1 objects)
bei Python.Runtime.Runtime.TryCollectingGarbage(Int32 runs, Boolean forceBreakLoops)
bei Python.Runtime.Runtime.Shutdown()
bei Python.Runtime.PythonEngine.Shutdown()
bei Python.Runtime.Loader.Shutdown(IntPtr data, Int32 size)

doctormachine · 2023-10-31T04:56:35Z

To avoid the above exception error, I have removed the below lines from the Runtime.cs file:

if (!HostedInPython && !ProcessIsTerminating)
{
// avoid saving dead objects
TryCollectingGarbage(runs: 3);

            RuntimeData.Stash();
        }

        AssemblyManager.Shutdown();
        OperatorMethod.Shutdown();
        ImportHook.Shutdown();

        ClearClrModules();
        RemoveClrRootModule();

        NullGCHandles(ExtensionType.loadedExtensions);
        ClassManager.RemoveClasses();
        TypeManager.RemoveTypes();
        _typesInitialized = false;

        MetaType.Release();
        PyCLRMetaType.Dispose();
        PyCLRMetaType = null!;

        Exceptions.Shutdown();
        PythonEngine.InteropConfiguration.Dispose();
        DisposeLazyObject(clrInterop);
        DisposeLazyObject(inspect);
        DisposeLazyObject(hexCallable);
        PyObjectConversions.Reset();

        PyGC_Collect();
        bool everythingSeemsCollected = TryCollectingGarbage(MaxCollectRetriesOnShutdown,
                                                             forceBreakLoops: true);
        Debug.Assert(everythingSeemsCollected);

        Finalizer.Shutdown();
        InternString.Shutdown();

        ResetPyMembers();

duyang76 · 2024-03-06T02:16:13Z

We get similar issue in our application as well. Here are more details of our use case:

We have a very heavy .net layer to deal with data and business logic
We use pythonnet in our python application to interop with our .net layer
We frequent get AccessViolationException when the python application shutdown. It is not fully deterministic, but we are able to reproduce the exception every time with a reasonably simple python code. The stack trace is always the same (as below).

Python.Runtime.dll!Python.Runtime.Runtime.PyObject_TYPE(Python.Runtime.BorrowedReference op) Line 913.
Python.Runtime.dll!Python.Runtime.Runtime.NullGCHandles(System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<System.IntPtr, string>> objects) Line 479.
Python.Runtime.dll!Python.Runtime.Runtime.TryCollectingGarbage(int runs, bool forceBreakLoops) Line 356.
Python.Runtime.dll!Python.Runtime.Runtime.Shutdown() Line 300.
Python.Runtime.dll!Python.Runtime.PythonEngine.Shutdown() Line 395.
Python.Runtime.dll!Python.Runtime.Loader.Shutdown(System.IntPtr data, int size) Line 49.

I spent a lot of time to go over the pythonnet code, and debugging info into pythonnet, step through the execution in Visual Studio. I think I have a better understanding of the pythonnet implementation (which is complicated but elegant) and may find something relevant (at least to our use case).

When a .net object is used on the python side (say iterating through an IEnumerable), CLRObject.Create() is called (see the stack trace below)

Python.Runtime.dll!Python.Runtime.CLRObject.Create(object ob, Python.Runtime.BorrowedReference tp) Line 28.
Python.Runtime.dll!Python.Runtime.CLRObject.GetReference(object ob, System.Type type) Line 51.
Python.Runtime.dll!Python.Runtime.Converter.ToPython(object value, System.Type type) Line 184.
Python.Runtime.dll!Python.Runtime.Iterator.tp_iternext(Python.Runtime.BorrowedReference ob) Line 46.

To create the python object, CLRObject.Create() calls Runtime.PyType_GenericAlloc(), which I think does not increase the refcount of the new object.
After the python object is used, the refcount decreases to 0. The memory space allocated in step 2 may be reused or reclaimed.
When pythonnet shuts down (see the stack trace at the beginning), ManagedType.TryFreeGCHandle() calls Runtime.PyObject_TYPE(), which tries to access the reclaimed memory space and causes the AccessViolationException.

The code below illustrates that the allocated memories are reused later. If the code is complicated enough to cause the heap to shrink, we can expect the exception to happen.

    def get_refcnt(pt):
        return f'({pt.X:.3f}, {pt.Y:.3f}), {len(gc.get_referrers(pt))}, {getrefcount(pt)}, {c_long.from_address(id(pt))}'

    # 10 random .net points
    pts = MyCSPoint.TenRandomPoints
    cnts = [get_refcnt(pt) for pt in pts]
    ids = [id(pt) for pt in pts]
    print(".Net Results")
    print("\n".join(cnts))
    print([c_long.from_address(id) for id in ids])

    # 10 random python points
    pts2 = [MyPyPoint(x,y) for (x,y) in zip(np.random.rand(10), np.random.rand(10))]
    cnts2 = [get_refcnt(pt) for pt in pts2]
    ids2 = [id(pt) for pt in pts2]
    print("Python Results")
    print("\n".join(cnts2))
    print([c_long.from_address(id) for id in ids2])

Here is the output. I am not sure what is the correct way to measure ref count of a python object created by the .net code, so I tried all 3 ways I am aware of. They return different results, but we can see the pattern here: when the objects are still in use, the .net ref count is 1 less than python ref count; when the objects are not in use, the .net ref count is kind of random (which indicates that the memory is reused).

.Net Results
...
(0.700, 0.924), 0, 4, c_long(3)
(0.459, 0.048), 0, 4, c_long(3)
(0.108, 0.517), 0, 4, c_long(3)
(0.683, 0.056), 0, 4, c_long(3)
[c_long(0), c_long(7), c_long(0), c_long(7), c_long(0), c_long(0), c_long(0), c_long(0), c_long(0), c_long(7)]

Python Results
...
(0.572, 0.850), 1, 5, c_long(4)
(0.130, 0.236), 1, 5, c_long(4)
(0.764, 0.736), 1, 5, c_long(4)
(0.406, 0.526), 1, 5, c_long(4)
[c_long(1), c_long(1), c_long(1), c_long(1), c_long(1), c_long(1), c_long(1), c_long(1), c_long(1), c_long(1)]

Possible fix? I tried to increase the ref count by 1 after the object is created.

    var py = Runtime.PyType_GenericAlloc(tp, 0);
    Runtime.XIncref(py.Borrow());

It seems to work (no more exceptions), but it also means that these allocated objects will stay in the heap forever. Experts, any suggestion for a more robust fix?

Frawak · 2024-05-07T13:44:58Z

I am also struggling with this issue. I am trying to dig into it. @duyang76 has the right idea that the problem is something in regards to the reference count but the solution

    var py = Runtime.PyType_GenericAlloc(tp, 0);
    Runtime.XIncref(py.Borrow());

is not correct because var py = Runtime.PyType_GenericAlloc(tp, 0); already increases the reference count for the object under tp.

However, in the method NullGCHandles(CLRObject.reflectedObjects) within the Shutdown (when the exception occurs) all references in CLRObject.reflectedObjects have a reference count of zero.
The question is: Why?

The NewReference which are created with that var py = Runtime.PyType_GenericAlloc(tp, 0); and whose IntPtr are added to CLRObject.reflectedObjects must have been disposed (along with all other references to the object) while these pointers remain.
And (this is my intepretation) because of the zero reference count the memory is opened up to be reclaimed.

EDIT: Basically related to this comment: #1977 (comment)

@lostmsu But the crux is that the exception is thrown while trying to null the GC Handles. Why is this done after the reference count is decremented to zero?

When nulling the GC handles on shutdown the reference count of all objects pointed to by the IntPtr in the `reflectedObjects` are zero. This caused an exception in some scenarios because `Runtime.PyObject_TYPE(reflectedClrObject)` is called while the reference counter is at zero, hence, not being guaranteed the Python object is still there and the memory not reclaimed. The solution presented is treating the pointer in `reflectedObjects` as strong references - incrementing the respective ref count adding to the set and decrementing removing from it or clearing the entire set (the latter is already done after nulling the GC Handles).

When nulling the GC handles on shutdown the reference count of all objects pointed to by the IntPtr in the `CLRObject.reflectedObjects` are zero. This caused an exception in some scenarios because `Runtime.PyObject_TYPE(reflectedClrObject)` is called while the reference counter is at zero. After `TypeManager.RemoveTypes();` is called in the `Runtime.Shutdown()` method, reference count decrements to zero do not invoke `ClassBase.tp_clear` for managed objects anymore which normally is responsible for removing references from `CLRObject.reflectedObjects`. Collecting objects referenced in `CLRObject.reflectedObjects` only after leads to an unstable state in which the reference count for these object addresses is zero while still maintaining them to be used for further pseudo-cleanup. In that time, the memory could have been reclaimed already which leads to the exception.

Frawak · 2024-10-11T12:43:03Z

This issue can be closed, right?

I tested 3.0.4 again.

Panzerfury · 2024-10-21T07:22:20Z

I still get:

Exit code is -1073741819 (Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at Python.Runtime.Runtime.PyErr_Clear()
at Python.Runtime.Exceptions.Clear()
at Python.Runtime.PythonEngine.Initialize(System.Collections.Generic.IEnumerable1<System.String>, Boolean, Boolean) at Python.Runtime.PythonEngine.Initialize(Boolean, Boolean) at Python.Runtime.PythonEngine.Initialize() at Program.<Main>$(System.String[]) at DynamicClass.InvokeStub_Program.<Main>$(System.Object, System.Span1<System.Object>)
at System.Reflection.MethodBaseInvoker.InvokeWithOneArg(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)
at System.Reflection.MethodBase.Invoke(System.Object, System.Object[])
at Microsoft.Extensions.Hosting.HostFactoryResolver+HostingListener.b__10_0())

In my tests of my C# web API.

I initialize my python in my program.cs
var path = builder.Configuration.GetSection("PythonDllLocation").Value;
Runtime.PythonDLL = path;
PythonEngine.Initialize();
PythonEngine.BeginAllowThreads();

And my OnShutDown(){
AppContext.SetSwitch("System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization", true);
PythonEngine.Shutdown();
AppContext.SetSwitch("System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization", false);
}

Each integrationtest seems to spin up a program.cs on it's own (i didn't write the test suite).

Before I had my Initialize and shutdown in my service class. Which seemed to work fine. But it was rather slow, having to spin up the python engine on each request.

Frawak · 2024-10-21T07:33:36Z

That callstack is on initialization. I would suggest a separate issue.

rmartin16 mentioned this issue Dec 3, 2022

Windows: briefcase dev crashes on shutdown beeware/briefcase#987

Closed

freakboy3742 mentioned this issue Dec 4, 2022

Disable the debug malloc on Windows. beeware/briefcase#988

Merged

4 tasks

Frawak mentioned this issue May 8, 2024

Fix access violation exception on shutdown (#1977) #2380

Closed

5 tasks

Frawak mentioned this issue May 13, 2024

Fix access violation exception on shutdown #2386

Merged

5 tasks

CodeTex mentioned this issue Jul 30, 2024

System.AccessViolationException with Hypothesis 6.108.x HypothesisWorks/hypothesis#4055

Closed

Random Access Violations on shutdown #1977

Random Access Violations on shutdown #1977

Comments

rohanjain101 commented Oct 18, 2022 • edited Loading

Environment

Details

filmor commented Oct 18, 2022

rohanjain101 commented Oct 18, 2022 • edited Loading

filmor commented Oct 18, 2022

rohanjain101 commented Oct 18, 2022

rohanjain101 commented Oct 19, 2022

donno commented Dec 1, 2022

rohanjain101 commented Dec 1, 2022

donno commented Dec 2, 2022

filmor commented Dec 2, 2022

donno commented Dec 3, 2022

maxcapodi78 commented Dec 6, 2022

maxcapodi78 commented Dec 6, 2022

filmor commented Dec 6, 2022

rohanjain101 commented Jan 6, 2023

jgmarcel commented Feb 7, 2023

rohanjain101 commented Mar 10, 2023

jgmarcel commented May 5, 2023 • edited Loading

lostmsu commented May 5, 2023

jgmarcel commented May 5, 2023

jgmarcel commented May 5, 2023

filmor commented May 6, 2023

jgmarcel commented May 8, 2023

dgrunwald commented May 11, 2023

siegfriedpammer commented May 15, 2023

rohanjain101 commented May 18, 2023

siegfriedpammer commented May 19, 2023

johnpp143 commented Jun 2, 2023

lostmsu commented Jun 2, 2023 • edited Loading

dgrunwald commented Jun 3, 2023

rohanjain101 commented Jun 3, 2023 • edited Loading

siegfriedpammer commented Jun 3, 2023

filmor commented Jun 4, 2023

agoodwin commented Sep 6, 2023

vreindelESZ commented Sep 25, 2023 • edited Loading

doctormachine commented Oct 31, 2023 • edited Loading

duyang76 commented Mar 6, 2024

Frawak commented May 7, 2024 • edited Loading

Frawak commented Oct 11, 2024

Panzerfury commented Oct 21, 2024

Frawak commented Oct 21, 2024

rohanjain101 commented Oct 18, 2022 •

edited

Loading

rohanjain101 commented Oct 18, 2022 •

edited

Loading

jgmarcel commented May 5, 2023 •

edited

Loading

lostmsu commented Jun 2, 2023 •

edited

Loading

rohanjain101 commented Jun 3, 2023 •

edited

Loading

vreindelESZ commented Sep 25, 2023 •

edited

Loading

doctormachine commented Oct 31, 2023 •

edited

Loading

Frawak commented May 7, 2024 •

edited

Loading