Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retain exported NSRunningApplication reference to avoid autorelease #973

Closed

Conversation

donaldguy
Copy link

@donaldguy donaldguy commented Jul 28, 2021

Per discussion/thinking outloud in #967 (#967 (comment) ) with nod from @koekeishiya – Added explicit call to retain before returning to C++:

https://developer.apple.com/documentation/objectivec/1418956-nsobject/1571946-retain

this seems to be a better way to fix #920 and possibly other crashes in Monterey (and who knows, maybe even earlier); and probably should have been in 8c9bbbe in the first place

I built this (on xorpse/master) and can open and close application without any crash so far on Monterey beta 3 (21A5294g) *knock on wood*

(neither the immediate crashes I was seeing on application launch before, nor the GC-triggered crash as in #923 (comment) )


If this were not needed it would probably introduce a memory leak.

To check I set an lldb breakpoint at

[application release];

Once triggered, I found that:

(lldb) p [application retainCount]
(unsigned long long) $3 = 1

and

(lldb) p (NSObject*)[ application release]
(NSObject *) $5 = nil
(lldb) p (NSObject*)[ application release]
error: Execution was interrupted, reason: EXC_BREAKPOINT (code=1, subcode=0x18d284930).
The process has been returned to the state before expression evaluation.

so seems good to me

@donaldguy
Copy link
Author

Whoops; rebasing

@donaldguy donaldguy force-pushed the ns_application_retain branch from d4a78d5 to db3e811 Compare July 28, 2021 22:14
@donaldguy donaldguy changed the title Ns application retain retain exported NSRunningApplication reference to avoid autorelease Jul 28, 2021
@donaldguy
Copy link
Author

I did a build of literally db3e811 on my older mac (x86_64 / Big Sur) - and it, in fact, behaves this way at the same breakpoint:

Target 0: (yabai) stopped.
(lldb) p [application retainCount]
(unsigned long long) $0 = 2
(lldb) p [application retainCount]
(unsigned long long) $1 = 2
(lldb) p (NSObject*)[application release]
(NSObject *) $2 = 0x021dffff80155239
(lldb) p [application retainCount]
(unsigned long long) $3 = 1
(lldb) p (NSObject*)[application release]
(NSObject *) $4 = 0x0000000000005837
(lldb) p [application retainCount]
(unsigned long long) $5 = 0
(lldb) p (NSObject*)[application release]
(NSObject *) $6 = 0x001dffff80155239
(lldb) p [application retainCount]
(unsigned long long) $7 = 0
(lldb) p [application retainCount]
(unsigned long long) $8 = 0

@donaldguy
Copy link
Author

donaldguy commented Jul 29, 2021

I built db3e811 (with #include <CoreGraphics/CoreGraphics.h> added to src/osax/payload.m)
on the M1 (on Monterey)

and the lldb outpoint at the breakpoint matches what I showed in description (so, i.e. it is not a difference between xorpse/ and koekeishiya/)

so it does seem like behavior on return or at least ~the eagerness of autorelease has changed (and that once its been released, call through the pointers are no longer valid)

@donaldguy
Copy link
Author

After skimming through https://releases.llvm.org/5.0.1/tools/clang/docs/AutomaticReferenceCounting.html

I thought that

diff --git a/src/workspace.m b/src/workspace.m
index 6862a8b..eb386ae 100644
--- a/src/workspace.m
+++ b/src/workspace.m
@@ -20,9 +20,9 @@ void workspace_event_handler_end(void *context)
     [ws_context dealloc];
 }
 
-void *workspace_application_create_running_ns_application(struct process *process)
+void * __strong workspace_application_create_running_ns_application(struct process *process)
 {
-    return [[NSRunningApplication runningApplicationWithProcessIdentifier:process->pid] retain];
+    return [NSRunningApplication runningApplicationWithProcessIdentifier:process->pid];
 }
 
 void workspace_application_destroy_running_ns_application(void *ws_context, struct process *process)

might be good enough but it seems to still lead to crashes on (other) application start on Monterey

@donaldguy
Copy link
Author

donaldguy commented Jul 29, 2021

Of possible note, in the ARC docs I'm looking at: https://clang.llvm.org/docs/AutomaticReferenceCounting.html#general

there is discussion of a "create/copy naming convention" (which I believe this method comports to) which confers an expectation of being an unretained return

That just seems to me like the sort of thing that might have changed / been relied upon accidentally

@donaldguy
Copy link
Author

donaldguy commented Jul 29, 2021

I'd think maybe we want something like

diff --git a/src/workspace.h b/src/workspace.h
index 784483e..8001b58 100644
--- a/src/workspace.h
+++ b/src/workspace.h
@@ -11,7 +11,7 @@ void workspace_event_handler_begin(void **context);
 void workspace_event_handler_end(void *context);
 
 struct process;
-void *workspace_application_create_running_ns_application(struct process *process);
+void *workspace_application_create_running_ns_application(struct process *process) __attribute__((cf_returns_retained));
 void workspace_application_destroy_running_ns_application(void *context, struct process *process);
 bool workspace_application_is_observable(struct process *process);
 bool workspace_application_is_finished_launching(struct process *process);
diff --git a/src/workspace.m b/src/workspace.m
index 6862a8b..9af529a 100644
--- a/src/workspace.m
+++ b/src/workspace.m
@@ -20,7 +20,8 @@ void workspace_event_handler_end(void *context)
     [ws_context dealloc];
 }
 
-void *workspace_application_create_running_ns_application(struct process *process)
+__attribute__((cf_returns_retained))
+void * workspace_application_create_running_ns_application(struct process *process)
 {
     return [[NSRunningApplication runningApplicationWithProcessIdentifier:process->pid] retain];
 }

to specify the semantics to be consistent across Monterey and Big Sur (and earlier) but I have yet to figure out the combination of specifying cf_returns_retained or cf_returns_not_retained and calling or not calling retain that creates consistent retainCount across the new archs & OS versions.

Doing (__bridge_retained void *) issued a warning:

src/workspace.m:25:13: warning: '__bridge_retained' casts have no effect when not using ARC [-Warc-bridge-casts-disallowed-in-nonarc]

which ... thanks for the warning? ; I'm not sure if I actually tried that version or not.

it may not really matter - especially if we replace the release at workspace.m:60 with autorelease


Before I got into any of this, my inclination was to replace the call to release in workspace_application_destroy_running_ns_application with like a for loop counting up to [application retainCount] doing release calls ; but the clang docs above make me worry calling retainCount is considered an illegal operation?

@donaldguy
Copy link
Author

Except maybe that warning is right and none of this code is covered by ARC (on Monterey) in general?

(That being separate I guess from whether there are autorelease pools)

cause then maybe we wouldn't be allowed to call retain or release at all?

I'm getting more confused and am probs gonna call it a night

@koekeishiya
Copy link
Owner

ARC is not a thing in this project, although it is not explicitly disabled in the makefile (I'll do that). Autoreleasepools are entirely separate, but since this project is very minimal in the amount of obj-c that is used, I could probably just swizzle away [NSObject autorelease] to disable that, and release these references manually when appropriate.

I am not particularly fond of automatic (implicit) memory management for many reasons, but I am not going to start a debate/discussion around this. This project would be C only if Apple actually exposed all the necessary functionality that way, but unfortunately they do not.

@donaldguy
Copy link
Author

I don't have any interest in debating this here either; I was just trying to figure out why (with this PR applied) [application retainCount] is 2 when I breakpoint in the destroy on Big Sur / x86_64 (but 1 on Monterey / arm64)

Cause this PR does seem to fix my crashes on Monterey, but that 2 seems to suggest it'll cause a memory leak on other targets.

Since there isn't another explicit call to retain (that I've seen), I assumed ARC must be involved. But it might actually be a more limited set of implicit retain-injecting semantics that specially treat methods containing create (on Big Sur but not Monterey?) - or it could just be a change in what's returned from the constructor proper

@koekeishiya
Copy link
Owner

As you said I am not sure either why the behaviour appears to be different between Big Sur and Monterey, however, I am fairly certain that the correct behaviour is to call retain, also in the Big Sur case. The issue you experience when inspecting the retain count has to be related to when/how the autoreleasepool for the main thread is drained. You could probably run in the debugger and try to break at the autoreleasepool drain routine (or something). You could also break at NSObject autorelease to verify.

As you have mentioned before there are naming conventions; particularly functions named Create, Copy, Alloc requires an explicit free by the caller. The particular function we are talking about, runningApplicationWithProcessIdentifier, by the looks of it does rely on the autorelease pool, in the same way NSString stringWithFormat does.

I'll take a deeper look at this myself when I feel like doing some programming in my spare time, but might be a while before that happens.

@donaldguy
Copy link
Author

on my local clone of my fork, I've actually been looking at, instead/additionally like ...

diff --git a/src/event.c b/src/event.c
index 8998bf7..0c11dda 100644
--- a/src/event.c
+++ b/src/event.c
@@ -111,6 +111,10 @@ static EVENT_CALLBACK(EVENT_HANDLER_APPLICATION_LAUNCHED)
         }
     }
 
+    //we are done depending on workspace, and its better to clear the observers than let them fire again
+    //if, somehow, e.g. activation policy changes again
+    workspace_application_destroy_running_ns_application(g_workspace_context, process);
+
     return EVENT_SUCCESS;
 }
 
diff --git a/src/process_manager.c b/src/process_manager.c
index c7dca18..dab808c 100644
--- a/src/process_manager.c
+++ b/src/process_manager.c
@@ -41,7 +41,6 @@ struct process *process_create(ProcessSerialNumber psn)
 
 void process_destroy(struct process *process)
 {
-    workspace_application_destroy_running_ns_application(g_workspace_context, process);
     free(process->name);
     free(process);
 }

as I think the observation is really only interesting as long as the application launched event is being processed. I think we may actually want the reference to be as short lived as possible to avoid ~use after free for short lived windows (owned indirectly by this object? it shows up with window server crashing in InvalidateShadow still... I think)

@koekeishiya
Copy link
Owner

I've done a thorough investigation by hooking related calls to autorelease and drain/release calls and made appropriate changes to various threads that should resolve this.

@donaldguy donaldguy deleted the ns_application_retain branch October 25, 2023 05:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Yabai is crashing/restarting constantly on macOS 12
2 participants