Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detours-related crashes #22

Closed
vintagepc opened this issue Jun 23, 2021 · 11 comments
Closed

Detours-related crashes #22

vintagepc opened this issue Jun 23, 2021 · 11 comments

Comments

@vintagepc
Copy link

vintagepc commented Jun 23, 2021

Not sure if this is related to #11 but I too am seeing segfaults both on Centos 7 and 8 relating to Detours. Some systems will do it consistently, others (with the same set of srcds server files copied via rsync -a) will have no issues whatsoever.

This is on TF2Classic (SDK2013) with the TF2C.smx plugin. (Additional discussion here: Scags/TF2Classic-Tools#6 )

Disabling all DHookEnableDetour calls in the plugin stops the crashes from happening, and changing which ones I leave enabled or comment out changes the crash address consistently.

There's a linked crash ID there and I can provide more if they would be helpful.

Any thoughts? Observed with SM1.10 and 11, 2.2.0 detours 14a thru 16.

@peace-maker
Copy link
Owner

That crash is interesting. It tries to execute the detour trampoline on the heap, but that part of the heap isn't executable. It should be though, so maybe it's some flaw with setting the memory permission flags on Centos.

void SetMemPatchable(void* pAddr, size_t size)
{
#if defined __linux__
mprotect((void *) ALIGN(pAddr), sysconf(_SC_PAGESIZE), PAGE_EXECUTE_READWRITE);
#elif defined _WIN32
DWORD old_prot;
VirtualProtect(pAddr, size, PAGE_EXECUTE_READWRITE, &old_prot);
#endif
}

The first thing to try is using the size parameter, but 0x0bebec40 isn't right at a page boundary, so this bug doesn't seem relevant to your crash. The memory maps in the breakpad dump of the last crash ID you posted in that other thread show that this address is right inside the rw- heap. Not sure why it's not executable.

SourceHook does
https://github.com/alliedmodders/metamod-source/blob/80d3f9c14d5a78c3e93b9822e983eace41981d4f/core/sourcehook/sh_memory.h#L195

@vintagepc
Copy link
Author

vintagepc commented Jun 23, 2021

Interesting. Here's a few more dumps with different IDs and different fault addresses while I was experimenting with enabling/disabling different combos of detours. I'm guessing it's the same underlying issue?

VHIS-MTG5-OIN2
JNID-X2YW-2BWB

The extra weird thing is the systems that work fine have not (to my knowledge) been configured differently from those that do not. Just applying the provider's Centos image (same provider for all hosts), install required libraries for SRCDS, and then deploy the server instance.

@peace-maker
Copy link
Owner

Do you have ssh access to the Centos instance? What does getconf PAGESIZE on the commandline give you? I'd expect 4096.

@vintagepc
Copy link
Author

vintagepc commented Jun 23, 2021

Yes, I have full control.

addons]$ getconf PAGESIZE
4096

(both for working and non-working instances)

@peace-maker
Copy link
Owner

I've added a fail check to the mprotect call. Can you try this build and monitor the server console during boot? It's compiled against SM1.10.

dhooks.ext.zip

@vintagepc
Copy link
Author

Sure thing, I will do that after I finish work.

@peace-maker
Copy link
Owner

After looking at the logic again, there is no good reason to use the heap at all in this code. It's left over from the original DynamicHooks library, but since we use SourcePawn's allocator for all other code chunks already, we can use it for the trampoline as well.

This should avoid your issue all together. Please have a try with this version, it contains the previous mprotect+perror check as well.
dhooks.ext.zip

@peace-maker
Copy link
Owner

It might still be interesting to see if the first build prints any errors of mprotect failing on heap addresses, so would be great if you could test both builds even though I'm pretty sure the last one fixes the issue.

@vintagepc
Copy link
Author

🔔 🔔 🔔 WE HAVE A WINNER

mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied

followed by the expected crash on player connect.

Second plugin -
Looks good. I am running a custom version of the offending plugin with extra logging in its detours. For any given detour pair , e.g:

		DHookEnableDetour(hook, false, CTFPlayerShared_RemoveCond);
		DHookEnableDetour(hook, true, CTFPlayerShared_RemoveCondPost);	

I would previously see the primary detour get called. Post detours were never getting called and I would see one, maybe two before the crash occurred.

With the fixed version, the Post detour is firing correctly and I had a few pages of log lines entering/exiting the function go by without any issues.

I'll leave it running with some bots for a bit to see if anything else crops up.

Thanks for getting this turned around so quickly!

@peace-maker
Copy link
Owner

Great! 🎉
Do you use SELinux by any chance? Check the output of getsebool allow_execheap which might be your offender. You should leave it disallowed, but this might be a good thing to check for in your audit logs for other crashes you experience in the future.

@vintagepc
Copy link
Author

selinuxuser_execheap --> off for the problematic instances.

We no longer have the "working" VPS to verify if it was indeed allowed or disabled there though.

That's probably it. Very strange that by default it would be enabled on one VPS deploy and not another from the same provider though. Ah well shrug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants