Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler_rt: Port UBSan mini-runtime #5165

Closed
wants to merge 1 commit into from
Closed

Conversation

LemonBoy
Copy link
Contributor

This allows the user to know what went wrong whenever the UBSan-inserted
checks fail.

Consider this as a working MVP, there's no de-duplication as in the original miniubsan and the dependency on the stdlib may be too much.

We can do much better than trapping.

Addresses #5163

@daurnimator daurnimator added standard library This issue involves writing Zig code for the standard library. zig cc Zig as a drop-in C compiler feature labels Apr 25, 2020
@daurnimator
Copy link
Contributor

Need to make sure this doesn't end up breaking on targets where the zig standard library's panic isn't done yet

This allows the user to know what went wrong whenever the UBSan-inserted
checks fail.
@andrewrk
Copy link
Member

The problem with depending on the std lib is that compiler_rt.a and the application are in different compilation units. So all the stdlib stuff (including the mutex protecting stderr) are duplicated. Big waste of bytes, and potentially a race with writing to stderr.

That's where @panic comes in. @panic is designed to cross object file boundaries, so if you only use that, you will correctly integrate with the panic handler of the zig application.

A problem we have to solve in order to merge this is that there are two different use cases:

  • "main" is defined by C. Zig code, if any, is only a library. Possibly there is no zig code and zig cc is being used to compile C.
  • "main" is defined by zig code, and therefore we have a default panic handler which is possibly overridden.

The latter case is easy; compiler-rt uses @panic and everything works perfectly. C code that hits UBSAN will print a nice message and then the normal stack trace code takes over.

The former case is a bit trickier. How much debug info code should compiler-rt include to make UBSAN errors nicer in C code? On one side of the scale we have -fsanitize-undefined=trap which is very minimal. Illegal Instruction will happen for undefined behavior. Honestly I'm fine with this. But it's understandable to want more. On the other side of the scale we have the option to bloat compiler-rt with debug info capabilities and print a full stack trace for UB in C code. Maybe that's desirable!

I think the key here is to identify which version of compiler-rt we want. If we can identify whether we are in the former or latter case, it will inform which compiler-rt to generate. Zig is uniquely positioned to generate compiler-rt tailored for specific use cases like this.

So here's my proposal to move forward on this:

Add detection of which "compiler-rt mode" is desirable: zig-as-the-main-application tailored compiler-rt or C-as-the-main-application tailored compiler-rt. Float this mode down into compiler-rt somehow so it can make a comptime branch on that value. Depending on the mode, call @panic or std.debug.panic() (which dumps a stack trace)

I believe what this pull request currently does is unconditionally assume the C-as-the-main-application tailored compiler-rt.

@andrewrk
Copy link
Member

Depending on the mode, call @panic or std.debug.panic() (which dumps a stack trace)

Follow-up: the UBSAN handling code could unconditionally call @panic and we should probably regardless update compiler-rt's panic function:

// Avoid dragging in the runtime safety mechanisms into this .o file,
// unless we're trying to test this file.
pub fn panic(msg: []const u8, error_return_trace: ?*builtin.StackTrace) noreturn {
    @setCold(true);
    if (is_test) {
        std.debug.panic("{}", .{msg});
    } else {
        unreachable;
    }
}

Regardless of this PR, this should probably be changed to trap rather than unreachable. Anyway, here is where the comptime logic would go - it would be if (is_test or is_tailored_for_c_main)

@LemonBoy
Copy link
Contributor Author

It's slightly more complex than that as we also have to provide the "recovery" handlers that only record/print the problem so panicking doesn't really work there.

@andrewrk
Copy link
Member

Do we though? Panic is noreturn so we could simply omit the recovery mechanisms and crash. This seems like a perfectly acceptable way to handle undefined behavior to me.

@LemonBoy
Copy link
Contributor Author

Do we though?

Yes, the contract between clang and the compiler_rt implementation is pretty clear and you need to export both the aborting and non-aborting variants to allow the compiler to respect the -fno-sanitize-recover/-fsanitize switches.

If yu're not interested in providing a complete minimal ubsan runtime then we can add -fno-sanitize-recover=undefined and get away without the recovering handlers.

@justinbalexander
Copy link
Contributor

A feature I wish the current version of GCC had for ubsan is that the version that calls __builtin_trap on detection of undefined behavior would at least call a function that would pass an enum to indicate what the failure is.

We are on a bare-metal target so we are using the ubsan runtime but not linking the library. We define a handler with just a breakpoint instruction for each of the potential handlers. The app on the debugger stops immediately which is nice, and I can go up the callstack to find the line where the offense occurred.

The issue is that the binary gets bloated a bit extra from the unnecessary setting up of the arguments to the ubsan runtime functions, which we completely ignore. Space is premium in embedded obviously, so this is sub-optimal. We have 16MB of flash space on this target which is extremely unusual for us or else we wouldn't be able to use ubsan at all.

An alternative would be to just call the different abort handlers without passing anything on failure. Really I just need the name of the function to provide the hint as to what the issue was.

@andrewrk
Copy link
Member

I think this is an important idea to explore, but it's not ready to merge as is. Closing for now, with the understanding that it can be re-opened in the future whenever you have time to work on it.

@moosichu
Copy link
Contributor

@LemonBoy is it OK if I look at picking this up using your code as a basis? :)

@LemonBoy
Copy link
Contributor Author

You have my blessing, cc me if you want a review or opinion on something.

@franciscod
Copy link

franciscod commented Feb 13, 2022

commit 7c9c37c12ec5f726e8298c47bac0e699b2880f49
Author: Francisco Demartino <[email protected]>
Date:   Sat Feb 12 22:07:55 2022 -0300

    update to zig 0.10.0-dev.85+c0ae9647f

diff --git a/zigubsan/ubsan.zig b/zigubsan/ubsan.zig
index 07acdd35..9495ab0e 100644
--- a/zigubsan/ubsan.zig
+++ b/zigubsan/ubsan.zig
@@ -19,7 +19,7 @@
 // Runtime support for Clang's Undefined Behavior sanitizer
 const std = @import("std");
 const ubsan_value = @import("ubsan_value.zig");
-const builtin = std.builtin;
+const builtin = @import("builtin");
 
 // for setting breakpoints with gdb / lldb
 export fn ubsanbreak() void {}
@@ -30,10 +30,10 @@ export fn ubsanbreak() void {}
 fn makeHandler(comptime error_msg: []const u8) type {
     return struct {
         pub fn recover_handler() callconv(.C) void {
-            std.debug.warn("ubsan: " ++ error_msg, .{});
+            std.debug.print("ubsan: " ++ error_msg, .{});
         }
         pub fn abort_handler() callconv(.C) noreturn {
-            std.debug.warn("ubsan: " ++ error_msg, .{});
+            std.debug.print("ubsan: " ++ error_msg, .{});
             // was panic here instead, but panicked while panicking. let's add a silly hook for breaking
             ubsanbreak();
             std.os.exit(234);
@@ -49,12 +49,12 @@ const OverflowData = extern struct {
 const DynamicTypeCacheMissData = extern struct {
     source_location: ubsan_value.SourceLocation,
     type_descriptor: *const ubsan_value.TypeDescriptor,
-    type_info: ?*c_void,
+    type_info: ?*opaque{},
     type_check_kind: u8,
 };
 
 fn exportHandlers(comptime handlers: anytype, comptime export_name: []const u8) void {
-    const linkage: builtin.GlobalLinkage = if (std.builtin.is_test) .Internal else .Weak;
+    const linkage: std.builtin.GlobalLinkage = if (builtin.is_test) .Internal else .Weak;
     {
         const handler_symbol = std.builtin.ExportOptions{ .name = "__ubsan_" ++ export_name, .linkage = linkage };
         @export(handlers.recover_handler, handler_symbol);
@@ -66,7 +66,7 @@ fn exportHandlers(comptime handlers: anytype, comptime export_name: []const u8)
 }
 
 const abort_log = std.debug.panic;
-const warn_log = std.debug.warn;
+const warn_log = std.debug.print;
 
 fn handleOverflow(comptime report_log: anytype, comptime name: []const u8, comptime symbol: []const u8, overflow_data: *OverflowData, lhs: ubsan_value.ValueHandle, rhs: ubsan_value.ValueHandle) void {
     const source_location = overflow_data.source_location.acquire();
@@ -174,7 +174,7 @@ export fn __ubsan_handle_dynamic_type_cache_miss_abort(dynamic_type_cache_miss_d
 
 export fn __ubsan_handle_dynamic_type_cache_miss(dynamic_type_cache_miss_data: *DynamicTypeCacheMissData, pointer: ubsan_value.ValueHandle, hash: ubsan_value.ValueHandle) callconv(.C) void {
     if (handleDynamicTypeCacheMiss(dynamic_type_cache_miss_data, pointer, hash)) {
-        std.debug.warn("ubsan: " ++ "handle_dynamic_type_cache_miss", .{});
+        std.debug.print("ubsan: " ++ "handle_dynamic_type_cache_miss", .{});
     }
 }
 
@@ -220,7 +220,7 @@ comptime {
         .{ "handle_add_overflow_abort", "add-overflow-abort", .Abort, .Full },
     };
 
-    const linkage: builtin.GlobalLinkage = if (std.builtin.is_test) .Internal else .Weak;
+    const linkage: std.builtin.GlobalLinkage = if (builtin.is_test) .Internal else .Weak;
 
     inline for (HANDLERS) |entry| {
         const handler = makeHandler(entry[1]);
diff --git a/zigubsan/ubsan_value.zig b/zigubsan/ubsan_value.zig
index 4d01e12c..16ba6867 100644
--- a/zigubsan/ubsan_value.zig
+++ b/zigubsan/ubsan_value.zig
@@ -54,7 +54,7 @@ pub const TypeDescriptor = extern struct {
 
     pub fn getFloatSize(type_descriptor: *const TypeDescriptor) u64 {
         if (type_descriptor.kind != .float) unreachable;
-        return info;
+        return type_descriptor.info;
     }
 
     fn getIntValueBits(type_descriptor: *const TypeDescriptor, value_handle: ValueHandle) u128 {
@@ -89,4 +89,4 @@ pub const TypeDescriptor = extern struct {
 // If the type is small enough - its value is stored in
 // in the ValueHandle pointer, otherwise it is what the
 //  the ValueHandle points at.
-pub const ValueHandle = *c_void;
+pub const ValueHandle = *opaque{};

@franciscod
Copy link

commit 8d01d9ca4566322b16b0d3bcb4c33d0ac4fa0806
Author: Francisco Demartino <[email protected]>
Date:   Thu Mar 16 22:06:06 2023 -0300

    fix zig ubsan for 0.10.1

diff --git a/zigubsan/ubsan.zig b/zigubsan/ubsan.zig
index 9495ab0e..ec0026de 100644
--- a/zigubsan/ubsan.zig
+++ b/zigubsan/ubsan.zig
@@ -72,18 +72,20 @@ fn handleOverflow(comptime report_log: anytype, comptime name: []const u8, compt
     const source_location = overflow_data.source_location.acquire();
     const format_string = "ubsan: " ++ name ++ " Overflow: {} " ++ symbol ++ " {} in type {s}\n{s}:{}:{}\n";
     const type_name = overflow_data.type_descriptor.getNameAsString();
-
-    if (overflow_data.type_descriptor.isSignedInteger()) {
-        const lhs_value = overflow_data.type_descriptor.getSignedIntValue(lhs);
-        const rhs_value = overflow_data.type_descriptor.getSignedIntValue(rhs);
-        report_log(format_string, .{ lhs_value, rhs_value, type_name, source_location.file_name, source_location.line, source_location.column });
-    } else {
-        const lhs_value = overflow_data.type_descriptor.getUnsignedIntValue(lhs);
-        const rhs_value = overflow_data.type_descriptor.getUnsignedIntValue(rhs);
-        report_log(format_string, .{ lhs_value, rhs_value, type_name, source_location.file_name, source_location.line, source_location.column });
+    if (source_location.file_name) |file_name| {
+        if (overflow_data.type_descriptor.isSignedInteger()) {
+            const lhs_value = overflow_data.type_descriptor.getSignedIntValue(lhs);
+            const rhs_value = overflow_data.type_descriptor.getSignedIntValue(rhs);
+            report_log(format_string, .{ lhs_value, rhs_value, type_name, file_name, source_location.line, source_location.column });
+        } else {
+            const lhs_value = overflow_data.type_descriptor.getUnsignedIntValue(lhs);
+            const rhs_value = overflow_data.type_descriptor.getUnsignedIntValue(rhs);
+            report_log(format_string, .{ lhs_value, rhs_value, type_name, file_name, source_location.line, source_location.column });
+        }
     }
 }

+
 fn makeOverflowHandler(comptime export_name: []const u8, comptime name: []const u8, comptime symbol: []const u8) void {
     const handlers = struct {
         pub fn recover_handler(overflow_data: *OverflowData, lhs: ubsan_value.ValueHandle, rhs: ubsan_value.ValueHandle) callconv(.C) void {
@@ -107,12 +109,14 @@ fn handleNegateOverflow(comptime report_log: anytype, overflow_data: *OverflowDa
     const format_string = "ubsan: Negation of {} cannot be represented in type {s}\n{s}:{}:{}\n";
     const type_name = overflow_data.type_descriptor.getNameAsString();

-    if (overflow_data.type_descriptor.isSignedInteger()) {
-        const int_value = overflow_data.type_descriptor.getSignedIntValue(value);
-        report_log(format_string, .{ int_value, type_name, source_location.file_name, source_location.line, source_location.column });
-    } else {
-        const int_value = overflow_data.type_descriptor.getUnsignedIntValue(value);
-        report_log(format_string, .{ int_value, type_name, source_location.file_name, source_location.line, source_location.column });
+    if (source_location.file_name) |file_name| {
+        if (overflow_data.type_descriptor.isSignedInteger()) {
+            const int_value = overflow_data.type_descriptor.getSignedIntValue(value);
+            report_log(format_string, .{ int_value, type_name, file_name, source_location.line, source_location.column });
+        } else {
+            const int_value = overflow_data.type_descriptor.getUnsignedIntValue(value);
+            report_log(format_string, .{ int_value, type_name, file_name, source_location.line, source_location.column });
+        }
     }
 }

@@ -130,19 +134,22 @@ comptime {

 fn handleDivremOverflow(comptime report_log: anytype, overflow_data: *OverflowData, lhs: ubsan_value.ValueHandle, rhs: ubsan_value.ValueHandle) void {
     const source_location = overflow_data.source_location.acquire();
-    if (overflow_data.type_descriptor.isSignedInteger() and (overflow_data.type_descriptor.getSignedIntValue(rhs) == -1)) {
-        const format_string = "ubsan: Divsion of {} by -1 cannot be represented in type {s}\n{s}:{}:{}\n";
-        const type_name = overflow_data.type_descriptor.getNameAsString();
-        const int_value = overflow_data.type_descriptor.getSignedIntValue(lhs);
-        report_log(format_string, .{ int_value, type_name, source_location.file_name, source_location.line, source_location.column });
-    } else switch (overflow_data.type_descriptor.kind) {
-        .float, .integer => {
-            const format_string = "ubsan: Division by 0\n{s}:{}:{}\n";
-            report_log(format_string, .{ source_location.file_name, source_location.line, source_location.column });
-        },
-        .unknown => {
-            unreachable;
-        },
+
+    if (source_location.file_name) |file_name| {
+        if (overflow_data.type_descriptor.isSignedInteger() and (overflow_data.type_descriptor.getSignedIntValue(rhs) == -1)) {
+            const format_string = "ubsan: Divsion of {} by -1 cannot be represented in type {s}\n{s}:{}:{}\n";
+            const type_name = overflow_data.type_descriptor.getNameAsString();
+            const int_value = overflow_data.type_descriptor.getSignedIntValue(lhs);
+            report_log(format_string, .{ int_value, type_name, file_name, source_location.line, source_location.column });
+        } else switch (overflow_data.type_descriptor.kind) {
+            .float, .integer => {
+                const format_string = "ubsan: Division by 0\n{s}:{}:{}\n";
+                report_log(format_string, .{ file_name, source_location.line, source_location.column });
+            },
+            .unknown => {
+                unreachable;
+            },
+        }
     }
 }

@franciscod
Copy link

gah, the diffs are for this (kinda related) gist: https://gist.github.com/moosichu/ee3dd8407653640a36fb9625ac586082

@franciscod
Copy link

commit 5389c0a3be05c52df56eb18ab4f3d9984a4e6e02
Author: Francisco Demartino <[email protected]>
Date:   Tue Sep 5 22:54:47 2023 -0300

    fix zig ubsan for 0.11.0

diff --git a/src/ubsan_value.zig b/src/ubsan_value.zig
index 0133e60f..769b2f25 100644
--- a/src/ubsan_value.zig
+++ b/src/ubsan_value.zig
@@ -40,7 +40,7 @@ pub const TypeDescriptor = extern struct {
     name: [1]u8,
 
     pub fn getNameAsString(type_descriptor: *const TypeDescriptor) [*:0]const u8 {
-        return @ptrCast([*:0]const u8, &type_descriptor.name);
+        return @as([*:0]const u8, @ptrCast(&type_descriptor.name));
     }
 
     pub fn isSignedInteger(type_descriptor: *const TypeDescriptor) bool {
@@ -50,7 +50,7 @@ pub const TypeDescriptor = extern struct {
     pub fn getIntegerSize(type_descriptor: *const TypeDescriptor) u64 {
         if (type_descriptor.kind != .integer) unreachable;
         // Bit-shift the signed value away and then get 2^n out
-        return @as(u64, 1) << @intCast(u6, type_descriptor.info >> 1);
+        return @as(u64, 1) << @as(u6, @intCast(type_descriptor.info >> 1));
     }
 
     pub fn getFloatSize(type_descriptor: *const TypeDescriptor) u64 {
@@ -63,12 +63,12 @@ pub const TypeDescriptor = extern struct {
         const size = type_descriptor.getIntegerSize();
         const max_inline_size = @sizeOf(ValueHandle) * 8;
         if (size <= max_inline_size) {
-            return @ptrToInt(value_handle);
+            return @intFromPtr(value_handle);
         } else {
             if (size == 64) {
-                return @ptrCast(*u64, @alignCast(8, value_handle)).*;
+                return @as(*u64, @alignCast(@ptrCast(value_handle))).*;
             } else if (size == 128) {
-                return @ptrCast(*u128, @alignCast(16, value_handle)).*;
+                return @as(*u128, @alignCast(@ptrCast(value_handle))).*;
             } else {
                 unreachable;
             }
@@ -77,7 +77,7 @@ pub const TypeDescriptor = extern struct {
 
     pub fn getSignedIntValue(type_descriptor: *const TypeDescriptor, value_handle: ValueHandle) i128 {
         if (!type_descriptor.isSignedInteger()) unreachable;
-        return @bitCast(i128, type_descriptor.getIntValueBits(value_handle));
+        return @as(i128, @bitCast(type_descriptor.getIntValueBits(value_handle)));
     }
 
     pub fn getUnsignedIntValue(type_descriptor: *const TypeDescriptor, value_handle: ValueHandle) u128 {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
standard library This issue involves writing Zig code for the standard library. zig cc Zig as a drop-in C compiler feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants