__builtin_object_size() does not track object sizes across assignment #55742

kees · 2022-05-27T22:03:13Z

The information used to examine object sizes is lost across assignment:

struct something {
    int a;
    int c[4];
    int b;
};

unsigned char *p = (unsigned char *)&instance->c[1];

__builtin_object_size(p, 1) should equal 12, but instead is -1. (__bos(&instance->c[1], 1) correctly returns 12.)
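A self-contained sketch of the comparison, for anyone who wants to try it outside godbolt. The function names size_direct and size_via_pointer are invented for illustration, and the expected value of 12 assumes a 4-byte int. Results vary by compiler and optimization level, so none are hard-coded; the only property relied on is that mode 1 is a maximum estimate, so it should never report fewer bytes than actually remain in c after c[0].

```c
#include <stddef.h>

struct something {
    int a;
    int c[4];
    int b;
};

/* The address expression is passed straight to the builtin. */
size_t size_direct(struct something *instance)
{
    return __builtin_object_size(&instance->c[1], 1);
}

/* Same address, but stored in a local pointer variable first. */
size_t size_via_pointer(struct something *instance)
{
    unsigned char *p = (unsigned char *)&instance->c[1];
    return __builtin_object_size(p, 1);
}
```

Printing both results with %zu for the same argument makes the difference described in this report visible on compilers where the second form yields (size_t)-1.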

This breaks FORTIFY_SOURCE protection as buffers that should be visible become trivially obfuscated, and is a regression compared to GCC.
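To show why an unknown size weakens fortification, here is a minimal sketch of a fortify-style wrapper. It is not the glibc or Linux kernel FORTIFY_SOURCE implementation; checked_memcpy and CHECKED_MEMCPY are hypothetical names. The point is only that once the builtin returns (size_t)-1, the bound is treated as unknown and the check can never fire.

```c
#include <stddef.h>
#include <string.h>

/* Returns 0 on success, -1 if the copy would overflow a known bound. */
static inline int checked_memcpy(void *dst, const void *src, size_t n,
                                 size_t dst_size)
{
    /* (size_t)-1 means "size unknown": the check is effectively disabled. */
    if (dst_size != (size_t)-1 && n > dst_size)
        return -1;
    memcpy(dst, src, n);
    return 0;
}

/* Evaluate the builtin at the call site, where the expression is visible. */
#define CHECKED_MEMCPY(dst, src, n) \
    checked_memcpy((dst), (src), (n), __builtin_object_size((dst), 1))
```

With p from the report, CHECKED_MEMCPY(p, src, 16) would be rejected if the builtin yields 12, but would go through unchecked if it yields -1.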

https://godbolt.org/z/4Pozfe9bY


kees · 2022-05-27T22:03:52Z

cc @nickdesaulniers @gwelymernans

efriedma-quic · 2022-05-27T22:55:12Z

clang has two implementations of __builtin_object_size: one in llvm-project/clang/lib/AST/ExprConstant.cpp (tryEvaluateBuiltinObjectSize, line 11705 in de20fb7), which works off the clang AST, and one in llvm-project/llvm/lib/Analysis/MemoryBuiltins.cpp (llvm::lowerObjectSizeCall, line 577 in 4e432f1), which works off of LLVM IR. Putting the pointer into a variable forces the use of the second one.

The second one is generally less powerful; we lose a lot of information. In this particular case, I'm not sure how we could even represent the information necessary in LLVM IR. So the only way I can think of to make this work with clang's current architecture is to run __builtin_object_size through some sort of dataflow analysis on the AST. I doubt this is high enough on anyone's priority list to make it worth spending months on something like that. Maybe it becomes simpler if the project to make clang generate MLIR works out...

nickdesaulniers · 2022-05-28T00:52:30Z

> if the project to make clang generate MLIR works out...

Sign me up. :^)

serge-sans-paille · 2022-05-31T12:28:14Z

I've been investigating this test case. The reason it fails for trailing_array and not middle_array (from the godbolt link) is that the trailing array is actually treated as a flexible array; see https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/ExprConstant.cpp#L11594, introduced by f8f6324.

Basically, it has nothing to do with assignment: the problem is that we cannot find the allocation point, and we decide not to rely on type information.

Note that if the second argument of __builtin_object_size is 3, then we correctly provide a lower bound.
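For reference, modes 0 and 1 are maximum estimates that fall back to (size_t)-1 when the size is unknown, while modes 2 and 3 are minimum estimates that fall back to 0. The sketch below queries both bounds for the same pointer; struct bounds and subobject_bounds are hypothetical names, and no particular result is claimed for any compiler.

```c
#include <stddef.h>

struct something {
    int a;
    int c[4];
    int b;
};

struct bounds {
    size_t upper; /* mode 1: maximum estimate, (size_t)-1 if unknown */
    size_t lower; /* mode 3: minimum estimate, 0 if unknown */
};

/* Ask for both the upper and the lower bound of the same subobject. */
struct bounds subobject_bounds(struct something *instance)
{
    unsigned char *p = (unsigned char *)&instance->c[1];
    struct bounds b = {
        __builtin_object_size(p, 1),
        __builtin_object_size(p, 3),
    };
    return b;
}
```

A lower bound of 0 is always safe but carries no information, which is worth keeping in mind when reading a mode 3 result as "correct".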

serge-sans-paille · 2022-05-31T12:29:09Z

cc @gburgessiv who authored the flexible array extension.

llvmbot · 2022-05-31T13:40:16Z

@llvm/issue-subscribers-clang-codegen

msebor · 2022-05-31T17:35:03Z

I don't have the background on LLVM's design but it seems to me that the difference is due to the different decisions made by the two implementations of the built-in. The front end folds simple expressions and seems to do no flow analysis. The middle end does flow analysis but doesn't fully consider type information. A simple test case is below (the same effect can be seen by making the array subscript a local variable set to 1):

struct A {
  int a[2], b;
};

__SIZE_TYPE__ f (struct A  *q)
{
  return __builtin_object_size (&q->a[1], 1);   // folded to 4 by front end
}

__SIZE_TYPE__ g (struct A *q)
{
  int *r = &q->a[1];
  return __builtin_object_size (r, 1);   // folded to -1 by middle end
}

GCC has just one implementation, in the middle end, that considers type info, so it folds both of the above to 4.

gburgessiv · 2022-05-31T18:26:45Z

I think @efriedma-quic covers the reasons behind this well in general.

RE flexible arrays, yeah, clang's frontend gives up in many of those cases. The general pattern of code that it tries to allow for is:

#include <stdlib.h>   /* malloc, size_t */
#include <string.h>   /* strlen */

struct string {
  size_t len;
  char data[0];
};

struct string *new_string(const char *s) {
  size_t len = strlen(s);
  struct string *str = malloc(sizeof(*str) + len);
  // <insert *str initialization code here>
  return str;
}

Which is relatively common in C. Clang originally had checks for whether the trailing member of the struct was unsized or had a length <= 1, but that broke in the face of e.g., FreeBSD's struct sockaddr, as @serge-sans-paille notes.
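The sockaddr case can be sketched with a stand-in type. struct addr_like below is hypothetical: it only mimics the shape of a struct whose last member is a fixed-size char[14], and is not the real FreeBSD definition. The assumption illustrated is that such a struct may be allocated with extra room and written past the declared 14 bytes, which is why a compiler may decline to trust the declared bound of a trailing array.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct addr_like {
    unsigned char len;
    unsigned char family;
    char data[14];      /* declared size, but used as if it were flexible */
};

/* Allocate room for n payload bytes, possibly more than the declared 14. */
struct addr_like *addr_like_new(const void *payload, size_t n)
{
    struct addr_like *a;
    size_t extra = n > sizeof(a->data) ? n - sizeof(a->data) : 0;

    a = calloc(1, sizeof(*a) + extra);
    if (a == NULL)
        return NULL;
    memcpy(a->data, payload, n);   /* may run past data[13] by design */
    return a;
}

/* What the compiler reports for the trailing member (mode 1). */
size_t trailing_size(struct addr_like *a)
{
    return __builtin_object_size(a->data, 1);
}
```

If trailing_size reported 14 for an object created by addr_like_new with a larger n, a fortified copy into data would be rejected even though the allocation is big enough.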

kees · 2022-05-31T18:34:54Z

I feel like most of the comments above are related to bug #55741, which is about lacking -fstrict-flex-array. The bug here is about this line in the test case:

WAT: __builtin_object_size(p, 1) == -1 (expected 12)

from:

    expect(__builtin_object_size(middle->c, 1), 16);
    p = (void *)&middle->c[1];
    expect(__builtin_object_size(p, 1), 12);

This is not from flex-array confusion.

(Additionally, using option "3" for __bos doesn't fix any of these.)

gburgessiv · 2022-05-31T18:55:00Z

> I feel like most of the comments above are related to bug #55741, which is about lacking -fstrict-flex-array. The bug here is about this line in the test case:

This is mostly what I think @efriedma-quic was trying to speak to. Let me try to break down what I think is happening here (I haven't actually run Clang, so I could be wrong, but I'm relatively confident that I'm not :) ):

In trying to evaluate __builtin_object_size(p, 1), Clang's front-end tries to figure out what p points to. It sees that p is a non-const pointer, so immediately gives up(*).

Clang, having failed to determine the objectsize of p, defers to LLVM in the form of @llvm.objectsize. LLVM has no reliable type information available to it, so it cannot reason about the number of bytes at &middle->c[1] any more than it can *middle. LLVM tries to figure out how many bytes middle has (minus offsetof(middle, c[1])), and fails, since middle refers to a parameter, which isn't an alloca or alloc_size call.

Since LLVM fails to determine this, it lowers @llvm.objectsize to -1.

(*) -- If it were an unsigned char *const p = ...;, I assume Clang would give up when trying to figure out what &middle->c[1] points to. Maybe there's room for a peephole here in the frontend's __builtin_object_size evaluation, but I imagine it'd be hard to convince folks to add more consts everywhere for Clang, so the helpfulness of that is unclear to me.
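The LLVM-side reasoning above depends on whether the allocation is visible. A small sketch of the two situations, with hypothetical function names: in the first the object is a local array, so an analysis working from the allocation can subtract the offset from 32; in the second the object belongs to the caller and nothing in the function says how large it is. Mode 0 is used so that only whole-object size matters. What each compiler actually returns depends on version and optimization level, so no values are hard-coded.

```c
#include <stddef.h>

/* The allocation (a 32-byte local array) is visible in this function. */
size_t size_from_local(void)
{
    char buf[32];
    char *p = buf + 8;
    return __builtin_object_size(p, 0);
}

/* The object comes from the caller; its allocation is not visible here. */
size_t size_from_param(char *base)
{
    char *p = base + 8;
    return __builtin_object_size(p, 0);
}
```

In the parameter case, any size the middle end could recover would have to come from type information or from inlining into a caller where the allocation is known.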

nickdesaulniers · 2023-01-13T01:55:11Z

Casts also seem problematic:

struct foo {
    int x;
    int y [14];
};

struct quux {
    long x, y, z;
};

unsigned long baz(void) {
    struct foo my_foo;
    return __builtin_object_size(&((struct quux*)&my_foo)->y, 1);
}

GCC: 8
Clang: 52

efriedma-quic · 2023-01-13T02:33:35Z

That looks like a different issue; all the relevant information is available to the code in ExprConstant, it's just coming up with an over-conservative answer.

kees · 2024-02-27T05:38:26Z

@bwendling

github-actions bot added the new issue label May 27, 2022

EugeneZelenko added clang:codegen and removed new issue labels May 31, 2022

kees mentioned this issue May 31, 2022

__builtin_object_size(P->M, 1) where M is an array and the last member of a struct fails (need "-fstrict-flex-array") #55741

Open

tbaederr changed the title ~~__builtin_object_size() does not track object sizes across asignment~~ __builtin_object_size() does not track object sizes across assignment Jun 1, 2022

nikic mentioned this issue Jan 6, 2023

__builtin_object_size(..., 1) of a struct member acts like __bos(..., 0) #59850

Closed

nathanchance mentioned this issue Apr 28, 2023

-Wattribute-warning in drivers/crypto/hisilicon/sgl.c ClangBuiltLinux/linux#1780

Closed

shafik mentioned this issue Jun 17, 2024

__builtin_object_size suboptimal size when passed to function #95635

Closed
