Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__builtin_object_size() does not track object sizes across assignment #55742

Open
kees opened this issue May 27, 2022 · 13 comments
Open

__builtin_object_size() does not track object sizes across assignment #55742

kees opened this issue May 27, 2022 · 13 comments

Comments

@kees
Copy link
Contributor

kees commented May 27, 2022

The information used to examine object sizes is lost across assignment:

struct something {
    int a;
    int c[4];
    int b;
};

unsigned char *p = &instance->c[1];

__builtin_object_size(p, 1) should equal 12, but instead is -1. (__bos(&instance->c[1], 1) correctly returns 12.)

This breaks FORTIFY_SOURCE protection as buffers that should be visible become trivially obfuscated, and is a regression compared to GCC.

https://godbolt.org/z/4Pozfe9bY

@kees
Copy link
Contributor Author

kees commented May 27, 2022

cc @nickdesaulniers @gwelymernans

@efriedma-quic
Copy link
Collaborator

clang has two implementations of __builtin_objectsize: one at

static bool tryEvaluateBuiltinObjectSize(const Expr *E, unsigned Type,
, that works off the clang AST, and one at
Value *llvm::lowerObjectSizeCall(IntrinsicInst *ObjectSize,
, which works off of LLVM IR. Putting the pointer into a variable forces the use of the second one.

The second one is generally less powerful; we lose a lot of information. In this particular case, I'm not sure how we could even represent the information necessary in LLVM IR. So the only way I can think of to make this work with clang's current architecture is to run __builtin_objectsize through some sort of dataflow analysis on the AST. I doubt this is high enough on anyone's priority list to make it worth spending months on something like that. Maybe it becomes simpler if the project to make clang generate MLIR works out...

@nickdesaulniers
Copy link
Member

if the project to make clang generate MLIR works out...

Sign me up. :^)

@serge-sans-paille
Copy link
Collaborator

I've been investigating this test case, and the reason why it fails for trailing_array and not middle_array (from the godbolt link) is because the trailing array is actually considered as a flexible array, see https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/ExprConstant.cpp#L11594 introduced by f8f6324 .

Basically it has nothing to do with assignment, it's a problem of not being able to find the allocation point, and we decide not to rely on type information.

Note that if the second argument of __builtin_object_size is 3, then we correctly provide a lower bound.

@serge-sans-paille
Copy link
Collaborator

cc @gburgessiv who authored the flexible array extension.

@llvmbot
Copy link
Member

llvmbot commented May 31, 2022

@llvm/issue-subscribers-clang-codegen

@msebor
Copy link

msebor commented May 31, 2022

I don't have the background on LLVM's design but it seems to me that the difference is due to the different decisions made by the two implementations of of the built-in. The front end folds simple expressions and seems to do no flow analysis. The middle end does flow analysis but doesn't fully consider type information. A simple test case is below (the same effect can be seen by making the array subscript a local variable set to 1):

struct A {
  int a[2], b;
};

__SIZE_TYPE__ f (struct A  *q)
{
  return __builtin_object_size (&q->a[1], 1);   // folded to 4 by front end
}

__SIZE_TYPE__ g (struct A *q)
{
  int *r = &q->a[1];
  return __builtin_object_size (r, 1);   // folded to -1 by middle end
}

GCC has just one implementation, in the middle end, that considers type info, so it folds both of the above to 4.

@gburgessiv
Copy link
Member

I think @efriedma-quic covers the reasons behind this well in general.

RE flexible arrays, yeah, clang's frontend gives up in many of those cases. The general pattern of code that it tries to allow for is:

struct string {
  size_t len;
  char data[0];
};

struct string *new_string(const char *s) {
  size_t len = strlen(s);
  struct string *str = malloc(sizeof(*str) + len);
  // <insert *str initialization code here>
  return str;
}

Which is relatively common in C. Clang originally had checks for whether the trailing member of the struct was unsized or had a length <= 1, but that broke in the face of e.g., FreeBSD's struct sockaddr, as @serge-sans-paille notes.

@kees
Copy link
Contributor Author

kees commented May 31, 2022

I feel like most of the comments above are related to bug #55741 , which is about lacking -fstrict-flex-array. The bug here is about this line in the test case:

WAT: __builtin_object_size(p, 1) == -1 (expected 12)

from:

    expect(__builtin_object_size(middle->c, 1), 16);
    p = (void *)&middle->c[1];
    expect(__builtin_object_size(p, 1), 12);

This is not from flex-array confusion.

(Additionally, using option "3" for __bos doesn't fix any of these.)

@gburgessiv
Copy link
Member

I feel like most of the comments above are related to bug #55741 , which is about lacking -fstrict-flex-array. The bug here is about this line in the test case:

This is mostly what I think @efriedma-quic was trying to speak to. Let me try to break down what I think is happening here (I haven't actually run Clang, so I could be wrong, but I'm relatively confident that I'm not :) ):

In trying to evaluate __builtin_object_size(p, 1), Clang's front-end tries to figure out with p points to. It sees that p is a non-const pointer, so immediately gives up(*).

Clang, having failed to determine the objectsize of p, defers to LLVM in the form of @llvm.objectsize. LLVM has no reliable type information available to it, so it cannot reason about the number of bytes at &middle->c[1] any more than it can *middle. LLVM tries to figure out how many bytes middle has (minus offsetof(middle, c[1])), and fails, since middle refers to a parameter, which isn't an alloca or alloc_size call.

Since LLVM fails to determine this, it lowers @llvm.objectsize to -1.

(*) -- If it were an unsigned char *const p = ...;, I assume Clang would give up when trying to figure out what &middle->c[1] points to. Maybe there's room for a peephole here in the frontend's __builtin_object_size evaluation, but I imagine it'd be hard to convince folks to add more consts everywhere for Clang, so the helpfulness of that is unclear to me.

@tbaederr tbaederr changed the title __builtin_object_size() does not track object sizes across asignment __builtin_object_size() does not track object sizes across assignment Jun 1, 2022
@nickdesaulniers
Copy link
Member

Casts also seem problematic:

struct foo {
    int x;
    int y [14];
};

struct quux {
    long x, y, z;
};

unsigned long baz(void) {
    struct foo my_foo;
    return __builtin_object_size(&((struct quux*)&my_foo)->y, 1);
}

GCC: 8
Clang: 52

@efriedma-quic
Copy link
Collaborator

That looks like a different issue; all the relevant information is available to the code in ExprConstant, it's just coming up with an over-conservative answer.

@kees
Copy link
Contributor Author

kees commented Feb 27, 2024

@bwendling

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants