Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ASan][libc++] Correct (explicit) annotation size #79292

Merged
merged 3 commits into from
Jan 25, 2024

Conversation

AdvenamTacet
Copy link
Member

@AdvenamTacet AdvenamTacet commented Jan 24, 2024

A quick examination suggests that the current code in the codebase does not lead to incorrect annotations. However, the intention is for the object after the function to be annotated in a way that only its contents are unpoisoned and the rest is poisoned. This commit makes it explicit and avoids potential issues in future.

In addition, I have implemented a few tests for a function that helped me identify the specific argument value.

Notice: there is no known scenario where old code results in incorrect annotation.

A quick examination suggests that the current code in the codebase does not lead to incorrect annotation.
However, the intention is for the object after the function to be annotated in a way that only its contents are unpoisoned and the rest is poisoned.
This commit makes it explicit and avoids potential issues in future.

In addition, I have implemented a few tests for a function that helped me identify the specific argument value.
@AdvenamTacet AdvenamTacet added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Jan 24, 2024
@AdvenamTacet AdvenamTacet requested a review from a team as a code owner January 24, 2024 13:59
@llvmbot
Copy link

llvmbot commented Jan 24, 2024

@llvm/pr-subscribers-libcxx

Author: Tacet (AdvenamTacet)

Changes

A quick examination suggests that the current code in the codebase does not lead to incorrect annotations. However, the intention is for the object after the function to be annotated in a way that only its contents are unpoisoned and the rest is poisoned. This commit makes it explicit and avoids potential issues in future.

In addition, I have implemented a few tests for a function that helped me identify the specific argument value.


Full diff: https://github.com/llvm/llvm-project/pull/79292.diff

2 Files Affected:

  • (modified) libcxx/include/string (+1-1)
  • (modified) libcxx/test/std/strings/basic.string/string.modifiers/string_append/pointer_size.pass.cpp (+6)
diff --git a/libcxx/include/string b/libcxx/include/string
index e97139206d4fa7c..e69da6e61d11b27 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -2385,7 +2385,7 @@ _LIBCPP_CONSTEXPR_SINCE_CXX20 void basic_string<_CharT, _Traits, _Allocator>::__
   __old_sz = __n_copy + __n_add + __sec_cp_sz;
   __set_long_size(__old_sz);
   traits_type::assign(__p[__old_sz], value_type());
-  __annotate_new(__old_cap + __delta_cap);
+  __annotate_new(__old_sz);
 }
 
 // __grow_by is deprecated because it does not set the size. It may not update the size when the size is changed, and it
diff --git a/libcxx/test/std/strings/basic.string/string.modifiers/string_append/pointer_size.pass.cpp b/libcxx/test/std/strings/basic.string/string.modifiers/string_append/pointer_size.pass.cpp
index 93e7500a11967ce..6eac4082fba0216 100644
--- a/libcxx/test/std/strings/basic.string/string.modifiers/string_append/pointer_size.pass.cpp
+++ b/libcxx/test/std/strings/basic.string/string.modifiers/string_append/pointer_size.pass.cpp
@@ -36,6 +36,7 @@ TEST_CONSTEXPR_CXX20 void test_string() {
   test(S(), "12345678901234567890", 1, S("1"));
   test(S(), "12345678901234567890", 3, S("123"));
   test(S(), "12345678901234567890", 20, S("12345678901234567890"));
+  test(S(), "1234567890123456789012345678901234567890", 40, S("1234567890123456789012345678901234567890"));
 
   test(S("12345"), "", 0, S("12345"));
   test(S("12345"), "12345", 5, S("1234512345"));
@@ -44,6 +45,11 @@ TEST_CONSTEXPR_CXX20 void test_string() {
   test(S("12345678901234567890"), "", 0, S("12345678901234567890"));
   test(S("12345678901234567890"), "12345", 5, S("1234567890123456789012345"));
   test(S("12345678901234567890"), "12345678901234567890", 20, S("1234567890123456789012345678901234567890"));
+
+  // Starting from long string (no SSO)
+  test(S("1234567890123456789012345678901234567890"), "", 0, S("1234567890123456789012345678901234567890"));
+  test(S("1234567890123456789012345678901234567890"), "a", 1, S("1234567890123456789012345678901234567890a"));
+  test(S("1234567890123456789012345678901234567890"), "aaaaaaaaaa", 10, S("1234567890123456789012345678901234567890aaaaaaaaaa"));
 }
 
 TEST_CONSTEXPR_CXX20 bool test() {

Copy link

github-actions bot commented Jan 24, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -44,6 +45,14 @@ TEST_CONSTEXPR_CXX20 void test_string() {
test(S("12345678901234567890"), "", 0, S("12345678901234567890"));
test(S("12345678901234567890"), "12345", 5, S("1234567890123456789012345"));
test(S("12345678901234567890"), "12345678901234567890", 20, S("1234567890123456789012345678901234567890"));

// Starting from long string (no SSO)
test(S("1234567890123456789012345678901234567890"), "", 0, S("1234567890123456789012345678901234567890"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these tests fail without the changes in <string>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I think old code is correct (at leas, has correct results) with how the __grow_by_and_replace function is used right now.

However, analogical tests helped me detect a problem some time ago in a different place, so I just added those test cases here for future, as I was already looking at that file.

Trying to find an example of incorrect behavior, I ran locally test below and both, old and new version passed. (I tested with and without short string annotations.)

template <class S>
void test_asan(size_t const a, size_t const b, size_t const c) {
  S sa(a, 'a'), sb(b, 'b'), sc(c, 'c');
  LIBCPP_ASSERT(is_string_asan_correct(sa));
  sa.append(sb.c_str(), sb.size());
  LIBCPP_ASSERT(is_string_asan_correct(sa));
  LIBCPP_ASSERT(is_string_asan_correct(sb));
  LIBCPP_ASSERT(sa.size() == a + b);
  sa.append(sc.c_str(), sc.size());
  LIBCPP_ASSERT(is_string_asan_correct(sa));
  LIBCPP_ASSERT(is_string_asan_correct(sc));
  LIBCPP_ASSERT(sa.size() == a + b + c);
}

void test_loop() {
  for(size_t i = 0; i < 120; ++i)
    for(size_t j = 0; j < 120; ++j)
      for(size_t k = 0; k < 120; ++k) {
        test_asan<std::string>(i, j, k);
      }
}

I briefly review other functions using __grow_by_and_replace and I think there is no possible test case causing incorrect annotation with old code.

But I'm focused on understanding why #79049 got reverted now, so here I was mostly checking that the new version works as expected.

Copy link
Member Author

@AdvenamTacet AdvenamTacet Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, I could create two separate PRs, but those tests are related (one of them tests this function, and rest I added just because I was already modifying that file.).

Edit: I added one more test case, to test changing buffer from long string.

@AdvenamTacet AdvenamTacet merged commit a315fb1 into llvm:main Jan 25, 2024
42 of 43 checks passed
@AdvenamTacet AdvenamTacet deleted the fix-annotation-argument branch January 25, 2024 19:41
@AdvenamTacet
Copy link
Member Author

AdvenamTacet commented Jan 25, 2024

Windows and FreeBSD passed previously and Android failure seems to be unrelated. I will be watching buildkite to confirm that remaining tests passed.

AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request Jan 26, 2024
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM:
- llvm#79489
- llvm#79522

Additionaly annotations were updated (but it shouldn't have any impact on anything):
- llvm#79292

Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang.
Read PRs above and below for details.

---

Previous description:

Originally merged here: llvm#75882
Reverted here: llvm#78627

Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error.

Fixes are implemented in:
- llvm#79065
- llvm#79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here: llvm#72677

Requires to pass CI without fails:
- llvm#75845
- llvm#75858

Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations.

Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    [email protected]
    [email protected]
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request May 5, 2024
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM:
- llvm#79489
- llvm#79522

Additionaly annotations were updated (but it shouldn't have any impact on anything):
- llvm#79292

Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang.
Read PRs above and below for details.

---

Previous description:

Originally merged here: llvm#75882
Reverted here: llvm#78627

Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error.

Fixes are implemented in:
- llvm#79065
- llvm#79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here: llvm#72677

Requires to pass CI without fails:
- llvm#75845
- llvm#75858

Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations.

Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    [email protected]
    [email protected]
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request May 5, 2024
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM:
- llvm#79489
- llvm#79522

Additionaly annotations were updated (but it shouldn't have any impact on anything):
- llvm#79292

Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang.
Read PRs above and below for details.

---

Previous description:

Originally merged here: llvm#75882
Reverted here: llvm#78627

Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error.

Fixes are implemented in:
- llvm#79065
- llvm#79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here: llvm#72677

Requires to pass CI without fails:
- llvm#75845
- llvm#75858

Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations.

Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    [email protected]
    [email protected]
AdvenamTacet pushed a commit to trail-of-forks/llvm-project that referenced this pull request May 5, 2024
This is 3rd attempt to upstream short string annotations, it's the same as the previous one, but other PRs fixed issues withing LLVM:
- llvm#79489
- llvm#79522

Additionaly annotations were updated (but it shouldn't have any impact on anything):
- llvm#79292

Now, as far as I know, all buildbots should work without problems. Both previous reverts were not related to issues with string annotations, but with issues in LLVM/clang.
Read PRs above and below for details.

---

Previous description:

Originally merged here: llvm#75882
Reverted here: llvm#78627

Reverted due to failing buildbots. The problem was not caused by the annotations code, but by code in the `UniqueFunctionBase` class and in the `JSON.h` file. That code caused the program to write to memory that was already being used by string objects, which resulted in an ASan error.

Fixes are implemented in:
- llvm#79065
- llvm#79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here: llvm#72677

Requires to pass CI without fails:
- llvm#75845
- llvm#75858

Annotating `std::basic_string` with default allocator is implemented in llvm#72677 but annotations for short strings (SSO - Short String Optimization) are turned off there. This commit turns them on. This also removes `_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to support turning on and off short string annotations.

Support in ASan API exists since llvm@dd1b7b7. You can turn off annotations for a specific allocator based on changes from llvm@2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in `std::vector` and `std::deque` collections. These enhancements empower ASan to effectively detect instances where the instrumented program attempts to access memory within a collection's internal allocation that remains unused. This includes cases where access occurs before or after the stored elements in `std::deque`, or between the `std::basic_string`'s size (including the null terminator) and capacity bounds.

The introduction of these annotations was spurred by a real-world software bug discovered by Trail of Bits, involving an out-of-bounds memory access during the comparison of two strings using the `std::equals` function. This function was taking iterators (`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison, using a custom comparison function. When the `iter1` object exceeded the length of `iter2`, an out-of-bounds read could occur on the `iter2` object. Container sanitization, upon enabling these annotations, would effectively identify and flag this potential vulnerability.

If you have any questions, please email:

    [email protected]
    [email protected]
AdvenamTacet pushed a commit that referenced this pull request May 7, 2024
This pull request is the third iteration aiming to integrate short
string annotations. This commit includes:
- Enabling basic_string annotations for short strings.
- Setting a value of `__trivially_relocatable` in `std::basic_string` to
`false_type` when compiling with ASan (nothing changes when compiling
without ASan). Short string annotations make `std::basic_string` to not
be trivially relocatable, because memory has to be unpoisoned.
- Adding a `_LIBCPP_STRING_INTERNAL_MEMORY_ACCESS` modifier to two
functions.
- Creating a macro `_LIBCPP_ASAN_VOLATILE_WRAPPER` to prevent
problematic stack optimizations (the macro modifies code behavior only
when compiling with ASan).

Previously we had issues with compiler optimization, which we understand
thanks to @vitalybuka. This commit also addresses smaller changes in
short string, since previous upstream attempts.

Problematic optimization was loading two values in code similar to:
```
__is_long() ? __get_long_size() : __get_short_size();
```
We aim to resolve it with the volatile wrapper.

This commit is built on top of two previous attempts which descriptions
are below.

Additionally, in the meantime, annotations were updated (but it
shouldn't have any impact on anything):
- #79292

---

Previous PR: #79049
Reverted:
a16f81f

Previous description:

Originally merged here: #75882
Reverted here: #78627

Reverted due to failing buildbots. The problem was not caused by the
annotations code, but by code in the `UniqueFunctionBase` class and in
the `JSON.h` file. That code caused the program to write to memory that
was already being used by string objects, which resulted in an ASan
error.

Fixes are implemented in:
- #79065
- #79066

Problematic code from `UniqueFunctionBase` for example:
```cpp
    // In debug builds, we also scribble across the rest of the storage.
    memset(RHS.getInlineStorage(), 0xAD, InlineStorageSize);
```

---

Original description:

This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).

Originally suggested here: https://reviews.llvm.org/D147680

String annotations added here:
#72677

Requires to pass CI without fails:
- #75845
- #75858

Annotating `std::basic_string` with default allocator is implemented in
#72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.

Support in ASan API exists since
dd1b7b7.
You can turn off annotations for a specific allocator based on changes
from
2fa1bec.

This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.

The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.

If you have any questions, please email:

- [email protected]
- [email protected]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants