Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruby assertion causing panic with shared strings #85

Closed
eileencodes opened this issue Aug 9, 2024 · 5 comments · Fixed by #92
Closed

Ruby assertion causing panic with shared strings #85

eileencodes opened this issue Aug 9, 2024 · 5 comments · Fixed by #92

Comments

@eileencodes
Copy link
Collaborator

eileencodes commented Aug 9, 2024

I've been debugging a panic that I can easily reproduce on my macos machine. I've whittled the reproduction down and what appears to be happening is that in rb_mmtk_gc_ref_update_string the original string is moved but shared string object stays put. In this scenario we've noticed that while the original object has it's underlying buffer the shared object enters rb_mmtk_gc_ref_update_string with no buffer.

After some more debugging and testing we also found that in other cases (unrelated to this particular test) that very often strings are missing their underlying buffer. So we're wondering, if a string is not embedded should it always have an underlying buffer? IE should this assertion always pass:

GC_ASSERT(RSTRING_EXT(str)->strbuf);

We're seeing this not true but aren't sure that the string should always have an underlying buffer (if it's not embedded).

Here's the test I've been running. Note I set the max heap to 1gb because it reproduces more often with a smaller amount of memory. This also only happens on Immix because the object is moving.

You can use this branch which has the test stripped down to the minimum https://github.com/mmtk/ruby/compare/mmtk...eileencodes:ruby:reproduction-for-str-size?expand=1

make test-all RUBY_TESTOPTS="--excludes-dir=test/.excludes-mmtk --seed=39446 -v" RUN_OPTS="--mmtk --mmtk-plan=Immix --mmtk-max-heap=1GiB" RUST_BACKTRACE=1 RUST_LOG=none TESTS="test/rubygems/test_gem_remote_fetcher.rb"

The assertion that's failing is here which expects the difference between orig and shared to be equal to or greater than 0 but instead we have a negative number (because orig moved and shared didn't follow:

ruby/string.c

Lines 1775 to 1783 in 8082532

if (FL_TEST_RAW(orig, STR_SHARED)) {
VALUE shared = RSTRING(orig)->as.heap.aux.shared;
long ofs = RSTRING(orig)->as.heap.ptr - RSTRING_PTR(shared);
long rest = RSTRING_LEN(shared) - ofs - RSTRING_LEN(orig);
RUBY_ASSERT(ofs >= 0);
RUBY_ASSERT(rest >= 0);
RUBY_ASSERT(ofs + rest <= RSTRING_LEN(shared));
RUBY_ASSERT(OBJ_FROZEN(shared));

Here's some of the stacktrace with the debugging info (notes added by me):

marking 0x20100cb1780
done 0x20100cb1780 old_strbuf: 0x20100cb1770 new_strbuf: 0x20100cb1770
[1/2] TestGemRemoteFetcher#test_implicit_upper_case_proxy = 0.01 s
marking 0x201007af028
done 0x201007af028 old_strbuf: 0x201007af018 new_strbuf: 0x201007af018
marking 0x600ffc05018
done 0x600ffc05018 old_strbuf: 0x600ffc05008 new_strbuf: 0x600ffc05008
marking 0x20100cb1780 <==== this is orig
done 0x20100c61c08 old_strbuf: 0x20100cb1770 new_strbuf: 0x20100c61bf8 <==== underlying strbuf moves for orig
marking 0x20100cb1780 <===== this is shared
done 0x20100cb1780 old_strbuf: 0x0 new_strbuf: 0x0 <===== not only does shared not move, it doesn't have a strbuf
[2/2] TestGemRemoteFetcher#test_implicit_proxystrbuf_ptr 0x20100c61c08, orig_ptr 0x20100c61c08, shared_ptr 0x20100cb1780
str addr orig: 0x20100c61c08 shared: 0x20100cb1780 orig_strbuf 0x20100c61bf8 shared_strbuf 0x0
Assertion Failed: string.c:1805:str_new_frozen_buffer:ofs >= 0
ruby 3.4.0dev (2024-08-09T19:42:38Z reproduction-for-s.. a5b29b5277) +MMTk(Immix) [arm64-darwin23]

-- Crash Report log information --------------------------------------------
   See Crash Report log file in one of the following locations:             
     * ~/Library/Logs/DiagnosticReports                                     
     * /Library/Logs/DiagnosticReports                                      
   for more details.                                                        
Don't forget to include the above Crash Report log file in bug reports.     

-- Control frame information -----------------------------------------------
c:0030 p:---- s:0172 e:000171 CFUNC  :match
c:0029 p:---- s:0169 e:000168 CFUNC  :match
c:0028 p:0010 s:0164 e:000163 METHOD /Users/eileencodes/src/github.com/open_source/ruby/test/rubygems/test_gem_remote_fetcher.rb:61
c:0027 p:0037 s:0160 e:000159 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit/testcase.rb:202
c:0026 p:0052 s:0154 E:001088 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit/testcase.rb:170
c:0025 p:0083 s:0145 e:000144 BLOCK  /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1756 [FINISH]
c:0024 p:---- s:0138 e:000137 CFUNC  :map
c:0023 p:0103 s:0134 e:000133 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1743
c:0022 p:0035 s:0122 e:000121 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1360
c:0021 p:0010 s:0115 e:000113 BLOCK  /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:821
c:0020 p:0027 s:0109 e:000108 METHOD <internal:array>:54
c:0019 p:0047 s:0103 e:000102 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:819
c:0018 p:0008 s:0096 e:000095 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:856
c:0017 p:0127 s:0089 e:000088 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1689
c:0016 p:0016 s:0076 e:000075 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1297
c:0015 p:0005 s:0071 e:000070 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1854
c:0014 p:0006 s:0067 e:000066 BLOCK  /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1841
c:0013 p:0027 s:0063 e:000062 METHOD <internal:array>:54
c:0012 p:0042 s:0057 e:000056 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1840
c:0011 p:0013 s:0052 e:000051 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1884
c:0010 p:0008 s:0047 e:000046 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1044
c:0009 p:0008 s:0041 e:000040 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:889
c:0008 p:0017 s:0035 e:000034 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:145
c:0007 p:0008 s:0030 e:000029 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1281
c:0006 p:0032 s:0025 e:000024 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1967
c:0005 p:0009 s:0021 e:000020 METHOD /Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1971
c:0004 p:0065 s:0016 e:000014 TOP    /Users/eileencodes/src/github.com/open_source/ruby/tool/test/runner.rb:14 [FINISH]
c:0003 p:---- s:0011 e:000010 CFUNC  :require_relative
c:0002 p:0005 s:0006 e:000005 EVAL   ./test/runner.rb:5 [FINISH]
c:0001 p:0000 s:0003 E:001d10 DUMMY  [FINISH]

-- Ruby level backtrace information ----------------------------------------
./test/runner.rb:5:in '<main>'
./test/runner.rb:5:in 'require_relative'
/Users/eileencodes/src/github.com/open_source/ruby/tool/test/runner.rb:14:in '<top (required)>'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1971:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1967:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1281:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:145:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:889:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1044:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1884:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1840:in '_run'
<internal:array>:54:in 'each'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1841:in 'block in _run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1854:in 'run_tests'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1297:in '_run_anything'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1689:in '_run_anything'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:856:in '_run_suites'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:819:in '_run_suites'
<internal:array>:54:in 'each'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:821:in 'block in _run_suites'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1360:in '_run_suite'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1743:in '_run_suite'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1743:in 'map'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit.rb:1756:in 'block in _run_suite'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit/testcase.rb:170:in 'run'
/Users/eileencodes/src/github.com/open_source/ruby/tool/lib/test/unit/testcase.rb:202:in 'run_test'
/Users/eileencodes/src/github.com/open_source/ruby/test/rubygems/test_gem_remote_fetcher.rb:61:in 'test_implicit_proxy'
/Users/eileencodes/src/github.com/open_source/ruby/test/rubygems/test_gem_remote_fetcher.rb:61:in 'match'
/Users/eileencodes/src/github.com/open_source/ruby/test/rubygems/test_gem_remote_fetcher.rb:61:in 'match'
Screenshot 2024-08-09 at 3 53 08 PM

cc/ @tenderlove @wks

@wks
Copy link

wks commented Aug 10, 2024

In the past, I thought a non-embedded string must have a non-NULL strbuf. But that may be a mistake.

The old documentation said that STR_SHARED_ROOT implies RSTRING_NO_EMBED, but ruby@c6b3912 updated the documentation and it no longer says so.

So the policy of the RSTRING_EXT(str)->strbuf field is that it is non-null if and only if RSTRING(str)->as.heap.ptr points into an imemo_strbuf instance. Specifically,

  • If the string is embedded, the strbuf field does not exist (used for string payload)
  • If the string owns the buffer (not shared, not shared root), the strbuf must be non-NULL.
  • If the string is a SHARED_ROOT and it is not embedded, strbuf must be non-NULL. If it is embedded, there is no strbuf field.
  • If the string is SHARED, strbuf is non-NULL if the shared root is not embedded.

So the only case where RSTRING_ext(str)->strbuf can be NULL is that str is a shared string, and the shared root is embedded.

@wks
Copy link

wks commented Aug 10, 2024

marking 0x20100cb1780 <===== this is shared
done 0x20100cb1780 old_strbuf: 0x0 new_strbuf: 0x0 <===== not only does shared not move, it doesn't have a strbuf

This shouldn't happen. If the non-embedded string is not shared, or if it is shared but the shared root is not embedded, then it must have a non-null strbuf. This must be a bug.

@wks
Copy link

wks commented Aug 10, 2024

By the way, by default Immix does not move all objects. It only does "opportunistic" defragmentation, i.e. (1) it only does defragmentation if the heap is very full, and (2) even when doing defragmentation, it only moves the most fragmented blocks, and stop defragmenting if it has moved 2% of the full heap size.

However, you can override this behavior and force Immix to do defragmentation in every GC and move as many objects as possible by hacking mmtk-core itself. See the instructions in https://github.com/mmtk/mmtk-core/blob/160b7702fccda133c9407234821ad35103623179/src/policy/immix/mod.rs#L20-L43

@tenderlove
Copy link

@wks this all makes sense. So indeed we're seeing a bug. @eileencodes it sounds like we should be able to add that assertion in the "update references" code. IOW, when we're updating references, if the string isn't embedded or it is pointing to a non-embedded root, it must have a reference to a tmpbuf reference.

My guess is that somewhere we're making shared strings that don't have the tmpbuf set. Adding assertions like this should help us track it down.

@wks
Copy link

wks commented Sep 11, 2024

I found a special case. If a string is created using str_new_static, it will not have an RSTRING_EXT(str)->strbuf because the string contents are not in the MMTk heap. It's not even in the malloc heap. Such string instances have the STR_NOFREE flag.

And if a string is shared, and the shared root has STR_NOFREE, the shared string will also not have RSTRING_EXT(str)->strbuf, either.

eileencodes added a commit that referenced this issue Sep 11, 2024
In CI (and locally) we were seeing a failed assertion for the
`test_gem_remote_fetcher` test. After much debugging it was clear that
for this test the strbuf was missing on `shared` so when `orig` moved,
`shared` didn't follow`. This caused the offset to be very off.

```
Assertion Failed: string.c:1805:str_new_frozen_buffer:ofs >= 0
ruby 3.4.0dev (2024-08-09T19:42:38Z reproduction-for-s.. a5b29b5277) +MMTk(Immix) [arm64-darwin23]
```

With a `GC_ASSERT` we tracked down a missing `rb_mmtk_str_set_strbuf` in
`rb_str_tmp_frozen_no_embed_acquire`.

After fixing that case a standard string like `"hi"` would still fail
the `GC_ASSERT`. This is because we also need to return early if the
`new_root` of a shared object has the `STR_NOFREE` flag. In those cases,
the object does not have a strbuf to copy.

After these changes, both the `test_string.rb` and
`test_gem_remote_fetcher.rb` tests pass.

Fixes: #85
@wks wks closed this as completed in ca89e44 Sep 12, 2024
@wks wks closed this as completed in #92 Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants