Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC Warning: Failed to expand heap #234

Closed
woehrl01 opened this issue Oct 12, 2023 · 15 comments · Fixed by #235
Closed

GC Warning: Failed to expand heap #234

woehrl01 opened this issue Oct 12, 2023 · 15 comments · Fixed by #235

Comments

@woehrl01
Copy link

woehrl01 commented Oct 12, 2023

Hi there,

I'm receiving the following error:

│ 2023-10-12T06:07:26.021556Z    error    envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1154    wasm log: GC Warning: Failed to expand heap by 131072 KiB    thread=27                        │
│ 2023-10-12T06:07:26.037565Z    error    envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1154    wasm log: GC Warning: Failed to expand heap by 131072 KiB    thread=27                        │
│ 2023-10-12T06:07:26.053438Z    error    envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1154    wasm log: GC Warning: Failed to expand heap by 131072 KiB    thread=27                        │
│ 2023-10-12T06:07:26.053456Z    error    envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1154    wasm log: GC Warning: Failed to expand heap by 192 KiB    thread=27                           │
│ 2023-10-12T06:07:26.053459Z    error    envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1154    wasm log: GC Warning: Out of Memory! Heap size: 706 MiB. Returning NULL!    thread=27         │
│ 2023-10-12T06:07:26.053577Z    error    envoy wasm external/envoy/source/extensions/common/wasm/wasm_vm.cc:38    Function: proxy_on_configure failed: Uncaught RuntimeError: unreachable                         │
│ Proxy-Wasm plugin in-VM backtrace:                                                                                                                                                                               │
│   0:  0xa4150 - runtime._panic                                                                                                                                                                                   │
│   1:  0x171ceb - (*github.com/corazawaf/coraza/v3/internal/corazawaf.RuleGroup).Add                                                                                                                              │
│   2:  0x1cee18 - github.com/corazawaf/coraza/v3/internal/seclang.directiveSecRule                                                                                                                                │
│   3:  0x1d4760 - (*github.com/corazawaf/coraza/v3/internal/seclang.Parser).FromString                                                                                                                            │
│   4:  0x1d3e93 - (*github.com/corazawaf/coraza/v3/internal/seclang.Parser).FromFile                                                                                                                              │
│   5:  0x1d48b2 - (*github.com/corazawaf/coraza/v3/internal/seclang.Parser).FromString                                                                                                                            │
│   6:  0x1da72f - proxy_on_configure    thread=27                                                                                                                                                                 │
│ 2023-10-12T06:07:26.053589Z    error    envoy wasm external/envoy/source/extensions/common/wasm/wasm.cc:109    Wasm VM failed Failed to configure thread-local Wasm plugin    thread=27

I already pumped up the istio pods to 4 GB (instead of 1 GB) but still the same error.

Do you have any idea how to troubleshoot that?

Rules are as follows:

spec:
  phase: AUTHN
  pluginConfig:
    default_directives: default
    directives_map:
      blockAll:
        - Include @recommended-conf
        - SecRuleEngine On
        - Include @crs-setup-conf
        - Include @owasp_crs/*.conf
        - >-
          SecRule REQUEST_HEADERS:user-agent "@contains firefox" "id:1, phase:1,
          deny, log, logdata:'firefox forbidden'"
        - >-
          SecRule REQUEST_HEADERS:user-agent "@contains Firefox" "id:3, phase:1,
          deny, log, logdata:'firefox forbidden'"
        - >-
          SecRule REQUEST_URI "@beginsWith /admin" "id:2, phase:1, deny, log,
          logdata:'admin forbidden'"
      default:
        - Include @recommended-conf
        - SecRuleEngine DetectionOnly
        - Include @crs-setup-conf
        - Include @owasp_crs/*.conf
    per_authority_directives:
      special.domain.com: blockAll

I'm running Istio 1.18.*

We build our own plugin because we include additional files. And it looks like that's related to the embedded files being too big. As soon as I remove the files, it works as expected.

@woehrl01
Copy link
Author

Looks it it could be related to using "nottinygc" wasilibs/nottinygc#13

@woehrl01
Copy link
Author

I investigated and the plugin works normaly with the following patch applied (disableing nottinygc)

diff --git a/init_tinygo.go b/init_tinygo.go
index 11fb7e9..15dd236 100644
--- a/init_tinygo.go
+++ b/init_tinygo.go
@@ -7,8 +7,7 @@ package main
 
 import (
 	"unsafe"
-
-	_ "github.com/wasilibs/nottinygc"
+	//_ "github.com/wasilibs/nottinygc"
 )
 
 // Some host functions that are not implemented by Envoy end up getting imported anyways
diff --git a/magefiles/magefile.go b/magefiles/magefile.go
index 5607cf4..91aa73c 100644
--- a/magefiles/magefile.go
+++ b/magefiles/magefile.go
@@ -188,7 +188,7 @@ func Build() error {
 	}
 
 	buildTags := []string{
-		"custommalloc",     // https://github.com/wasilibs/nottinygc#usage
+		//"custommalloc",     // https://github.com/wasilibs/nottinygc#usage
 		"no_fs_access",     // https://github.com/corazawaf/coraza#build-tags
 		"memoize_builders", // https://github.com/corazawaf/coraza#build-tags
 	}
@@ -214,8 +214,8 @@ func Build() error {
 			initialPages = ip
 		}
 	}
-
-	if err := sh.RunV("tinygo", "build", "-gc=custom", "-opt=2", "-o", filepath.Join("build", "mainraw.wasm"), "-scheduler=none", "-target=wasi", buildTagArg); err != nil {
+	
+	if err := sh.RunV("tinygo", "build", "-opt=2", "-o", filepath.Join("build", "mainraw.wasm"), "-scheduler=none", "-target=wasi", buildTagArg); err != nil {
 		return err
 	}

@M4tteoP
Copy link
Member

M4tteoP commented Oct 12, 2023

Hi @woehrl01, thanks for raising it.

We build our own plugin because we include additional files. And it looks like that's related to the embedded files being too big. As soon as I remove the files, it works as expected.

Would you mind sharing the rough size of the additional files you are including? Are your changes strictly related to adding files with rules (no changes in the build process)?

Maybe also giving it a go with this debug branch might print some hints.

Thanks to @anuraaga, there has been a significant effort moving to nottinygc in order to overcome out-of-heap problems with the default GC, and overall it provided a great improvement. I'm kind of surprised the default one is behaving better.

@woehrl01
Copy link
Author

@M4tteoP Thank you, absolutely!

Sizes without the additional file:
Bildschirmfoto 2023-10-12 um 12 07 38

Sizes with the additional file (so approximately 6MB more)
Bildschirmfoto 2023-10-12 um 12 07 30

So to be fully transparent about what I'm doing: I'm trying to get geoip based blocking to work in this wasm.

Therefore, I created a custom branch from the extended plugin: corazawaf/coraza-geoip@master...woehrl01:coraza-geoip:support_v3_4

And I add the loading of the database as an //go:embed and call that method RegisterGeoDatabase from the main.go

By purely adding the 6MB database as //go:embed (without loading the plugin) the booting of the module fails, with the mentioned error.

By switching to the "default" gc, everything works and the blocking also works based on the geoip.

@M4tteoP
Copy link
Member

M4tteoP commented Oct 12, 2023

I did some tries only embedding different files and spinning up envoy. It seems to be not directly related to the size of the embedded file but rather to their content 🤔

▶ head -c 30000000 /dev/urandom | base64 > wasmplugin/rules/random_file

Total size: 48M, containing only ASCII characters, is correctly booted.

▶ head -c 1000000 /dev/urandom > wasmplugin/rules/random_file

Total size: 11M, random bytes, fails with:

[error][wasm] [source/extensions/common/wasm/context.cc:1154] wasm log my_vm_id: GC Warning: Failed to expand heap by 131072 KiB
[error][wasm] [source/extensions/common/wasm/context.cc:1154] wasm log my_vm_id: GC Warning: Failed to expand heap by 131072 KiB
[error][wasm] [source/extensions/common/wasm/context.cc:1154] wasm log my_vm_id: GC Warning: Failed to expand heap by 192 KiB
[error][wasm] [source/extensions/common/wasm/context.cc:1154] wasm log my_vm_id: GC Warning: Out of Memory! Heap size: 858 MiB. Returning NULL!
[info][wasm] [source/extensions/common/wasm/context.cc:1148] wasm log my_vm_id: panic: out of memory
[error][wasm] [source/extensions/common/wasm/wasm_vm.cc:38] Function: proxy_on_configure failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
 0:  0xa3b6c - runtime._panic
 1:  0x161919 - (*github.com/corazawaf/coraza/v3/internal/corazawaf.RuleGroup).Add
 2:  0x1c2703 - github.com/corazawaf/coraza/v3/internal/seclang.directiveSecRule
 3:  0x1c804b - (*github.com/corazawaf/coraza/v3/internal/seclang.Parser).FromString
 4:  0x1c777e - (*github.com/corazawaf/coraza/v3/internal/seclang.Parser).FromFile
 5:  0x1c819d - (*github.com/corazawaf/coraza/v3/internal/seclang.Parser).FromString
 6:  0x1cdff1 - proxy_on_configure

@woehrl01
Copy link
Author

woehrl01 commented Oct 12, 2023

@M4tteoP Great, that's a very interesting finding, and makes sense because the geopip database is a binary file.

@anuraaga
Copy link
Contributor

anuraaga commented Oct 13, 2023

Thanks for the investigation @M4tteoP! Yeah this does confirm a suspicion I have had - random data will cause issues for the GC. bdwgc is by default a conservative GC and relies on finding values in memory that look like pointers to identify what to keep. Random data looks like pointers in 32-bits, which Wasm is (it's funny seeing the Wasm is the next compute hype when we see it taking us back 10 years :P). Generally random data within request parsing is not as big of a deal since the buffers themselves eventually get released and stop causing any false positives, but embedded data is there forever and holding up GC of pointers that look like those values.

For reference, bdwgc has this note on 2GB heaps on 32-bit machines, while I don't know if we're running close to that in these cases, we are definitely using a reasonably high amount of memory on a 32-bit machine

https://github.com/ivmai/bdwgc/blob/806537be2dec4f49056cb2fe091ac7f7d78728a8/docs/faq.md?plain=1#L84

TinyGo with conservative GC will probably have similar issues but may run into them less by having more knowledge of the way it lays out memory while bdwgc is treating it as a complete black box. If it works for you, it could be a workaround for now, but just for reference, when I try the patch to use default GC and run go run mage.go ftw, luckily it doesn't seem to OOM, but there are obvious 1-2s GC pauses throughout the run, with the total taking 90s vs 40, the majority of that time is probably GC pauses. Something to keep in mind if trying that out in production.

bdwgc does have some support for precise GC which shouldn't have these issues, but both it and TinyGo's related functionality for it aren't that well documented and I was having trouble using it when experimenting once. I wasn't sure it would have a real benefit at the time so stopped exploring it, but now seeing these cases, I think I'll try again but can't guarantee succeeding anytime soon if at all.

https://github.com/ivmai/bdwgc/blob/806537be2dec4f49056cb2fe091ac7f7d78728a8/include/gc/gc_typed.h#L54

@anuraaga
Copy link
Contributor

anuraaga commented Oct 13, 2023

BTW @M4tteoP would you be able to share a branch with the random file, and for which I'm guessing the step to repo would be just go run mage.go e2e?

I realized there is no step at all but having the file in the folder since we auto embed it, and go run mage.go ftw seems to trigger it. I'm able to repro fine.

@anuraaga
Copy link
Contributor

Luckily tinygo does have the docs that I was missing before this time around

https://github.com/tinygo-org/tinygo/blob/release/src/runtime/gc_precise.go#L24

I will look at adding more support for precise GC, but for now I made the trivial change to handle just empty objects, which means embedded byte arrays too 😎 Looks like it fixes it and probably improves performance/reduces memory usage in general a lot

https://github.com/wasilibs/nottinygc/pull/24/files

This was referenced Oct 13, 2023
@jcchavezs
Copy link
Member

Thanks @anuraaga, I was also wondering because at least in proxy-wasm embedding files as variables is common so we probably need give the visibility to the practice of freeing the variable after loading its data. Specifically talking about proxy wasm it should happen after we create the waf.

@anuraaga
Copy link
Contributor

anuraaga commented Oct 13, 2023

Hmm - I'm not sure if in practice it matters, embedded data is part of the executable and not part of anything that can be GC'd or released I think. On a normal OS, embedded data is just mmap'd in from the executable, so it will be paged in by the OS when needed, and if not used will be paged out automatically by the OS (not Go). I'm not familiar with how Wasm runtimes deal with the data section in the binary, but I guess it could be possible for them to do something similar, but I suspect more likely the memory's used and not releasable in any way.

@woehrl01
Copy link
Author

woehrl01 commented Oct 13, 2023

Awesome, thanks for the incredible quick fix.

The wasm plugin now loads correctly but spills the following errors:

2023-10-13T06:39:08.534901Z    error    envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1154    wasm log: GC Warning: Repeated allocation of very large block (appr. size 92 KiB):
    May lead to memory leak and poor performance    thread=27
2023-10-13T06:39:08.540249Z    error    envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1154    wasm log: GC Warning: Repeated allocation of very large block (appr. size 92 KiB):
    May lead to memory leak and poor performance    thread=29
2023-10-13T06:39:08.594505Z    error    envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1154    wasm log: GC Warning: Repeated allocation of very large block (appr. size 68 KiB):
    May lead to memory leak and poor performance    thread=27
2023-10-13T06:39:08.611791Z    error    envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1154    wasm log: GC Warning: Repeated allocation of very large block (appr. size 68 KiB):
    May lead to memory leak and poor performance    thread=29
2023-10-13T06:39:08.788888Z    error    envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1154    wasm log: GC Warning: Repeated allocation of very large block (appr. size 92 KiB):
    May lead to memory leak and poor performance    thread=27
2023-10-13T06:39:08.804079Z    error    envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1154    wasm log: GC Warning: Repeated allocation of very large block (appr. size 92 KiB):
    May lead to memory leak and poor performance    thread=29

@M4tteoP @jcchavezs Would be interested in a PR which adds GeoIP support (using an empty file for the public) which could be easily extended by dropping the database file into the repo for self compilation?

@anuraaga
Copy link
Contributor

Thanks for the report about the log message, I have released a patch release to suppress the message by default as bdwgc recommends for release builds

https://github.com/ivmai/bdwgc/blob/242a3a7b6040c2680e009367e96a77b0e167619a/misc.c#L1961

There is information about the message itself here

https://github.com/ivmai/bdwgc/blob/master/docs/debugging.md#warning-messages-about-needing-to-allocate-blacklisted-blocks

I believe it shows up because due to some limitations integrating bdwgc with TinyGo, we can't be fully precise. A few messages on startup should generally not be an issue and I think bdwgc recommends suppressing the message to not worry about it if the app performs well enough in practice, and if it doesn't then we'll need to see if there's some way to get the more precise behavior in

@woehrl01
Copy link
Author

@anuraaga awesome, I just confirmed that the error is now gone. 👍🏼 Thank you 🙏🏼

@M4tteoP
Copy link
Member

M4tteoP commented Oct 13, 2023

Would be interested in a PR which adds GeoIP support (using an empty file for the public) which could be easily extended by dropping the database file into the repo for self compilation?

I think that would be great! Being something that requires a self-compilation I would make it opt-in, maybe with a dedicated build tag and some documented steps in the readme that could be something like adding your own GeoIP database and enabling the tag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants