fix: more forgiving base64 transformation [custom implementation] #944

Merged
merged 10 commits into from
Jan 3, 2024

Member

@M4tteoP M4tteoP commented Dec 18, 2023

Sibling of #940, providing a custom implementation of base64 decoding. The same tests are executed.

The implementation is a refactored version of #758 to allow missing padding and decode up to an illegal character.
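For illustration, the behavior described above (missing padding is tolerated, decoding stops at the first illegal character) could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: `forgivingB64Decode`, `decMap`, and `b64Alphabet` are hypothetical names, and silently dropping a lone trailing character is an assumption.

```go
package main

import "fmt"

const b64Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// decMap maps an input byte to its 6-bit value, or 0xFF when the byte is not
// part of the standard base64 alphabet (this includes '=' and ' ').
var decMap = func() [256]byte {
	var m [256]byte
	for i := range m {
		m[i] = 0xFF
	}
	for i := 0; i < len(b64Alphabet); i++ {
		m[b64Alphabet[i]] = byte(i)
	}
	return m
}()

// forgivingB64Decode (hypothetical name) decodes src until padding or an
// illegal character is reached and does not require padding.
func forgivingB64Decode(src string) string {
	dst := make([]byte, 0, len(src)*3/4)
	var acc uint32 // bit accumulator holding up to four 6-bit values
	n := 0         // number of 6-bit values currently in acc
	for i := 0; i < len(src); i++ {
		v := decMap[src[i]]
		if v == 0xFF {
			break // padding or illegal character: stop decoding here
		}
		acc = acc<<6 | uint32(v)
		n++
		if n == 4 {
			dst = append(dst, byte(acc>>16), byte(acc>>8), byte(acc))
			acc, n = 0, 0
		}
	}
	// Flush the incomplete group. A single leftover character carries only
	// 6 bits and is dropped (assumption of this sketch).
	switch n {
	case 2: // 12 bits available: one full byte
		dst = append(dst, byte(acc>>4))
	case 3: // 18 bits available: two full bytes
		dst = append(dst, byte(acc>>10), byte(acc>>2))
	}
	return string(dst)
}

func main() {
	for _, in := range []string{"VGVzdENhc2U=", "VGVzdENhc2U", "PA==", "PFRFU1Q-"} {
		fmt.Printf("%q -> %q\n", in, forgivingB64Decode(in))
	}
}
```

The inputs in `main` are taken from the benchmark names below; the flush step after the loop is what makes padded and unpadded input decode to the same result.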

We have to decide between:

  • Rely (in a hacky way) on the standard library transformation (we are definitely more confident about the implementation)
  • Rely on and maintain our custom implementation (faster; we are less confident about the implementation; no unexpected changes, because it does not rely on undocumented standard library behavior)

Benchmarks:

PR: custom implementation (this PR)

goos: darwin
goarch: arm64
pkg: github.com/corazawaf/coraza/v3/internal/transformations
BenchmarkB64Decode/VGVzdENhc2U=-10         	27859436	        36.83 ns/op	      16 B/op	       1 allocs/op
BenchmarkB64Decode/VGVzdABDYXNl-10         	30617679	        38.87 ns/op	      16 B/op	       1 allocs/op
BenchmarkB64Decode/VGVzdENhc2U-10          	32812024	        36.56 ns/op	      16 B/op	       1 allocs/op
BenchmarkB64Decode/PA==-10                 	69833588	        16.98 ns/op	       4 B/op	       1 allocs/op
BenchmarkB64Decode/PFRFU1Q+-10             	42157344	        28.29 ns/op	       8 B/op	       1 allocs/op
BenchmarkB64Decode/PHNjcmlwd-10            	36910915	        33.60 ns/op	      16 B/op	       1 allocs/op
BenchmarkB64Decode/PFR_FU1Q+-10            	57075021	        22.50 ns/op	      16 B/op	       1 allocs/op
BenchmarkB64Decode/P.HNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==-10         	58432546	        19.89 ns/op	      48 B/op	       1 allocs/op
BenchmarkB64Decode/PHNjcmlwd.D5hbGVydCgxKTwvc2NyaXB0Pg==-10         	33036349	        35.46 ns/op	      48 B/op	       1 allocs/op
BenchmarkB64Decode/PHNjcmlwdD.5hbGVydCgxKTwvc2NyaXB0Pg==-10         	31625136	        37.47 ns/op	      48 B/op	       1 allocs/op
BenchmarkB64Decode/PFRFU1Q--10                                      	43652700	        27.65 ns/op	       8 B/op	       1 allocs/op
PASS
ok  	github.com/corazawaf/coraza/v3/internal/transformations	13.494s

PR: Std library (#940)

goos: darwin
goarch: arm64
pkg: github.com/corazawaf/coraza/v3/internal/transformations
BenchmarkB64Decode/VGVzdENhc2U=-10         	48350179	        26.08 ns/op	       8 B/op	       1 allocs/op
BenchmarkB64Decode/VGVzdABDYXNl-10         	41106975	        27.79 ns/op	      16 B/op	       1 allocs/op
BenchmarkB64Decode/VGVzdENhc2U-10          	49100253	        24.27 ns/op	       8 B/op	       1 allocs/op
BenchmarkB64Decode/PA==-10                 	57690343	        20.77 ns/op	       1 B/op	       1 allocs/op
BenchmarkB64Decode/PFRFU1Q+-10             	46958845	        24.56 ns/op	       8 B/op	       1 allocs/op
BenchmarkB64Decode/PHNjcmlwd-10            	23393598	        51.72 ns/op	      16 B/op	       2 allocs/op
BenchmarkB64Decode/PFR_FU1Q+-10            	27806850	        43.29 ns/op	       8 B/op	       2 allocs/op
BenchmarkB64Decode/P.HNjcmlwdD5hbGVydCgxKTwvc2NyaXB0Pg==-10         	20983162	        55.50 ns/op	      80 B/op	       2 allocs/op
BenchmarkB64Decode/PHNjcmlwd.D5hbGVydCgxKTwvc2NyaXB0Pg==-10         	15266341	        78.58 ns/op	      88 B/op	       3 allocs/op
BenchmarkB64Decode/PHNjcmlwdD.5hbGVydCgxKTwvc2NyaXB0Pg==-10         	16138278	        74.43 ns/op	      88 B/op	       3 allocs/op
BenchmarkB64Decode/PFRFU1Q--10                                      	24491835	        48.89 ns/op	      16 B/op	       2 allocs/op
PASS
ok  	github.com/corazawaf/coraza/v3/internal/transformations	14.896s

Tentatively closes #926. It should also fix the failing CRS 4.0.0-rc2 tests 934131-5 and 934131-7 (#899).

@M4tteoP M4tteoP requested a review from a team as a code owner December 18, 2023 22:52

codecov bot commented Dec 18, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (f1cfd13) 82.65% compared to head (8753b80) 82.71%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #944      +/-   ##
==========================================
+ Coverage   82.65%   82.71%   +0.06%     
==========================================
  Files         162      162              
  Lines        9028     9062      +34     
==========================================
+ Hits         7462     7496      +34     
  Misses       1317     1317              
  Partials      249      249              
Flag Coverage Δ
default 77.82% <95.00%> (+0.06%) ⬆️
examples 26.39% <0.00%> (-0.11%) ⬇️
ftw 47.33% <100.00%> (+0.21%) ⬆️
ftw-multiphase 49.51% <100.00%> (+0.20%) ⬆️
tinygo 75.38% <95.00%> (+0.07%) ⬆️


return stringsutil.WrapUnsafe(dec), true, nil

// Handle any remaining characters
if n == 2 {
Contributor

Does it still make sense to execute these when we break above on an illegal character?

Member Author

Returning dst.String() early when we break above on an illegal character (that is, not when padding is reached) causes some tests to fail:

--- FAIL: TestBase64Decode (0.00s)
    --- FAIL: TestBase64Decode/decoded_up_to_the_space_(invalid_character) (0.00s)
        /Users/matteopace/Repo/coraza/internal/transformations/base64decode_test.go:83: Expected "<T", but got ""
    --- FAIL: TestBase64Decode/decoded_up_to_the_dot_(invalid_character)#02 (0.00s)
        /Users/matteopace/Repo/coraza/internal/transformations/base64decode_test.go:83: Expected "<script", but got "<scrip"
    --- FAIL: TestBase64Decode/decoded_up_to_the_dash_(invalid_character_for_base64,_only_valid_for_Base64url) (0.00s)
        /Users/matteopace/Repo/coraza/internal/transformations/base64decode_test.go:83: Expected "<TEST", but got "<TE"

Trailing characters still need to be rearranged in that case, as if the end of the string had been reached.
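The three failures above can be reproduced with a small stand-alone sketch. It is not the PR's code: it uses the standard library's `base64.RawStdEncoding` on the valid prefix only, and `validPrefix`, `decodeWholeGroupsOnly`, and `decodeWithTrailing` are hypothetical helpers. The input with a space is an assumption inferred from the test name and from the benchmark name `PFR_FU1Q+` (Go rewrites spaces in sub-benchmark names as underscores).

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// validPrefix returns the leading run of standard base64 alphabet characters.
func validPrefix(s string) string {
	for i := 0; i < len(s); i++ {
		if strings.IndexByte(alphabet, s[i]) < 0 {
			return s[:i]
		}
	}
	return s
}

// decodeWholeGroupsOnly mimics returning early: the incomplete group of
// characters before the illegal one is thrown away.
func decodeWholeGroupsOnly(s string) string {
	p := validPrefix(s)
	p = p[:len(p)-len(p)%4]
	out, _ := base64.RawStdEncoding.DecodeString(p) // p is whole groups only
	return string(out)
}

// decodeWithTrailing also decodes a trailing group of 2 or 3 characters,
// as if the end of the string had been reached at the illegal character.
func decodeWithTrailing(s string) string {
	p := validPrefix(s)
	if len(p)%4 == 1 {
		p = p[:len(p)-1] // a lone character cannot produce a full byte
	}
	out, _ := base64.RawStdEncoding.DecodeString(p)
	return string(out)
}

func main() {
	inputs := []string{
		"PFR FU1Q+", // assumed input for the "space" test
		"PHNjcmlwdD.5hbGVydCgxKTwvc2NyaXB0Pg==",
		"PFRFU1Q-",
	}
	for _, in := range inputs {
		fmt.Printf("%q: early return %q, with trailing %q\n",
			in, decodeWholeGroupsOnly(in), decodeWithTrailing(in))
	}
}
```

With these inputs the "early return" column gives "", "<scrip", and "<TE", while the "with trailing" column gives the expected "<T", "<script", and "<TEST".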

@@ -31,7 +100,7 @@ func BenchmarkB64Decode(b *testing.B) {

 func FuzzB64Decode(f *testing.F) {
 	for _, tc := range b64DecodeTests {
-		f.Add(tc)
+		f.Add(tc.input)
Contributor

Interestingly the fuzz test was still set up for this forgiving version, that's nice


for ; srcc < slen; srcc++ {
// If invalid character or padding reached, we stop decoding
if src[srcc] == '=' || src[srcc] == ' ' || src[srcc] > 127 || base64DecMap[src[srcc]] == 127 {
Contributor

Extract variable for src[srcc]

Contributor

Also, while it means having two conditionals that can break, I think it's worth extracting a variable for base64DecMap[src[srcc]] rather than doing the lookup twice
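A rough sketch of what the two suggestions could look like together, with one local for the current byte and one for the table lookup. `decTable` and `sixBitValues` are hypothetical stand-ins, not the PR's `base64DecMap` or its decoding loop; the only thing carried over is the idea of a 128-entry table that uses 127 as the "invalid" marker.

```go
package main

import "fmt"

const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// decTable is a hypothetical 128-entry table: 127 marks an invalid character.
var decTable = func() [128]byte {
	var t [128]byte
	for i := range t {
		t[i] = 127
	}
	for i := 0; i < len(alphabet); i++ {
		t[alphabet[i]] = byte(i)
	}
	return t
}()

// sixBitValues returns the 6-bit values of src up to the first padding or
// illegal character, reading src[i] and the table entry once each.
func sixBitValues(src string) []byte {
	var vals []byte
	for i := 0; i < len(src); i++ {
		c := src[i]
		// First break: checks that do not need the table. The c > 127 test
		// also keeps the index below in range.
		if c == '=' || c == ' ' || c > 127 {
			break
		}
		// Second break: a single table lookup, reused for the append below.
		v := decTable[c]
		if v == 127 {
			break
		}
		vals = append(vals, v)
	}
	return vals
}

func main() {
	fmt.Println(sixBitValues("PA=="))      // [15 0]
	fmt.Println(sixBitValues("PFR FU1Q+")) // [15 5 17]
}
```

In this sketch the explicit '=' and ' ' checks are redundant with the table, since both already map to 127. They are kept only to mirror the shape of the condition under review.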

Contributor

@anuraaga anuraaga left a comment

I think this is better than the stdlib option

Member Author

M4tteoP commented Dec 20, 2023

Thanks @anuraaga for the review and all the guidance. I will wait a bit for feedback from others before merging this one and closing the other.

Member

@fzipi fzipi left a comment

IMHO, this is ready to go.

Member

jcchavezs commented Dec 27, 2023 via email

@M4tteoP M4tteoP merged commit b887a58 into corazawaf:main Jan 3, 2024
10 checks passed
Successfully merging this pull request may close these issues.

t:base64decode is too strict (padding required, no partial decoding)
4 participants