Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APPSEC-10967] ASM parse response body #3153

Merged
merged 9 commits into from
Oct 23, 2023

Conversation

GustavoCaso
Copy link
Member

@GustavoCaso GustavoCaso commented Sep 25, 2023

What does this PR do?

Parsed the response body and passed it to the waf as server.response.body.

We only parse bodies that are either an Array or a Rack::BodyProxy. Since we might not be the last middleware in the customer application, we can not consume the response body directly by calling each. To circumvent that, we make a copy of the body.

Parsing the response body could lead to performance implications for our customers. Since we have yet to learn how this would impact our customers, I added a configuration entry for them to skip the response body parsing altogether.

This documentation would remain undocumented, and we would only mention it to customers if they experience any performance degradation.

Motivation:

Additional Notes:

How to test the change?

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

@github-actions github-actions bot added appsec Application Security monitoring product integrations Involves tracing integrations labels Sep 25, 2023
@codecov-commenter
Copy link

Codecov Report

Merging #3153 (22106fc) into master (1511f99) will increase coverage by 0.00%.
Report is 44 commits behind head on master.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #3153   +/-   ##
=======================================
  Coverage   98.16%   98.16%           
=======================================
  Files        1283     1283           
  Lines       73915    74000   +85     
  Branches     3425     3433    +8     
=======================================
+ Hits        72559    72645   +86     
+ Misses       1356     1355    -1     
Files Changed Coverage Δ
...ib/datadog/appsec/contrib/rack/gateway/response.rb 100.00% <100.00%> (ø)
...b/datadog/appsec/contrib/rack/reactive/response.rb 100.00% <100.00%> (ø)
...tadog/appsec/contrib/rack/gateway/response_spec.rb 100.00% <100.00%> (ø)
...adog/appsec/contrib/rack/reactive/response_spec.rb 100.00% <100.00%> (ø)

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@GustavoCaso GustavoCaso marked this pull request as ready for review September 25, 2023 13:31
@GustavoCaso GustavoCaso requested a review from a team as a code owner September 25, 2023 13:31
@GustavoCaso GustavoCaso changed the title Asm response body waf address [APPSEC-10967] ASM parse response body Sep 25, 2023
Copy link
Member

@lloeki lloeki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall ok, but I'm concerned about the fuzziness around content type conditions.

Comment on lines 192 to 196
option :parse_response_body do |o|
o.type :bool
o.default true
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good move! I was going to suggest doing that but you outran me!

Comment on lines 26 to 28
if result.timeout
Datadog.logger.debug do
"Unable to parse response body because of unsupported body type: #{body.class}"
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that doesn't seem like the intended log message there.

end
return unless supported_response_type

body_dup = body.dup # avoid interating over the body. This is just in case code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Array this is not needed, so is that for Rack::BodyProxy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Since Rack::BodyProxy acts as a proxy. If we call each directly, we might iterate over the body and consume it.

https://github.com/rack/rack/blob/main/lib/rack/body_proxy.rb#L45-L58

If we call to to_ary we also close the body, and we do not want that, since we might not be the last middleware. That is why I thought using dup would save us here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then you're duping the Rack::BodyProxy instance, which would still point to the same underlying wrapped object, which we don't know the type of, and may not be traversable twice.

end

def json?
headers['content-type'].include?('json')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather be strict and explicitly list supported content types. I think those would be:

  • application/json (the official one registered at IANA)
  • text/json because someone from the team added it? I don't think I've seen it in the (Ruby) wild, ever.

end

def text?
headers['content-type'].include?('text')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather be strict and explicitly list supported content types.

Indeed as is this would match at least text/plain and text/html but is there any point of passing those raw to libddwaf in the non-raw address that aims to carry parsed, structured data?

Is it for text/xml, which according to RFC 3023 says:

If an XML document -- that is, the unprocessed, source XML document
-- is readable by casual users, text/xml is preferable to
application/xml. MIME user agents (and web user agents) that do not
have explicit support for text/xml will treat it as text/plain, for
example, by displaying the XML MIME entity as plain text.
Application/xml is preferable when the XML MIME entity is unreadable
by casual users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, since we do not parse, there is no schema information to extract from it

I will remove the entire support for text/* content-type

return unless all_body_parts_are_string

if json?
JSON.parse(result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That json? tests that folks have been advertising the body as being JSON, but if content is broken JSON it'd blow up with a JSON::ParserError.

We should guard against that and return.

if json?
JSON.parse(result)
else
result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it's not really parsed, is it? Either it's raw (server.response.body.raw) or it's parsed (server.response.body).

Since the goal is to parse it - notably to perform schema extraction - it seems to me there's not much point doing that.

return unless Datadog.configuration.appsec.parse_response_body

unless body.instance_of?(Array) || body.instance_of?(::Rack::BodyProxy)
if result.timeout
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea how that came to be there :weird:

@GustavoCaso GustavoCaso force-pushed the asm-response-body-waf-address branch 2 times, most recently from 62d7be1 to 0a2d379 Compare September 27, 2023 13:26
@GustavoCaso GustavoCaso force-pushed the asm-response-body-waf-address branch 2 times, most recently from bcd6e90 to 2a36a14 Compare October 23, 2023 10:31
@GustavoCaso GustavoCaso merged commit f725c43 into master Oct 23, 2023
216 of 217 checks passed
@GustavoCaso GustavoCaso deleted the asm-response-body-waf-address branch October 23, 2023 12:39
@marcotc marcotc added this to the 1.16.0 milestone Oct 25, 2023
@GustavoCaso GustavoCaso restored the asm-response-body-waf-address branch November 10, 2023 14:22
GustavoCaso added a commit that referenced this pull request Nov 10, 2023
GustavoCaso added a commit that referenced this pull request Nov 10, 2023
ekump pushed a commit that referenced this pull request Nov 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
appsec Application Security monitoring product integrations Involves tracing integrations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants