[APPSEC-10967] ASM parse response body #3153
Codecov Report
@@ Coverage Diff @@
## master #3153 +/- ##
=======================================
Coverage 98.16% 98.16%
=======================================
Files 1283 1283
Lines 73915 74000 +85
Branches 3425 3433 +8
=======================================
+ Hits 72559 72645 +86
+ Misses 1356 1355 -1
... and 1 file with indirect coverage changes
22106fc to 9ed1001
Overall ok, but I'm concerned about the fuzziness around content type conditions.
option :parse_response_body do |o|
  o.type :bool
  o.default true
end
Very good move! I was going to suggest doing that but you outran me!
if result.timeout
  Datadog.logger.debug do
    "Unable to parse response body because of unsupported body type: #{body.class}"
  end
end
Hmm, that doesn't seem like the intended log message there.
end
return unless supported_response_type

body_dup = body.dup # avoid iterating over the body. This is just in case code.
For `Array` this is not needed, so is that for `Rack::BodyProxy`?
Yes, since `Rack::BodyProxy` acts as a proxy. If we call `each` directly, we might iterate over the body and consume it.
https://github.com/rack/rack/blob/main/lib/rack/body_proxy.rb#L45-L58
If we call `to_ary`, we also close the body, and we do not want that, since we might not be the last middleware. That is why I thought using `dup` would save us here.
But then you're duping the `Rack::BodyProxy` instance, which would still point to the same underlying wrapped object, which we don't know the type of, and may not be traversable twice.
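To illustrate the concern about `dup`, here is a minimal sketch. `OneShotBody` and `TinyProxy` are hypothetical stand-ins written for this example (they are not Rack classes); the only point is that `dup` makes a shallow copy, so the copy and the original share the wrapped body.

```ruby
# Hypothetical one-shot body: yields its chunks once, then is exhausted.
class OneShotBody
  def initialize(chunks)
    @chunks = chunks
  end

  def each
    while (chunk = @chunks.shift)
      yield chunk
    end
  end
end

# Hypothetical minimal stand-in for a delegating proxy (not Rack::BodyProxy itself).
class TinyProxy
  def initialize(body)
    @body = body
  end

  def each(&block)
    @body.each(&block)
  end
end

proxy = TinyProxy.new(OneShotBody.new(["a", "b"]))
copy = proxy.dup # shallow copy: @body still references the same OneShotBody

seen_by_copy = []
copy.each { |chunk| seen_by_copy << chunk }

# Iterating the copy has already drained the shared underlying body.
seen_by_original = []
proxy.each { |chunk| seen_by_original << chunk }
```

Under these assumptions, whatever iterates the original after the copy sees nothing, which is the situation a later middleware would be in.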
end

def json?
  headers['content-type'].include?('json')
I'd rather be strict and explicitly list supported content types. I think those would be:
- `application/json` (the official one registered at IANA)
- `text/json` because someone from the team added it? I don't think I've seen it in the (Ruby) wild, ever.
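A strict check along those lines could look roughly like the sketch below. The constant and method names are hypothetical, and whether `text/json` belongs in the list is exactly the open question above. The media type is compared as a whole after dropping parameters such as `charset`, so a substring hit like `'application/x-ndjson'.include?('json')` no longer counts, and a missing header returns `false` instead of raising `NoMethodError` on `nil`.

```ruby
# Hypothetical allow-list; the exact entries are an assumption for illustration.
SUPPORTED_JSON_TYPES = %w[application/json text/json].freeze

# Returns true only when the media type (without parameters) is in the allow-list.
def json_content_type?(header_value)
  return false if header_value.nil?

  # "application/json; charset=utf-8" -> "application/json"
  media_type = header_value.split(';', 2).first.to_s.strip.downcase
  SUPPORTED_JSON_TYPES.include?(media_type)
end
```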
end

def text?
  headers['content-type'].include?('text')
I'd rather be strict and explicitly list supported content types.
Indeed, as is, this would match at least `text/plain` and `text/html`, but is there any point in passing those raw to libddwaf in the non-raw address that aims to carry parsed, structured data?
Is it for `text/xml`, about which RFC 3023 says:
If an XML document -- that is, the unprocessed, source XML document
-- is readable by casual users, text/xml is preferable to
application/xml. MIME user agents (and web user agents) that do not
have explicit support for text/xml will treat it as text/plain, for
example, by displaying the XML MIME entity as plain text.
Application/xml is preferable when the XML MIME entity is unreadable
by casual users.
You are right. Since we do not parse it, there is no schema information to extract from it.
I will remove the entire support for the `text/*` content type.
return unless all_body_parts_are_string

if json?
  JSON.parse(result)
That `json?` tests that folks have been advertising the body as being JSON, but if the content is broken JSON it'd blow up with a `JSON::ParserError`.
We should guard against that and `return`.
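A guard of that kind might be sketched as follows. `parse_json_body` is a hypothetical helper name, not the method in this PR; it only shows rescuing `JSON::ParserError` and returning `nil` so the caller can bail out.

```ruby
require 'json'

# Hypothetical helper: returns the parsed structure, or nil when the body
# is advertised as JSON but is not valid JSON.
def parse_json_body(raw)
  JSON.parse(raw)
rescue JSON::ParserError
  nil
end

valid = parse_json_body('{"user": {"id": 1}}')
broken = parse_json_body('{"user": ')
```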
if json?
  JSON.parse(result)
else
  result
Then it's not really parsed, is it? Either it's raw (`server.response.body.raw`) or it's parsed (`server.response.body`).
Since the goal is to parse it - notably to perform schema extraction - it seems to me there's not much point doing that.
return unless Datadog.configuration.appsec.parse_response_body

unless body.instance_of?(Array) || body.instance_of?(::Rack::BodyProxy)
  if result.timeout
I have no idea how that came to be there :weird:
62d7be1 to 0a2d379
bcd6e90 to 2a36a14
2a36a14 to 72ace59
What does this PR do?
Parses the response body and passes it to the WAF as `server.response.body`.
We only parse bodies that are either an `Array` or a `Rack::BodyProxy`. Since we might not be the last middleware in the customer application, we cannot consume the response body directly by calling `each`. To circumvent that, we make a copy of the body.
Parsing the response body could have performance implications for our customers. Since we have yet to learn how this would impact them, I added a configuration entry that lets them skip response body parsing altogether.
This configuration option would remain undocumented, and we would only mention it to customers if they experience any performance degradation.
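As a rough sketch of the gating described above, the checks can be read as a chain of early returns. `joined_response_body` and its `parse_enabled:` keyword are hypothetical names standing in for the configuration entry; this is not the PR's code. The sketch only handles plain `Array` bodies, which can be traversed repeatedly without being consumed, and deliberately leaves out the proxy case.

```ruby
# Hypothetical helper: returns the body joined into one String, or nil when
# the body should be skipped.
def joined_response_body(body, parse_enabled: true)
  return nil unless parse_enabled                  # opt-out via configuration
  return nil unless body.instance_of?(Array)       # assumption: Array bodies only
  return nil unless body.all? { |part| part.is_a?(String) }

  body.join
end

joined = joined_response_body(['{"a":', '1}'])
skipped_by_config = joined_response_body(['{"a":1}'], parse_enabled: false)
skipped_by_type = joined_response_body(['x'].each)
skipped_by_parts = joined_response_body(['x', 1])
```

The joined string would then still need the content-type and JSON validity checks before anything is handed to the WAF.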
Motivation:
Additional Notes:
How to test the change?
For Datadog employees:
credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
Unsure? Have a question? Request a review!