Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix L7NP enable logging wrong packet #6651

Merged
merged 1 commit into from
Sep 25, 2024
Merged

Conversation

qiyueyao
Copy link
Contributor

@qiyueyao qiyueyao commented Sep 4, 2024

Current logs by Suricata when enableLogging is set, logs the wrong packet of RST instead of the original TCP packet, and remains an existing bug in Suricata.

This solution modifies the design to remove enableLogging in L7, keep it in L4, and instead configure Suricata to log Packet info for all alert events.

Also add Packet test to E2E tests.

Decode the packet dtwWezuaHlOhfWpNgQAAAQgARQAAjbT0QABABm9cCgoBBAoKAQOJUgBQa2w1WZlax6yAGAH7nAIAAAEBCAorcsv8RSTwQkdFVCAvYWRtaW4vaW5kZXguaHRtbCBIVFRQLzEuMQ0KSG9zdDogMTAuMTAuMS4zDQpVc2VyLUFnZW50OiBjdXJsLzcuNzQuMA0KQWNjZXB0OiAqLyoNCg0K

-> 10.10.1.4 → 10.10.1.3 HTTP GET /admin/index.html HTTP/1.1

Fixes #6636.

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could validate the logged packet in our e2e test?

"dest_port": 80,
"proto": "TCP",
"pkt_src": "wire/pcap",
"tenant_id": 2,
"pkt_src": "stream (flow timeout)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the pkt src is stream (flow timeout), you are probably looking at the wrong message in the logs. There should be a message with src wire/pcap that shows up immediately, instead of after a flow timeout (which takes 30s to a minute IIRC)

pkg/agent/controller/networkpolicy/l7engine/reconciler.go Outdated Show resolved Hide resolved
@@ -178,7 +178,7 @@ func generateTenantRulesData(policyName string, protoKeywords map[string]sets.Se
// Refer to Suricata detect engine in codebase for detailed tag keyword configuration.
var tagKeyword string
if enableLogging {
tagKeyword = " tag: session, 30, seconds;"
tagKeyword = " tag:host;"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why the previous version was not working, it feels like session should also have included the request packet.

BTW, should we use something like tag:host,1,packets as we only need the first packet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some further investigation and found some problems:
Logging the wrong packet is a known bug/expected behavior documented https://redmine.openinfosecfoundation.org/issues/3480 and https://redmine.openinfosecfoundation.org/issues/2069. Basically the idea is that the log advanced past the first packet that triggered the event, so this will be the case for each session. If changed to host, the first alert stays the same, but the second alert would capture the initial packet as required.
Below is the complete log:

-----(trigger 1st time)
alert
10.10.1.3 → 10.10.1.4 TCP 80 → 47588 [RST, ACK] 
-----(trigger 2nd time)
10.10.1.4 → 10.10.1.3 TCP 55344 → 80 [SYN] SACK_PERM 
10.10.1.3 → 10.10.1.4 TCP 80 → 55344 [SYN, ACK] SACK_PERM 
10.10.1.4 → 10.10.1.3 TCP 55344 → 80 [ACK]
alert
10.10.1.4 → 10.10.1.3 HTTP GET /admin/index.html HTTP/1.1 
10.10.1.3 → 10.10.1.4 TCP 80 → 55344 [RST, ACK] 
alert: timeout
http: timeout

Given this fact, for the e2e it is possible to test it by triggering twice and look for the original packet. But I'm not sure of the necessity of enableLogging actually if this issue continues in Suricata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the first packet is dropped and we have to wait for the next request, then I would argue that this is not a correct implementation of the feature. It's a different request in a different TCP session (source port is different).
So it looks like it is not possible to capture the correct packet at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK if we use the existing Suricata log for this feature, it's not possible to capture that initial packet =(
I can remove the tagged-packet if capturing the initial packet is the only purpose 😂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean removing the feature completely?

Copy link
Contributor Author

@qiyueyao qiyueyao Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I modified the alert configuration, it was able to log the initial packet

{"timestamp":"2024-09-06T22:56:25.262709+0000","flow_id":561336788716570,"in_iface":"antrea-l7-tap0","event_type":"alert","vlan":[2],"src_ip":"10.10.1.4","src_port":46094,"dest_ip":"10.10.1.3","dest_port":80,"proto":"TCP","pkt_src":"wire/pcap","tenant_id":2,"alert":{"action":"blocked","gid":1,"signature_id":1,"rev":0,"signature":"Reject by AntreaNetworkPolicy:default/allow-privileged-url-to-admin-role","category":"","severity":3,"tenant_id":2},"app_proto":"http","direction":"to_server","flow":{"pkts_toserver":3,"pkts_toclient":1,"bytes_toserver":307,"bytes_toclient":78,"start":"2024-09-06T22:56:25.261768+0000","src_ip":"10.10.1.4","dest_ip":"10.10.1.3","src_port":46094,"dest_port":80},"packet":"dtwWezuaHlOhfWpNgQAAAggARQAAjd+4QABABkSYCgoBBAoKAQO0DgBQH6jA1rH7p7qAGAH7zPsAAAEBCAou2X6GSIuizUdFVCAvYWRtaW4vaW5kZXguaHRtbCBIVFRQLzEuMQ0KSG9zdDogMTAuMTAuMS4zDQpVc2VyLUFnZW50OiBjdXJsLzcuNzQuMA0KQWNjZXB0OiAqLyoNCg0K","packet_info":{"linktype":1}}

But it applies to all rules because it's in the Suricata configuration, so cannot be controlled by enableLogging. Does it sound good to remove enableLogging which relates to tagged-packet, but add the packet support in alert that applies to all rules?
Or we can configure whether we want packet to be logged additionally in alert upon start of Suricata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be a good idea to discuss this at the community meeting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, if this is for the alert event type, it only applies to rules that reject traffic, right?
We don't need the packet dump for allowed traffic, as we already have the app-layer metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, tested that it only applies to rules rejecting traffic ( packet will only be added to alert logged for reject, only http will be logged for allow )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do some perf testing using ab as @tnqn suggested, just to make sure that the number of rejected requests that we can handle stays roughly the same after enabling packet logging.

@qiyueyao
Copy link
Contributor Author

qiyueyao commented Sep 14, 2024

AB testing done:

With packet: yes for alert:

$ ab -c 10 -n 500 -r http://10.10.1.3/admin/index.html
Concurrency Level:      10
Time taken for tests:   3.151 seconds
Complete requests:      500
Failed requests:        1000
   (Connect: 0, Receive: 500, Length: 0, Exceptions: 500)
Total transferred:      0 bytes
HTML transferred:       0 bytes
Requests per second:    158.70 [#/sec] (mean)
Time per request:       63.011 [ms] (mean)
Time per request:       6.301 [ms] (mean, across all concurrent requests)
Transfer rate:          0.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    1   0.5      1       7
Processing:     4    5   1.4      5      23
Waiting:        0    0   0.0      0       0
Total:          4    6   1.6      6      26

With packet: no for alert:

$ ab -c 10 -n 500 -r http://10.10.1.3/admin/index.html
Concurrency Level:      10
Time taken for tests:   3.011 seconds
Complete requests:      500
Failed requests:        1000
   (Connect: 0, Receive: 500, Length: 0, Exceptions: 500)
Total transferred:      0 bytes
HTML transferred:       0 bytes
Requests per second:    166.08 [#/sec] (mean)
Time per request:       60.210 [ms] (mean)
Time per request:       6.021 [ms] (mean, across all concurrent requests)
Transfer rate:          0.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    1   0.6      1       8
Processing:     4    5   1.6      4      34
Waiting:        0    0   0.0      0       0
Total:          5    6   1.8      6      37

@antoninbas
Copy link
Contributor

antoninbas commented Sep 16, 2024

Thanks @qiyueyao. It seems that there is a noticeable impact, but nothing too significant (around 5%). Based on these results, I would lean towards enabling this by default, without introducing a new configuration parameter. But I'll wait until @tnqn chimes in, as he requested the benchmarking.

Out of curiosity, do you have a baseline (same testbed) for the same traffic but without any L7NP enforced?

Complete requests:      500
Failed requests:        1000

I am having trouble understanding these numbers. Shouldn't all requests fail (be rejected by Suricata)?

@qiyueyao
Copy link
Contributor Author

qiyueyao commented Sep 16, 2024

Thanks @qiyueyao. It seems that there is a noticeable impact, but nothing too significant (around 5%). Based on these results, I would lean towards enabling this by default, without introducing a new configuration parameter. But I'll wait until @tnqn chimes in, as he requested the benchmarking.

Out of curiosity, do you have a baseline (same testbed) for the same traffic but without any L7NP enforced?

Complete requests:      500
Failed requests:        1000

I am having trouble understanding these numbers. Shouldn't all requests fail (be rejected by Suricata)?

I only sent 500 requests for both cases, and kept the socket open on receive errors. Based on online sources, Receive means count of err_recv when status is not APR_SUCCESS, Exceptions means exceptions thrown.

Without specifying -r, the socket exits with just one request, output apr_socket_recv: Connection reset by peer (104).

Without the source code of Apache ab tool, my guess is that for each rejected packet, apr_socket_recv is counted towards Receive and Complete requests, reject is counted towards Exceptions, they in total sum as 1000 Failed requests.

@tnqn
Copy link
Member

tnqn commented Sep 20, 2024

Thanks @qiyueyao. It seems that there is a noticeable impact, but nothing too significant (around 5%). Based on these results, I would lean towards enabling this by default, without introducing a new configuration parameter. But I'll wait until @tnqn chimes in, as he requested the benchmarking.

5% difference seems fine, enabling it without a new configuration parameter sounds good to me. We can add a configuration when there is a real need.

test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Sep 20, 2024
Comment on lines 453 to 469
packetMatch := func(e, x *L7LogEntry) bool {
var expectedPacket *regexp.Regexp
var packetBytes string
if e.expectedPacketRegex != nil {
expectedPacket = e.expectedPacketRegex
packetBytes = string(x.PacketBytes)
} else if x.expectedPacketRegex != nil {
expectedPacket = x.expectedPacketRegex
packetBytes = string(e.PacketBytes)
} else {
return bytes.Equal(e.PacketBytes, x.PacketBytes)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make it more symmetric:

if e.expectedPacketRegex != nil {
    // return false if x.PacketBytes don't match the regex
}
if x.expectedPacketRegex != nil {
    // return false if e.PacketBytes don't match the regex
}
if e.expectedPacketRegex == nil && x.expectedPacketRegex == nil {
    // return false if the byte slices don't match
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, updated.

test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
test/e2e/l7networkpolicy_test.go Outdated Show resolved Hide resolved
Comment on lines 454 to 461
if e.expectedPacketRegex != nil {
return e.expectedPacketRegex.MatchString(string(x.PacketBytes))
}
if x.expectedPacketRegex != nil {
return x.expectedPacketRegex.MatchString(string(e.PacketBytes))
}
// Compare packet bytes if neither e nor x are expected log entry.
return bytes.Equal(e.PacketBytes, x.PacketBytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's the logic I described in my last review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I misunderstood, didn't consider there to be two expected log entries, but yeah it would be more comprehensive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, is the only difference between the logics "check if both L7LogEntry contain expectedPacketRegex"? Then in that case, we should not compare any PacketBytes? (Am I missing something)

Comment on lines 455 to 470
if e.expectedPacketRegex != nil && !e.expectedPacketRegex.MatchString(string(x.PacketBytes)) ||
x.expectedPacketRegex != nil && !x.expectedPacketRegex.MatchString(string(e.PacketBytes)) ||
x.expectedPacketRegex == nil && e.expectedPacketRegex == nil && !bytes.Equal(e.PacketBytes, x.PacketBytes) {
return false
}
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use Match instead of MatchString and avoid the string conversion?

Yes this is the code I had in mind, but I find it more readable with multiple if statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, I'm still a bit curious on whether we need to compare PacketBytes at all, if both L7LogEntry contain expectedPacketRegex. Because in current case it would fail, as "" PacketBytes would not match the regex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both entries contain expectedPacketRegex, then I think we only need to match PacketBytes against the expectedPacketRegex of the other entry.

Also in our current test code, I don't think this is an actual case (both entries containing expectedPacketRegex)? But regardless, the idea is to have a meaningful Match implementation here, and have Match(a,b) always be equivalent to Match(b,a).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining "have Match(a,b) always be equivalent to Match(b,a)"!

Current logs by Suricata when enableLogging is set, logs the
wrong packet of RST instead of the original TCP packet, and
remains an existing bug in Suricata.

This solution modifies the design to remove enableLogging in
L7, keep it in L4, and instead configure Suricata to log Packet
info for all alert events.

Fixes antrea-io#6636.

Signed-off-by: Qiyue Yao <[email protected]>
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas
Copy link
Contributor

Let's not backport this

@antoninbas
Copy link
Contributor

/test-all

@antoninbas antoninbas merged commit 48ce631 into antrea-io:main Sep 25, 2024
53 of 58 checks passed
@qiyueyao qiyueyao deleted the fix-l7-log branch September 25, 2024 22:43
hangyan pushed a commit to hangyan/antrea that referenced this pull request Oct 29, 2024
Currently when enableLogging is enabled for an L7NP rule,
we enable "tagged" packet logging in the appropriate
Suricata reject rule. Unfortunately, this leads to the wrong
packet being logged: the RST packet generated by Suricata
instead of the original TCP packet containing the HTTP
request from the client.

This solution modifies the design to remove enableLogging in
L7, keep it in L4, and instead configure Suricata to log packet
info for all alert events. This is a global configuration, and no
longer a per-rule configuration. Experiments revealed that
enabling packet logging for all reject rules only had a minor
impact (5%) in performance. Therefore, logging is always
enabled, unconditionally.

Fixes antrea-io#6636.

Signed-off-by: Qiyue Yao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong packet is logged by Suricata for L7NP when enabling logging for rule
3 participants