Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dhcpmon] Fix dhcpmon socket filter and tx count issue #13065

Merged
merged 4 commits into from
Jan 6, 2023

Conversation

kellyyeh
Copy link
Contributor

@kellyyeh kellyyeh commented Dec 15, 2022

Why I did it

  1. Fix issue caused by dualtor support PR [dhcpmon] Open different socket for dual tor to enable interface filtering #11201
    This issue exists for version 20201231.73 - 20201231.86 on single and dual tor

    • From the above PR, dhcpmon only listens for packets on active ToR and not standby ToR, but dhcp packet count is more than 3 times the count. The inbound filter listening on all interfaces causes it to pick up more than one copy of the packet, including Ethxx, lo, Bridge. Tx counter does not increase count accordingly due to filter set to inbound.
      Below is an example of dhcpmon counter without fix:
      -dhcp_relay#dhcpmon[58]: DHCPv4 [ Agg-Vlan2000-Snapshot rx/tx] Discover: 4/ 0, Offer: 4/ 0, Request: 4/ 0, ACK: 4/ 0
      In opposed to dhcpmon with correct counter:
      -dhcp_relay#dhcpmon[58]: DHCPv4 [ Agg-Vlan2000-Snapshot rx/tx] Discover: 1/ 4, Offer: 1/ 4, Request: 1/ 4, ACK: 1/ 4
  2. Improve code

How I did it

  1. On single ToR, packets received count was duplicated due to socket filter set to "inbound"
  2. Tx count not increasing due to filter set to "inbound". Added an outbound socket to count tx packets
  3. Added vlan member interface mapping for Ethernet interface to vlan interface lookup in reference to PR Fix multiple vlan issue sonic-dhcp-relay#27
  4. Exit when socket fails to initialize to allow dhcp_relay docker to restart

How to verify it

Tested on vstestbed single tor and dual tor, sent packets and verify printed out dhcpmon rx and tx counters is correct

  • Correct number of rx increases
  • Tx does not increase when ToR is on standby

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211

Description for the changelog

Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

  1. Improve code

How I did it

  1. On single ToR, packets received count was duplicated due to socket filter set to "inbound"
  2. Tx count not increasing due to filter set to "inbound". Added an outbound socket to count tx packets
  3. Added vlan member interface mapping for Ethernet interface to vlan interface lookup in reference to PR Fix multiple vlan issue sonic-dhcp-relay#27
  4. Exit when socket fails to initialize to allow dhcp_relay docker to restart

How to verify it

Tested on vstestbed single tor and dual tor, sent packets and verify printed out dhcpmon rx and tx counters is correct

  • Correct number of tx increases
  • Tx does not increase when ToR is on standby

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211

Description for the changelog

Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

while ((offset < (dhcp_option_sz + 1)) && dhcp_option[offset] != 255) {
switch (dhcp_option[offset])
{
case 53:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use enum for option53, like:
enum {
OptionDHCPMessageType = 53,
}
If we only care about option53, maybe we don;t need switch case here, just use if and remove temp variable stop_dhcp_processing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@jcaiMR
Copy link
Contributor

jcaiMR commented Dec 15, 2022

I think current diff still has duplicate input packets issue,
Each time we call dhcp_devman_add_intf( ) will have a rx socket monitor on all interfaces, kernel will duplicate packets to all matched sockets.

Normally, for socket listen on all interfaces, we just need one instance and give this socket a big enough rcvd buffer size.
For TX socket as it bind to particular interface, so each interface has a socket would be fine.

And if we use one socket for all input interface binding, better to use batch process for better performance, that is one read ready event will trigger a batch of recvfrom operations.

@kellyyeh
Copy link
Contributor Author

Each dhcpmon process will only monitor one vlan

command=/usr/sbin/dhcpmon -id {{ vlan_name }}

@kellyyeh kellyyeh marked this pull request as ready for review December 22, 2022 01:30
@kellyyeh kellyyeh requested a review from jcaiMR January 3, 2023 17:48
@@ -252,62 +308,28 @@ static void read_callback_dual_tor(int fd, short event, void *arg)
while ((event == EV_READ) &&
Copy link
Contributor

@jcaiMR jcaiMR Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

*
* @param fd socket to read from
* @param event libevent triggered event
* @param arg user provided argument for callback (interface context)
*
* @return none
*/
static void read_callback(int fd, short event, void *arg)
static void read_tx_callback(int fd, short event, void *arg)
{
dhcp_device_context_t *context = (dhcp_device_context_t*) arg;
ssize_t buffer_sz;

while ((event == EV_READ) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here "event == EV_READ" will always true as we register this handler for EV_READ, we don't need to check it again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event could be EV_PERIST or EV_READ

Copy link
Contributor

@jcaiMR jcaiMR Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be accurate my first comments should be event & EV_READ will always true. But in logic, EV_PERIST controls the event is persistent, it makes you don't need to use event_add() again after callback. So when callback is called it must be a READ event, as you don't register read_tx_callback to any WRITE/DELETE events.
So we don't need this check here, unless sometime you defined a unique function for all READ/WRITE/DELETE events, then in that function we may need this event value check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks I see. Removed event check

tx_addr.sll_ifindex = if_nametoindex(intf);
tx_addr.sll_family = AF_PACKET;
tx_addr.sll_protocol = htons(ETH_P_ALL);
if (bind(context->tx_sock, (struct sockaddr *) &tx_addr, sizeof(tx_addr))) {
syslog(LOG_ALERT, "bind: failed to bind to interface '%s' with '%s'\n", intf, strerror(errno));
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we failed to bind socket, we may just call exit() to let container restart.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

if (context->sock < 0) {
context->rx_sock = socket(AF_PACKET, SOCK_RAW | SOCK_NONBLOCK, htons(ETH_P_ALL));
context->tx_sock = socket(AF_PACKET, SOCK_RAW | SOCK_NONBLOCK, htons(ETH_P_ALL));
if (context->rx_sock < 0 || context->tx_sock < 0) {
syslog(LOG_ALERT, "socket: failed to open socket with '%s'\n", strerror(errno));
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if socket failed to create, we can just exit program.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -666,21 +700,27 @@ int dhcp_device_start_capture(dhcp_device_context_t *context,
}
context->snaplen = snaplen;

if (setsockopt(context->sock, SOL_SOCKET, SO_ATTACH_FILTER, &dhcp_sock_bfp, sizeof(dhcp_sock_bfp)) != 0) {
if (setsockopt(context->rx_sock, SOL_SOCKET, SO_ATTACH_FILTER, &dhcp_inbound_sock_bfp, sizeof(dhcp_inbound_sock_bfp)) != 0) {
syslog(LOG_ALERT, "setsockopt: failed to attach filter with '%s'\n", strerror(errno));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same like above, maybe we just exit program if any error happened in this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated for the function

@kellyyeh kellyyeh requested a review from jcaiMR January 5, 2023 02:58
@yxieca yxieca merged commit 2c410b4 into sonic-net:master Jan 6, 2023
@kellyyeh kellyyeh deleted the dhcpmon-fix branch January 6, 2023 22:43
@qiluo-msft
Copy link
Collaborator

What issue is fixed by this PR? could you explain in PR description?

@yxieca
Copy link
Contributor

yxieca commented Jan 12, 2023

@kellyyeh is there a particular reason you didn't request to backport to 202211 branch?

@mssonicbld
Copy link
Collaborator

@kellyyeh PR conflicts with 202205 branch

@yxieca
Copy link
Contributor

yxieca commented Jan 12, 2023

@kellyyeh please create PR for 202205 branch for identify missing dependencies

@kellyyeh
Copy link
Contributor Author

@yxieca PR #11201 is not cherry-picked to 202211 yet. Just added label for both PRs

@qiluo-msft
Copy link
Collaborator

This commit could not be cleanly cherry-picked to 202012. Please submit another PR.

@yxieca
Copy link
Contributor

yxieca commented Jan 19, 2023

@kellyyeh this change cannot be cherry-picked cleanly to 20205 branch. can you raise separate PR?

kellyyeh added a commit to kellyyeh/sonic-buildimage that referenced this pull request Jan 19, 2023
Why I did it
Fix issue caused by dualtor support PR [dhcpmon] Open different socket for dual tor to enable interface filtering sonic-net#11201
Improve code
How I did it
On single ToR, packets received count was duplicated due to socket filter set to "inbound"
Tx count not increasing due to filter set to "inbound". Added an outbound socket to count tx packets
Added vlan member interface mapping for Ethernet interface to vlan interface lookup in reference to PR Fix multiple vlan issue sonic-dhcp-relay#27
Exit when socket fails to initialize to allow dhcp_relay docker to restart
How to verify it
Tested on vstestbed single tor and dual tor, sent packets and verify printed out dhcpmon rx and tx counters is correct

Correct number of tx increases
Tx does not increase when ToR is on standby
kellyyeh added a commit to kellyyeh/sonic-buildimage that referenced this pull request Jan 19, 2023
Why I did it
Fix issue caused by dualtor support PR [dhcpmon] Open different socket for dual tor to enable interface filtering sonic-net#11201
Improve code
How I did it
On single ToR, packets received count was duplicated due to socket filter set to "inbound"
Tx count not increasing due to filter set to "inbound". Added an outbound socket to count tx packets
Added vlan member interface mapping for Ethernet interface to vlan interface lookup in reference to PR Fix multiple vlan issue sonic-dhcp-relay#27
Exit when socket fails to initialize to allow dhcp_relay docker to restart
How to verify it
Tested on vstestbed single tor and dual tor, sent packets and verify printed out dhcpmon rx and tx counters is correct

Correct number of tx increases
Tx does not increase when ToR is on standby
yxieca pushed a commit that referenced this pull request Jan 23, 2023
Why I did it
Fix issue caused by dualtor support PR [dhcpmon] Open different socket for dual tor to enable interface filtering #11201
Improve code
How I did it
On single ToR, packets received count was duplicated due to socket filter set to "inbound"
Tx count not increasing due to filter set to "inbound". Added an outbound socket to count tx packets
Added vlan member interface mapping for Ethernet interface to vlan interface lookup in reference to PR Fix multiple vlan issue sonic-dhcp-relay#27
Exit when socket fails to initialize to allow dhcp_relay docker to restart
How to verify it
Tested on vstestbed single tor and dual tor, sent packets and verify printed out dhcpmon rx and tx counters is correct

Correct number of tx increases
Tx does not increase when ToR is on standby
yxieca pushed a commit that referenced this pull request Jan 30, 2023
Why I did it
Fix issue caused by dualtor support PR [dhcpmon] Open different socket for dual tor to enable interface filtering #11201
Improve code
How I did it
On single ToR, packets received count was duplicated due to socket filter set to "inbound"
Tx count not increasing due to filter set to "inbound". Added an outbound socket to count tx packets
Added vlan member interface mapping for Ethernet interface to vlan interface lookup in reference to PR Fix multiple vlan issue sonic-dhcp-relay#27
Exit when socket fails to initialize to allow dhcp_relay docker to restart
How to verify it
Tested on vstestbed single tor and dual tor, sent packets and verify printed out dhcpmon rx and tx counters is correct

Correct number of tx increases
Tx does not increase when ToR is on standby
@liuh-80
Copy link
Contributor

liuh-80 commented Feb 2, 2023

202012 branch PR merged: #13441

mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request Feb 3, 2023
Why I did it
Fix issue caused by dualtor support PR [dhcpmon] Open different socket for dual tor to enable interface filtering sonic-net#11201
Improve code
How I did it
On single ToR, packets received count was duplicated due to socket filter set to "inbound"
Tx count not increasing due to filter set to "inbound". Added an outbound socket to count tx packets
Added vlan member interface mapping for Ethernet interface to vlan interface lookup in reference to PR Fix multiple vlan issue sonic-dhcp-relay#27
Exit when socket fails to initialize to allow dhcp_relay docker to restart
How to verify it
Tested on vstestbed single tor and dual tor, sent packets and verify printed out dhcpmon rx and tx counters is correct

Correct number of tx increases
Tx does not increase when ToR is on standby
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202211: #13638

mssonicbld pushed a commit that referenced this pull request Feb 4, 2023
Why I did it
Fix issue caused by dualtor support PR [dhcpmon] Open different socket for dual tor to enable interface filtering #11201
Improve code
How I did it
On single ToR, packets received count was duplicated due to socket filter set to "inbound"
Tx count not increasing due to filter set to "inbound". Added an outbound socket to count tx packets
Added vlan member interface mapping for Ethernet interface to vlan interface lookup in reference to PR Fix multiple vlan issue sonic-dhcp-relay#27
Exit when socket fails to initialize to allow dhcp_relay docker to restart
How to verify it
Tested on vstestbed single tor and dual tor, sent packets and verify printed out dhcpmon rx and tx counters is correct

Correct number of tx increases
Tx does not increase when ToR is on standby
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants