net/ipv6: release expired exception dst cached in socket
Dst objects get leaked in ip6_negative_advice() when this function is
executed for an expired IPv6 route located in the exception table. There
are several conditions that must be fulfilled for the leak to occur:
* an ICMPv6 packet indicating a change of the MTU for the path is received,
  resulting in an exception dst being created
* a TCP connection that uses the exception dst for routing packets must
  start timing out so that TCP begins retransmissions
* after the exception dst expires, the FIB6 garbage collector must not run
  before TCP executes ip6_negative_advice() for the expired exception dst

When TCP executes ip6_negative_advice() for an expired exception dst and no
other socket holds a reference to it, the refcount of the exception dst is
2: one increment made by dst_init() and one made by the TCP socket whose
connection is timing out. The reference taken by the socket is never
released. sk_dst_reset() does decrement the refcount, but that decrement is
counteracted by a dst_hold() intentionally placed just before the
sk_dst_reset() in ip6_negative_advice(). After ip6_negative_advice() has
finished, no other object is tied to the dst: the socket has lost the
reference stored in sk_dst_cache and the dst is no longer in the exception
table. The exception dst becomes a leaked object.
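
The refcount arithmetic above can be followed in a small userspace model.
This is only a sketch, not kernel code: every toy_* name is hypothetical,
and it assumes that removing the entry from the exception table drops the
reference taken at init time, which the message above implies but does not
spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct dst_entry: only the refcount and table
 * membership are modeled. */
struct toy_dst {
	int refcnt;
	bool in_exception_table;
};

/* Toy stand-in for struct sock: only sk_dst_cache is modeled. */
struct toy_sock {
	struct toy_dst *dst_cache;
};

static void toy_dst_hold(struct toy_dst *dst)
{
	dst->refcnt++;
}

static void toy_dst_release(struct toy_dst *dst)
{
	assert(dst->refcnt > 0);
	dst->refcnt--;
}

/* Models sk_dst_reset(): clear the cached pointer, drop its reference. */
static void toy_sk_dst_reset(struct toy_sock *sk)
{
	struct toy_dst *old = sk->dst_cache;

	sk->dst_cache = NULL;
	if (old)
		toy_dst_release(old);
}

/* Assumption: removal from the exception table drops the init reference. */
static void toy_remove_exception(struct toy_dst *dst)
{
	if (dst->in_exception_table) {
		dst->in_exception_table = false;
		toy_dst_release(dst);
	}
}

/* Returns the refcount left once the negative-advice path has finished.
 * with_extra_hold selects the pre-patch behaviour. */
static int toy_negative_advice(bool with_extra_hold)
{
	struct toy_dst dst = { .refcnt = 1, .in_exception_table = true };
	struct toy_sock sk = { .dst_cache = &dst };

	toy_dst_hold(&dst);		/* the socket's reference: refcnt == 2 */

	if (with_extra_hold)
		toy_dst_hold(&dst);	/* the dst_hold() removed by this patch */
	toy_sk_dst_reset(&sk);
	toy_remove_exception(&dst);

	return dst.refcnt;
}
```

In this model toy_negative_advice(true) leaves one reference that nothing
owns any more, while toy_negative_advice(false) reaches zero, the point at
which a real dst could be freed.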

As a result of this dst leak, an unbalanced refcount is reported for the
loopback device of a net namespace being destroyed under kernels that do
not contain e5f80fc ("ipv6: give an IPv6 dev to blackhole_netdev"):
  unregister_netdevice: waiting for lo to become free. Usage count = 2

Fix the dst leak by removing the dst_hold() in ip6_negative_advice(). The
patch that introduced the dst_hold() in ip6_negative_advice() was
92f1655 ("net: fix __dst_negative_advice() race"). But 92f1655
merely refactored the code with regard to the dst refcount, so the issue
was present even before 92f1655. The bug was introduced in
54c1a85 ("ipv6: Don't drop cache route entry unless timer actually
expired."), where the expired cached route is deleted and the sk_dst_cache
member of the socket is set to NULL by calling dst_negative_advice(), but
the refcount belonging to the socket is left unbalanced.

The IPv4 version - ipv4_negative_advice() - is not affected by this bug.
When the TCP connection times out, ipv4_negative_advice() merely resets the
sk_dst_cache of the socket while decrementing the refcount of the
exception dst.
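
The balanced IPv4 behaviour comes down to pairing the clearing of the
cached pointer with exactly one release. A minimal userspace sketch of
that exchange-then-release pattern follows, using C11 atomics in place of
the kernel primitives. The demo4_* names are hypothetical, and the
starting refcount of 2 (one reference for the socket, one assumed to be
held elsewhere) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo4_dst {
	atomic_int refcnt;
};

struct demo4_sock {
	_Atomic(struct demo4_dst *) dst_cache;	/* models sk->sk_dst_cache */
};

static void demo4_dst_release(struct demo4_dst *dst)
{
	int old = atomic_fetch_sub(&dst->refcnt, 1);

	assert(old > 0);
}

/* Swap the cached pointer for NULL, then release whatever was there.
 * Only the caller that wins the exchange sees the old pointer, so the
 * socket's reference is dropped exactly once. */
static void demo4_sk_dst_reset(struct demo4_sock *sk)
{
	struct demo4_dst *old =
		atomic_exchange(&sk->dst_cache, (struct demo4_dst *)NULL);

	if (old)
		demo4_dst_release(old);
}

/* Returns the refcount left after the socket's cache has been reset. */
static int demo4_negative_advice(void)
{
	struct demo4_dst dst;
	struct demo4_sock sk;

	atomic_init(&dst.refcnt, 2);	/* assumed: socket + one other holder */
	atomic_init(&sk.dst_cache, &dst);

	demo4_sk_dst_reset(&sk);
	demo4_sk_dst_reset(&sk);	/* cache already empty: no second release */

	return atomic_load(&dst.refcnt);
}
```

Here the count only ever falls by the socket's own reference, which is why
no compensating hold is needed on this path.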

Fixes: 92f1655 ("net: fix __dst_negative_advice() race")
Fixes: 54c1a85 ("ipv6: Don't drop cache route entry unless timer actually expired.")
Link: https://lore.kernel.org/netdev/20241113105611.GA6723@incl/T/#u
Signed-off-by: Jiri Wiesner <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/20241128085950.GA4505@incl
Signed-off-by: Jakub Kicinski <[email protected]>
Jiri Wiesner authored and kuba-moo committed Dec 3, 2024
1 parent ccb989e commit 3301ab7
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions net/ipv6/route.c
@@ -2780,10 +2780,10 @@ static void ip6_negative_advice(struct sock *sk,
 	if (rt->rt6i_flags & RTF_CACHE) {
 		rcu_read_lock();
 		if (rt6_check_expired(rt)) {
-			/* counteract the dst_release() in sk_dst_reset() */
-			dst_hold(dst);
+			/* rt/dst can not be destroyed yet,
+			 * because of rcu_read_lock()
+			 */
 			sk_dst_reset(sk);
-
 			rt6_remove_exception_rt(rt);
 		}
 		rcu_read_unlock();