Skip to content

Commit

Permalink
zebra: reuse old nhe when only recursive nexthops changed, method2
Browse files Browse the repository at this point in the history
A general flush is done on the nhg depend of the protocol nexthop group.
Actually, the NHG should not be removed, if there are routes attached to
it. In the same time, it seems the route count does not propagate to
the nhg_depends.

The con of this method is that there is still ASAN, and by comparing
the refcount value of the old way (allocation), the count is less
than expectd, for nexthop group with route count only:

Allocation method in proto_add():

> 2024/10/14 10:57:24.915401 ZEBRA: [VB8P9-5F2GE] zebra_nhg_proto_add: BEFORE NHE 71428576, (71428576[39/49/59]) cnt 2002
> 2024/10/14 10:57:24.915510 ZEBRA: [HCTBK-W37K2] zebra_nhg_proto_add: NHE 71428576, (71428576[49/59/65]) cnt 1
> 2024/10/14 10:57:24.915513 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 49, (49[50]) cnt 2012
> 2024/10/14 10:57:24.915515 ZEBRA: [VP9H1-EV2BN] 	(71428573)
> 2024/10/14 10:57:24.915515 ZEBRA: [VP9H1-EV2BN] 	(71428574)
> 2024/10/14 10:57:24.915516 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:57:24.915517 ZEBRA: [VP9H1-EV2BN] 	(71428578)
> 2024/10/14 10:57:24.915517 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 59, (59[60]) cnt 2007
> 2024/10/14 10:57:24.915519 ZEBRA: [VP9H1-EV2BN] 	(71428575)
> 2024/10/14 10:57:24.915519 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:57:24.915520 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 65, (65[42]) cnt 4
> 2024/10/14 10:57:24.915521 ZEBRA: [VP9H1-EV2BN] 	(71428571)
> 2024/10/14 10:57:24.915522 ZEBRA: [VP9H1-EV2BN] 	(71428576)

Method using general flush, but keep old pointer:

> 2024/10/14 10:51:17.229799 ZEBRA: [VB8P9-5F2GE] zebra_nhg_proto_add: BEFORE NHE 71428576, (71428576[39/49/59]) cnt 2002
> 2024/10/14 10:51:17.229909 ZEBRA: [HCTBK-W37K2] zebra_nhg_proto_add: NHE 71428576, (71428576[49/59/65]) cnt 2002
> 2024/10/14 10:51:17.229912 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 49, (49[50]) cnt 2011
> 2024/10/14 10:51:17.229914 ZEBRA: [VP9H1-EV2BN] 	(71428573)
> 2024/10/14 10:51:17.229915 ZEBRA: [VP9H1-EV2BN] 	(71428574)
> 2024/10/14 10:51:17.229915 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:51:17.229916 ZEBRA: [VP9H1-EV2BN] 	(71428578)
> 2024/10/14 10:51:17.229916 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 59, (59[60]) cnt 2006
> 2024/10/14 10:51:17.229918 ZEBRA: [VP9H1-EV2BN] 	(71428575)
> 2024/10/14 10:51:17.229918 ZEBRA: [VP9H1-EV2BN] 	(71428576)
> 2024/10/14 10:51:17.229919 ZEBRA: [RM3ZQ-V7JN5] zebra_nhg_proto_add:            NHE 65, (65[42]) cnt 4
> 2024/10/14 10:51:17.229920 ZEBRA: [VP9H1-EV2BN] 	(71428571)
> 2024/10/14 10:51:17.229921 ZEBRA: [VP9H1-EV2BN] 	(71428576)

Resulting ASAN error when running bgp_nhg_zapi_notification, on the
test_bgp_ipv4_simulate_r5_machine_going_down() function:

> r1: zebra triggered an exception by AddressSanitizer
> AddressSanitizer error in topotest `test_bgp_nhg_zapi_scalability.py`, test `teardown_module`, router `r1`
>
> ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0000de580 at pc 0x558a7d98cd8e bp 0x7fff4915a6e0 sp 0x7fff4915a6d0
> READ of size 4 at 0x60e0000de580 thread T0
>     #0 0x558a7d98cd8d in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1858
>     FRRouting#1 0x558a7d98cfee in zebra_nhg_free_members zebra/zebra_nhg.c:1752
>     FRRouting#2 0x558a7d98cfee in zebra_nhg_free zebra/zebra_nhg.c:1772
>     FRRouting#3 0x558a7d9901ff in zebra_nhg_proto_add zebra/zebra_nhg.c:3861
>     FRRouting#4 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     FRRouting#5 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     FRRouting#6 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     FRRouting#7 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     FRRouting#8 0x7fa262ef863b in event_call lib/event.c:1996
>     FRRouting#9 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     FRRouting#10 0x558a7d877c74 in main zebra/main.c:526
>     FRRouting#11 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     FRRouting#12 0x7fa262829e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     FRRouting#13 0x558a7d87ab84 in _start (/usr/lib/frr/zebra+0x1acb84)
>
> 0x60e0000de580 is located 96 bytes inside of 160-byte region [0x60e0000de520,0x60e0000de5c0)
> freed by thread T0 here:
>     #0 0x7fa2632b4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
>     FRRouting#1 0x558a7d9908a1 in zebra_nhg_proto_add zebra/zebra_nhg.c:3854
>     FRRouting#2 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     FRRouting#3 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     FRRouting#4 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     FRRouting#5 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     FRRouting#6 0x7fa262ef863b in event_call lib/event.c:1996
>     FRRouting#7 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     FRRouting#8 0x558a7d877c74 in main zebra/main.c:526
>     FRRouting#9 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> previously allocated by thread T0 here:
>     #0 0x7fa2632b4a57 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     FRRouting#1 0x7fa262e3e98e in qcalloc lib/memory.c:106
>     FRRouting#2 0x558a7d98849e in zebra_nhg_alloc zebra/zebra_nhg.c:392
>     FRRouting#3 0x558a7d98849e in zebra_nhe_copy zebra/zebra_nhg.c:499
>     FRRouting#4 0x558a7d98881f in zebra_nhg_hash_alloc zebra/zebra_nhg.c:538
>     FRRouting#5 0x7fa262dfbf0d in hash_get lib/hash.c:147
>     FRRouting#6 0x558a7d98b2ea in zebra_nhe_find zebra/zebra_nhg.c:832
>     FRRouting#7 0x558a7d98b95f in zebra_nhg_find zebra/zebra_nhg.c:1014
>     FRRouting#8 0x558a7d98bdcd in zebra_nhg_find_nexthop zebra/zebra_nhg.c:1031
>     FRRouting#9 0x558a7d98a5e8 in depends_find_recursive zebra/zebra_nhg.c:1514
>     FRRouting#10 0x558a7d98a5e8 in depends_find zebra/zebra_nhg.c:1563
>     FRRouting#11 0x558a7d98a5e8 in depends_find_add zebra/zebra_nhg.c:1602
>     FRRouting#12 0x558a7d990378 in zebra_nhg_update_nhe zebra/zebra_nhg.c:3739
>     FRRouting#13 0x558a7d990378 in zebra_nhg_proto_add zebra/zebra_nhg.c:3822
>     FRRouting#14 0x558a7d9ba365 in process_subq_nhg zebra/zebra_rib.c:2738
>     FRRouting#15 0x558a7d9ba365 in process_subq zebra/zebra_rib.c:3344
>     FRRouting#16 0x558a7d9ba365 in meta_queue_process zebra/zebra_rib.c:3397
>     FRRouting#17 0x7fa262f16fef in work_queue_run lib/workqueue.c:282
>     FRRouting#18 0x7fa262ef863b in event_call lib/event.c:1996
>     FRRouting#19 0x7fa262e1e527 in frr_run lib/libfrr.c:1237
>     FRRouting#20 0x558a7d877c74 in main zebra/main.c:526
>     FRRouting#21 0x7fa262829d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>
> SUMMARY: AddressSanitizer: heap-use-after-free zebra/zebra_nhg.c:1858 in zebra_nhg_decrement_ref
> Shadow bytes around the buggy address:
>   0x0c1c80013c60: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013c70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013c80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
>   0x0c1c80013c90: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
>   0x0c1c80013ca0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
> =>0x0c1c80013cb0:[fd]fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
>   0x0c1c80013cc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
>   0x0c1c80013cd0: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
>   0x0c1c80013ce0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
>   0x0c1c80013cf0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
>   0x0c1c80013d00: 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
>   Addressable:           00
>   Partially addressable: 01 02 03 04 05 06 07
>   Heap left redzone:       fa
>   Freed heap region:       fd
>   Stack left redzone:      f1
>   Stack mid redzone:       f2
>   Stack right redzone:     f3
>   Stack after return:      f5
>   Stack use after scope:   f8
>   Global redzone:          f9
>   Global init order:       f6
>   Poisoned by user:        f7
>   Container overflow:      fc
>   Array cookie:            ac
>   Intra object redzone:    bb
>   ASan internal:           fe
>   Left alloca redzone:     ca
>   Right alloca redzone:    cb
>   Shadow gap:              cc
>

Signed-off-by: Philippe Guibert <[email protected]>
  • Loading branch information
pguibert6WIND committed Oct 14, 2024
1 parent f666edb commit 15c501f
Showing 1 changed file with 14 additions and 36 deletions.
50 changes: 14 additions & 36 deletions zebra/zebra_nhg.c
Original file line number Diff line number Diff line change
Expand Up @@ -3716,47 +3716,25 @@ static void zebra_nhg_update_nhe(struct nhg_hash_entry *nhe, struct nexthop_grou
new_nhg->nexthop = NULL;
nexthops_free(temp);

/* flag all the nhg_depend NHGs to remember which one to remove
* when the dependency refresh is done
*/
frr_each (nhg_connected_tree, &nhe->nhg_depends, rb_node_dep)
SET_FLAG(rb_node_dep->nhe->flags, NEXTHOP_GROUP_DEPEND_TO_DETACH);
/* Free all the things */
zebra_nhg_release_all_deps(nhe);

if (!zebra_nhg_depends_is_empty(nhe)) {
frr_each (nhg_connected_tree, &nhe->nhg_depends,
rb_node_dep)
zebra_nhg_decrement_ref(
rb_node_dep->nhe);
nhg_connected_tree_free(&nhe->nhg_depends);
}

/* instead of creating a new nhe, just update the dependencies
* from the new nexthops
*/
for (newhop = nhe->nhg.nexthop; newhop; newhop = newhop->next) {
if (IS_ZEBRA_DEBUG_NHG_DETAIL)
zlog_debug("%s: depends NH %pNHv %s", __func__, newhop,
CHECK_FLAG(newhop->flags, NEXTHOP_FLAG_RECURSIVE) ? "(R)" : "");

nhe_temp = depends_find_add(&nhe->nhg_depends, newhop, afi, nhe->type, false);
if (nhe_temp) {
if (CHECK_FLAG(nhe_temp->flags, NEXTHOP_GROUP_DEPEND_TO_DETACH))
UNSET_FLAG(nhe_temp->flags, NEXTHOP_GROUP_DEPEND_TO_DETACH);
else
/* a new dependency is found, which was not already present.
* update the refcnt
*/
zebra_nhg_increment_ref(nhe_temp);
}
}

frr_each_safe (nhg_connected_tree, &nhe->nhg_depends, rb_node_dep) {
if (CHECK_FLAG(rb_node_dep->nhe->flags, NEXTHOP_GROUP_DEPEND_TO_DETACH)) {
/* detach the useless dependency from the NHG
*/
UNSET_FLAG(rb_node_dep->nhe->flags, NEXTHOP_GROUP_DEPEND_TO_DETACH);
if (IS_ZEBRA_DEBUG_NHG_DETAIL)
zlog_debug("%s: NHE %u, removed dependency %u (%pNG)", __func__,
nhe->id, rb_node_dep->nhe->id, rb_node_dep->nhe);
nhe_temp = nhg_connected_tree_del_nhe(&nhe->nhg_depends, rb_node_dep->nhe);
if (nhe_temp) {
/* detach nhg_dependent */
nhg_connected_tree_del_nhe(&nhe_temp->nhg_dependents, nhe);
zebra_nhg_decrement_ref(nhe_temp);
}
}
nhe_temp = depends_find_add(&nhe->nhg_depends, newhop, afi,
nhe->type, false);
if (nhe_temp)
zebra_nhg_increment_ref(nhe_temp);
}

/* Attach dependent backpointers to singletons
Expand Down

0 comments on commit 15c501f

Please sign in to comment.