RIFT WG                                                 Yuehua. Wei, Ed.
Internet-Draft                                              Zheng. Zhang
Intended status: Informational                           ZTE Corporation
Expires: 11 November 2021                              Dmitry. Afanasiev
                                                                  Yandex
                                                              P. Thubert
                                                           Cisco Systems
                                                            Tom. Verhaeg
                                                        Juniper Networks
                                                     Jaroslaw. Kowalczyk
                                                           Orange Polska
                                                             10 May 2021
RIFT Applicability
draft-ietf-rift-applicability-06
Abstract
This document discusses the properties, applicability and operational
considerations of RIFT in different network scenarios. It is intended
as a rough guide to how RIFT can be deployed to simplify routing
operations in Clos topologies and their variations.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 11 November 2021.
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved.
Wei, et al. Expires 11 November 2021 [Page 1]
Internet-Draft RIFT Applicability Statement May 2021
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Simplified BSD License text
as described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Problem Statement of Routing in Modern IP Fabric Fat Tree
Networks . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Applicability of RIFT to Clos IP Fabrics . . . . . . . . . . 4
3.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 4
3.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 6
3.2.1. Horizontal Links . . . . . . . . . . . . . . . . . . 7
3.2.2. Vertical Shortcuts . . . . . . . . . . . . . . . . . 7
3.2.3. Generalizing to any Directed Acyclic Graph . . . . . 7
3.2.4. Reachability of Internal Nodes in the Fabric . . . . 9
3.3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3.1. Data Center Topologies . . . . . . . . . . . . . . . 9
3.3.2. Metro Fabrics . . . . . . . . . . . . . . . . . . . . 11
3.3.3. Building Cabling . . . . . . . . . . . . . . . . . . 11
3.3.4. Internal Router Switching Fabrics . . . . . . . . . . 11
3.3.5. CloudCO . . . . . . . . . . . . . . . . . . . . . . . 11
4. Operational Considerations . . . . . . . . . . . . . . . . . 13
4.1. South Reflection . . . . . . . . . . . . . . . . . . . . 14
4.2. Suboptimal Routing on Link Failures . . . . . . . . . . . 14
4.3. Black-Holing on Link Failures . . . . . . . . . . . . . . 16
4.4. Zero Touch Provisioning (ZTP) . . . . . . . . . . . . . . 17
4.5. Mis-cabling Examples . . . . . . . . . . . . . . . . . . 18
4.6. Positive vs. Negative Disaggregation . . . . . . . . . . 20
4.7. Mobile Edge and Anycast . . . . . . . . . . . . . . . . . 22
4.8. IPv4 over IPv6 . . . . . . . . . . . . . . . . . . . . . 23
4.9. In-Band Reachability of Nodes . . . . . . . . . . . . . . 24
4.10. Dual Homing Servers . . . . . . . . . . . . . . . . . . . 26
4.11. Fabric With A Controller . . . . . . . . . . . . . . . . 27
4.11.1. Controller Attached to ToFs . . . . . . . . . . . . 27
4.11.2. Controller Attached to Leaf . . . . . . . . . . . . 27
4.12. Internet Connectivity With Underlay . . . . . . . . . . . 28
4.12.1. Internet Default on the Leaf . . . . . . . . . . . . 28
4.12.2. Internet Default on the ToFs . . . . . . . . . . . . 28
4.13. Subnet Mismatch and Address Families . . . . . . . . . . 28
4.14. Anycast Considerations . . . . . . . . . . . . . . . . . 29
4.15. IoT Applicability . . . . . . . . . . . . . . . . . . . . 30
4.16. Key Management . . . . . . . . . . . . . . . . . . . . . 30
5. Security Considerations . . . . . . . . . . . . . . . . . . . 31
6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 31
7. Normative References . . . . . . . . . . . . . . . . . . . . 31
8. Informative References . . . . . . . . . . . . . . . . . . . 33
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33
1. Introduction
This document discusses the properties and applicability of "Routing
in Fat Trees" [RIFT] (RIFT) in different deployment scenarios and
highlights the operational simplicity of the technology compared to
traditional routing solutions. It also documents special
considerations when RIFT is used with or without overlays and/or
controllers, and how RIFT identifies topology mis-cablings and
reroutes around node and link failures.
2. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks
Clos [CLOS] and fat tree [FATTREE] topologies have gained prominence
in today's networking, primarily as a result of the paradigm shift
towards centralized data-center based architectures that deliver a
majority of computation and storage services.
Current routing protocols were geared towards networks with an
irregular topology, isotropic properties, and a low degree of
connectivity. When applied to Fat Tree topologies:
* They tend to need extensive configuration or provisioning during
bring up and re-dimensioning.
* All nodes, including spine and leaf nodes, learn the entire network
topology and routing information, which is, in fact, not needed on
the leaf nodes during normal operation.
* They flood significant amounts of duplicate link state information
between spine and leaf nodes during topology updates and
convergence events, requiring that additional CPU and link
bandwidth be consumed. This may impact the stability and
scalability of the fabric, make the fabric less reactive to
failures, and prevent the use of cheaper hardware at the lower
levels (i.e. spine and leaf nodes).
3. Applicability of RIFT to Clos IP Fabrics
The rest of this document assumes that the reader is familiar with
the terms and concepts used in the OSPF [RFC2328] and IS-IS
[ISO10589-Second-Edition] link-state protocols. RIFT [RIFT] outlines
the requirements of routing in IP fabrics and the RIFT protocol
concepts.
3.1. Overview of RIFT
RIFT is a dynamic routing protocol that is tailored for use in Clos,
Fat-Tree, and other anisotropic topologies. A core property of RIFT
is that its operation is sensitive to the structure of the fabric;
that is, it is anisotropic. When "pointing north", RIFT acts as a
link-state protocol: it advertises southwards routes to northwards
peer routers (parents) through flooding and database synchronization.
When "pointing south", it operates hop-by-hop like a distance-vector
protocol: it typically advertises a fabric default route, directed
towards the Top of Fabric (ToF, aka superspine), to southwards peer
routers (children).
The fabric default is typically the default route, as described in
Section 3.2.3.8 "Southbound Default Route Origination" of RIFT
[RIFT]. The ToF nodes may alternatively originate more specific
prefixes (P') southbound instead of the default route. In such a
scenario, all addresses carried within the RIFT domain MUST be
contained within P', and it is possible for a leaf that acts as a
gateway to the Internet to advertise the default route instead.
RIFT floods flat link-state information northbound only, so that each
level obtains the full topology of the levels south of it. That
information is never flooded east-west or back south again. As a
result, a top-tier node obtains the full set of prefixes from its
Shortest Path First (SPF) calculation.
In the southbound direction, the protocol operates like a "fully
summarizing, unidirectional" path-vector protocol or, rather, a
distance-vector protocol with implicit split horizon. Routing
information, normally just the default route, propagates one hop
south and is 're-advertised' by nodes at the next lower level.
+-----------+ +-----------+
| ToF | | ToF | LEVEL 2
+ +-----+--+--+ +-+--+------+
| | | | | | | | | ^
+ | | | +-------------------------+ |
Distance | +-------------------+ | | | | |
Vector | | | | | | | | +
South | | | | +--------+ | | | Link-State
+ | | | | | | | | Flooding
| | | +-------------+ | | | North
v | | | | | | | | +
+-+--+-+ +------+ +-------+ +--+--+-+ |
|SPINE | |SPINE | | SPINE | | SPINE | | LEVEL 1
+ ++----++ ++---+-+ +--+--+-+ ++----+-+ |
+ | | | | | | | | | ^ N
Distance | +-------+ | | +--------+ | | | E
Vector | | | | | | | | | +------>
South | +-------+ | | | +-------+ | | | |
+ | | | | | | | | | +
v ++--++ +-+-++ ++-+-+ +-+--++ +
|LEAF| |LEAF| |LEAF| |LEAF | LEVEL 0
+----+ +----+ +----+ +-----+
Figure 1: RIFT overview
A spine node holds only the information necessary for its level: all
destinations south of the node (from its SPF calculation), the
default route, and any disaggregated routes.
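The division of knowledge between levels can be illustrated with a
toy model. The following Python sketch is not taken from the RIFT
specification: the node names, the `north_of` adjacency map, and the
prefixes are hypothetical, and flooding is reduced to a simple graph
walk. It only shows that, when prefixes are flooded strictly
northbound, each node learns the prefixes of the nodes south of it
and nothing else.

```python
from collections import defaultdict

# Hypothetical 3-level fabric: node -> list of northbound neighbors (parents).
north_of = {
    "leaf1": ["spine1", "spine2"],
    "leaf2": ["spine1", "spine2"],
    "leaf3": ["spine3", "spine4"],
    "leaf4": ["spine3", "spine4"],
    "spine1": ["tof1", "tof2"],
    "spine2": ["tof1", "tof2"],
    "spine3": ["tof1", "tof2"],
    "spine4": ["tof1", "tof2"],
}

# Prefixes originated by the leaves (illustrative values only).
prefixes = {
    "leaf1": "10.0.1.0/24",
    "leaf2": "10.0.2.0/24",
    "leaf3": "10.0.3.0/24",
    "leaf4": "10.0.4.0/24",
}


def flood_north(north_of, prefixes):
    """Return node -> set of prefixes learned via northbound-only flooding."""
    learned = defaultdict(set)
    for origin, prefix in prefixes.items():
        learned[origin].add(prefix)
        frontier = [origin]
        seen = {origin}
        while frontier:
            node = frontier.pop()
            # Information only ever travels to northbound neighbors.
            for parent in north_of.get(node, []):
                learned[parent].add(prefix)
                if parent not in seen:
                    seen.add(parent)
                    frontier.append(parent)
    return dict(learned)


learned = flood_north(north_of, prefixes)
for node in ("leaf1", "spine1", "spine3", "tof1"):
    print(node, sorted(learned[node]))
```

In this model a leaf knows only its own prefix, a spine knows the
prefixes of the leaves below it, and only the ToF nodes know all
four prefixes.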
RIFT combines the advantages of both link-state and distance-vector:
* Fastest possible convergence
* Automatic detection of topology
* Minimal routes/info on Top-of-Rack (ToR) switches, aka leaf nodes
* High degree of ECMP
* Fast de-commissioning of nodes
* Maximum propagation speed with flexible prefixes in an update
Consequently, there are two types of link-state database: the "north
representation", made of North Topology Information Elements
(N-TIEs), and the "south representation", made of South Topology
Information Elements (S-TIEs). The N-TIEs contain a link-state
topology description of lower levels, and the S-TIEs simply carry
default routes for the lower levels.
RIFT also eliminates major disadvantages of link-state and distance-
vector with:
* Reduced and balanced flooding
* Automatic neighbor detection
To achieve this, RIFT builds on the art of IGPs, not only OSPF and
IS-IS but also MANET and IoT, to provide unique features:
* Automatic (positive or negative) route disaggregation of
northwards routes upon fallen leaves
* Recursive operation in the case of negative route disaggregation
* Anisotropic routing that extends a principle seen in RPL [RFC6550]
to wide superspines
* Optimal Flooding Reduction that derives from the concept of a
"multipoint relay" (MPR) found in OLSR [RFC3626] and balances the
flooding load over northbound links and nodes.
Additional advantages that are unique to RIFT are listed below, the
details of which can be found in RIFT [RIFT].
* True ZTP
* Minimal blast radius on failures
* Can utilize all paths through the fabric without looping
* Simple leaf implementation that can scale down to servers
* Key-Value store
* Horizontal links used for protection only
* Supports non-equal cost multipath (NECMP) and can replace multi-
chassis link aggregation group (MLAG or MC-LAG)
3.2. Applicable Topologies
Although RIFT is specified primarily for "proper" Clos or Fat Tree
topologies, the protocol natively supports Points of Delivery (PoD)
concepts, which, strictly speaking, are not found in the original
Clos concept.
Further, the specification explains and supports the operation of
multi-plane Clos variants, where the protocol recommends the use of
inter-plane rings at the Top-of-Fabric level. These rings allow the
topology views of the different planes to be reconciled, which makes
negative disaggregation viable in case of failures within a plane.
These observations hold not only for RIFT but also in the generic
case of dynamic routing on Clos variants with multiple planes and
failures in bisectional bandwidth, especially on the leaves.
3.2.1. Horizontal Links
RIFT is not limited to pure Clos divided into PoDs and multiple
planes but also supports horizontal (East-West) links below the
Top-of-Fabric level. Those links are used only for last-resort
northbound routes when a spine loses all its northbound links or
cannot compute a default route through them.
A possible configuration is a "ring" of horizontal links at a level.
In the presence of such a "ring" at any level except the Top of
Fabric (ToF) level, neither North SPF (N-SPF) nor South SPF (S-SPF)
will provide a "ring-based protection" scheme, since such a
computation would necessarily have to deal with breaking "loops" in
the Dijkstra sense, an application for which RIFT is not intended.
Full-mesh connectivity between nodes on the same level can also be
employed. It allows N-SPF to let any node that loses all its
northbound adjacencies still participate in northbound forwarding,
as long as any of the other nodes in the level is northbound
connected.
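The last-resort role of horizontal links can be sketched as follows.
This is a simplified illustration and not the N-SPF computation
defined in RIFT [RIFT]; the function name, the adjacency maps, and
the node names are hypothetical. Northbound adjacencies are always
preferred, and east-west neighbors are considered only when the node
has no northbound adjacency left, and only if those neighbors are
themselves still northbound connected.

```python
def northbound_next_hops(node, north_adj, east_west_adj):
    """Pick next hops for the northbound default route of `node`.

    Northbound adjacencies are always preferred.  East-west neighbors
    are used only as a last resort, and only those that still have at
    least one northbound adjacency of their own.
    """
    north = north_adj.get(node, set())
    if north:
        return sorted(north)
    return sorted(
        peer
        for peer in east_west_adj.get(node, set())
        if north_adj.get(peer)
    )


# Hypothetical level: three spines in a full mesh below two ToF nodes.
north_adj = {
    "spine1": set(),              # lost all northbound links
    "spine2": {"tof1", "tof2"},   # still northbound connected
    "spine3": set(),              # also isolated from the north
}
east_west_adj = {
    "spine1": {"spine2", "spine3"},
    "spine2": {"spine1", "spine3"},
    "spine3": {"spine1", "spine2"},
}

print(northbound_next_hops("spine1", north_adj, east_west_adj))
print(northbound_next_hops("spine2", north_adj, east_west_adj))
```

Here spine1 falls back to spine2 only, because spine3 has no
northbound adjacency either, while spine2 keeps using its ToF
neighbors and never uses the horizontal links.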
3.2.2. Vertical Shortcuts
Through relaxations of the specified adjacency forming rules, RIFT
implementations can be extended to support vertical "shortcuts" as
proposed by e.g. [I-D.white-distoptflood]. The RIFT specification
itself does not provide the exact details, since the resulting
solution suffers from either a much larger blast radius with
increased flooding volumes or, in the case of maximum aggregation
routing, bow-tie problems.
3.2.3. Generalizing to any Directed Acyclic Graph
RIFT is an anisotropic routing protocol, meaning that it has a sense
of direction (northbound, southbound, east-west) and that it operates
differently depending on the direction.
* Northbound, RIFT operates as a link-state protocol, whereby the
control packets are reflooded first all the way north and only
interpreted later. All the individual fine grained routes are
advertised.
* Southbound, RIFT operates as a distance-vector protocol, whereby
the control packets are flooded only one-hop, interpreted, and the
consequence of that computation is what gets flooded one more hop
south. In the most common use-cases, a ToF node can reach most of
the prefixes in the fabric. If that is the case, the ToF node
advertises the fabric default and disaggregates the prefixes that
it cannot reach. On the other hand, a ToF node that can reach
only a small subset of the prefixes in the fabric will preferably
advertise those prefixes and refrain from aggregating.
In the general case, what gets advertised south is, in more detail:
1. A fabric default that aggregates all the prefixes that are
reachable within the fabric, and that could be a default route
or a prefix that is dedicated to this particular fabric.
2. The loopback addresses of the northbound nodes, e.g., for
inband management.
3. The disaggregated prefixes for the dynamic exceptions to the
fabric default, advertised to route around the black hole that
may form.
* East-West routing can optionally be used, with specific
restrictions. It is used when a sibling has access to the fabric
default but this node does not.
A Directed Acyclic Graph (DAG) provides a sense of north (the
direction of the DAG) and of south (the reverse), which can be used
to apply RIFT. For the purpose of RIFT, a vertex in the DAG that has
only incoming edges is a ToF node.
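As an illustration of this definition, the sketch below takes a
graph whose edges are oriented northwards, checks that it is acyclic,
and returns the vertices that have only incoming edges, i.e., the
ToF nodes. The function names, the edge-list representation, and the
node names are assumptions made for this example; it uses the
`graphlib` module of the Python standard library (Python 3.9 or
later) and is not a ZTP or level-derivation algorithm.

```python
from graphlib import CycleError, TopologicalSorter


def tof_nodes(edges):
    """Return the vertices with only incoming edges (no northbound edge)."""
    vertices = {vertex for edge in edges for vertex in edge}
    has_outgoing = {south for south, _north in edges}
    return sorted(vertices - has_outgoing)


def is_dag(edges):
    """Return True if the northbound-oriented graph contains no cycle."""
    predecessors = {}
    for south, north in edges:
        predecessors.setdefault(north, set()).add(south)
        predecessors.setdefault(south, set())
    try:
        # static_order() raises CycleError when the graph has a cycle.
        list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True


# Hypothetical fabric; each edge is (southern vertex, northern vertex).
edges = [
    ("L0", "A0"), ("L0", "A1"),
    ("L1", "A0"), ("L1", "A1"),
    ("A0", "S0"), ("A0", "S1"),
    ("A1", "S0"), ("A1", "S1"),
]

print(is_dag(edges))     # the orientation is consistent
print(tof_nodes(edges))  # vertices with no northbound edge
```

Adding an edge such as ("S0", "L0") would create a cycle, in which
case the graph no longer provides a usable sense of north and south.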
There are a number of caveats though:
* The DAG structure must exist before RIFT starts, so there is a
need for a companion protocol to establish the logical DAG
structure.
* A generic DAG does not have a sense of east and west. The
operation specified for east-west links and the southbound
reflection between nodes are not applicable. Also ZTP will derive
a sense of depth that will eliminate some links. Variations of
ZTP could be derived to meet specific objectives, e.g., make it so
that most routers have at least 2 parents to reach the ToF.
* RIFT applies to any Destination-Oriented DAG (DODAG) where there's
only one ToF node and the problem of disaggregation does not
exist. In that case, RIFT operates very much like RPL [RFC6550],
but using Link State for southbound routes (downwards in RPL's
terms). For an arbitrary DAG with multiple destinations (ToFs)
the way disaggregation happens has to be considered.
* Positive disaggregation expects that most of the ToF nodes reach
most of the leaves, so disaggregation is the exception as opposed
to the rule. When this is no longer true, it makes sense to turn
off disaggregation and route between the ToF nodes over a ring, a
full mesh, a transit network, or a form of area zero. There again,
this operation is similar to RPL operating as a single DODAG with
a virtual root.
* In order to aggregate and disaggregate routes, RIFT requires that
all the ToF nodes share the full knowledge of the prefixes in the
fabric.
* This can be achieved with a ring as suggested by the RIFT main
specification, by some preconfiguration, or using a
synchronization with a common repository where all the active
prefixes are registered.
3.2.4. Reachability of Internal Nodes in the Fabric
RIFT does not require that nodes have reachable addresses in the
fabric, though it is clearly desirable for operational purposes.
Under normal operating conditions this can be easily achieved by
injecting the node's loopback address into North and South Prefix
TIEs or other implementation specific mechanisms.
Special considerations arise when a node loses all northbound
adjacencies, but is not at the top of the fabric. These are outside
the scope of this document and could be discussed in a separate
document.
3.3. Use Cases
3.3.1. Data Center Topologies
3.3.1.1. Data Center Fabrics
RIFT is suited for underlay routing in data center (DC) IP fabrics,
the vast majority of which seem to be currently (and for the
foreseeable future) Clos architectures. As described in Section 4,
it significantly simplifies the operation and deployment of such
fabrics compared to extensive proprietary provisioning and
operational solutions.
3.3.1.2. Adaptations to Other Proposed Data Center Topologies
. +-----+ +-----+
. | | | |
.+-+ S0 | | S1 |
.| ++---++ ++---++
.| | | | |
.| | +------------+ |
.| | | +------------+ |
.| | | | |
.| ++-+--+ +--+-++
.| | | | |
.| | A0 | | A1 |
.| +-+--++ ++---++
.| | | | |
.| | +------------+ |
.| | +-----------+ | |
.| | | | |
.| +-+-+-+ +--+-++
.+-+ | | |
. | L0 | | L1 |
. +-----+ +-----+
Figure 2: Level Shortcut
RIFT is not strictly limited to Clos topologies. The protocol only
requires a sense of "compass rose directionality", achieved either
through configuration or through derivation of levels. So,
conceptually, shortcuts between levels could be included. Figure 2
depicts an example of a shortcut between levels. In this example,
sub-optimal routing will occur when traffic is sent from L0 to L1 via
S0's default route and back down through A0 or A1. In order to
ensure that only default routes from A0 or A1 are used, all leaves
would be required to install each other's routes.
While various technical and operational challenges may require the
use of such modifications, discussion of those topics is outside the
scope of this document.
3.3.2. Metro Fabrics
The demand for bandwidth is increasing steadily, driven primarily by
environments close to content producers (server farms connected via
DC fabrics) but in proximity to content consumers as well. Consumers
are often clustered in metro areas with their own network
architectures that can benefit from simplified, regular Clos
structures and hence from RIFT.
3.3.3. Building Cabling
Commercial edifices are often cabled in topologies that are either
Clos or its isomorphic equivalents. The Clos can grow rather high
with many floors. That presents a challenge for traditional routing
protocols (except BGP and the by now largely phased-out PNNI), which
do not support an arbitrary number of levels, whereas RIFT does so
naturally. Moreover, due to the limited sizes of forwarding tables
in the network elements of building cabling, the minimal FIB size
that RIFT maintains under normal conditions is cost-effective in
terms of hardware and operational costs.
3.3.4. Internal Router Switching Fabrics
It is common in high-speed communications switching and routing
devices to use fabrics when a crossbar is not feasible due to cost,
head-of-line blocking or size trade-offs. Normally such fabrics are
not self-healing or rely on 1:1 or 1+1 protection schemes, but it is
conceivable to use RIFT to operate Clos fabrics that can deal
effectively with interconnection or subsystem failures in such
modules. RIFT is not IP specific either, and hence any link
addressing connecting internal device subnets is conceivable.
3.3.5. CloudCO
The Cloud Central Office (CloudCO) is a new stage of the telecom
Central Office. It takes advantage of Software Defined Networking
(SDN) and Network Function Virtualization (NFV) in conjunction with
general purpose hardware to optimize current networks. The following
figure illustrates this architecture at a high level. It describes a
single instance or macro-node of CloudCO that provides a number of
Value Added Services (VAS), a Broadband Access Abstraction (BAA), and
virtualized network services. An Access I/O module faces a CloudCO
access node and the Customer Premises Equipment (CPE) behind it.
A Network I/O module faces the core network. The two I/O modules
are interconnected by a leaf and spine fabric [TR-384].
+---------------------+ +----------------------+
| Spine | | Spine |
| Switch | | Switch |
+------+---+------+-+-+ +--+-+-+-+-----+-------+
| | | | | | | | | | | |
| | | | | +-------------------------------+ |
| | | | | | | | | | | |
| | | | +-------------------------+ | | |
| | | | | | | | | | | |
| | +----------------------+ | | | | | | | |
| | | | | | | | | | | |
| +---------------------------------+ | | | | | | |
| | | | | | | | | | | |
| | | +-----------------------------+ | | | | |
| | | | | | | | | | | |
| | | | | +--------------------+ | | | |
| | | | | | | | | | | |
+--+ +-+---+--+ +-+---+--+ +--+----+--+ +-+--+--+ +--+
|L | | Leaf | | Leaf | | Leaf | | Leaf | |L |
|S | | Switch | | Switch | | Switch | | Switch| |S |
++-+ +-+-+-+--+ +-+-+-+--+ +--+-+--+--+ ++-+--+-+ +-++
| | | | | | | | | | | | | |
| +-+-+-+--+ +-+-+-+--+ +--+-+--+--+ ++-+--+-+ |
| |Compute | |Compute | | Compute | |Compute| |
| |Node | |Node | | Node | |Node | |
| +--------+ +--------+ +----------+ +-------+ |
| || VAS5 || || vDHCP|| || vRouter|| ||VAS1 || |
| |--------| |--------| |----------| |-------| |
| |--------| |--------| |----------| |-------| |
| || VAS6 || || VAS3 || || v802.1x|| ||VAS2 || |
| |--------| |--------| |----------| |-------| |
| |--------| |--------| |----------| |-------| |
| || VAS7 || || VAS4 || || vIGMP || ||BAA || |
| |--------| |--------| |----------| |-------| |
| +--------+ +--------+ +----------+ +-------+ |
| |
++-----------+ +---------++
|Network I/O | |Access I/O|
+------------+ +----------+
Figure 3: An example of CloudCO architecture
The Spine-Leaf architecture deployed inside CloudCO meets the network
requirements of being adaptable, agile, scalable, and dynamic.
4. Operational Considerations
RIFT presents the opportunity for organizations building and
operating IP fabrics to simplify their operation and deployments
while achieving many desirable properties of dynamic routing on
such a substrate:
* RIFT only floods routing information to the devices that
absolutely need it. The RIFT design follows a philosophy of minimum
blast radius and minimum necessary epistemological scope, which
leads to good scaling properties while delivering maximum
reactiveness.
* RIFT allows for extensive Zero Touch Provisioning within the
protocol. In its most extreme version, RIFT does not rely on any
specific addressing and, for an IP fabric, can operate using IPv6 ND
[RFC4861] only.
* RIFT has provisions to detect common IP fabric mis-cabling
scenarios.
* RIFT automatically negotiates BFD per link. This allows IP and
micro-BFD [RFC7130] to replace Link Aggregation Groups (LAGs), which
hide bandwidth imbalances in case of constituent failures. Further
automatic link validation techniques similar to [RFC5357] could be
supported as well.
* RIFT inherently solves many difficult problems associated with the
use of traditional routing topologies with dense meshes and high
degrees of ECMP by including automatic bandwidth balancing, flood
reduction and automatic disaggregation on failures while providing
maximum aggregation of prefixes in default scenarios.
* RIFT reduces FIB size towards the bottom of the IP fabric, where
most nodes reside. This allows for cheaper hardware on the edges and
for the introduction of modern IP fabric architectures that
encompass, e.g., server multi-homing.
* RIFT provides valley-free routing and is therefore loop free.
This allows the use of any such valley-free path in the bisectional
fabric bandwidth between two destinations, irrespective of their
metrics, which can be used to balance load on the fabric in
different ways.
* RIFT includes a key-value distribution mechanism which allows for
many future applications such as automatic provisioning of basic
overlay services or automatic key roll-overs over whole fabrics.
* RIFT is designed for minimum delay in case of prefix mobility on
the fabric. In conjunction with [RFC8505], RIFT can differentiate
anycast advertisements from mobility events and retain only the
most recent advertisement in the latter case.
* Many further operational and design points collected over many
years of routing protocol deployments have been incorporated in
RIFT such as fast flooding rates, protection of information
lifetimes and operationally easily recognizable remote ends of
links and node names.
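The valley-free property mentioned in the list above can be made
concrete with a small check. The sketch below is only an
illustration under stated assumptions: the `level` map and the
function name are hypothetical, node names follow the naming used in
the figures of this document, and same-level (east-west) hops are
deliberately not modelled. A path is considered valley-free if it
first climbs north (optionally) and then only descends south, never
turning north again.

```python
def is_valley_free(path, level):
    """Return True if `path` climbs (optionally) and then only descends."""
    descending = False
    for here, there in zip(path, path[1:]):
        if level[there] > level[here]:
            if descending:
                # Going north after having gone south forms a valley.
                return False
        elif level[there] < level[here]:
            descending = True
        else:
            # Same-level hops are not modelled in this sketch.
            return False
    return True


# Levels assumed for this example: leaves 0, spines 1, ToF 2.
level = {
    "Leaf121": 0, "Leaf122": 0,
    "Spine121": 1, "Spine122": 1,
    "ToF21": 2,
}

print(is_valley_free(["Leaf121", "Spine122", "Leaf122"], level))
print(is_valley_free(
    ["Leaf121", "Spine121", "ToF21", "Spine122", "Leaf122"], level))
print(is_valley_free(
    ["Leaf121", "Spine121", "Leaf122", "Spine122", "ToF21"], level))
```

The first two paths go up and then down and are accepted; the third
descends to Leaf122 and then climbs again, so it is rejected. Since
a valley-free path can never revisit a level it has already left
southwards, it cannot loop.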
4.1. South Reflection
South reflection is a mechanism whereby South Node TIEs are
"reflected" back up north to allow nodes in the same level without
East-West links to "see" each other.
For example, Spine111, Spine112, Spine121, and Spine122 each reflect
the Node S-TIEs from ToF21 to ToF22. Likewise, each of them reflects
the Node S-TIEs from ToF22 to ToF21. As a result, ToF22 and ToF21
see each other's node information as level 2 nodes.
In an equivalent fashion, as the result of the south reflection
between Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122,
Spine121 and Spine122 know each other at level 1.
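The effect of south reflection can be modelled in a few lines. The
sketch below is a simplification and does not reproduce the TIE
flooding procedures of RIFT [RIFT]; the function name and the
`south_adj` map are hypothetical, and the spine-to-leaf adjacencies
are assumed for the sake of the example. A node sends its Node S-TIE
one hop south, and each southern neighbor reflects it to its other
northbound neighbors, so two nodes "see" each other exactly when
they share at least one southern neighbor.

```python
def south_reflection(south_adj):
    """Return node -> set of same-level peers learned through reflection.

    `south_adj` maps each node of one level to its southbound neighbors.
    """
    seen = {node: set() for node in south_adj}
    for origin, children in south_adj.items():
        for child in children:
            # The child reflects origin's Node S-TIE to its other parents.
            for other, other_children in south_adj.items():
                if other != origin and child in other_children:
                    seen[other].add(origin)
    return seen


# Assumed level-1 adjacencies: two PoDs, each with two spines.
south_adj = {
    "Spine111": {"Leaf111", "Leaf112"},
    "Spine112": {"Leaf111", "Leaf112"},
    "Spine121": {"Leaf121", "Leaf122"},
    "Spine122": {"Leaf121", "Leaf122"},
}

seen = south_reflection(south_adj)
for spine in sorted(seen):
    print(spine, "sees", sorted(seen[spine]))
```

In this model Spine121 and Spine122 learn about each other through
Leaf121 and Leaf122, but neither learns about Spine111 or Spine112,
with which they share no southern neighbor.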
4.2. Suboptimal Routing on Link Failures
+--------+ +--------+
| ToF21 | | ToF22 | LEVEL 2
++--+-+-++ ++-+--+-++
| | | | | | | +
| | | | | | | linkTS8
+-------------+ | +-+linkTS3+-+ | | | +-------------+
| | | | | | + |
| +----------------------------+ | linkTS7 |
| | | | + + + |
| | | +-------+linkTS4+------------+ |
| | | + + | | |
| | | +------------+--+ | |
| | | | | linkTS6 | |
+-+----+-+ +-----+--+ ++--------+ +-+----+-+
|Spine111| |Spine112| |Spine121 | |Spine122| LEVEL 1
+-+---+--+ +----+---+ +-+---+---+ +-+---+--+
| | | | | | | |
| +--------------+ | + ++XX+linkSL6+---+ +
| | | | linkSL5 | | linkSL8
| +------------+ | | + +---+linkSL7+-+ | +
| | | | | | | |
+-+---+-+ +--+--+-+ +-+---+-+ +--+-+--+
|Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0
+-+-----+ ++------+ +-----+-+ +-+-----+
+ + + +
Prefix111 Prefix112 Prefix121 Prefix122
Figure 4: Suboptimal routing upon link failure use case
As shown in Figure 4, as the result of the south reflection between
Spine121-Leaf121-Spine122 and Spine121-Leaf122-Spine122, Spine121 and
Spine122 know each other at level 1.
Without the disaggregation mechanism, when linkSL6 fails, a packet
from Leaf121 to Prefix122 will probably go up through linkSL5 to
linkTS3 and then down through linkTS4 to linkSL8 to Leaf122, or go up
through linkSL5 to linkTS6 and then down through linkTS4 and linkSL8
to Leaf122, based on the pure default route. This is a case of
suboptimal routing, or bow-tieing.
With the disaggregation mechanism, when linkSL6 fails, Spine122 will
detect the failure from the reflected Node S-TIE of Spine121. Based
on the disaggregation algorithm provided by RIFT, Spine122 will
explicitly advertise Prefix122 in a Disaggregated Prefix S-TIE
PrefixesElement(prefix122, cost 1). A packet from Leaf121 to
Prefix122 will then only be sent to linkSL7, following a
longest-prefix match to Prefix122, and go down through linkSL8 to
Leaf122.
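The effect of the disaggregated prefix on forwarding at Leaf121 can be
illustrated with a minimal longest-prefix-match sketch. The address
10.1.22.0/24 standing in for Prefix122, the rib dictionary and the
lookup() helper are all assumptions made for this example; they are
not taken from RIFT itself.

```python
import ipaddress


def lookup(rib, destination):
    """Return the next hops of the longest prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in rib if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return rib[best]


default = ipaddress.ip_network("0.0.0.0/0")
prefix122 = ipaddress.ip_network("10.1.22.0/24")  # hypothetical value

# Before disaggregation Leaf121 only holds the default route north.
rib = {default: ["Spine121", "Spine122"]}
before = lookup(rib, "10.1.22.7")

# Spine122 advertises the more specific prefix after linkSL6 fails.
rib[prefix122] = ["Spine122"]
after = lookup(rib, "10.1.22.7")

print(before)  # both spines are candidates
print(after)   # only Spine122 remains
```

Traffic to destinations outside the disaggregated prefix still
follows the default route over both spines.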
4.3. Black-Holing on Link Failures
+--------+ +--------+
| ToF 21 | | ToF 22 | LEVEL 2
++-+--+-++ ++-+--+-++
| | | | | | | +
| | | | | | | linkTS8
+--------------+ | +-+linkTS3+X+ | | | +--------------+
linkTS1 | | | | | + |
+ +-----------------------------+ | linkTS7 |
| | + | + + + |
| | linkTS2 +-------+linkTS4+X+----------+ |
| + + + + | | |
| linkTS5 +-+ +------------+--+ | |
| + | | | linkTS6 | |
+-+----+-+ +-+----+-+ ++-------+ +-+-----++
|Spine111| |Spine112| |Spine121| |Spine122| LEVEL 1
+-+---+--+ ++----+--+ +-+---+--+ +-+---+--+
| | | | | | | |
+ +---------------+ | + +---+linkSL6+---+ +
linkSL1 | | | linkSL5 | | linkSL8
+ +--+linkSL3+--+ | | + +---+linkSL7+-+ | +
| | | | | | | |
+-+---+-+ +--+--+-+ +-+---+-+ +--+-+--+
|Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0
+-+-----+ ++------+ +-----+-+ +-+-----+
+ + + +
Prefix111 Prefix112 Prefix121 Prefix122
Figure 5: Black-holing upon link failure use case
This scenario illustrates a case in which a double link failure occurs
and black-holing can result.
Without the disaggregation mechanism, when linkTS3 and linkTS4 both
fail, packets from Leaf111 to Prefix122 would suffer 50% black-holing
based on the pure default route. A packet that goes up through
linkSL1 to linkTS1 and is then supposed to go down through linkTS3 or
linkTS4 will be dropped. A packet that goes up through linkSL3 to
linkTS2 and is then supposed to go down through linkTS3 or linkTS4
will be dropped as well. This is a case of black-holing.
With the disaggregation mechanism, when linkTS3 and linkTS4 both fail,
ToF22 will detect the failure from the Node S-TIE of ToF21 reflected
by Spine111 and Spine112. Based on the disaggregation algorithm
provided by RIFT, ToF22 will explicitly originate an S-TIE with
Prefix121 and Prefix122, which is flooded to Spine111, Spine112,
Spine121 and Spine122.
A packet from Leaf111 to Prefix122 will then no longer be routed to
linkTS1 or linkTS2. It will only be routed to linkTS5 or linkTS7,
following a longest-prefix match to Prefix122.
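The 50% figure can be reproduced with a short path-counting sketch.
It assumes that Leaf111 balances the default route equally over both
of its spines and that each spine balances equally over both ToF
nodes. The names SPINES, TOFS and blackhole_fraction() are
hypothetical, and the link names listed for ToF22 are illustrative
placeholders.

```python
from itertools import product

# Assumed model of Figure 5: equal-cost default routes at every level.
SPINES = ["Spine111", "Spine112"]
TOFS = ["ToF21", "ToF22"]


def blackhole_fraction(south_links, usable_tofs=TOFS):
    """Fraction of equal-cost northbound paths that end at a ToF with no
    remaining southbound link toward Prefix122."""
    paths = list(product(SPINES, usable_tofs))
    dropped = [path for path in paths if not south_links[path[1]]]
    return len(dropped) / len(paths)


# Southbound links from each ToF toward Spine121 and Spine122.
healthy = {"ToF21": {"linkTS3", "linkTS4"}, "ToF22": {"linkTS6", "linkTS8"}}
failed = {"ToF21": set(), "ToF22": {"linkTS6", "linkTS8"}}

no_failure = blackhole_fraction(healthy)
default_only = blackhole_fraction(failed)
# With disaggregation, the more specific route only points at ToF22.
disaggregated = blackhole_fraction(failed, usable_tofs=["ToF22"])

print(no_failure, default_only, disaggregated)  # 0.0 0.5 0.0
```

Two of the four equal-cost paths terminate at ToF21, which explains
the 50% loss on the pure default route and its disappearance once
only ToF22 is used for the disaggregated prefixes.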
4.4. Zero Touch Provisioning (ZTP)
RIFT is designed to require only minimal configuration, which
simplifies its operation and avoids human errors. Based on that
minimal information, Zero Touch Provisioning (ZTP) autoconfigures the
key operational parameters of all the RIFT nodes: the SystemID of the
node, which must be unique in the RIFT network, and the level of the
node in the Fat Tree, which determines which peers are northwards
"parents" and which are southwards "children".
ZTP is always on, but its decisions can be overridden when a network
administrator prefers to impose their own configuration. In that
case, it is the responsibility of the administrator to ensure that
the configured parameters are correct, in other words that the
SystemID of each node is unique and that the administratively set
levels truly reflect the relative position of the nodes in the
fabric. It is recommended to let ZTP configure the network. When
that is not done, it is recommended to configure the level of all the
nodes except those that are forced as leaves, to avoid an undesirable
interaction between ZTP and the manual configuration.
ZTP requires that the administrator point out the Top-of-Fabric
(ToF) nodes to set the baseline from which the fabric topology is
derived. The Top-of-Fabric nodes are configured with the
TOP_OF_FABRIC flag and act as the initial 'seeds' needed for other ZTP
nodes to derive their level in the topology. ZTP computes the level
of each node based on the Highest Available Level (HAL) of the
potential parent(s) nearest that baseline, which represents the
superspine. In a fashion, RIFT can be seen as a distance-vector
protocol that computes a set of feasible successors towards the
superspine and autoconfigures the rest of the topology.
The autoconfiguration mechanism computes a global maximum of levels
by diffusion. The level of each node is then derived from the Link
Information Elements (LIEs) received from its neighbors, whereby each
node (with the possible exception of configured leaves) tries to
attach at the highest possible point in the fabric. This guarantees
that even if the diffusion front reaches a node from "below" faster
than from "above", the node will greedily abandon the already
negotiated level derived from nodes topologically below it and
properly peer with nodes above.
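The diffusion described above can be approximated with a small
fixed-point sketch. It is a simplification under explicit
assumptions: a node is assumed to take a level one below the highest
level offered by any neighbor (floored at 0), the ToF level is set to
2 only to match the figures, and derive_levels() and LINKS are
hypothetical names. The LIE exchange and the finer rules of the
protocol are not modeled.

```python
def derive_levels(links, top_of_fabric, top_level=2):
    """Iterate until every node sits one level below its best offer."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # ToF nodes are the configured seeds; all other levels are derived.
    levels = {node: top_level for node in top_of_fabric}
    changed = True
    while changed:
        changed = False
        for node in neighbors:
            if node in top_of_fabric:
                continue
            offers = [levels[n] for n in neighbors[node] if n in levels]
            if not offers:
                continue
            hal = max(offers)            # highest level heard so far
            derived = max(hal - 1, 0)    # assumed derivation rule
            if levels.get(node) != derived:
                levels[node] = derived   # abandon any lower, earlier level
                changed = True
    return levels


# Fabric of the figures in this section, without any mis-cabled link.
LINKS = [(s, t) for s in ("Spine111", "Spine112", "Spine121", "Spine122")
         for t in ("ToF21", "ToF22")]
LINKS += [(l, s) for l in ("Leaf111", "Leaf112")
          for s in ("Spine111", "Spine112")]
LINKS += [(l, s) for l in ("Leaf121", "Leaf122")
          for s in ("Spine121", "Spine122")]

levels = derive_levels(LINKS, top_of_fabric={"ToF21", "ToF22"})
print(levels)
```

In this sketch a derived level can only rise as better offers arrive,
which mirrors the greedy attachment at the highest possible point.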
The achieved equilibrium can be disturbed massively when all nodes
with the highest level either leave or enter the domain (with some
finer distinctions not explained further). It is therefore
recommended that each node be multi-homed towards nodes with
respective HAL offerings. Fortunately, this is the natural state of
things for the topology variants considered in RIFT.
A RIFT node may also be confined to the leaf role with the LEAF_ONLY
flag. A leaf node can also be configured to support leaf-2-leaf
procedures with the LEAF_2_LEAF flag. In either
case the node cannot be TOP_OF_FABRIC and its level cannot be
configured. RIFT will fully configure the node's level after it is
attached to the topology and ensure that the node is at the "bottom
of the hierarchy" (southernmost).
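The configuration constraints stated in this section (unique
SystemIDs, and leaf flags excluding both TOP_OF_FABRIC and a
configured level) can be checked with a short sketch. The NodeConfig
class, its field names and the validate() helper are hypothetical and
do not correspond to any particular RIFT implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeConfig:
    system_id: int
    top_of_fabric: bool = False            # TOP_OF_FABRIC flag
    leaf_only: bool = False                # LEAF_ONLY flag
    leaf_2_leaf: bool = False              # LEAF_2_LEAF flag
    configured_level: Optional[int] = None  # None means "let ZTP decide"


def validate(configs):
    """Return a list of human-readable constraint violations."""
    errors = []
    seen_ids = {}
    for name, cfg in configs.items():
        if cfg.system_id in seen_ids:
            errors.append(f"{name}: SystemID duplicates {seen_ids[cfg.system_id]}")
        else:
            seen_ids[cfg.system_id] = name
        if cfg.leaf_only or cfg.leaf_2_leaf:
            if cfg.top_of_fabric:
                errors.append(f"{name}: leaf flag conflicts with TOP_OF_FABRIC")
            if cfg.configured_level is not None:
                errors.append(f"{name}: level cannot be configured on a leaf")
    return errors


configs = {
    "ToF21": NodeConfig(system_id=21, top_of_fabric=True),
    "Leaf111": NodeConfig(system_id=111, leaf_only=True),
    "Leaf112": NodeConfig(system_id=111, leaf_only=True, configured_level=0),
}
errors = validate(configs)
for error in errors:
    print(error)
```

Here Leaf112 triggers two findings: it reuses the SystemID of Leaf111
and it combines a leaf flag with a configured level.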
4.5. Mis-cabling Examples
+----------------+ +-----------------+
| ToF21 | +------+ ToF22 | LEVEL 2
+-------+----+---+ | +----+---+--------+
| | | | | | | | |
| | | +----------------------------+ |
| +---------------------------+ | | | |
| | | | | | | | |
| | | | +-----------------------+ | |
| | +------------------------+ | | |
| | | | | | | | |
+-+---+--+ +-+---+--+ | +--+---+-+ +--+---+-+
|Spine111| |Spine112| | |Spine121| |Spine122| LEVEL 1
+-+---+--+ ++----+--+ | +--+---+-+ +-+----+-+
| | | | | | | | |
| +---------+ | link-M | +---------+ |
| | | | | | | | |
| +-------+ | | | | +-------+ | |
| | | | | | | | |
+-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+
|Leaf111| |Leaf112+-----+ |Leaf121| |Leaf122| LEVEL 0
+-------+ +-------+ +-------+ +-------+
Figure 6: A single plane mis-cabling example
Figure 6 shows a single plane mis-cabling example. It is a perfect
Fat Tree fabric except for link-M, which connects Leaf112 to ToF22.