\chapter{Scheduling}
Scheduling is a key tool in QoS and it is complex enough to deserve its own chapter\footnote{A word of warning is needed. Different books, articles and information sources use different names for the same scheduling techniques. To add to the confusion, the same technique name can have different meanings in different sources.}.
A scheduler serves two or more queues and must decide which queue to serve next.
The combination of queues and scheduler can change the order of the packets.
For this reason it is important that all packets of the same flow are mapped to the same queue, to prevent packet re-ordering.
The QoS-aware queues are called software queues.
The software queues do not store the actual packets.
They just store pointers to the actual packets that are stored in memory.
The queue sizes are typically expressed in number of packets.
Software queues are placed before the scheduler.
The scheduler takes packets from the software queues and places them in the Tx-ring.
Considered as a whole, the system of software queues is not FIFO.
The packets do not necessarily leave the system in the same order that they entered.
However, each of the queues composing the system, considered on its own, is FIFO.
The Tx-ring is placed in an area of memory that is directly accessible by the interface hardware.
The hardware can continuously send the packets found in the Tx-ring without requiring the intervention of the CPU.
The Tx-ring is FIFO and the packets leave it in the same order that they entered.
For the scheduling policy to be effective, the Tx-ring must be relatively small, typically 2 or 3 packets.
Otherwise, the queue of packets builds up in the Tx-ring, where the scheduler cannot affect the order in which the packets are sent.
\section{Strict Priority Queues}
Prioritizing is a core idea in QoS.
Imagine that a big packet belonging to a backup tool such as Dropbox arrives at a router.
Right after that, a second packet arrives at the same router.
Imagine that this second packet is a small VoIP packet.
The VoIP packet is in a hurry, because it must make it to the decoder before its playout time.
For this reason, the combination of queues and scheduler will allow the VoIP packet to overtake the backup packet and be transmitted first.
In the simplest case we have only two queues: the high priority queue and low priority queue.
A classifier maps each of the packets to one of the queues.
The scheduler will serve the high priority queue whenever there is a packet in that queue.
Only when the high priority queue is empty will the scheduler serve the low priority queue.
Note that this concept can be easily generalized to the case in which there are more than two queues.
The principle is that a queue will be served only when all the higher priority queues are empty.
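As an illustration, here is a minimal sketch of such a (non-preemptive) strict priority scheduler in Python; the packet objects are abstract and the class is a toy model, not a real router API:
\begin{verbatim}
from collections import deque

class StrictPriorityScheduler:
    """Serve queues in priority order: queue 0 has the highest priority."""

    def __init__(self, num_queues):
        self.queues = [deque() for _ in range(num_queues)]

    def enqueue(self, packet, priority):
        self.queues[priority].append(packet)

    def dequeue(self):
        # Scan from highest to lowest priority and serve the first
        # non-empty queue; a queue is served only when all higher
        # priority queues are empty.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None  # all queues empty
\end{verbatim}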
\subsection{Preemptive strict priority}
In preemptive strict priority, if a high priority packet arrives while a low priority packet is being served, the service of the low priority queue is interrupted and the high priority packet is served immediately.
Theoretically, the service of the low priority packet is resumed when the high priority queue becomes empty.
This approach is very beneficial for high priority packets as they are never disturbed by low priority packets.
The only problem with this approach is that, in practice, it is not trivial to stop the transmission of a packet and later resume it.
\subsection{Non-preemptive strict priority}
In non-preemptive strict priority, if a high priority packet arrives while a low priority packet is being served, the transmission is not interrupted.
The high priority packet will wait for the transmission of the low priority packet to finish and then it will be transmitted.
This is the approach that is used in practice in high speed transmission lines.
The problem with this solution is that, on a slow transmission line, a high priority packet might need to wait a long time if it arrives just as a long low priority packet starts being serviced.
\subsection{Fragmenting and interleaving}
An intermediate solution between the two mentioned before is to fragment each packet into chunks and serve one chunk at a time.
Using this technique we can emulate a behaviour similar to preemptive strict priority.
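As a sketch of the idea, the following Python function splits a packet into chunks (the chunk size would be chosen so that the transmission time of one chunk is acceptably short):
\begin{verbatim}
def fragment(packet_size, chunk_size):
    """Split a packet of packet_size bytes into chunks of at most
    chunk_size bytes. Serving one chunk at a time bounds the wait
    of a newly arrived high priority packet to roughly one chunk
    transmission time."""
    chunks = []
    while packet_size > 0:
        chunks.append(min(packet_size, chunk_size))
        packet_size -= chunks[-1]
    return chunks

# Example: a 1500-byte packet fragmented into 512-byte chunks.
print(fragment(1500, 512))  # [512, 512, 476]
\end{verbatim}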
As it has been explained in the previous chapter, fragmenting and interleaving are used in low speed lines.
In higher speed lines there is less benefit in interleaving, as the duration of a packet transmission is very short.
Furthermore, the process of interleaving consumes CPU and introduces some processing delay.
As the number of packets per second increases, these two negative aspects become more accentuated.
\subsection{Starvation and policing}
The problem of strict priority queueing is that the high priority queue can easily eat up all the bandwidth.
If there are always packets in the high priority queue, the other queues will never be serviced.
This might be an undesirable situation.
In the following we will review other queueing strategies that avoid this problem.
Nevertheless, priority queueing is still used for real-time traffic such as VoIP, as it minimizes the delay for high priority traffic.
The solution to prevent starvation is to police the high priority queue.
The high priority traffic is policed to a fraction of the total bandwidth available on the interface (e.g., 10\% or 30\%) before entering the queue.
By doing so, the other queues are safe from starvation.
Even if a virus, a worm, or a misconfigured device generates a flood of high priority traffic, it will not kill the network.
Another reason for policing a queue with strict priority is to provide bounds on delays.
The maximum delay suffered by the packets is the depth of the token bucket divided by the line rate.
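Written as a formula, with $B$ the depth of the bucket in bits and $R$ the line rate in bits per second,
\[
D_{\max} = \frac{B}{R}.
\]
For example, a bucket depth of 100\,000 bits on a 10\,Mb/s line bounds this delay at 10\,ms.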
\section{Round Robin Scheduling and Weighted Round Robin}
One of the most direct ways of sharing the available bandwidth among different queues is to serve one packet of each queue at a time in order.
As an example, if there are three queues, a round robin (RR) scheduler will serve Q1, Q2 and Q3, and then Q1 again.
If one of the queues is empty, it is simply skipped.
This property is called the work-conserving property, and guarantees the full utilization of the link while there are packets waiting.
If we want to prioritize Q1 over Q2 and Q3, and Q2 over Q3, we can configure the scheduler to serve the queues in this order: Q1, Q2, Q1, Q3, Q1, Q2, and then repeat the same schedule again.
Note that with this particular schedule, Q1 is served thrice in each round, Q2 is served twice and Q3 is served once.
This approach, in which different queues have different weights, is called Weighted Round Robin (WRR).
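A minimal sketch of a WRR scheduler in Python; the per-queue weights are expanded into a static schedule, and the exact interleaving of the visits differs slightly from the example above:
\begin{verbatim}
from collections import deque

class WeightedRoundRobin:
    """Serve queue i weights[i] times per round."""

    def __init__(self, weights):
        self.queues = [deque() for _ in weights]
        # weights [3, 2, 1] yield the schedule Q1, Q2, Q3, Q1, Q2, Q1
        # (one possible interleaving of three visits to Q1, two to Q2
        # and one to Q3).
        self.schedule = [i for visit in range(max(weights))
                         for i, w in enumerate(weights) if visit < w]
        self.position = 0

    def enqueue(self, packet, queue_index):
        self.queues[queue_index].append(packet)

    def dequeue(self):
        # Scan at most one full schedule; empty queues are skipped,
        # which makes the scheduler work-conserving.
        for _ in range(len(self.schedule)):
            queue = self.queues[self.schedule[self.position]]
            self.position = (self.position + 1) % len(self.schedule)
            if queue:
                return queue.popleft()
        return None
\end{verbatim}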
Round robin is flexible and precise when it comes to adjusting the number of packets transmitted from each queue.
The actual bandwidth allocated to each queue depends on the schedule, but also on the length of the packets.
Returning to our example of three queues, if the packets in Q3 are 1200 bytes long and the packets in Q1 are 200 bytes long, Q3 will receive twice as much bandwidth as Q1 (per round, Q1 sends $3 \times 200 = 600$ bytes while Q3 sends 1200).
If we want to allocate a fixed fraction of the available bandwidth to each queue and we do not know the packet lengths in advance, WRR cannot achieve it.
\section{Generalized Processor Sharing}
Generalized Processor Sharing (GPS) is an idealized sharing approach that assumes a fluid model.
It assumes that the scheduler can serve an infinitesimal amount from each of the queues.
It is easy to visualize with the example of liquid reservoirs instead of queues, and pipes draining liquid from each of the reservoirs.
One pipe may be draining one liter per hour and the other two liters per hour.
Still, at any given point of time, both reservoirs are drained simultaneously, even though one is served twice as fast as the other.
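Stated more formally (with notation introduced here for illustration: $w_i$ is the weight of queue $i$, $R$ the link rate and $B$ the set of currently backlogged queues), each backlogged queue receives the instantaneous service rate
\[
r_i = \frac{w_i}{\sum_{j \in B} w_j} \, R .
\]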
In practice, in packet networks, it is not possible to implement this idealized model.
An approximate implementation is used instead; we explain it in the next section.
\section{Deficit Weighted Round Robin}
In Deficit Weighted Round Robin (DWRR), each queue has an associated ``token bucket''.
It is called the ``deficit counter'', and it is measured in bytes.
The scheduler visits all the queues, one by one.
In every visit, the scheduler increases the deficit counter by a value which is called ``quantum''.
If the value of the deficit counter after being increased is larger than the size of the first packet in the queue, the packet is served.
The deficit counter is decreased by a value equal to the size of the packet that has been served.
If, after being decremented, the deficit counter is larger than the size of the following packet, another packet is served and the deficit counter is decremented accordingly.
This process is repeated until the deficit counter is smaller than the size of the packet at the head of the queue, or the queue is empty.
In the former case, the value of the deficit counter is saved for the next round.
In the latter case, the value of the deficit counter is reset.
At this point, the scheduler moves on to serve the next queue.
By adjusting the quantum of each queue, we can set the weight (in terms of bandwidth) of each of the queues.
As an example, if we set a quantum of 400 bytes for the high priority queue and a quantum of 200 bytes for the low priority queue, the high priority queue will receive twice as much bandwidth.
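A minimal sketch of the DWRR service loop in Python, representing packets simply by their size in bytes; the \texttt{transmit} callback is a hypothetical placeholder:
\begin{verbatim}
from collections import deque

class DeficitWeightedRoundRobin:
    """Each queue holds packet sizes in bytes; the quantum is the weight."""

    def __init__(self, quanta):
        self.quanta = quanta                    # quantum per queue, in bytes
        self.queues = [deque() for _ in quanta]
        self.deficits = [0] * len(quanta)

    def enqueue(self, packet_size, queue_index):
        self.queues[queue_index].append(packet_size)

    def serve_one_round(self, transmit):
        for i, queue in enumerate(self.queues):
            if not queue:
                continue                        # nothing to send, skip
            self.deficits[i] += self.quanta[i]  # add the quantum on each visit
            # Serve packets while the counter covers the head-of-queue packet.
            while queue and self.deficits[i] >= queue[0]:
                self.deficits[i] -= queue[0]
                transmit(queue.popleft())
            if not queue:
                self.deficits[i] = 0            # queue emptied: reset counter
\end{verbatim}
For instance, \texttt{DeficitWeightedRoundRobin([400, 200])} corresponds to the quanta used in the example that follows (note that the sketch does not model packets arriving while another is being transmitted).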
Let's look at an example.
The high priority queue contains a 500-byte packet, and the low priority queue a 400-byte packet.
Both deficit counters are zero.
In the first round, the deficit counters are incremented to 400 and 200, and no packet is served as the values of the counters are lower than the sizes of the packets.
In the second round, the counter of the high priority queue is incremented to 800.
The first packet is serviced and the counter is decremented to 300.
While the first packet is being serviced, a 100-byte packet and a 500-byte packet arrive at the high priority queue.
When the service of the first packet finishes, the deficit counter of the first queue (300 bytes) is larger than the size of the head-of-queue packet (100 bytes).
Consequently, the head-of-queue packet is served and the deficit counter is decremented to 200.
At this point, the counter is smaller than the size of the head-of-queue packet and therefore the scheduler jumps to the next queue.
It increases the counter of the second queue by 200, and the resultant value (400) is equal to the size of the head-of-queue packet of the second queue.
This packet is served and the counter is decremented to zero.
The scheduler then moves to the first queue again and increases the counter by the quantum (400) to 600.
Then it serves the head-of-queue packet, which has 500 bytes, and decrements the deficit counter by 500.
\section{Actual Implementations}
Actual implementations are very different from the simple and structured material presented in this chapter.
The implementations that we find in practice combine many different elements such as classifiers, policers and schedulers in a single tool.
\subsection{Stochastic Fair Queueing}
Strictly speaking, this is not a scheduling policy.
It is rather a classification approach.
However, as it is usually presented as a queueing discipline, it is worth mentioning here.
In stochastic fair queueing a large number of queues is used, e.g. 1024.
Then, for each packet, a hash is derived from the IP addresses, the layer 4 protocol and the port number fields.
This hash is used to place the packet in one of the queues.
This ensures that all the packets of the same flow are in the same queue.
Finally, some kind of scheduling discipline (such as RR or DRR) is used to serve the queues.
This approach favors flows with less traffic and therefore benefits interactive applications such as SSH.
It is still possible that, by chance, two flows end up in the same queue.
To prevent this detrimental situation from affecting the flows for a long time, a perturbation of the hash is periodically introduced, so that new queues are randomly computed for all the flows.
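A minimal sketch of the queue selection step in Python; the use of \texttt{hashlib} and the exact fields hashed are illustrative choices, not the actual implementation of any particular system:
\begin{verbatim}
import hashlib

NUM_QUEUES = 1024

def queue_index(src_ip, dst_ip, protocol, src_port, dst_port, perturbation):
    """Map a flow's 5-tuple (plus a periodically changed perturbation
    value) to one of the queues. All packets of the same flow hash to
    the same queue until the perturbation changes."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}|{perturbation}"
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_QUEUES

# Both packets of the same flow land in the same queue.
q1 = queue_index("10.0.0.1", "10.0.0.2", "tcp", 44321, 22, perturbation=7)
q2 = queue_index("10.0.0.1", "10.0.0.2", "tcp", 44321, 22, perturbation=7)
assert q1 == q2
\end{verbatim}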
\subsection{Some Cisco Queues}
\subsubsection{Cisco's Weighted Fair Queueing}
Cisco's weighted fair queueing is used by default on the low speed interfaces of Cisco routers.
This scheduling algorithm is similar to the fair queueing approach mentioned above, but its implementation relies on the concept of a sequence number.
It takes into account both the size of the packets and the Type of Service field of the IP header to compute a sequence number that decides the order in which the packets will be transmitted.
\subsubsection{Cisco's Class-Based WFQ}
This scheduler considers different classes, and it is possible to reserve a fraction of the total bandwidth for each class.
The bandwidth reservation can be either an absolute value or a percentage.
\subsubsection{Cisco's Low-latency Queueing}
This is a strict priority tool that can be combined with CBWFQ to offer better delay and jitter guarantees to real-time traffic.
The strict priority queue is policed to prevent the starvation of the other queues.
\subsection{Some Linux Queues}
\subsubsection{Linux's PFIFO\_FAST}
This is the default discipline.
It contains three queues served with strict priority; packets are placed in the different queues according to the Type-of-Service field of the IP header.
We can see the queue length with the command
\texttt{cat /sys/class/net/eth0/tx\_queue\_len}.
\subsubsection{Linux's Stochastic Fair Queueing}
It uses 1024 different queues.
The parameters include a \emph{limit}, which is the size of each of the queues; a \emph{perturb} interval, which determines how often the hashing will be changed; and a \emph{quantum}, which is the number of bytes to be sent from each queue in each round.
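As an illustration, these parameters could be set with \texttt{tc} roughly as follows (the interface name and the values are examples only):
\begin{verbatim}
tc qdisc add dev eth0 root sfq limit 127 perturb 10 quantum 1514
\end{verbatim}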
\subsubsection{Linux's Hierarchical Token Bucket}
It allows setting the bandwidth reserved for each class of traffic (\emph{rate}), as well as the maximum bandwidth that a class may consume (\emph{ceil}).
It also offers the possibility of configuring a hierarchy to distribute the bandwidth left unused by one of the traffic classes.
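A minimal sketch of such a hierarchy with \texttt{tc}; the interface, handles and rates are example values, with two child classes sharing a 100\,Mbit/s parent, each guaranteed its \emph{rate} and allowed to borrow up to its \emph{ceil}:
\begin{verbatim}
tc qdisc add dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 30mbit ceil 100mbit
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 70mbit ceil 100mbit
\end{verbatim}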