BATS Protocol is a network communication protocol based on BATS codes, designed to provide high throughput and low latency data transmission in lossy networks. In this project, we evaluate and compare the performance of BATS Protocol with other techniques/protocols in terms of various performance metrics such as throughput, latency and reliability.
To see performance test results directly, please jump to the sections BTP/BRTP throughput performance and BRTP latency performance.
Performance test results can also be found at https://n-hop.github.io/bats-documentation.
For further contact, please visit n-hop technologies Limited, or email [email protected].
- To demonstrate the benefits/gains of the BATS Protocol in different network scenarios.
- BATS Protocol Performance Test
The BATS Protocol performance evaluation focuses on the capability of the communication logic, such as coding (including feedback), congestion control, routing, etc. Therefore, in the testbed, we try to reduce the effect of the software implementation by limiting the network link bandwidth. The performance gain over other techniques is relative, and should scale with the network bandwidth.
This is NOT a software performance test. In other words, we do not test how fast the BATS Protocol software can run on a device. Though the testbed considers various practical scenarios, we do not test performance for specific applications, like video streaming.
Three kinds of network topologies will be mainly used in the tests:
- one-hop network
- multi-hop networks
- multi-path networks (in progress)
In this section, we will introduce the network tools used in the test framework.
- Iperf3: an open-source tool used for measuring network performance. It provides a straightforward way to assess the bandwidth, latency, and other parameters of a network link. iperf3 operates in client-server mode, where one system acts as the server and another as the client. The client sends a controlled amount of data to the server, and the tool calculates metrics like throughput, packet loss, and jitter. iperf3 is commonly used to measure TCP and UDP performance on networks.
- Bperf: a specialized network performance testing tool developed by n-hop technologies Limited. It is designed to measure and assess the performance of the BATS protocol.
- Linux Traffic Control (tc): a powerful framework built into the Linux kernel that offers advanced traffic management and Quality of Service (QoS) capabilities. It allows administrators to shape and control network traffic by configuring parameters like bandwidth, latency, packet scheduling, and prioritization. With tc, you can implement policies that prioritize certain types of traffic, limit bandwidth for specific applications, and manage congestion to ensure fair allocation of network resources.
- Customized application: used to measure the end-to-end latency by sending messages of a fixed length at a given rate. The source code is in the file `src/pvp_game_endpoint.cc`; for a HOWTO, please refer to pvp_game_endpoint.
- Link bandwidth:
- The bandwidth of a link is the maximum rate at which data can be transmitted over it. It is a critical parameter that determines the maximum data rate achievable between two nodes, and is determined by the physical characteristics of the link, such as the cable type, signal strength, and interference.
- Link latency:
- The latency of a link is the time it takes for a packet to travel from the source node to the destination node. It represents the propagation delay of the signal over the link.
- Link loss rate:
- The loss rate of a link is the probability that a packet transmitted over the link will be lost. Packet loss can occur for various reasons, such as network congestion, buffer overflow, or link errors. In terms of loss behavior, we can classify loss into two types: independent loss and burst loss. In the section packet loss pattern, we will introduce the loss patterns in detail.
- Link jitter:
- In the context of computer networks, packet jitter or packet delay variation (PDV) is the variation in latency as measured in the variability over time of the end-to-end delay across a network. A network with constant delay has no packet jitter. Link jitter is the latency variation of the link.
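In the testbed, these four link parameters can be emulated in software with Linux Traffic Control. As a minimal sketch (the interface name `eth0` and all values are illustrative assumptions, not the testbed's actual settings), a single netem invocation can impose rate, delay, jitter, and loss at once; the snippet only assembles the command string, it does not run it.

```python
# Compose a tc-netem command that emulates all four link parameters:
# bandwidth (rate), latency (delay), jitter, and loss rate.
# All values are illustrative; adjust them for your own testbed.

def link_emulation_cmd(dev="eth0", rate="300mbit", delay="5ms",
                       jitter="1ms", loss="2%"):
    return (f"tc qdisc add dev {dev} root netem "
            f"rate {rate} delay {delay} {jitter} loss {loss}")

print(link_emulation_cmd())
```

The resulting command would normally be executed with root privileges on each emulated link.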
In assessing the efficacy of network protocols like TCP, UDP, and BATS, it is crucial to scrutinize their performance under varying degrees of packet loss. We will explore three principal packet loss models for the evaluation.
- Random Loss Model: The random loss model involves dropping packets independently based on a given percentage p. Each packet loss event is unrelated to others, making it a simple yet unpredictable scenario. This model reflects scenarios where packet loss occurs randomly and does not follow any specific pattern.
- State Loss Model: The state loss model utilizes a 4-state Markov chain to depict packet loss behaviors. The Markov chain states are as follows:
- State 1: Good Packet Reception (no loss)
- State 2: Good Reception within a Burst (no loss)
- State 3: Burst Losses (loss probability is 100%)
- State 4: Independent Losses (loss probability is 100%)
Utilizing transition probabilities such as p13, p31, p23, p32, and p14, the Markov chain's transition matrix (states ordered 1 to 4) is described as:

$$
P = \begin{pmatrix}
1-p_{13}-p_{14} & 0 & p_{13} & p_{14} \\
0 & 1-p_{23} & p_{23} & 0 \\
p_{31} & p_{32} & 1-p_{31}-p_{32} & 0 \\
p_{41} & 0 & 0 & 0
\end{pmatrix}
$$

Furthermore, we derive the stationary probabilities of each state, $\pi_i, i = 1 \dots 4$, in terms of the aforementioned transition probabilities. Notably, State 4 is associated with isolated packet losses, after which the system must return to State 1, so $p_{41}$ is assumed to be 1. Solving the system yields:

$$
\pi_1 = \left(1 + p_{14} + \frac{p_{13}}{p_{31}} + \frac{p_{13}\,p_{32}}{p_{31}\,p_{23}}\right)^{-1},
\qquad
\pi_2 = \pi_1\,\frac{p_{13}\,p_{32}}{p_{31}\,p_{23}},
\qquad
\pi_3 = \pi_1\,\frac{p_{13}}{p_{31}},
\qquad
\pi_4 = \pi_1\,p_{14}
$$

Consequently, since packets are lost only in States 3 and 4, the theoretical packet loss probability is given by:

$$
P_{loss} = \pi_3 + \pi_4 = \pi_1\left(\frac{p_{13}}{p_{31}} + p_{14}\right)
$$
This model effectively captures scenarios involving structured packet losses, like bursts or isolated instances.
- Gilbert-Elliott (GeModel) Loss Model: The Gilbert-Elliott model, synonymous with the burst loss model, is characterized by two states and the following parameters:
- p: the probability of transitioning from the good state to the bad (lossy) state.
- r: the probability of transitioning from the bad state back to the good state.
- 1-h: the loss probability in the bad state.
- 1-k: the loss probability in the good state.
The Gilbert-Elliott loss model is a Markov chain with two states, G (good) and B (bad); $\pi_G$ is the stationary probability of being in the good state, and $\pi_B$ is the stationary probability of being in the bad state. The transition matrix of the Markov chain is as follows:

$$
P = \begin{pmatrix} 1-p & p \\ r & 1-r \end{pmatrix}
$$

A Markov chain of this form reaches a stationary distribution independent of the initial state. Solving the system, we get the state probabilities:

$$
\pi_G = \frac{r}{p+r}, \qquad \pi_B = \frac{p}{p+r}
$$

As a result, the theoretical packet loss probability is:

$$
P_{loss} = \pi_G\,(1-k) + \pi_B\,(1-h) = \frac{r\,(1-k) + p\,(1-h)}{p+r}
$$
The Gilbert-Elliott model is particularly useful for simulating bursty packet loss behaviors, where packets are more likely to be lost during periods of degraded network conditions (bad state) compared to periods of normal operation (good state). This model is suitable for scenarios with intermittent but extended periods of packet loss.
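As a numerical sanity check of the stationary probabilities of both models, the snippet below computes their theoretical loss rates; the parameter values are arbitrary illustrations, not measurements from the testbed.

```python
# Theoretical packet loss probability of the two Markov loss models.
# All parameter values below are arbitrary examples for illustration.

def four_state_loss(p13, p31, p23, p32, p14):
    """Stationary loss probability of the 4-state model (p41 = 1)."""
    pi1 = 1.0 / (1.0 + p14 + p13 / p31 + (p13 * p32) / (p31 * p23))
    pi3 = pi1 * p13 / p31          # State 3: burst losses
    pi4 = pi1 * p14                # State 4: independent losses
    return pi3 + pi4               # packets are lost only in states 3 and 4

def gilbert_elliott_loss(p, r, loss_bad, loss_good=0.0):
    """Stationary loss probability of the Gilbert-Elliott model."""
    pi_g = r / (p + r)             # probability of the good state
    pi_b = p / (p + r)             # probability of the bad state
    return pi_g * loss_good + pi_b * loss_bad

print(four_state_loss(p13=0.05, p31=0.20, p23=0.10, p32=0.10, p14=0.02))
print(gilbert_elliott_loss(p=0.05, r=0.30, loss_bad=0.8))
```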
In conclusion, these loss models provide a structured way to simulate different types of packet loss patterns in network environments. Understanding how network protocols perform under these loss models helps in designing and optimizing protocols for real-world scenarios with varying degrees of packet loss. Please refer to Definition of a general and intuitive loss model for packet networks and its implementation in the Netem module in the Linux kernel for more details.
All the supported loss patterns in the tool tc-netem can be found at https://www.man7.org/linux/man-pages/man8/tc-netem.8.html.
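For convenience, the three loss models above map onto tc-netem `loss` sub-commands. The sketch below only assembles the command strings (the interface name `eth0` and all percentages are illustrative assumptions) and does not run them; the parameter order follows the tc-netem man page.

```python
# Build tc-netem commands for the three loss models discussed above.
# "eth0" and all numeric values are illustrative; see the tc-netem
# man page for the exact parameter semantics.

def netem_loss_cmd(loss_spec: str, dev: str = "eth0") -> str:
    return f"tc qdisc add dev {dev} root netem loss {loss_spec}"

# 1. Random loss model: each packet dropped independently with probability p.
random_loss = netem_loss_cmd("random 5%")

# 2. 4-state Markov model; parameter order: p13 p31 p32 p23 p14.
state_loss = netem_loss_cmd("state 5% 20% 10% 10% 2%")

# 3. Gilbert-Elliott model; parameter order: p r 1-h 1-k.
gemodel_loss = netem_loss_cmd("gemodel 5% 30% 80% 0.1%")

for cmd in (random_loss, state_loss, gemodel_loss):
    print(cmd)
```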
In terms of performance evaluation, we care about the following metrics:
- Throughput: the average throughput during the whole transmission. `Throughput` here means how much payload data can be transmitted per second at the application layer. For example, for TCP transmission, the throughput is the number of TCP payload bytes transmitted per second.
- Latency: the average latency during the whole transmission; it is an end-to-end measurement at the application layer.
- Reliability: the reliability of the system, defined as the ratio of the number of successfully received packets (or packets after decoding, if we use BATS codes) to the number of sent packets (or packets before encoding, if we use BATS codes). If there are no restrictions on the latency of the feedback control, the reliability should be 1.0; if there are restrictions on the feedback latency, the reliability is less than 1.0.
- Residual loss rate: the residual loss rate over the BATS protocol transmission. For BRTP (BATS Reliable Transmission Protocol), the residual loss rate should be 0.0. For BTP (BATS Transmission Protocol, the non-reliable version), the residual loss rate may be slightly larger than 0.0.
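Given these definitions, the metrics reduce to simple arithmetic over packet and byte counters; the counter names and values below are illustrative assumptions, not output of the test framework.

```python
# Compute the evaluation metrics from simple counters.
# All counter values are illustrative assumptions.

payload_bytes_received = 750_000_000   # application-layer payload over the run
duration_s = 60.0                      # duration of the whole transmission

packets_sent = 1_000_000               # before encoding, if coding is used
packets_received = 999_000             # after decoding, if coding is used

throughput_mbps = payload_bytes_received * 8 / duration_s / 1e6
reliability = packets_received / packets_sent
residual_loss_rate = 1.0 - reliability

print(f"throughput: {throughput_mbps:.1f} Mbps")
print(f"reliability: {reliability:.4f}")
print(f"residual loss rate: {residual_loss_rate:.4f}")
```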
To evaluate the efficiency of the BATS protocol, we need to monitor the following metrics:
- Transmission rate of the link: `Throughput` is not equal to the link bandwidth due to protocol overhead, so in the test we need to know the link-layer transmit/receive rate. The link-layer transmit/receive rate refers to the real networking load of the link; it can be used to evaluate the redundancy introduced by the BATS protocol.
- Link loss statistics: the number of packets lost in each batch.
- Recoding number statistics: the recoding number of each batch, used to evaluate the coding efficiency of the BATS protocol at the intermediate nodes.
We will compare with different existing techniques such as TCP, TCP-BBR, UDP, KCP, etc. In the future, we will also compare with PEP, QUIC, and other protocols.
- TCP: the Linux implementation, kernel version 6.5.0-14-generic.
- KCP: a fast and reliable ARQ protocol. We use the KCP instance provided by kcptun.
- TCP-BBR: a recent congestion control algorithm for TCP.
In order to be compatible with the existing network infrastructure, the BATS protocol is usually deployed as an overlay network protocol; it runs on top of the existing TCP/IP protocol stack. The following diagram shows how the BATS protocol works in the network.
All the nodes in the network that run the BATS protocol are called BATS nodes. All BATS nodes form a BATS network; the following diagram shows the structure of a classic BATS network.
To see design considerations and the architecture of the BATS protocol, please refer to BATS Protocol.
BTP is a non-reliable transmission protocol based on BATS codes. It is designed to provide high-throughput and low-latency data transmission in lossy networks. BTP employs restricted feedback, using the link loss rate and coding statistics to adjust the coding redundancy.
This enables acceptable reliability without relying on exhaustive feedback mechanisms. BTP is suitable for scenarios where the feedback latency is high and reliability is not the primary concern.
Even though BTP is not a reliable protocol, it can still provide high reliability (99%) in most cases due to our advanced coding technology.
BRTP is a reliable transmission protocol based on BATS codes. As an enhancement of BTP, BRTP ensures 100% reliable data transmission by utilizing feedback to control the retransmission of unsolvable file chunks. Here retransmission is performed in a network coding way, which is more efficient than retransmitting the original data.
BRTP is suitable for scenarios where reliability is the primary concern and increased latency can be tolerated.
Compared to the traditional TCP protocol, BRTP can be more efficient in lossy networks in terms of throughput and average latency, especially at high loss rates.
In order to improve the performance of TCP over the BATS protocol, the BATS protocol provides a transparent proxy mode for TCP flows. It takes its idea from the on-board satellite "split TCP" proxy.
- Link bandwidth limit: we set the bandwidth limit of the link to a value small enough that the BATS protocol can easily saturate it, so that we can eliminate the uncertainty introduced by software performance.
- Source rate limit: we set the data source rate to be smaller than the bandwidth limit in each test scenario, so that there is no congestion and we can focus on the specific targets of each scenario. Then, for each scenario, we can gradually increase the source rate towards the bandwidth limit, and beyond it, to see how the communication behavior changes. In this case, we need to turn off the BATS protocol's congestion control mechanism.
- Testing Platform: Ubuntu 22.04 LTS, Linux kernel 6.5.0-14-generic.
- Fixed parameters in the test:
- Link bandwidth: 300Mbps
- Purpose: study the outer code performance without worrying about the inner code and congestion control. We focus on evaluating the end-to-end throughput, latency, and reliability.
- Scenario: a one-hop network with two nodes connected directly by a network link. Packets transmitted over the link suffer from both packet loss and delay.
- Purpose: study the inner code performance and the congestion control mechanism. We focus on evaluating the throughput, latency, and reliability of the different BATS protocols/modes in a multi-hop network.
- Scenario: a multi-hop chain network with multiple nodes connected by network links. Packets transmitted over the links suffer from packet loss, delay, and congestion.
- Accumulated loss rate: when the network path is long, the accumulated loss rate is high. The following diagram shows the accumulated loss rate of a 6-hop network with different link loss rates.
- Topology:
- Parameters:
- Hop number: 6
- Link latency: < 1ms
- Random Link loss rate: 0%, 5%, 10%, 15%, 20%
- Test Method:
- Run iperf3 on the source node and the destination node, and measure the end-to-end throughput of the TCP, BTP, and BRTP protocols, respectively. Six seconds after the start of the test, the system begins to simulate packet loss. The entire test lasts 60 seconds.
In the following diagram, the item `TCP over BATS proxy` is the test for BRTP.
Note: Reliability is calculated as Reliability = Average Receive Rate / Average Send Rate. The theoretical reliability of BATS codes is 100%. However, since BTP does not provide unrestricted feedback, and iperf/bperf transmission and reception are not entirely synchronized, the results show a measured reliability of less than 100%. Because TCP throughput is near zero in a multi-hop environment with packet loss, calculating the reliability of TCP is not meaningful.
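The accumulated loss rate discussed above follows directly from independent per-link losses: a packet survives the path only if it survives every hop. A minimal sketch for the 6-hop case with the link loss rates listed above:

```python
# End-to-end loss rate of an h-hop path with independent per-link loss p:
# a packet is delivered only if it survives every one of the h links.

def accumulated_loss(p: float, hops: int) -> float:
    return 1.0 - (1.0 - p) ** hops

for p in (0.0, 0.05, 0.10, 0.15, 0.20):
    print(f"link loss {p:.0%} -> end-to-end loss {accumulated_loss(p, 6):.1%}")
```

For example, a 5% per-link loss over 6 hops already accumulates to roughly 26% end-to-end loss, which is why plain TCP collapses in this scenario.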
- Topology:
H0 --> H1 --> H2 --> H3
- Parameters:
- Hop number: 3
- Link latency: 5ms
- Link latency jitter: 1ms
- Random Link loss rate: 0%, 2%
- Link bandwidth: 200Mbps
- Test Method:
- We use a PvP game endpoint to simulate sending messages over TCP at a fixed rate and with a fixed message size. The receiver echoes back the messages it receives.
- The source code of the PvP game endpoint is in the file `src/pvp_game_endpoint.cc`; for a HOWTO of the PvP game endpoint, please refer to pvp_game_endpoint.
- In the above topology, the PvP game endpoint runs on `H0` and `H3`: `H0` sends messages to `H3`, and `H3` echoes each message back to `H0`; then `H0` calculates the average `RTT` for each consecutive 10 packets.
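The per-10-packet RTT averaging performed by `H0` can be sketched as follows; this is a simplification for illustration, not the actual logic of `src/pvp_game_endpoint.cc`.

```python
# Average RTT over each consecutive group of 10 packets, as H0 does
# in the latency test. The sample values below are illustrative.

def average_rtt_per_group(rtts_ms, group_size=10):
    groups = [rtts_ms[i:i + group_size]
              for i in range(0, len(rtts_ms), group_size)]
    return [sum(g) / len(g) for g in groups]

samples = [12.0, 11.5, 13.0, 12.5, 11.0, 12.0, 13.5, 11.5, 12.0, 13.0,
           25.0, 12.0, 12.5, 11.5, 12.0, 13.0, 12.5, 11.0, 12.0, 26.5]
print(average_rtt_per_group(samples))
```

Averaging over groups of 10 smooths out individual outliers while still exposing sustained latency spikes, such as those caused by retransmissions in a lossy network.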
- Test Protocol:
- BATS: the BATS protocol running in TCP proxy mode with BRTP (BATS Reliable Transmission Protocol); all congestion/feedback mechanisms are enabled.
- TCP: TCP with default configuration and the default congestion control algorithm `cubic`.
- KCP: using the KCP instance from https://github.com/xtaci/kcptun; `kcptun` implements high-efficiency reliable transmission over UDP and includes Reed-Solomon codes for error correction. We used the following command to start the KCPTUN client:

```
kcptun/client_linux_amd64 -r "{dst_host.IP()}:4000" -l ":{forwarding_port}" \
  -mode fast3 -nocomp -autoexpire 900 -sockbuf 16777217 -dscp 46 --crypt=none
```
- Test Cases:
In order to simulate different scenarios, we tested the latency in three cases with the following changes:
- Case 1: no packet loss on each link; the PvP game endpoint sends messages at a rate of 100 packets/s, each message 1024 bytes;
- Case 2: 2% packet loss on each link; the PvP game endpoint sends messages at a rate of 100 packets/s, each message 1024 bytes;
- Case 3: 2% packet loss on each link; the PvP game endpoint sends messages at a rate of 100 packets/s, each message 128 bytes;
`Case 1` shows the performance of the protocols under perfect network conditions, while `Case 2` and `Case 3` show their performance in a lossy network. The difference between `Case 2` and `Case 3` is that `Case 2` simulates video streaming at a fixed rate, whereas `Case 3` simulates signaling messages in a real-time communication system.
- End-to-End Throughput Measurement:
Before the latency test, we measured the end-to-end throughput of each protocol from `H0` to `H3` with a link loss rate of 2%:

| Protocol | End-to-End throughput |
| -------- | --------------------- |
| BATS     | 42.4 Mb/s             |
| TCP      | 1.96 Mb/s             |
| KCP      | 16.7 Mb/s             |

- Latency test result:
The latency test results are shown as follows:
Fig 1.3 Latency test result of case 1
Fig 1.4 Latency test result of case 2
Fig 1.5 Latency test result of case 3
- Conclusion of latency evaluation:
- In the no-packet-loss scenario, the BATS protocol has slightly higher latency than TCP and KCP; this can be optimized in future versions of the BATS protocol.
- In the 2% packet loss scenario with a large message size, the BATS protocol has the lowest and most stable latency; it can greatly improve the quality of real-time video streaming.
- In the 2% packet loss scenario with a small message size, the BATS protocol still performs far better than TCP and is close to KCP; moreover, compared to KCP, the BATS protocol has a more stable latency (smaller latency jitter). The BATS protocol can still bring benefits to real-time signaling communication systems.