On Using Raft Over Networks: Improving Leader Election


Abstract:

Raft is a state-of-the-art consensus algorithm for state replication over a distributed system of nodes. In Raft, all state updates occurring anywhere in the system are forwarded to the leader, which is elected from among the system nodes to collect these updates and replicate them to all other nodes. Thus, the time required for state replication, referred to as the system response time, depends on the delays between the leader and all other nodes. After multiple node failures and leadership transitions, each node becomes leader with some probability, and these probabilities affect the expected response time. The leadership probabilities, in turn, are affected by the random intervals that nodes wait after detecting a leader failure and before competing for the next leadership. The Raft designers suggest that the ranges of these intervals be equal for all nodes. However, this may result in an increased expected response time. In this paper, mathematical models are presented for estimating the ranges that result in the desired leadership probabilities. The theoretical results are also confirmed by testbed experimentation with a widely used open-source Raft implementation.
Published in: IEEE Transactions on Network and Service Management (Volume: 19, Issue: 2, June 2022)
Page(s): 1129 - 1141
Date of Publication: 31 January 2022
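
The abstract's point that the ranges of the random waiting intervals shape each node's leadership probability can be illustrated with a small Monte Carlo sketch. This is not the paper's mathematical model: it assumes, as a deliberate simplification, that each node draws its interval uniformly from its own range and that the node whose interval expires first always wins the election (network delays, split votes and log checks are ignored). The function name `estimate_leadership_probabilities` and the millisecond ranges below are hypothetical values chosen only for illustration.

```python
import random


def estimate_leadership_probabilities(timeout_ranges, trials=100_000, seed=0):
    """Estimate how often each node's random waiting interval expires first.

    Simplifying assumption (not taken from the paper): the node with the
    shortest drawn interval becomes leader; message delays, split votes
    and log up-to-dateness checks are ignored.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    wins = [0] * len(timeout_ranges)
    for _ in range(trials):
        # Each node draws its waiting interval uniformly from its own range.
        draws = [rng.uniform(low, high) for low, high in timeout_ranges]
        wins[draws.index(min(draws))] += 1
    return [count / trials for count in wins]


# Hypothetical ranges in milliseconds, for illustration only.
equal_ranges = [(150, 300), (150, 300), (150, 300)]
skewed_ranges = [(150, 200), (150, 300), (150, 300)]

if __name__ == "__main__":
    print("equal ranges: ", estimate_leadership_probabilities(equal_ranges))
    print("skewed ranges:", estimate_leadership_probabilities(skewed_ranges))
```

Under these assumptions, equal ranges give every node roughly the same chance of winning, while narrowing one node's range toward the lower bound makes that node the likely leader. In this simplified setting, per-node ranges act as a knob for steering leadership toward nodes with lower delays to the rest of the system.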

I. Introduction

Distributed systems receive extensive attention now that networking technologies are flourishing and time-critical system functions are spread over multiple interconnected nodes. The emerging Software Defined Networking (SDN) paradigm both assists distributed systems and is assisted by them, since distributed SDN controller clusters are more efficient than single-instance controllers [2]. The most popular open-source SDN controllers, such as ODL [3] and ONOS [4], are fundamentally designed to support clustering. Similarly, Necklace [5] provides an architecture for distributed Service Function Chaining that performs surprisingly well. Finally, Kubernetes [6], OpenStack [7] and Hyperledger-Fabric [8] are a few examples of widely used systems that owe their increased scalability and efficiency to their distributed operation, which is supported by the etcd [9] distributed key-value store. However, these systems require a protocol for reaching consensus between their nodes.
