Journals & Magazines >IEEE Access >Volume: 11

A Scalable Byzantine Fault Tolerance Algorithm Based on a Tree Topology Network

According to the tree topology model, firstly, the algorithm consensus process is described, which graphically demonstrates that this article can significantly reduce com...

Abstract:

The consortium chain is the main form of application of blockchain technology in the actual industry, and its consensus mechanism mostly adopts the practical Byzantine fa...Show More

Metadata

Abstract:

The consortium chain is the main form of application of blockchain technology in the actual industry, and its consensus mechanism mostly adopts the practical Byzantine fault tolerance (PBFT) algorithm. The traditional PBFT algorithm is only suitable for small-scale local area networks, but in large-scale wide area network environments, its scalability bottleneck has a serious impact on the performance of the system. Therefore, in this paper, a scalable Byzantine fault tolerance algorithm based on a tree topology network is proposed (STBFT), which can take different steps to reach consensus according to the abnormal situation of the system. First, the STBFT algorithm divides the consensus nodes into different layers and groups based on the tree topology network structure, which transforms from global consensus to local consensus and drastically reduces communication consumption. Then, the division method of the group is based on a verifiable random function (VRF), with the purpose of preventing targeted attacks and colluding Byzantine nodes from affecting the normal consensus of the system. Finally, a feedback mechanism is proposed for the first time to reduce the influence of Byzantine failure on hierarchical network systems. The simulation results show that the proposed algorithm reduces the communication complexity and improves the fault tolerance of the system, and the scalability of the tree topology network structure can be better applied in large-scale scenarios such as IoT and health care.

According to the tree topology model, firstly, the algorithm consensus process is described, which graphically demonstrates that this article can significantly reduce com...

Published in: IEEE Access ( Volume: 11)

Page(s): 33509 - 33519

Date of Publication: 03 April 2023

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2023.3264011

Funding Agency:

Contents

SECTION I.

Introduction

Since “an electronic cash currency completely realized through peer-to-peer technology” (i.e., Bitcoin) [1] was proposed in 2008, blockchain technology has received increasing attention. The essence of blockchain is a decentralized distributed ledger database [2], which integrates peer-to-peer (P2P) networks, consensus algorithms, cryptology principles [3], smart contracts and other technologies. Due to its security, decentralization, tamper-proof, and traceability, blockchain technology is widely used in digital currency, finance, health care, energy, IoT, and other fields [4].

The consensus algorithm, as the core technology of blockchain, especially permissioned blockchain, enables distrustful participants to verify the validity of data in the blockchain system in a decentralized manner, ensuring that each node of the blockchain can record and store data according to the same uniform standard of behavior, thus reaching a consensus of the whole network. According to the degree of decentralization and application scenarios, blockchain can be divided into three main categories: private chain, public chain, and consortium chain [5]. Private chains are blockchain systems that are open to individuals or organizations. The most representative consensus algorithms of private chains are Paxos [6], Raft [7], etc. These consensus algorithms generally only consider failures due to node and network problems (such as node downtime, network failure, etc.) and do not consider the malicious nodes in the cluster. Therefore, the Byzantine fault tolerance problem cannot be solved. Compared to other chains, public chains are the most centralized chains, such as Bitcoin [1] and Ethereum [8]. The most famous consensus algorithm of the public chain is proof-of-work (PoW), which is completely decentralized and open to all nodes that can access the internet. Although PoW can effectively resist Sybil attacks and provide security for the system when the computing power of malicious nodes does not exceed 50%, its mechanism of relying on the computing power of computers to complete consensus brings the problem of wasted resources, its transaction confirmation latency is high, and throughput is low. Compared with the PoW algorithm, the proof-of-stake (PoS) [9] algorithm introduces coin days to dynamically adjust the mining difficulty to reduce computing power consumption. As a result, the PoS algorithm mines faster and consumes fewer resources. Delegated proof of stake (DPoS) [10] uses a scheme based on node weights to elect proxy nodes to cancel mining. As the number of nodes participating in the consensus is greatly reduced, the time to generate blocks is shortened, and consensus efficiency is improved. However, it may cause some nodes with more resources to control the election process, which can easily destroy the network.

To better protect user privacy and supervise data, consortium chains are proposed. In the application scenario of the consortium blockchain, the Byzantine fault tolerance (BFT) algorithm is the most commonly used consensus algorithm, which was proposed by Lamport [11] in 1982, creating a precedent for the Byzantine Fault Tolerance algorithm. Due to the high complexity of the BFT algorithm, it has yet to be practically applied. Castro et al. [12] proposed a practical Byzantine fault tolerance (PBFT) algorithm in 1999, which improved the BFT algorithm’s inefficiency and achieved fault tolerance for a certain number of Byzantine nodes to provide security and activity in asynchronously distributed systems. Additionally, the algorithm complexity is reduced from an exponential level to a polynomial level. Compared with other consensus algorithms, the PBFT algorithm does not need to consume many computing resources and has the characteristics of high throughput and low latency, so it is widely used in permissioned chain systems.

At present, the evaluation indicators of consensus algorithms are different, and different scenarios have different focuses, including fairness and decentralization degree from the perspective of sociology, consensus energy consumption and economic cost from the perspective of economics, and scalability, fault tolerance, and security from the perspective of computer science [13]. How to adaptively design a consensus mechanism for a specific performance evaluation target in combination with specific requirements and application scenarios to optimizethe algorithm [14], [15] is an urgent problem to be solved. With the gradual expansion of the scale of the application scenario of the permission chain, the scalability problem of the PBFT algorithm has become increasingly prominent. Therefore, to reduce the communication complexity of the PBFT algorithm and improve the applicability of the permission chain in large-scale application scenarios, this paper proposes a scalable Byzantine fault-tolerant algorithm based on tree topology networks (STBFT). Specifically, the main contributions of this paper are as follows:

First, we change the mesh network topology of the PBFT algorithm to a tree network topology so that the STBFT algorithm is suitable for hierarchical network systems. The STBFT algorithm divides the network nodes into several subnets, and each subnet has a primary node to send the consensus result to the client.

Second, for the hierarchical network system, the higher the Byzantine node level is, the greater the threat to the consensus network. To make the Byzantine node unable to predict the grouping information, a random grouping strategy based on a verifiable random function (VRF) is proposed to prevent Byzantine node targeted attacks and collusion from affecting the normal consensus of the system, thus ensuring the security of the system.

Third, scalability and fault tolerance cannot be well balanced in hierarchical network systems. Therefore, the feedback mechanism is proposed to monitor and relay the nodes’ behavior, reduce Byzantine failure’s influence on hierarchical network systems, improve the fault tolerance performance of algorithms, and enhance the security and activity of the system.

SECTION II.

Related Work

The most significant difference between the blockchain and the traditional distributed consensus system is its complex operating environment. Even in the strict consortium chain, Byzantine nodes will be introduced. Therefore, since the birth of blockchain technology, Byzantine fault-tolerant consensus algorithms have gradually increased. Given the deficiencies of the PBFT algorithm, in recent years, researchers have mainly researched the process optimization of the PBFT algorithm and the consensus system architecture, resulting in the birth of many excellent Byzantine fault tolerance consensus algorithms.

For example, to improve the performance of the consortium chain, the tPBFT algorithm [16] introduces a trust equity scoring mechanism. Consensus nodes can be dynamically adjusted and simplify the process of consensus, and the efficiency of consensus and node scalability outperforms the PBFT algorithm. Yin et al. [17] proposed HotStuff, which changed the network topology of PBFT to star network topology, adopted threshold signature technology, collected signatures by the primary node, instead of using the many-to-many broadcast communication mode, and used parallel pipelining to process transactions. The HotStuff algorithm significantly improves the efficiency of consensus, but the primary node bears a greater communication burden, which will quickly lead to network congestion. The DPNPBFT algorithm [18] selects dual primary nodes to avoid excessive centralization of a single primary node. The two primary nodes check balance and supervise each other. Therefore, the DPNPBFT algorithm has good system security. Wan et al. [19] proposed the AnonymousFox consensus algorithm, designed an anonymous leader node sorting algorithm to hide the leader node identity, and reduced node communication overhead through one-to-many message communication. The AnonymousFox algorithm improves the algorithm’s throughput and guarantees the system’s security. In addition, the number of nodes participating in the consensus directly impacts the communication complexity of the PBFT consensus algorithm, so researchers try to reduce the number of nodes participating in the consensus. A series of algorithms based on the “committee + PBFT” model have been proposed [20], [21], [22], [23], [24], [25], which selects a subset of the nodes as the committee from all nodes in the system based on their credit, tokens, etc. The committee node participates in the consensus protocol on behalf of all nodes, and other nodes conduct verification and synchronization of the entire network, realizing the rapid consensus of the network. To prevent the selected nodes from being too concentrated to threaten the system’s security, the Algorand algorithm [26] uses VRF [27] to randomly select a subset of network nodes as a committee to ensure the unpredictability of committee members. In the consensus algorithm of the “committee + PBFT” model, the performance and scalability of the algorithm are improved due to the reduction of the consensus network scale. However, the functions of the committee node are higher than those of the general node, and the power is too concentrated in the committee node, which violates the decentralized nature of the blockchain. The above algorithms significantly improve the consensus efficiency, but malicious nodes remain in the consensus system, and system security will not be guaranteed. Therefore, Hao et al. [28] and Jiang and Lian [29] verified the identity information of nodes joining the system by introducing centralized authentication nodes and designing related protocols for node joining, exiting, and other operations. These algorithms not only inherit the advantages of the PBFT algorithm but also have better dynamic properties and higher robustness. Kim et al. [30] considered that the “committee + PBFT” model has a limit that a malicious cartel with more than 1/3 of the total stack can launch persistent center attacks, which disrupt the consensus process. Therefore, this problem is solved by introducing the concept of main-validators and sub-validator and proposing a behavior-based credit-scoring function.

The biggest problem with the PBFT algorithm is its high communication complexity and poor scalability. However, optimizing the PBFT algorithm process alone is not enough to solve application problems in large-scale scenarios. A series of new consensus frameworks have been proposed, whether horizontal sharding or vertical hierarchical expansion is adopted, which essentially divides nodes into independent spaces. Elastico [31] is the first Byzantine fault-tolerant algorithm based on sharding technology. It selects committee nodes in the network through PoW algorithm competition and assigns the elected committee nodes to different groups. Each group executes the PBFT algorithm to process different transaction sets in parallel, and the sharding technology improves system performance through parallel processing. The acceptable number of malicious users using the BFT algorithm is limited, if users collude with malicious users and exceed the limit, consensus cannot be reached. Jeon et al. [32] proposed a randomized mesh blockchain using the diversity of opinion Byzantine fault tolerance to solve this problem. Some efficient network partitioning methods have been proposed to address the specific requirements of consensus algorithms in certain blockchain application scenarios and achieve better results in practical applications. Lin et al. [33] proposed a grouping strategy based on link identification with a supervision mechanism that can better adapt to the consortium chain supervision system. Yoo et al. [34] proposed location-based sharding, which divides nodes into many groups based on location. Consensus nodes are clustered by an improved K-medoids clustering algorithm [35], and the nodes with higher trust are selected as clustering centers. However, only from the perspective of plane architecture can the status quo no longer be met, and an increasing number of algorithms have been researched on hierarchical structures. Feng et al. [36] proposed a scalable dynamic multi-agent hierarchical SDMA-PBFA algorithm, which divides nodes into several groups. Each group elects an agent and executes the PBFT algorithm in each group. Finally, all agents send the consensus result to the client. Li et al. [37] proposed SHBFT, which optimized many aspects based on PBFT, designed a hierarchical structure to speed up the consensus process and achieved global consensus through local consensus. Li et al. [38] proposed a scalable multi-layer consensus mechanism based on the PBFT algorithm, which significantly reduces communication complexity and provides guidance for the design and performance analysis of hierarchical systems.

It can be seen from the above research that these algorithms optimize the PBFT algorithm from different angles. Although many problems in the PBFT algorithm can be solved, in the case of many nodes, only optimizing the PBFT algorithm process is far from sufficient. Therefore, a series of consensus algorithms start from the system architecture and divide the nodes into local consensus through a group-based or hierarchical structure, which significantly reduces the communication complexity and improves the system scalability, fundamentally solving the problem that the PBFT algorithm cannot be applied to a large-scale network environment. However, the consensus algorithm of a group-based or hierarchical structure has the following shortcomings. First, the existing research method of network node division is determined by the system model and has not been authenticated by the nodes of the whole network. Second, the current node division rarely considers the collusion between malicious Byzantine nodes, so a considerable threat to the system’s security exists. Third, scalability and fault tolerance are relatively large contradictions in the hierarchical system. Although the hierarchical system can solve the scalability problem of the PBFT algorithm, the system also has great instability and is easily affected by Byzantine nodes.

To this end, we conduct in-depth research from the system model of the PBFT algorithm. To solve the application problem of the PBFT algorithm in specific scenarios in the consortium chain, we propose the STBFT algorithm, which is also a hierarchical structure based on tree topology network. However, our algorithm is different from other algorithms in several ways.

First, network nodes are randomly and secretly assigned to different layers and groups, and the grouping information of each node is verified and confirmed by other nodes.

Second, since the tree topology is a unique network structure, we design the system model in a targeted manner. The nodes of each layer will collect corresponding evidence from supervising and providing feedback to the nodes of the upper and lower layers.

Finally, we have solved how to activate the system to ensure the consensus consistency of the algorithm when the nodes are Byzantine nodes.

Therefore, our algorithm can not only improve the system’s scalability but also deal with scenarios that require high fault tolerance, making the blockchain consensus algorithm more applicable.

SECTION III.

Styling STBFT Algorithm Design

In this section, we elaborate on the design of the STBFT algorithm in detail. We first describe the system model of the STBFT algorithm. Then, we overview the process of the consensus algorithm. Finally, we present random grouping strategy and feedback mechanism to minimize the impact of malicious nodes on the system.

A. System Model

The nodes in the tree topology network are arranged in a tree shape, which resembles an inverted tree, hence the name. The tree topology is easy to expand, so new branches or nodes can be added to the network. It is easy to isolate the fault site from the whole system; for example, if one branch node fails, it mainly affects the local area. The tree topology network structure is a hierarchical structure that is suitable for a hierarchical network system. Therefore, we propose a multi-centralized, “group-based and hierarchical” STBFT algorithm based on a tree topology network. We divide the nodes in the network into several layers. Each layer is composed of different areas, and the whole network consensus is divided into several subnetwork local consensuses. Each parent node (nonleaf node) forms a subnetwork with its child nodes, which we call the consensus domain. The network composed of child nodes is called the group domain (group). Figure 1 briefly shows the system model of the STBFT algorithm.

FIGURE 1.

STBFT system model.

Show All

B. STBFT Algorithm Consensus Process

According to the hierarchical structure of the tree topology network, the STBFT algorithm improves the consensus protocol of the PBFT algorithm. First, the primary node in the consensus domain no longer participates in the consensus of its child nodes (slave nodes), and the primary node of each layer is only responsible for sending the request message to the child nodes and collecting the voting results of its child nodes. Second, only the node that has generated the pre-prepare message of certificate $_{< commit>}$ in each layer can be accepted by its child nodes. The certificate is to prove that the group domain of the node has reached consensus.

Figure 2 shows the consensus process of the two-layer STBFT algorithm. We assume that the number of consensus nodes in each group domain is $k$ , and the maximum number of Byzantine nodes that can be tolerated in each group domain is $f_{g}$ , where $k\ge 3f_{g} +1$ and $L_{i}G_{j}N_{o}$ represents the node number. $L_{i}G_{j}$ represents the group number of each layer, where the superscript $L$ represents the layer number, the subscript $i$ represents the index of each layer, the superscript $G$ represents the group number, and the subscript $j$ represents the index of each group, the superscript $N$ represents the node, and the subscript $o$ represents the index of each node. The specific consensus process is as follows:

FIGURE 2.

Consensus process of the STBFT algorithm.

Show All

The root node of the system forms a consensus domain with the nodes in the group domain $L_{1}G_{1}$ . In the request stage, the root node (primary node) collects the transaction requests from the client and sorts and packages the transaction requests. In the pre-prepare phase, the root node multicasts pre-prepare messages to nodes in the group domain $L_{1}G_{1}$ . After receiving the valid pre-prepare message, the node enters the prepare phase by multicasting prepare messages to the other nodes in the group domain. If a node receives more than $2f_{g}$ matching prepare messages from other group domain $L_{1}G_{1}$ nodes, the prepared nodes enter the commit phase by multicasting commit messages to other nodes in the group domain.

When a node receives more than $2f_{g}$ matching commit messages from other group domain $L_{1}G_{1}$ nodes, that operation is committed on this node, and this node forms a certificate. A committed node will perform the following two steps at the same time: first, a committed node enters the reply stage by sending reply message to the root node. second, a committed node is considered as the primary node in its consensus domain and starts consensus reaching by multicasting new pre-prepare messages with certificate $_{< commit>}$ to its child nodes. On the one hand, when the root node receives more than $f_{g} +$ 1 matching reply messages from group domain $L_{1}G_{1}$ nodes, it indicates that this group domain have reached consensus. The root node sends an agree message to the client. On the other hand, a committed node in group domain $L_{1}G_{1}$ forms a consensus domain with its child node. The four stages of pre-prepare ₁, prepare ₁, commit ₁, and reply ₁(the subscript 1 indicates that the message comes from the node of layer 1) mentioned above are carried out in the consensus domain. Finally, the primary nodes in each consensus domain send an agree ₁ message to the client. The client only accepts results replied from primary nodes in all consensus domains when at least half of them are consistent.

C. Random Grouping Strategy Based on VRF

For hierarchical systems, the higher the level of nodes, the greater the impact on the system. If Byzantine nodes collude maliciously and are at a high level of the system, the system’s security will be threatened. Therefore, to make each node randomly distributed in the system, the grouping information of nodes can be verified and confirmed by other nodes. In the grouping stage, the nodes can be divided into provers and verifiers, and the grouping information of each prover needs to be confirmed by the verifier. Therefore, the random grouping process is divided into two parts: the group formation stage and the group verification stage.

1) GST CENTER

The group status table (GST) is used to store the grouping status and information of each node, and some key information of GST is shown in Table 1. At the same time, the GST Center needs to be introduced to maintain the GST. The GST Center is a security service organization that all alliances and institutions recognize and rarely exhibit error behavior. The nodes fully trust all the information recorded by the GST Center. However, introducing the central node in the distributed network environment is not a good solution. In this paper, the GST Center is only used for information storage and does not participate in any grouping strategy. The introduction of the GST Center provides the following functions:

Provide a random seed to the prover and determine the grouping of nodes according to the random number $v$ generated by the prover. Store the time each group formed.
Store the prover’s proof, $v$ , group information, and timestamp $t$ for each group formed, and record basic information such as the ip address and pk.
When the identities of most nodes are exposed, a few nodes can be attacked. Therefore, we first do not publish the grouping information of each node, but rather the grouping information of the nodes is saved in the GST Center.

TABLE 1 Group State Table

2) The Group Formation Stage

Assuming that there are $n$ nodes in the system, the $n$ nodes are divided into $w$ group domains by a random grouping strategy based on VRF, and each group domain should carry $n/w$ nodes. Given a random seed, VRF outputs a pseudorandom hash value $v$ that is basically uniformly distributed between 0 and $2^{256}-1$ . We can obtain $[v/(2^{256}-1)]\in (0,1)$ , so the interval for dividing each group domain is $1/w$ .

Provide a random seed to the prover and determine the grouping of nodes according to the random number $v$ generated by the prover. Store the time each group formed. All consensus nodes in the blockchain network are numbered, and {1, 2, 3... $p$ } is used to represent the node’s ID. The prover participating in the lottery is randomly selected from all nodes. The prover $p$ obtains the random number $v$ and proof according to the VRF, and the generation process is as follows:
$\begin{equation*} {VRF}[({seed},{num}),{sk}]\to {v}+{proof} \tag{1}\end{equation*}$ View Source where seed is provided by the GST Center, num is the counter, the initial value is 0, and sk is the node’s private key.
The prover $p$ sends information such as $v$ and proof to the GST Center. After verification, the GST Center will assign the prover to the corresponding group according to the random number value $v$ . When a group member reaches the threshold, the timestamp $t$ of the threshold is recorded. The position of each group in the STBFT algorithm is determined according to the timestamp. The purpose is to allow each group to be randomly assigned to the hierarchical system according to the timestamp.
When a group member has reached the threshold, if there is still a prover $p_{1}$ assigned to this group, the prover $p_{1}$ will not be added. The GST Center will provide a random seed for prover $p_{1}$ again, and the value of num will be increased by one. Continue to calculate random number $v$ by seed and num until randomly assigned to the appropriate group.

3) Group Verification Stage

When all nodes are randomly assigned to different groups at different levels, the grouping information of each node will be confirmed by all verifiers. The group verification stage is shown in Figure 3.

Step 1:
GST Center multicasts < broadcast, $m$ (gst), $d>_{\& g}$ message to all verifiers including the prover, where $m$ (gst) contains random number $v$ , proof, timestamp $t$ , and group information, $d$ is the digest of $m$ (gst), and &_gthe subscript is the signature of the GST Center.
Step 2:
After receiving the broadcast message from the GST Center, the prover verifies whether it is valid and confirms its grouping information. At the same time, when all verifiers receive the broadcast message from the GST Center, they first verify the information of the verifier according to seed, v, proof, and pk in $m$ (gst) and then confirm the grouping information of the verifier according to the group information, timestamp $t$ , and other information provided by the GST Center. If the broadcast message passes the verification, they will sign with pk and send the < response, $d>_{\&i}$ message to the prover.
Step 3:
If the prover receives more than $2f$ matching response messages, it multicasts the < commit, $d>_{\& s-list}$ message to all verifiers. $f$ represents the maximum number of Byzantine nodes that can be tolerated in the total network, and $n\ge 3f+{ }1$ , & $_{s-list}$ represents the set of signatures of all verifiers.
Step 4:
After receiving the commit message, all verifiers confirm whether the verifier has received more than $2f$ valid signatures. If the verification passes, it means that the prover has completed the grouping, all verifiers will send < reply $>_{\& i}$ messages to the GST Center and prover. The GST Center and the prover receiving more than $2f$ matching reply messages indicates that the whole system has agreed to the prover’s grouping information.

FIGURE 3.

The group verification stage.

Show All

D. Feedback Mechanism

The basic idea of the STBFT algorithm is to reduce the communication complexity through the internal communication of each group and effectively improve the consensus efficiency. In Section B of III, we know that in the normal consensus process, if a node has reached consensus in the group domain, it will generate corresponding certificates and reply messages, send reply messages to high-level nodes, and send certificates to low-level nodes, these are important support for reaching consensus. However, the hierarchical structure based on the tree topology network will also face some problems. In the case where the high-level nodes are Byzantine nodes,

the lower-level nodes of the entire system cannot receive any correct messages, and frequent view change is needed; however, the cost of view change is expensive. Therefore, for the STBFT algorithm proposed in this paper, a feedback mechanism is designed to supervise and relay the behavior of nodes so that the system can minimize the influence of malicious nodes on the system when the nodes have Byzantine behavior, thereby improving the fault tolerance performance.

First, we define the feedback domain, which is a special autonomous region. As shown in Figure 4, node $N$ is the root node of the feedback domain, the root node forms a feedback domain with the two-layer nodes of its branches, and the feedback domain is also composed of many group domains or consensus domains. In the normal consensus process, there is no concept of a feedback domain, such as the consensus in the tree topology, consensus is reached in the high-level consensus domain and consensus results are transmitted to the lower level in turn. When a node in the feedback domain fails, it temporarily forms a feedback domain. The upper and lower nodes in the feedback domain monitor each other, feedback the consensus results, and trigger the feedback mechanism to address the impact of malicious nodes on the consensus. Figure 4 is one of the feedback domains of Figure 1. Therefore, the network structure of the STBFT algorithm consists of many feedback domains. Second, feedback mechanism only be implemented in each feedback domain; for convenience, we will describe the feedback mechanism according to Figure 4. Finally, we show how the STBFT algorithm can reach consensus based on a feedback mechanism in different situations.

FIGURE 4.

Feedback domain.

Show All

Case $A:$ Assume that the root node $N$ in the feedback domain is an honest node, and the number of Byzantine nodes in the group domain $L_{i}G_{j}$ exceeds $f_{g}$ .

Under normal circumstances, each node in the group domain $L_{i}G_{j}$ receives the pre-prepare $_{i-1}$ message (the subscript $i$ – 1 indicates that the message comes from the node of layer $i$ – 1) from node $N$ . However, when Byzantine nodes in the group domain $L_{i}G_{j}$ generate malicious behavior

(such as sending inconsistent messages or not sending messages within a specified time), at the consensus stage, $certificate< commit_{i-1} >$ cannot be formed because the nodes cannot receive more than $2f_{g}$ matching commit $_{i-1}$ messages in the group domain $L_{i}G_{j}$ . Therefore, during the reply $_{i-1}$ phase, node $N$ does not receive $f_{g} +$ 1 matching reply $_{i-1}$ messages from group domain $L_{i}G_{j}$ . At the same time, during the pre-prepare _i stage, all nodes at layer $i +$ 1 cannot receive the pre-prepare _i message with $certificate< commit_{i} >$ from the group domain $L_{i}G_{j}$ . In this case, once at the end of the reply $_{i-1}$ phase, node $N$ will send new $pre-prepare_{i-1}^{N}$ messages with $certificate< fail-reply_{i-1} >$ to all nodes in layer $i +1$ (once the root node generates $certificate< fail-reply_{i-1} >$ , it means that in the reply $_{i-1}$ phase, node $N$ has not received $f_{g} +$ 1 matching reply $_{i-1}$ messages from the group domain $L_{i}G_{j}$ ).

In addition, in the group domain $L_{i}G_{j}$ , even if consensus cannot be reached and a certificate cannot be generated, the honest node still sends a pre-prepare _i message to its child nodes, but the Byzantine node will generate Byzantine behavior. The pre-prepare _i message from each node in the group domain $L_{i}G_{j}$ will be verified by its child nodes. An honest node will be authenticated by its child nodes, but its child nodes will not accept the pre-prepare _i message (because child nodes cannot receive the $certificate< commit_{i} >$ from the node in group domain $L_{i}G_{j}$ ). If it is a Byzantine node, the system will start view change to replace the primary node in the consensus domain, and all nodes in layer $i +$ 1 cannot receive the $certificate< commit_{i} >$ from the node in group domain $L_{i}G_{j}$ and will wait for a new $pre-prepare_{i-1}^{N}$ message from node $N$ . If all nodes in layer $i +$ 1 receive a valid new $pre-prepare_{i-1}^{N}$ message from node $N$ before the end of the prepare _i phase (normal situation), it accepts the $pre-prepare_{i-1}^{N}$ message again and then executes the $prepare_{i-1}^{N}$ phase and the $commit_{i-1}^{N}$ phase in each group domain in layer $i +1$ . If the primary node corresponding to the group domain in layer $i +$ 1 is verified as an honest node, in the $reply_{i-1}^{N}$ stage, the committed nodes in the group domain not only send the $reply_{i-1}^{N}$ message to node $N$ but also need to send the $reply_{i-1}^{N}$ message to the corresponding primary node of the group domain; otherwise, they just send the $reply_{i-1}^{N}$ message to node $N$ . Finally, node $N$ sends the consensus result to the client.

Case $B:$ Assume that the root node $N$ in the feedback domain is an honest node, and the number of Byzantine nodes in the group domain $L_{i}G_{j}$ is less than $f_{g}$ .

When each node in group domain $L_{i}G_{j}$ receives the valid pre-prepare $_{i-1}$ message from node $N$ , after the prepare $_{i-1}$ phase and the commit $_{i-1}$ phase, the group domain $L_{i}G_{j}$ has reached consensus if node $N$ receives $f_{g} +$ 1 matching reply $_{i-1}$ messages, and the committed and honest nodes in group domain $L_{i}G_{j}$ will send pre-prepare _i messages with $certificate< commit_{i} >$ to their child nodes, while Byzantine nodes in group domain $L_{i}G_{j}$ will produce malicious behavior.

Assume that node $L_{i}G_{j}N_{o}$ is a Byzantine node, When the Byzantine node $L_{i}G_{j}N_{o}$ in the group domain $L_{i}G_{j}$ produces malicious behavior, in the pre-prepare _i phase, the child nodes of node $L_{i}G_{j}N_{o}$ cannot receive messages within the specified time or receive invalid messages. Therefore, if the child nodes of node $L_{i}G_{j}N_{o}$ cannot receive a valid $pre-prepare_{i-1}^{N}$ message from node $N$ before the end of the prepare _i phase, when the prepare _i phase ends, it sends the feedback message with $certificate< L_{i} G_{j} N_{o} -fault>$ to node $N$ (distinguish case $A$ ). When node $N$ receives more than $2f_{g}$ matching feedback messages, it sends $pre-prepare_{i-1}^{N}$ messages to the child nodes of node $L_{i}G_{j}N_{o}$ . After the $prepare_{i-1}^{N}$ stage and the $commit_{i-1}^{N}$ stage in the group domain, the committed nodes send the $reply_{i-1}^{N}$ message to node $N$ to enter the $reply_{i-1}^{N}$ stage. Similarly, node $N$ sends the consensus result to the client.

Case $C:$ Assume that the root node $N$ in the feedback domain is a Byzantine node, the number of Byzantine nodes in the group domain $L_{i}G_{j}$ is less than $f_{g}$ and the node $L_{i}G_{j}N_{o}$ in the group domain $L_{i}G_{j}$ is a Byzantine node.

Since two consecutive nodes in the same branch are Byzantine nodes, in this case, the child nodes of node $L_{i}G_{j}N_{o}$ cannot receive valid messages even if it sends a feedback message to node $N$ . Therefore, all branch nodes with $L_{i}G_{j}N_{o}$ as the root node cannot reach consensus. However, it only affects the local consensus in the feedback domain.

Case $D:$ Assume that the root node $N$ in the feedback domain is a Byzantine node, and the number of Byzantine nodes in the group domain $L_{i}G_{j}$ exceeds $f_{g}$ .

In this case, the feedback mechanism is ineffective in the feedback domain, so all branch nodes with node $N$ as the root cannot reach consensus.

SECTION IV.

Analysis and Simulation Experiments

A. Analysis of Failure Probability P of the Feedback Mechanism

In Case $D$ of Part D of Section III, we know that the feedback mechanism is completely invalid in the feedback domain, so we analyze the failure probability of the feedback mechanism. Assuming that the nodes are independent of each other, we have $P = P(A) \times P(B/A)$ , where $P(A)$ is the probability that the root node in the feedback domain is a Byzantine node, which is obtained from Formula 2 ( $f_{1}$ represents the number of Byzantine nodes in the feedback domain, and $m$ represents the total number of nodes in the feedback domain). $C_{f_{1}}^{1}$ indicates that a node is randomly selected from $f_{1}$ Byzantine nodes as the root node of the feedback domain, and $C_{m}^{1}$ indicates that a node is randomly selected from m nodes.

$\begin{equation*} P(A)=C_{f_{1}}^{1} / C_{m}^{1} \tag{2}\end{equation*}$ View Source

The value of

$P(B/A)$

depends on the value of

$P(A)$

$P(B/A)$

represents the probability that when the root node is a Byzantine node in the feedback domain, the group domain composed of its child nodes is a Byzantine group (Byzantine group represents that the number of Byzantine nodes exceeds

$f_{g}$

in the group domain), which is obtained from Formula 3 (where

$k$

represents the number of nodes in the group domain, and

$c$

represents the number of Byzantine nodes in the group domain).

$C_{f_{1} -1}^{c}$

indicates that

$c$

nodes are randomly selected from

$f_{i-1}$

Byzantine nodes, and

${C}_{m-1}^{k}$

indicates that

$k$

nodes are randomly selected from

$m$

-1 consensus nodes.

$\begin{equation*} {P}({B / A})=\sum \limits _{c=f_{g} +1}^{k} {C_{f_{1} -1}^{c}} {/ C}_{m-1}^{k} \tag{3}\end{equation*}$

View Source

Therefore, the

$P$

value is obtained from Formula 4, and in the case of different

$k$

and

$m$

, we obtain the relationship between the ratio

$R$

and the

$P$

value (

$R$

represents the ratio of Byzantine node

$f_{1}$

to total nodes

$m$

), as shown in Figure 5. Experiments show that when the ratio of Byzantine nodes to total nodes is small, the

$P$

value is extremely small. For example, when

$k=4$

$m=21$

, and the ratio

$R$

is one-third in the feedback domain, the value of

$P$

is 0.0034. This shows that when the number of Byzantine nodes is within the fault-tolerant range of the feedback domain, the probability that the feedback mechanism cannot be triggered is almost 0. Even if the ratio

$R$

is greater than one-third, it does not directly cause the feedback mechanism to fail, and the

$P$

value increases gradually with increasing

$R$

value. Moreover, when the total number of nodes m in the network is larger, under the same

$R$

, the probability of failure of the feedback mechanism decreases. Since the system consists of many feedback domains, the success rate of consensus reached by the system will be improved when the probability that the feedback mechanism cannot be triggered is very low.

$\begin{equation*} {P=P}({A}) \times {P}({B/A})=({C}_{f_{1}}^{1} {/ C}_{m}^{1}) \cdot \left({\sum \limits _{c=f_{g} +1}^{k} {C_{f_{1} -1}^{c}} {/ C}_{m-1}^{k} }\right) \tag{4}\end{equation*}$

View Source

FIGURE 5.

Probability of feedback mechanism failure.

Show All

B. Communication Complexity Analysis of the STBFT Algorithm

Assuming the number of nodes of the PBFT algorithm is $n$ , the consensus process of the PBFT algorithm can be divided into five phases: the request phase, pre-prepare phase, prepare phase, commit phase, and reply phase. The communication complexity can be calculated to be $1 +{ }(n-1) + (n-1)(n-1) +n(n-1) +n,$ which is simplified to $2n^{2}-n+ 1$ . Therefore, we can obtain the communication complexity of the PBFT algorithm as follows:

$\begin{equation*} {C}1=2{n}^{2}-{n +}1 \tag{5}\end{equation*}$ View Source

Assuming that the number of nodes in the STBFT algorithm is also

$n$

and the number of nodes in each group domain is

$k$

, it will be divided into

$n/k$

group domains. Our algorithm has five main stages in the consensus domain: the pre-prepare stage,

prepare stage, commit stage, reply stage, and final agree stage. The communication complexity of each consensus domain is $k+k(k-1) +k(k-1) +k+ 1$ , which is simplified to $2k^{2}+ 1$ . Therefore, we can obtain the communication complexity of the STBFT algorithm as follows:

$\begin{equation*} {C}2=[{n}(2{k}^{2}+1)] {/ k} \tag{6}\end{equation*}$ View Source

Therefore, the ratio of the communication complexity of the STBFT algorithm to that of the PBFT algorithm is

$R_{c} =C2/C1$

. From the expressions of

$C2$

and

$C1$

, we have

$\begin{equation*} {R}_{c} =[{n}(2{k}^{2}+1)]/[{k}(2{n}^{2}-{n}+1)] \tag{7}\end{equation*}$

View Source

We take

$k=4$

, 10, and 16 to conduct simulation experiments when the total number of network nodes is less than 1000 and compare the ratio of the communication complexity of the STBFT algorithm to the PBFT algorithm for different network nodes. The experimental results are shown in Figure 6. It can be seen from the figure that regardless of the value of

$k$

, compared with the PBFT algorithm, the communication complexity of the STBFT algorithm is extremely low. At the same network node, the smaller the

$k$

value is, the lower the communication complexity of the STBFT algorithm. However, a smaller value of

$k$

means that the system needs to be divided into more layers, and a deeper level will lead to higher delay and lower security. Therefore, in some specific scenarios, the application of the STBFT algorithm needs to consider these tradeoffs.

FIGURE 6.

Ratio of the communication complexity of the STBFT to the PBFT.

Show All

C. Fault Tolerance Analysis for the STBFT Algorithm

To improve the fault tolerance and security of the system, we use the randomness of VRF to ensure that nodes are evenly distributed in the consensus network. At the same time, our algorithm uses a feedback mechanism to deal with malicious behavior of Byzantine nodes, which can maximize the success rate of our algorithm in achieving consensus. Part A of Section IV shows that in the case of a random distribution of nodes, the probability that the feedback mechanism cannot be triggered within the acceptable fault tolerance range is extremely low. Therefore, we verify the effectiveness of the mechanism and analyze the success rate under malicious node attacks. By introducing the faulty probability determined (FPD) and faulty number determined (FND) models [38], our algorithm is compared with the Multi-Layer PBFT algorithm [38] to verify that it can improve the fault tolerance of the hierarchical system. The FPD model is used when the probability of every single faulty node is fixed, and the FND model is used when the number of faulty nodes in the system is fixed. We conduct FPD and FND simulation experiments on the two algorithms in MATLAB. Assuming that all nodes are independent of each other in the experiment, we use the same two-layer system structure for both algorithms and take the number of nodes $k$ of the group to be 10 and 15, so the corresponding total network nodes $n$ are 111 and 241, respectively. We use the FPD and FND models to perform 10,000 repeated experiments in the same network environment to calculate the simulation success rate by taking the ratio of success times to the total repeating times. The experimental results are shown in Figure 7. It can be seen from the figure that the STBFT algorithm has better fault tolerance performance than the Multi-Layer PBFT algorithm, and our algorithm can always maintain a high 100% consensus success rate in both the FPD model and the FND model. With the gradual increase in network nodes, the proportion of our algorithm to achieve a 100% success rate gradually increases. It can be seen from Part A of Section IV that with the increase in consensus network nodes, the probability of feedback mechanism failure will be lower, so the success rate of reaching consensus will be higher.

FIGURE 7.

Analytical results of the FPD and FND models with different k and n values for the STBFT and Multi-Layer PBFT.

Show All

To further verify that our algorithm can significantly improve the system’s fault tolerance, we also conduct an experimental analysis of the two algorithms in the same three-layer system structure. The number of nodes $k$ of the group is 10, and the total number of network nodes $n$ is 1111. As shown in Figure 8, similarly, the STBFT algorithm has better fault tolerance performance than the multilayer PBFT algorithm, and our algorithm can always maintain a high 100% consensus success rate in both the FPD model and the FND model.

FIGURE 8.

Analytical results of the FPD and FND models for the STBFT and Multi-Layer PBFT.

Show All

SECTION V.

Conclusion

Due to problems such as high communication overhead and poor node scalability of the PBFT algorithm in consortium chains, this paper proposes a scalable algorithm based on a tree topology network structure that divides network nodes into groups and layers and changes from global consensus to hierarchical multi-centralized consensus, which reduces communication complexity and improves node scalability. At the same time, our algorithm considers the problems of low fault tolerance and weak security in the hierarchical system. Therefore, in response to the problems mentioned above, the STBFT algorithm ensures that the allocation of each node is random, preventing malicious nodes from colluding with each other. All nodes will verify and confirm the location information in the system. In addition, the primary nodes in each consensus domain do not participate in the consensus of the group domain but only send transactions and collect consensus results, reducing the communication burden of the primary nodes. We also specifically designed the system structure of the algorithm, which can perform mutual supervision between the upper and lower layers and provide timely feedback when nodes generate malicious behavior, allowing the system to take measures when nodes have Byzantine behavior to improve fault tolerance. Theoretical analysis and simulation results show that the STBFT algorithm significantly reduces the communication overhead, has higher fault-tolerant performance than the Multi-Layer PBFT algorithm, and improves the system’s security. However, when the number of faulty nodes in the system is fixed, the PBFT algorithm can tolerate one-third of the nodes, while the STBFT algorithm is far from fault tolerant, and many application scenarios not only require scalability but also have high requirements for delay. Therefore, future research includes solving the delay and fault tolerance problem in the hierarchical system so that the consortium chain can be applied to more severe and complex environments.

References is not available for this document.

A Scalable Byzantine Fault Tolerance Algorithm Based on a Tree Topology Network

Abstract:

Metadata

Abstract:

Funding Agency:

Introduction

Related Work