To demonstrate some of the capabilities of the simulator for system analysis, we will consider a simple example of an HL Fabric 2.0 system. First, we will describe the setup. Then, starting from default parameter settings, we will analyse a few design variations. We will see that DLT systems exhibit complex behaviour and are sensitive to small changes. Finally, we will discuss the design space.
The consortium consists of a varying number of organisations across Europe. We assume they play an equal role in the network, apart from ordering when there are fewer orderers than organisations.
User requests result in two transaction types: queries and updates. We assume a 1:1 ratio.
We assume a homogeneous hardware topology across all organisations.
We perform parameter sweeps of N (the number of organisations, shown on the x-axis) and observe two variables: the maximum sustainable transaction injection rate (tps) and the transaction latency.
Furthermore, we assume the system to be balanced whenever possible: for example, every client injects and every peer endorses an (asymptotically) equal number of transactions, and orderers are distributed evenly across the datacenters. Balanced simulations avoid some artefacts of undersampling, but are of course in some respects an idealisation.
In HL Fabric, every update transaction needs to be endorsed by a specified number of peers. In the default settings, endorsements from a majority of peers are required (floor(N/2)+1).
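As a quick illustration (the helper below is ours and not part of the simulator), the number of endorsements required by the majority policy can be computed directly from N:

```python
def majority_endorsements(n_orgs: int) -> int:
    """Endorsements required by the default majority policy: floor(N/2) + 1."""
    return n_orgs // 2 + 1

print([majority_endorsements(n) for n in (4, 7, 100)])  # [3, 4, 51]
```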
By default (or, perhaps more accurately, typically), every user request is mapped to one transaction.
HL Fabric supports Raft as its ordering service. We will assume 5 orderers, irrespective of the number of clients or peers.
Maximum sustainable transaction injection rate per second vs. system size. All figures share the same legend: E stands for endorsements (either maj for majority or 2), S for scaling of the ordering service (no means 5 orderers, yes means N orderers) and B for batching, with the value being the batching factor.
With these default settings, the system doesn’t scale well. The initial tps of ~1000 drops rapidly with N and the latency increases.
Studying the underlying data revealed the CPU to be the scaling bottleneck. This stands to reason: the creation and validation of endorsements involves expensive cryptographic operations. First, the number of endorsements per update transaction increases linearly with N; however, the number of endorsing peers able to perform this work increases linearly as well, so the endorsement load per peer remains roughly constant. Second, and crucially, every endorsement of every update transaction needs to be validated by every peer after ordering. Thus, the validation cost per peer increases linearly with N.
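To make this argument concrete, here is a rough back-of-the-envelope model of the per-peer CPU load (our own simplification, not simulator output; the function name and the per-operation costs `c_sign` and `c_verify` are hypothetical):

```python
def per_peer_cpu_load(n_orgs, update_tps, endorsements_required,
                      c_sign=1.0, c_verify=1.0):
    """Rough per-peer CPU load in arbitrary units.

    Endorsement creation: endorsements_required signatures per update,
    spread evenly over the n_orgs endorsing peers -> roughly constant per peer.
    Validation: every peer verifies every endorsement of every update
    -> grows with endorsements_required, i.e. with N under the majority policy.
    """
    endorse = update_tps * endorsements_required / n_orgs * c_sign
    validate = update_tps * endorsements_required * c_verify
    return endorse + validate

for n in (10, 20, 40, 80):
    print(n, per_peer_cpu_load(n, update_tps=500, endorsements_required=n // 2 + 1))
```

Setting `endorsements_required=2` instead of the majority value removes the N-dependence of the validation term, which is exactly the lever the first variation pulls.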
This first variation is a policy variation aiming at circumventing the CPU bottleneck by reducing the number of required endorsements. This has implications for security: with majority endorsement, no update can be pushed that is not agreed on by a majority of the N organisations, which secures the system against any minority deciding on updates. From the point of view of minimum viable security, however, the reasonable alternative is 2 endorsements: no single organisation can perform arbitrary updates, as at least one other organisation must agree. Also, since DLTs are tamper-evident, any false endorsements can be identified via audit by any participant at any time. With 2 endorsements, the cost scales very differently:
As can be readily seen, the scaling behaviour has fundamentally changed. Latency stays constant, while tps increases up to N=80.
This second variation also aims at circumventing the CPU bottleneck, this time without reducing security (majority endorsement is kept). Instead, multiple user update transactions are batched into one transaction (4x and 16x). For the actual system, this behaviour needs to be programmed into the chaincode. Batching reduces the cost of endorsement and validation per user update transaction. However, the client update latency increases, because a user update has to wait until enough further updates have arrived to fill a batch before the batched transaction is submitted.
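A minimal sketch of this trade-off (our own simplified model, not simulator code, assuming updates arrive uniformly at rate `update_tps` and are grouped into batches of `batch_factor` user updates):

```python
def batching_effect(update_tps, batch_factor, endorsements_required, c_verify=1.0):
    """Per-user-update validation cost and average added waiting time under batching.

    Cost: one set of endorsements now covers batch_factor user updates.
    Latency: with uniform arrivals, an update waits on average for half
    a batch to fill before the batched transaction is submitted.
    """
    cost_per_update = endorsements_required * c_verify / batch_factor
    added_latency_s = (batch_factor - 1) / (2 * update_tps)
    return cost_per_update, added_latency_s

# Hypothetical numbers: 500 updates/s, 16x batching, majority of N=20 -> 11 endorsements.
print(batching_effect(update_tps=500, batch_factor=16, endorsements_required=11))
```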
The third variation involves the number of orderers. DLT systems revolve around reaching consensus on a single version of the ledger. The most common use case for consensus algorithms is to ensure reliability and availability: even if some nodes fail (crash), no data is lost and the system remains available through the surviving nodes. Using Raft, 5 orderers suffice in practice: the system can sustain 2 crash failures, and if one failure occurs, it can still tolerate one further failure while the failed node is rebooted or replaced.
However, if there are more than 5 organisations, not all of them can run an orderer; that would require N orderers. Clearly, N orderers allow the system to tolerate more crash failures (ceil(N/2)-1, to be precise). Perhaps more importantly, this gives all organisations equal control over the ordering process.
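The crash-fault tolerance follows directly from Raft's majority-quorum requirement; the snippet below (illustrative only, the function name is ours) computes it for both setups:

```python
import math

def raft_crash_tolerance(n_orderers: int) -> int:
    """A Raft cluster of n nodes tolerates ceil(n/2) - 1 (= floor((n-1)/2)) crashes."""
    return math.ceil(n_orderers / 2) - 1

print(raft_crash_tolerance(5))   # 2  (the fixed 5-orderer setup)
print(raft_crash_tolerance(21))  # 10 (one orderer per organisation, N=21)
```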
The simulations are run with 2 endorsements, as the CPU bottleneck dominates for majority endorsements.
It is interesting (and somewhat unexpected) to see that the ordering service only begins to become a bottleneck at N=80. With batching, the effect sets in at a slightly smaller scale.
Finally, we investigate the best possible scaling behaviour for this setup: 2 endorsements, 5 orderers and batching update transactions.
As can be seen, the system shows good, scalable performance. However, there clearly is an upper bound at around 3500 tps, indicating a bottleneck we have not yet considered. Closer inspection of the simulation data reveals this bottleneck to be TCP acknowledgements, i.e. the achievable data transfer rate is limited by the round-trip delay of TCP acknowledgements (the transfer is latency-bound). To further increase tps, the system could, for example, be deployed on a network with lower latency.
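A rough way to see why acknowledgement delay caps throughput is the bandwidth-delay relation: with a fixed amount of unacknowledged data in flight, throughput is bounded by that amount divided by the round-trip time. The numbers below are made-up placeholders, not values taken from the simulation:

```python
def latency_bound_throughput(bytes_in_flight: float, rtt_s: float) -> float:
    """Upper bound on throughput (bytes/s) when the sender must wait for ACKs."""
    return bytes_in_flight / rtt_s

# Hypothetical values: 64 KiB of unacknowledged data, 20 ms round-trip time.
print(latency_bound_throughput(64 * 1024, 0.020))  # ~3.3e6 bytes/s
```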
We only show a fraction of the data available for analysis. Essentially, the full behaviour of the system can be studied in detail, including e.g. the following data:
The variations presented here represent only a fraction of the design space available to novel DLT systems, both in terms of dimensions and of range. An incomplete outline:
Finally, the simulator allows for far more detailed scenarios to be analysed. Some examples include: