June 29, 2020
By Dimos Raptis, Software Engineer at R3
In the last two versions of Corda Enterprise (4.4 and 4.5), we released a number of features that significantly improved the performance of Corda Enterprise. Our work in the past has focused on improving throughput, which is typically measured by the number of transactions completed per second (TPS).
However, we have seen a number of use cases where latency is also a critical factor, especially where there are transactions of high value and a large number of participants with long transaction chains. We knew that reducing latency would be good news for Corda Enterprise users working within capital markets and supply chain financing, or other use cases where there are multiple parties to a transaction and settlement speed is critical.
Our newly released features have been designed to reduce latency to benefit all these use cases, while also driving throughput improvements. This time we wanted to put some numbers on our performance improvements, so we took the opportunity to perform a detailed analysis on the benefits of these features. The results are in, and we are glad to share them with you in this post.
As a result of our analysis, we observed a reduction in latency up to 13 times and an increase in throughput up to 35% between Corda Enterprise 4.3 and Corda Enterprise 4.5. Any organisation using Corda Enterprise for purposes that involve signature collection and communication of the finalised transaction to multiple participants will see a substantial improvement if they upgrade to 4.5 (or 4.4). Specific industries can make use of tuning features to improve performance even further.
For users of Corda Enterprise, we found that a simple upgrade from 4.3 to 4.5 automatically improves performance, without additional tunning.
Read on to check our analysis methods and see full test results.
The features covered here are:
To perform our analysis, we developed a new CorDapp that could exercise scenarios with multiple nodes. In this context, a node can represent a party owning an asset or a party coordinating an exchange of assets between parties. This CorDapp modelled a simplified asset exchange with two basic asset classes – cash and stocks – where each asset was represented by a Corda state. This Corda state shows who holds an asset, how much it is worth and other key details.
The exchange node is responsible for issuing these assets to the various nodes in the network and coordinating the exchange of these assets between multiple of these nodes in an atomic way. This is achieved by creating a transaction with the appropriate input and output states for the assets that are transferred, calling CollectSignaturesFlow to collect signatures from all the parties that own the assets to be transferred, and then calling FinalityFlow to finalise the transaction and send it back to all the involved parties. The diagram below contains an illustration of such a network with 4 nodes that perform a swap transaction between each other.
Our tests were focused on two basic aspects of performance: latency and throughput.
In both cases, we deployed a single exchange node and a set of N normal nodes, which were split into two groups A and B of size N/2 each. The first group was initially issued cash states and the second group was issued stock states.
Each node from the first group formed a pair with a node from the second group. These pairs indicated the nodes amongst which the swaps of assets would be performed. Each Corda transaction that was performed during the test swapped two assets between the nodes of each pair. So, each Corda transaction had N inputs and N output states, which modelled the transfer of N assets.
When testing throughput, we generated load from multiple RPC clients to the exchange node triggering multiple, concurrent flows that executed the transactions described previously. This was done in order to saturate the nodes and achieve the maximum throughput possible. When testing latency, we generated load from a single RPC client that was expecting one flow at a time. This was done intentionally in order to measure the latency of the flow without overloading the nodes.
In the latency tests, we also generated a backchain by transferring bilaterally the states between the nodes of each pair before proceeding with the swap transactions, so that we could isolate the benefits derived from bulk transaction resolution and parallelised flows. We also repeated the experiments with a single exchange node and a varying number N of regular nodes (ranging from 2 up to 10) in order to measure the scaling with regards to the number of participating nodes. The structure of the CorDapp and the topology of the network were selected based on the setup of our customers in the asset exchange space in order to replicate realistic production scenarios as much as possible.
In total, we executed 3 basic tests:
For cost-efficiency reasons, we deployed multiple Corda nodes on each machine in some cases. Replicating the same tests with one dedicated machine for every Corda node is expected to achieve even better results in absolute terms, but the relative differences between versions we study in this post are a good representation of the improvements between Corda Enterprise versions.
Specifically, we had four dedicated machines that hosted Corda nodes and four dedicated machines for hosting databases for them. Two of those machines were used to host the notary and its database. The remaining six machines were used to host the remaining Corda Enterprise nodes and their databases.
PostgreSQL was used as a database, and in order to host the database of multiple Corda nodes in a single machine, we run a single PostgreSQL instance with multiple schemas, one for each node. We tried to split the nodes as evenly as possible. So, two machines hosted four nodes and one machine hosted 3 nodes, totalling 11 nodes, as described above. Similarly, two machines hosted four database schemas, and one machine hosted three schemas accordingly.
All these machines were based on Intel Xeon E5–2687W v4 and had 24 cores (48 hyper-threads) with 256GB RAM and local SSDs interconnected with a 10GBps network.
To test for latency, a pre-defined number of flows (400) were executed from a single client running on a separate machine, as described above. A backchain of depth 10 was generated for every state via bilateral transfers. We performed measurements starting with four nodes and going up to 10 nodes. This test required at least two pairs of nodes in order to trigger transaction resolution. We also repeated the test three times with versions 4.3, 4.4 and 4.5. The diagram below shows the results of this test, where the x axis indicates the number of nodes participating in the test and the y axis indicates the average latency of the flow that performed the asset swap in milliseconds.
We can observe that version 4.4 improves over 4.3 significantly thanks to the bulk transaction resolution and 4.5 improves even further thanks to the parallelised flows. We can also observe that 4.5 scales almost optimally, since the increase in latency from additional nodes is minimal.
To test throughput, the load was generated concurrently from 480 RPC clients that were spread across the four machines that were hosting Corda Enterprise nodes. This was done in order to saturate the participating nodes and prevent the load generator from becoming the bottleneck. Each one of the 480 clients run one flow at a time on the exchange node that was executing a swap transaction between the participating nodes. The diagram below shows the results of this test, where the x axis indicates the number of nodes participating in the test and the y axis indicates the average throughput measured in number of flows completed per second.
We can observe that Corda Enterprise 4.5 can achieve a significantly higher throughput when compared to 4.3. This can be attributed to the following factors:
We can also observe that the throughput decreases as we add more nodes and that the difference between the two versions gets smaller as more nodes participate in the transaction. This is mostly due to environmental issues. As we explained above, we had to deploy multiple nodes per physical machine for economical reasons. This means that as we add more nodes, each machine has to host more nodes and these nodes are sharing physical resources, thus interfering with each other. If the same test is repeated using a dedicated machine per Corda node, it is very likely that a much higher throughput would be observed for both Corda Enterprise versions, with a larger difference between the two versions when more nodes are involved.
The last test was focused on the impact of some of the new tuning options. The options we covered are those controlling the flush frequency of Artemis buffers (brokerConnectionTtlCheckIntervalMsand journalBufferTimeout) and the option controlling p2p messages compression (enableP2PCompression). You can consult the documentation for more details on these options. We tested five different variations:
The purpose of this last test was to investigate the effects of these options in both throughput and latency because some of these options introduce trade-offs between these two aspects. To this end, the tests described earlier in this analysis were repeated using the variations listed above. For the latency measurements, 10 nodes were used as we didn’t expect big interference between nodes in the same machine thanks to the low load. For the throughput measurements, only four nodes were used to avoid any interference between nodes in the same machine due to high load. The diagrams below show the results, where the x axis contains the variations used and the y axis shows the average throughput and latency achieved with each variation along with error bars indicating the standard deviation.
The main observations are listed below:
The main takeaways are listed below:
If you want a more detailed description of our performance testing setup, the associated tooling and the results of our experiments, you can have a look at the following pages:
Below you can also find links to videos with more a detailed presentation of the features covered here:
Share this post
September 25, 2020
September 21, 2020
September 17, 2020
Stay up to date on the latest news and articles related to Corda.