It’s early – we’re learning
Here we are, the broader “Blockchain” community, a few years into our pursuit of private, permissioned Blockchain, or what we prefer to call Distributed Ledger Technology (DLT), for financial and general business-to-business (B2B) usage. There have been, and continue to be, proofs of concept enabling enterprises to explore the applicability of DLT to their respective businesses in search of its value to them. It’s still the early days of DLT adoption. That said, encouragingly, we’re seeing more attention on Non-Functional Requirements (NFRs), e.g., operationalizing, high availability, and performance, including benchmarking the ‘speeds & feeds’ that many of us are used to noodling over when evaluating technology we may adopt.
Expectations have been well advertised, though not always accepted, among the user community that applying DLT is not about shaving milliseconds, maybe not even seconds, from current processing systems. Rather, it’s about reducing friction in transaction processes among multiple parties. That friction is often reflected subjectively as trust, or maybe more accurately stated as degrees of mistrust. It may also be quantitatively defined in terms of time savings; however, those time savings are more apt to be about reducing processing times from weeks to days, or from days to minutes. Rarely has the goal been about improving existing high-speed transaction processing systems. Not today, at least.
Even so, we’re human, we love speed, and some can’t get enough of it. That mindset drives DLT platform builders towards ways to improve their throughput. It’s an attention grabber, for sure. Hitachi produced a paper, “Work on the Potential and Challenges of Blockchain Technology,” in early 2017 that highlighted their observations of the top five Blockchain challenges. Processing speed came in at #2.
But we do have to remind ourselves to evaluate the context of the use case, and ask about the trade-offs, i.e., what, if anything, might be sacrificed in a platform design that focuses a great deal on speed? And we should ask: how fast is fast enough? It depends on the use case. Deloitte captures this key point in one of their Blockchain papers, “… KPIs should be specific to each use case and directly aligned with the business problem attempting to be solved.” ~ Deloitte, “Taking Blockchain Live. The 20 questions that must be answered to move beyond proofs of concept”
DLT platform builders are each blazing their own trails in their attempts at carving a market for their respective offerings. Sometimes it’s a focus on an industry such as finance, supply chain, or identity. Then there’s a play to capture mind share with our ol’ friend speed, for which some platforms claim eyebrow-raising throughput numbers. That raises the question: what’s behind the numbers?
What’s in a number?
What’s being claimed? Wide-ranging numbers, and we’re all having trouble reconciling them. The claims may very well be legitimate, but we should be suspicious, and require verification.
I recall attending a Blockchain-oriented meetup in NYC about a year ago that hosted Donald Tapscott as its keynote speaker. His popular talk was followed by two gentlemen eager to demonstrate their new DLT platform. They claimed to be getting tens of thousands of Transactions-per-Second (TPS). This had the room oohing and aahing… not really. Mostly, some nodded quietly, and some just stared, wondering, I imagine, whether or not the statement was something they should take seriously, before they queued up for their free signed book from Don. Some of us, including the person standing next to me, wondered on what basis they could make such a claim.
DLT platform vendors often offer numbers without much insight into their testing methods. This is completely ironic considering the popularity of “Blockchain” is largely associated with a quest for increased trust and transparency. Where’s the transparency in these numbers?
We should be careful about reverting to our traditional view of TPS as typically applied to database and middleware technologies. And we should resist our natural tendency to latch onto the single number offered to us, as impressive as it might seem. When we see a TPS claim for a DLT platform we should be asking ourselves what’s behind that number, and be willing to look for supporting metrics, i.e., information that tells us things like:
- How is transaction defined for this claim?
- How is throughput defined for this claim, i.e., what were the start and stop points for measuring TPS?
- How many peer nodes are meant to receive a copy of the transaction, AND are involved in validating the transaction, e.g., is the platform broadcasting, or is it limiting interaction between parties?
- What consensus model is used to validate the transactions? What protocol? Is it probabilistic? What is the consensus delay?
- What data store is used?
- What peer communication protocol is used?
- What were the software and hardware environment conditions for the test?
- What use case(s) did the test workload represent?
- What testing tools were used to complete the test?
- May the results be replicated by other parties?
- What are the performance SLAs for the target network?
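One way to keep ourselves honest about the questions above is to treat a TPS claim as a record with one field per supporting metric, and then list whatever the vendor left blank. The following is only a minimal sketch: the `TpsClaim` class, its field names, and the sample values are our own illustrative assumptions, not part of any platform or tool.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TpsClaim:
    """A published TPS figure plus its supporting metrics (names are illustrative)."""
    tps: float
    transaction_definition: Optional[str] = None
    measurement_boundaries: Optional[str] = None  # start and stop points for TPS
    receiving_and_validating_nodes: Optional[int] = None
    consensus_model: Optional[str] = None
    data_store: Optional[str] = None
    peer_protocol: Optional[str] = None
    test_environment: Optional[str] = None        # software and hardware conditions
    use_cases: Optional[str] = None
    testing_tools: Optional[str] = None
    independently_replicable: Optional[bool] = None
    target_sla: Optional[str] = None


def undisclosed(claim: TpsClaim) -> list[str]:
    """Return the names of the supporting metrics the claim leaves unanswered."""
    return [f.name for f in fields(claim) if getattr(claim, f.name) is None]


# Hypothetical claim: a headline number with only two supporting facts disclosed.
claim = TpsClaim(tps=10_000.0, consensus_model="RAFT", data_store="in-memory")
missing = undisclosed(claim)
print(f"{claim.tps:,.0f} TPS claimed; {len(missing)} questions unanswered:")
for name in missing:
    print(f"  - {name}")
```

The point of the sketch is simply that a single number with most of these fields empty tells us very little.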
And where’s the bottleneck typically found? It’s notable, and not surprising, that “consensus” is the frequent target when it comes to zeroing in on the DLT aspects offering the greatest opportunity for performance improvement. Yet it’s consensus across a set of distributed nodes that presents the biggest challenge. For example, it’s evident to most of us that scaling the number of nodes in a broadcast network using a probabilistic consensus protocol such as Proof-of-Work presents an enormous scaling barrier.
This motivated a number of platform builders, including R3, to consider “performance & scalability” in their platform designs. For example, Corda limits the consensus interaction to only the parties involved in a particular transaction, along with the consensus pool needed to verify uniqueness, and validate the contract if requested. Other platforms, e.g., Hyperledger Fabric V1.0, have also taken a bespoke approach to minimizing transaction sharing. Of course, the primary reason for restricting transaction sharing is “privacy” under the principle, “the best way to keep a secret is to not share it.” However, this policy also provides ancillary performance benefits. Some might debate the loss of network resiliency in such a restrictive model. The rebuttal would be to ask how important resiliency is with respect to the network’s purpose, and whether resiliency is provided in other ways.
That raises another key distinguishing factor among the various platforms. As indicated in the list above, it’s important to ask what a transaction is, and to consider at what point the transaction is considered “submitted” to the platform for verification and commitment. With Corda, a Corda application (CorDapp) builds a transaction, then signs and submits it with verifying contract code to the platform, with directions for specific parties, and for a Notary Consensus Pool, to also verify the contract, sign, and commit the transaction. This means that for Corda it may appear that the CorDapp is the entry point, but it’s within the CorDapp that we find the “finalizing” operation, which is the real interaction point, the transaction “submit” point with the platform.
How do we fairly compare? The good news is that there is an effort underway as a Hyperledger Working Group to address the concern over performance comparison fairness.
Hyperledger Performance & Scale Working Group
During the summer of 2017, the Hyperledger project formed a new working group, called the “Performance and Scale Working Group” (PSWG), in which R3 participates. Its mission states:
“The mission of the PSWG is to discuss, research, and identify key use cases and metrics that relate to the performance and scalability of a blockchain and blockchain related technologies.
“The PSWG will serve as a cross project forum for architects and technologists from the community to exchange ideas and explore the performance and scalability aspects of the technologies, both software and hardware.”
It may or may not be surprising to learn that one of the obstacles to the working group’s progress is in defining two typically simple terms, “transaction” and “throughput”. Defining these terms for a DLT world such that there is a reasonably fair way to compare platforms has been the wrinkle. For example, the group has recognized the importance of “consensus delay” as proposed in a 2016 USENIX Symposium reference paper, “Bitcoin-NG: A Scalable Blockchain Protocol”. Although the working group has not concluded its work, we’re happy to share some initial thoughts that have been debated.
Guy’s thoughts and suggestion to the working group:
“Transaction” in the context of Distributed Ledger Technology (DLT), aka “Blockchain”, is an activity that, once final, transitions one or more object/asset states collectively, and atomically, from current state (which could be “none” if the asset does not yet exist) to future state.
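To make the “collectively, and atomically” part of that definition concrete, here is a minimal sketch of an all-or-nothing state transition. It is not how Corda or any other platform implements transactions; the `apply_transaction` function, the dictionary-based ledger, and the asset names are hypothetical, chosen only to illustrate the definition.

```python
class TransactionRejected(Exception):
    """Raised when any expected current state does not match the ledger."""


def apply_transaction(ledger: dict, transitions: dict) -> dict:
    """Apply every (current -> future) transition, or none of them.

    `ledger` maps asset id -> state. `transitions` maps asset id ->
    (expected_current_state, future_state). An expected current state of
    None means the asset does not exist yet.
    """
    # Check every transition first, so one mismatch rejects the whole transaction.
    for asset_id, (expected, _future) in transitions.items():
        if ledger.get(asset_id) != expected:
            raise TransactionRejected(f"{asset_id}: expected {expected!r}")
    # All checks passed: produce the new ledger with every transition applied.
    updated = dict(ledger)
    for asset_id, (_expected, future) in transitions.items():
        updated[asset_id] = future
    return updated


ledger = {"cash-1": "owned by Alice"}
ledger = apply_transaction(ledger, {
    "cash-1": ("owned by Alice", "owned by Bob"),  # existing state transitions
    "bond-7": (None, "owned by Alice"),            # current state is "none"
})
print(ledger)
```

Because the function returns a new ledger only after every check has passed, a rejected transaction leaves the original ledger untouched.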
“Throughput” is a volumetric measure of events over time. For DLT this is most reasonably represented as transactions-per-second (TPS) within well-defined start and stop boundaries. The boundary runs from the submission of a transaction to the DLT platform for verification, to the transaction commitment, recognized as final within the DLT network. Final means the network reaches a consensus to acknowledge and commit the submitted transaction to a local data store, however that is achieved by the platform.
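As a rough illustration of that boundary, the sketch below computes TPS from per-transaction submit and commit timestamps. It assumes we can record both points for every transaction; the `TxTiming` record, the function names, and the sample timings are hypothetical, and the working group’s final definitions may well differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxTiming:
    submitted_at: float  # seconds; transaction submitted to the platform
    committed_at: float  # seconds; transaction recognized as final by the network


def throughput_tps(timings: list[TxTiming]) -> float:
    """Committed transactions divided by the window from first submit to last commit."""
    if not timings:
        return 0.0
    start = min(t.submitted_at for t in timings)
    stop = max(t.committed_at for t in timings)
    if stop <= start:
        raise ValueError("measurement window must have positive duration")
    return len(timings) / (stop - start)


def mean_latency(timings: list[TxTiming]) -> float:
    """Average submit-to-commit time in seconds; reported alongside TPS, not instead of it."""
    return sum(t.committed_at - t.submitted_at for t in timings) / len(timings)


# Hypothetical timings for four transactions.
sample = [TxTiming(0.0, 1.5), TxTiming(0.5, 2.0), TxTiming(1.0, 4.0), TxTiming(2.0, 4.0)]
print(f"throughput: {throughput_tps(sample):.2f} TPS")
print(f"mean latency: {mean_latency(sample):.2f} s")
```

Moving either boundary, for example stopping the clock at acknowledgement rather than at commitment, changes the number, which is exactly why the start and stop points need to be stated with any claim.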
The working group has debated a “Throughput” definition, expanding it to incorporate the number of nodes in the network as a key factor, and to recognize that consensus delay plays an important role in any network platform’s effort towards reaching transaction finality. The group is purposely platform agnostic, recognizing the variation of platform designs, and acknowledging that some, such as Corda, Hyperledger Fabric, and others, limit node interaction for any given transaction to only the parties involved in the transaction. Additionally, we respect that some platforms, again such as Corda, do not need to insert transactions into “blocks” in order to maintain a chain of custody over an asset’s lifecycle.
The group comprises participants from Intel, Red Hat, Huawei, etc., as well as R3. Some come with performance benchmarking experience in their respective organizations. The team also includes members who are working on a performance evaluation tool, a “proposed” Hyperledger “Caliper” project. Although Caliper is focusing on Hyperledger project platforms only for now, they intend to be platform agnostic. Their involvement in this working group provides practical feedback on which metrics are realistically measurable, given the proposed metric definitions being discussed by the group. Please note that the working group’s final agreed metrics and definitions may differ from what is shared in this blog. The current goal is to have a draft paper to the Hyperledger Technical Steering Committee by year end or early 2018.
What are we doing?
At R3, we are building a performance framework practice that starts with establishing a performance baseline, whose test results will be available for regular review, and will be scenario based, e.g., Cash Issuance, Cash Payment, Deliver versus Payment, Multi-party transactions, etc. Executing tests on Corda V1.0 Open Source provides insight that we can leverage towards the Corda Enterprise development that’s underway. For example, it is clear, and not surprising, that “consensus” will be the primary focus in our efforts to maximize scalability of Corda Enterprise.
Our testing approach will consider the impact of notary consensus pool configurations. When we speak of a “notary” we really mean a “Notary Consensus Pool,” for which a specific consensus protocol is configured, e.g., RAFT, BFT-SMaRt, etc. These pools provide a uniqueness service by running consensus over uniqueness across nodes operated by a set of distrusting entities. Notary consensus pools could differ by protocol configuration, by size (the number of notary nodes in the pool), and by location (for a given pool, notary nodes could be in any geographic location). Given this, we will test the impact of the size of a notary consensus pool, which consensus protocol is employed, and the effect of notary node physical location within a pool.
Figure 3 – graphical examples of testing conditions
We will also test the effect of increasing the number of node participants, as well as the influence of transaction size, meaning the number of asset states, and their associated proposed changes, included in a single transaction. Certainly, business scenarios help us determine the technical conditions that should be tested. Some examples:
- Cash Issuance – self-issuance, self-signed.
- Cash Payment – a simple exchange of cash between two parties.
- Deliver vs Payment – an atomic transaction involving the transition of two different state objects within the same transaction.
- Multi-party transactions – varying the number of signers.
- Notary consensus pools – with variations on:
  - Consensus protocol, e.g., RAFT, BFT-SMaRt, etc.
  - Location of the notary nodes within the pool, i.e., large geospatial separation.
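The scenarios and notary pool variations above combine into a test matrix. Here is a minimal sketch of how such a matrix might be enumerated. The specific pool sizes and location labels are placeholders we made up for illustration; they are not R3’s actual test plan.

```python
from itertools import product

# Business scenarios taken from the list above.
scenarios = ["cash issuance", "cash payment", "deliver vs payment", "multi-party"]
# Consensus protocols named in the text.
protocols = ["RAFT", "BFT-SMaRt"]
# Placeholder values: pool sizes and location spreads are assumptions.
pool_sizes = [1, 3, 7]
locations = ["single region", "cross-region"]

test_matrix = [
    {"scenario": s, "protocol": p, "pool_size": n, "location": loc}
    for s, p, n, loc in product(scenarios, protocols, pool_sizes, locations)
]

print(f"{len(test_matrix)} test conditions")
for condition in test_matrix[:3]:
    print(condition)
```

Even with these few placeholder values the matrix grows quickly, which is one reason scenario selection should be driven by the business use cases rather than by exhaustive enumeration.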
Where do we go from here?
We continue to leverage what we’ve learned from our collective prior experiences, including evaluating performance and scalability of other technologies, be they databases, middleware components, etc. And we layer in, in an agile fashion, what we expect are the additional salient DLT characteristics. Ultimately, it’s all about communication: sharing information that the Corda community can utilize in their pursuit of building or joining a DLT network that suits their needs. To that end, our hope is to produce a report on Corda Enterprise performance in the first quarter of 2018.
Your input is invaluable towards a performant and functional DLT. We thank you for your continued participation in the evolution of the Corda platform.
A recording of a presentation of this content to the Corda Architecture Working Group (AWG) is available at http://bit.ly/2j2P70T
Thanks for input from Richard Gendal Brown, Mike Ward, and Rick Parker.