Rationale for and tradeoffs in adopting a UTXO-style model

Posted on Posted in Blog
The following two tabs change content below.

Mike Hearn

Mike Hearn is a Lead Platform Engineer at R3. Before working on Corda, he was a Bitcoin developer and a senior software engineer at Google, where he worked on Maps, Earth, Gmail, bot detection and account security.

Latest posts by Mike Hearn (see all)

Corda is designed with the explicit aim of avoiding, to the extent possible, the scalability and privacy implications that arise from the use of global broadcast and the “global computer” data model.

Whilst the privacy implications of global broadcast are easy to understand, the scalability implications are perhaps more subtle. In a consensus system, it is critical that all processors of a transaction reach precisely the same conclusion as to its effects. In situations where two transactions may act on the same data set, it means that the two transactions must be processed in the same order by all nodes. If this were not the case then it would be possible to devise situations where nodes processed transactions in different orders and reached different conclusions as to the state of the system. It is for this reason that systems that use a block chain effectively run “single-threaded”, meaning all transactions are ordered relative to each other and applied serially.

In Corda, we assume the data being processed represents financial agreements between identifiable parties and that these institutions will adopt the system only if a significant number of such agreements can be managed by the platform. As such, the system has to be able to support parallelisation of execution to the greatest extent possible, whilst ensuring correct transaction ordering when two transactions seek to act on the same piece of shared state.

To achieve this, we must minimise the number of parties who need to receive and process copies of any given transaction and we must minimise the extent to which two transactions seek to mutate (or supersede) any given piece of shared state.

A key design decision, therefore, is what should be the most atomic unit of shared data in the system. This decision also has profound privacy implications: the more coarsely defined the shared data units, the larger the set of actors who will likely have a stake in its accuracy and who must process and observe any update to it.

This becomes most obvious when we consider two models for representing cash balances and payments.

A simple account model for cash would define a data structure that contained a list of balances at a particular bank for each account holder. Every holder of a balance would need a copy of this structure and would thus need to process and validate every payment transaction, learning about everybody else’s payments and balances in the process. All payments across that set of accounts would have to be single-threaded across the platform, limiting maximum throughput.

A more sophisticated example might create a data structure per account holder. But, even here, I would leak my account balance to anybody to whom I ever made a payment and I could only ever make one payment at a time, for the same reasons above.

The so-called “UTXO” (unspent transaction outputs) model would define a data structure that represented an instance of a claim against a bank. An account holder could hold many such instances, the aggregate of which would reveal their balance at that institution. However, the account holder now only needs to reveal to their payee those instances consumed in making a payment to that payee. This also means the payer could make several payments in parallel. A downside is that this model is harder to understand. However, we consider the privacy and scalability advantages to overwhelm the modest additional cognitive load this places on those attempting to learn the system.

In what follows, further advantages and disadvantages of this design decision are explored.

Pros

The UTXO model has these advantages:

  • Immutable ledger entries gives the usual advantages that a more functional approach brings: it’s easy to do analysis on a static snapshot of the data and reason about the contents, even as it’s changing.
  • Because there are no accounts it’s very easy to apply transactions in parallel, even for high traffic legal entities, assuming sufficiently granular entries.
  • Transaction ordering becomes trivial: it is impossible to mis-order transactions due to the reliance on hash functions to identify previous states. There is no need for sequence numbers or other things that are hard to provide in a fully distributed system.
  • Conflict resolution boils down to the double spending problem, which places extremely minimal demands on consensus algorithms (as the variable you’re trying to reach consensus on is a set of booleans).
  • Smart contracts are simply boolean functions and do not directly mutate state themselves. There are thus no problems with state corruption due to things like unexpected re-entrancy, as was the case in The DAO attack witnessed by Ethereum earlier in the year.

Cons

It also comes with some complexities that in practice must be abstracted from developers:

  • Representing numeric amounts using immutable entries is unnatural. For instance, if you receive $1000 and wish to send someone $100, you have to consume the $1000 output and then create two more: a $100 for the recipient and $900 back to yourself as change. The fact that this happens can leak private information to an observer.
  • Because users do need to think in terms of balances and statements, you have to layer this on top of the underlying ledger: you have to calculate a user’s balance instead of just reading it out of the database. Hence, balance calculation is a core feature of the “wallet” apps that are so vital to Bitcoin. In Corda our equivalent of the wallet is called the vault, but it does similar tasks.
  • Experience from those who have developed wallets for Bitcoin and other systems is that they can be complex pieces of code, although a significant contributor to wallets’ complexity is handling the lack of finality (i.e. the possibility of chain re-organisations).
  • Whilst transactions can be applied in parallel, it is harder to create them in parallel due to the need to strictly enforce a total ordering.

With respect to parallel creation, if the user is single threaded this is easy, but in a more complex situation where you might want to be preparing multiple transactions concurrently this can prove a limitation – in the worst case where you have a single output that represents all your value, you are forced to serialise the creation of every transaction. If transactions can be created and signed very fast that’s not a concern. If there’s only a single user, that’s also not a concern.

Both cases are typically true in the Bitcoin world so users don’t suffer from this a whole lot. In the context of a complex business with a large pool of shared funds, in which creation of transactions may be very slow due to the need to get different humans to approve a transaction using a signing device, this could quickly lead to frustrating conflicts where someone approves a transaction and then discovers that it has become a double spend and they must sign again. In the absolute worst case you could get a form of livelock.

The tricky part about solving these problems is that the simplest way to express a payment request (“send me $1000 to public key X”) inherently results in you receiving a single output, which then can prove insufficiently granular to be convenient. In the Bitcoin space BIP 70 was designed to solve this: it’s a simple binary format for requesting a payment and specifying exactly how you’d like to get paid, including things like the shape of the transaction. It may seem that it’s an over complex approach: could you not just immediately respend the big output back to yourself in order to split it? And yes, you could, until you hit scenarios like “the machine requesting the payment doesn’t have the keys needed to spend it”, which turn out to be very common. So it’s really more effective for a recipient to be able to say to the sender, “here’s the kind of transaction I want you to send me”. The Corda flow framework makes programming such negotiations much simpler.

I wrote about this problem and techniques to minimise it in this article from 2013. It coined the term “merge avoidance”, which has never been implemented in the Bitcoin space, although not due to lack of practicality.

A piece of future work for the Corda vault implementation will be to implement adaptive fragmentation and compaction of the fungible states within it, to ensure a good balance is reached between transaction size and allowed parallelism.

Finally, it should be noted that some of the issues described here are not really “cons” of the UTXO model; they’re just fundamental. A relational database design that puts too much write traffic onto a single row would see large numbers of transaction rollbacks and correspondingly reduced throughput. Likewise, creating a single “dollar” smart contract on Ethereum would result in eventual difficulty in updating it consistently and at speed.

Start the discussion at discourse.corda.net