The road ahead

November 30, 2016


The code we open sourced yesterday represents about a year’s worth of work. Starting with one developer and a blank screen, we have steadily built up a team and codebase from scratch. I want to discuss the software a bit, and then talk about the roadmap from here on out.

The story so far
The first and most important thing to understand about the Corda code is that it is not finished, nor even at the level of an alpha release. Significant parts of the design laid out in the technical white paper are not yet implemented. Any apps built on the platform today will encounter scaffolding and missing pieces of infrastructure. They will also be built on top of APIs and protocols that are changing in backwards-incompatible ways. We still have some way to go to reach Corda 1.0, which will commit to backwards compatibility and be ready for production use.

Our focus up to now has been on experimenting with the design and building small demo apps to road test some of Corda’s unique choices. Some things we will turn our attention to shortly include:

  • Performance
  • Scalability (beyond making sure the design scales well)
  • Versioning and compatibility
  • Security

For these reasons, if you experiment with the code you should expect to find a node implementation that’s slow, is missing some security checks, and requires data wipes each time you upgrade. It’s a deliberate tradeoff. We’ve chosen to spend our limited engineering hours on projects like integrating SQL queries with the data model, developing the flow framework, figuring out how to support multiple consensus protocols on a single network, finding good abstractions for streaming data from the node to clients, and researching hardware-based privacy techniques (some of that work hasn’t been integrated into the Corda repository yet). These are the sorts of issues that can affect the fundamental data structures and protocols in use, so it’s best to figure them out early.

I use the term scaffolding to refer to bits of code that are put in place knowing that they’re temporary, but which are needed to support the construction of other modules. Here are some examples of scaffolding choices you’ll find in the current code. Most tasks run on a single thread, to avoid spending time debugging threading issues in code we might throw out. We use a generic object serialisation framework instead of formally specifying a wire protocol. Key management is a temporary affair that doesn’t use deterministic derivation. The distributed notary uses the Raft protocol rather than a Byzantine fault tolerant protocol. And so on.
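
To make "deterministic derivation" a little more concrete, here is a minimal sketch of one common approach: deriving child key material from a single master seed and an index using HMAC-SHA256, so that the same seed always regenerates the same keys. This is an illustration only and not Corda's key management design. The class and method names are hypothetical, and a real scheme would turn the derived bytes into a proper signing key pair.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration; not part of the Corda codebase.
final class DeterministicDerivation {
    /**
     * Derives 32 bytes of child key material from a master seed and an index.
     * The same (seed, index) pair always yields the same bytes, so keys can be
     * regenerated from the seed instead of being backed up one by one.
     */
    static byte[] deriveChild(byte[] masterSeed, int index) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(masterSeed, "HmacSHA256"));
            // Feed the index as a 4-byte big-endian integer.
            mac.update(ByteBuffer.allocate(4).putInt(index).array());
            return mac.doFinal();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable", e);
        }
    }
}
```

Calling `DeterministicDerivation.deriveChild(seed, 0)` twice returns identical bytes, while a different index returns different bytes.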

Why open source now? Because the design phase is drawing to a close, and the time to start building Corda networks for real is now beginning. Open source communities work best when the goals are clear, and that requires a relatively solid vision for what the software is meant to do. The technical white paper we’ve published is not completely final and the design may still change between here and version one, but it does lay out the current vision in a lot of detail. If you want to contribute it should be easy to find a task you’re interested in.

Building testnets
The most obvious consequence of the focus on design experimentation up to now is that there is no Corda testnet. The next step in the project is to build one. Corda networks are permissioned and thus any testnet would also need a permissioning policy, but for developer purposes simply using email addresses as the identity and an open-to-all policy is sufficient to approximate a public network. Getting there means finishing, auditing and integrating the deterministic JVM sandbox code that’s currently sitting in a submodule of the main codebase.

One of Corda’s unusual design choices is that we reuse a standard (but slightly modified) Java virtual machine for running smart contract logic, rather than creating a new VM and instruction set from scratch. In addition, we do not impose any particular choice of language for writing contract logic. There’s no real consensus in the wider community on the best way to write such programs: some projects have developed custom Python-like or Java-like imperative languages, others have created dialects of Haskell or Lisp, and still more have created highly restricted domain-specific languages. Fortunately there are languages of almost every conceivable style that target JVM bytecode, from bread and butter Java to languages like Whiley that integrate formal methods. Sofus Mortensen of Nordea Bank has contributed an experimental domain-specific language for the modelling of financial contracts, based on the 2001 paper by Peyton Jones et al., “Composing contracts”.

The JVM sandbox code is not quite finished yet (there’s a list of remaining tasks in the accompanying README), but finishing and integrating it is one of the next steps. Because mobile code is tricky (all attempts at sandboxing non-trivial code have experienced security problems in the past), it will likely be some time until we feel comfortable enough to turn on the execution of un-whitelisted contracts that propagate across the network automatically … but that’s the destination we’re heading for.

Additionally, the current peer-to-peer and RPC protocols are only partially specifiable, due to the use of an ad-hoc serialisation framework. Selecting and implementing a final binary serialisation protocol will make the platform easier to work with from non-JVM languages as well as closing one of the remaining (known) security holes: nodes are currently quite promiscuous in what they will accept in the message streams. This process will take place fully in the open, so please do contribute opinions and suggestions in this area through our forum (although we’re pretty darn sure we want a tight, easily canonicalised binary protocol and not something text based).
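
To illustrate what "easily canonicalised" means in practice, here is a minimal sketch of an encoder in which logically equal data always produces identical bytes, which is what makes hashes and signatures over a message stable. It is not a proposal for Corda's wire format: the `CanonicalEncoder` name and the layout (a field count, then length-prefixed UTF-8 keys and values in sorted key order) are assumptions made purely for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not Corda's serialisation format.
final class CanonicalEncoder {
    /** Encodes non-null string fields so that equal maps always give equal bytes. */
    static byte[] encode(Map<String, String> fields) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            // Sorting the keys removes any dependence on insertion order.
            Map<String, String> sorted = new TreeMap<>(fields);
            out.writeInt(sorted.size());
            for (Map.Entry<String, String> entry : sorted.entrySet()) {
                writeString(out, entry.getKey());
                writeString(out, entry.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Length-prefixing makes field boundaries unambiguous without delimiters.
    private static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length);
        out.write(utf8);
    }
}
```

Two maps holding the same entries, built in different insertion orders, encode to the same byte array. A text format with optional whitespace or free key ordering would need an extra normalisation step to give the same guarantee.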

One important thing to note here is that any Corda testnet, and most likely also any production network, will not attempt to fully automate handling of denial-of-service attacks by other nodes. This is for the simple reason that in a permissioned network with identifiable nodes and no global broadcast, a misbehaving node can simply be blocked by human decision. This is somewhat analogous to how the Tor network evicts or downgrades relays that are found to be acting against network policies. Thus, although the Corda deterministic JVM sandbox does impose a cost limit loosely similar to Ethereum’s “gas”, its purpose is to ensure that excessively large computations terminate deterministically, not to throttle usage of network resources.
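
The idea of a cost limit that exists only to guarantee termination can be sketched in a few lines. In the hypothetical example below, every unit of work is charged against a fixed budget and the computation aborts at exactly the same point on every machine once the budget runs out. The `CostBudget` and `MeteredSum` names are invented for illustration and are not the sandbox's API; a real sandbox would presumably insert such charges automatically rather than rely on hand-written calls.

```java
// Hypothetical illustration; not the Corda sandbox API.
final class CostBudgetExceededException extends RuntimeException {
    CostBudgetExceededException(String message) {
        super(message);
    }
}

final class CostBudget {
    private final long limit;
    private long used = 0;

    CostBudget(long limit) {
        if (limit < 0) throw new IllegalArgumentException("limit must be non-negative");
        this.limit = limit;
    }

    /** Charges units of work and aborts once the total exceeds the limit. */
    void charge(long units) {
        if (units < 0) throw new IllegalArgumentException("units must be non-negative");
        used = Math.addExact(used, units);
        if (used > limit) {
            throw new CostBudgetExceededException(
                "cost " + used + " exceeds limit " + limit);
        }
    }

    long used() {
        return used;
    }
}

final class MeteredSum {
    /** Sums 1..n, charging one unit per loop iteration. */
    static long sumTo(long n, CostBudget budget) {
        long total = 0;
        for (long i = 1; i <= n; i++) {
            budget.charge(1);
            total += i;
        }
        return total;
    }
}
```

With a budget of 100, `MeteredSum.sumTo(10, budget)` completes and uses 10 units, while `MeteredSum.sumTo(1000, budget)` fails on the 101st iteration regardless of how fast the host is. Because the cutoff depends only on the work counted and not on wall-clock time, every node reaches the same verdict.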

The road ahead
Along the way we will continue to develop new demos and proof-of-concept CorDapps, to find out which use cases the distributed ledger concept works well for and which aren’t quite such good fits. And we will implement or finish off features that are described in the tech white paper, but lagging behind in the code.

Some features we plan to work on next year are:

  • A notary using a BFT protocol, most likely BFT-SMaRT. The Raft notary we have today won’t be abandoned, as it’s become clear from discussions with financial industry stakeholders that threat models vary significantly, and sometimes legal assurances are good enough that performance and low latency would take priority over Byzantine fault tolerance. Corda allows for algorithmic agility within a single network.
  • Data distribution groups, for times when the set of parties that need to see a collection of states isn’t derivable from the states themselves (this is an approximation of a limited broadcast/gossip mechanism; please see the white paper for details).
  • Scalability, performance, security and privacy.
  • Ensuring all code samples have Java versions as well as Kotlin versions (this is just a documentation issue; you can write CorDapps fully in Java today).
  • Client libraries for .NET, to make it easy to integrate Corda with the Microsoft ecosystem.

The endpoint of this journey is Corda 1.0, so named not because it will be “done” or have every feature users might want, but because version 1.0 of any platform typically signifies the point at which people’s apps stop breaking. At that point Corda’s network protocols will be stable and evolvable, the node plugin APIs will be backwards compatible, and it will be possible to build production-ready networks. Hopefully we will arrive at this point by Q3 2017, but you know how it is with software estimation.