Compatibility and upgrades
Business networks and compatibility zones
Many P2P blockchain systems have some concept of a “network” — a set of configuration parameters that have to be shared for nodes to talk to each other. In the Bitcoin protocol these are things like the hash of the genesis block, the difficulty retargeting intervals, the inflation formula and so on. There’s also some shared infrastructure required, like the seed nodes. Many “alt coins” are in reality just Bitcoin with these magic constants tweaked.
In Corda, nodes also have to share some basic things in order to talk: a root identity certificate authority, at least one notary (distributed or centralised, it doesn’t matter which), a network map, and a few numerical constants such as the event horizon (how long a node can be offline before pending messages can be cleared).
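The shared parameters listed above can be pictured as a single value object that every node in a zone must hold identically. This is a minimal sketch under assumptions: the class `ZoneParameters`, its field names and the "every field must match" rule are hypothetical and are not Corda’s actual API.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/**
 * Hypothetical sketch of the parameters that nodes in one compatibility zone must share.
 * Field names are illustrative; they are not Corda's actual API.
 */
final class ZoneParameters {
    final String rootCaFingerprint; // identifies the root identity certificate authority
    final Set<String> notaries;     // at least one notary, distributed or centralised
    final String networkMapUrl;     // where the zone's network map is published
    final Duration eventHorizon;    // how long a node may be offline before pending messages can be cleared

    ZoneParameters(String rootCaFingerprint, Collection<String> notaries,
                   String networkMapUrl, Duration eventHorizon) {
        if (notaries.isEmpty()) {
            throw new IllegalArgumentException("A compatibility zone needs at least one notary");
        }
        this.rootCaFingerprint = Objects.requireNonNull(rootCaFingerprint);
        this.notaries = Collections.unmodifiableSet(new HashSet<>(notaries));
        this.networkMapUrl = Objects.requireNonNull(networkMapUrl);
        this.eventHorizon = Objects.requireNonNull(eventHorizon);
    }

    /** In this sketch, two nodes can talk only if they agree on every shared parameter. */
    boolean compatibleWith(ZoneParameters other) {
        return rootCaFingerprint.equals(other.rootCaFingerprint)
                && notaries.equals(other.notaries)
                && networkMapUrl.equals(other.networkMapUrl)
                && eventHorizon.equals(other.eventHorizon);
    }

    /** Pending messages for a peer may be cleared once it has been offline longer than the event horizon. */
    boolean mayClearPendingMessagesFor(Duration offlineFor) {
        return offlineFor.compareTo(eventHorizon) > 0;
    }
}
```

Nothing in the object grants its operator any power over the nodes that use it: it only records what they must agree on in order to communicate.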
Corda also currently has a little-known concept with the vague name of “services”. A service is just a string that a node advertises via the network map, which means it is effectively broadcast to the whole network. This is similar to the equivalent concept in the Bitcoin protocol, although Bitcoin uses bit flags rather than strings.
Our thinking on both these things has evolved over the past few months. Here’s where we want to go:
- What was previously called a Corda “network” will start being called a “compatibility zone”, reflecting more accurately its purpose of ensuring nodes can talk and trade with each other.
- Corda “advertised network services” will be deleted before 1.0 and replaced with business networks in a future release. A business network is a set of nodes that have been authorised to use an application by some entity that can sign certificates. Apps can check that peers are in the same business network before proceeding to talk to them.
The goals of these proposed changes are to address the following issues:
- The word “network” has many complex connotations, and in particular business owners tend to assume that “joining a network” or “owning the network” has serious implications. However, there is really no more business benefit in running a compatibility zone than there is in creating your own replacement for the internet’s root DNS servers. It’s helpful if everyone shares the same set, but it doesn’t give the operators much power (if anything it’s a liability, as it takes work to maintain). Zones are thus a technical concept: nodes need to agree on some basic parameters in order to communicate, but that agreement has little strategic impact in a decentralised system.
- The previous concept of a network service wasn’t well thought out. It was imported from Bitcoin without much customer requirements analysis or design work, simply to unblock other, higher-priority work. Because it wasn’t subject to a thorough design document process, and because the concept is vague to begin with, it has been used in inconsistent ways, which in some cases has led to obscure bugs.
- Unauthenticated broadcast strings are a poor fit for what people usually want to do: create gated communities with some sort of entrance and exit procedure, where there is no risk of some random firm you’ve never heard of initiating interactions with you. That is, installing an app for managing interest rate swaps doesn’t necessarily mean you want to trade derivatives with a one-man operation on the other side of the world that also uses Corda and happens to have a copy of the app. For example, you might want to trade only with entities that have agreed beforehand to some shared (paper) contract governing their behaviour.
- Network services reveal who runs an app, which may sometimes be commercially sensitive information.
As the notion of a business network encompasses authorisation by an identified owner, it makes sense to also use PKIX for this. The owner of a network can be a certificate authority that signs a copy of the node’s identity certificate, and ejection from the network can be implemented through certificate revocation. Apps that wish to restrict themselves to members of a business network can invoke a subflow at the start of their flows that challenges the counterparty to sign with its business network certificate. This can be made largely transparent to app developers, who would need to write only a single extra line of code to enforce membership. A separate kit can be created to let business network owners manage membership, e.g. one based on a CA-in-a-box. Care must be taken that revocation checks do not reveal trading relationships to the owner of the business network. RPCs can be added to filter the list of all known peer identities by the networks whose challenges they have passed.
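The challenge described above can be sketched with nothing more than the JDK’s signature APIs. This is an illustration under simplifying assumptions, not how Corda implements it: instead of real X.509 certificates and revocation lists, a “grant” is just the owner’s ECDSA signature over a member’s public key, and revocation is a set of key identifiers that the owner publishes as a whole, so that checking a peer does not tell the owner who you trade with. All class names (`Crypto`, `MembershipGrant`, `BusinessNetworkOwner`, `MembershipChallenge`) are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;
import java.util.Base64;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Small helpers over the JDK's ECDSA support; checked exceptions are wrapped for brevity. */
final class Crypto {
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("EC");
            gen.initialize(256);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
    }

    static byte[] sign(PrivateKey key, byte[] data) {
        try {
            Signature s = Signature.getInstance("SHA256withECDSA");
            s.initSign(key);
            s.update(data);
            return s.sign();
        } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
    }

    static boolean verify(PublicKey key, byte[] data, byte[] signature) {
        try {
            Signature s = Signature.getInstance("SHA256withECDSA");
            s.initVerify(key);
            s.update(data);
            return s.verify(signature);
        } catch (GeneralSecurityException e) { return false; } // any failure counts as a failed check
    }

    static String id(PublicKey key) { return Base64.getEncoder().encodeToString(key.getEncoded()); }
}

/** Stand-in for a business network certificate: the owner's signature over a member's public key. */
final class MembershipGrant {
    final PublicKey memberKey;
    final byte[] ownerSignature;

    MembershipGrant(PublicKey memberKey, byte[] ownerSignature) {
        this.memberKey = memberKey;
        this.ownerSignature = ownerSignature.clone();
    }
}

/** The business network owner, acting as a minimal certificate authority. */
final class BusinessNetworkOwner {
    private final KeyPair keys = Crypto.newKeyPair();
    private final Set<String> revoked = new HashSet<>();

    PublicKey publicKey() { return keys.getPublic(); }

    MembershipGrant admit(PublicKey member) {
        return new MembershipGrant(member, Crypto.sign(keys.getPrivate(), member.getEncoded()));
    }

    void eject(PublicKey member) { revoked.add(Crypto.id(member)); }

    /** Published as a whole, like a revocation list, so per-peer checks stay local to the verifier. */
    Set<String> revocationList() { return Collections.unmodifiableSet(new HashSet<>(revoked)); }
}

/** What the challenging side of the hypothetical membership subflow does. */
final class MembershipChallenge {
    private static final SecureRandom RANDOM = new SecureRandom();

    static byte[] newNonce() {
        byte[] nonce = new byte[32];
        RANDOM.nextBytes(nonce);
        return nonce;
    }

    static boolean verify(PublicKey ownerKey, Set<String> revocationList,
                          MembershipGrant grant, byte[] nonce, byte[] response) {
        return Crypto.verify(ownerKey, grant.memberKey.getEncoded(), grant.ownerSignature) // admitted by the owner
                && !revocationList.contains(Crypto.id(grant.memberKey))                   // not ejected since
                && Crypto.verify(grant.memberKey, nonce, response);                        // holds the member key now
    }
}
```

A counterparty passes only if all three checks hold: the owner admitted its key, the key is not on the published revocation list, and it can sign a fresh nonce right now. The fresh nonce is what stops a recorded response from being replayed by someone else.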
Because Corda 1.0 won’t ship with business networks, app developers can simulate them, in a less private and less scalable way, by checking the counterparty identity against a remote server (e.g. via HTTP or LDAP) or against a fixed, downloaded list of members.
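The stop-gap of checking against a fixed, downloaded list of members might look like the following minimal sketch. The file format (one name per line, `#` for comments), the class `StaticMemberList` and its methods are assumptions for illustration. Names are compared as exact strings, so a real version would need to normalise X.500 names before comparing them.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stop-gap business network: a fixed list of member names downloaded out of band. */
final class StaticMemberList {
    private final Set<String> members;

    private StaticMemberList(Set<String> members) {
        this.members = Collections.unmodifiableSet(members);
    }

    /** Parses one member name per line; blank lines and lines starting with '#' are ignored. */
    static StaticMemberList parse(String text) {
        Set<String> names = new HashSet<>();
        for (String line : text.split("\\R")) {
            String name = line.trim();
            if (!name.isEmpty() && !name.startsWith("#")) {
                names.add(name);
            }
        }
        return new StaticMemberList(names);
    }

    boolean isMember(String counterpartyName) {
        return members.contains(counterpartyName);
    }

    /** Intended to run at the start of a flow, before any business data is exchanged. */
    void requireMember(String counterpartyName) {
        if (!isMember(counterpartyName)) {
            throw new SecurityException("Not a member of this business network: " + counterpartyName);
        }
    }
}
```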
Flow upgrades
Modifications to the global ledger usually take place in the context of workflows, which is why Corda places so much emphasis on them. On other platforms you have to build your own inter-firm workflows and/or model them entirely on-ledger, but Corda has a dedicated framework for this. When flow code is upgraded, we need a way to deploy that upgrade on a live system.
In Corda 1.0, flows are intended to be relatively short-lived. They coordinate low-level operations like gathering signatures, verifying that a payment is expected, propagating errors and so on. Most flows last less than a second if no counterparties are offline. Even once we add support for human interaction and for blocking on other MQ-based services, lifespans will be at most days, e.g. over a weekend.
Given the short lifespan of many flows, writing logic to interrupt a flow that’s in flight and rearrange its internals so that it can continue with the new code doesn’t always represent a good use of scarce engineering hours. Instead, you could keep two copies of the flow: the old code and the new. The old flows are left alone until they finish, and requests to start new flows go to the new code. We’re nearly there with this: we just need a way for a flow to claim session starts for a string that isn’t its own class name (i.e. the old class name), which is easy.
But then you have to keep copying your code around inside your app and have a plan for when the old code is deleted, which isn’t really ideal. Sometimes, especially for low traffic apps, maybe you just want to edit your code, restart the node and keep it simple.
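The two-copies approach hinges on letting the new flow claim session starts addressed to the old class name. A minimal sketch of that routing follows; `SessionStartRegistry` and the `Responder` interface are hypothetical stand-ins, not Corda’s real flow registration API. Flows already in flight keep the object they were started with, while later session starts for the old name resolve to the new code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical responder-side flow: handles messages from an initiating peer. */
interface Responder {
    String handle(String message);
}

/** Hypothetical registry mapping the name carried in a session-start message to the code that answers it. */
final class SessionStartRegistry {
    private final Map<String, Supplier<Responder>> handlers = new HashMap<>();

    /** Normal registration: a flow answers session starts that name its own class. */
    void register(Class<? extends Responder> flowClass, Supplier<Responder> factory) {
        handlers.put(flowClass.getName(), factory);
    }

    /** The missing piece: let a flow claim starts for another name, e.g. its predecessor's class name. */
    void claim(String sessionStartName, Supplier<Responder> factory) {
        handlers.put(sessionStartName, factory);
    }

    /** Called when a peer opens a session; creates a fresh responder from whatever is registered now. */
    Responder onSessionStart(String sessionStartName) {
        Supplier<Responder> factory = handlers.get(sessionStartName);
        if (factory == null) {
            throw new IllegalArgumentException("No flow registered for " + sessionStartName);
        }
        return factory.get();
    }
}
```

Because the registry only decides which code handles *new* sessions, nothing here touches a flow that is already running; the old copy simply stays on the classpath until its last instance finishes.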
To make this safe we need a new feature: flow drains. A drain is when you instruct your node, via RPC or the shell, to divert session-start messages for the flow types you’re trying to drain (e.g. com.acme.foobarapp.*) to an alternative durable queue. Whilst an app is draining, existing flows continue to run, but new starts are suspended (the other side will, of course, wait). Once the drain is complete there are no outstanding checkpoints or executing flows of that type, and it is safe to shut down the node, swap out the app for the new version, restart it and undrain. Undraining reroutes the diverted session-start messages back onto the node’s main message queue, where processing starts against the new version. Of course, the new version of the flow must be compatible with the older message sequences it may receive from other peers on the network, but anything else it does may have changed by this point.
This feature isn’t likely to be a lot of work, but we’d need to make the administrator UI and shell print prominent warnings whenever a drain is in effect, so that admins can’t forget to undrain.
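The drain lifecycle described above (divert, wait for in-flight flows, swap the app, undrain) can be modelled in a few dozen lines. This is a minimal in-memory sketch under stated assumptions: `FlowDrainController` and `SessionStart` are hypothetical names, the pattern syntax supports only an exact name or a trailing `.*` prefix, and plain deques stand in for the durable queues a real node would need so that diverted starts survive the restart. The status banner corresponds to the prominent warning the shell and admin UI should show.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** A session-start message naming the flow the initiating peer wants to talk to. */
final class SessionStart {
    final String flowName;
    SessionStart(String flowName) { this.flowName = flowName; }
}

/** Hypothetical in-memory model of a flow drain; real queues would have to be durable. */
final class FlowDrainController {
    private final Deque<SessionStart> mainQueue = new ArrayDeque<>();
    private final Deque<SessionStart> divertedQueue = new ArrayDeque<>();
    private final Map<String, Integer> liveFlows = new HashMap<>(); // flow name -> running or checkpointed count
    private String drainPattern; // e.g. "com.acme.foobarapp.*"; null when no drain is in effect

    boolean isDraining() { return drainPattern != null; }

    private boolean matches(String flowName) {
        if (drainPattern == null) return false;
        if (drainPattern.endsWith(".*")) {
            // "com.acme.foobarapp.*" matches anything starting with "com.acme.foobarapp."
            return flowName.startsWith(drainPattern.substring(0, drainPattern.length() - 1));
        }
        return flowName.equals(drainPattern);
    }

    void startDrain(String pattern) {
        drainPattern = pattern;
        // Starts that arrived earlier but have not been picked up yet are diverted as well.
        for (Iterator<SessionStart> it = mainQueue.iterator(); it.hasNext(); ) {
            SessionStart msg = it.next();
            if (matches(msg.flowName)) {
                divertedQueue.add(msg);
                it.remove();
            }
        }
    }

    /** New starts for drained flow types are parked; the initiating peer simply waits. */
    void onSessionStart(SessionStart msg) {
        if (matches(msg.flowName)) divertedQueue.add(msg); else mainQueue.add(msg);
    }

    /** The node takes the next start off the main queue and begins running that flow. */
    SessionStart nextStart() {
        SessionStart msg = mainQueue.poll();
        if (msg != null) liveFlows.merge(msg.flowName, 1, Integer::sum);
        return msg;
    }

    /** A flow finished and its checkpoint was removed. */
    void flowFinished(String flowName) {
        liveFlows.computeIfPresent(flowName, (name, count) -> count > 1 ? count - 1 : null);
    }

    /** True once nothing matching the drain is running or checkpointed, so the app can be swapped. */
    boolean drainComplete() {
        return isDraining() && liveFlows.keySet().stream().noneMatch(this::matches);
    }

    /** Move parked starts back onto the main queue, in arrival order, for the new app version. */
    void undrain() {
        while (!divertedQueue.isEmpty()) mainQueue.add(divertedQueue.poll());
        drainPattern = null;
    }

    /** Shown prominently by the shell and admin UI so nobody forgets to undrain. */
    String statusBanner() {
        if (!isDraining()) return "";
        return "WARNING: drain in effect for " + drainPattern + " (" + divertedQueue.size() + " session starts held)";
    }
}
```

Starts already waiting on the main queue when the drain begins are diverted too; otherwise the old code could pick one up after the drain appeared to be complete.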
Bundled JVMs and Java 9
We do not have any particular timeframe in mind for upgrading to Java 9. We appreciate that some organisations have policies around when they upgrade Java versions.
Whilst we currently ship Corda as a set of ordinary JARs and ask users to bring their own JVM to the party, that is likely to change soon after 1.0: we expect to provide native packages for each platform we formally support in production (e.g. tarballs or deb/rpm packages on Linux, other platforms TBD). The native package will bundle its own JVM along with any native files required, such as SGX enclaves. Bundling the JVM has the following advantages:
- Because CorDapps are at least partially shared between the nodes that use them, app developers would end up mandating the Java versions their apps require anyway. The worst case is two apps that specify incompatible constraints on the JVM version, e.g. due to bug fixes and support policies. By pinning the JVM version to the Corda platform version ourselves, we give app developers clarity about exactly which runtime to expect.
- The deterministic JVM strategy requires some small JVM patches in a few places. Our UI tools like DemoBench and Explorer use the JetBrains patchset on top of the JDK because they benefit from bug fixes that aren’t yet shipped by Oracle upstream.
- We can test more effectively and give better quality assurances when we know the runtime version.
Java 9 introduces at least three features that are strategic for Corda, so at some point, probably next year, we will produce a migration plan so that developers know when they can start benefiting from them.
- Jigsaw modules. As Corda is an app container, the module system will help with dependency management for CorDapps and avoid issues where different apps have conflicting dependencies. Until then, we are doing what most app containers do and rolling our own solution with classloaders.
- Direct stack access. The Quasar engine that powers the flow framework currently requires @Suspendable annotations on methods that might be on the stack when a checkpoint occurs. This is annoying to remember, especially as the framework is otherwise mostly transparent. In Java 9 the need for this goes away due to JVM improvements. There are also other improvements going into Java 9 specifically to support Quasar fibers (see my previous blog post).
- The new JVMCI interface, as used by the Graal compiler, may be important for future research work. We may discuss this at a later time.
Your feedback is, as always, tremendously welcome.