Advanced app interop on Corda

March 13, 2019

More detail on how apps can work together

Corda 4 introduces new app interop features, like the ‘class carpenter’

A big difference between Corda and other industrial blockchain platforms is the extent to which Corda prioritises different apps working together. This may not seem like a big deal: nearly everyone building platforms claims to care about flexibility, interoperability and the like.

But dig a little deeper and what we see is that allowing apps to be mixed’n’matched on a global ledger is difficult; it requires many different features and design decisions to come together at just the right moments. A typical way to deploy blockchain technology in the field is thus via single-use networks. Although this makes it a bit easier to get to production, and certainly simplifies life for platform designers, it means you lose many of the benefits of DLT!

In this article I’m going to explore just one or two of these interacting features in Corda, with an illustrative example of how several apps in the same rough business area can implement shared standards so generic tools can work with all of them at once. The goal is to describe design patterns that application developers can adopt.

Illustrative scenario

We’ll use insurance contracts as an example. I should start by observing that I’ve never worked in insurance and am very far from an insurance domain expert, so if you do have insurance expertise please accept my apologies for mangling your industry terminology 🙂

Insurance is interesting because there are two major industry consortiums building on Corda: b3i and the RiskBlock Alliance. These two cover the bulk of the industry, meaning insurance is betting on Corda for getting the benefits of blockchain technology. Perhaps in future other people will also write insurance-oriented apps.

Insurance is a complex domain and thus app developers may create data models for the same industry concepts in different ways, with different features depending on prioritisation and regional differences. But some things are common to all insurance activities. It would be nice if contracts managed by software from both organisations were to some extent compatible, so tools designed for people who don’t care about the details can still do useful things like trade them, put some basic information into user interfaces, sum their values and so on.

In other words, we’d like to be able to write software to the common subsets on which everyone can agree, without needing multiple different codepaths for each possible industry consortium. Abstracting over different implementations is of course a common task in programming.

Standards as JVM interfaces

In the insurance world there is a standards body called ACORD. Let’s imagine that through an ACORD working group app developers realise there are at least a few things they agree on. For instance, maybe every insurance contract has both an insurer and policy holder and they can always be represented at minimum as a piece of text, ignoring all other structured data (e.g. let’s say the actual representation of the identity may or may not always be a Corda Party object here).

We can represent this idea in Java using a simple interface:

package org.acord.corda.std;

public interface InsuranceContract {
    String getInsurerName();
    String getPolicyHolderName();
}

This is way overly simplified, but you get the picture.

The Java world has lots of standards like this, albeit most of them aren’t industry specific. Many of them are organised through the Java Community Process (JCP), and each standard is given a number. One that is industry specific is JSR-354, the money and currency API. Java Specification Requests have working groups, and they produce bits of software as outputs (i.e. JAR files), but those bits of software don’t do anything useful directly. They exist only to expose precisely defined APIs to the JVM for implementors to follow, so programs that don’t care which implementation is used can be written in terms of the JSR interface names instead of concrete class names.

This model works well and is familiar to developers. We work with such interfaces all the time.

Standards for tokens

I’m using insurance as an example here to emphasise that all this can be done independently of changes in Corda, but there’s one area where we expect to see lots of different apps that all have a similar shape — tokens. We’re developing a “tokens SDK” to codify various best practices and features for tokens on Corda, provide a higher level API than what the core provides and build a more industrial strength replacement for the finance CorDapp that has been a part of the project since day one.

At the moment, the exact scope of the tokens SDK is in flux. The extent to which it defines shared standards vs shared code is an open question.

Implementing standard interfaces on Corda

Corda apps have two ‘sides’ — the on ledger code and the rest. In Corda 4 we’ve begun recommending and implementing a design pattern whereby you split your app into a contracts JAR that contains on ledger code, and a workflows JAR that contains the rest. On ledger code is attached to ledger transactions, and thus copied around on demand, alongside the transaction data itself.

Every Corda transaction can define a set of attachments — JARs referred to by their hash. The attachments are combined together into a unified classpath and thus these JARs can build on each other in the usual way. In Corda 4 we also started using the signing feature of JAR files, to ensure that Java package namespaces can be claimed by their rightful owner (defined as whoever controls the equivalent DNS name). When used properly this means only people genuinely authorised by the ACORD group can publish code under its name: this is very useful for security in a ledger environment with untrusted counterparties.

In Corda 4 we also introduced some powerful new features in our serialisation engine: calculated properties and class synthesis. They’ll help with this scenario and we’ll get to them in a moment.

This gives us the building blocks of what we need. A working group can define a set of standard APIs in the form of JVM interfaces, produce a JAR file containing them named using the website of the standards body, create a key for signing it to give it a strong identity/meaning, and publish it. This JAR can then be made a dependency of any CorDapp that wishes to implement this API in the usual way, by adding it to the app’s build.gradle file. NB: There are some details we’re still working on around how exactly dependencies are expressed to ensure the set of attachments ends up being correct, but those are a bit low level for this article — as we finish off the last details around multi-CorDapp interactions, we’ll publish more documentation covering exactly how to do it.

Now the concrete code provided by each application vendor can implement shared interfaces on ledger. For instance, the implementation of the getInsurerName method can read and format a name from the data model, for example by formatting a Party object to a string, or using some other app-specific identifier.
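To make that concrete, here is a minimal sketch in plain Java of one vendor implementing the shared interface. The interface is repeated so the example stands alone. VendorAPolicy and all of its fields are hypothetical, and a real CorDapp state would also implement Corda's own state interfaces, which are left out here to keep the example free of dependencies.

```java
// The shared standard, as it might appear in the working group's JAR.
interface InsuranceContract {
    String getInsurerName();
    String getPolicyHolderName();
}

// Hypothetical vendor state: the names are not stored as single strings,
// they are derived from the vendor's own, richer data model.
final class VendorAPolicy implements InsuranceContract {
    private final String insurerOrganisation;
    private final String insurerCountry;
    private final String holderGivenName;
    private final String holderFamilyName;

    VendorAPolicy(String insurerOrganisation, String insurerCountry,
                  String holderGivenName, String holderFamilyName) {
        this.insurerOrganisation = insurerOrganisation;
        this.insurerCountry = insurerCountry;
        this.holderGivenName = holderGivenName;
        this.holderFamilyName = holderFamilyName;
    }

    // Formats the structured insurer identity as text for generic tools.
    @Override
    public String getInsurerName() {
        return insurerOrganisation + " (" + insurerCountry + ")";
    }

    // Joins the structured name parts into the single string the standard asks for.
    @Override
    public String getPolicyHolderName() {
        return holderGivenName + " " + holderFamilyName;
    }
}
```

Code written against InsuranceContract never needs to know that VendorAPolicy keeps the name in separate parts; another vendor could store one string, or a reference to an identity object, and implement the same two methods.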

What does that mean exactly and how does it benefit us?

Generic vault queries

Imagine we want to write a tool that knows how to work with a vault filled with insurance contracts we’ve purchased. The tool does nothing especially fancy; maybe it produces a nice Excel file that summarises the vault and emails it to someone. It could use a library like Apache POI to do this.

The simplest way is to write an app that has every possible insurance CorDapp as a dependency, along with the shared interface JAR defined by the ACORD working group. The code in the tool can now issue a vault query for any state that implements the standardised org.acord.corda.std.InsuranceContract interface and read the details of the participants as strings. The node will query the database for any state that implements the correct type.

If any new app is introduced that also defines insurance contracts it just has to declare that it conforms to that API and be included in the reporting tool’s dependency list. If the tool needs to be extended to extract details that aren’t standardised, no problem — just cast the state object to the vendor specific type and continue.
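The shape of such a tool can be sketched in plain Java. This is only an illustration: the in-memory list stands in for the results of a vault query, and VendorAPolicy, VendorBCover, VaultReport and the sum insured field are all hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

interface InsuranceContract {
    String getInsurerName();
    String getPolicyHolderName();
}

// Hypothetical state from one vendor, with an extra non-standard field.
final class VendorAPolicy implements InsuranceContract {
    private final String insurer;
    private final String holder;
    private final long sumInsured;

    VendorAPolicy(String insurer, String holder, long sumInsured) {
        this.insurer = insurer;
        this.holder = holder;
        this.sumInsured = sumInsured;
    }

    @Override public String getInsurerName() { return insurer; }
    @Override public String getPolicyHolderName() { return holder; }
    long getSumInsured() { return sumInsured; }
}

// Hypothetical state from a second vendor that only offers the standard view.
final class VendorBCover implements InsuranceContract {
    private final String insurer;
    private final String holder;

    VendorBCover(String insurer, String holder) {
        this.insurer = insurer;
        this.holder = holder;
    }

    @Override public String getInsurerName() { return insurer; }
    @Override public String getPolicyHolderName() { return holder; }
}

final class VaultReport {
    // Produces one report row per insurance contract, ignoring unrelated states.
    static List<String> summarise(List<?> vaultStates) {
        List<String> rows = new ArrayList<>();
        for (Object state : vaultStates) {
            if (!(state instanceof InsuranceContract)) continue;
            InsuranceContract contract = (InsuranceContract) state;
            String row = contract.getInsurerName() + " -> " + contract.getPolicyHolderName();
            // Optional vendor-specific detail: cast only when the concrete type is known.
            if (state instanceof VendorAPolicy) {
                row += " [sum insured: " + ((VendorAPolicy) state).getSumInsured() + "]";
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Only the branch that casts to VendorAPolicy depends on a specific vendor. Everything else is written against the shared interface, so adding a third vendor needs no change to the generic path.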

On the fly class synthesis

It’s unfortunate that our little reporting tool actually needs a copy of every possible insurance CorDapp, or at least the ‘on ledger’ contract JARs. It’d be nice if we could read data from the ledger without needing local copies of all the apps.

The standard way to do this is to just strip away the whole static type system and download data from the vault in a generic form. This is the usual way blockchain and database systems work — they define a very basic dynamic type system and encoding for those types such as JSON, XML or SQL’s set of varchar, blob, timestamps etc.

That works, and you can do it in Corda by making your states implement QueryableState, which activates Corda’s support for automatically mapping ledger state to ordinary tables in your RDBMS. Then you can issue SQL queries directly, or use your favourite JPA compatible ORM. Every developer who knows how to work with databases knows how to use these technologies.

But another way is to stay within the Corda and Java type system by exploiting a novel feature of the Corda 4+ RPC subsystem: automatic class synthesis, combined with calculated properties. This gives you all the convenience and safety you get from the original approach of just having all the apps on your classpath, without actually needing them.

Corda’s wire protocol uses the AMQP/1.0 serialisation format. This format is flexible. It defines a compact binary encoding for data structures that’s similar to Google protocol buffers, and also defines a schema language. We then extend AMQP with a way to encode those schemas in the binary format itself, thus enabling messages to be compactly self describing, and define a 1:1 mapping of the JVM type system into the AMQP/1.0 type system. This is straightforward because the AMQP type system is quite rich and there are obvious equivalents for most JVM concepts.

So in Corda 4 we provide a new subsystem in the deserialisation engine, the so-called class carpenter. In normal object serialisation frameworks, if the deserialiser can’t find a Java class on the classpath that fits the data it’s loading from the wire then it will throw an exception and bail out. In Corda 4 it will instead use the schema data in the binary message data to construct a synthetic class on the fly, using bytecode generation and a custom classloader. This synthetic class has getter methods for each property that was found in the schema, and — crucially — implements any interfaces that the original object implemented if they can be found.

We call this subsystem the carpenter because these synthetic objects are reminiscent of the fake buildings movie studios created in the desert for spaghetti westerns — from the outside they look like real buildings, but if you walk through the door there’s nothing there. These synthetic objects are similar: they look like the original objects, but without any of the brains of the original: all you can do with them is extract data.
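The idea can be imitated in a few lines of plain Java. Note the substitution: this sketch uses a JDK dynamic proxy, not the bytecode generation and custom classloader the carpenter uses, and it takes a ready-made map of property values where the real thing reads a schema from the binary message. ToyCarpenter is a hypothetical name; nothing here is Corda API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.LinkedHashMap;
import java.util.Map;

interface InsuranceContract {
    String getInsurerName();
    String getPolicyHolderName();
}

final class ToyCarpenter {
    // Builds a "film set" object: it implements the interface, but its getters
    // only hand back values that were decoded from the wire.
    static <T> T synthesise(Class<T> iface, Map<String, Object> wireProperties) {
        Map<String, Object> data = new LinkedHashMap<>(wireProperties);
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            // equals, hashCode and toString are also routed through the handler.
            if (method.getDeclaringClass() == Object.class) {
                switch (name) {
                    case "equals": return proxy == args[0];
                    case "hashCode": return System.identityHashCode(proxy);
                    default: return "Synthetic" + iface.getSimpleName() + data;
                }
            }
            // getInsurerName -> property "insurerName"
            if (name.startsWith("get") && name.length() > 3 && method.getParameterCount() == 0) {
                String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                if (data.containsKey(property)) return data.get(property);
            }
            // There is no behaviour behind the facade, only data.
            throw new UnsupportedOperationException(name + " is not backed by any data");
        };
        Object synthetic = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(synthetic);
    }
}
```

A caller holding the result sees an ordinary InsuranceContract and can call getInsurerName() on it, even though no vendor class was ever loaded.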

Now, the getInsurerName and getPolicyHolderName methods don’t necessarily correspond to any actual field in the underlying states. They are probably what we call calculated properties — their value is likely derived on the fly from other data. If we have all the apps locally then of course, the data will be re-derived inside the RPC client app by just using the app code directly. But if we don’t have the apps because we’re relying on the carpenter, the data has to come from somewhere. We can mark the interface properties @SerializableCalculatedProperty in order to ensure their values are calculated and stored to binary when the state object is serialised. If the app code is present on the classpath when deserialisation occurs the stored values will be ignored and recalculated, but if the carpenter is in use, the stored values will be returned from the synthetic interface methods.
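A toy version of the write side may help here. In this sketch the local @CalculatedProperty annotation is a hypothetical stand-in for Corda's @SerializableCalculatedProperty, and ToySerialiser is an invented helper that simply evaluates annotated getters and stores the results in a map; the real serialiser writes them into the AMQP stream next to the ordinary fields.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for Corda's annotation, so the sketch has no dependencies.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface CalculatedProperty {}

interface InsuranceContract {
    @CalculatedProperty String getInsurerName();
    @CalculatedProperty String getPolicyHolderName();
}

// Hypothetical vendor state: neither name exists as a stored field.
final class VendorAPolicy implements InsuranceContract {
    private final String insurerOrganisation;
    private final String holderGivenName;
    private final String holderFamilyName;

    VendorAPolicy(String insurerOrganisation, String holderGivenName, String holderFamilyName) {
        this.insurerOrganisation = insurerOrganisation;
        this.holderGivenName = holderGivenName;
        this.holderFamilyName = holderFamilyName;
    }

    @Override public String getInsurerName() { return insurerOrganisation; }
    @Override public String getPolicyHolderName() { return holderGivenName + " " + holderFamilyName; }
}

final class ToySerialiser {
    // Evaluates every annotated getter on the directly implemented interfaces
    // and records the result under its property name.
    static Map<String, Object> calculatedProperties(Object state) {
        Map<String, Object> stored = new TreeMap<>();
        for (Class<?> iface : state.getClass().getInterfaces()) {
            for (Method getter : iface.getMethods()) {
                String name = getter.getName();
                boolean eligible = getter.isAnnotationPresent(CalculatedProperty.class)
                        && name.startsWith("get") && name.length() > 3
                        && getter.getParameterCount() == 0;
                if (!eligible) continue;
                String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                try {
                    stored.put(property, getter.invoke(state));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Could not evaluate " + name, e);
                }
            }
        }
        return stored;
    }
}
```

The resulting map is exactly what a reader without the vendor code needs: values for insurerName and policyHolderName that it could not have computed itself.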

Why do things in this odd way? There are several advantages.

First, these generic structures are ‘plain old Java objects’ (or POJOs as they are sometimes known), which means they can be fed into any framework that uses reflection. For instance you can feed them straight into a JSON, XML or Yaml serialiser. This is how the new “blob inspector” tool works. You could also auto-generate GUIs from them. We could have defined our own generic variant types like XML’s Element, or Jackson’s JsonNode but Java reflection is a 20 year old API and lots of tools can use it already, so it makes sense to reuse it.
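As a rough illustration of why reflection-friendly objects are convenient, the sketch below turns any object with JavaBeans-style getters into a JSON-like string using nothing but java.lang.reflect. ReflectiveJson and PolicyView are hypothetical names, every value is written as a string, and a real tool would use an established JSON library instead of this hand-rolled writer.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical plain object with JavaBeans-style getters.
final class PolicyView {
    private final String insurerName;
    private final String policyHolderName;

    PolicyView(String insurerName, String policyHolderName) {
        this.insurerName = insurerName;
        this.policyHolderName = policyHolderName;
    }

    public String getInsurerName() { return insurerName; }
    public String getPolicyHolderName() { return policyHolderName; }
}

final class ReflectiveJson {
    // Collects property values by calling every public zero-argument getter.
    static Map<String, Object> properties(Object bean) {
        Map<String, Object> out = new TreeMap<>(); // sorted for stable output
        for (Method m : bean.getClass().getMethods()) {
            String name = m.getName();
            boolean getter = name.startsWith("get") && name.length() > 3
                    && m.getParameterCount() == 0
                    && m.getReturnType() != void.class
                    && m.getDeclaringClass() != Object.class; // skips getClass()
            if (!getter) continue;
            try {
                out.put(Character.toLowerCase(name.charAt(3)) + name.substring(4), m.invoke(bean));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not read " + name, e);
            }
        }
        return out;
    }

    // Renders the properties as a flat JSON object with string values only.
    static String toJson(Object bean) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> e : properties(bean).entrySet()) {
            json.add(quote(e.getKey()) + ":" + quote(String.valueOf(e.getValue())));
        }
        return json.toString();
    }

    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

The code never names PolicyView, so it would work just as well on an object whose class was generated at runtime, as long as that class exposes the same kind of getters.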

Secondly, because these classes implement named interfaces that are on the classpath you can cast them to those interface types and work with them as if they are real. If the interface contains methods that aren’t property getters, those will be implemented too but just throw an exception if called.

Thirdly, these types are exactly the kind of classes that scripting engines want to use. By using the widely recognised JavaBeans conventions, auto-generated types can be fed directly into dynamically typed scripting languages that don’t care about type safety much anyway, which radically simplifies development and deployment. I gave a demo of this kind of inter-language interop at CordaCon 2018.

In future we plan to allow annotations to be persisted into the binary schema format and generated classes as well, so the self-describing messages can be extended to customise things like JSON serialisations, labels in auto generated user interfaces and more.

You can watch a tech talk about how class synthesis works here.

As a consequence the author of our hypothetical Excel generating tool has a happy discovery — he can just remove the app dependencies from the tool entirely and assuming he hasn’t written any app specific code, everything will transparently still work. The tool needs no adjustments to make it generic, at all!

Conclusion & caveats

There are many, many details involved in scaling an ecosystem to thousands or millions of interoperable apps. By building on standard Java design patterns, and bringing them to the on-ledger blockchain context, Corda enables app developers to do things in the normal way and have it mostly just work.

I say “mostly” because I’d like to finish with a small warning. Corda’s deep Java integration can make it seem like you can do anything on ledger that you’d normally do in Java. But it isn’t so. A distributed ledger is still an unusual environment with its own rules and quirks. Whilst we don’t currently enforce this, in future contract JARs will be run inside our Deterministic JVM. And in current versions of Corda, your code needs to manually attach dependencies of your contract JARs to transactions as they are built, and verify the hashes of dependencies are correct. Until Java 9 introduced the Jigsaw module system the JVM had no knowledge of how JARs related to each other and as Corda still uses Java 8 we inherit that limitation (we’ll resolve it in future versions).

Finally, the ledger is a strange environment in which you may in future encounter code and data created by malicious adversaries. To protect yourself, and to allow seamless cross-network decentralised upgrades, code should be signed and versioned appropriately. These are all areas we have done significant work on in Corda 4, but sophisticated design patterns and dependency structures are best avoided until we’ve had time to fully document and describe how to use them. Until then — keep it simple!

Advanced app interop on Corda was originally published in Corda on Medium, where people are continuing the conversation by highlighting and responding to this story.