Integration with internal systems

Mike Hearn

Mike Hearn is a Lead Platform Engineer at R3. Before working on Corda, he was a Bitcoin developer and a senior software engineer at Google, where he worked on Maps, Earth, Gmail, bot detection and account security.

Any developer working with a distributed ledger of any kind faces the question of how to integrate this new-fangled database with internal systems. By systems, I mean both software and human-oriented business processes. As a decentralised database is not a “company in a box” no matter how sophisticated it is, such integration is inevitable and not a matter of legacy vs new systems – ultimately all users will discover a need to connect custom software to the ledger, and have individual employees respond to and enact changes on that ledger. Corda’s design considers this requirement from the start, and here’s our plan to implement it.


New flow features

One of the platform’s key concepts is that of a “flow”. At a technical level flows are continuations that are checkpointed into the underlying node database when they need to wait for an inbound network message. The checkpoint contains not only serialised stack frames but also all objects reachable from that stack, with big “service” objects represented with tokens that are properly wired up when the flow is resumed. As peers buffer messages to offline nodes, you can restart a node and have all activity continue without interruption.
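To make the checkpoint idea concrete, here is a minimal sketch using plain Java serialisation. It is not how Corda implements flows: real flows capture actual stack frames, whereas here a hand-written state object plays that role, and a `transient` field stands in for the token mechanism that re-wires service objects. Every class and method name below is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a big node "service" that must not be serialised.
final class LedgerService {
    String describe() { return "ledger-service"; }
}

// Hypothetical flow state: plain fields play the role of captured stack frames.
final class TradeFlowState implements Serializable {
    private static final long serialVersionUID = 1L;

    int step;                       // where to continue after resumption
    String counterparty;            // ordinary data reachable from the "stack"
    transient LedgerService ledger; // never stored; wired back in on resume

    TradeFlowState(String counterparty, LedgerService ledger) {
        this.counterparty = counterparty;
        this.ledger = ledger;
    }
}

final class CheckpointSketch {
    // Serialise the state, as would happen when a flow waits for a message.
    static byte[] checkpoint(TradeFlowState state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Restore the state after a restart and re-attach the service object.
    static TradeFlowState resume(byte[] blob, LedgerService ledger) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(blob))) {
            TradeFlowState state = (TradeFlowState) in.readObject();
            state.ledger = ledger;
            return state;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("checkpoint could not be restored", e);
        }
    }
}
```

The byte array returned by `checkpoint` is what would be written to the node database; `resume` is what a restarted node would call before continuing the flow from `step`.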

But that’s a low level view that misses the bigger picture – flows are really a way to orchestrate all the actions needed to reach agreement and then change the ledger in some way. The Corda M6 release contains the “two party trade flow” which shows how two parties can build a single transaction that swaps ownership of two different assets, allowing for atomic “delivery vs payment” style trades. It’s a simple example of why flows are useful: this procedure involves quite a bit of back and forth interaction in order to build the final signed transaction … and it doesn’t even implement a negotiation process!

A real trading system built on top of this would have many other requirements. For instance, signing the final transaction might require a human to use their dedicated signing device. There might need to be database lookups, or interactions with entirely different systems to decide what price to offer. Perhaps a ticket with details of the trade needs to be opened against an auditing department. None of this code makes sense to put into the shared CorDapp that defines the protocols and state schemas, as these details are specific to an organisation and involve data that will never even appear on the ledger at all.

Currently, flows may only send and receive messages with other peers on the P2P network. But that is going to change: in 2017 flows will gain the ability to interact with other internal programs as well. By designing the standard flows to have plenty of hooks and overridable (virtual) methods as part of their API, it will become possible to build one CorDapp that extends another, and then redirect the flow implementation from the base app to your plugin. In this way the standard logic used for updating the ledger will also drive the act of updating and checking with your internal systems.
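The extension pattern described here is essentially the template method pattern. The sketch below shows the shape it might take; the class names, hook names and return values are all invented for illustration and are not part of any Corda API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base flow, as it might ship in a shared CorDapp.
class BaseTradeFlow {
    final List<String> log = new ArrayList<>();

    // Shared ledger-update logic with hooks called at fixed points.
    final String run(String asset, long offeredPrice) {
        if (!acceptPrice(asset, offeredPrice)) {
            log.add("rejected " + asset);
            return "REJECTED";
        }
        log.add("signed " + asset);
        onTradeCompleted(asset, offeredPrice);
        return "COMPLETED";
    }

    // Hook: the default accepts any price.
    protected boolean acceptPrice(String asset, long offeredPrice) {
        return true;
    }

    // Hook: the default does nothing.
    protected void onTradeCompleted(String asset, long price) { }
}

// Hypothetical organisation-specific plugin that extends the base app.
class InHouseTradeFlow extends BaseTradeFlow {
    private final long maxPrice;

    InHouseTradeFlow(long maxPrice) {
        this.maxPrice = maxPrice;
    }

    @Override
    protected boolean acceptPrice(String asset, long offeredPrice) {
        return offeredPrice <= maxPrice; // stand-in for an internal pricing lookup
    }

    @Override
    protected void onTradeCompleted(String asset, long price) {
        log.add("audit ticket opened for " + asset); // stand-in for a ticketing call
    }
}
```

Because `run` is final, the ledger-update sequence stays under the control of the base app, while the organisation-specific decisions live entirely in the subclass.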

As flow code is just ordinary JVM bytecode, the most obvious way to do that would be to just use whatever protocols and client libraries you want directly (e.g. an HTTP library to do a REST call, an SMTP library to send an email …). But that wouldn’t integrate with the checkpointing system, meaning that if the node went offline and was restarted, the HTTP request or email send might end up happening twice. That would be undesirable.
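The duplicate side effect is easy to reproduce in miniature. In the sketch below, a list stands in for the mail server and an integer stands in for the last durable checkpoint; both are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplaySketch {
    // Hypothetical stand-in for an external system such as a mail server.
    static final List<String> sentEmails = new ArrayList<>();

    // Runs the flow from the last durable checkpoint and returns the new one.
    // crashAfterSend simulates the node dying after the side effect has
    // happened but before the next checkpoint is written.
    static int runFrom(int checkpoint, boolean crashAfterSend) {
        if (checkpoint == 0) {
            sentEmails.add("trade approved"); // side effect the checkpoint cannot see
            if (crashAfterSend) {
                return 0;                     // progress was never recorded
            }
            checkpoint = 1;                   // checkpoint written after the send
        }
        return checkpoint;
    }
}
```

Calling `runFrom(0, true)` and then `runFrom(0, false)` leaves two entries in `sentEmails`: the restarted flow has no record that the first send took place.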

So instead we are thinking of introducing two new variants on the existing send/receive/sendAndReceive APIs:

  1. Sending to and receiving from plain message queues
  2. Sending to and receiving from people

Behind the scenes, people would be modelled as message queues as well, and thus both of these can be seen as ways for flows to suspend on message queues that aren’t connected to the P2P network. As such it’s a minimal adjustment to the current implementation.
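As a rough illustration of what suspending on a non-P2P queue could look like, the sketch below models a person as a pair of named queues. The API has not been designed yet, so the queue names, the `InternalQueues` class and the explicit state machine are all assumptions; a real flow would be written as straight-line code and suspended by the framework.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

// Hypothetical in-memory stand-in for the node's internal (non-P2P) queues.
final class InternalQueues {
    private final Map<String, Queue<String>> queues = new HashMap<>();

    void send(String queueName, String message) {
        queues.computeIfAbsent(queueName, k -> new ArrayDeque<>()).add(message);
    }

    // An empty result means nothing has arrived yet, so the caller should suspend.
    Optional<String> poll(String queueName) {
        Queue<String> queue = queues.get(queueName);
        return queue == null ? Optional.empty() : Optional.ofNullable(queue.poll());
    }
}

// Hypothetical flow that asks a person for approval and waits for the answer.
final class ApprovalFlow {
    enum Status { NOT_STARTED, WAITING_FOR_PERSON, APPROVED, DECLINED }

    Status status = Status.NOT_STARTED;

    // Advances as far as possible; calling it again later "resumes" the flow.
    Status advance(InternalQueues queues) {
        if (status == Status.NOT_STARTED) {
            queues.send("people/alice/requests", "approve trade?");
            status = Status.WAITING_FOR_PERSON;
        }
        if (status == Status.WAITING_FOR_PERSON) {
            Optional<String> reply = queues.poll("people/alice/replies");
            if (reply.isPresent()) {
                status = reply.get().equals("yes") ? Status.APPROVED : Status.DECLINED;
            }
        }
        return status;
    }
}
```

From the flow's point of view there is no difference between waiting on a person and waiting on a program: both are just queue names.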

A Corda node would then have a series of micro-services connected to it that bridge the flows to internal systems. For example, if you have a system that is most easily used from .NET, you could run a Windows micro-service that connects to the Corda node via MQTT and processes requests one at a time, posting the results back to the labelled queue. If your goal is to notify a human of something that happened, messages can be consumed from the queue and converted into emails.
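The core loop of such a bridge is small. The following sketch uses in-memory queues in place of an MQTT connection and a plain function in place of the internal system, so all names here are hypothetical; a real bridge would block waiting for messages rather than drain and return.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical envelope; the label says which queue the reply belongs to.
final class BridgeMessage {
    final String label;
    final String payload;

    BridgeMessage(String label, String payload) {
        this.label = label;
        this.payload = payload;
    }
}

final class InternalSystemBridge {
    // In-memory stand-ins for the queues a broker would provide.
    final BlockingQueue<BridgeMessage> requests = new LinkedBlockingQueue<>();
    final BlockingQueue<BridgeMessage> replies = new LinkedBlockingQueue<>();
    private final Function<String, String> internalSystem;

    InternalSystemBridge(Function<String, String> internalSystem) {
        this.internalSystem = internalSystem;
    }

    // Handles pending requests strictly one at a time, in arrival order,
    // and returns how many were processed.
    int drain() {
        int handled = 0;
        BridgeMessage request;
        while ((request = requests.poll()) != null) {
            String result = internalSystem.apply(request.payload);
            replies.add(new BridgeMessage(request.label, result)); // reply keeps the label
            handled++;
        }
        return handled;
    }
}
```

Carrying the label through unchanged is what lets the node route each result back to the flow that is suspended waiting for it.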

Sometimes your desired interaction with a person also requires some kind of response. A typical example is when you need a signature only a specific employee (or set of employees) can provide. If that person is sick today, it may take some time for the transaction to become approved. It’s for this kind of reason that the flow framework assists you with checkpointing and resumption – your state will sit cheaply on disk instead of consuming RAM and a thread, waiting for the person to come back to work. Because simple form-filling with support for client-side generated signatures may often be useful, we plan to enable support for these kinds of interactions in the Explorer app. When an existing ticketing system is available, a bridge for that can be easily developed instead.

It’s likely that R3 will provide a series of bridges to common enterprise systems once the base flow APIs are extended.


Database integration

A common desire of app developers is to be able to port parts of their existing software to a distributed ledger whilst preserving all the rest. As such apps are often classical database driven apps, this requirement is supported by Corda’s ability to write states through to a set of tables in a standard relational database engine. Whilst currently only an embedded H2 database is supported, everything is built on standard Java database access technologies, so we anticipate adding support for other external database engines in future (Oracle, Postgres, MS SQL Server etc).

Having access to your state data in relational form makes it easy and quick to analyse using regular SQL, to create views over that data, to use triggers as an alternative to listening via RPC, and to combine on-ledger data with private existing datasets using regular inner or outer joins. From the perspective of the app developer (or porter), the primary difference is thus how writes occur – those cannot be done via SQL and must currently be triggered via Corda’s RPC mechanism. Direct SQL access is convenient but is ultimately not an abstraction: we don’t try to hide the fact that you’re not working with a traditional on-site RDBMS.

Although our support for this is reasonably good, it is – like the rest of Corda – not yet production ready. Amongst other things, our roadmap to productionising this feature includes:

  • Alignment of RPC and database users for the embedded case. Currently the H2 database is configured with a default root user, and thus needs to be locked down.
  • Ensuring that per-table access control is such that accidental attempts to run UPDATE on ledger-controlled tables fail.
  • Allowing the database to be split out onto a separate machine instead of being used embedded.
  • Teaching the node how to properly suspend itself if the database connection breaks or permissions are broken, until access is restored.
  • JMX exports to track read/write latencies and other useful metrics from the node’s perspective.
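To give a feel for the last item, here is a minimal sketch of exposing read latency over JMX using only the standard `javax.management` API. The bean, its attributes and the object name `example.node:type=DbLatency` are all made up for illustration; nothing here reflects what the node will actually export.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxLatencySketch {
    // The "MXBean" suffix marks this as an MXBean; the interface must be public.
    public interface DbLatencyMXBean {
        long getReadCount();
        double getAverageReadMillis();
    }

    // Hypothetical metrics holder updated by the database access layer.
    static final class DbLatency implements DbLatencyMXBean {
        private long reads;
        private long totalNanos;

        synchronized void recordRead(long nanos) {
            reads++;
            totalNanos += nanos;
        }

        @Override
        public synchronized long getReadCount() {
            return reads;
        }

        @Override
        public synchronized double getAverageReadMillis() {
            return reads == 0 ? 0.0 : (totalNanos / 1_000_000.0) / reads;
        }
    }

    // Registers the bean with the platform MBean server under a made-up name.
    static ObjectName register(DbLatency bean) {
        try {
            ObjectName name = new ObjectName("example.node:type=DbLatency");
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            server.registerMBean(bean, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads an attribute back the way a monitoring tool would.
    static Object read(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Once registered, the attributes `ReadCount` and `AverageReadMillis` would be visible to any standard JMX client attached to the process.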