High Availability for R3 Corda Nodes

February 19, 2018


As we get closer to the general availability of R3 Corda, we would like to highlight some of the capabilities our distribution brings to demanding enterprise workloads. R3 identified a set of requirements that are specific to mission-critical deployments. While Corda moves at pace as an open source project, we have created R3 Corda, a commercial distribution of the Corda open source blockchain platform, to meet these demands.

Modern mission-critical systems are expected to be highly available: High Availability (HA) means recovering normal operation within a few minutes at most, with minimal or zero data loss. HA is related to Disaster Recovery (DR), which requires a procedure for handling large-scale, multi-component failures such as data centre flooding, acts of terrorism and so on. R3 Corda has been designed for production deployment by commercial institutions and has had a strong roadmap for HA and DR functionality from the start.

In many blockchain-based systems, transactions are globally broadcast and every node on the network maintains its own copy of the ledger. This naturally achieves N-level redundancy for ledger data, albeit at the expense of privacy, since in this architecture every node is privy to every transaction. By contrast, the strict privacy requirements of Corda necessitated a technical design in which transaction data is visible only to its participants and to the consensus service, also known as the Notary. In addition to enhanced privacy, this approach also unlocks considerable potential for network scalability and performance because, unlike globally broadcast systems, an R3 Corda network can process multiple transactions simultaneously. Please refer here and here for a more detailed explanation of Corda design principles.

In Corda, ledger resilience is achieved through the HA deployment configuration of nodes. Typical financial institutions maintain large, complex technology landscapes in which individual component failures can occur, such as:

  • Small scale software failures.
  • Mandatory data centre power cycles.
  • Operating system patching and restarts.
  • Short-lived network outages.
  • Middleware queue build-up.
  • Machine failures.

R3 Corda is built on tried and tested technologies such as the Java Virtual Machine and SQL, and it supports the use of commercial RDBMSs and Cloud. This enables administrators to utilise existing technology capabilities and industry best practices to reduce the frequency and mitigate the effects of multiple component failures. At the most fundamental technical level the solution relies on redundancy for:

  • Network infrastructure.
  • Power systems.
  • Disk storage, such as RAID or Storage Area Network (SAN).
  • Processing units.

Furthermore, leading RDBMS vendors provide HA solutions for their database (DB) products, such as ‘Always On Failover Cluster’ for Microsoft SQL Server and ‘Real Application Clusters’ for Oracle DB. Their respective Platform-as-a-Service (PaaS) offerings are also equipped with highly available, geo-replicated storage tiers for filesystems and SQL databases. R3 Corda nodes are designed to be deployed within existing enterprise computing infrastructure and can take full advantage of such technical capabilities in both kinds of deployment scenario: on-premises and in Cloud.

The R3 Corda roadmap plans for staged delivery of HA functionality for nodes. The initial Hot-Cold design is described below, and follow-up blog posts will focus on Hot-Warm and Hot-Hot configurations, which can address advanced requirements such as fully automatic failover and rapid node scaling.

The main objectives of the initial delivery of HA capability for R3 Corda nodes are as follows:

  • A logical Corda node should continue to function in the event of an individual component failure or, for example, a restart.
  • No loss, corruption or duplication of data on the ledger due to component outages.
  • Continuity of flows throughout any disruption.
  • Support for software upgrades in a live network.

The Hot-Cold node architecture depicted in Figure 1 below addresses these requirements. For Disaster Recovery, nodes utilise the standard geo-replication capabilities of cloud providers, such as those of Azure DB and file store.

Figure 1: R3 Corda node Hot-Cold HA deployment

As shown above, in a two-node HA deployment scenario there is a ‘cold’ backup instance that can be manually started if the ‘hot’ primary instance is stopped. A load balancer monitors the health of the primary and secondary nodes and automatically routes traffic from the public IP address to the only active end-point, the ‘hot’ node. (Note: the load balancer is not part of the R3 Corda product offering.) The nodes share a database, which is deployed externally in an HA configuration, and the Artemis message queue, which is stored on a shared resilient filesystem.
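The routing decision described above can be illustrated with a minimal Python sketch. This is not how any particular load balancer is implemented; it only assumes that "healthy" means "accepts a TCP connection on the node's port". The function names, the IP addresses and the timeout are hypothetical and chosen for illustration.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Endpoint = Tuple[str, int]  # (host, port)


def tcp_probe(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened.

    Assumption: a successful TCP connect is treated as "node is up".
    """
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def active_endpoint(
    endpoints: Sequence[Endpoint],
    probe: Callable[[Endpoint], bool] = tcp_probe,
) -> Optional[Endpoint]:
    """Return the first endpoint that passes the probe, or None.

    In a Hot-Cold pair only one instance is expected to be running,
    so at most one endpoint should answer at any time.
    """
    for endpoint in endpoints:
        if probe(endpoint):
            return endpoint
    return None


if __name__ == "__main__":
    # Hypothetical private addresses of the primary and backup instances.
    primary = ("10.0.0.4", 10002)
    backup = ("10.0.0.5", 10002)
    print(active_endpoint([primary, backup]))
```

Passing the probe in as a parameter keeps the selection logic independent of the network, so it can be exercised with a stub instead of live nodes.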

This architecture fits well into the infrastructure commonly offered by Cloud providers. For example, to set up an HA R3 Corda node in Microsoft Azure, the following tasks need to be performed:

  • Create an internet-facing load balancer on the Azure platform (see here for the step-by-step guide of how to do this).
  • Configure load balancing rules for TCP port 10002 for P2P, TCP port 10003 for RPC, and HTTP port 10004 for Web traffic (these are the default Corda node port numbers; they can be changed to any usable value via configuration files).
  • Create an Azure SQL database (see instructions here).
  • Create a general purpose Azure storage account with a resource manager deployment model, add a file share and create a persistent mount point for it in /etc/fstab.
  • Configure the nodes to use the Azure database for ledger data and the configured mount point for storing files.

This will result in a fully-functional Hot-Cold HA deployment of an R3 Corda node.
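As a rough illustration of the values involved in the steps above, the following Python sketch builds the three load-balancing rules, an /etc/fstab line for the file share, and a JDBC URL for the database. It is a sketch under stated assumptions, not a deployment script: the storage account, share, server and database names are hypothetical, the CIFS mount options are one plausible choice rather than a recommendation, and how the node consumes these values is not shown because it depends on the node configuration file.

```python
from dataclasses import dataclass
from typing import Dict, List

# Default Corda node ports quoted in the steps above.
DEFAULT_PORTS: Dict[str, int] = {"p2p": 10002, "rpc": 10003, "web": 10004}


@dataclass(frozen=True)
class LbRule:
    """One load-balancing rule (hypothetical shape, for illustration)."""
    name: str
    protocol: str
    frontend_port: int
    backend_port: int


def lb_rules(ports: Dict[str, int] = DEFAULT_PORTS) -> List[LbRule]:
    """Build one TCP rule per service; HTTP traffic is carried over TCP."""
    return [LbRule(f"corda-{name}", "Tcp", port, port)
            for name, port in ports.items()]


def fstab_entry(storage_account: str, share: str,
                mount_point: str, credentials_file: str) -> str:
    """Build an /etc/fstab line that mounts an Azure file share over CIFS.

    Assumption: the share is reachable at the standard
    <account>.file.core.windows.net host name and SMB 3.0 is used.
    """
    source = f"//{storage_account}.file.core.windows.net/{share}"
    options = f"nofail,vers=3.0,credentials={credentials_file},serverino"
    return f"{source} {mount_point} cifs {options} 0 0"


def azure_sql_jdbc_url(server: str, database: str) -> str:
    """Build a JDBC URL for an Azure SQL database (credentials omitted)."""
    return (f"jdbc:sqlserver://{server}.database.windows.net:1433;"
            f"database={database};encrypt=true;"
            "trustServerCertificate=false;loginTimeout=30;")


if __name__ == "__main__":
    for rule in lb_rules():
        print(rule)
    # All names below are placeholders, not values from a real deployment.
    print(fstab_entry("examplestorage", "corda", "/mnt/corda",
                      "/etc/smbcredentials/examplestorage.cred"))
    print(azure_sql_jdbc_url("example-server", "corda"))
```

Both instances of the Hot-Cold pair would need the same mount point and database URL, since they share the database and the message queue files.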