By: Dries Samyn, Principal Software Engineer
Corda 5 introduces a sandbox to the Corda infrastructure. This is a necessary addition in order to achieve multi-tenancy and high availability. This article will explore the benefits of the sandbox and what it means for Corda users.
In follow-up articles we will explore this topic and its implementation more deeply.
What is a sandbox in software engineering?
In the real world, the aim of a sandbox is to keep sand contained in a particular area and free from contamination by external elements, so kids can safely play in clean sand without messing up the rest of the garden or playground.
In software engineering we define a sandbox as something—an environment or context—which is isolated in some way from another.
Anybody who has children or is young enough to remember the playground will recognise that sand is often found outside the sandbox, and more importantly, undesirable substances are often found in the sand. The boundaries of a real world sandbox can be quite effective at protecting the rest of the garden from its mess, but are less useful when it comes to keeping things out.
In figurative terms, the ‘sandbox’ has come to have a wide variety of contextual applications. You may be familiar with the sandbox as a safe area to ‘play in’, for example to test an API or try out a new product. This aligns more closely with the characteristics of the real-world sandbox, but is very different to the sandbox in operating system design, where it is an essential security measure.
The Corda 5 sandbox is more aligned to the latter of these two examples. We are not referring to a playground environment, but an essential part of the Corda 5 architecture, used to support Corda’s stability and security when operating in a highly-available (HA) and multi-tenant configuration.
Corda 5, high availability (HA) and multi-tenancy
In order to understand why sandboxes are necessary, we need to dig into 2 major new features of Corda 5:
- High availability – Being able to operate Corda in a hot/hot, reliable way to provide high uptime.
- Multi-tenancy – Support for ‘Virtual Corda Nodes’ to share a single Corda Installation to improve the Total Cost of Ownership (TCO) in certain scenarios.
The concept of high availability is quite easy to understand (though harder to achieve). Multi-tenancy, on the other hand, is a new concept in Corda that deserves an explanation.
Imagine a Corda 4 network, like in the ‘Corda 4 – distinct networks’ diagram. Each node (superhero) on the network represents an identity, and is backed by a physical (JVM) process. If an identity is represented in multiple, unrelated networks (the DC or Marvel Universe), the organisation that represents this identity hosts two nodes or processes.
Corda 4 – distinct networks
As illustrated in the ‘Corda 5 – Multi-tenancy’ diagram, Corda 5 introduces the concept of ‘Virtual Nodes’ to allow multi-tenancy. Like a Corda 4 node, a Virtual Node represents an identity (identified by its X500 name), but a Corda instance can now host multiple Virtual Nodes.
Furthermore, a Corda instance can take part in multiple, unrelated ‘Application Networks’. As the 2 diagrams show, Axel Asher is represented in both the DC and Marvel networks. In Corda 4 this requires 2 physical Corda Nodes, but in Corda 5 it can be supported by 2 ‘Virtual Nodes’ that co-exist in the same Corda instance.
Corda 5 – Multi-tenancy
What does this have to do with sandboxing? To answer this, we have to first look at the Corda programming model.
CorDapps, the programming model and operational stability in Corda 5
A CorDapp (Corda Distributed Application) is written in Kotlin or Java, compiled into Java bytecode, and then loaded into the Corda process at runtime. This means Corda is effectively an application server.
This programming model is very powerful. However, it presents some challenges for the Corda Platform.
In Corda 4, we assume that you’re running your own node, which is akin to you playing with your own sand in your own garden. And we also assume that you’re a responsible adult: we trust you to take care and we assume you’re not going to deliberately ruin your grass. So Corda didn’t have a sandbox. In Corda 5 we can’t make these assumptions. In Corda 5, it’s more like the garden is shared amongst multiple houses or apartments. We can’t assume everybody will be careful and some of them may be positively malicious.
To make this concrete, take the 2 Application Networks (Marvel and DC) and a particular Corda instance supporting an identity on both networks (e.g. Axel Asher in the diagrams above).
DC and Marvel would write their CorDapps independently, and let’s imagine they each define their own, different, Superpower class: one in the Marvel Universe CorDapp and one in the DC Universe CorDapp.
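Within their own application, both of these definitions are valid. However, a class loader can define only one class for a given name, so loading both classes into the same Java Virtual Machine (JVM) through a single class loader would lead to a conflict and, ultimately, runtime errors.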
Fortunately, we can resolve this by using a different Java ClassLoader for each application. We use OSGi to manage these class loaders, but more on that in a follow-up article.
Each ClassLoader effectively gives us a sandbox, i.e. it keeps the classes related to a particular CorDapp (or classes from a CorDapp’s dependencies) isolated from those of another.
Isolating CorDapp classes and dependencies in this way also means that they are isolated from the Corda host process. CorDapps no longer share libraries with Corda itself, and can therefore use their own versions of third-party dependencies, for example.
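To make the class loader point concrete, here is a minimal Java sketch. It is not Corda code and does not use OSGi. It compiles two hypothetical, incompatible Superpower classes at runtime (their methods are invented for illustration and are not the definitions from the original example), then loads each one through its own URLClassLoader. It assumes it runs on a full JDK, because it needs the system Java compiler.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class ClassLoaderIsolationSketch {

    // Hypothetical class bodies: same class name, incompatible definitions.
    static final String MARVEL_BODY =
            "public String describe() { return \"Marvel superpower\"; }";
    static final String DC_BODY =
            "public int level() { return 9; }"
          + " public String describe() { return \"DC superpower, level \" + level(); }";

    /** Compiles one Superpower variant into its own directory and returns an isolated loader for it. */
    static URLClassLoader loaderFor(String universe, String classBody) throws Exception {
        Path dir = Files.createTempDirectory(universe);
        Path source = dir.resolve("Superpower.java");
        Files.writeString(source, "public class Superpower { " + classBody + " }");

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("A JDK (not just a JRE) is required");
        }
        if (compiler.run(null, null, null, "-d", dir.toString(), source.toString()) != 0) {
            throw new IllegalStateException("Compilation failed for " + universe);
        }
        // Parent is null, so only JDK classes are shared; Superpower is resolved from 'dir' alone.
        return new URLClassLoader(new URL[] { dir.toUri().toURL() }, null);
    }

    /** Loads Superpower through the given loader and calls describe() reflectively. */
    static String describe(ClassLoader loader) throws Exception {
        Class<?> type = loader.loadClass("Superpower");
        Object instance = type.getDeclaredConstructor().newInstance();
        return (String) type.getMethod("describe").invoke(instance);
    }

    public static void main(String[] args) throws Exception {
        try (URLClassLoader marvel = loaderFor("marvel", MARVEL_BODY);
             URLClassLoader dc = loaderFor("dc", DC_BODY)) {
            System.out.println(describe(marvel));
            System.out.println(describe(dc));
            // Same name, different Class objects: one per class loader.
            System.out.println(marvel.loadClass("Superpower") != dc.loadClass("Superpower"));
        }
    }
}
```

Both classes are called Superpower, yet each loader resolves only its own copy, so the two definitions never collide. The sketch leaves its temporary directories behind; a real implementation would clean them up.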
Sandboxing and security in Corda 5
Class Loaders are an effective way of isolating CorDapps in the JVM in order to support operational stability. However, they offer very little security protection.
Imagine the above example, but we have a Corda 5 instance that supports multiple identities, each represented by a Virtual Node. This means that each Virtual Node can interface with its own Vault. How do we prevent a CorDapp, which is ‘just’ JVM bytecode, from interfacing with the ‘wrong’ Vault or signing a transaction with the wrong key? Or how do we prevent it from accessing the call stack of another Virtual Node’s in-progress Flow execution?
Corda Node and Cluster Operators establish a level of trust when multiple Virtual Nodes are co-hosted, because they agree to use a given, signed CorDapp. A malicious attack may therefore be unlikely, but it cannot be ruled out. Furthermore, to ensure integrity, we must rule out even accidental spilling into or out of a Virtual Node’s sandbox.
In Corda 5 this is done using a combination of OSGi bundle hooks and the OSGi/Java Security Manager.
The Security Manager, for example, prohibits CorDapp code from reflecting over Corda platform classes, and OSGi bundle hooks ensure that only specific CorDapp bundles are visible inside the Virtual Node sandbox.
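The visibility rule can be pictured with a small, plain-Java model. This is a sketch under assumptions, not Corda’s implementation, and it does not use the OSGi API: the Bundle class, the sandbox identifiers and the bundle names are all hypothetical. It mirrors the general shape of an OSGi bundle find hook, which hides bundles by removing them from the collection it is handed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class BundleVisibilitySketch {

    /** Hypothetical owner id for bundles that every sandbox is allowed to see. */
    static final String PLATFORM_API = "platform-api";

    /** A stand-in for an OSGi bundle: a name plus the sandbox (or platform API) that owns it. */
    static final class Bundle {
        final String name;
        final String owner;

        Bundle(String name, String owner) {
            this.name = name;
            this.owner = owner;
        }
    }

    /** Removes every bundle the requesting sandbox must not see, editing the collection in place. */
    static void filterVisible(String requestingSandbox, Collection<Bundle> candidates) {
        candidates.removeIf(b ->
                !b.owner.equals(requestingSandbox) && !b.owner.equals(PLATFORM_API));
    }

    /** Convenience wrapper: names of the bundles a sandbox can see, in their original order. */
    static List<String> visibleNames(String requestingSandbox, List<Bundle> installed) {
        List<Bundle> copy = new ArrayList<>(installed);
        filterVisible(requestingSandbox, copy);
        List<String> names = new ArrayList<>();
        for (Bundle b : copy) {
            names.add(b.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Bundle> installed = List.of(
                new Bundle("marvel-cordapp", "vnode-marvel"),
                new Bundle("dc-cordapp", "vnode-dc"),
                new Bundle("platform-api-bundle", PLATFORM_API));

        System.out.println(visibleNames("vnode-marvel", installed));
        System.out.println(visibleNames("vnode-dc", installed));
    }
}
```

Each sandbox sees its own CorDapp bundle and the shared API bundle, but never the other Virtual Node’s bundle.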
We will dig deeper into our use of OSGi and the Security Manager in follow-up articles.
Sandboxing and HA in Corda 5
As illustrated above, the need for ClassLoader isolation is obvious in a multi-tenancy world. However, we also make use of it to support our high availability (HA) requirements.
HA requirements mean we need to be able to install and upgrade CorDapps without impacting other CorDapps operating in the same Corda process, and to provide a way to allow CorDapps to upgrade without the need to stop existing flows or the host process.
This requirement of being able to ‘hot load’ CorDapps means we can potentially load multiple versions of a given CorDapp or library. Therefore, sandboxing and isolation are required in order to support HA.
Sandboxes in Corda 5 are powerful because they support multi-tenancy and high availability. However, there are also some limitations we must be aware of.
Firstly, software sandboxes can be impacted by bugs or attacks, meaning it may be possible for a malicious actor to compromise or ‘break out’ of the sandbox. To minimise this risk, first and foremost, good care should be taken to keep systems updated and to configure software according to best practice. There are also a few additional mitigations that reduce the likelihood of such attacks, such as worker isolation and CorDapp signing.
Secondly, when creating a sandbox, we must decide on the height of the barrier around the sandbox. Make it too low, and we risk sand spilling out of the box or undesirable things coming in, and if we make it too high, we may limit the usefulness of what’s in the sandbox.
Practically, what this means is that when we apply a Security Manager policy, we must find a balance between setting rules that prevent breaking out of the sandbox and allowing the CorDapp running in the sandbox to do useful things, such as using reflection or making HTTP requests to external services.
Corda 5 supports configurable Security Manager policies. The strictest policy will be applied by default, but a Corda administrator will be able to override this policy if required. We will cover in detail how this works and how we can customise policies in a follow-up article.
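The idea of a strict default that an administrator can relax can be sketched in a few lines of Java. This is purely an illustrative model: the Action names and the Policy class are hypothetical, and it does not use the Java Security Manager or reflect Corda’s actual policy format.

```java
import java.util.EnumSet;

final class SandboxPolicySketch {

    /** Hypothetical sensitive actions a CorDapp might attempt. */
    enum Action { REFLECTION, HTTP_REQUEST, FILE_ACCESS }

    static final class Policy {
        private final EnumSet<Action> allowed;

        private Policy(EnumSet<Action> allowed) {
            this.allowed = allowed;
        }

        /** The default: nothing sensitive is permitted. */
        static Policy strictest() {
            return new Policy(EnumSet.noneOf(Action.class));
        }

        /** Returns a relaxed copy; the original policy is left unchanged. */
        Policy allowing(Action action) {
            EnumSet<Action> relaxed = EnumSet.copyOf(allowed);
            relaxed.add(action);
            return new Policy(relaxed);
        }

        boolean permits(Action action) {
            return allowed.contains(action);
        }

        /** Throws if the action is outside the policy, much like a failed permission check. */
        void check(Action action) {
            if (!permits(action)) {
                throw new SecurityException("Denied by sandbox policy: " + action);
            }
        }
    }

    public static void main(String[] args) {
        Policy defaults = Policy.strictest();
        // An administrator override that lowers the barrier for one action only.
        Policy overridden = defaults.allowing(Action.HTTP_REQUEST);

        System.out.println(defaults.permits(Action.HTTP_REQUEST));
        System.out.println(overridden.permits(Action.HTTP_REQUEST));
        System.out.println(overridden.permits(Action.REFLECTION));
    }
}
```

Every allowed action lowers the barrier a little, which is why starting from the strictest policy and opting in is the safer direction.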
The Corda 5 worker architecture
Covering the worker architecture is beyond the scope of this article, but I mention it here because it adds a layer of isolation beyond the sandbox.
In the above example, we can see that a single Corda Instance is made up of a number of these ‘workers’, for example, Database Worker, Flow Worker, Crypto Worker.
Sandboxes exist in the Flow and DB workers because these host ‘external’ (CorDapp) code. The Flow workers execute flows and verify contracts, for example, and the DB workers connect to the Virtual Node databases. The Database Sandbox can be much more restrictive than the Flow Sandbox, as we do not expect it to execute untrusted code apart from parsing user-defined JPA entities.
These sandboxes present opportunities to ‘break out’, and therefore it makes sense to isolate these processes to protect other processes, such as the Crypto workers, which have access to sensitive cryptographic materials (keys and certificates).
Alternative sandbox models
We made a decision to use sandboxes within the JVM and there are, of course, alternatives to this. Why did we not use process/container isolation per Virtual Node, for example?
Imagine, for example, a network with plenty of nodes interacting very infrequently. Maybe they transact a few times per day, maybe a few times a week or month. Or, to put forward a more complex example, imagine an environment with nodes occasionally processing a large number of concurrent transactions in between ‘quiet’ periods where they are idle (daily reconciliation processes, for example).
Corda 5’s ‘Virtual Node’ architecture means that there is no link between a given Node and a given process. In the ‘busy’ node scenario, a node can therefore be served by many processes simultaneously to ensure a high level of throughput, while less busy nodes can re-use those same processes.
The cost of operating a node can therefore be optimised in a Multi-tenancy environment.
Put another way, the Corda instance can be (horizontally) scaled to support the total amount of work across all Virtual Nodes, while ensuring a Virtual Node can be served immediately without the need to provision an on-demand process or container.
In our example of the network where nodes interact a few times a week or month, it may be that a Corda instance with a single Flow Worker can support thousands of Virtual Nodes, or that an instance with two Flow Workers can support the same in a highly available configuration.
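The decoupling of Virtual Nodes from processes can be illustrated with a toy dispatcher. This is a sketch under assumptions only: the round-robin strategy, the class name and the node names are hypothetical, and the article does not describe how Corda actually distributes work between workers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SharedWorkerPoolSketch {

    private final int workerCount;
    private int next = 0;
    // Worker index -> virtual nodes whose work that worker has processed, in order.
    private final Map<Integer, List<String>> handled = new LinkedHashMap<>();

    SharedWorkerPoolSketch(int workerCount) {
        if (workerCount < 1) {
            throw new IllegalArgumentException("At least one worker is required");
        }
        this.workerCount = workerCount;
    }

    /** Hands one unit of work for any virtual node to the next worker in turn. */
    int dispatch(String virtualNode) {
        int worker = next;
        next = (next + 1) % workerCount;
        handled.computeIfAbsent(worker, w -> new ArrayList<>()).add(virtualNode);
        return worker;
    }

    /** Virtual nodes whose work the given worker has processed so far. */
    List<String> servedBy(int worker) {
        return handled.getOrDefault(worker, List.of());
    }

    public static void main(String[] args) {
        SharedWorkerPoolSketch pool = new SharedWorkerPoolSketch(3);

        // A busy node spreads across every worker...
        for (int i = 0; i < 4; i++) {
            pool.dispatch("busy-node");
        }
        // ...and a quiet node simply re-uses one of the same workers.
        pool.dispatch("quiet-node");

        for (int w = 0; w < 3; w++) {
            System.out.println("worker " + w + " served " + pool.servedBy(w));
        }
    }
}
```

No worker belongs to a particular node, so capacity is sized for the total workload rather than for each node’s peak.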
This article introduced the Corda 5 sandbox and gave some background on why we need it, what it does, and some of the caveats we need to be aware of. The topic goes much deeper than we have space for here, and we will be following up with additional articles about how we use OSGi, the Security Manager, our new packaging format and our worker architecture.
In the meantime, make sure to check out our code and let us know what you think and what you would like to see covered on our blog. And sign up for CordaCon 2022, and find us at the booth or one of the talks, as we would love to talk to you more about this or any other Corda topic.