Corda, System Failures and Data. Who needs to look after stuff when things go wrong?
We’re often asked (on our Slack, Discourse, or in person) how, if someone running Corda were to lose all their data, they might go about restoring it from other nodes on the network. Equating Corda to traditional blockchain solutions, some may naively assume that this is a desirable feature for Corda to have.
In reality, however, you would never want to depend on third parties in this way. I’d like to expand on the reasons why.
Let’s take a somewhat cynical scenario: we make a simple contract for a €1 coin-toss bet. I win the bet, but disaster strikes and (oh noes!) I lose my entire datastore, including the contract and proof of result for this bet. Not only that, but I’ve also lost my internal accounting trail describing which internal department funded the bet, along with any internal sign-offs I needed in order to place it.
What happens next? What if the counterparty I made this bet with doesn’t want to give me “my” data back? Without it, I cannot pursue them through the judicial process and be awarded my (fairly-gotten) gains.
Even assuming that the shared data is recoverable from other parties, and that I can query their Corda nodes for a copy of the data given the transaction hashes (if I still have them), there is no recourse for any data which I maintained internally and chose not to share with other parties. And in the case of the coin-toss bet above, exactly what motivation would my counterparty have to pay up? In a simple world, they’re potentially earning interest every day they can stall the payment of the bet. So yes, they might eventually supply me with the information “for the sake of good business conduct and reputation”, but I’ve never come across an organisation that prioritises paying money out.
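To make the point concrete, here is a toy model of the coin-toss scenario above. It is only a sketch under my own assumptions: the `Node` class, `request_transaction` and `recover` are hypothetical names invented for illustration and are not part of Corda's API. It shows that, after a total loss, the most I can get back is the shared data for hashes I still know, and only if the counterparty chooses to cooperate.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one party's data store (hypothetical, not a Corda API)."""
    name: str
    shared: dict = field(default_factory=dict)   # tx hash -> shared transaction data
    private: dict = field(default_factory=dict)  # internal records, never shared
    cooperative: bool = True                     # will this party answer requests?

    def request_transaction(self, tx_hash):
        """Return a shared transaction only if we hold it and choose to help."""
        if not self.cooperative:
            return None
        return self.shared.get(tx_hash)


def recover(known_hashes, counterparties):
    """Rebuild what we can after a total loss by asking each counterparty."""
    recovered = {}
    for tx_hash in known_hashes:
        for peer in counterparties:
            data = peer.request_transaction(tx_hash)
            if data is not None:
                recovered[tx_hash] = data
                break
    return recovered


bet = {"tx-001": "coin toss bet, EUR 1, winner: me"}
# What I held before the disaster: the shared bet plus records only I ever had.
me_before_loss = Node(
    "me", shared=dict(bet), private={"funding": "desk A", "sign_off": "approved"}
)
helpful = Node("counterparty", shared=dict(bet))
stalling = Node("counterparty", shared=dict(bet), cooperative=False)

# Best case: I still know the hash and the counterparty cooperates.
best_case = recover(["tx-001"], [helpful])
# A counterparty with no incentive to help leaves me with nothing.
worst_case = recover(["tx-001"], [stalling])
# If the hashes were lost along with everything else, there is nothing to ask for.
no_hashes = recover([], [helpful])
```

In none of these cases do the funding and sign-off records come back, because no other party ever held them.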
So, one must treat the data store (i.e. the database and filesystem) that a DLT application uses with as much significance and importance as any other production system where the data has operational significance.
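As one small illustration of what “treating it like any other production system” can mean in practice, the sketch below compares SHA-256 checksums of a data directory against a backup copy and reports anything missing or altered. The directory and file names are made up for the example, and copying files around like this is only a stand-in: a live database should be backed up with the database vendor's own tooling rather than by copying its files.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_backup(source, backup):
    """Return the relative paths that are missing from, or differ in, the backup."""
    expected = build_manifest(source)
    actual = build_manifest(backup)
    return sorted(name for name, digest in expected.items() if actual.get(name) != digest)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: a database file plus an attachments folder.
    source = Path(tmp) / "node-data"
    backup = Path(tmp) / "node-data-backup"
    (source / "attachments").mkdir(parents=True)
    (source / "ledger.db").write_bytes(b"transactions")
    (source / "attachments" / "contract.jar").write_bytes(b"contract code")

    shutil.copytree(source, backup)
    clean = verify_backup(source, backup)      # nothing to report

    (backup / "ledger.db").write_bytes(b"corrupted")
    damaged = verify_backup(source, backup)    # the altered file is flagged
```

A backup that has never been checked or restored is only a hope, so a verification step of this kind belongs next to whatever backup mechanism is actually in use.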
(I will point out that one of the team did want me to highlight that it would be fairly trivial to extend the contracts to include a “backup node” which would then automatically receive a copy of all relevant data, or possibly to offer this as a service to others if they trust you enough.)
Comparison against other systems
But how does this compare to the other blockchain or distributed ledgers out there? Well, for all of them (including Corda), if you lose your private keys, it’s game over — there’s no going back from this. So of course you’ll be backing those up (securely — obviously). Beyond that though, different architectures require different strategies which can be vaguely correlated (and please do excuse my obvious bias) with the complexity of the underlying system.
Let’s start with the granddaddy of them all, the Bitcoin blockchain. It’s no surprise that this has probably created the thought that “If it’s a blockchain system, then everybody has all the data and I can download it from anybody at any time”. For a basic client of Bitcoin, using it purely as a means to transfer value via bitcoins themselves, that would be correct. Even with this model, though, the current total blockchain size is way in excess of 100GB, which is not an insignificant amount of data to download in order to verify any potential inbound transactions. However, transactions between bitcoin users are only infrequently between “simple” users. Commercial entities involved in transactions are required to maintain links between their payments and debits and the counterparties that transacted them, even more so for any entity that transacts across currencies (exchanges, for example). The data that creates these links is (obviously) not broadcast throughout the blockchain, and of course all of this data needs to be secured somehow.
Looking at Ethereum, the current blockchain size is a reasonable 11GB (as of March 2017), so there’s not such an overhead in re-retrieving this data. As mentioned, if you lose your private keys, it’s game over, so obviously you’ll be keeping those close. Contract code, however, is *not* stored in its source form on the Ethereum blockchain, only the compiled code. So if you ever want to publish an updated version of the contract code (and yes, I know that in the purest form you cannot update code in place, but for the sake of brevity and simplicity please allow me to carry on), you’ll need to keep a copy of the source somewhere safe (though, to be fair, keeping source code in a versioned repository is par for the course for all developers now). Also, all of the data that allows you to integrate with the non-blockchain part of your system (for example, a web interface or local access control, as well as the examples from the Bitcoin comparison above) will not be propagated throughout the Ethereum blockchain, so it will require the same data protection as any traditional production system.
Finally, let’s consider a solution that is neither a blockchain nor a distributed ledger, perhaps a SWIFT or FIX client. Considering SWIFT, the data received is, as you would expect, efficient and brief. It requires an immense amount of enrichment in order to be useful internally (to be honest, “enrichment” is probably not the correct word, as no real user of SWIFT keeps their data in that format; it really is just a transport mechanism). Once it has gone through rudimentary validation, the next destination is often a database of some description so that the data can start its transit through the financial institution. However, regardless of the next part of the journey, even at the basic level of receiving SWIFT files, the machines have mirrored and backed-up file systems and databases with standby hardware, and this is even with SWIFT happily resending you any messages you may have lost somehow. This is how the world works today, and distributed ledgers do not sprinkle any magic “unbreakably redundant” dust over these data fundamentals.
Repeat after me: Blockchains and DLT do not absolve me of my data husbandry duties.