If you have ever wondered why blockchains are slow or why scaling is hard, you are not alone.

by Caleb Lau


What follows mostly stemmed from conversations with a few clients about blockchains and scaling. Their main questions: Why are some of the largest blockchains so slow? Why is scaling hard? Why do we not already have highly usable blockchains today?

There exists a theorem called the CAP theorem, conjectured by Eric Brewer in the late 1990s and formally proven in 2002, which captures a tradeoff inherent to distributed systems: between the properties of Consistency, Availability and Partition tolerance, a system can only guarantee two at any given time. Since any wide-area network can split (partitions are a matter of when, not if), the real choice arises when a partition occurs: either every replica keeps returning only up-to-date, agreed data (consistency) at the cost of refusing some requests, or every replica keeps serving requests (availability) at the risk of returning stale data, because the replicas cannot synchronise across the partition. This is simply a physical limitation: data stores are replicated across a network, and that network can always cut them off from one another.
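To make the tradeoff concrete, here is a minimal sketch of the choice a lone replica faces during a partition. It is a toy model, not any real database: the class and method names are purely illustrative. A consistency-first replica refuses to answer while cut off; an availability-first replica answers from possibly stale local state.

```python
# Toy sketch of the CAP tradeoff: a replica that loses contact with its
# peers must either keep serving reads (availability, risking stale data)
# or refuse until the partition heals (consistency). Illustrative only.

class Replica:
    def __init__(self, prefer_consistency: bool):
        self.prefer_consistency = prefer_consistency
        self.store = {}           # local copy of the replicated data
        self.partitioned = False  # True when cut off from other replicas

    def read(self, key):
        if self.partitioned and self.prefer_consistency:
            # CP behaviour: reject rather than risk a stale answer.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # AP behaviour: answer from local state, which may lag behind peers.
        return self.store.get(key)

cp, ap = Replica(prefer_consistency=True), Replica(prefer_consistency=False)
cp.store["balance"] = ap.store["balance"] = 100
cp.partitioned = ap.partitioned = True

print(ap.read("balance"))      # 100: available, but possibly stale
try:
    cp.read("balance")
except RuntimeError as err:
    print(err)                 # consistent, but refuses to serve
```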

Given that a blockchain is also a form of distributed system, it inherits the tradeoffs of the CAP theorem, all the more so because a blockchain is highly likely to partition (hence the need for a mechanism to resolve forks over time). In a network where every single node deterministically processes every transaction, and nobody controls the minimum machine specification required to participate, a fundamental trilemma emerges, fondly known as the Zamfir triangle (more commonly, the scalability trilemma): between scalability, security and decentralisation. Scalability is the number of transactions the entire blockchain can process, security is how resistant the blockchain is to malicious intent, and decentralisation is how open the network is to any participant, even one on a lightweight laptop. To raise throughput while maintaining a high level of security, nodes need to maintain some degree of closeness (geographical, hardware specification, or social), thereby sacrificing decentralisation. A blockchain that is both highly scalable and highly decentralised is potentially exploitable, since deep forks could occur and the chain could not maintain consistency. A highly secure blockchain aiming for a high level of decentralisation and accessibility trades off scalability, which is precisely what Bitcoin and Ethereum face today.
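As a rough illustration of why decentralisation caps throughput in such a network, consider the toy calculation below. The node classes and their capacities are invented numbers, but the point stands: if every node must validate everything, the slowest participant you still want in the network sets the ceiling.

```python
# If every node validates every transaction, network throughput is capped
# by the slowest node we still want participating. Figures are hypothetical,
# purely to illustrate the scalability/decentralisation tradeoff.

node_capacities_tps = {
    "datacenter server":  5000,
    "desktop PC":          800,
    "lightweight laptop":   50,
}

# Raising throughput means dropping the weakest class of participants:
for excluded in [None, "lightweight laptop", "desktop PC"]:
    included = {n: c for n, c in node_capacities_tps.items() if n != excluded}
    print(f"exclude {excluded}: max ~{min(included.values())} tps, "
          f"{len(included)} node class(es) remain")
```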

Currently most research centres on improving scalability, the distinguishing challenge being to do so without trading off decentralisation and security at the same time. Increasing block sizes helps to an extent, but on the flip side worsens block propagation latency and raises processing power requirements. Second-layer solutions with sidechains such as Plasma can be designed to maintain a very high degree of protection for users, preserving the ideal of decentralisation while providing much higher throughput by leveraging the security of the base layer. On the other hand, sidechains still rely on the base chain for liveness, e.g. in the case of Plasma mass exits, so base layer innovation is still required in the long run. Sharding, which in essence splits the network into multiple parallel streams, increases throughput roughly in proportion to the number of shards. However, a naively implemented sharding protocol vastly weakens security (e.g. the 1% shard takeover attack) and introduces synchronicity issues for cross-shard communication (e.g. the train-and-hotel problem). Then there is the question of whether dApp developers should be made aware of potential concurrency problems during cross-shard communication (the obvious answer being no), and how to resolve this.
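The 1% shard takeover is easy to model roughly. Without random validator sampling, an attacker can concentrate a small share of global power onto a single shard and control it outright; with random sampling into sufficiently large committees, the takeover probability collapses. The sketch below uses an idealised binomial model with uniform sampling, and the committee sizes and attacker share are hypothetical.

```python
# Rough model of why naive sharding weakens security: an attacker needs
# only a majority within one shard, not the whole network. With random
# committee sampling, the chance of that majority follows a binomial
# distribution. Idealised model, illustrative parameters.
from math import comb

def p_shard_takeover(attacker_share: float, committee_size: int) -> float:
    """Probability a randomly sampled committee is majority-attacker."""
    threshold = committee_size // 2 + 1
    return sum(
        comb(committee_size, i)
        * attacker_share**i
        * (1 - attacker_share)**(committee_size - i)
        for i in range(threshold, committee_size + 1)
    )

# Without sampling, 1% of global power fully controls 1 of 100 shards.
# With large random committees, even a 30% attacker rarely wins one:
for k in (10, 100, 1000):
    print(f"committee of {k}, 30% attacker: {p_shard_takeover(0.30, k):.2e}")
```

The design lesson is that committee size, not shard count alone, drives per-shard security, which is why sampled-committee designs push committees into the hundreds of validators.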

In fact, scalability is not, and should not be, restricted to throughput alone. An ever-growing database, built on the assumption that storage capacity will scale accordingly, is too risky a bet. To maintain sufficient decentralisation and keep the network accessible, a mixture of light, pruned-state and full archive client implementations is necessary. Incentives could encourage users to host their own full nodes and make them available for others to connect to, while disincentives such as storage rent could discourage unnecessary reliance on state storage, steering developers towards less expensive options such as logs, or towards using the blockchain only for critical aspects such as data availability proofs.
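A back-of-envelope calculation makes the storage concern concrete. The throughput and per-transaction footprint below are assumptions chosen for illustration, not measurements of any particular chain.

```python
# Back-of-envelope sketch of why unbounded state growth threatens
# decentralisation. All figures are assumptions for illustration only.

tx_per_second    = 100        # hypothetical sustained throughput
bytes_per_tx     = 250        # hypothetical average on-chain footprint
seconds_per_year = 365 * 24 * 3600

growth_gb_per_year = tx_per_second * bytes_per_tx * seconds_per_year / 1e9
print(f"~{growth_gb_per_year:.0f} GB of new chain data per year")  # ~788 GB
```

At that rate, running a full archive node on commodity hardware becomes impractical within a few years, which is exactly why pruned and light clients, and disincentives like storage rent, matter.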

The above scratches just the surface; there is a lot going on in the space of public blockchains, with plenty of hard computer science and distributed systems problems still to be solved. Solutions exist, but be aware that not all solutions are created equal: each comes with its own tradeoffs. Together, though, we can continue striving towards a truly robust, incentive-driven, cryptoeconomically secure, open-access network for everyone.