Decentralization / Peer-to-peer

Building a Debate App: Part 16

Scalability and efficiency

Jeremy Orme
Published in Coinmonks
4 min read · Sep 19, 2023

Photo by Shubham's Web3 on Unsplash

Over the past couple of months, I’ve been thinking about the potential scalability and efficiency of the debate app we’ve been creating in this series, and contemplating whether a pure peer-to-peer approach is practical for a large-scale app, i.e. one with millions of active users, where users are predominantly mobile and therefore storage- and energy-constrained.

Pure peer-to-peer network

The pure peer-to-peer approach, where every user is a node in the p2p network, is great in theory because each node potentially contributes to the network. The compute resource is completely democratized and there’s no central server that can be blocked or coerced in order to enforce some undesirable policy.

Democratization of compute resource means that the processing power and storage capacity that the app depends on is spread thinly across the network, rather than being owned (and paid for) by just a few people. This improves resilience because it removes dependence on the financial stability and incorruptibility of a few individuals.

Unfortunately, in practice, there are problems with the pure approach due to the requirement to maintain database replicas on each device, which consumes significant CPU and memory resources.

CPU (energy) consumption

Running an IPFS node in a desktop browser produces an annoying energy-consumption warning after a short while. On mobile, the device becomes noticeably hot and the battery drains rather quickly.

There seems to be significant energy consumption even without user input. It’s not clear whether this is fundamental or could be fixed by optimizing the IPFS implementation.

In addition to the background cost of simply running the node, further processing is required to reconstruct each collection from its operation log, including all the associated cryptographic operations needed to verify its integrity.
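
To make that concrete, here’s a minimal sketch of what that reconstruction could look like, assuming a simple last-writer-wins collection. The entry shape and the injected verify function are illustrative only; they aren’t Bonono’s actual API.

```typescript
// Minimal sketch: rebuilding a collection's current state from its operation
// log, verifying each entry's signature before applying it.
// (Illustrative types, not Bonono's real data model.)

interface LogEntry {
  key: string;          // collection key the op applies to
  value: unknown;       // payload (null for deletes)
  publicKey: string;    // author's public key
  signature: string;    // signature over the entry payload
  clock: number;        // logical clock for ordering
}

type Verify = (entry: LogEntry) => boolean;

function reconstruct(log: LogEntry[], verify: Verify): Map<string, unknown> {
  const state = new Map<string, unknown>();
  // Apply ops in clock order; later ops win (last-writer-wins).
  for (const entry of [...log].sort((a, b) => a.clock - b.clock)) {
    if (!verify(entry)) continue;   // drop entries with bad signatures
    if (entry.value === null) state.delete(entry.key);
    else state.set(entry.key, entry.value);
  }
  return state;
}
```

Even in this simplified form, every entry has to be fetched, verified and applied before the collection is usable, which is where the energy goes.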

Memory (space) consumption

Even if the energy consumption issue could be resolved, there is a second problem with replicating a database on a mobile device: storage/memory consumption.

For apps that are stateful, like our debate app, database collections are replicated by each node, which involves fetching and caching the entries that make up the collection.

In theory, nodes only need to replicate the collections they use, so careful design can reduce the amount of data each device has to hold. By constraining the cache size, less frequently used data can be aggressively purged and then reloaded as required.
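
As a rough sketch, a size-constrained cache along these lines might purge the least recently used entries and fetch them again from peers when they’re next needed. The fetchFromPeers callback below is just a placeholder, not a real Bonono call.

```typescript
// Sketch of a size-constrained cache for collection entries: least recently
// used entries are purged and re-fetched from peers on the next access.
class BoundedEntryCache<V> {
  private entries = new Map<string, V>();

  constructor(
    private maxEntries: number,
    private fetchFromPeers: (key: string) => Promise<V>  // placeholder
  ) {}

  async get(key: string): Promise<V> {
    let value = this.entries.get(key);
    if (value === undefined) {
      value = await this.fetchFromPeers(key);   // reload purged data on demand
    } else {
      this.entries.delete(key);                 // refresh recency ordering
    }
    this.entries.set(key, value);
    // Map preserves insertion order, so the first key is the least recently used.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
    return value;
  }
}
```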

The trouble is, infrequently used data would then, by definition, get purged by all peers and become lost forever.

Of course, we can sprinkle special peers into our p2p network, hosted on powerful servers with access to large storage devices to guarantee data is kept and made available.

The question then springs to mind: does each user actually need to host a p2p node, if they’re only consuming data from those servers and not really hosting data themselves?

What’s the goal?

At this point it’s worth taking a step back and asking ourselves… what is it we’re trying to achieve by using a peer-to-peer database?

The key benefit unique to peer-to-peer databases is the decentralization of data ownership. The idea is that no entity can obtain control over data that is replicated by a diverse enough group of peers.

Data integrity and access are enforced by cryptographic signatures, so the worst a rogue peer can do is refuse to accept some data; with sufficient diversity, there’s always another peer to pick up the slack.

The question is… do we need every user to be a peer to achieve this goal?

A hybrid network?

Rather than having every user be a peer, we could have a network of peer servers and have the user app be a client of those servers.

This potentially gives the best of both worlds:

  • Clients just make HTTP requests for the data they need, when they need it, and don’t have to burn energy and space maintaining a replica.
  • Servers host the data but still have no unilateral control over what gets accepted into the database — they can only exercise control over what they choose to store in their own replica.

To be effective, spinning up a new server should be as easy as possible. In addition to servers hosted on dedicated machines, they could be run as a background application on any computer. Any person, without needing any technical expertise, can start a server application and begin contributing compute and storage resource to the distributed app they want to support.

Standardized database server

The functionality I’ve described above is generic. It’s easy to imagine a REST API for performing CRUD operations on such a database. Knowledge of specific collections would reside in the specific client app.
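
As a very rough sketch, such a server might expose routes along these lines. I’m using Express here purely for illustration; the route shapes and the in-memory store stand in for the real replicated database.

```typescript
// Sketch of a generic CRUD interface such a database server might expose.
import express from "express";

const app = express();
app.use(express.json());

// In-memory stand-in for the replicated database: collection -> key -> value.
const db = new Map<string, Map<string, unknown>>();

app.get("/collections/:name/:key", (req, res) => {
  const value = db.get(req.params.name)?.get(req.params.key);
  if (value === undefined) return res.sendStatus(404);
  res.json(value);
});

app.put("/collections/:name/:key", (req, res) => {
  // A real server would verify the entry's signature before accepting it.
  const collection = db.get(req.params.name) ?? new Map<string, unknown>();
  collection.set(req.params.key, req.body);
  db.set(req.params.name, collection);
  res.sendStatus(204);
});

app.delete("/collections/:name/:key", (req, res) => {
  db.get(req.params.name)?.delete(req.params.key);
  res.sendStatus(204);
});

app.listen(3000);
```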

This is very convenient for the application developer, who can start an off-the-shelf database server and read and write it using a simple JavaScript API.
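
On the client side, reading and writing could then be little more than a couple of fetch calls. The server URL and collection names below are purely illustrative.

```typescript
// Sketch of how an app might read and write through that API from the
// browser, instead of maintaining a local replica.
const BASE = "https://some-peer-server.example";

async function readDebate(id: string): Promise<unknown> {
  const res = await fetch(`${BASE}/collections/debates/${id}`);
  if (!res.ok) throw new Error(`fetch failed: ${res.status}`);
  return res.json();
}

async function writeDebate(id: string, debate: unknown): Promise<void> {
  await fetch(`${BASE}/collections/debates/${id}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(debate),
  });
}
```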

Of course, server owners would want to constrain the writes that their server would accept, to avoid it being filled with spam or data for other applications they don’t want to support. The rules for what to store could be set at any point by supplying a suitable configuration file.
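
For example, such a configuration might look something like this; the format and field names are hypothetical.

```typescript
// Hypothetical acceptance policy a server owner might supply as configuration.
interface AcceptancePolicy {
  collections: string[];          // only accept writes to these collections
  maxEntryBytes: number;          // reject oversized entries
  maxEntriesPerKeyPerDay: number; // crude rate limit per author key
}

const policy: AcceptancePolicy = {
  collections: ["debates", "arguments", "votes"],
  maxEntryBytes: 16 * 1024,
  maxEntriesPerKeyPerDay: 1000,
};
```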

In summary, this approach has several positives:

  • Reduced resource burden on end users
  • Still ensures distributed data ownership
  • Simpler, more familiar paradigm for app developers than full p2p

I’m going to pause development on the debate app for now, ending this article series, and work on this idea. Once the p2p database server is implemented, I’ll start a new series that re-implements the debate app using this approach.

In the meantime I’ll be posting articles on the development of this server app and any work on Bonono required to support it.
