Decentralized App Development

Peer-to-Peer MongoDB: Part 3

Sparsely connected networks

Jeremy Orme
Published in
5 min readMar 30, 2024

--

Photo by Rubaitul Azad on Unsplash

Last time we created a rudimentary p2p network over TCP sockets and mirrored database entries between nodes. In that basic implementation, each node connects to every other node in the network, which works fine for a small number of nodes but the number of messages sent quickly becomes problematic with larger networks.

Going forwards, we’d like to limit the number of connections that each node makes to reduce the number of update messages sent. The downside is that, on average, it takes more hops to propagate a message between any two nodes but the trade-off is worthwhile to allow larger networks and can be controlled by adjusting the maximum connection limit.

Who to connect to?

We assume, for now at least, that we have the complete list of peers in the network. This lists the IP address and port of each peer so for example, if we run 7 peers on our local machine, it could be:

127.0.0.1:5001
127.0.0.1:5002
127.0.0.1:5003
127.0.0.1:5004
127.0.0.1:5005
127.0.0.1:5006
127.0.0.1:5007

We want to establish connections with other peers so we’ll pick peers from the list at random and try to connect to them. Once we have established enough connections we’ll stop. If a connection is terminated then we’ll try to establish another connection until we have enough again.

Isolated nodes

This approach introduces a new problem. Previously, a node connected to all other nodes in the network but now each node only connects to a few other nodes. That opens the likelihood that, by chance, some node will be left without any incoming connections and never receive updates.

So far, we only sent updates in the direction that a connection was made — if A connects to B, updates are sent from A to B but not from B to A (unless B also connected to A).

We can fix this by sending updates along the connections that were made to us as well as those we made to others. However, that means we should no longer make outgoing connections to peers that have already made an incoming connection to us or we’ll be sending messages twice to the same peer.

Identifying a peer

Previously, it was simple to avoid multiple connections to the same peer because we just avoided making the same outgoing connection twice to a given peer in our peer list. Now we need to worry about incoming connections too.

That means that when we receive an incoming connection, we need to identify which peer in the peer list it relates to. The peer list identifies peers by their address and listener port and that information is not available from the incoming connection socket. Therefore, the peer has to send its identity after the connection is made — until that information is received, the connection must sit in a pending state and not be used.

To de-duplicate the peer connection successfully, its identity must exactly match the identity specified in the peer list. The port is unambiguous but the address part can be specified multiple ways — for example, using a domain name, IP address or alias such as “localhost”. The simplest way to resolve this is to require the server to specify its own address exactly as written in the peer address list.

Connection example

Let’s consider an example connection scenario. Let’s assume our peer list is [a, b, c, d, e, f, g]. In a real peer list, each entry would be something like 127.0.0.1:5000 but to keep the diagrams simple, we replace the addresses with single letters.

Let’s assume the connection limit is set to 2 and a, b and c are fired up and start connecting. We could end up with the following connections made:

These three peers have all acquired their maximum two connections so would stop attempting to connect at this point.

Suppose we bring a couple more peers online. They each connect to two peers at random:

We can see that not all peers are equal:

  • c has 4 connections
  • a and b have 3 connections
  • d and e have 2 connections

In general, using this approach, we’d expect peers that joined the network earlier to accumulate more connections than those that join later.

Let’s bring the final two peers online:

Implementation

The above connection algorithm is implemented in src/pub-sub.ts in jeremyorme/bonono-server at release/part-3 (github.com). The following diagram shows the connections resulting from an actual run with 7 nodes started one after the other. All the peers were run on the same machine so only the port is shown (ports are dealt out in order from 5001 to 5007):

The main change in the implementation from last time is that we have implemented a trivial protocol to pass the identity of the connecting peer to the peer receiving the connection. For this we define two simple messages:

enum PeerMessageType {
PeerInfo,
PeerData
}

interface PeerMessage {
type: PeerMessageType
}

interface PeerInfoMessage extends PeerMessage {
address: string;
port: number;
}

interface PeerDataMessage extends PeerMessage {
data: any;
}

The PeerInfoMessage is sent automatically after connection to inform the other peer of our identity. The PeerDataMessage is used subsequently to send data between peers.

Next time

Now we’re sparsely connected, our update transmission is faulty… we only send updates to our directly connected peers. Next time, we’ll look at relaying updates so they reach all peers in the network.

--

--

Jeremy Orme
Coinmonks

Software engineer. Experimenting with database decentralization