4.1 Random networks (Barabási Ch.3)

The following is not original material; it is my notes from reading Network Science (Ch. 3) by Albert-László Barabási, accompanied by my own explanations.

4.1.1 Generating a random network

Most networks in real life look as if they were randomly constructed. How, then, can we produce random networks like the ones we see in nature?

Simply by putting links randomly between nodes.

But how can we make sure that we are putting links randomly?

Think about it for a minute before reading on.

If you have studied statistics, you will know that randomness is all about probability. If we say that we are going to randomly pick a person from a group, we need to make sure that each person in the group has an equal chance of being picked. Otherwise, the choice is not random.

Now, let’s go back to our original problem: how can we randomly put links between nodes?

For example, suppose we have 100 isolated nodes, called singletons, as shown below. What do we do next?

library(igraph)
g4_1 <- make_empty_graph(n = 100)  # 100 singletons; graph(edges = NULL, n = 100) also works

FIGURE 4.1: 100 isolated nodes

First, we’ll set a value \(p\) (\(0 \leq p \leq 1\)), called the link probability. Then we’ll pick a pair of nodes (STEP 1), for example, \(i\) and \(j\), and generate a random number \(r\) uniformly in \([0, 1)\) for this pair (STEP 2). If \(r < p\), then we connect \(i\) and \(j\). Otherwise, we keep them disconnected. We’ll repeat Step 1 and Step 2 for all node pairs, of which there are \(\frac{N(N-1)}{2}\) in total. This way, we make sure that the probability of each pair being connected is exactly \(p\). Why so?
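The two-step procedure above can be sketched in a few lines of code (a Python sketch for illustration only; the figures in these notes are produced with R/igraph, and the function name `gnp` is my own):

```python
import random

def gnp(n, p, seed=None):
    """Generate a G(N, p) random graph as an edge list.

    Visits each of the n*(n-1)/2 node pairs once (STEP 1), draws a
    uniform random number r in [0, 1) for that pair (STEP 2), and
    connects the pair if and only if r < p.
    """
    rng = random.Random(seed)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # P(r < p) = p for r ~ Uniform[0, 1)
                edges.append((i, j))
    return edges

edges = gnp(100, 0.1, seed=42)
# expected number of links: p * N(N-1)/2 = 0.1 * 4950 = 495 on average
print(len(edges))
```

Each pair is visited exactly once, so no pair can be connected twice, and every pair independently gets the same chance \(p\).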

# Solution: https://rstudio-pubs-static.s3.amazonaws.com/
# 297778_5fce298898d64c81a4127cf811a9d486.html
plot(c(0, 1), c(0, 0.2), type = "n", xlab = "", ylab = "")
rect(0, 0, 0.1, 0.2, col = "lightgreen", border = NA)  # the region r < 0.1
abline(v = 0.1, col = "red")

FIGURE 4.2: Illustrating why the probability of each pair being connected is equal to the link probability

The reason can be seen in Figure 4.2. If we set \(p = 0.1\), then only \(10\%\) of the random numbers will fall in the range \([0,0.1)\) (the area shown in light green); the remaining \(90\%\) will fall in \([0.1,1)\)1.

Okay, let’s set \(p=0.1\) and see what will happen:

# Solution from: https://rpubs.com/lgadar/generate-graphs
library(igraph)
erdosRenyi <- sample_gnp(100, 0.1)  # erdos.renyi.game(100, 0.1, type = "gnp") in older igraph

FIGURE 4.3: An Erdős-Rényi network

The method described above is used in the \(G(N,p)\) model, which was introduced independently by Edgar Nelson Gilbert. In this model, the number of nodes (\(N\)) and the link probability (\(p\)) are fixed.

There is a similar model, the \(G(N,L)\) model, in which the number of nodes and the number of links are fixed. This model was introduced by Pál Erdős and Alfréd Rényi.

In practice, we use the \(G(N,p)\) model more. But since Pál Erdős and Alfréd Rényi played such an important role in advancing our understanding of random networks, we still call networks generated by the \(G(N,p)\) model Erdős-Rényi networks.
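The contrast between the two models can be made concrete with a small sketch (Python for illustration; the function name `gnl` is my own): \(G(N,L)\) simply picks exactly \(L\) distinct pairs uniformly at random from all \(\frac{N(N-1)}{2}\) pairs.

```python
import itertools
import random

def gnl(n, L, seed=None):
    """G(N, L): place exactly L links, chosen uniformly among all pairs."""
    pairs = list(itertools.combinations(range(n), 2))  # all n*(n-1)/2 pairs
    rng = random.Random(seed)
    return rng.sample(pairs, L)  # L distinct pairs, uniformly at random

edges = gnl(100, 495, seed=1)
print(len(edges))  # exactly 495 links every time
```

In \(G(N,L)\) the link count is fixed at exactly \(L\); in \(G(N,p)\) only its expected value, \(p\frac{N(N-1)}{2}\), is fixed.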

In Chapter 3.2 of Network Science, there is a mistake: the second step of generating a random network says that “if the number exceeds \(p\), connect the selected node pair with a link.” However, it should be “if the number is smaller than \(p\), connect the selected node pair with a link.”

4.1.3 Degree distribution

Binomial distribution

First of all, if you are not familiar with combinations and permutations, or have forgotten what you learnt in high school, you are encouraged to go through this amazing tutorial on mathsisfun.com.

Then, carefully read through this tutorial on binomial distribution.

If you are able to understand the tutorials above, then you should know that if we have a biased coin which produces heads with probability \(p\), the probability of having \(k\) heads out of \(n\) tosses is:

\[\begin{equation} {n \choose k}p^k(1-p)^{n-k} \tag{4.3} \end{equation}\]

How to understand it?

We can look at it this way: tossing a coin \(n\) times gives \(2^n\) different outcomes (i.e., sequences of heads and tails), and the number of outcomes with exactly \(k\) heads is \(n\choose k\). However, we cannot simply use \({n\choose k}\div{2^n}\) to calculate the probability of having \(k\) heads. Why? Because the coin is biased, the \(2^n\) outcomes are not all equally likely.

What should we do then?

Now we know the number of outcomes that will produce \(k\) heads out of \(n\) tosses. It will be great if we know the probability of each of these outcomes and sum them up. Bingo!

When we think more deeply, we realize that each of these \(n\choose k\) outcomes has exactly the same probability: \(p^k(1-p)^{n-k}\). But why? Read this tutorial on binomial distribution again, especially the tree diagram. Also, you’ll find this tutorial on the probability of independent events helpful.
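The formula in Eq. (4.3) is easy to check numerically: summing it over all possible \(k\) must give 1, since some number of heads always occurs (a small Python sketch; the helper name `binom_pmf` is my own):

```python
from math import comb

def binom_pmf(n, k, p):
    """Probability of exactly k heads in n tosses of a p-biased coin:
    (n choose k) equally likely arrangements, each with probability
    p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
total = sum(binom_pmf(n, k, p) for k in range(n + 1))
print(round(total, 10))  # 1.0 -- the probabilities of all outcomes sum to 1
```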

Now, let’s go back to random networks.

TABLE 4.1: Comparing random network generation and tossing coins

|                           | \(p\)                            | \(N\)            | \(k\)                                       |
|---------------------------|----------------------------------|------------------|---------------------------------------------|
| random network generation | link probability                 | number of nodes  | number of node pairs successfully connected |
| tossing coins             | probability of heads in one toss | number of tosses | number of heads                             |

For a given node \(i\), the maximum number of links it can have is \(N-1\). Let’s denote the probability of node \(i\) having \(k\) links as \(p_k\). From Eq. (4.3), we know that:

\[\begin{equation} p_k = {N-1 \choose k}p^k(1-p)^{N-1-k} \tag{4.4} \end{equation}\]

In this binomial distribution, the number of trials is \(N-1\), so the mean is:

\[\begin{equation} E(x) = (N-1)p \tag{4.5} \end{equation}\]

Its standard deviation is:

\[\begin{equation} \sigma = [p(1-p)(N-1)]^\frac{1}{2} \tag{4.6} \end{equation}\]

And its second moment is:

\[\begin{equation} E(x^2) = p(1-p)(N-1) + p^2(N-1)^2 \tag{4.7} \end{equation}\]
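Even without a proof, these moment formulas can be checked numerically by summing over the binomial pmf directly (a Python sketch for an arbitrary choice of \(n\) trials and probability \(p\); the helper name `binom_pmf` is my own):

```python
from math import comb

def binom_pmf(n, k, p):
    """Binomial pmf: probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 50, 0.2
mean = sum(k * binom_pmf(n, k, p) for k in range(n + 1))
m2 = sum(k * k * binom_pmf(n, k, p) for k in range(n + 1))
var = m2 - mean**2

print(round(mean, 6))  # n*p = 10.0
print(round(var, 6))   # n*p*(1-p) = 8.0
print(round(m2, 6))    # n*p*(1-p) + (n*p)**2 = 108.0
```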

Sorry, I am currently not able to prove Eq. (4.5) to Eq. (4.7). For now, just memorize them.

Poisson distribution

Most real networks are sparse, so their average degree \(\langle k \rangle\) is much smaller than the size of the network, \(N\). In this limit, we usually use the Poisson distribution to describe a random network’s degree distribution because it is simpler:

\[\begin{equation} p_k = e^{-\langle k \rangle}\frac{\langle k \rangle^k}{k!} \tag{4.8} \end{equation}\]

Eq. (4.4) and Eq. (4.8) are collectively called the degree distribution of a random network.

Things to keep in mind:

  1. The binomial form is the exact version; the Poisson distribution is only an approximation;

  2. We’d better use the binomial distribution for small networks, for example, \(N = 10^2\), and the Poisson distribution for large networks, for example, \(N = 10^3\) or \(10^4\);

  3. In Poisson distribution, the standard deviation is \(\sqrt {\langle k \rangle}\);

  4. The Poisson distribution tells us that for two networks with the same \(\langle k \rangle\), the degree distributions are almost identical despite their different sizes \(N\).
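Point 4 above can be checked numerically: holding \(\langle k \rangle = 10\) fixed while growing \(N\), the exact binomial pmf of Eq. (4.4) gets closer and closer to the one Poisson pmf of Eq. (4.8) (a Python sketch; the cutoff of \(k < 41\) is an arbitrary choice that covers essentially all the probability mass):

```python
from math import comb, exp, factorial

def binom_pmf(n, k, p):
    """Binomial pmf: probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """Poisson pmf with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

avg_k = 10
gaps = []
for N in (10**2, 10**3, 10**4):
    p = avg_k / (N - 1)  # same <k>, different network sizes
    # largest pointwise gap between the exact binomial and the Poisson
    gap = max(abs(binom_pmf(N - 1, k, p) - poisson_pmf(k, avg_k))
              for k in range(41))
    gaps.append(gap)
    print(N, round(gap, 5))
```

The gap shrinks as \(N\) grows, which is exactly why the Poisson form works well for large sparse networks but less well for small ones.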

I did not quite understand the lecture on the Poisson distribution by Professor YY.

4.1.4 Poisson distribution does not capture reality

The Poisson distribution undoubtedly describes the degree distribution of random networks accurately, but we need to ask: do random networks reflect reality?

Reading Ch. 3.5 will let us know that if random networks could describe the social networks in our daily lives, we would expect that:

  1. Most people will have \(\langle k \rangle \pm \sqrt {\langle k \rangle}\) friends;

  2. The highest number of friends a person can have is not that different from the smallest possible number.

However, we know that this is not the case in real life. Many people have over 5,000 contacts on Facebook and WeChat.

From the figure shown in Ch. 3.5, we can see that in real networks, both the number of high-degree nodes and the standard deviation of the degree distribution are much larger than what is expected from random networks.

  1. I am a little bit uncertain here because I don’t know where to put 0.1.↩︎