5.3 The Barabási-Albert Model

The Barabási-Albert model, or BA model for short, addresses this question: how does a network’s degree distribution form a power law?

Two hidden assumptions of the random network model do not hold water in real networks:

  1. The random network model assumes that the number of nodes is fixed. In real networks, however, this number continues to grow as new nodes join the network. That is to say, the size of a real network keeps growing.

  2. The random network model assumes that a link is generated randomly. However, in real networks, new nodes prefer to link to high-degree nodes. This phenomenon is called preferential attachment.

The BA model recognized the coexistence of network growth and preferential attachment in real networks and proposed a method of generating networks with scale-free properties, i.e., hubs, and a degree distribution that follows a power law.

I am wondering: does a power law distribution already implies that there are hubs in the network?

We begin with \(m_0\) nodes. Each node’s degree is no less than 1. Links between these \(m_0\) nodes are arbitrary.

  1. At each timestep, a new node \(i\) with a degree of \(m\) (\(m \le m_0\)) is added to the network.

  2. The probability that node \(i\) connects to an old node, denoted as node \(j\), is proportional to the old node’s degree \(k_j\) (Menczer, Fortunato, and Davis 2020):

\[\begin{equation} \Pi(i, j) = \frac{k_j}{\sum\limits_l k_l} \tag{5.3} \end{equation}\]

The denominator in Eq. (5.3) is the sum of all degrees in the old network, so the sum of all these probabilities is equal to one (Menczer, Fortunato, and Davis 2020).

I am not sure whether I can explain why the sum of all these probabilities is equal to one this way: Let us say the probability that node \(i\) will connect to node \(1\) is \(P_1\), to node \(2\), \(P_2\), to node \(3\), \(P_3\) …, if we add these probabilities up, it will (and has to) be equal to one.
In Menczer, Fortunato, and Davis (2020), in Box 5.4, it says that \(m\) here is the average degree of the network. I did not quite understand why.

The image 5.4 made me wonder whether my computation of 5.4 was wrong. I will check it later.

Update: Yeah, i was wrong. Originally, when calculating deg.dist_internet, I set cumulative to be FALSE. This is due to the fact that I did not quite understand power law distribution at that time. I need to set it to be TRUE.

5.3.1 Improving the BA model

The following is largely from the online course of INFO 606 at Indiana University taught by Professor Yong-Yoel (“YY”) Ahn. My notes are accompanied.

Suppose you are an incoming node and are about to attach yourself to other nodes. According to Eq. (5.3), you first should calculate all the degree nodes in the existing network and then calculate the proportion of a specific node’s degree against the sum of all the degrees. This proportion will be the probability that you will connect to this node.

There is a problem in the first step. You need to know all the nodes before you can get the sum of all degrees. However, this is very difficult. For instance, when you enter a university, and are about to develop a friendship with other people, is it realistic for you to first know how many friends EVERY person at this university has? Probably not.

Is it possible for us to achieve preferential attachment without requiring knowledge about each and every node in the existing network?

Yes. Try this: The incoming node randomly chooses a link from the existing network, then connect to either end of this link. Hubs have more links connecting to it, so it is more likely for this incoming node to connect to a hub. If you have difficulty understanding why so, consider the extreme example of a star graph.

starG <- make_star(n = 12, mode = "undirected", center = 1 )
plot (starG)
A star graph

FIGURE 5.5: A star graph

In Fig. 5.5, no matter which link you pick, it is always connected to the hub.

If you have followed the understood the logic so far, you should be able to know that if \(k_i = 2 \cdot k_j\) (meaning node \(i\) has twice as many links as node \(j\)), then the incoming node is twice as likely to connect to node \(i\). This means that the probability of connecting to node \(i\) is linearly proportional to \(k_i\), which is how the BA model defines preferential attachment.

Professor YY said at the end of the video that this process can also result in a high clustering coeffient of the network. I need to brush up on clustering coeffient a little bit to understand explain why.