This is not really a blog post, but rather a checklist of more or less painful mistakes in networks-science programs by me, or my students or teachers. (As a side note, computer scientists tend to be better a this, but are by no means immune to it.)

This list is incomplete. I plan to fill it up as I go along (suggestions are welcome). I let *n* denote the number of nodes and *m* the number of edges, and assume we are dealing with simple graphs.

#### Isolated nodes are also nodes

The most common format for network data (and the one I always use) is the edge list—an *m*×2 matrix where every row represents an edge. Isolated nodes will never appear in an edge list. So a common mistake is, of course, to assume that they do. It typically happened to me when I compared an empirical network without isolated nodes with a model network, or a randomized version of the original, that happens to have them.

We can get around this problem by setting the largest node index to *n* – 1 (or *n*) and assuming nodes between 0 (1) and *n* – 1 (*n*) that do not appear in the file are isolates. However, I many times forgot to check that the largest index node is not an isolate.

#### Swap it right (and left)

To update networks while keeping the degree sequence fixed, we often rewire edges. The way not to do it is:

- Pick two random edges (
*i*,*j*) and (*i’*,*j’*). - Check that
*i*and*j’*are different, that*i’*and*j*are different and that (*i*,*j’*) and (*i’*,*j*) are not edges. If so, then: - Replace (
*i*,*j*) by (*i*,*j’*) and (*i’*,*j’*) by (*i’*,*j*).

The problem with this is that a node in the first argument of an edge will never leave the first argument. So if in the input data, *i* and *j* both only occur in the first argument, there will never be an edge between them. Many times the edges are sorted, so this could cause very biased results. There are many ways of fixing this. One can occasionally flip the edges (*i*,*j*) → (*j*,*i*). One can occasionally (or maybe always?) rewire to (*i*,*i’*) and (*j*,*j’*) instead, or to (*i*,*j’*) and (*j*,*i’*).

#### Getting a random node (or edge)

This should be just routine for us all, and I assume if you use some network library in Python or R there is nothing to worry about. (Is there?) If you use C++ or such, however, don’t just take a random number modulo *n*. I think the fastest and most elegant way of getting it correct is as in this routine (slightly rewritten from the PCG random number generator)

unsigned int pcg_32_bounded (unsigned intbound) {

unsigned intthreshold = -bound % bound, r;

do r = pcg_32(); // getting a 32-bit random number

while (r < threshold);

return r % bound;

}

Here `threshold` is 2^{32} minus the largest multiple of `bound` that fits into 32 bits (note that nifty trick with the two’s complement of `bound` that wouldn’t work for a signed integer). In practice, I don’t think I ever dealt with networks where just `pcg_32() % bound` would give probabilities that are more than 1 percent off, but it is wrong not to do things right. And if one would get the chance to play with an *n* = 3,221,225,472 network then the naïve approach would sample half of the nodes twice as frequently as the other half.

#### List of nodes in random order

(Of course, in Python you would use `random.shuffle` or in R `sample`, but if you have to do it from scratch.) The common mistake is to go through all nodes and swap each node with a random node. This is *el classico* of simulation-code errors and I’ve seen it committed by prominent members of my field. What you should really use is the Fisher-Yates shuffle. Once again, the naïve algorithm does not get it *that* wrong, and it is hard to believe any textbooks need to be rewritten if it was weeded out, but still.