If you haven’t heard about Zachary’s karate club, you should probably be careful calling yourself network scientist in the wrong company. It is a small network data set that is used as an example, and benchmark for community detection algorithms. It even has a club! The Zachary Karate Club Club. With a trophy going to the one first presenter showing a Zachary Karate Club plot at a conference.
The original paper
WW Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33(4), 452–473 (1977)
is well worth a read, also for the young generation. It captures many “modern” ideas of how information flows in social networks and even a community detection algorithm (maybe incomplete by today’s standards, but anyway) based on the Ford-Fulkerson algorithm for the maximum flow problem. Indeed it uses similar ideas as the paper that became responsible for the revival of the Zachary karate club network:
M Girvan, MEJ Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99, 7821–7826 (2002)
I always recognized the Zachary club plots by the presence of only one node with degree one.
Today while checking the original paper for the n’th time, I noticed that the plot of the network has two nodes with degree one.
I encoded the network from scratch and compared it to the one in NetworkX (which I assume is identical to the one most people use) and concluded that there is one extra link in that data. If you want the original Zachary’s karate club, you can simply do (in Python):
import networkx as nx
G = nx.karate_club_graph()
Now, of course, I’m not suggesting people to start using the original. It makes more sense to keep the version with the extra edge for consistency. And if anyone would start using the original graph, I suggest they do that within the confines of a Zachary’s Zachary Karate Club Club.
Update 1: As pointed out by Aaron Clauset, this discrepancy probably comes from a typo in the adjacency matrix (also published in the paper). Since the paper explicitly states that the networks considered are undirected, and the adjacency matrix is asymmetric, the paper is inconsistent. I guess you can argue both that the sparser graph is more likely because it is mentioned twice (also in the figure) . . or that the extra link logically should exist because of triadic closure.
Update 2: As suggested by Sang Hoon Lee, given the gravity of the matter, some funding agency should open their pockets for an investigation into this. Probably some of the members are still around and the existence of link (24,34) (of the original data) could be verified. For NetSci 2013 Sune Lehmann and I had the idea to track down Zachary and invite him as a dinner speaker, but a quick google bout didn’t do the trick at that time.
Anyone knows what happened to him? After just an hour or so crowd intelligence located him to a company providing organizational solutions for health care. It’s Searching for Sugar Man all over again.