Summary
netcarto is a command line tool for finding modules (and node roles) by maximizing Modularity with Simulated Annealing.
Given a network, the program netcarto identifies modules--i.e. densely connected groups of nodes in the network--and classifies nodes according to their roles, as defined in Refs. [1, 2].
Download
Download netcarto from our Bitbucket repository, or open a terminal and type
hg clone https://bitbucket.org/amarallab/network-cartography
netcarto_cl is the executable file for Linux 32bits. For Windows 32bits platforms, please use netcarto_cl.exe.
Usage
Type the following command in the terminal:
netcarto_cl <networkfilename> <seed> <initial_temperature> <iteration_factor> <cooling_factor> <number_of_randomizations>
Examples:
-
For a large network (thousands of nodes)
netcarto_cl testnetwork.dat 1111 5.0 0.4 0.96 5
-
For a small network (tens of nodes)
netcarto_cl testnetwork.dat 1111 10.0 1.0 0.999 10
Input
netcarto takes the following input parameters:
-
Seed for the random number generator: Must be a positive integer. Since the module identification algorithm is stochastic, different runs will yield, in general, slightly different different modules. Two runs with the same seed, though, should give the exact same results.
-
Name of the network file: Name of the file that contains the network. The file must be a list of links with the format:
n1 n2 n3 n4 . . . .
This represents a network with a link between nodes n1 and n2, another between nodes n3 and n4, and so on. Nodes must be separated by spaces.:
-
Iteration factor (f): At each temperature of the simulated annealing (SA), the program performs fN^2 individual-node updates (involving the movement of a single node from one module to another) and fN collective updates (involving the merging of two modules and the split of a module). The number "f" is the iteration factor. Large values of f (1 or larger) will result, in general, in better results (higher modularities) and longer execution times. The recommended range for f is [0.1, 1], although smaller values may be needed for large and/or dense networks. Note, also, that a minimum number of iterations is imposed at each temperature, so that when f is very small, the minimum number will be used instead of fN^2 or fN.
-
Cooling factor (c): After the desired number of updates is done at a certain temperature T, the system is cooled down to a new temperature T'=cT, where c is the cooling factor. the cooling factor must be strictly larger than 0 and strictly smaller than 1. In general, values close to one will result in better results and longer execution times. Recommended values of the cooling factor f are [0.990, 0.999], although smaller values (0.95 or even 0.9) may be needed for large and/or dense networks.
-
Number of randomizations: After modules and roles are identified in the original network, there is the option to calculate the value of the modularity in a random graph with the same degree (connectivity) distribution as the original network. As discussed in [3], this test is necessary to establish whether the modular structure of the original network is significant or not. Calculation of the modularity for each random network will take approximately the same time as for the original network. If you do not want to run any randomization, just enter 0 here.
Note: T_ini, iteration_factor, and cooling_factor can be set to -1 to use the defaults (2/size_of_network, 1.0, and 0.995, respectively).
Output
After entering these parameters, the algorithm will start to identify the modules in the network. As the SA proceeds, the program prints three columns on the screen, which indicate the inverse of the temperature, the modularity at that temperature, and the temperature itself, respectively. This provides you with a fast way to check if the process is too slow or, conversely, if it is fast and the accuracy can be increased. The program will stop when the modularity remains unchanged during 25 different temperatures. In general, finding a good partition requires decreasing the temperature several orders of magnitude. Thus, as a rule of thumb, if the time between one temperature and the next is larger than a second, the program will likely take days to complete (this, however, will also depend on the cooling factor). Of course, there is nothing wrong with long runs, as long as you are willing to wait!
When the SA for the original network finishes, the program calculates the role of each node almost instantly and outputs the following files:
-
network.net: A Pajek file containing the giant component of the network (for information on Pajek, visit http://vlado.fmf.uni-lj.si/pub/networks/pajek/).
-
modules.clu: A Pajek partition containing the modules as identified by the algorithm.
-
roles.clu: A Pajek partition containing the roles as identified by the algorithm.
-
modules.dat: A text file containing some basic information about the modules (can be edited with any text editor such as NotePad, or imported in Excel as a csv file). The format of the file is as follows. Each line corresponds to a different module. The first number is an ID number for the module, mostly irrelevant. The second is the number of nodes in the module. The third is the total number of links in the module, the fourth the number of within-module links, and the fifth the number of links from this module to other modules (of course, the third column must be equal to the sum of the fourth and fifth columns). Then there is a "---" and the next columns correspond to the list of nodes in the module. The last line of the file contains the value of the modularity for this partition.
-
roles.dat: A text file containing some basic information about the roles (can be edited with any text editor such as NodePad, or imported in Excel as a csv file). The format of the file is as follows. Each line corresponds to a different role. The first number is the role number, as defined in [1, 2]. The second is the number of nodes with that role. The third is the total number of links of nodes with that role, the fourth the number of within-role links, and the fifth the number of links from this role to other roles (of course, the third column must be equal to the sum of the fourth and fifth columns). Then there is a "---" and the next columns correspond to the list of nodes with that role.
-
node_prop.dat: A text file with four columns. The first one is the number of the node. The second is the degree (number of links) of the node. The third is the participation coefficient as defined in [1, 2]. The fourth one is the within-module relative degree, as defined in [1, 2].
After all of this, the program starts to calculate the modularity for as many randomizations as you have selected, printing on the screen, as before, the inverse of the temperature, the modularity, and the temperature. Once all randomizations are done, the program creates one last file that contains the modularity of the original network, the average modularity of the randomized networks, and the standard deviation of the modularity of the randomized networks.
References
In case you use the results of the program in a publication, please cite the following papers:
- Guimera, R. & Amaral, L.A.N., Functional cartography of complex metabolic networks, Nature 433, 895-900 (2005).
- Guimera, R. & Amaral, L.A.N., Cartography of complex networks: modules and universal roles, J. Stat. Mech.-Theory Exp., art. no. P02001 (2005).
For the comparison to randomized networks, please cite: