Software and Data

All software and data are free to use. We would appreciate it if you sent us an email at hmakse@lev.ccny.cuny.edu if your use of any of the software or datasets results in a publication.

Below is a summary of the databases that we have collected and used in our papers. These datasets have either been collected from various sources on the web or have been obtained by our group and collaborators by crawling the Internet, as indicated. The tables below provide a short description of these databases, where the number of nodes and links refers only to the largest connected cluster. A detailed explanation of each database is provided after its table.

Online CI Influencer Calculation

Go to online CI influencer calculation

Download CI source code

Detailed instructions for running the CI code are below, in the section “Algorithms for Finding Superspreaders.”

For Prospective Graduate Students and Postdocs

The following is a list of software used in our lab to analyze networks and data. Students and postdocs should be (or become) familiar with these tools.

Low Expertise Required:

MATLAB — generally used for data cleaning, data analysis, calculations, plot generation, etc.; you can get as simple or complicated as you want with it
Complex Networks toolbox — MATLAB toolbox by Lev Muchnik for analysis of complex networks; includes a k-shell decomposition algorithm
Machine learning toolbox — MATLAB toolbox for (basic) machine learning

Community detection/modularity algorithm — MATLAB code used to find community structure and modularity of a network

Network attributes algorithm — MATLAB code used to find network components, sizes, and lists of member nodes
Python — another general-use platform; again, uses can range in complexity
NetworkX — Python library used to find basic attributes of a network, such as the degree distribution
graph-tool — Python library for fast component decomposition, finding modularity, large network visualization

pandas — Python library used for data management
NumPy — Python library used for vector and matrix operations
SciPy — Python library for statistics, hypothesis testing, regression, and numerical computation

Beautiful Soup — Python library used for website scraping
Scikit-learn — Python library used for basic machine learning methods, including GLasso and stochastic gradient descent
ImageJ — Java image processing program used for optical CT imaging analysis

Gephi — visualization and analysis software for networks ***CAN BE BUGGY — SAVE WORK OFTEN***
Pajek — general network visualization software

Low/medium Expertise Required:

SQLite — used for Twitter data management and analysis

Medium Expertise Required:

Graphical Lasso (GLasso) algorithm — MATLAB code used to find a sparse inverse correlation matrix
Collective Influence algorithm — C code implementation of Collective Influence algorithm; can be downloaded on this page
Monte Carlo for Maximum Entropy XY model — C code to find interaction matrix for network which can be modelled via a Maximum Entropy XY model ***BEST FOR VERY SMALL NETWORKS***
FMRIB Software Library (FSL) — used for model-based fMRI analysis (FEAT) and brain extraction (BET)
BrainNet Viewer — brain network visualization software

Medium/high Expertise Required:

Medical imaging toolbox — MATLAB toolbox specifically for medical imaging
Natural Language Toolkit — platform for building Python code used in natural language processing (e.g., on Twitter)

High Expertise Required:

TensorFlow — used for Deep Learning development in machine learning

For computer analysis you will need:

Anaconda for Python 3.6
Gephi (link is above)
A Twitter account

For an introduction to Twitter network analysis, please see the following tutorial by postdoc Alexandre Bovet.

For those who would like to learn the R or Python computer languages from scratch, please see the following compilation of resources.

Courses for Prospective Students

The courses below will allow you to analyze Big Data in a variety of circumstances ranging from systems biology, to ecology, to social networks and finance:

Complex Networks at the Graduate Center – Physics – PHYS85200 – CRN 23395 – Professor H. Makse
This is my course on Network Theory; please see the syllabus.

Machine Learning at the Graduate Center – Computer Science – CSC74020 – Professor R. Haralick or Professor C. Yuan
Professor Haralick focuses more on the theoretical aspect while Professor Yuan focuses more on Natural Language Processing.

Big Data Analysis: Principles and Methods at the Graduate Center – Physics – PHYS85200 – CRN 32250 – Professor G. Patz
More application than theory, this course is a good introduction to the topic.

Finance for Scientists at the Graduate Center – Physics – PHYS85200 – CRN 30235 – Professor T. Schäfer
This course provides a good mathematical background on stochastic processes.

Computational Methods in Physics at the Graduate Center – Physics – PHYS85200 – CRN 23394 – Professor A. Poje
Ideal for those who have some experience in programming but want to become more comfortable with applications such as Monte Carlo methods.

The following courses cover theoretical principles important to the core of our research program, and in fact, the first two are mandatory for first-year Ph.D. students at the Graduate Center:

Statistical Mechanics at the Graduate Center – Physics – PHYS74100

Mathematical Methods in Physics at the Graduate Center – Physics – PHYS70100

Quantum Information Theory at the Graduate Center – Physics – PHYS85200

Quantum Theory of Fields I & II at the Graduate Center – Physics – PHYS82500 and PHYS82600, respectively

There are also courses outside the CUNY system, which I suggest that you look into if you have time. New York University has a Center for Data Science, as does Columbia University. Some examples of online courses offered are:

Computational Physics – PHYS-GA-2000

Non-equilibrium Statistical Physics – PHYS-GA-2061

Online courses are also important to our field of study:

Deep Learning is an important subject for any data scientist to know, although there is no course currently offered in the CUNY system. My students are self-taught or take online courses.

If you are learning the Python programming language (the language for Data Science), the Python Data Science Handbook is a very useful resource, as are Python courses that can be found at Coursera or edX.

For Data Science, Machine Learning, and Big Data Analysis, most of my students use Python, MATLAB, C, C++, Mathematica, and other languages. Please see “For Prospective Graduate Students and Postdocs” for further details.

There are also a great many online courses on applications of Data Science that can be found here. They are mostly (if not all) free, and range in difficulty level from introductory, like “Introduction to Python for Data Science,” to advanced, like “Case Studies in Functional Genomics.” There is even, at the time of this writing, an introductory course in the application of Data Analysis to biological systems, called “Introduction to Bio: Annotation and Analysis of Genomes and Genomic Assays.”

The above are a sampling of what my students found online, so you can also look into it further.

Brain Networks

Short Name | Nodes | Links | Description | Data
Functional brain networks | --- | --- | Data used to study the connectivity of functional brain networks [See below for more information] | Brain datasets

Gaze Position and Eyetracking

Short Name | Nodes | Links | Description | Data
Gaze position data from eyetracking | --- | --- | Gaze position data from eyetracking; see below for more information | Full eye data for each subject and each video
Individual ratings | --- | --- | Ratings given by each subject for each video watched during the study | Individual ratings
Video data | --- | --- | Data regarding each video shown to subjects during the study (includes nation-wide Ad Meter ratings and video titles) | Video data

Research on functional brain networks

Human dual task datasets (network data, ATTENTION! Large file: 2.2 GB)

Complete human dual task datasets (7 GB), including all experiments with the activity of each voxel (phase and amplitude, see Sigman paper below for details), code to read the data, spatial coordinates of the nodes, and the activity mask used to filter the data.

Human resting state datasets (network datasets, ATTENTION! Large file: 1.4 GB)

These files contain all network information and readme.txt files for networks of networks in dual tasks and resting states in functional human brain networks.

For more details on the datasets, please see here.

Research on gaze position

Repository of all gaze position data and code

The above repository contains all the data and code used for our work on applying a Maximum Entropy model to gaze positions recorded from a set of subjects who were instructed to watch and rate Super Bowl 2014 commercials. The individuals’ gaze data, demographic data, and ratings are included, as well as some ratings data at the larger scale of the general United States public.

There is also a readme.txt explaining the meaning and functions of all the code and data.

2014 Super Bowl commercials viewed by the participants in the study. Gaze positions for all viewers are denoted by red dots.

Big Data Analysis

Collective Influence Maximization

Please find here the code (in C) for the algorithm used in the paper of Pei, Teng, Shaman, Morone, and Makse, Efficient collective influence maximization in cascading processes with first-order transitions, Sci. Rep. 7, 45240 (2017).

This code calculates values of Collective Influence in the Threshold Model (CI-TM) in an Erdos-Renyi random graph; the structure of the graph is a list of neighbors.
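As a rough illustration of the input structure mentioned above, the following Python sketch builds an Erdős-Rényi random graph stored as a list of neighbors for each node. It is not part of the released C code; the function name, the 1-based node IDs, and the parameter values are assumptions chosen for the example.

```python
import random

def erdos_renyi_neighbor_lists(n, p, seed=None):
    """Return a G(n, p) random graph as {node ID: list of neighbor IDs}, IDs 1..n."""
    rng = random.Random(seed)
    neighbors = {i: [] for i in range(1, n + 1)}
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if rng.random() < p:  # each pair is linked independently with probability p
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

# Illustrative parameters only (not taken from the paper).
graph = erdos_renyi_neighbor_lists(n=200, p=0.02, seed=1)
mean_degree = sum(len(v) for v in graph.values()) / len(graph)
print(f"mean degree = {mean_degree:.2f} (expected value is p*(n-1) = {0.02 * 199:.2f})")
```

The double loop is quadratic in the number of nodes, which is fine for small tests but would need a faster sampling scheme for large graphs.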

Algorithms for Finding Superspreaders

Algorithm to identify superspreaders using Collective Influencers (CI), k-shell, and PageRank.

Please find the CI algorithm and code. The CI algorithm is explained in F. Morone, H. A. Makse, Influence maximization in complex networks through optimal percolation, Nature (2015).

Instructions to run the CI code are as follows:

  • to compile the source code, use this command: gcc -o CI CI_HEAP.c -lm -O3
  • inputs are the file containing the network, and your desired value of L (the radius of the ball you want to consider)
    • network file must be an adjacency list where the first number on each line is a node ID and the following numbers on that line are its neighbors, e.g.:
      1 3255 18210 24119 70247
      2 9205 88665 89859
      3 3255 328244 25046 41508

      3255 1 3 34913 73168
  • to run the code, use this command: ./CI <network_filename> <L>
  • values of (q,G) are printed on the screen for fraction of removed nodes q, and an output file “Influencers.txt” is created listing the influencers in the network by decreasing influence
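To make the input format and the quantity being computed more concrete, here is a minimal Python sketch. It is not the CI_HEAP.c implementation: it parses an adjacency list in the format above and evaluates the Collective Influence of a single node, CI_L(i) = (k_i - 1) times the sum of (k_j - 1) over the nodes j at distance exactly L from i, following the definition in Morone and Makse (2015). The function names and the toy network are assumptions made for illustration. The full algorithm also removes the top-ranked node and recomputes adaptively, which this sketch omits.

```python
from collections import deque

def parse_adjacency_list(text):
    """Parse lines of the form 'node neighbor1 neighbor2 ...' into {node: set of neighbors}."""
    graph = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        node, nbrs = int(fields[0]), [int(x) for x in fields[1:]]
        graph.setdefault(node, set()).update(nbrs)
        for nb in nbrs:
            graph.setdefault(nb, set()).add(node)  # keep the graph symmetric
    return graph

def collective_influence(graph, node, L):
    """CI_L(node): (k_node - 1) * sum of (k_j - 1) over nodes j at distance exactly L."""
    dist = {node: 0}
    queue = deque([node])
    frontier_sum = 0
    while queue:
        u = queue.popleft()
        if dist[u] == L:
            frontier_sum += len(graph[u]) - 1
            continue  # do not expand beyond the boundary of the ball
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(graph[node]) - 1) * frontier_sum

# Toy network (hypothetical): node 1 links to 2, 3, 4; node 2 also links to 5 and 6.
toy = """1 2 3 4
2 1 5 6
3 1
4 1
5 2
6 2"""
graph = parse_adjacency_list(toy)
ranking = sorted(graph, key=lambda n: collective_influence(graph, n, 1), reverse=True)
print([(n, collective_influence(graph, n, 1)) for n in ranking])
```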

The k-shell and PageRank algorithms are at this link. More information is in S. Pei, L. Muchnik, J. S. Andrade, Jr., Z. Zheng, H. A. Makse, Searching for superspreaders of information in real-world social media, Sci. Rep. 4, 5547 (2014).
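For readers unfamiliar with k-shell decomposition, the standard pruning procedure can be sketched in a few lines of Python. This is a simple illustration, not the code behind the link; the function name and the example graph are assumptions, and the input is taken to be a symmetric dictionary of neighbor sets.

```python
def k_shell(graph):
    """Return {node: k-shell index} by repeatedly pruning nodes of remaining degree <= k."""
    degree = {u: len(nbrs) for u, nbrs in graph.items()}
    remaining = set(graph)
    shell = {}
    k = 0
    while remaining:
        to_prune = [u for u in remaining if degree[u] <= k]
        if not to_prune:
            k += 1  # nothing left to prune at this level: move to the next shell
            continue
        for u in to_prune:
            shell[u] = k
            remaining.discard(u)
            for v in graph[u]:
                if v in remaining:
                    degree[v] -= 1  # removing u lowers the degree of its neighbors
    return shell

# Hypothetical example: a triangle (1, 2, 3) with a chain 1 - 4 - 5 attached.
example = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1, 5}, 5: {4}}
shells = k_shell(example)
print(shells)
```

In this example the chain nodes 4 and 5 are peeled off in the first shell, while the triangle survives until k = 2. The repeated scans over the remaining nodes keep the code short at the cost of speed; a bucket-based implementation would be preferable for large networks.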

Research on Superspreaders of Information and Visualization Techniques

For a detailed explanation, see here.

The datasets were published in S. Pei, L. Muchnik, J. S. Andrade, Jr., Z. Zheng, H. A. Makse, Searching for superspreaders of information in real-world social media, Sci. Rep. 4, 5547 (2014). DOI: 10.1038/srep05547. [PDF].

Visualization:

The cover of our Nature Physics 2010 paper on k-core spreaders, M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. Eugene Stanley, H. A. Makse, Identification of influential spreaders in complex networks, Nat. Phys. 6, 888 (2010) [PDF], was created with the LaNet-vi software package, which you can download at LaNet-vi. It was developed by a group at Indiana University, and a web interface is also available there. You may want to try different parameters to adjust the figures as you wish.

Besides this figure, most of our publication-quality figures are made with MetaPost. One example is Fig. 1 in S. Pei et al., Sci. Rep. 4, 5547 (2014) [PDF].

Unfortunately, MetaPost is not interactive software for generating this type of image; it is a programming language for describing the image. Searching for “metapost tutorial” turns up many pages explaining the details. In general, it is not too hard, but you will need to spend some time learning how it works.

In general, a MetaPost file includes commands such as:

    fill fullcircle scaled 8.0 shifted (253,385) withcolor (1.0,0.0,0.0);

This draws a filled circle of diameter 8.0 (fullcircle has unit diameter, so the scale factor is the diameter rather than the radius) centered at coordinates (253,385) in red, which in RGB is (1,0,0). You need to write a program that takes the node coordinates, radii, and colors as input and produces a file with statements such as the above. The MetaPost file must start with beginfig(1); and end with endfig; end;

This means that a file, test.mp, may look like:

    beginfig(1);
    fill fullcircle scaled 8.0 shifted (253,385) withcolor (1.0,0.0,0.0);
    endfig;
    end;

You then compile this file with the command mpost test.mp. This produces a file called test.1, a PostScript file that you can rename to test.eps; this is your final image.
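The generator program described above could look something like the following Python sketch. The function name and the node tuples are assumptions for illustration. Since MetaPost's fullcircle has unit diameter, the sketch scales each circle by twice its radius.

```python
def metapost_figure(nodes):
    """Build MetaPost source that draws one filled circle per node.

    nodes: iterable of (x, y, radius, (r, g, b)), color components in [0, 1].
    """
    lines = ["beginfig(1);"]
    for x, y, radius, (r, g, b) in nodes:
        diameter = 2 * radius  # fullcircle has unit diameter, so scale by the diameter
        lines.append(
            f"fill fullcircle scaled {diameter} shifted ({x},{y}) "
            f"withcolor ({r},{g},{b});"
        )
    lines += ["endfig;", "end;"]
    return "\n".join(lines) + "\n"

# Hypothetical node list: (x, y, radius, RGB color).
nodes = [(253, 385, 4.0, (1.0, 0.0, 0.0)), (300, 350, 2.5, (0.0, 0.0, 1.0))]
source = metapost_figure(nodes)
print(source, end="")
```

Saving the printed output as test.mp and running mpost test.mp should then produce the figure as described.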

Before preparing the mp file, you may need to use other network visualization software, such as Gephi or Pajek, to get the coordinates of the nodes. You can then export the coordinates and incorporate them into the mp file. We attach two mp files as examples, on which you can base your own: twolayer_c1.mp and twolayer_sample.mp.

The data files used in the Sci. Rep. paper are:

  • APS Dataset This file contains the coauthorship and citations of all scientific papers published in American Physical Society (APS) journals until 2005, including Physical Review A, B, C, D, E and Physical Review Letters.
  • Facebook Dataset This dataset is available online at http://socialnetworks.mpi-sws.org/data-wosn2009.html. It contains the friend relations of New Orleans Facebook social network as well as the wall posts records of users during a period of nearly two years. In the social network there are 63731 nodes with average degree 24.3. The total number of wall posts is 876992.
  • Twitter Dataset This file contains the mention network and retweet relations extracted from the tweets sampled between January 23rd and February 8th, 2011, provided by Twitter (http://trec.nist.gov/data/tweets/). We are not allowed to distribute any private information about Twitter users, so each user in the dataset is represented by an anonymized ID. The Twitter retweet dataset that we used in a collective influence paper can be downloaded here.

Social Networks

Short Name | Nodes | Links | Description | Data
IMDB | 374510 | 15014833 | Co-appearance of actors in the same movie | Barabasi homepage
Blogs | 200316 | 700737 | Links between blogs in the Sina blogosphere (http://blog.sina.com.cn) | Paper by Fu et al.
HEP citations | 34401 | 420784 | High Energy Particle Physics citations from the hep-th section of arxiv.org | Pajek homepage
Guestbook signing | 19215 | 54623 | Links between users of the pussokram.com online community that have signed guestbooks of other users | Liljeros paper
Hospitals | 291295 | 10118441 | Contact network of patients hospitalized in Sweden. Please note that due to Swedish ethics laws, this information cannot be shared outside Sweden, and you will need special permission from a Swedish ethics board and the Stockholm County council to access it | Liljeros paper
IMDB networks | 47719 | 1097537 | A network of connections between actors who have co-starred in films | IMDB Dataset
Email | 12701 | 20322 | Email messages sent and received at the Computer Sciences Department of London's Global University | Email Dataset
POK dataset | -- | -- | The entire sequence of messages sent by users in the POK community | POK Dataset
APS | -- | -- | The co-authorship and citations of all scientific papers published in American Physical Society (APS) journals until 2005 | APS Dataset
Facebook New Orleans | 63731 | 1545685 | The friend relations of New Orleans Facebook social network | Facebook Dataset; Data source
Twitter mention network | 2870418 | 4772477 | Social network created by "mention" in Twitter | Twitter Dataset

Research on Social Networks

  • IMDB Dataset (AdultIMDB.dat.gz): IMDB actors in adult films. We have created a network of connections between actors who have co-starred in films, whose genre has been labeled by the Internet Movie Database as ‘adult’. This network is a largely isolated sub-set of the original actor collaboration network. Additionally, all these films have been produced during the last few decades, rendering the network more focused in time. The largest component comprises 47719 actors/actresses in 39397 films. The average degree of the network is 46.0.
  • Email Dataset (Emailcontacts.dat.gz): The network of email contacts is based on email messages sent and received at the Computer Sciences Department of London’s Global University. The data have been collected in the time window between December 2006 and May 2007. Nodes in the network represent email accounts. We connect two email accounts with an undirected link in the case where at least two emails have been exchanged between the accounts (at least one email in each direction). There are 12701 nodes with an average degree of 3.2.
  • POK Dataset (pok.dat): This file contains the entire sequence of messages sent by users in the POK community. The three columns correspond to: (sending member ID) (receiving member ID) (seconds). The third column shows the time when the message was sent, measured in seconds since the very first message.
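As an illustration of how a three-column message log like pok.dat might be turned into a network, the sketch below reads lines of the form "sender receiver seconds" and keeps an undirected link only when messages went in both directions, the same reciprocity rule described above for the Email dataset. Applying that rule to the POK data is an assumption made for this example, as are the function name, the whitespace-separated integer IDs, and the sample lines.

```python
from collections import Counter

def reciprocated_links(message_lines):
    """Return undirected links (a, b), a < b, for pairs with messages in both directions."""
    sent = Counter()
    for line in message_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        sender, receiver = int(fields[0]), int(fields[1])
        sent[(sender, receiver)] += 1
    links = set()
    for a, b in sent:
        if a != b and (b, a) in sent:  # at least one message in each direction
            links.add((min(a, b), max(a, b)))
    return links

# Hypothetical sample in the (sender) (receiver) (seconds) format.
sample = ["1 2 0", "2 1 35", "1 3 60", "3 4 120", "4 3 500"]
links = reciprocated_links(sample)
print(sorted(links))
```

Here the pair (1, 3) is dropped because node 3 never replied to node 1.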

Biological Networks

Short Name | Nodes | Links | Description | Data
Metabolic (E. coli) | 2895 | 6890 | Interactions between the metabolites of E. coli in the course of the metabolic cycle | Barabasi homepage
PIN yeast | 777 | 1797 | High-confidence filtered data for interactions between proteins in yeast | M. Vidal paper
PIN H. sapiens | 425 | 612 | Protein interactions in Homo sapiens | CCSB Interactome Database
PPI networks | --- | --- | Protein-protein interaction networks of seven species (E. coli, S. cerevisiae, A. thaliana, C. elegans, D. melanogaster, M. musculus, H. sapiens) at different evolutionary levels [See below for more information] | PPI Dataset

Research on protein-protein interaction networks

We have developed a novel filtering approach based on percolation analysis to construct the fractal structures of the present-day protein-protein interaction (PPI) networks. The original unfiltered data of present-day networks are downloaded from the STRING database.

Ancestral proteins are reconstructed using orthologous groups at different evolutionary levels. Information on the orthologous groups is obtained from the eggNOG database. Ancestral interactions are reconstructed using a stochastic duplication-divergence model. For more information, please refer to our paper. All files below are .rar archives.

  • Dataset for the PPI networks of seven species (E. coli, S. cerevisiae, A. thaliana, C. elegans, D. melanogaster, M. musculus, H. sapiens) at different evolutionary levels.
  • Algorithm and simple instructions to perform the percolation analysis.
  • Algorithm and simple instructions to perform the static analysis (fractal dimension, degree distribution, renormalization group analysis…).
  • Algorithm and simple instructions to perform the dynamic analysis (test of the multiplicative growth mechanism).
  • Mapping between ID of each node and the String identifier. File = mapping.txt.

Complex Networks

Short Name | Nodes | Links | Description | Data
WWW | 325729 | 1090108 | Linked WWW pages in the nd.edu domain | Barabasi homepage
DIMES | 20612 | 60653 | Internet connections at the Autonomous System level, as found by the DIMES project | DIMES homepage

Research on Fractal Complex Networks

Algorithm in C to perform a box covering, using the MEMB algorithm, to calculate the fractal dimension of a complex network. All algorithms are explained in C. Song, L. K. Gallos, S. Havlin, H. A. Makse, “How to calculate the fractal dimension of a complex network: the box covering algorithm”, J. Stat. Mech., P03006 (2007). See paper in JSTAT. See the enclosed “readme.txt” file for instructions.

Algorithm to calculate the fractal dimension of a network. This file includes the MEMB, CBB, and random covering algorithms in Python. This package requires NetworkX. See the file header for comments on how to use it.
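To give a feel for what a box covering does, here is a deliberately simplified Python sketch of a random "burning" cover. It is neither MEMB nor the code distributed here: it repeatedly picks a random uncovered node and puts every still-uncovered node within a given radius of it into one box. The function name, the path-graph example, and the choice to take the best of several random trials are assumptions for illustration.

```python
import random
from collections import deque

def random_box_covering(graph, radius, seed=None):
    """Cover the network with boxes of uncovered nodes within `radius` of random centers."""
    rng = random.Random(seed)
    uncovered = set(graph)
    boxes = []
    while uncovered:
        center = rng.choice(sorted(uncovered))
        # breadth-first search from the center, with distances measured in the full network
        dist = {center: 0}
        queue = deque([center])
        while queue:
            u = queue.popleft()
            if dist[u] == radius:
                continue  # do not go beyond the box radius
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        box = uncovered & set(dist)  # only nodes not yet covered join the new box
        boxes.append(box)
        uncovered -= box
    return boxes

# Hypothetical test network: a path of 30 nodes.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 30} for i in range(30)}
for radius in (1, 2, 4):
    best = min(len(random_box_covering(path, radius, seed=s)) for s in range(20))
    print(f"box size l_B = {2 * radius + 1}: fewest boxes found = {best}")
```

The fractal dimension would then be estimated from how the number of boxes decays with the box size, N_B ~ l_B^(-d_B).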

Cities

Short Name | Nodes | Links | Description | Data
USA Cities | -- | -- | Information on USA cities | USA cities
GB Cities | -- | -- | Information on GB cities for years 1981 and 1991 | GB cities
Population of USA MSAs | -- | -- | Population of USA MSAs | MSA population
Boundaries of USA MSAs | -- | -- | Boundaries of USA MSAs in 2000 | MSA boundaries
Population of USA Places | -- | -- | Population of USA Places in 2000 | Places population
CO_2 emissions dataset | -- | -- | Original input/output dataset, "Large cities are less green" | CO_2 emissions dataset
GRUMPv1 dataset | -- | -- | Global Rural-Urban Mapping Project (GRUMPv1) data on Equirectangular projection | GRUMPv1 dataset
New CCA code and GRUMPv1 dataset | -- | -- | New City Clustering Algorithm (CCA) code with associated GRUMPv1 data on Equirectangular projection | New CCA code and GRUMPv1 dataset

Research on Cities

  • Download datasets we used in our research on the City Clustering Algorithm (CCA) and population growth. Here is the full dataset with readme instructions. Additionally, each datafile is below:
    • Dataset for USA in ascii format. Also available in Excel format here.
    • Dataset for GB for years 1981 and 1991.
    • Dataset for population of USA MSAs.
    • Dataset for boundaries of USA MSAs in 2000.
    • Dataset for population of USA Places in 2000.
    • The datasets regarding boundaries for Places or FIPS are too large to post here. We make them available on request, or they can be downloaded directly from the US Census Bureau.
  • Download results obtained with the CCA algorithm in the USA and GB.
    • Download FIPS corresponding to each CCA cluster, using the CCA algorithm, for l=2,3,4,5km.
    • Download FIPS corresponding to each MSA.
    • Download MSA/CCA correspondence.

CO2 Emissions Data

  • Original dataset used in the Scientific Reports paper by Oliveira, Andrade, and Makse, Large cities are less green (2014).
    • Contains input (grid_xypci.dat) and output (out_CCA_d1000_l5.dat) data for total CO2 emissions.
    • Input is arranged in columns: longitude, latitude, population, CO2, income; output was found using a population density threshold D* = 1000 people/km² and a distance threshold l = 5 km.
    • Area of each cell is approximated by 1 km²; the exact calculation of this area is a difficult problem involving spherical triangles.
    • This dataset is on Lambert conformal conic projection, as opposed to the commonly-used Equirectangular projection.
  • Global Rural-Urban Mapping Project (GRUMPv1) dataset; this data is already on the Equirectangular projection.
    • Data is arranged as follows:
      lx ly m lon_northwest lat_northwest (top row only)
      i   j   pop(i,j)
    • Image based on the above data.
  • New CCA (City Clustering Algorithm) code with area calculation and associated GRUMPv1 dataset.
    • Density of each cell is defined as population/area; parameters used are D* = 1500 people/km² and l = 5 km.
    • Dataset is already divided for each CCA city and is on the Equirectangular projection.
    • Data is arranged as follows:
      lx ly m lon_northwest lat_northwest (top row only)
      i   j   pop(i,j) area(i,j)
    • Images based on the above data for Los Angeles, New York City, and Chicago.
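The clustering idea behind the CCA can be sketched as follows. This is a simplified illustration, not the released CCA code: it assumes a flat square grid with a fixed cell size, keeps only cells whose density (population/area) reaches the threshold D*, and merges cells whose centers lie within the distance threshold of each other. The function name, the cell dictionary layout, and the sample values are assumptions. Real data on an Equirectangular projection would need geographic distances and per-cell areas instead.

```python
from collections import deque

def city_clusters(cells, density_threshold, distance, cell_size=1.0):
    """Group dense cells into clusters.

    cells: {(i, j): (population, area)} on a square grid with spacing `cell_size`.
    Cells with population/area >= density_threshold are linked when their centers
    are at most `distance` apart; each connected group is one cluster.
    """
    populated = {c for c, (pop, area) in cells.items() if pop / area >= density_threshold}
    reach = int(distance // cell_size)
    offsets = [
        (di, dj)
        for di in range(-reach, reach + 1)
        for dj in range(-reach, reach + 1)
        if (di or dj) and (di * di + dj * dj) * cell_size ** 2 <= distance ** 2
    ]
    clusters, seen = [], set()
    for start in sorted(populated):
        if start in seen:
            continue
        seen.add(start)
        cluster, queue = [], deque([start])
        while queue:
            i, j = queue.popleft()
            cluster.append((i, j))
            for di, dj in offsets:
                nb = (i + di, j + dj)
                if nb in populated and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(cluster)
    return clusters

# Hypothetical cells: (population, area in km^2) on a 1 km grid.
cells = {
    (0, 0): (2000, 1.0), (0, 1): (1800, 1.0), (0, 5): (1600, 1.0),
    (0, 20): (3000, 1.0), (3, 3): (100, 1.0),
}
clusters = city_clusters(cells, density_threshold=1500, distance=5)
for cluster in clusters:
    print(cluster, "population:", sum(cells[c][0] for c in cluster))
```

In this example the low-density cell (3, 3) is discarded, the three dense cells near the origin merge into one cluster, and the cell at (0, 20) forms a cluster of its own.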

Health

Short Name | Nodes | Links | Description | Data
Diabetes dataset | -- | -- | Diabetes data for 3092 counties in USA [See below for more information] | Diabetes dataset
Population dataset | -- | -- | Population of counties for years 1969-2009 | Population dataset
Population density dataset | -- | -- | Population density of counties for years 1969-2009 (population/square mile) | Population density dataset
Obesity dataset | -- | -- | Percentage of adult obese people (out of 100) in each county for 2004-2008 | Obesity dataset
County fips dataset | -- | -- | Percentage of adult people with diabetes (out of 100) in each county for 2004-2008 | County fips dataset
Physical inactivity dataset | -- | -- | Percentage of adult people who are not physically active (out of 100) in each county for 2004-2008 | Physical inactivity dataset
Employment dataset | -- | -- | Fraction of people in a county who are employed in any sector of the economy. Years 1986-2009 | Employment dataset
Food service employment dataset | -- | -- | Fraction of people employed in the food service industry (NAICS code 722). Years 1986-2009 | Food service employment dataset
Food stores employment dataset | -- | -- | Fraction of people employed in food stores (NAICS code 445). Years 1986-2009 | Food stores employment dataset
Supermarket employment dataset | -- | -- | Fraction of people employed in supermarkets (NAICS code 44511). Years 1986-2009 | Supermarket employment dataset
Wholesale employment dataset | -- | -- | Fraction of people employed in wholesale (NAICS code 42). Years 1986-2009 | Wholesale employment dataset
Administrative jobs dataset | -- | -- | Fraction of people in administrative jobs (NAICS code 56). Years 1986-2009 | Administrative jobs dataset
Manufacturing employment dataset | -- | -- | Fraction of people employed in manufacturing (NAICS code 31). Years 1986-2009 | Manufacturing employment dataset

Obesity and cancer spreading in USA

  • These are the datasets we used in our obesity research. See paper: Gallos, Barttfeld, Havlin, Sigman, Makse, “Collective behaviours in the spatial spreading of obesity”, Sci. Rep. 2: 454 (2012). These include data on population, obesity, diabetes, food services, supermarkets, county FIPS codes, and the economy, among others. Cancer data must be obtained directly from the website stated in the paper, since it is copyrighted.
    • Diabetes Dataset The fips file contains four columns in the following order: the first is the county ID, assigned by us arbitrarily; the second is the FIPS code of the county; the third is the state of the county; and the last is the county name. The counties are organized alphabetically. There are 3092 counties in total, and all the files in the Census Research section contain the same number of counties.
    • Population Dataset In this file, the first column corresponds to the counties, and the remaining columns show the population for each subsequent year from 1969 to 2009.
    • Population density Dataset Same as above for population/square mile.
    • Obesity Dataset Percentage of adult obese people (out of 100) in each county for 2004-2008.
    • County fips Dataset Percentage of adult people with diabetes (out of 100) in each county for 2004-2008.
    • Physical inactivity Dataset Percentage of adult people who are not physically active (out of 100) in each county for 2004-2008.
    • Employment Dataset Fraction of people in a county who are Employed in any sectors of the economy. Years 1986 – 2009.
    • Food Service Employment Dataset Fraction of people employed in the food service industry. (NAICS code 722) Years 1986 – 2009.
    • Food stores employment Dataset Fraction of people employed in food stores. (NAICS code 445) Years 1986 – 2009.
    • Supermarket Employment Dataset Fraction of people employed in supermarkets. (NAICS code 44511) Years 1986 – 2009.
    • Wholesale Employment Dataset Fraction of people employed in Wholesale. (NAICS code 42) Years 1986 – 2009.
    • Administrative jobs Dataset Fraction of people in administrative jobs. (NAICS code 56) Years 1986 – 2009.
    • Manufacturing Employment Dataset Fraction of people employed in Manufacturing. (NAICS code 31) Years 1986 – 2009.

Packings

Short Name | Nodes | Links | Description | Data
Phase diagram for jammed matter RCP line | -- | -- | RCP line in the phase diagram [See below for algorithm] | RCP dataset
Phase diagram for jammed matter RLP line | -- | -- | RLP line in the phase diagram [See below for algorithm] | RLP dataset
Phase diagram for jammed matter G line | -- | -- | G line in the phase diagram [See below for algorithm] | G dataset
Experimental data on colloids and granular matter | -- | -- | The experimental data on colloids and granular matter used to calculate the effective temperature of a colloidal glass and sheared granular matter from the particle trajectories [See below for more information] | data

Research on the phase diagram for jammed matter

Algorithm and simple instructions for the protocol to prepare packings of spherical particles using Molecular Dynamics, as is usually done in our papers. This file includes the MD-Distinct Element Method (DEM) code in Fortran. See the instruction manual to use the code.

  • Download the datasets we used in our research on the phase diagram of jammed matter that predicts RCP. These are the data published in Song, Wang, Makse, Nature 453, 629 (2008). The data contain the box size of the system, the coordination for each particle in the system, and the force between any two touching particles.
    • Dataset for RCP line in the phase diagram. Instructions for reading the data are here.
    • Dataset for RLP line in the phase diagram. Instructions for reading the data are here.
    • Dataset for G line in the phase diagram. Instructions for reading the data are here.

Research on hard sphere packings

Research on random packings of different size balls

Algorithm in Python to calculate the volume fraction of binary packings of hard spheres. Please cite the following paper if you use this code:

M. Danisch, Y. Jin, and H. A. Makse, Model of random packings of different size balls, Phys. Rev. E 81, 051303 (2010). [pdf].

The full coordinate data for all packings are here.

Research on Random Packings of Non-Spherical Particles

Algorithm and instructions to generate hard-sphere packings at mechanical equilibrium from RLP to FCC. The computer simulation of jammed packings includes two independent algorithms. In the first part, the modified Lubachevsky-Stillinger (LS) algorithm is used to generate hard-sphere packings. This code was obtained from the website of Torquato at Princeton. Please cite the following papers (available at Torquato’s website) if you use this code: A. Donev, S. Torquato, and F. H. Stillinger, Neighbor List Collision-Driven Molecular Dynamics for Nonspherical Hard Particles: I. Algorithmic Details, Journal of Computational Physics, 202, 737 (2005). A. Donev, S. Torquato, and F. H. Stillinger, Neighbor List Collision-Driven Molecular Dynamics for Nonspherical Hard Particles: II. Applications to Ellipses and Ellipsoids, Journal of Computational Physics, 202, 765 (2005).

In the second part, the computer program “Trubal” applies MD, or the Distinct Element Method (DEM), to generate mechanically stable jammed packings of particles interacting with Hertz-Mindlin forces. Trubal was initially developed by Cundall and Strack, and the method is commonly known in the engineering literature as the Discrete Element Method. Our version of the code is a further development based on the original Cundall code and on the code developed by Prof. Colin Thornton at the Department of Civil Engineering at Aston University. The original manuals, with extensive explanations of the structure of DEM and the interactions considered between the particles, can be downloaded here:

In our version, Trubal uses the output from LS algorithm as the initial configurations. Please refer to the following paper for details of this algorithm:

“A first-order phase transition defines the random close packing of hard spheres”, by Yuliang Jin and Hernan A. Makse, Physica A, Volume 389, 5362-5379 (2010), [pdf].

The LS part of the algorithm is used to generate packings with any volume fraction in the interval from phi=0.55 (random loose packing) to phi=0.74 (FCC). However, the LS algorithm does not generate mechanically stable packings, as they are composed of hard spheres with no forces: a jammed packing in LS is a state of infinite kinetic pressure (jamming is achieved with an infinite number of collisions per unit time). To find the forces that satisfy mechanical equilibrium, we run the MD part of the code, applying an infinitesimal deformation to the particles in the LS packings following Hertz-Mindlin forces. Thus we generate packings in the hard-sphere limit (zero stress pressure) satisfying force and torque balance over the entire range of volume fraction from RLP to FCC.
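As a quick check on the upper limit quoted above, the volume fraction of a monodisperse or polydisperse sphere packing is the total sphere volume divided by the box volume. The short Python sketch below (the function name and the unit-cell setup are chosen for illustration) evaluates it for the FCC conventional cubic cell, which holds four spheres and has edge length 2√2 times the sphere radius.

```python
import math

def volume_fraction(radii, box_volume):
    """Total volume of the spheres divided by the volume of the box."""
    return sum(4.0 / 3.0 * math.pi * r ** 3 for r in radii) / box_volume

# FCC conventional cubic cell: 4 spheres of radius r, cell edge a = 2*sqrt(2)*r.
r = 0.5
a = 2 * math.sqrt(2) * r
phi_fcc = volume_fraction([r] * 4, a ** 3)
print(f"phi_FCC = {phi_fcc:.4f}")  # prints phi_FCC = 0.7405, i.e. pi / (3*sqrt(2))
```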

For more information about Trubal, please refer to “Research on the phase diagram for jammed matter” above and “Research on the entropy of jammed matter” below.

Research on the entropy of jammed matter

Algorithm to simulate the preparation of jammed packings and to calculate the entropy from the packings. See these instructions for comments on how to use it.

  • Download the datasets we used in our research on the entropy for different friction coefficients.
    • Dataset for friction coefficient 0.00001.
    • Dataset for friction coefficient 0.05.
    • Dataset for friction coefficient 0.1.
    • Dataset for friction coefficient 0.2.
    • Dataset for friction coefficient 0.5.
    • Dataset for friction coefficient 1.
    • Dataset for friction coefficient 10000.
    • Origin files for all the datasets.

Finding all the minima and saddle points in the Potential Energy Landscape of small clusters of LJ and Hertz particles.

Algorithm and instructions to find all the minima and transition states in clusters of a small number of particles interacting via Lennard-Jones or Hertz potentials. The algorithm uses improved conjugate gradient methods to find all the minima, that is, the mechanically stable packings. The algorithm also finds the value of the angoricity of the packings. This code was used in the paper Wang, Song, Wang and Makse, Angoricity and compactivity describe the jamming transition in soft particulate matter, EPL 91, 68001 (2010).

One data example that we used in the paper to find the minima for packings with volume fraction 0.61.

Experimental data on colloids and granular matter

This zip file contains all the experimental data on colloids and granular matter used to calculate the effective temperature of a colloidal glass and sheared granular matter from the particle trajectories. See the papers: P. Wang, C. Song, and H. A. Makse, Dynamic particle tracking reveals the aging temperature of a colloidal glass, Nature Physics 2, 526-531 (2006), [pdf] and C. Song, P. Wang, and H. A. Makse, Experimental measurement of an effective temperature for jammed granular materials, PNAS, 102, 2299 (2005), [pdf].

Long-range Correlations

Short Name | Nodes | Links | Description | Data
Generating sequences of long-range correlations | -- | -- | The program to generate long-range correlations in 1d and 2d [See below for more information] | data

Research on generating sequences of long-range correlations (in 1D and 2D)

This tar file contains the program to generate long-range correlations in 1d and 2d. The method is explained in the paper: H. A. Makse, S. Havlin, M. Schwartz, and H. E. Stanley, Method for Generating Long-Range Correlations for Large Systems, Phys. Rev. E 53, 5445-5449 (1996).
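For orientation only, the basic idea of Fourier filtering can be sketched in plain Python. This is the standard, unmodified version of the method in 1d, not the program in the tar file, and it does not reproduce the refinement for large systems introduced in the paper. It filters white Gaussian noise with a power-law spectral density S(q) ~ q^(gamma - 1), which corresponds to correlations decaying roughly as a power law with exponent gamma. The function names, the naive O(n²) transform, and the parameter values are assumptions for illustration.

```python
import cmath
import math
import random

def correlated_sequence(n, gamma, seed=None):
    """Filter white Gaussian noise so that its spectral density is S(q) ~ q**(gamma - 1)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # forward discrete Fourier transform of the uncorrelated noise (naive, O(n^2))
    spectrum = [
        sum(noise[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # multiply each mode by sqrt(S(q)); the q = 0 mode is dropped to avoid the divergence
    filtered = [0j] * n
    for k in range(1, n):
        q = 2 * math.pi * min(k, n - k) / n  # symmetric in k, which keeps the output real
        filtered[k] = spectrum[k] * q ** ((gamma - 1) / 2)
    # inverse transform; the imaginary parts vanish up to rounding
    return [
        sum(filtered[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

def autocorrelation(x, lag):
    """Normalized sample autocorrelation of x at the given lag."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x)
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(len(x) - lag)) / var

seq = correlated_sequence(n=256, gamma=0.4, seed=7)
print("lag-1 autocorrelation:", round(autocorrelation(seq, 1), 3))
```

A fast Fourier transform would replace the naive sums in any serious use; the quadratic version is kept here only so the sketch has no dependencies.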