For Students

Our Lab advances the objectives of five of the 10 Big Ideas for Future NSF Investments:
  1. “Harnessing Data for 21st Century Science and Engineering”: Big-Data Science
  2. “Understanding the Rules of Life: Predicting Phenotype”: From Structure to Function
  3. “Work at the Human-Technology Frontier: Shaping the Future”: Artificial Intelligence
  4. “Growing Convergent Research at NSF”: Interdisciplinary Research
  5. “NSF INCLUDES: Enhancing Science and Engineering through Diversity”: Diversity
This document describes all ten Big Ideas, which will push forward the frontiers of research across all NSF-funded fields.

Please find here a list of important papers compiled by the graduate students in this lab. These papers range from introductory to technical, general to project-specific, so you should be able to get a good idea of the types of research we are conducting.

Thinking of learning R or Python? Please see the following compilation of resources.

A little food for thought: Dr. Richard Feynman on the difference between Mathematics and Physics; followed by Dr. Murray Gell-Mann on truth, beauty, and physics; Dr. Steven Weinberg on whether mathematics is invented or discovered; and finishing with Jonathan Pillow on statistical modelling of neural data.

  • For prospective students and postdocs: Software

    The following is a list of software used in our lab to analyze networks and data. Students and postdocs should be (or become) familiar with these tools.

    Low Expertise Required:

    MATLAB — generally used for data cleaning, data analysis, calculations, plot generation, etc.; you can get as simple or complicated as you want with it
    Complex Networks toolbox — MATLAB toolbox by Lev Muchnik for analysis of complex networks; includes a k-shell decomposition algorithm
    Machine learning toolbox — MATLAB toolbox for (basic) machine learning
    Community detection/modularity algorithm — MATLAB code used to find community structure and modularity of a network
    Network attributes algorithm — MATLAB code used to find network components, sizes, and lists of member nodes
    Python — another general-use platform; again, uses can range in complexity
    NetworkX — Python library used to find basic attributes of a network, such as the degree distribution
    graph-tool — Python library for fast component decomposition, finding modularity, and large network visualization
    pandas — Python library used for data management
    NumPy — Python library used for vector and matrix operations
    SciPy — Python library for statistics, hypothesis testing, regression, and numerical computation
    Beautiful Soup — Python library used for website scraping
    scikit-learn — Python library used for basic machine learning methods, including GLasso and stochastic gradient descent
    ImageJ — Java image processing program used for optical CT imaging analysis
    Gephi — visualization and analysis software for networks ***CAN BE BUGGY — SAVE WORK OFTEN***
    Pajek — general network visualization software
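    As a small illustration of the network attributes mentioned above (degree distribution and k-shell decomposition), here is a minimal sketch using NetworkX. The toy graph and the helper name degree_distribution are assumptions made for this example; they are not part of the lab's own code.

```python
from collections import Counter

import networkx as nx


def degree_distribution(graph):
    """Return {degree: fraction of nodes having that degree}."""
    n = graph.number_of_nodes()
    counts = Counter(degree for _, degree in graph.degree())
    return {k: c / n for k, c in sorted(counts.items())}


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to node 2.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

dist = degree_distribution(G)

# k-shell index of each node: the largest k such that the node belongs to the k-core.
shells = nx.core_number(G)

print("degree distribution:", dist)
print("k-shell index per node:", shells)
```

    The triangle nodes end up in the 2-shell and the pendant node in the 1-shell, which is the kind of output the k-shell decomposition in the MATLAB toolbox above also produces.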

    Low/medium Expertise Required:

    SQLite — used for Twitter data management and analysis
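    To give a rough idea of what managing Twitter data in SQLite can look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name, columns, and rows are hypothetical and chosen only for illustration; the lab's actual schema may differ.

```python
import sqlite3

# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per tweet; retweet_of is NULL for original tweets.
conn.execute(
    "CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, user TEXT, retweet_of TEXT)"
)
rows = [
    (1, "alice", None),
    (2, "bob", "alice"),
    (3, "carol", "alice"),
    (4, "alice", "bob"),
]
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)", rows)

# Count how many times each user was retweeted (a weighted in-degree
# in the retweet network), most retweeted first.
retweet_counts = conn.execute(
    """
    SELECT retweet_of, COUNT(*) AS n
    FROM tweets
    WHERE retweet_of IS NOT NULL
    GROUP BY retweet_of
    ORDER BY n DESC, retweet_of
    """
).fetchall()

print(retweet_counts)
conn.close()
```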

    Medium Expertise Required:

    Graphical Lasso (GLasso) algorithm — MATLAB code used to find a sparse inverse correlation matrix
    Collective Influence algorithm — C code implementation of Collective Influence algorithm; can be downloaded on the Software page
    Monte Carlo for Maximum Entropy XY model — C code to find interaction matrix for network which can be modelled via a Maximum Entropy XY model ***BEST FOR VERY SMALL NETWORKS***
    FMRIB Software Library (FSL) — used for model-based fMRI analysis (FEAT) and brain extraction (BET)
    BrainNet Viewer — brain network visualization software
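    For readers who want to try the graphical lasso before working with the MATLAB code, here is a minimal sketch using scikit-learn's GraphicalLasso estimator. Note the assumptions: the three-variable chain, the sample size, and the penalty alpha=0.05 are chosen only for illustration, and scikit-learn estimates a sparse inverse covariance matrix, so data should be standardized first if an inverse correlation matrix is wanted.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Hypothetical ground truth: a chain 0 - 1 - 2, so variables 0 and 2
# are conditionally independent (zero entry in the precision matrix).
true_precision = np.array(
    [
        [1.0, 0.4, 0.0],
        [0.4, 1.0, 0.4],
        [0.0, 0.4, 1.0],
    ]
)
true_covariance = np.linalg.inv(true_precision)

# Draw synthetic samples from the corresponding Gaussian.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), true_covariance, size=2000)

# alpha is the L1 penalty: larger values give a sparser estimate.
model = GraphicalLasso(alpha=0.05).fit(X)
estimated_precision = model.precision_

print(np.round(estimated_precision, 2))
```

    In this setup the estimated entry linking variables 0 and 2 should be much smaller in magnitude than the entries linking neighbors in the chain, which is the sparsity pattern the method is designed to recover.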

    Medium/high Expertise Required:

    Medical imaging toolbox — MATLAB toolbox specifically for medical imaging
    Natural Language Toolkit — platform for building Python code used in natural language processing (e.g., on Twitter)

    High Expertise Required:

    TensorFlow — used for developing deep learning models

    For computer analysis you will need:

    Anaconda for Python 3.6
    Gephi (link is above)
    A Twitter account

    For an introduction to Twitter network analysis, please see the following tutorial by postdoc Alexandre Bovet.

    You can find further videos and tutorials pertinent to our research here, courtesy of the NIPS conference.

  • For prospective students: Courses

    The courses below will allow you to analyze Big Data in a variety of circumstances ranging from systems biology, to ecology, to social networks and finance:

    Complex Networks at the Graduate Center – Physics – PHYS85200 – CRN 23395 – Professor H. Makse
    This is my course on Network Theory; please see the syllabus.

    Machine Learning at the Graduate Center – Computer Science – CSC74020 – Professor R. Haralick or Professor C. Yuan
    Professor Haralick focuses more on the theoretical aspect while Professor Yuan focuses more on Natural Language Processing.

    Big Data Analysis: Principles and Methods at the Graduate Center – Physics – PHYS85200 – CRN 32250 – Professor G. Patz
    More application than theory, this course is a good introduction to the topic.

    Finance for Scientists at the Graduate Center – Physics – PHYS85200 – CRN 30235 – Professor T. Schäfer
    This course provides a good mathematical background on stochastic processes.

    Computational Methods in Physics at the Graduate Center – Physics – PHYS85200 – CRN 23394 – Professor A. Poje
    Ideal for those who have some experience in programming but want to become more comfortable with applications such as Monte Carlo methods.

    The following courses cover theoretical principles important to the core of our research program, and in fact, the first two are mandatory for first-year Ph.D. students at the Graduate Center:

    Statistical Mechanics at the Graduate Center – Physics – PHYS74100

    Mathematical Methods in Physics at the Graduate Center – Physics – PHYS70100

    Quantum Information Theory at the Graduate Center – Physics – PHYS85200

    Quantum Theory of Fields I & II at the Graduate Center – Physics – PHYS82500 and PHYS82600, respectively

    There are also courses outside the CUNY system, which I suggest you look into if you have time. New York University has a Center for Data Science, as does Columbia University. Some examples of courses offered are:

    Computational Physics – PHYS-GA-2000

    Non-equilibrium Statistical Physics – PHYS-GA-2061

    Online courses are also important to our field of study:

    Deep Learning is an important subject for any data scientist to know, although there is no course currently offered in the CUNY system. My students are self-taught or take online courses.

    If you are learning the Python programming language (the language for Data Science), the Python Data Science Handbook is a very useful resource, as are Python courses that can be found at Coursera or edX.

    For Data Science, Machine Learning, and Big Data Analysis, most of my students use Python, MATLAB, C, C++, Mathematica, and other languages. Please see “For prospective students and postdocs: Software” for further details.

    There are also a great many online courses on applications of Data Science that can be found here. They are mostly (if not all) free, and range in difficulty level from introductory, like “Introduction to Python for Data Science,” to advanced, like “Case Studies in Functional Genomics.” There is even, at the time of this writing, an introductory course in the application of Data Analysis to biological systems, called “Introduction to Bio: Annotation and Analysis of Genomes and Genomic Assays.”

    The above is a sampling of what my students have found online; you are encouraged to look further on your own.