Course Materials

Introduction to Python Programming and Data Science

Since the Fall of 2014, we have run an intensive introductory course on Python programming at Northwestern University. Our bootcamp has been attended by about 350 members of the Northwestern community, including undergraduate and graduate students, postdoctoral fellows, faculty and staff.

Software

NetCarto

netcarto is a command-line tool for finding modules (and node roles) by maximizing modularity with simulated annealing.

NullSeq

NullSeq is a tool for generating random nucleotide coding sequences with amino acid and GC content constraints.

PyGrace

The pygrace project is a set of tools designed to act as an interface between the Python programming language and the Grace plotting tool.

Topic Mapping

TopicMapping finds topics in a set of documents using network clustering (Infomap) as a guess for LDA likelihood optimization. This guide explains how to compile the code, what the input and output formats are, how to tune the algorithm’s parameters, and more.

WALDO: Worm Analysis For Live Detailed Observation

Worm Analysis For Live Detailed Observation (WALDO) is a tool used to evaluate and clean worm behaviour data.

Ziggy

Ziggy provides a collection of Python methods for Hadoop Streaming. Ziggy is useful for building complex MapReduce programs and for using Hadoop for batch processing of many files, Monte Carlo processes, graph algorithms, and common utility tasks (e.g. sort, search).
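For readers new to Hadoop Streaming, the map, sort, and reduce steps can be simulated locally in plain Python. The sketch below does not use Ziggy's API; the `mapper` and `reducer` names and the sample documents are illustrative assumptions, and a real streaming job would run the two steps as separate scripts reading standard input and writing standard output.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each key; records must arrive sorted by key."""
    parsed = (record.split("\t") for record in sorted_records)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical input: each string stands for one line of a document.
documents = ["to be or not", "to be"]

# Sorting between the two steps plays the role of Hadoop's shuffle phase.
result = list(reducer(sorted(mapper(documents))))
print(result)
```

Writing the two steps as generators over lines keeps them easy to test locally before submitting anything to a cluster.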

Data Sets

Gun violence at US Schools

This dataset accompanies the work in "Economic insecurity and the rise in gun violence at US schools".

Passing and Shooting Data for Euro 2008

Companion data for "Quantifying the performance of individual players in a team activity" by Duch J, Waitzman JS, Amaral LAN (PLoS ONE 5, e10937, 2010).

Researchers' biographical and publication data

Companion dataset to "The possible role of resource requirements and academic career-choice risk on gender differences in publication rate and impact" by Duch J, Zeng XHT, Sales-Pardo M, Radicchi F, Otis S, Woodruff TK, Amaral LAN (PLoS ONE 7, e51332, 2012).

US film citation network

List of citations to films produced in the US by other films also produced in the US.

Guides

Collections

From a programmer's perspective, memory and time are the most important concerns. Computers have limited memory, and accessing it has a computational cost.

The first step I always take before I start programming is to think about the problem and the data structure. To define the data structure well, it is necessary to know how the data will be accessed: for example, how to iterate over the elements, access a single element, or insert new elements. It is also important to know the relationship between the elements: are they unique, do they aggregate, or is there an order?
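As a rough illustration of how those questions map onto Python's built-in collections, here is a minimal sketch. The sample data and variable names are hypothetical and are not taken from the guide itself.

```python
from collections import Counter

# Hypothetical sample data: words observed in a document.
words = ["node", "edge", "node", "module", "edge", "node"]

# list: keeps order and duplicates; access by position is cheap,
# but a membership test scans the elements one by one.
ordered = list(words)

# set: keeps only unique elements; membership tests are fast on average,
# but there is no positional access.
unique = set(words)

# dict: maps each unique key to a value, here the first position seen.
first_position = {}
for index, word in enumerate(words):
    first_position.setdefault(word, index)

# Counter: aggregates repeated elements into counts.
counts = Counter(words)

print(ordered[0])
print(sorted(unique))
print(first_position["edge"])
print(counts["node"])
```

Which structure fits depends on the dominant access pattern: positional access favors a list, uniqueness checks favor a set, keyed lookup favors a dict, and aggregation favors a Counter.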

High performance computing on Quest

An overview of how to use the high-performance computing resources of Quest as an Amaral lab member.

Interactive web graphics with d3.js

A presentation that details making a simple, interactive line graph in the browser using d3.js. A Mercurial repository accompanies this presentation and contains the demo for following along.

IPython Notebooks

This is a presentation and sample notebook that I whipped up for the Amaral lab to explain the why and some of the basic hows of using IPython notebooks. It goes along with my previous blog post and has a gist to accompany it.

Mercurial: The long tutorial

A near-exhaustive tutorial to get you through working in Mercurial, vetted and constructed from the minds of Irmak Sirer and Adam Pah. If this can't get you up to speed, then we might need to start worrying.

Mercurial: Why we switched

This is the presentation explaining the differences between Mercurial and SVN and why we switched as a lab (this changeover happened in late 2011). This is a simple explanation of a distributed version control system and possible workflows.

Mounting a remote folder on OS X over SSH

The current project I am working on needs access to a folder on a remote server. It seems to be a simple task, but there is one issue: I am a Mac user.

Mounting a server folder is very useful if you have a lot of data to share with your colleagues. It is insane to copy it to your hard drive every time it changes, or to manage large amounts of data with version control, since that will slow down the repository.

The best solution we found in the lab is using SSH and mounting folders using sshfs. It works really well in Linux and we don't want to use a different system for other operating systems.
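The basic sshfs invocation takes a remote location and a local mount point. The sketch below builds that command from Python, assuming sshfs is already installed and the mount point directory exists. The user, host, and paths are hypothetical placeholders, and `build_sshfs_command` and `mount` are illustrative helpers rather than part of the original post.

```python
import shlex
import subprocess


def build_sshfs_command(user, host, remote_dir, mount_point):
    """Return the argument list for: sshfs user@host:remote_dir mount_point"""
    return ["sshfs", f"{user}@{host}:{remote_dir}", mount_point]


def mount(user, host, remote_dir, mount_point):
    """Run sshfs; raises CalledProcessError if the mount fails."""
    command = build_sshfs_command(user, host, remote_dir, mount_point)
    subprocess.run(command, check=True)


# Hypothetical values; replace them with your own server and folders.
command = build_sshfs_command(
    "alice", "server.example.org", "/data/shared", "/Users/alice/remote"
)

# Print the command instead of running it, so it can be checked first.
print(shlex.join(command))
```

Keeping command construction separate from execution makes it easy to inspect exactly what will run before calling `mount`.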

Mounting a remote folder on OS X over SSH (Yosemite)

This is an update of my previous post that you can read here.

The current project I am working on needs access to a folder on a remote server. It seems like a simple task, but there is one issue: I am a Mac user.

Mounting a server folder is very useful if you have a lot of data to share with your colleagues. It is insane to copy it to your hard drive every time it changes, or to manage large amounts of data with version control, since that will slow down the repository.

The best solution we found in the lab is using SSH and mounting folders using sshfs. It works really well in Linux and we don't want to use a different system for other operating systems.

pyenv Tutorial

Meet pyenv: a simple Python version management tool. Previously known as Pythonbrew, pyenv lets you change the global Python version, install multiple Python versions, set directory-specific Python versions, and create and manage virtual Python environments. All of this is done on *NIX-style machines (Linux and OS X) without depending on Python itself, and it works at the user level, with no need for any sudo commands. So let’s start!

Setting up a new development environment

Setting up your development environment on a new computer can be a pain. This guide will show you how to take your existing environment and put it into an installer script.

Speed up your Python & Numpy codes

If you run short simulations, you may tell yourself that you don’t need faster code because it only takes a few seconds, or up to a couple of minutes, and you don’t want to “waste” your time learning uninteresting coding tricks. However, my experience tells me that good programming habits are easier to learn than bad ones, that they decrease the probability of having bugs in your code, and that they give you a clearer and better-organized result.

Visualizations

MetExplore

An interactive application to explore the network structure of metabolism across species.