Performant Statistical Computing and Algoware Development (Biostatistics 140.850)

Computational statistics plays a central role in extracting meaningful insights from the increasingly large and complex data of modern applications. Statistical models, computational algorithms, and software implementations are often treated as separate subjects, yet in practice the three must work together to yield the best statistical results. This course explores how to design efficient statistical inference algorithms and effective software under a variety of practical constraints. (We use the portmanteau “algo-ware” to emphasize the interdependence between an algorithm and its software implementation.)

We will cover a range of topics, familiarity with which is critical for becoming an effective developer of, or contributor to, statistical algo-ware. These include, among others, 1) common gotchas of finite-precision arithmetic and 2) how modern hardware architecture affects algorithm performance. Relevant statistical models and algorithms are introduced along the way. Also discussed is a small sample of algorithms commonly used for modern large-scale data.
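As one illustration of a finite-precision gotcha, here is a minimal Python sketch (our own example; the function names `naive_variance` and `welford_variance` are hypothetical). It computes a sample variance two ways: with the "textbook" one-pass formula, which subtracts two nearly equal large numbers and suffers catastrophic cancellation, and with Welford's running update, a standard numerically stable alternative. The data are four values near 1e9 whose exact sample variance is 30.

```python
# Illustrative sketch: catastrophic cancellation in the one-pass "textbook"
# variance formula versus Welford's numerically stable running update.

def naive_variance(xs):
    """Sample variance via (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(xs)
    total = 0.0
    sum_sq = 0.0
    for x in xs:
        total += x
        sum_sq += x * x
    # Both terms are about 4e18, so rounding error in each (hundreds of units)
    # swamps their true difference of 90.
    return (sum_sq - total * total / n) / (n - 1)


def welford_variance(xs):
    """Sample variance via Welford's running mean / squared-deviation update."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)


data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]  # exact sample variance is 30
print(naive_variance(data))    # -170.66666666666666 (a negative "variance")
print(welford_variance(data))  # 30.0
```

Welford's update works with deviations from a running mean, so it never forms the huge squared values whose difference the naive formula depends on.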

Many of the topics are best learned by doing, so homework is an essential part of this course. Some assignments will be open-ended and may require reading (or Googling) beyond what is covered in class.