BINF GU 4002: Machine Learning for Healthcare

Staff:

Instructor: Shalmali Joshi
Teaching Assistants: Young Sang Choi and Yuta Kobayashi

Logistics:

Contact: Courseworks
Location: 622 W 168th St, PH20-200, New York, NY 10032, on the 20th floor of the Presbyterian Hospital Building
Lecture Times: Tuesdays and Thursdays, 9:00am - 10:15am
Instructor Office Hours: Shalmali Joshi: Fridays TBD at 622 W 168th St, PH-20, 402
TA Recitation and Office Hours (optional): Young Sang Choi and Yuta Kobayashi: Fridays TBD, at 622 W 168th St, PH-20, Collaboration Room

Pre-reqs:

Course Description:

Machine Learning (ML) has transformative potential for applications in health and medicine. You will learn the fundamentals of machine learning and deep learning, and how to model clinical and biomedical data using machine learning.

Objective:
By the end of this course you will be able to:

Course Material:

Course material will largely consist of lecture notes and specific references provided for each module. No textbook is required, though some textbook chapters may be provided as additional reading.

Policy on Generative AI:


Schedule and Content

Note that the schedule and content are subject to minor changes. Slides and notes will be posted on Courseworks.
Lecture number | Date | Day | Topic | Subtopics | Homework timelines
1 | 2024-01-21 | Tuesday | Intro and course logistics, MIMIC-IV access, Google Colab Credits | Colab accounts setup, data downloads, conda environment setup, etc. | Homework 0 out with instructions on setting up Colab, accessing MIMIC-IV data
2 | 2024-01-23 | Thursday | Probability and information theory primer | | Homework 0 due, Homework 1 out
3 | 2024-01-28 | Tuesday | Linear algebra and optimization primer | |
4 | 2024-01-30 | Thursday | Introduction to supervised learning | Empirical Risk Minimization, Loss functions, Model families |
5 | 2024-02-04 | Tuesday | Introduction to supervised learning | IID vs OOD, Bias-variance tradeoff, regularization |
6 | 2024-02-06 | Thursday | Empirical practices in machine learning | LOOCV, validation data, calibration, uncertainty quantification, bootstrapping |
7 | 2024-02-11 | Tuesday | Principles of Maximum Likelihood | Logistic regression: two views, other models motivated by maximum likelihood estimation |
8 | 2024-02-13 | Thursday | Basics of probabilistic modeling and Bayesian inference | Prior, Likelihood, and Posterior. Logistic and Linear Regression |
9 | 2024-02-18 | Tuesday | Basics of probabilistic modeling and Bayesian inference | Posterior Predictive, Exponential Families, Maximum-a-Posteriori | Homework 1 due, Homework 2 out
10 | 2024-02-20 | Thursday | Introduction to Regression | Linear regression, other types of regression |
11 | 2024-02-25 | Tuesday | Bayesian linear regression | Derivation and connection to regularization |
12 | 2024-02-27 | Thursday | Empirical practices in machine learning - revisited | Comparing approaches to uncertainty quantification, best practices, etc. |
13 | 2024-03-04 | Tuesday | Review of basic supervised learning | Decision trees, Random Forests, XGBoost |
14 | 2024-03-06 | Thursday | Introduction to deep neural networks | Multilayer perceptron and connection to logistic and linear regression |
15 | 2024-03-11 | Tuesday | Optimization in deep neural networks | Backpropagation, stochastic gradient descent | Homework 2 due, Homework 3 out
16 | 2024-03-13 | Thursday | Midterm | Midterm |
No class | 2024-03-18 | Tuesday | Spring break | Spring break |
No class | 2024-03-20 | Thursday | Spring break | Spring break |
17 | 2024-03-25 | Tuesday | Deep learning for image data | Convolutional Neural Networks |
18 | 2024-03-27 | Thursday | Deep learning for sequential data | Recurrent Neural Networks, LSTM, State-space models, Gated Recurrent Units |
19 | 2024-04-01 | Tuesday | Deep learning for networked data | Graph Neural Networks |
20 | 2024-04-03 | Thursday | Deep learning for sequential data | Transformer: Attention-based neural networks |
21 | 2024-04-08 | Tuesday | Deep learning for sequential data - contd. | Training paradigms for sequence-based models (e.g., seq2seq, decoder-only, etc.) | Homework 3 due, Homework 4 out
22 | 2024-04-10 | Thursday | Distribution shifts, generalization, and domain adaptation | Concept of generalization, types of distribution shifts, examples of implications in healthcare |
23 | 2024-04-15 | Tuesday | Distribution shifts, generalization, and domain adaptation - contd. | Focus on various methods of adaptation to overcome different types of distribution shifts |
24 | 2024-04-17 | Thursday | Unsupervised learning | History and review of classical methods, brief review of modern methods |
25 | 2024-04-22 | Tuesday | Generative modeling | Foundations of generative models, basic loss functions, and a broad overview of models |
26 | 2024-04-24 | Thursday | Foundation models - LLMs | Large language models (Transformers but more) |
27 | 2024-04-29 | Tuesday | Foundation models - Vision-language | CLIP and other basic vision-language models | Homework 4 due
28 | 2024-05-01 | Thursday | Foundation models - Biological data (e.g., AlphaFold) | Major foundation models for biological data |
29 | 2024-05-05 | Monday | Finals | |


Format and Grading

Lectures are held twice a week (5 points for in-person participation).

Homeworks (Problem Sets): Each homework will combine numeric, conceptual, and computational problems with data-driven modeling questions (4 homeworks, 40 points). A 'Homework 0' will be assigned in the first week on basics of probability and linear algebra.

Homework Assignments (40 points)

Homeworks will be assigned on Courseworks. Each homework will be due roughly 3 weeks after it is assigned; see the schedule for the exact dates. This leaves ample time for every homework, but you also have two slack days to use over the semester, on any of the homeworks. If you are using a slack day for a homework, say so on your submission (otherwise the TAs will treat it as a late submission).

Exams

There will be one midterm (25 points) and one final exam (30 points). Each exam will last one class period. You will be allowed one Letter-sized cheat sheet, for formulae only, which you will hand in along with the exam for grading.