BINF GU 4008 Section 3 / COMS 4995 Section 14: Advanced Machine Learning for Health and Medicine

IMPORTANT NOTICE: From September 15th onwards, our course will meet on Fridays 2:00pm - 4:00pm at Uris 306 (not ENG 253).


Instructor: Shalmali Joshi
Teaching Assistant: Young Sang Choi


Contact: Courseworks
Time: Friday 2:00pm - 4:00pm
Location: Uris 306, 3022 Broadway, New York, NY 10027
Office Hours: Shalmali Joshi: Thursdays 9:00am - 10:00am at 622 W 168th St, PH-20, 402; Young Sang Choi: Fridays 4:00pm - 5:00pm, Uris 306


Course Description:

Machine Learning (ML) has transformative potential for applications in health and medicine. The complexity of healthcare and medicine has highlighted foundational challenges in ML, such as inequity and the lack of generalization, robustness, safety, and statistical interpretability. Traditional ways of measuring the utility of ML models often do not reflect clinical or medical benefit, a gap that has motivated technical innovations in ML methods.
Exciting opportunities have opened up for methodological progress in Machine Learning motivated by health and medicine applications. The availability of large de-identified multi-institutional datasets for healthcare and medicine from around the world has accelerated progress in ML for health. These data come from healthcare providers and are patient-facing, as opposed to traditional medical data, which are well curated and collected for specific tasks. Patient-facing datasets are rife with some of the most complex statistical artifacts, which require innovative ML solutions to improve over the state of the art in health and medicine.
In this course, you will learn about the complexities that make health and medicine data unique and how they open up opportunities for advanced AI. You will learn advanced Machine Learning methods useful in health and medicine applications, for example, time-series modeling, reinforcement learning, probabilistic modeling, causal inference, foundation models, unsupervised learning, and self-supervised learning. I will further provide an overview of challenges such as fairness, interpretability, generalization, robustness, safety, and the policy implications of ML in health and medicine. The course will train students to map real-world challenges of working with health and medical data to statistical challenges that require new and advanced ML methods.

By the end of this course you will be able to:

Course Material:

Course material will largely consist of lecture notes and research papers. No textbook is required; some textbook chapters may be provided as additional reading.


Schedule and Content

Note that the schedule and content are subject to minor changes. Slides and notes will be posted on Courseworks.
Class Date Topic Reading Assignments Reflection Questions Projects/Homework timelines
1 9/8
9/7 (Thu)
6-8 PM
at Uris 140
1.1 Introduction to health and medicine data
1.2 History of AI/ML in health
1.3 Statistical challenges in health and medicine data
1. Big Data In Health Care: Using Analytics To Identify And Manage High-Risk And High-Cost Patients
2. A Review of Challenges and Opportunities in Machine Learning for Health
No reflection questions
Homework #1 out by midnight today
2 9/15 2.1 Supervised Learning in Healthcare
2.2 Preventing data leakage
2.3 Learning with noisy labels
2.4 Positive and Unlabeled Learning
2.5 Shortcut Learning
1. Prospective, multi-site study of patient outcomes after implementation of the TREWS machine learning-based early warning system for sepsis
2. Using Anchors to Estimate Clinical State without Labeled Data
3. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission
Reflection questions #1 due before class.
Homework #1 continues
3 9/22 3.1 Medical Imaging modalities
3.2 Convolutional Neural Networks, ResNet, ViT
3.3 Common tasks in medical imaging
3.4 State-of-the-art deep neural networks in medical imaging
1. Mammographic Breast Density Assessment Using Deep Learning: Clinical Implementation
2. Data-efficient and weakly supervised computational pathology on whole-slide images
Reflection questions #2 due before class.
Homework #1 continues
4 9/29 5.1 Time-series modeling in health
5.2 Factorial switching dynamic models
5.3 State-space models
5.4 Deep learning for time-series modeling (RNN, LSTM, Attention)
1. A Multivariate Timeseries Modeling Approach to Severity of Illness Assessment and Forecasting in ICU with Sparse, Heterogeneous Clinical Data
2. Probabilistic detection of short events, with application to critical care monitoring
Reflection questions #3 due before class.
Homework #2 ongoing
Project proposals due
5 10/6 4.1 Survival Modeling Basics
4.2 Censoring
4.3 Survival Modeling using Deep Learning
1. An Introduction to Survival Analysis Math
2. Deep Survival Analysis
Reflection questions #4 due before class.
Homework #1 due
Homework #2 out by midnight today
6 10/13 6.1 Causal Inference in Healthcare
6.2 Introduction to Structural Causal Models, Potential Outcomes framework
6.3 Causal view of structural biases in the data
6.4 Average Treatment Effect, Conditional Average Treatment Effect, Effect of Treatment on the Treated
6.5 Causal Machine Learning
1. Chapters 1, 2, and 3 of What If book by Miguel Hernan and James Robins
2. Death by Round Numbers: Glass-Box Machine Learning Uncovers Biases in Medical Practice
Reflection questions #5 due before class.
Homework #2 ongoing
7 10/20 7.1 Overview of Markov Decision Processes
7.2 Offline Off-policy Evaluation and Learning
7.3 Model based RL, Causal view of RL
7.4 Causal view of Off-policy RL
7.5 Applications in Healthcare
1. A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units
2. Evaluating Reinforcement Learning Algorithms in Observational Health Settings
3. Confounding-Robust Policy Improvement
Reflection questions #6 due before class.
Homework #2 due
Homework #3 out by midnight today
8 10/27 8.1 Generalization of Machine Learning
8.2 Distribution Shifts: Types of distribution shifts
8.3 Domain adaptation, transfer learning
8.4 Causal view of distribution shift
8.5 Algorithms for robustness in Supervised Learning, Reinforcement Learning
8.6 Guest Lecture: Harvineet Singh, PhD (Postdoctoral Fellow, UCSF on Responsible ML for Health)
1. The Clinician and Dataset Shift in Artificial Intelligence
2. Factors Associated With Variability in the Performance of a Proprietary Sepsis Prediction Model Across 9 Networked Hospitals in the US
Reflection questions #7 due before class.
Homework #3 ongoing
9 11/3 9.1 Self-supervised Learning in Health
9.2 Contrastive Learning and Meta-Learning in Health
9.3 Guest Lecture: Pranav Rajpurkar, PhD Stanford (Assistant Professor, Harvard University, DBMI)
1. Self-supervised learning in medicine and healthcare
2. Leveraging Time Irreversibility with Order-Contrastive Pre-training
Reflection questions #8 due before class.
Homework #3 ongoing
10 11/10 10.1 Foundation Models Basics
10.2 Overview of Large Language Models, Foundation Models in Healthcare (Unimodal, Multimodal)
10.3 Discussion of metrics, evaluation, future directions
10.4 Guest Lecture: Monica Agrawal, PhD MIT
1. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data
2. UniverSeg: Universal Medical Image Segmentation
3. Aspiring to Unintended Consequences of Natural Language Processing: A Review of Recent Developments in Clinical and Consumer-Generated Text Processing
4. Event Stream GPT: A Data Pre-processing and Modeling Library for Generative, Pre-trained Transformers over Continuous-time Sequences of Complex Events
5. Optional: Leveraging medical Twitter to build a visual–language foundation model for pathology AI
Reflection questions #9 due before class.
Homework #3 due
11 11/17 11.1 Interpretability for Health and Medicine
11.2 Conceptual overview of methods, ideas
11.3 Challenges of viewing interpretability narrowly
11.4 Guest Lecture: Daksh Mittal (PhD student, Columbia Business School) and Yuanzhe Ma (PhD student, IE/OR) on Uncertainty quantification for interpretability in deep learning

1. “Why did the Model Fail?”: Attributing Model Performance Changes to Distribution Shifts
2. The false hope of current approaches to explainable artificial intelligence in health care
Reflection questions #10 due before class.
Projects ongoing
12 12/1 12.1 Ethics, Safety, and Equity of ML in Healthcare
12.2 Modeling frameworks for safe and equitable ML in healthcare
12.3 Regulation of ML/AI in Healthcare
12.4 Guest Lecture: Adarsh Subbaswamy, PhD (Staff Fellow (Regulatory Scientist) at the U.S. FDA in the Division of Imaging Diagnostics and Software Reliability at the Center for Devices and Radiological Health)

1. Ethical Machine Learning in Healthcare
2. Ethical limitations of algorithmic fairness solutions in health care machine learning
3. Artificial Intelligence and Machine Learning in Software as a Medical Device
4. Learning-to-defer for sequential medical decision-making under uncertainty
Reflection questions #11 due before class.
Projects ongoing
13, 14 12/8, 12/15 Project Presentations N/A No reflection questions. Classroom discussion encouraged.
Project reports due

Format and Grading

Lectures are held once a week (5 points for in-person participation)


Homework Assignments

Homeworks will be assigned on Courseworks and will be due 3 weeks after they are assigned. We have ensured ample time for all homeworks. However, we are allowing two slack days usable over the semester. If you are using a slack day for a homework, mention this on your homework submission (otherwise the TA will treat it as a late submission). You are free to use the slack days for any of the homeworks. No slack days are allowed for project submissions (unless you have a medical emergency, in which case please contact the instructor).


Final Projects

The goal of the projects is to identify a meaningful problem in health and medicine that can be addressed using Machine Learning. The project can be a new method, a new application, or a new dataset. Each type of project requires careful consideration of:

The project can be done individually or in groups of at most 2 students. The project will be presented in class at the end of the semester. The project report will be in NeurIPS paper format (8 pages excluding references).

Potential sources of data (list is not exhaustive):

Instructor(s) will work with students to manage (temporary) access to the above data sources. Other data sources or tasks can be used in consultation with the instructor. Students will work closely with the instructor to scope the project and should frequently update the instructor on their progress at office hours. Please see the "Format and Grading" section for more details on the project.