BINF GU 4008 Section 3 / COMS 4995 Section 14: Advanced Machine Learning for Health and Medicine

IMPORTANT NOTICE: From September 15th onwards, our course will meet on Fridays 2:00pm - 4:00pm in Uris 306 (not ENG 253).

Staff:

Instructor: Shalmali Joshi
Teaching Assistant: Young Sang Choi

Logistics:

Contact: Courseworks
Time: Friday 2:00pm - 4:00pm
Location: Uris 306, 3022 Broadway, New York, NY 10027
Office Hours: Shalmali Joshi: Thursdays 9:00am - 10:00am at 622 W 168th St, PH-20, 402; Young Sang Choi: Fridays 4:00pm - 5:00pm, Uris 306

Pre-reqs:

Machine Learning (COMS W 4771-001) or equivalent (BINF G 4002) with a grade of B or higher
Familiarity with Python and one or more of: PyTorch, TensorFlow, JAX
Basic knowledge of probability, statistics, and linear algebra

Course Description:

Machine Learning (ML) has transformative potential for applications in health and medicine. The complexity of healthcare and medicine has highlighted foundational challenges in ML such as lack of generalization, robustness, safety, inequity, and statistical interpretability. Traditional ways of measuring the utility of ML models often do not reflect clinical or medical benefit, and this gap has motivated technical innovations in ML methods.
Exciting opportunities have opened up for methodological progress in Machine Learning motivated by health and medicine applications. The availability of large, de-identified, multi-institutional datasets for healthcare and medicine from around the world has accelerated progress in ML for health. These data come from healthcare providers and are patient-facing, as opposed to traditional medical data, which are well-curated and collected for specific tasks. Patient-facing datasets are rife with some of the most complex statistical artifacts, which require innovative ML solutions to improve on the state of the art in health and medicine.
In this course, you will learn about the complexities that make health and medicine data unique and how they open up opportunities for advanced AI. You will learn advanced Machine Learning methods useful in health and medicine applications, for example, time-series modeling, reinforcement learning, probabilistic modeling, causal inference, foundation models, unsupervised learning, and self-supervised learning. I will also provide an overview of challenges such as fairness, interpretability, generalization, robustness, safety, and policy implications of ML in health and medicine. The course will train students to map real-world challenges of working with health and medical data to statistical challenges that require new and advanced ML methods.

Objective:
By the end of this course, you will:

Have a foundational overview of state-of-the-art ML in healthcare
Be able to distill challenges in health and medicine data into technical challenges that can be addressed using ML
Be able to develop new ML methods focused on health/medicine tasks
Be better equipped to evaluate and validate ML in health and medicine

Course Material:

Course material will largely consist of lecture notes and research papers. No textbook is required; some textbook chapters may be provided as additional reading.

Schedule and Content

Note that the schedule and content are subject to minor changes. Slides and notes will be posted on Courseworks.

Class	Date	Topic	Reading Assignments	Reflection Questions	Projects/Homework timelines
1	9/7 (Thu; rescheduled from 9/8), 6-8 PM at Uris 140	1.1 Introduction to health and medicine data 1.2 History of AI/ML in health 1.3 Statistical challenges in health and medicine data	1. Big Data In Health Care: Using Analytics To Identify And Manage High-Risk And High-Cost Patients 2. A Review of Challenges and Opportunities in Machine Learning for Health	No reflection questions	Homework #1 out by midnight today
2	9/15	2.1 Supervised Learning in Healthcare 2.2 Preventing data leakage 2.3 Learning with noisy labels 2.4 Positive and Unlabeled Learning 2.5 Shortcut Learning	1. Prospective, multi-site study of patient outcomes after implementation of the TREWS machine learning-based early warning system for sepsis 2. Using Anchors to Estimate Clinical State without Labeled Data 3. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission	Reflection questions #1 due before class.	Homework #1 continues
3	9/22	3.1 Medical Imaging modalities 3.2 Convolutional Neural Networks, ResNet, ViT 3.3 Common tasks in medical imaging 3.4 State-of-the-art deep neural networks in medical imaging	1. Mammographic Breast Density Assessment Using Deep Learning: Clinical Implementation 2. Data-efficient and weakly supervised computational pathology on whole-slide images	Reflection questions #2 due before class.	Homework #1 continues
4	9/29	4.1 Time-series modeling in health 4.2 Factorial switching dynamic models 4.3 State-space models 4.4 Deep learning for time-series modeling (RNN, LSTM, Attention)	1. A Multivariate Timeseries Modeling Approach to Severity of Illness Assessment and Forecasting in ICU with Sparse, Heterogeneous Clinical Data 2. Probabilistic detection of short events, with application to critical care monitoring	Reflection questions #3 due before class.	Homework #1 continues; project proposals due
5	10/6	5.1 Survival Modeling Basics 5.2 Censoring 5.3 Survival Modeling using Deep Learning	1. An Introduction to Survival Analysis Math 2. Deep Survival Analysis	Reflection questions #4 due before class.	Homework #1 due; Homework #2 out by midnight today
6	10/13	6.1 Causal Inference in Healthcare 6.2 Introduction to Structural Causal Models, Potential Outcomes framework 6.3 Causal view of structural biases in the data 6.4 Average Treatment Effect, Conditional Average Treatment Effect, Effect of Treatment on the Treated 6.5 Causal Machine Learning	1. Chapters 1, 2, and 3 of What If book by Miguel Hernan and James Robins 2. Death by Round Numbers: Glass-Box Machine Learning Uncovers Biases in Medical Practice	Reflection questions #5 due before class.	Homework #2 ongoing
7	10/20	7.1 Overview of Markov Decision Processes 7.2 Offline Off-policy Evaluation and Learning 7.3 Model based RL, Causal view of RL 7.4 Causal view of Off-policy RL 7.5 Applications in Healthcare	1. A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units 2. Evaluating Reinforcement Learning Algorithms in Observational Health Settings 3. Confounding-Robust Policy Improvement	Reflection questions #6 due before class.	Homework #2 due; Homework #3 out by midnight today
8	10/27	8.1 Generalization of Machine Learning 8.2 Distribution Shifts: Types of distribution shifts 8.3 Domain adaptation, transfer learning 8.4 Causal view of distribution shift 8.5 Algorithms for robustness in Supervised Learning, Reinforcement Learning 8.6 Guest Lecture: Harvineet Singh, PhD (Postdoctoral Fellow, UCSF on Responsible ML for Health)	1. The Clinician and Dataset Shift in Artificial Intelligence 2. Factors Associated With Variability in the Performance of a Proprietary Sepsis Prediction Model Across 9 Networked Hospitals in the US	Reflection questions #7 due before class.	Homework #3 ongoing
9	11/3	9.1 Self-supervised Learning in Health 9.2 Contrastive Learning and Meta-Learning in Health 9.3 Guest Lecture: Pranav Rajpurkar, PhD Stanford (Assistant Professor, Harvard University, DBMI)	1. Self-supervised learning in medicine and healthcare 2. Leveraging Time Irreversibility with Order-Contrastive Pre-training	Reflection questions #8 due before class.	Homework #3 ongoing
10	11/10	10.1 Foundation Models Basics 10.2 Overview of Large Language Models, Foundation Models in Healthcare (Unimodal, Multimodal) 10.3 Discussion of metrics, evaluation, future directions 10.4 Guest Lecture: Monica Agarwal, PhD MIT	1. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data 2. UniverSeg: Universal Medical Image Segmentation 3. Aspiring to Unintended Consequences of Natural Language Processing: A Review of Recent Developments in Clinical and Consumer-Generated Text Processing 4. Event Stream GPT: A Data Pre-processing and Modeling Library for Generative, Pre-trained Transformers over Continuous-time Sequences of Complex Events 5. Optional: Leveraging medical Twitter to build a visual–language foundation model for pathology AI	Reflection questions #9 due before class.	Homework #3 due
11	11/17	11.1 Interpretability for Health and Medicine 11.2 Conceptual overview of methods, ideas 11.3 Challenges of viewing interpretability narrowly 11.4 Guest lecture: Daksh Mittal (PhD student, Columbia Business School) and Yuanzhe Ma (PhD student, IE/OR) on Uncertainty quantification for interpretability in deep learning	1. “Why did the Model Fail?”: Attributing Model Performance Changes to Distribution Shifts 2. The false hope of current approaches to explainable artificial intelligence in health care	Reflection questions #10 due before class.	Projects ongoing
12	12/1	12.1 Ethics, Safety, and Equity of ML in Healthcare 12.2 Modeling frameworks for safe and equitable ML in healthcare 12.3 Regulation of ML/AI in Healthcare 12.4 Guest Lecture: Adarsh Subbaswamy, PhD (Staff Fellow (Regulatory Scientist) at the U.S. FDA in the Division of Imaging Diagnostics and Software Reliability at the Center for Devices and Radiological Health)	1. Ethical Machine Learning in Healthcare 2. Ethical limitations of algorithmic fairness solutions in health care machine learning 3. Artificial Intelligence and Machine Learning in Software as a Medical Device 4. Learning-to-defer for sequential medical decision-making under uncertainty	Reflection questions #11 due before class.	Projects ongoing
13, 14	12/8, 12/15	Project Presentations	N/A	No reflection questions. Classroom discussion encouraged.	Project reports due

Format and Grading

Lectures are held once a week (5 points for in-person participation).

Homeworks:

Problem Sets: Each homework is a mini-project divided into 3 problem-based questions (3 homeworks, 30 points)
Reading assignments: Reading will be assigned prior to each lecture. Students are expected to answer the reflection questions associated with each reading assignment and submit them before the lecture (15 points for completing the reading assignments and corresponding questions).
Final project: Each student will propose and execute a project, individually or in a group of at most 2 students. Project proposals are due at the end of week 4; final project reports in NeurIPS paper format (8 pages excluding references) are due at the end of the semester (50 points).

Homework Assignments

Homeworks will be assigned on Courseworks and will be due 3 weeks after they are assigned. We have ensured ample time for all homeworks; in addition, you have two slack days to use over the semester, on any of the homeworks. If you use a slack day for a homework, mention this in your homework submission (otherwise the TA will treat it as a late submission). Slack days may not be used for project submissions (unless you have a medical emergency, in which case please contact the instructor).

Projects

The goal of the projects is to identify a meaningful problem in health and medicine that can be addressed using Machine Learning. The project can be a new method, a new application, or a new dataset. Each type of project requires careful consideration of:

What is the health or medicine problem?
Why is it important?
What is the current state of the art?
What is the gap in the current state of the art?
Why is Machine Learning a good solution to address this gap?
In what ways do you expect ML to address existing limitations?
Are existing ML methods good enough to address your problem?
If yes, what application-specific considerations are crucial for successfully addressing the problem?
If no, what new methods are needed to address the problem?
What is your methodological advancement to address this issue?
What is a good evaluation of i) the validity of the method, ii) its relevance to the health/medicine problem, and iii) its utility to the application of interest?
What are the limitations of your method? What must be established before you can claim that your method is the best available?
What are the future directions that you think might improve on your proposal?

The project can be done individually or in groups of 2 students maximum. The project will be presented in class at the end of the semester. The project report will be in NeurIPS paper format (8 pages excluding references).
Potential sources of data (list is not exhaustive)

Instructor(s) will work with students to manage (temporary) data access to the data sources above. Other data sources or tasks can be used in consultation with the instructor. Students will work closely with the instructor to scope their projects and will provide frequent progress updates during office hours. Please see the "Format and Grading" section for more details on the project.