Schedule

Given the pace of innovations in this area, the following list is subject to change.

Introduction

Aug 18
Course Introduction
Anand
Paper Presentation Preferences: Fill out the form here
Aug 20
Topics, Background & Challenges
Anand
📖 Challenges and Applications of Large Language Models
📖 An Open Source Stack for AI Compute
Aug 22
Paper Presentation Preferences Due

Basics & Project

Pre-Training

Post-Training

Inference

Oct 13
Single Instance Serving
📖 NanoFlow: Towards Optimal Large Language Model Serving Throughput (Required, Keshav)
📖 DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized Large Language Model Serving (Required, Xinyu)
📖 Improving DNN Inference Throughput Using Practical, Per-Input Compute Adaptation
📖 Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
📖 Orca: A Distributed Serving System for Transformer-Based Generative Models
📖 Efficient Memory Management for Large Language Model Serving with PagedAttention
Oct 15
Multi Instance Serving
📖 Llumnix: Dynamic Scheduling for Large Language Model Serving (Required, Oytun)
📖 Mooncake: Trading More Storage for Less Computation - A KVCache-centric Architecture for Serving LLM Chatbot (Required, Ziyi)
📖 BlitzScale: Fast and Live Large Model Autoscaling with O(1) Host Caching
📖 AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Oct 20
Diffusion
📖 Approximate Caching for Efficiently Serving Text-to-Image Diffusion Models (Required, Mursalin)
📖 DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling (Required, Hritvik)
Oct 22
Multimodality
📖 ModServe: Scalable and Resource-Efficient Large Multimodal Model Serving (Required, Rohit)
📖 DREAM: A Dynamic Scheduler for Dynamic Real-Time Multi-Model ML Workloads (Required, Ikhyun)
Oct 27
Multimodality - Vision I
📖 LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (Required, Kalit)
📖 A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
📖 An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Acceleration for VLLM Inference (Required, Hritvik)
📖 Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
📖 [CLS] Token Tells Everything Needed for Training-free Efficient MLLMs
Oct 29
Multimodality - Vision II
📖 MovieChat: From Dense Token to Sparse Memory for Long Video Understanding (Required, Suyeon)
📖 ViLA: Efficient Video-Language Alignment for Video Question Answering
📖 M-LLM Based Video Frame Selection for Efficient Video Understanding (Required, Chengyin)

Agentic Systems

Nov 3
Workflow Optimization I
📖 Parrot: Efficient Serving of LLM-based Applications with Semantic Variable (Required, Shangqing)
📖 Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning (Required, Uma)
Nov 5
Workflow Optimization II
📖 DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving (Required, Mursalin)
📖 Towards End-to-End Optimization of LLM-based Applications with Ayo (Required, Kewen)
📖 Autellix: An Efficient Serving Engine for LLM Agents as General Programs
Nov 10
RAGs
📖 METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation (Required, Vedaang)
📖 TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
📖 RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving (Required, Kalit)
📖 LEANN: A Low-Storage Vector Index
Nov 12
Reasoning
📖 Efficiently Serving LLM Reasoning Programs with Certaindex (Required, Rohit)
📖 ReAct: Synergizing Reasoning and Acting in Language Models (Required, Jae Hyung)
Nov 17
Applications I
📖 Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents (Required, Ashutosh)
📖 Mathematical Discoveries from Program Search with Large Language Models (FunSearch)
📖 AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery (Required, Misun)
Nov 19
Applications II
📖 NetLLM: Adapting Large Language Models for Networking (Required, Uma)
📖 TextGrad: Automatic “Differentiation” via Text (Required, Joel)
📖 Building AI Agents for Autonomous Clouds: Challenges and Design Principles

Hardware

Conclusion

Nov 24
Course Wrap-Up
Anand
Dec 1
Final Project Poster Presentation
Dec 8
Final Project Report + Code Due