Schedule

The schedule is still being finalized and, given the pace of innovation in this area, is subject to major changes.

Color Legend: Presenter, Reviewer, Scriber

Introduction

Aug 18
Course Introduction
Anand
Paper Presentation Preferences: fill out the form
Aug 20
Topics, Challenges & Tips
Anand
📖 How to Read a Paper
📖 How to Give a Bad Talk
📖 Writing Reviews for Systems Conferences
📖 Challenges and Applications of Large Language Models
📖 An Open Source Stack for AI Compute
Aug 22
Paper Presentation Preferences Due

Basics & Project

Pre-training

Post-Training

Inference

Oct 13
Single Instance Serving
📖 NanoFlow: Towards Optimal Large Language Model Serving Throughput (Required)
📖 DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized Large Language Model Serving (Required)
📖 Improving DNN Inference Throughput Using Practical, Per-Input Compute Adaptation
📖 Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
📖 Orca: A Distributed Serving System for Transformer-Based Generative Models
📖 Efficient Memory Management for Large Language Model Serving with PagedAttention
Oct 15
Multi Instance Serving
📖 Llumnix: Dynamic Scheduling for Large Language Model Serving (Required)
📖 Mooncake: Trading More Storage for Less Computation — A KVCache-centric Architecture for Serving LLM Chatbot (Required)
📖 BlitzScale: Fast and Live Large Model Autoscaling with O(1) Host Caching
📖 AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Oct 20
Diffusion
📖 Approximate Caching for Efficiently Serving Text-to-Image Diffusion Models (Required)
📖 DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling (Required)
Oct 22
Multimodality
📖 ModServe: Scalable and Resource-Efficient Large Multimodal Model Serving (Required)
📖 DREAM: A Dynamic Scheduler for Dynamic Real-Time Multi-Model ML Workloads (Required)
Oct 27
Multimodality - Vision I
📖 LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (Required)
📖 A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
📖 An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Acceleration for VLLM Inference (Required)
📖 Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
📖 [CLS] Token Tells Everything Needed for Training-free Efficient MLLMs
Oct 29
Multimodality - Vision II
📖 MovieChat: From Dense Token to Sparse Memory for Long Video Understanding (Required)
📖 ViLA: Efficient Video-Language Alignment for Video Question Answering
📖 M-LLM Based Video Frame Selection for Efficient Video Understanding (Required)

Agentic Systems

Nov 3
Workflow Optimization I
📖 Parrot: Efficient Serving of LLM-based Applications with Semantic Variable (Required)
📖 Cognify: Supercharging Gen-AI Workflows With Hierarchical Autotuning (Required)
Nov 5
Workflow Optimization II
📖 DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving (Required)
📖 Towards End-to-End Optimization of LLM-based Applications with Ayo (Required)
📖 Autellix: An Efficient Serving Engine for LLM Agents as General Programs
Nov 10
RAGs
📖 METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation (Required)
📖 TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
📖 RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving (Required)
📖 LEANN: A Low-Storage Vector Index
Nov 12
Reasoning
📖 Efficiently Serving LLM Reasoning Programs with Certaindex (Required)
📖 ReAct: Synergizing Reasoning and Acting in Language Models (Required)
Nov 17
Applications I
📖 Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents (Required)
📖 Mathematical Discoveries from Program Search with Large Language Models (FunSearch)
📖 AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery (Required)
Nov 19
Applications II
📖 NetLLM: Adapting Large Language Models for Networking (Required)
📖 TextGrad: Automatic “Differentiation” via Text (Required)
📖 Building AI Agents for Autonomous Clouds: Challenges and Design Principles

Hardware

Conclusion

Nov 24
Course Wrap-Up
Anand
Dec 1
Final Project Poster Presentation
Dec 8
Final Project Report + Code Due