Large Language Models (LLMs) have been revolutionary across several domains, but their training remains computationally expensive and resource-intensive. This seminar explores the principles, methods, and system-level strategies for training LLMs efficiently.
We begin with a conceptual introduction to the Transformer architecture, the backbone of modern LLMs, covering its key components. We then walk through the phases of model training, from data preparation and pretraining to fine-tuning, highlighting the computational trade-offs at each stage.
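To make the architecture discussion concrete, the sketch below shows one possible pre-norm Transformer block built from standard PyTorch modules. It is a minimal illustration only: the model dimension, head count, and feed-forward width are assumed values, not ones prescribed by the course.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """A single pre-norm Transformer block: self-attention followed by an MLP,
    each wrapped in a residual connection (illustrative dimensions)."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                 # residual around attention
        x = x + self.mlp(self.norm2(x))  # residual around the MLP
        return x

# Illustrative usage on a toy batch of 4 sequences of length 16.
block = TransformerBlock()
x = torch.randn(4, 16, 512)
print(block(x).shape)  # torch.Size([4, 16, 512])
```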
Next, we focus on system-level optimization techniques, including GPU architectures, FLOPs analysis, and data and model parallelism. We also cover efficient fine-tuning methods, such as parameter-efficient tuning (e.g., LoRA, Prefix-tuning, Prompt-tuning).
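As a first taste of the parameter-efficient methods discussed in the seminar, the sketch below wraps a frozen linear layer with a LoRA-style low-rank adapter, so that only the small adapter matrices are trained. The class name, rank, and scaling factor are illustrative assumptions; this is a minimal sketch, not a reference implementation of any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style adapter: y = W0(x) + (alpha / r) * B(A(x)),
    where W0 is the frozen pretrained layer and only A, B are trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # freeze the pretrained weights
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)    # adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

# Illustrative usage: wrap one projection of a hypothetical model layer.
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
x = torch.randn(4, 768)
print(layer(x).shape)  # torch.Size([4, 768])
```

Because the low-rank matrices contain far fewer parameters than the frozen base layer, only a small fraction of the model's weights receive gradients, which is what makes such methods attractive for fine-tuning on limited hardware.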
By the end of this seminar, participants will understand how to design, implement, and optimize training pipelines for large-scale models, a skill set that is increasingly critical across many domains.
This seminar course is open to senior undergraduate and graduate students. A background in coding and machine learning is required to participate in this course. Students must have successfully completed two or more of the following courses, or their equivalent:
The course will be taught in English and will be held in both in-person and online modes; details of the online platform will be provided later. All lectures, office hours, presentations, and communication will be conducted in English.
We will use Moodle for all course-related discussions. We want this class to be an interactive and fun learning experience for everyone, so we encourage you to participate in discussions both in class and on Moodle.
The seminar takes place every Monday from 16:00 to 18:00 in the MPI-SWS building E1 5, room 029, starting on October 20, 2025.
The goal of the course is to give you both conceptual knowledge of and hands-on experience with techniques that allow for efficient training and fine-tuning of large language models. The course will have the following components:
The mini-assignments and reports (A+R) will count for 50% of the grade, the presentations (IP+FP) for 30% (15% for in-class presentations and 15% for the final presentation), and participation for the remaining 20% (based on attendance and in-class engagement). To pass the seminar, a student must receive at least 50% of the maximum possible points across assignments and reports, presentations, and participation.
Each lecture (L) will be taught in class, and the corresponding reading materials will be uploaded.
Thereafter, everyone in the class is expected to choose two papers from the reading materials and submit a report (R) on them. The report should cover the following points:
Each in-class presentation (IP) will have a set of presenters who will give a 20-minute presentation. After the first lecture (October 20), students will be asked to state their paper preferences, and presentations will then be assigned based on these preferences on a First-Come-First-Served (FCFS) basis. Each presentation should cover at least:
The final presentation (FP) can be given in either of the following modes:
To provide hands-on experience, we will hold sessions in which students become acquainted with tips and tricks for efficiently training large language models. After each hands-on session, we will release mini-assignments (A) for you to implement. The details of what to implement will be posted on Moodle. You will be required to submit, by the deadline, a 1-2 page report describing the experiment you ran and the results you obtained.
Lectures will be held in building E1 5, room 029, every Monday from 16:00 to 18:00. The following is a tentative schedule; you will be informed of any changes.