Large Language Models (LLMs) have been revolutionary across several domains, but their training remains computationally expensive and resource-intensive. This seminar explores the principles, methods, and system-level strategies for training LLMs efficiently.
We begin with a conceptual introduction to the Transformer architecture, the backbone of modern LLMs, covering its key components. We then walk through the phases of model training, from data preparation and pretraining to fine-tuning, highlighting the computational trade-offs at each stage.
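To make the architecture discussion concrete, the sketch below shows one possible pre-norm Transformer block built from standard PyTorch modules. It is a minimal illustration only: the model dimension, head count, and feed-forward width are assumed values, not ones prescribed by the course.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """A single pre-norm Transformer block: self-attention followed by an MLP,
    each wrapped in a residual connection (illustrative dimensions)."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                 # residual around attention
        x = x + self.mlp(self.norm2(x))  # residual around the MLP
        return x

# Illustrative usage on a toy batch of 4 sequences of length 16.
block = TransformerBlock()
x = torch.randn(4, 16, 512)
print(block(x).shape)  # torch.Size([4, 16, 512])
```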
Next, we focus on system-level optimization techniques, including GPU architectures, FLOPs analysis, and data and model parallelism. We also cover efficient fine-tuning methods, such as parameter-efficient tuning (e.g., LoRA, Prefix-tuning, Prompt-tuning).
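As a first taste of the parameter-efficient methods discussed in the seminar, the sketch below wraps a frozen linear layer with a LoRA-style low-rank adapter, so that only the small adapter matrices are trained. The class name, rank, and scaling factor are illustrative assumptions; this is a minimal sketch, not a reference implementation of any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style adapter: y = W0(x) + (alpha / r) * B(A(x)),
    where W0 is the frozen pretrained layer and only A, B are trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # freeze the pretrained weights
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)    # adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

# Illustrative usage: wrap one projection of a hypothetical model layer.
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
x = torch.randn(4, 768)
print(layer(x).shape)  # torch.Size([4, 768])
```

Because the low-rank matrices contain far fewer parameters than the frozen base layer, only a small fraction of the model's weights receive gradients, which is what makes such methods attractive for fine-tuning on limited hardware.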
By the end of this seminar, participants will understand how to design, implement, and optimize training pipelines for large-scale models, a skill set that is increasingly critical across many domains.
This seminar course is open to senior undergraduate and graduate students. A background in coding and machine learning is required to participate in this course. Students must have successfully completed two or more of the following courses, or their equivalent:
The course will be taught in English and will be held in both in-person and online modes; details of the online platform will be provided later. All lectures, office hours, presentations, and communication will be conducted in English.
We will use Moodle for all course-related discussions. We want this class to be an interactive and fun learning experience for everyone, so we encourage you to participate in discussions both in class and on Moodle.
The seminar takes place every Monday from 16:00 to 18:00 in the MPI-SWS building E1 5, room 029, starting on October 20, 2025.
The goal of the course is to give you both conceptual knowledge of and hands-on experience with techniques that allow for efficient training and fine-tuning of large language models. The course will have the following components:
The mini-assignments and reports (A+R) will count for 50% of the grade, the presentations (IP+FP) for 30% (15% for in-class presentations and 15% for the final presentation), and participation for the remaining 20% (based on attendance and in-class engagement). To pass the seminar, a student must receive at least 50% of the maximum possible points across assignments and reports, presentations, and participation.
Each lecture (L) will be taught in class, and the corresponding reading materials will be uploaded.
Thereafter, everyone in the class is expected to choose two papers from the reading materials and submit a report (R) on them. The report should cover the following points:
Each in-class presentation (IP) will have a set of presenters who will give a 20-minute presentation. After the first lecture (October 20), students will be asked to state their paper preferences, and presentations will then be assigned based on these preferences on a First-Come-First-Served (FCFS) basis. Each presentation should cover at least:
The final presentation (FP) can be given in either of the following modes:
To provide hands-on experience, we will hold sessions in which students become acquainted with tips and tricks for efficiently training large language models. After each hands-on session, we will release mini-assignments (A) for you to implement. The details of what to implement will be posted on Moodle. You will be required to submit, by the deadline, a 1-2 page report describing the experiment you ran and the results you obtained.
Lectures will be held in building E1 5, room 029, every Monday from 16:00 to 18:00. The following is a tentative schedule; you will be informed of any changes.