
Efficient Training of Large Language Models: From Basics to Fine-Tuning

Announcements

Description

Large Language Models (LLMs) have been revolutionary across several domains, but their training remains computationally expensive and resource-intensive. This seminar explores the principles, methods, and system-level strategies for training LLMs efficiently.
We begin with a conceptual introduction to the Transformer architecture, the backbone of modern LLMs, covering its key components. We then walk through the phases of model training, from data preparation and pre-training to fine-tuning, highlighting the computational trade-offs at each stage.
Next, we focus on system-level optimization techniques, including GPU architectures, FLOPs analysis, and data and model parallelism. We also cover efficient fine-tuning methods, such as parameter-efficient tuning (e.g., LoRA, Prefix-tuning, Prompt-tuning).
By the end of this seminar, participants will understand how to design, implement, and optimize training pipelines for large-scale models, a skill set that is increasingly critical across many domains.
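
As a taste of the parameter-efficient fine-tuning methods the seminar covers, the sketch below shows a minimal LoRA-style adapter in PyTorch. This is an illustrative sketch only, not the reference implementation used in the course; the module name, layer sizes, rank, scaling, and initialisation choices are assumptions made for the example.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # Low-rank factors: A projects down to rank r, B projects back up (zero-initialised)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Output of the frozen layer plus the scaled low-rank correction
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Usage sketch: wrap an existing projection layer and train only the LoRA parameters
layer = LoRALinear(nn.Linear(768, 768))
trainable = [p for p in layer.parameters() if p.requires_grad]  # only lora_A and lora_B

Because only the two low-rank matrices are trainable, the number of updated parameters drops from in_features × out_features to r × (in_features + out_features), which is where LoRA's efficiency comes from.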

Intended Audience / Prerequisites

This seminar course is open to senior undergraduate and graduate students. A background in programming and machine learning is required to participate in this course. Students must have successfully completed two or more of the following courses, or their equivalents:

Students should also be familiar with deep learning frameworks such as PyTorch and libraries such as Hugging Face's Transformers. Additionally, this course requires some degree of research maturity, since we will go over the concepts in depth and expect students to think them through, critique them, and identify possible research directions in the space of developing efficient training methods for LLMs.

The course will be taught in English and will be held in both in-person and online modes. Details of the online platform will be provided later. All lectures, office hours, presentations, and communication will be conducted in English.

Logistics

Instructors

Teaching Assistants (TA)


Contact Details

We will use Moodle for all course-related discussions. We want this class to be an interactive and fun learning experience for everyone, and we therefore encourage you to participate in discussions both in class and on Moodle.


Seminar Timing

The seminar takes place every Monday from 16:00 until 18:00 in MPI-SWS building E1 5, room 029, starting on October 20, 2025.

Seminar Structure

Seminar Content

The goal of the course is to give you both conceptual knowledge of and hands-on experience with various techniques that allow for efficient training and fine-tuning of large language models. The course will have the following components:

  • Lectures and hands-on sessions on important concepts (L+H) - led by the instructors and TAs
  • Presentations - in-class presentations (IP) and final presentations (FP) - led by the students
  • Mini-assignments and reports (A+R) - submitted by the students
You are encouraged to actively participate in class discussions, as your participation will make up part of your grade.


Seminar Grading

The mini-assignments and reports (A+R) will count for 50% of the grade, the presentations (IP+FP) will count for 30% (15% for in-class presentations and 15% for the final presentation), and participation will count for the remaining 20% (including student attendance and class participation). To pass the seminar, a student must receive at least 50% of the maximum possible points for the assignments+reports, presentations, and participation.


Seminar ToC

  • Introduction to building blocks of transformers
    • Components of transformer architecture (L)
  • Phases of training of LLMs
    • Pre-training and Fine-tuning (L)
    • In-class presentation (IP) - Reading Material (TBD)
  • Hands-on Session (H)
  • Efficient training strategies
    • System Level Optimisation
      • GPU and FLOPs (L)
      • Parallelism (L)
    • In-class presentation (IP) - Reading Material (TBD)
    • Model Level Optimisation
      • Low-Rank Adaptation (LoRA) (L)
      • Prefix tuning and Prompt tuning (L)
  • Hands-on Session (H)
  • Final Presentation (FP)

Seminar Format

Lecture Format

Each lecture (L) will be taught in class, and reading materials will be uploaded. Thereafter, everyone in the class is expected to choose two papers from the reading materials and submit a report (R). The report should cover the following points:

  • Paper summaries (short, 5-6 sentences)
  • Three strengths of each of the 2 papers
  • Three weaknesses of each of the 2 papers
  • At least one way in which each of the 2 papers could be improved
  • At least one way in which each of the 2 papers could be extended
The reports must be submitted on Moodle before the date of presentation for that module.


Presentation Format

Each in-class presentation (IP) will have a set of presenters who will give a 20-minute presentation. After the first lecture (October 20), students will be asked to state their paper preferences, and presentations will then be assigned based on those preferences following a First-Come-First-Serve (FCFS) method. The presentations should cover at least:

  1. A comprehensive explanation of the topic.
  2. An assessment of the strengths and weaknesses of the presented approach.


The final presentation (FP) can be done in one of two modes:

  • A literature review of efficient training mechanisms (published from 2024 to the present) that may use the methods taught in the class. This involves discussing and critiquing the works, and discussing what could have been done to address their drawbacks.
  • Implementing a new efficient training mechanism inspired by existing methods. The new algorithm must be motivated by a detailed critique of the existing methods. If the proposal is convincing, we will try to develop it into a research paper.

Mini Assignments

To provide hands-on experience, we will hold hands-on sessions in which students become acquainted with tips and tricks for efficiently training large language models. After each hands-on session, we will release mini-assignments (A) for you to implement; the details of what to implement will be released on Moodle. You will be required to submit a 1-2 page report by the deadline, detailing the experiment you ran along with the results obtained.

Seminar Schedule

Lectures will be held in building E1 5, room 029, every Monday from 16:00 -- 18:00. The following is a tentative schedule; you will be informed if there are any changes.

  • Introduction -- 20/10 , 27/10
    • Course logistics
    • Components of transformer block and training workflow
  • Phases of training LLMs -- 03/11
    • Pre-training and Fine-tuning
  • In-class Presentations -- 10/11
    • Reading Material (TBD)
  • Hands-on Session -- 17/11
  • Efficient training mechanisms -- Part I
    • System Level Optimisation -- 24/11 - 15/12
      • GPUs and FLOPs -- 24/11 , 01/12
      • Parallelism -- 08/12 , 15/12
  • In-class Presentations -- 05/01
    • Reading Material (TBD)
  • Efficient training mechanisms -- Part II
    • Model Level Optimisation -- 12/01
      • LoRA, Prefix-tuning, and Prompt-tuning
  • Hands-on Session -- 19/01
  • Final Presentation -- 26/01