Seminar/Proseminar: Large Language Models
This seminar explores large language models (LLMs), covering both foundational concepts and recent advances in the field. Participants will gain a comprehensive understanding of the architecture, training, and applications of LLMs, based on seminal research papers. The course is organised as a journal club: students present individual papers, which are then discussed in the group so that everyone understands the ideas presented.
### Potential Topics
- Neural networks and deep learning basics
- Sequence modeling and RNNs (Recurrent Neural Networks)
- Vaswani et al.'s "Attention is All You Need" paper
- Self-attention mechanism (see the brief code sketch after this list)
- Multi-head attention and positional encoding
- GPT-1: Radford et al.'s pioneering work
- GPT-2: Scaling and implications
- GPT-3: Architectural advancements and few-shot learning
- BERT (Bidirectional Encoder Representations from Transformers)
- T5 (Text-To-Text Transfer Transformer)
- DistilBERT and efficiency improvements
- Mamba and other state space models (SSMs): Design principles and performance
- FlashAttention and related methods: Improving efficiency and scalability
- Training regimes and resource requirements
- Fine-tuning and transfer learning
- Emergence of new capabilities
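As a taste of the material, here is a minimal NumPy sketch of scaled dot-product self-attention and sinusoidal positional encoding, two of the topics listed above. The shapes, weight matrices, and variable names are illustrative assumptions for a toy example, not code from any of the papers discussed.

```python
# Minimal sketch of scaled dot-product self-attention and sinusoidal
# positional encoding, in the spirit of "Attention is All You Need"
# (Vaswani et al., 2017). Illustrative only: real implementations add
# batching, masking, and multi-head projections.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise token similarities
    weights = softmax(scores, axis=-1)   # attention distribution per token
    return weights @ V                   # weighted sum of value vectors

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal positional encoding, added to token embeddings."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

# Toy example: 4 tokens, d_model = 8, d_k = 4 (all values chosen arbitrarily)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8)) + sinusoidal_positions(4, 8)
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 4)
```

In the full Transformer, this operation runs in parallel across several heads (multi-head attention), each with its own projection matrices, and the positional encoding is added to the token embeddings before the first layer.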
Course number: 19334617
| Lecturer | Prof. Dr. Tim Landgraf |
|---|---|
| Institution | Dahlem Center for Machine Learning and Robotics |
| Credit points | 5 |
| Room | Arnimallee 6, Seminarraum 007/008 |
| Start | 14.10.2024, 10:00 |
| End | 10.02.2025, 12:00 |
The course emphasizes critical thinking and effective communication skills. Students are expected to actively engage in discussions, critique the methodologies and conclusions of papers, and consider the broader implications of the research.
Prerequisites: An understanding of basic reinforcement learning principles and algorithms, as well as a general familiarity with machine learning concepts. Proficiency in reading and understanding machine learning research papers is strongly recommended.