# LLM from scratch [Ongoing]
A project building an LLM from scratch in Python.
This project is a personal journey to build a Large Language Model (LLM) completely from scratch.
## Overview
Inspired by and referencing the tutorial *LLMs from Scratch – Practical Engineering from Base Model to PPO RLHF*, I am implementing the core components of the LLM architecture step by step. The goal is to gain a deep, practical understanding of how modern language models process text, learn patterns, and generate responses by coding everything myself rather than relying on high-level abstractions.
## Core Objectives
- Construct the Transformer architecture (Attention, Encoders/Decoders) from the ground up.
- Understand tokenization, positional encoding, and self-attention mathematically and programmatically (see the sketch after this list).
- Keep external dependencies minimal to force a fundamental grasp of the underlying tensor operations.
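As a taste of what "from the ground up" means here, below is a minimal sketch of single-head scaled dot-product self-attention in plain NumPy. The function name, shapes, and weight layout are illustrative assumptions for this README, not the repository's actual API.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x:             (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_head) projection matrices
    returns:       (seq_len, d_head) attention output
    """
    q = x @ w_q                               # queries (seq_len, d_head)
    k = x @ w_k                               # keys    (seq_len, d_head)
    v = x @ w_v                               # values  (seq_len, d_head)
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)        # (seq_len, seq_len)
    # softmax over the key dimension, stabilized by subtracting the row max
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                        # weighted sum of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                   # 4 tokens, d_model = 8
w = [rng.normal(size=(8, 8)) for _ in range(3)]
print(self_attention(x, *w).shape)            # (4, 8)
```

The division by the square root of the head dimension keeps the dot products from growing with `d_head`, which would otherwise push the softmax toward near-one-hot saturation and starve the gradients.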
## Devlog
- 2026-03-08: LLM from scratch - 1.3 Multi-Head Self-Attention
- 2026-03-08: LLM from scratch - 1.2 Single-Head Self-Attention
- 2026-03-08: LLM from scratch - 1.1 Positional Encoding (sketch below)
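To illustrate the 1.1 entry, here is a hedged sketch of the classic sinusoidal positional encoding from "Attention Is All You Need": PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The function below is an illustrative assumption, not the repository's actual code.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix of sinusoidal encodings.

    Assumes d_model is even (one sin/cos pair per frequency).
    """
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]      # (1, d_model / 2)
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions
    return pe

print(positional_encoding(10, 16).shape)           # (10, 16)
```

Because these encodings are fixed functions of position, they add no learned parameters, and the geometric progression of frequencies gives each position a distinct, smoothly varying signature across dimensions.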