LLM from scratch [Ongoing]

A project building an LLM from scratch in Python.

This project is a personal journey to build a Large Language Model (LLM) completely from scratch.

Overview

Inspired by and referencing the tutorial LLMs from Scratch – Practical Engineering from Base Model to PPO RLHF, I am implementing the core components of the LLM architecture step by step. The goal is to gain a deep, practical understanding of how modern language models process text, learn patterns, and generate responses, by coding everything myself rather than relying on high-level abstractions.

Core Objectives

  • Construct the Transformer architecture (Attention, Encoders/Decoders) from the ground up.
  • Understand tokenization, positional encoding, and self-attention algorithms mathematically and programmatically.
  • Maintain minimal external dependencies to ensure a fundamental grasp of tensor operations.
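As a flavor of what "from the ground up with minimal dependencies" looks like, here is a sketch of single-head scaled dot-product self-attention using only NumPy. This is an illustrative example, not code from this repository; the function name and toy dimensions are my own choices.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq, seq) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key axis
    return weights @ V                            # weighted sum of value vectors

# Toy example: 3 tokens, model dimension 4; self-attention sets Q = K = V = x
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4): one output vector per input token
```

Building multi-head attention, masking, and the surrounding encoder/decoder blocks on top of this primitive is the substance of the project.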

Devlog