LLM from scratch [Ongoing]

A project building an LLM from scratch in Python.

This project is a personal journey to build a Large Language Model (LLM) completely from scratch.

Overview

Inspired by and referencing the tutorial LLMs from Scratch – Practical Engineering from Base Model to PPO RLHF, I am implementing the core components of the LLM architecture step by step. The goal is to gain a deep, practical understanding of how modern language models process text, learn patterns, and generate responses, by coding everything myself rather than relying on high-level abstractions.

Core Objectives

  • Construct the Transformer architecture (Attention, Encoders/Decoders) from the ground up.
  • Understand tokenization, positional encoding, and self-attention algorithms mathematically and programmatically.
  • Maintain minimal external dependencies to ensure a fundamental grasp of tensor operations.
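As a flavor of what "from the ground up with minimal dependencies" looks like, here is a sketch of single-head scaled dot-product self-attention using only NumPy. This is an illustrative example, not code from this repository; the function name and toy dimensions are my own choices.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (seq, seq) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key axis
    return weights @ V                            # weighted sum of value vectors

# Toy example: 3 tokens, model dimension 4; self-attention sets Q = K = V = x
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4): one output vector per input token
```

Building multi-head attention, masking, and the surrounding encoder/decoder blocks on top of this primitive is the substance of the project.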

Devlog