Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm
