Quantization vs Pruning vs Distillation: Optimizing NNs for Inference