Quantization vs Pruning vs Distillation: Optimizing NNs for Inference