μTransfer: Tuning GPT-3 hyperparameters on one GPU | Explained by the inventor