All 50 labs in this category
Autoregressive Generation Loop
Step through prompt → predict → append → repeat.
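
The prompt → predict → append → repeat loop fits in a few lines. This is a minimal sketch: `next_token` is a hypothetical callable standing in for "run the model, then pick a token". The toy model below just predicts "last token + 1".

```python
from typing import Callable, List, Optional

def generate(next_token: Callable[[List[int]], int],
             prompt: List[int],
             max_new_tokens: int,
             eos: Optional[int] = None) -> List[int]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)   # predict from everything seen so far
        tokens.append(tok)         # append the prediction to the context
        if tok == eos:             # repeat until EOS or the budget runs out
            break
    return tokens

def toy_model(seq: List[int]) -> int:
    """Hypothetical stand-in for a real model: predicts last token + 1."""
    return seq[-1] + 1

print(generate(toy_model, [1, 2], max_new_tokens=3))  # [1, 2, 3, 4, 5]
```

Real engines keep a KV cache, so each step does not recompute the whole prefix.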

Attention Score Matrix
Q · Kᵀ produces an N × N matrix of similarities.
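
Here is a pure-Python sketch of the score computation. The 3×2 Q and K matrices are made-up values (N = 3 tokens, d_k = 2). Entry (i, j) is the dot product of query i with key j.

```python
from typing import List

Matrix = List[List[float]]

def scores_qk(q: Matrix, k: Matrix) -> Matrix:
    """Return Q · K^T: one row per query, one column per key."""
    return [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]

Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 queries, d_k = 2 (toy values)
K = [[1.0, 2.0], [3.0, 0.0], [0.5, 0.5]]  # 3 keys,    d_k = 2 (toy values)

S = scores_qk(Q, K)
for row in S:
    print(row)
```

In scaled dot-product attention, these scores are divided by √d_k and passed through a row-wise softmax.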

Backprop Chain Rule
Watch gradients flow backwards through a 3-layer net.

Beam Search Tree
K parallel hypotheses expand and get pruned.

BF16 vs FP16 vs FP32: Range and Precision
Why BF16 won over FP16 for LLM training.
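
The trade-off can be computed from the bit layouts alone. FP32 has 8 exponent and 23 mantissa bits. FP16 has 5 and 10. BF16 has 8 and 7. The sketch below assumes IEEE-style encoding (an all-ones exponent reserved for inf/NaN). It shows BF16 keeping FP32's range while FP16 tops out at 65504.

```python
def format_stats(exp_bits: int, man_bits: int) -> dict:
    """Range and precision of a binary float: 1 sign bit, exp_bits, man_bits."""
    bias = 2 ** (exp_bits - 1) - 1
    return {
        # Largest finite value (all-ones exponent is reserved for inf/NaN).
        "max": (2 - 2.0 ** -man_bits) * 2.0 ** bias,
        # Smallest positive normal value.
        "min_normal": 2.0 ** (1 - bias),
        # Machine epsilon: gap between 1.0 and the next representable value.
        "eps": 2.0 ** -man_bits,
    }

for name, e, m in [("FP32", 8, 23), ("FP16", 5, 10), ("BF16", 8, 7)]:
    s = format_stats(e, m)
    print(f"{name}: max={s['max']:.3g} min_normal={s['min_normal']:.3g} eps={s['eps']:.3g}")
```

BF16 gives up precision (a larger epsilon) in exchange for FP32's dynamic range. Large activations and tiny gradients therefore neither overflow nor underflow as easily as they do in FP16.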

Cache Blocking for Matmul
Tile the matrices so each block fits in cache.
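
The loop structure of a blocked matmul is sketched below, with an arbitrary toy tile size of 2. In pure Python this only shows the iteration order. The speedup appears in compiled kernels, where the tile size is chosen so that tiles of A, B and C stay resident in L1/L2.

```python
from typing import List

Matrix = List[List[float]]

def blocked_matmul(A: Matrix, B: Matrix, tile: int = 2) -> Matrix:
    """C = A @ B, computed one (tile x tile) block at a time."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):            # block row of C / A
        for p0 in range(0, k, tile):        # block along the shared dimension
            for j0 in range(0, m, tile):    # block column of C / B
                # Multiply one tile of A by one tile of B, accumulate into C.
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, k)):
                        a = A[i][p]
                        for j in range(j0, min(j0 + tile, m)):
                            C[i][j] += a * B[p][j]
    return C

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0, 2], [0, 1, 0], [3, 1, 1]]
print(blocked_matmul(A, B, tile=2))
```

The `min(...)` bounds handle edge tiles when the matrix size is not a multiple of the tile size.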

Cache Hits and Misses
L1/L2/L3 latency stacked up.

Complete Transformer Block
Animate data flowing through pre-norm + attention + residual + pre-norm + FFN + residual.

Full CPU SLM Stack: Top to Bottom
Application → engine → kernels → CPU instructions.

Cross-Entropy Loss Surface
See how loss changes as the model's predicted probability shifts.

DataLoader Pipeline
Workers prefetching batches into a queue.

Dot Product Geometry (2D)
Drag vectors; see dot product, magnitude, angle.

DPO Preference Loss
Direct Preference Optimization vs. reward model + PPO.
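
Below is a sketch of the per-pair DPO loss only; the reward-model + PPO path is not shown. Inputs are summed sequence log-probabilities from the trained policy and a frozen reference model. The values and `beta = 0.1` are illustrative assumptions. The loss is −log σ(β · margin), computed as a softplus to avoid overflow.

```python
import math

def softplus(z: float) -> float:
    """log(1 + e^z), computed without overflow for large |z|."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * margin) = softplus(-beta * margin).

    margin measures how much more the policy (relative to the reference)
    prefers the chosen response over the rejected one.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return softplus(-beta * margin)

# Policy identical to the reference: margin 0, loss = log 2.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy raised the chosen response's log-prob by 1 nat: loss falls below log 2.
print(dpo_loss(-9.0, -12.0, -10.0, -12.0))
```

No separate reward model or sampling loop appears. The reference log-probabilities play the role that the KL penalty plays in PPO-based RLHF.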

Embedding Lookup as Gather
Token IDs become rows of the embedding matrix.
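
A gather returns the same rows as multiplying one-hot vectors by the embedding matrix, without the wasted multiplies by zero. The 4×3 matrix `E` below is a made-up toy (vocab 4, d_model 3).

```python
from typing import List

E = [  # toy embedding matrix: vocab_size = 4, d_model = 3 (made-up values)
    [0.1, 0.2, 0.3],
    [1.0, 1.1, 1.2],
    [2.0, 2.1, 2.2],
    [3.0, 3.1, 3.2],
]

def gather(E: List[List[float]], token_ids: List[int]) -> List[List[float]]:
    """Embedding lookup: row t of E for each token id t."""
    return [E[t] for t in token_ids]

def one_hot_matmul(E: List[List[float]], token_ids: List[int]) -> List[List[float]]:
    """Same result as a dense one_hot(ids) @ E, for comparison."""
    vocab, d = len(E), len(E[0])
    out = []
    for t in token_ids:
        one_hot = [1.0 if v == t else 0.0 for v in range(vocab)]
        out.append([sum(one_hot[v] * E[v][j] for v in range(vocab)) for j in range(d)])
    return out

ids = [2, 0, 2]
print(gather(E, ids))  # [[2.0, 2.1, 2.2], [0.1, 0.2, 0.3], [2.0, 2.1, 2.2]]
```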

End-to-End CPU SLM Recipe
Train → quantize → serve, all on CPU.

FFN Expansion + Activation
d → d_ff → d. Two matmuls with an activation in between.

FlashAttention Tiling
Tile attention block-by-block; keep working set in SRAM.

Forward vs Backward FLOPs
Backward costs ~2× forward, so a full training step costs ~3× forward.
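
A common rule of thumb for a dense model is that the forward pass costs about 2 FLOPs per parameter per token (one multiply, one add). The backward pass needs two matmuls for each forward one: one for the activation gradients and one for the weight gradients. That doubles the cost, giving about 6N FLOPs per token in total. The sketch ignores attention-score FLOPs, which grow with context length. The 125M-parameter / 1B-token run is hypothetical.

```python
def training_flops(n_params: int, n_tokens: int) -> dict:
    """Rule-of-thumb FLOPs for a dense model (attention-score FLOPs ignored)."""
    forward = 2 * n_params * n_tokens   # one multiply + one add per weight per token
    backward = 2 * forward              # grads w.r.t. activations and w.r.t. weights
    return {"forward": forward, "backward": backward, "total": forward + backward}

# Hypothetical run: 125M-parameter model trained on 1B tokens.
print(training_flops(125_000_000, 1_000_000_000))
```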

Gradient Accumulation
K micro-batches build up to an effective large batch.

Gradient Clipping in Action
See gradient-norm spikes get scaled down to max_norm.
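
Below is a sketch of clipping by global norm, similar in spirit to PyTorch's `torch.nn.utils.clip_grad_norm_`. All gradients are flattened into one list here, which is a simplification. When the combined L2 norm exceeds `max_norm`, every gradient is multiplied by the same factor. The direction is preserved and only the magnitude is capped.

```python
import math
from typing import List, Tuple

def clip_grad_global_norm(grads: List[float], max_norm: float) -> Tuple[List[float], float]:
    """Rescale grads by one shared factor if their global L2 norm exceeds max_norm.

    Returns the (possibly rescaled) gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

print(clip_grad_global_norm([3.0, 4.0], max_norm=1.0))  # norm 5.0 spike, scaled to norm 1
print(clip_grad_global_norm([0.3, 0.4], max_norm=1.0))  # norm 0.5, left unchanged
```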

CPU Inference Latency Breakdown
Per-token time = bandwidth-bound weight reads + compute.

KV Cache Memory Growth
Watch KV cache memory grow with context length.
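
KV cache size grows linearly with context. The cache stores one K and one V vector per layer, per KV head, per position. The config below (24 layers, 8 KV heads, head_dim 64, FP16 cache) is a hypothetical SLM shape, not a specific model.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    # Factor 2: one K and one V vector per layer, per KV head, per position.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical SLM config: 24 layers, 8 KV heads, head_dim 64, FP16 (2 bytes).
for ctx in (1024, 4096, 16384):
    mib = kv_cache_bytes(24, 8, 64, ctx) / 2**20
    print(f"{ctx:>6} tokens: {mib:.0f} MiB")
```

Doubling the context doubles the cache. Fewer KV heads (grouped-query attention) or a lower-precision cache shrinks it proportionally.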

LayerNorm Statistics
Watch mean, variance, and normalized output for a tensor.

Logits to Token (Argmax vs Sample)
See how the final projection produces logits and how decoding picks a token.

LoRA: Low-Rank Decomposition
Replace ΔW (d×d) with A·B (d×r · r×d), where r ≪ d.
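
The savings come down to parameter counting: a dense update needs d² values, while the low-rank pair needs 2·d·r. The width `d = 2048` and rank `r = 8` below are hypothetical choices.

```python
from typing import Tuple

def lora_param_counts(d: int, r: int) -> Tuple[int, int]:
    full = d * d          # dense update ΔW: d x d
    low_rank = 2 * d * r  # A: d x r plus B: r x d
    return full, low_rank

full, low = lora_param_counts(2048, 8)  # hypothetical width and rank
print(full, low, low / full)
```

Only A and B are trained. At inference, A·B can be merged back into W, so serving adds no extra matmul.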

Loss Curves Diagnosis
Healthy, spiky, divergent — what each looks like.

LR Schedule: Warmup + Cosine
Visualize the canonical LLM training learning rate.
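
A minimal sketch of linear warmup followed by cosine decay. The peak rate, step counts, and the 10% floor (`min_lr`) are hypothetical settings. Starting warmup at `(step + 1)` so the first step is not zero is one common variant, not the only one.

```python
import math

def lr_at(step: int, max_lr: float, warmup_steps: int, total_steps: int,
          min_lr: float = 0.0) -> float:
    """Linear warmup to max_lr, then cosine decay down to min_lr at total_steps."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical settings: peak 3e-4, 100 warmup steps, 1,000 steps total, floor 3e-5.
for s in (0, 50, 99, 100, 550, 1000):
    print(s, f"{lr_at(s, 3e-4, 100, 1000, 3e-5):.2e}")
```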

Matrix Multiplication Step-Through
Watch Y = X·W computed entry-by-entry.

CPU Training Memory Calculator
Adjust model size; see RAM needed.

MoE Top-K Routing
Tokens route to top-K experts; load balance matters.

Multi-Token Prediction Heads
N heads predicting tokens at +1, +2, +3, +4.

Multi-Head Split + Concat
One big projection reshapes into h heads then back.

SGD vs Adam: Step Trajectories
Two optimizers descending the same loss surface.

SLM Parameter Breakdown
Where the parameters live: embedding, attention, FFN.

Perplexity Calculator
Perplexity = exp(loss); what the numbers mean.
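
Perplexity is the exponential of the mean per-token cross-entropy. This assumes the loss is in nats (natural log); a base-2 loss would need `2 ** loss` instead. The sketch computes it both from a loss value and from the probability assigned to each true token.

```python
import math
from typing import List

def perplexity_from_loss(mean_loss_nats: float) -> float:
    """exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_loss_nats)

def perplexity_from_probs(token_probs: List[float]) -> float:
    """Same quantity from the probability the model gave each true token."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

print(perplexity_from_loss(2.0))           # about 7.39
print(perplexity_from_probs([0.25] * 10))  # always 1-in-4 on the right token: about 4
```

A model that guesses uniformly over V tokens has loss ln V and perplexity V. This is why perplexity reads as an "effective number of choices" per token.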

Positional Encoding Curves
Sinusoidal encodings: each dimension pair oscillates at a different frequency.

Q4_K Block Layout
A 256-weight super-block: 8 sub-blocks of 32 weights at 4 bits each, plus scales and mins.
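
The byte budget explains why Q4_K costs about 4.5 bits per weight rather than 4. The field sizes below are our reading of the `block_q4_K` struct in llama.cpp's ggml: FP16 `d` and `dmin`, a 12-byte packed scales array, and 128 bytes of 4-bit quants. Treat them as an assumption to check against the ggml source.

```python
# Byte budget of one Q4_K super-block (assumed llama.cpp block_q4_K layout).
QK_K = 256                  # weights per super-block
d_bytes = 2                 # FP16 super-block scale "d"
dmin_bytes = 2              # FP16 super-block min "dmin"
scales_bytes = 12           # 8 sub-block scales + 8 mins, 6 bits each = 96 bits
qs_bytes = QK_K * 4 // 8    # 256 weights x 4 bits = 128 bytes

# Roughly, weight = d * scale_j * q - dmin * min_j for sub-block j.
block_bytes = d_bytes + dmin_bytes + scales_bytes + qs_bytes
bits_per_weight = block_bytes * 8 / QK_K
print(block_bytes, bits_per_weight)  # 144 4.5
```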

Repetition Penalty
Reduce logits of recent tokens to break loops.
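
Below is a sketch of one widely used multiplicative variant: positive logits are divided by the penalty and negative ones multiplied by it, so both move down when `penalty > 1`. Which tokens count as "recent" (the window length) is left to the caller. The logits below are made up.

```python
from typing import Iterable, List

def apply_repetition_penalty(logits: List[float], recent_tokens: Iterable[int],
                             penalty: float = 1.1) -> List[float]:
    out = list(logits)
    for t in set(recent_tokens):   # each recent token is penalized once
        # Dividing a positive logit or multiplying a negative one both lower it.
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

print(apply_repetition_penalty([2.0, -1.0, 0.5], recent_tokens=[0, 1, 0], penalty=2.0))
# [1.0, -2.0, 0.5]
```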

Residual Gradient Flow
With and without residuals: how gradient survives depth.

RMSNorm vs LayerNorm: Side by Side
See the difference: RMSNorm skips mean centering.
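
Both norms are shown side by side below. The learnable gain (and LayerNorm's bias) are omitted for clarity, and `eps = 1e-5` is an assumed value. LayerNorm subtracts the mean and divides by the standard deviation. RMSNorm skips the mean and divides by the root mean square.

```python
import math
from typing import List

def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)  # no mean subtraction
    return [v / rms for v in x]

x = [1.0, 2.0, 3.0, 6.0]
print(layer_norm(x))  # zero mean, unit variance
print(rms_norm(x))    # unit RMS, mean not removed
```

For an input that already has zero mean, the two produce the same output. RMSNorm saves the mean computation and one subtraction per element.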

RoPE Extension Strategies
Linear, NTK-aware, YaRN compared.

Sampling Strategies Compared
See how greedy/top-k/top-p differ on the same distribution.
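
Here is a sketch of the three rules applied to one made-up distribution `probs`. Greedy takes the argmax. Top-k keeps the k most likely tokens. Top-p keeps the smallest high-probability set whose mass reaches p. Both filters renormalize, and a token would then be sampled from the result.

```python
from typing import Dict, List

def top_k_filter(probs: List[float], k: int) -> Dict[int, float]:
    """Keep the k most likely token ids and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs: List[float], p: float) -> Dict[int, float]:
    """Keep the smallest most-likely set with cumulative mass >= p, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}

probs = [0.5, 0.2, 0.15, 0.1, 0.05]   # toy next-token distribution
greedy = max(range(len(probs)), key=lambda i: probs[i])
print(greedy, top_k_filter(probs, 2), top_p_filter(probs, 0.8))
```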

SIMD Register: AVX-512 + AMX
See how one instruction operates on multiple values.

SLM Architecture Comparison
Phi-3, Qwen 2.5, Gemma 2: hyperparameters side by side.

Softmax Numerical Stability + Temperature
See the max-subtraction trick and temperature scaling in action.
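
Below is a minimal stable softmax with temperature. Logits are divided by the temperature, then the maximum is subtracted before `exp`. The subtraction leaves the result unchanged mathematically but keeps every exponent ≤ 0, so nothing overflows.

```python
import math
from typing import List

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)                            # subtract the max: exp() never overflows
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([1000.0, 999.0, 998.0]))            # fine; naive math.exp(1000.0) raises OverflowError
print(softmax([2.0, 1.0, 0.0], temperature=0.5))  # T < 1: sharper
print(softmax([2.0, 1.0, 0.0], temperature=2.0))  # T > 1: flatter
```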

Speculative Decoding Acceptance
Watch draft tokens get accepted or rejected.

Tied Embeddings Savings
Untied vs tied input/output embeddings: parameter counts for SLMs.
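
Tying reuses the V×d input embedding matrix as the output projection, which saves exactly one V×d matrix. The vocabulary and width below are hypothetical values. In a small model, that one matrix can be a noticeable share of all parameters.

```python
def embedding_params(vocab_size: int, d_model: int, tied: bool) -> int:
    table = vocab_size * d_model          # input embedding matrix (V x d)
    return table if tied else 2 * table   # untied adds a separate output head (d x V)

V, d = 32_000, 2_048  # hypothetical SLM vocabulary and width
saved = embedding_params(V, d, tied=False) - embedding_params(V, d, tied=True)
print(f"tying saves {saved:,} parameters")  # tying saves 65,536,000 parameters
```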

Tokenizer Compression Comparison
Same text, different tokenizers.

Weight Initialization Distributions
Xavier, Kaiming, normal — visualized.

Weight Layout on Disk (GGUF / SafeTensors)
See how tensors are arranged in a binary model file.