Thumbnails often do more for video engagement than most algorithmic improvements. Generating them at scale (millions per day) requires careful frame extraction, smart cropping, and a deliberate CDN strategy. Most teams underinvest here until a quality-regression incident forces the conversation.
Frame extraction strategy
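Don't extract the first frame, which is often black or part of an intro. Instead, sample candidate frames at 10%, 25%, 50%, and 75% of the duration, score each one with ML models (face detection, aesthetic score), and keep the top-scoring frame. With ffmpeg for decoding and the models on GPU, this takes roughly 50 ms per video.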
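Below is a minimal sketch of the sampling-and-scoring step. It assumes the per-frame face and aesthetic scores come from models you already run, so they appear here as plain inputs. All function names and the weights in `combined_score` are hypothetical. The ffmpeg options are standard: `-ss` placed before `-i` seeks on the input for speed, and `-frames:v 1` writes a single frame.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Sampling positions from the text: 10%, 25%, 50%, 75% of duration (never the first frame).
SAMPLE_FRACTIONS = (0.10, 0.25, 0.50, 0.75)


def sample_timestamps(duration_s: float,
                      fractions: Sequence[float] = SAMPLE_FRACTIONS) -> list[float]:
    """Candidate timestamps in seconds, rounded to milliseconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return [round(duration_s * f, 3) for f in fractions]


def ffmpeg_extract_cmd(src: str, ts: float, out_path: str) -> list[str]:
    """argv that grabs one frame at `ts`; -ss before -i seeks on the input for speed."""
    return ["ffmpeg", "-y", "-ss", f"{ts:.3f}", "-i", src,
            "-frames:v", "1", "-q:v", "2", out_path]


@dataclass
class Candidate:
    ts: float
    face_score: float       # hypothetical: output of a face-detection model, 0..1
    aesthetic_score: float  # hypothetical: output of an aesthetic model, 0..1


def combined_score(c: Candidate, w_face: float = 0.6, w_aes: float = 0.4) -> float:
    # Hypothetical linear weighting; tune the weights against engagement data.
    return w_face * c.face_score + w_aes * c.aesthetic_score


def pick_best(cands: Sequence[Candidate],
              score: Callable[[Candidate], float] = combined_score) -> Candidate:
    """Return the highest-scoring candidate frame."""
    return max(cands, key=score)
```

In a pipeline you would run each command with `subprocess.run(cmd, check=True)`, score the resulting images, and pass the candidates to `pick_best`. For capacity planning: if the ~50 ms figure holds per GPU with videos processed one after another, a single GPU handles about 86,400 s / 0.05 s ≈ 1.7 million videos per day.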
Smart-crop for variable aspect ratios
The source is 16:9, but the feed needs 1:1, mobile needs 9:16, and hero banners need 21:9. Use ML saliency detection to position each crop around faces or the main subject. Options include open_clip combined with a saliency model, AWS Rekognition, and GCP Vision.
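Once a face detector or saliency model has located the subject, the crop itself is plain geometry. Take the largest window with the target aspect ratio, center it on the focus point, and clamp it to the frame. This sketch assumes the detector returns bounding boxes and treats the largest box as the subject. `Box`, `focus_from_boxes`, and `crop_for_aspect` are hypothetical names and are not part of any of the tools listed above.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    x: int  # left edge, pixels
    y: int  # top edge, pixels
    w: int
    h: int


def focus_from_boxes(boxes: Sequence[Box], src_w: int, src_h: int) -> tuple[float, float]:
    """Center of the largest detected box; frame center if nothing was detected."""
    if not boxes:
        return src_w / 2, src_h / 2
    largest = max(boxes, key=lambda box: box.w * box.h)
    return largest.x + largest.w / 2, largest.y + largest.h / 2


def crop_for_aspect(src_w: int, src_h: int, aspect_w: int, aspect_h: int,
                    focus_x: float, focus_y: float) -> Box:
    """Largest aspect_w:aspect_h window centered on the focus point, kept inside the frame."""
    # Integer cross-multiplication avoids float error when comparing ratios.
    if src_w * aspect_h > src_h * aspect_w:
        # Source is wider than the target: keep full height, trim width.
        h = src_h
        w = src_h * aspect_w // aspect_h
    else:
        # Source is as tall as or taller than the target: keep full width, trim height.
        w = src_w
        h = src_w * aspect_h // aspect_w
    # Center on the focus point, then clamp so the window stays within the frame.
    x = min(max(round(focus_x - w / 2), 0), src_w - w)
    y = min(max(round(focus_y - h / 2), 0), src_h - h)
    return Box(x, y, w, h)
```

For a 1920×1080 source with a face centered at (1500, 540), a 1:1 crop is 1080×1080. Centering it would place its left edge at x = 960, which overshoots the right edge, so the clamp pulls it back to x = 840. Clamping keeps the subject in frame without ever producing an out-of-bounds crop.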
CDN delivery and cache
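Generate WebP and AVIF for modern clients, with a JPEG fallback. Serve from a CDN with a long cache lifetime and put a version in the URL, so a regenerated thumbnail gets a new cache key instead of requiring a purge. Generate thumbnails lazily on first request, and warm popular videos asynchronously. A cache hit rate of around 95% is sustainable.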
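Here is a sketch of the delivery-side decisions: picking a format from the client's `Accept` header (AVIF, then WebP, then JPEG), building a versioned URL, and setting a long-lived `Cache-Control` value. The URL layout, the one-year `max-age`, and the function names are illustrative assumptions. `immutable` is a standard `Cache-Control` directive that tells browsers not to revalidate.

```python
# Formats in preference order; JPEG is the universal fallback.
PREFERENCE = ("avif", "webp")

# One year; safe only because every regenerated thumbnail gets a new versioned URL.
CACHE_CONTROL = "public, max-age=31536000, immutable"


def accepted_types(accept_header: str) -> set[str]:
    """MIME types listed in an Accept header with q > 0 (wildcards kept literally)."""
    types = set()
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        mime, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if mime and q > 0:
            types.add(mime)
    return types


def choose_format(accept_header: str) -> str:
    """AVIF > WebP > JPEG. A bare */* falls back to JPEG rather than guessing."""
    types = accepted_types(accept_header)
    for fmt in PREFERENCE:
        if f"image/{fmt}" in types:
            return fmt
    return "jpeg"


def thumb_url(cdn_base: str, video_id: str, version: int, variant: str, fmt: str) -> str:
    """Hypothetical URL layout: bumping `version` yields a new cache key, so no purge."""
    ext = "jpg" if fmt == "jpeg" else fmt
    return f"{cdn_base.rstrip('/')}/thumbs/{video_id}/v{version}/{variant}.{ext}"
```

Encoding the format in the URL gives each format its own cache key. If you instead negotiate formats on a single URL, the response also needs `Vary: Accept` so the CDN caches each format separately. At the ~95% hit rate mentioned above, roughly 1 in 20 requests reaches the origin, and that fraction is what lazy generation has to absorb.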