A working engineer's library: 469 in-depth articles paired with 135 single-file interactive labs on transformers, distributed systems, Apache Cassandra, agent protocols, networking, security, and real-time media. All static HTML, all yours to read offline.
A depth-first walk through how autoregressive transformers actually work — from linear algebra to CPU inference kernels.
Every article is self-contained, uses concrete examples and numbers, and avoids marketing fluff. Click any category to browse its index. 469 articles across 25 categories.
Linear algebra, attention math, training, CPU inference, weight storage, SLM lifecycle.
LLM architectures, inference servers, sampling, prompt caching, observability.
Attention variants, RoPE, MoE, RMSNorm, MTP, sparse attention, FlashAttention.
RAG, hallucination mitigation, embeddings, evals, prompt engineering, agentic safety.
Tool use, agentic workflows, memory systems, observability, human handoff.
Phi, Qwen, Gemma; on-device inference; distillation; tool-call fine-tunes.
INT8/INT4, GGUF, AWQ, GPTQ, SmoothQuant, FP8 KV cache.
Model Context Protocol: server design, OAuth 2.1, transports, testing patterns.
Agent Development Kit: lifecycle, tools, evals, guardrails, sessions.
Agent-to-agent: discovery, trust models, orchestration, error recovery.
Agent Payment Protocol: subscriptions, settlement, chargebacks, compliance.
Data modeling, consistency, compaction, repair, vector search (Cassandra 5.0), multi-DC.
Raft, consensus, CRDTs, vector clocks, BFT, quorum systems, 2PC.
Rate limiters, circuit breakers, sharding, real-system blueprints.
Postgres internals, isolation levels, replication, pgvector, DuckDB, CDC.
Kafka, Flink, Pulsar, exactly-once, CDC, tiered storage, backpressure.
Virtual threads, async/await, work-stealing, lock contention, memory models.
OpenTelemetry, Prometheus at scale, SLOs, eBPF, alert fatigue.
mTLS, OAuth+PKCE, passkeys, SBOM, zero trust, K8s pod security.
TLS 1.3, DNS, BBR, anycast, mTLS rotation, service mesh patterns.
gRPC, WebSocket, HTTP/2, HTTP/3, SSE, reconnect strategies, HOL blocking.
Opus, jitter buffers, VAD, AEC, LUFS targets, WebRTC pipeline.
HLS, DASH, AV1, ABR strategies, bitrate ladders, DRM, VMAF.
How to design APIs, run postmortems, estimate, pick brokers, review PRs.
Miscellaneous engineering topics that don't fit cleanly elsewhere.
Each lab is one HTML file with embedded JavaScript. Open it in a browser and play. Every lab has interactive controls, a clear ★ Key Takeaway, and a ▶ What to Try guide.
From matrix multiplication step-through to KV cache memory growth and CPU inference latency — each lab visualizes one concept from the article series.
Three good starting points if you're new to the site. Each one opens onto dozens more.
The four primitives every transformer is built on: vectors, matrices, dot products, element-wise ops. Read this first to understand everything else in the transformer math series.
Cassandra: query-first design and the partition-key contract, the most consequential decision in any Cassandra schema.
Distributed systems: leader election, log replication, and why Raft is easier to understand than Paxos, without the formal correctness proofs.
Belgavi's AI Lab is a personal initiative by Sandeep Belgavi Ashok Kumar, a Senior Engineering Manager and Architect working at the intersection of scalable distributed systems and modern AI.
The site captures deep technical knowledge in two complementary forms: focused written explanations and interactive single-file labs that make the underlying mechanics visible. Topics span LLM internals, agent protocols (MCP, A2A, ADK, AP2), Apache Cassandra, distributed consensus, system design, networking, security, observability, and the real-time media stack (WebRTC, HLS, DASH).
Every article aims for concrete numbers and working examples over marketing claims. Every lab runs in a browser — no install, no build step, no JavaScript framework dependency.