TCP throughput is bounded by (receive window) / RTT. Without window scaling, the receive window tops out at 64 KB, which limits a connection with 100 ms RTT to about 5 Mbps regardless of available bandwidth. Window scaling (RFC 7323) lifts this limit, but without proper buffer tuning you still can't fill a fat pipe.
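The window / RTT bound can be checked with a few lines of shell arithmetic. This is a minimal sketch; max_throughput_bps is an illustrative helper name, not a standard tool, and it assumes a shell with 64-bit integer arithmetic.

```shell
#!/bin/sh
# Upper bound on TCP throughput imposed by the receive window.
# max_throughput_bps is a hypothetical helper for illustration.
# Args: $1 = window size in bytes, $2 = RTT in milliseconds.
# Prints the bound in bits per second.
max_throughput_bps() {
    # bytes -> bits (x8), per-millisecond -> per-second (x1000)
    echo $(( $1 * 8 * 1000 / $2 ))
}

# Unscaled 64 KB window over a 100 ms RTT path
max_throughput_bps 65536 100
```

For the 64 KB / 100 ms case this prints 5242880, i.e. roughly 5.2 Mbps.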
Bandwidth-delay product (BDP)
BDP = bandwidth × RTT. A 1 Gbps link with 100 ms RTT has BDP = 1 Gbps × 0.1 s = 100 Mbit = 12.5 MB. The TCP window must be >= BDP to fully utilize the link. An unscaled 64 KB window is about 0.5% of that.
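The BDP arithmetic is easy to script for other link speeds and RTTs. The sketch below assumes bandwidth is given in bits per second and RTT in milliseconds; bdp_bytes is a made-up helper name.

```shell
#!/bin/sh
# Bandwidth-delay product in bytes.
# bdp_bytes is a hypothetical helper for illustration.
# Args: $1 = bandwidth in bits per second, $2 = RTT in milliseconds.
bdp_bytes() {
    # bits/s * ms = millibits; divide by 1000 for bits and by 8 for bytes
    echo $(( $1 * $2 / 8000 ))
}

# 1 Gbps link with 100 ms RTT
bdp_bytes 1000000000 100
```

This prints 12500000, matching the 12.5 MB figure above.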
Window scaling option
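The TCP header allocates 16 bits for the window size, so the largest window it can advertise is 65,535 bytes (64 KB). The window scaling option shifts the advertised value left by N bits, multiplying it by 2^N, with N at most 14, for a maximum window of about 1 GB. The option is only sent in SYN segments, and scaling takes effect only if both sides include it. Modern OSes negotiate it automatically during the handshake.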
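To see how the shift count relates to a target window, the following sketch finds the smallest N for which a full 16-bit window, scaled by 2^N, covers a given size. min_wscale is a hypothetical helper; it only illustrates the arithmetic and says nothing about how a particular kernel picks its own scale factor.

```shell
#!/bin/sh
# Smallest window scale shift N (0..14) such that 65535 * 2^N >= target.
# min_wscale is a hypothetical helper for illustration.
# Args: $1 = target window in bytes.
min_wscale() {
    n=0
    # Stop at 14, the largest shift the option allows
    while [ "$n" -lt 14 ] && [ $(( 65535 << n )) -lt "$1" ]; do
        n=$(( n + 1 ))
    done
    echo "$n"
}

# Shift needed to cover a 12.5 MB window
min_wscale 12500000
```

For 12.5 MB this prints 8: a shift of 7 allows about 8.4 MB, which is too small, while a shift of 8 allows about 16.8 MB.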
Linux tuning
# Max socket buffer sizes (128 MB), the cap for SO_RCVBUF/SO_SNDBUF
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
# TCP autotuning min/default/max
sysctl -w net.ipv4.tcp_rmem='4096 87380 134217728'
sysctl -w net.ipv4.tcp_wmem='4096 65536 134217728'
# Enable BBR congestion control (better than Cubic on lossy links)
sysctl -w net.ipv4.tcp_congestion_control=bbr
BBR vs Cubic
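Cubic is loss-based: it treats packet loss as a sign of congestion, so it backs off unnecessarily on links that are lossy but not congested (WiFi, cellular). BBR is bandwidth-based: it estimates the bottleneck bandwidth and minimum RTT and paces sending to match, rather than reacting to loss. It performs significantly better on long fat networks and lossy links. Google uses it for YouTube and its other services.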
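Before switching algorithms, it may be worth confirming that the kernel actually offers BBR. The sketch below assumes a Linux host that exposes the list of available algorithms under /proc/sys; cc_available is a hypothetical helper, and the block does nothing if that file is not readable.

```shell
#!/bin/sh
# Succeed if algorithm $1 appears in the space-separated list $2.
# cc_available is a hypothetical helper for illustration.
cc_available() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Linux lists the currently available algorithms in this file.
f=/proc/sys/net/ipv4/tcp_available_congestion_control
if [ -r "$f" ]; then
    if cc_available bbr "$(cat "$f")"; then
        echo "bbr is available"
    else
        echo "bbr not listed; loading the tcp_bbr module may be needed"
    fi
fi
```

If BBR is not listed, `modprobe tcp_bbr` typically makes it available on kernels that ship it as a module.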
Measuring
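ss -i shows per-socket cwnd, ssthresh, and RTT. tcptrace on a pcap visualizes window progression. Throughput should be roughly cwnd × MSS / RTT, with cwnd counted in segments as ss reports it. If measured throughput falls well short of that, the receive window (autotuning or kernel buffer limits) is the likely bottleneck.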
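The cwnd × MSS / RTT estimate can be computed from values read off ss -i. This is a rough sketch with made-up sample numbers; both helper names are hypothetical, and RTT is taken in microseconds so that fractional-millisecond readings can be entered as integers.

```shell
#!/bin/sh
# Expected throughput in bits per second for a given congestion window.
# Hypothetical helper. Args: $1 = cwnd in segments, $2 = MSS in bytes,
# $3 = RTT in microseconds.
cwnd_throughput_bps() {
    echo $(( $1 * $2 * 8 * 1000000 / $3 ))
}

# Segments of cwnd needed to cover a BDP, rounded up.
# Hypothetical helper. Args: $1 = BDP in bytes, $2 = MSS in bytes.
cwnd_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Sample values (assumed, not measured): cwnd 10, MSS 1448, RTT 100 ms
cwnd_throughput_bps 10 1448 100000
# Segments needed to fill a 12.5 MB BDP with a 1448-byte MSS
cwnd_needed 12500000 1448
```

With these sample inputs the first call prints 1158400 (about 1.2 Mbps) and the second prints 8633, which shows how far an initial window of 10 segments is from filling a 1 Gbps, 100 ms path.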