Technology thesis · Semiconductors & Chips
medium conviction · growth
Inference-Optimized Semiconductors
Inference is now the majority of AI compute spend, and custom silicon (hyperscaler TPUs and Trainium, plus Cerebras) threatens Nvidia's inference pricing power rather than its training moat.
Position maintained continuously · last reviewed Jun 24, 2026
The thesis
Core thesis
As AI shifts from training to deployment, inference becomes the dominant workload and the largest share of compute cost over a model's lifetime. Hyperscaler custom silicon (Google TPU Ironwood, AWS Trainium/Inferentia, Microsoft Maia, Meta MTIA), together with wafer-scale and dataflow insurgents such as Cerebras and SambaNova, puts the real pressure on Nvidia's inference pricing power rather than on its training moat. The contest is increasingly about cost per token at production scale, where domain-specific designs can undercut general-purpose GPUs.
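The cost-per-token comparison behind this thesis can be sketched as simple arithmetic. The snippet below is a minimal illustration, not the model used in this file: the function name, the utilization parameter and every input figure are hypothetical placeholders, not vendor or market data.

```python
def cost_per_million_tokens(hourly_cost_usd: float,
                            tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """Return USD per one million generated tokens for a single accelerator.

    hourly_cost_usd   -- all-in hourly cost of the accelerator (assumed input)
    tokens_per_second -- sustained decode throughput at full load (assumed input)
    utilization       -- fraction of each hour spent serving traffic, in (0, 1]
    """
    if hourly_cost_usd < 0:
        raise ValueError("hourly_cost_usd must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")

    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs chosen only to show the mechanics.
gpu_cost = cost_per_million_tokens(hourly_cost_usd=4.00,
                                   tokens_per_second=2000,
                                   utilization=0.5)
asic_cost = cost_per_million_tokens(hourly_cost_usd=3.00,
                                    tokens_per_second=2500,
                                    utilization=0.5)

print(f"general-purpose GPU: ${gpu_cost:.2f} per 1M tokens")
print(f"domain-specific chip: ${asic_cost:.2f} per 1M tokens")
```

Under these made-up inputs the domain-specific part comes out cheaper per token because it is both cheaper per hour and faster. With real figures the ranking depends heavily on achieved utilization, which is why it is kept as an explicit parameter.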
State of the art (2026)
By mid-2026 inference, not training, is the centre of gravity in AI silicon, and the competitive picture has split. Nvidia's Blackwell (B200, roughly 2.5x H100 inference throughput) still anchors 60–75% of inference accelerators, but custom silicon is eating into the margin: Google's seventh-generation Ironwood TPU underpins Anthropic's commitment of around one million chips (~1GW in 2026), with a further 3.5GW via Broadcom from 2027. AWS Trainium, Microsoft Maia and Meta MTIA are all scaling internally. Among merchant insurgents, Cerebras floated in May 2026 (CBRS, ~$5.55B raised) on the back of its OpenAI cloud deal, while Groq, SambaNova, d-Matrix and Etched chase specialised decode economics. Etched's Sohu transformer ASIC has still not shipped in volume.
Signal stack
Evidence stacked leading → lagging
Technology-native KPIs
Metrics that predict trajectory, tracked over time
Landscape map
Who builds what — and who depends on whom
Catalyst calendar
Dated events that will move the position
Technology roadmap
Milestones on the path to maturity
Watchlists
Companies, people and papers — each with a remove-by condition
Decision frameworks
The same call, framed for your desk
Thesis changelog
When our view changed, and why
Change our mind
2 disconfirming conditions