Where MPU sits in the accelerator spectrum.
Two axes determine MPU's positioning: the memory layer (HBM / SRAM / DDR) and where scheduling happens (runtime vs. compile-time). These define who MPU competes with — and who it doesn't.

| Architecture | Memory | Scheduling | Strengths | Weaknesses |
|---|---|---|---|---|
| GPU (NVIDIA) | HBM (3–8+ TB/s) | Runtime dynamic | General purpose, mature CUDA, high batch throughput | Cost/power, collective overhead at scale |
| LPU (Groq) | Large on-chip SRAM | Compile-time fixed | Ultra-low latency, deterministic single-stream | Limited SRAM capacity, expensive for large models |
| HC1 (Taalas) | Mask-ROM hard-wired | Fixed-function | Very high throughput for a single model | Single model only, no generality |
| MPU (Symatics) | DDR7* / structural ingress | Compile-time structural | Sparse/MoE, long-context, multi-agent; best when weight reuse is high, not on first load | Pre-silicon, near-zero ecosystem |

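The MPU row's caveat (best when weight reuse is high, not on first load) is an amortization argument: a one-time ingress of the weights is spread over every token that reuses them. A minimal sketch of that arithmetic follows. The 16 GB weight footprint, the 100 GB/s ingress rate, and the function name `amortized_load_ms` are hypothetical placeholders, not MPU or DDR7 specifications.

```python
# Sketch of first-load amortization. All numbers are hypothetical placeholders,
# not MPU or DDR7 specifications.


def amortized_load_ms(weight_gb: float, ingress_gb_per_s: float, tokens_served: int) -> float:
    """One-time weight-load cost (ms) divided across the tokens that reuse those weights."""
    if tokens_served <= 0:
        raise ValueError("tokens_served must be positive")
    load_ms = weight_gb / ingress_gb_per_s * 1000.0  # one full ingress of the weights
    return load_ms / tokens_served


WEIGHT_GB = 16.0          # hypothetical model footprint
INGRESS_GB_PER_S = 100.0  # hypothetical ingress bandwidth

for tokens in (1, 100, 10_000):
    cost = amortized_load_ms(WEIGHT_GB, INGRESS_GB_PER_S, tokens)
    print(f"{tokens:>6} tokens -> {cost:.3f} ms of load cost per token")
```

With these placeholder figures the load costs 160 ms when it serves a single token but only 0.016 ms per token across 10,000 tokens. This is why a first-load-heavy workload is the weak case.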
MPU's nearest "living relative" is Groq LPU, not Google TPU
Groq has already shipped in production what MPU pursues: deterministic execution, compile-time scheduling, SRAM-resident state, and ultra-low-latency single-stream inference. The real difference is the memory layer. MPU uses DDR instead of SRAM, whereas Groq needs hundreds of chips to fit a large model in SRAM.
*TPU: the compiler schedules data into a fixed, HBM-fed array. MPU: the array's wiring is itself the schedule, and the array is DDR-fed.
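The "hundreds of chips" figure follows from simple capacity arithmetic. The sketch below makes several assumptions. It takes roughly 230 MB of on-chip SRAM per Groq chip. It counts weights only, leaving out KV cache, activations, and pipeline replication. It uses a hypothetical 70B-parameter model. The helper name `chips_to_hold_weights` is illustrative.

```python
import math

# Assumption: ~230 MB of on-chip SRAM per Groq chip. Weights only; KV cache,
# activations, and replication would push the count higher.
SRAM_BYTES_PER_CHIP = 230e6


def chips_to_hold_weights(params: float, bytes_per_param: float, sram_bytes_per_chip: float) -> int:
    """Minimum number of chips whose combined SRAM can hold the weights."""
    return math.ceil(params * bytes_per_param / sram_bytes_per_chip)


PARAMS = 70e9  # hypothetical 70B-parameter model

for label, bytes_per_param in (("INT8", 1), ("FP16", 2)):
    print(label, chips_to_hold_weights(PARAMS, bytes_per_param, SRAM_BYTES_PER_CHIP))
```

Under these assumptions the weights alone need 305 chips at INT8 and 609 at FP16. A DDR-fed design sidesteps this capacity wall but trades it for ingress bandwidth.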
Full specification matrix
| DIMENSION | GPU | LPU | TPU | HC1 | MPU |
|---|---|---|---|---|---|
| Core idea | Massive parallel general-purpose | Deterministic single-stream | Systolic matrix engine | Hard-wired single model | Structure-as-Computation |
| Compute unit | 1000s CUDA/Tensor cores | Large systolic array | Systolic MXU array | Fixed-function MAC | Homogeneous PE array |
| Scheduling | Runtime dynamic | Compile-time fixed | Compile-plan / run-execute | Fixed at manufacture | Compile-time structural |
| Memory | HBM (3–8+ TB/s) | Large on-chip SRAM | HBM | Mask-ROM | DDR7 + structural ingress |
| Interconnect | NVLink + collectives | Chip-to-chip mesh | ICI / optical | Single chip | Extend + domain |
| Strength | General, CUDA, batch | Ultra-low latency | Throughput/$ at Google scale | Very high throughput (one model) | Sparse/MoE, long-context |
| Programmability | Highest (full CUDA) | Medium | Medium (XLA/JAX) | None | Lowest (compile to structure) |
| Maturity | Production | Production | Production (Google) | Silicon (8B model) | Pre-silicon / Gen1 |

Not a general-purpose switch
The MPU Switch is the physical carrier of SIF inside one domain: a single hub that performs per-row broadcast. Its low aggregate bandwidth (512 GB/s, versus multiple TB/s for NVSwitch) is a design conclusion, not under-provisioning. Weights never cross SIF, and the state that does cross is bounded.
| Switch | Feature set |
|---|---|
| General-purpose switch | Address/ID routing, credits, QoS, in-network compute |
| NVSwitch | NVLink routing, SHARP reduction, credit flow |
| MPU Switch | No routing, no flow control, no in-network compute |

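If weights never cross SIF, the bandwidth SIF needs is just the bounded per-token state multiplied by the token rate. The rough budget check below shows why 512 GB/s can be enough under that premise. Only the 512 GB/s figure comes from the text. The hidden size, crossings per token, token rate, and 8 GB weight footprint are hypothetical, as are the function names.

```python
# Budget check: does a bounded per-token state stream fit within SIF?
# Only the 512 GB/s figure comes from the text; every other number is hypothetical.
SIF_BUDGET_BYTES_PER_S = 512e9


def required_bytes_per_s(bytes_per_token: float, tokens_per_s: float) -> float:
    """Aggregate bandwidth needed if every token pushes this many bytes across SIF."""
    return bytes_per_token * tokens_per_s


def fits_sif(bytes_per_token: float, tokens_per_s: float,
             budget: float = SIF_BUDGET_BYTES_PER_S) -> bool:
    return required_bytes_per_s(bytes_per_token, tokens_per_s) <= budget


# Hypothetical bounded state: one FP16 hidden vector per crossing, 80 crossings per token.
HIDDEN_DIM, BYTES_PER_ELEM, CROSSINGS = 8192, 2, 80
state_bytes = HIDDEN_DIM * BYTES_PER_ELEM * CROSSINGS  # 1,310,720 bytes per token
weight_bytes = 8e9  # hypothetical 8 GB of weights, if they had to be streamed per token
TOKENS_PER_S = 10_000

for label, per_token in (("state only", state_bytes), ("weights", weight_bytes)):
    need = required_bytes_per_s(per_token, TOKENS_PER_S)
    print(f"{label:>10}: {need / 1e9:,.1f} GB/s, fits={fits_sif(per_token, TOKENS_PER_S)}")
```

With these placeholder numbers the state stream needs about 13 GB/s, a few percent of the budget. Streaming the weights per token would need tens of TB/s. The argument holds only as long as the crossing state really is bounded.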
The impossible triangle
Structure-as-computation, generality, no HBM: a design can have only two of the three. Product value lies in which two we choose and how we handle the middle ground.
MPU picks structure-as-computation and no HBM. It accepts the generality constraint (structure is fixed at compile time) and wins on MoE/sparse workloads, where its per-token byte advantage is largest.