Accelerator Landscape

Where MPU sits in the accelerator spectrum.

Two axes determine MPU's positioning: the memory layer (HBM / SRAM / DDR) and where scheduling happens (runtime vs. compile-time). These define who MPU competes with — and who it doesn't.

[Figure: accelerator spectrum diagram positioning GPU, LPU, HC1, and MPU]

GPU (NVIDIA)
Status: Production, dominant
Memory: HBM (3–8+ TB/s)
Scheduling: Runtime dynamic
Strengths: General purpose, mature CUDA, high batch throughput
Weaknesses: Cost/power, collective overhead at scale

LPU (Groq)
Status: Production (inference cloud)
Memory: Large on-chip SRAM
Scheduling: Compile-time fixed
Strengths: Ultra-low latency, deterministic single-stream
Weaknesses: SRAM capacity limited, expensive for large models

HC1 (Taalas)
Status: Silicon (8B hard-wired)
Memory: Mask-ROM hard-wired
Scheduling: Fixed-function
Strengths: Very high throughput for single model
Weaknesses: Single model only, no generality

MPU (Symatics)
Status: Gen1 / pre-silicon
Memory: DDR7* / structural ingress
Scheduling: Compile-time structural
Strengths: Sparse/MoE, long-context, multi-agent; best when weight reuse is high, not on first load
Weaknesses: Pre-silicon, near-zero ecosystem

MPU's nearest "living relative" is Groq LPU, not Google TPU

Groq has already shipped in production what MPU pursues: deterministic execution, compile-time scheduling, SRAM-resident state, ultra-low-latency single-stream. The real difference is the memory layer — MPU uses DDR instead of SRAM (Groq needs hundreds of chips to fit large models in SRAM).

*TPU = compiler schedules data into a fixed, HBM-fed array. MPU = the array's wiring is the schedule, DDR-fed.
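The "hundreds of chips" remark above can be sanity-checked with back-of-envelope arithmetic. The sketch below is illustrative only: the 230 MB per-chip SRAM figure, the 70B parameter count, and the 1 byte per parameter precision are assumptions brought in for the example, not numbers stated on this page, and the function name `chips_needed` is hypothetical. It counts weights only and ignores activations, KV cache, and any replication.

```python
import math


def chips_needed(params_billion: float, bytes_per_param: float,
                 sram_mb_per_chip: float) -> int:
    """Minimum number of chips whose combined SRAM holds the weights alone."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    sram_bytes = sram_mb_per_chip * 1e6
    return math.ceil(weight_bytes / sram_bytes)


# Assumed inputs (not from this page): 70B parameters, 1 byte each,
# 230 MB of on-chip SRAM per chip.
print(chips_needed(70, 1.0, 230))  # 305
```

Under these assumptions the weights alone need about 305 chips, which is consistent with the order of magnitude the text describes. A DDR-fed design avoids this particular capacity limit because its weight store is not bounded by on-chip SRAM.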

Architecture-Class Comparison

Full specification matrix

DIMENSION	GPU	LPU	TPU	HC1	MPU
Core idea	Massive parallel general-purpose	Deterministic single-stream	Systolic matrix engine	Hard-wired single model	Structure-as-Computation
Compute unit	1000s CUDA/Tensor cores	Large systolic array	Systolic MXU array	Fixed-function MAC	Homogeneous PE array
Scheduling	Runtime dynamic	Compile-time fixed	Compile-plan / run-execute	Fixed at manufacture	Compile-time structural
Memory	HBM (3–8 TB/s)	Large on-chip SRAM	HBM	Mask-ROM	DDR7 + structural ingress
Interconnect	NVLink + collectives	Chip-to-chip mesh	ICI / optical	Single chip	Extend + domain
Strength	General, CUDA, batch	Ultra-low latency	Google throughput/$	Very high (one model)	Sparse/MoE, long-context
Programmability	Highest (full CUDA)	Medium	Medium (XLA/JAX)	None	Lowest (compile to structure)
Maturity	Production	Production	Production (Google)	Silicon (8B)	Pre-silicon / Gen1
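To make the two positioning axes concrete, the Memory and Scheduling rows of the matrix can be encoded as data and queried. This is a minimal sketch: the `Accelerator` class and both helper functions are hypothetical names introduced for illustration, and matching on the label prefix "Compile-time" is a simplification of this example, not a classification the page defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    memory: str       # "Memory" row of the matrix
    scheduling: str   # "Scheduling" row of the matrix


# Values copied from the Memory and Scheduling rows above.
MATRIX = [
    Accelerator("GPU", "HBM (3–8 TB/s)", "Runtime dynamic"),
    Accelerator("LPU", "Large on-chip SRAM", "Compile-time fixed"),
    Accelerator("TPU", "HBM", "Compile-plan / run-execute"),
    Accelerator("HC1", "Mask-ROM", "Fixed at manufacture"),
    Accelerator("MPU", "DDR7 + structural ingress", "Compile-time structural"),
]


def compile_time_peers(target: str) -> list[str]:
    """Other accelerators whose scheduling label starts with 'Compile-time'."""
    return [a.name for a in MATRIX
            if a.name != target and a.scheduling.startswith("Compile-time")]


def hbm_fed() -> list[str]:
    """Accelerators whose memory label starts with 'HBM'."""
    return [a.name for a in MATRIX if a.memory.startswith("HBM")]


print(compile_time_peers("MPU"))  # ['LPU']
print(hbm_fed())                  # ['GPU', 'TPU']
```

On the scheduling axis the only entry sharing the MPU's "Compile-time" label is the LPU, while the TPU falls in the HBM-fed group alongside the GPU.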

Not a general-purpose switch

The MPU Switch is the physical carrier of SIF inside one domain: a single hub performing per-row broadcast. Its low aggregate bandwidth (512 GB/s, about 4.1 Tbps, vs. 12.8 Tbps for NVSwitch 3.0) is a design conclusion, not under-provisioning: weights don't cross SIF, and the state that does is bounded.

SWITCH	AGGREGATE BANDWIDTH	FABRIC FEATURES
CXL Switch	Multi-Tbps	Address/ID routing, credits, QoS, in-network compute
NVSwitch 3.0	12.8 Tbps	NVLink routing, SHARP reduction, credit flow
MPU Switch	512 GB/s	No routing, no flow control, no in-network compute
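The figures above mix units: the MPU Switch is quoted in gigabytes per second, the other switches in terabits per second. A small conversion puts them on one scale. This sketch assumes decimal prefixes (1 TB = 1000 GB) and 8 bits per byte; the function name is hypothetical.

```python
def gbytes_per_s_to_tbps(gb_per_s: float) -> float:
    """Convert gigabytes per second to terabits per second (decimal prefixes)."""
    return gb_per_s * 8 / 1000


mpu_switch_tbps = gbytes_per_s_to_tbps(512)   # MPU Switch figure from the text
nvswitch_tbps = 12.8                          # NVSwitch 3.0 figure from the text

print(f"MPU Switch: {mpu_switch_tbps:.3f} Tbps")
print(f"NVSwitch 3.0 / MPU Switch: {nvswitch_tbps / mpu_switch_tbps:.3f}x")
```

On a common scale the MPU Switch carries about 4.1 Tbps, roughly a third of the NVSwitch 3.0 figure quoted here.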

The impossible triangle

Structure-as-computation × Generality × No-HBM = pick two. Product value lies in how we choose the middle ground.

KEEP: Structure-as-Computation
CONSTRAIN: Generality
KEEP: No HBM

MPU picks structural computing + no-HBM, accepts generality constraints (compile-time fixed structure), and wins on MoE/sparse workloads where the per-token byte advantage is largest.
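One plausible reading of the "per-token byte advantage" is the volume of weight bytes touched per generated token: a sparse MoE model reads only the shared layers plus the experts it routes to, while a dense model of the same total size reads everything. The sketch below illustrates that reading only. Every number in it (shared parameter count, expert size, 16 experts, top-2 routing, 1 byte per parameter) is hypothetical, and the function is not a model of how the MPU is actually evaluated.

```python
def weight_bytes_per_token(shared_params: float, params_per_expert: float,
                           experts_used: int, bytes_per_param: float) -> float:
    """Weight bytes read for one token: shared layers plus the experts used."""
    return (shared_params + experts_used * params_per_expert) * bytes_per_param


# Hypothetical model shape, chosen only for illustration.
SHARED = 2e9        # parameters every token touches (attention, embeddings)
PER_EXPERT = 1e9    # parameters in each expert
NUM_EXPERTS = 16
TOP_K = 2           # experts routed per token
BYTES = 1.0         # bytes per parameter

moe_bytes = weight_bytes_per_token(SHARED, PER_EXPERT, TOP_K, BYTES)
dense_bytes = weight_bytes_per_token(SHARED, PER_EXPERT, NUM_EXPERTS, BYTES)

print(f"MoE:   {moe_bytes / 1e9:.1f} GB per token")
print(f"Dense: {dense_bytes / 1e9:.1f} GB per token")
print(f"Reduction: {dense_bytes / moe_bytes:.1f}x")
```

With these made-up values the sparse model touches 4 GB of weights per token against 18 GB for the dense equivalent, a 4.5x reduction. The gap widens as the number of experts grows relative to the routed top-k.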

