Partner Portal

NDA-protected technical specifications, detailed architecture parameters, and validation documentation for ecosystem partners.


Technical Specifications

Physical Design Parameters

Parameter	Value	Notes
Process Node	TSMC N7	~76 MTr/mm², logic + SRAM mix
Die Dimensions	~20.5 × 21.0 mm	~83 mm perimeter, within reticle limit
Clock Frequency	1 GHz	0.5 mm wire length per cycle
CU Array	256 cols × 128 rows	Central ~85% of die area
CU Microarchitecture	Cache(12KB I-SRAM)+R+PE+W	Full 16-bit FP16 data path; ~2–3K instructions/CU
On-Chip SRAM	384 MB total	12 KB I-SRAM × 32,768 CUs; ~0.89 MB/mm²
Operator Latency	MAC/×/+/= 1 cycle	Divide = 4 cycles; exp = 50 cycles
Pins	~4,000	~1,000 signal + ~3,000 P/G (3:1 ratio)
External Memory	4× GDDR7	~192 GB/s/chip raw bandwidth
TDP	~250–350 W	~60–80 W/cm², standard cold plate
SerDes Lanes	256 per chip	16 Gbps/lane NRZ (= 16-bit FP16 @ 1 GHz)
V-EXT Port	16 lanes	32 GB/s for inter-domain rail links

Four-Phase Memory Model

LOAD

Weights and state are loaded from GDDR7 through the CSL into the column structure.

EXECUTE

Computation on the PE array. Results travel by wire, not through memory.

READOUT

Structured readout of computed results.

RELOAD

Phase-swap via dual-bank atomic switch for multi-layer models.
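The four phases can be pictured as a loop over layers with a double-buffered bank. The sketch below is a hypothetical software model, not the hardware design: `DualBank`, `run_layers`, and the assumption that the next layer's state is written into the shadow bank while the current layer computes are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class DualBank:
    """Hypothetical two-bank store: one bank is active, the other is the shadow."""
    banks: list = field(default_factory=lambda: [None, None])
    active: int = 0

    def fill_shadow(self, state):
        # Write the next layer into the bank that is NOT being computed on
        self.banks[1 - self.active] = state

    def swap(self):
        # Atomic switch: the shadow bank becomes the active bank in one step
        self.active = 1 - self.active

    @property
    def current(self):
        return self.banks[self.active]


def run_layers(layers):
    """Return the (phase, layer) trace for LOAD -> (EXECUTE -> READOUT -> RELOAD)*."""
    bank = DualBank()
    bank.fill_shadow(layers[0])
    bank.swap()
    trace = [("LOAD", bank.current)]
    for i in range(len(layers)):
        trace.append(("EXECUTE", bank.current))
        has_next = i + 1 < len(layers)
        if has_next:
            # Assumed to overlap EXECUTE/READOUT of the current layer
            bank.fill_shadow(layers[i + 1])
        trace.append(("READOUT", bank.current))
        if has_next:
            bank.swap()
            trace.append(("RELOAD", bank.current))
    return trace


print(run_layers(["L0", "L1"]))
```

The point of the dual bank in this model is that RELOAD costs only a swap; the data movement for the next layer has already happened in the shadow bank.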

Domain Composition

Component	Qty	Specification
MPU Chip (+ on-die SCC)	64	N7 ~430 mm²; ~250–350 W; 32,768 CUs each
CSL (Column State Loader)	64	One per chip; Gen1 Memory Ingress Block; ingress efficiency η ~45%
GDDR7 Devices	256 (4/chip)	~192 GB/s/chip; ~12.3 TB/s aggregate raw
SIF Switch Hub	1	64-port; 256 lanes; ~390 mm²/N7; 1-hop < 200 ns
Host Node	1	Dispatch / serving / image load; no data-plane scheduling
BMC / Management	1 set	I2C bus + telemetry aggregation + secure boot chain
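The domain-level figures follow from the per-chip parameters. A minimal arithmetic check in plain Python, using only numbers from the tables above (the constant and function names are illustrative):

```python
CHIPS_PER_DOMAIN = 64
CUS_PER_CHIP = 256 * 128          # 256 columns x 128 rows
I_SRAM_KB_PER_CU = 12
GDDR7_DEVICES_PER_CHIP = 4
GDDR7_GB_S_PER_CHIP = 192         # raw bandwidth per chip
TDP_W_RANGE = (250, 350)


def domain_totals(chips=CHIPS_PER_DOMAIN):
    """Roll per-chip figures up to one domain."""
    return {
        "cus": chips * CUS_PER_CHIP,
        "sram_mib_per_chip": CUS_PER_CHIP * I_SRAM_KB_PER_CU / 1024,
        "gddr7_devices": chips * GDDR7_DEVICES_PER_CHIP,
        "gddr7_tb_s": chips * GDDR7_GB_S_PER_CHIP / 1000,
        "mpu_power_kw": tuple(chips * w / 1000 for w in TDP_W_RANGE),
    }


for key, value in domain_totals().items():
    print(f"{key}: {value}")
```

The 64 MPUs alone account for 16 to 22.4 kW, which sits below the ~25 kW rack figure given for SOW-3 bring-up.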

Single-Hub Dividend

The entire domain's switch silicon converges to one 64-port hub die (256-lane class, ~390 mm² in N7): one tape-out and no Spine/Root split. Within the domain, a single hop of < 200 ns replaces the prior 6-level tree.

SIF Hub: Latency Budget (Mode 2 Intra-Domain)

Stage	Latency	Notes
Source MPU SerDes TX	~30–50 ns	PCS/alignment; PHY-dependent [TBD]
Hub: 64:1 select + 1:64 broadcast	~40–60 ns	3-stage on-die pipeline, retiming REG
Dest MPU SerDes RX + enqueue	~30–50 ns	To row receive bus
Total	< 200 ns	Fixed pipe depth, zero slot jitter
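Because pipe depth is fixed, the one-hop figure is simply the sum of the stage bounds. A short check using the table's own ranges; the cycle count assumes the 1 GHz core clock as the unit of account, which is an assumption since the SerDes run in their own clock domains.

```python
# (stage, min_ns, max_ns); SerDes entries are PHY-dependent and still [TBD]
BUDGET = [
    ("Source MPU SerDes TX", 30, 50),
    ("Hub: 64:1 select + 1:64 broadcast", 40, 60),
    ("Dest MPU SerDes RX + enqueue", 30, 50),
]
TARGET_NS = 200
CLOCK_GHZ = 1.0


def hop_latency(budget):
    """Return (best, worst) end-to-end latency as the sum of stage bounds."""
    best = sum(lo for _, lo, _ in budget)
    worst = sum(hi for _, _, hi in budget)
    return best, worst


best, worst = hop_latency(BUDGET)
margin = TARGET_NS - worst
print(f"{best}-{worst} ns, {worst * CLOCK_GHZ:.0f} cycles worst case, {margin} ns margin")
# prints: 100-160 ns, 160 cycles worst case, 40 ns margin
```

With no queuing or arbitration, the worst case is the sum of worst-case stages, so the 40 ns margin is the room left for the [TBD] PHY figures to grow.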

Hub Architecture

Architecture	Flat 64-port hub, stateless datapath
Per-MPU Port	2 lanes/direction baseline (4 GB/s per direction)
Total Lanes	256 lanes @ 16 Gbps NRZ
Aggregate I/O	512 GB/s (4.096 Tbps)
Broadcast Groups	128 per-row groups, one source per slot
Inter-Domain	64 independent peer rails, switchless
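The hub bandwidth figures are consistent with each other. A quick derivation from the lane rate and port width listed above (names are illustrative):

```python
LANE_GBPS = 16                 # 16 Gbps NRZ = one 16-bit FP16 value per cycle at 1 GHz
PORTS = 64
LANES_PER_PORT_PER_DIR = 2

lanes_total = PORTS * LANES_PER_PORT_PER_DIR * 2           # both directions
per_port_gb_s = LANES_PER_PORT_PER_DIR * LANE_GBPS / 8     # per direction
aggregate_tbps = lanes_total * LANE_GBPS / 1000
aggregate_gb_s = lanes_total * LANE_GBPS / 8

print(lanes_total, per_port_gb_s, aggregate_tbps, aggregate_gb_s)
# prints: 256 4.0 4.096 512.0
```

Since one lane carries one FP16 value per 1 GHz cycle, a 2-lane port moves two values per cycle in each direction.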

Communication Patterns

BCAST_ROW [mode 2]	Hub: 64:1 select + 1:64 broadcast; 1 hop, < 200 ns
FWD_COL [mode 3]	Direct peer rail, 16:1 serialized; point-to-point, no hub
UCAST	Broadcast slot with a single intended destination; 1 hop, < 200 ns
DISPATCH/COMBINE (MoE)	Structured, deterministic scatter/gather
ALLREDUCE	Removed: reduction happens inside the CU array; not supported by the hub by design

SIF Control & Schedule Model

The hub has no run-time routing. "Routing" is a compiled schedule — the compiler fixes (source, receiver, row, phase, target window) ahead of time. The device only selects the scheduled source and broadcasts.

Slot-Table Entry

All "routing" lives here as static configuration. The row/group is implicit in the lane.

Field	Width	Meaning
src_port	6 bit	Which of the 64 ports sources this group in this slot
mode	2 bit	BROADCAST / IDLE / LOOPBACK
phase_id	2–3 bit	Which bank / phase this entry belongs to
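One slot-table entry fits in about 11 bits. The sketch below packs and unpacks an entry; the bit layout, the numeric mode codes, and the choice of 3 bits for `phase_id` are hypothetical, since the spec fixes only the field widths. There is no row/group field because the group is implied by which lane the entry configures.

```python
from dataclasses import dataclass

# Hypothetical 2-bit mode encoding; the spec names the modes but not their codes
MODE_CODES = {"BROADCAST": 0, "IDLE": 1, "LOOPBACK": 2}
MODE_NAMES = {code: name for name, code in MODE_CODES.items()}
PHASE_BITS = 3  # the spec allows 2-3 bits; 3 is assumed here


@dataclass(frozen=True)
class SlotEntry:
    src_port: int  # 6 bits: which of the 64 ports sources this group in this slot
    mode: str      # 2 bits
    phase_id: int  # PHASE_BITS bits: bank/phase this entry belongs to

    def pack(self) -> int:
        if not 0 <= self.src_port < 64:
            raise ValueError("src_port must fit in 6 bits")
        if not 0 <= self.phase_id < (1 << PHASE_BITS):
            raise ValueError("phase_id out of range")
        # Layout (LSB first): src_port[5:0], mode[7:6], phase_id[10:8]
        return (self.phase_id << 8) | (MODE_CODES[self.mode] << 6) | self.src_port

    @classmethod
    def unpack(cls, word: int) -> "SlotEntry":
        return cls(
            src_port=word & 0x3F,
            mode=MODE_NAMES[(word >> 6) & 0x3],
            phase_id=(word >> 8) & ((1 << PHASE_BITS) - 1),
        )


entry = SlotEntry(src_port=5, mode="BROADCAST", phase_id=1)
print(entry.pack(), SlotEntry.unpack(entry.pack()) == entry)
# prints: 261 True
```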

Determinism Rules

+Throughput: ≤ 1 FP16 value per stream per cycle; latency = pure pipe depth
+No backpressure: rate mismatch is a config error, excluded at compile time
+Single-source: per group, per slot, exactly one source; device checks, does not arbitrate
+Source order preserved: A/B dual-register relay, depth 2
+Destination-free: initiating port carries no routing type, no destination — data + valid only
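The single-source and destination-free rules make the hub's job a table lookup. A minimal sketch, assuming a hypothetical compiler pass that emits `(slot, group, src_port)` claims: `build_slot_table` rejects conflicts at compile time rather than arbitrating them, and `hub_step` models one slot of select-and-broadcast. Both function names are illustrative, not part of the spec.

```python
NUM_PORTS = 64
NUM_GROUPS = 128  # one broadcast group per row


def build_slot_table(claims):
    """Compile-time check of (slot, group, src_port) claims into a slot table.

    Enforces the single-source rule: at most one source per group per slot.
    A conflict is reported as a configuration error, never arbitrated.
    """
    table = {}
    for slot, group, src in claims:
        if not (0 <= group < NUM_GROUPS and 0 <= src < NUM_PORTS):
            raise ValueError(f"out-of-range claim {(slot, group, src)}")
        if (slot, group) in table:
            raise ValueError(f"slot {slot}, group {group}: second source {src}")
        table[(slot, group)] = src
    return table


def hub_step(table, slot, tx):
    """One hub slot: per group, select the scheduled source and broadcast its value.

    tx maps port -> {group: value}. Ports send data only; which port is
    selected comes entirely from the table. Every port receives the result.
    """
    return {group: tx[src][group] for (s, group), src in table.items() if s == slot}


table = build_slot_table([(0, 3, 5), (0, 4, 7), (1, 3, 7)])
print(hub_step(table, 0, {5: {3: 1.5}, 7: {4: 2.0}}))
# prints: {3: 1.5, 4: 2.0}
```

Because every conflict is rejected before run time, the device-side check only has to confirm that the loaded table matches what the compiler produced.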

Scale-Out & SuperPod Composition

Structure Extension (Mode 3)

Lane Rate	5 MHz (16-bit copper bus)
Per-Line BW	80 Mbps
Total Rails	64 independent
Per-Edge BW	640 MB/s
Cost vs. SerDes	~3,200× cheaper
Compiler View	One larger (N×64×256) × 128 array
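The Mode 3 figures chain together directly: bus width times clock gives the per-line rate, and 64 rails give the per-edge bandwidth. A small sketch, with illustrative names, that also shows how the compiler's logical array grows with the number of chained domains N:

```python
BUS_BITS = 16
BUS_MHZ = 5
RAILS = 64  # one rail per MPU position i


def per_line_mbps(bits=BUS_BITS, mhz=BUS_MHZ):
    return bits * mhz


def per_edge_mb_s(rails=RAILS):
    return rails * per_line_mbps() / 8


def logical_array(n_domains, chips=64, cols=256, rows=128):
    """Compiler's view of N chained domains: columns concatenate, rows stay at 128."""
    return (n_domains * chips * cols, rows)


print(per_line_mbps(), per_edge_mb_s(), logical_array(8))
# prints: 80 640.0 (131072, 128)
```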

Domain Replication

Interconnect	Standard DCN/Ethernet
Throughput	N× (linear)
Cost Scaling	N× (proportional)
System Complexity	Zero increase per domain

SuperPod Composition (Proposal)

An 8-domain row-of-racks chain: 8 × domain racks + 1 NET/MGMT rack + 1 CDU/power rack, ~10 racks in a row.

Full Chain	One (8×64×256) × 128 super logical array
Full Replication	8 independent domains, each dispatched separately
Hybrid	2× 4-domain or 4× 2-domain chains

MPUs	512 (8 domains × 64)
Compute	~33.6 PFLOPS FP16
Power	~210 kW
Bisection BW	640 MB/s (one edge; chain only)
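The ~33.6 PFLOPS figure is reproduced if each CU is assumed to retire one FP16 MAC per 1 GHz cycle, counted as 2 FLOPs; that per-CU rate is an assumption made here, not a stated spec. In a linear chain, cutting the pod in half severs exactly one domain-to-domain edge, which is why the bisection bandwidth equals one 640 MB/s edge.

```python
DOMAINS = 8
MPUS_PER_DOMAIN = 64
CUS_PER_MPU = 256 * 128
CLOCK_HZ = 1e9
FLOPS_PER_CU_CYCLE = 2  # assumption: one FP16 MAC per CU per cycle = 2 FLOPs


def mpu_tflops():
    return CUS_PER_MPU * FLOPS_PER_CU_CYCLE * CLOCK_HZ / 1e12


def pod_pflops(domains=DOMAINS):
    mpus = domains * MPUS_PER_DOMAIN
    return mpus * CUS_PER_MPU * FLOPS_PER_CU_CYCLE * CLOCK_HZ / 1e15


print(f"{mpu_tflops():.1f} TFLOPS/MPU, {pod_pflops():.1f} PFLOPS/pod")
# prints: 65.5 TFLOPS/MPU, 33.6 PFLOPS/pod
```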

Per-Domain KV Budget (Agent Era)

Model @ 2 TB Domain	KV / Token	Weights Resident	128K Context	1M Context
Llama 405B (dense GQA-8 FP16)	504 KB	810 GB	~63 GB	~2 seq/domain
DeepSeek-V3 (MoE FP8)	74 KB	671 GB	~9.2 GB	~22 seq/domain
DeepSeek-V3 (MoE+MLA FP8)	69 KB	671 GB	~8.6 GB	~24 seq/domain
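The per-token KV figures can be derived from published model configurations: Llama 3.1 405B has 126 layers, 8 KV heads and head dimension 128; DeepSeek-V3 has 61 layers with a 512-dim MLA latent plus a 64-dim RoPE key. The sketch below assumes the cache is stored at 2 bytes per element (which is what reproduces the ~69 KB MLA row; the "FP8" label presumably refers to weights) and treats the 2 TB domain as 2 TiB. The non-MLA DeepSeek-V3 row is not modeled here.

```python
GIB = 1024**3


def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem=2):
    """Standard GQA cache: K and V, each kv_heads x head_dim, per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def mla_bytes_per_token(layers, latent_dim, rope_dim, bytes_per_elem=2):
    """MLA cache: one compressed latent plus a shared RoPE key, per layer."""
    return layers * (latent_dim + rope_dim) * bytes_per_elem


def seqs_per_domain(domain_bytes, weight_bytes, kv_per_token, context):
    """Whole sequences that fit in the capacity left after resident weights."""
    return (domain_bytes - weight_bytes) // (kv_per_token * context)


llama = kv_bytes_per_token(layers=126, kv_heads=8, head_dim=128)
dsv3_mla = mla_bytes_per_token(layers=61, latent_dim=512, rope_dim=64)

print(llama / 1024, llama * 128 * 1024 / GIB)        # 504.0 KiB, 63.0 GiB at 128K
print(dsv3_mla / 1024, dsv3_mla * 128 * 1024 / GIB)  # 68.625 KiB, 8.578125 GiB at 128K
print(seqs_per_domain(2 * 1024**4, 810 * 10**9, llama, 1024**2))  # 2
```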

Resolution: Make KV Budget Explicit

MPU does not bypass the capacity problem; it relocates it. Write "per-domain KV budget" as an explicit spec parallel to "per-token weight bytes." Each additional 8 GB per chip adds 512 GB (64 × 8 GB) to the domain KV budget.
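Sizing memory by weights and KV together reduces to two small formulas. A sketch in decimal GB with hypothetical function names; the 16-sequence example simply reuses the ~63 GB per 128K-context Llama 405B sequence from the table above.

```python
import math

CHIPS = 64


def domain_kv_budget_gb(per_chip_gb, weights_gb, chips=CHIPS):
    """KV budget = domain DRAM capacity minus resident weights."""
    return chips * per_chip_gb - weights_gb


def per_chip_gb_required(weights_gb, kv_gb_per_seq, target_seqs, chips=CHIPS):
    """Size memory by weights + KV together, rounded up to whole GB per chip."""
    return math.ceil((weights_gb + kv_gb_per_seq * target_seqs) / chips)


# +8 GB on every chip adds 64 x 8 = 512 GB of KV budget, whatever the model
print(domain_kv_budget_gb(40, 671) - domain_kv_budget_gb(32, 671))  # 512
# 16 concurrent ~63 GB sequences on top of 810 GB of weights
print(per_chip_gb_required(810, 63, 16))  # 29
```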

Defensible Claim

"While KV fits, MPU re-reads from high-bandwidth local DRAM, not spilling to slow storage" — capacity-gated. The KV-friendly regime (MoE+MLA) is exactly MPU's per-watt sweet spot.

Gen1 vs Gen2: Full Specification

Capability	Gen1	Gen2
Ingress efficiency (η)	~45%	~90%
CSL function	Deterministic Memory Ingress Block	Prefetch, reorder, compression, bandwidth shaping, multi-domain scheduling
SIF form	FPGA prototype (reduced-scale hub)	Dedicated fan-out ASIC (full 64-port hub, 256-lane)
SIF hub	Reduced-scale FPGA hub (≥10 Gbps)	64-port ASIC hub, 16 Gbps P0 / 32 Gbps P2
Target workload	64-chip domain validation	Production MoE, long-context, agentic
Scale-out	Physical channels reserved	Enabled: structure extension + multi-domain scheduling
RAS	CRC + phase restart path working	Full RAS after the link-retry vs. phase-restart decision (D-5)
Deliverable	FPGA platform + reduced RTL co-bring-up (SOW-1)	Tape-out with three-IC implementation (SOW-3)
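The ingress-efficiency row is easiest to read as delivered bandwidth. The sketch below assumes η is the fraction of the ~192 GB/s raw GDDR7 bandwidth per chip that actually reaches the columns; that interpretation, and the full-pass helper, are assumptions for illustration.

```python
RAW_GB_S_PER_CHIP = 192               # raw GDDR7 bandwidth per chip
ETA = {"Gen1": 0.45, "Gen2": 0.90}    # ingress efficiency
CHIPS = 64


def effective_ingress_gb_s(gen, raw=RAW_GB_S_PER_CHIP):
    # Assumption: eta scales raw bandwidth to bandwidth delivered into the columns
    return raw * ETA[gen]


def full_pass_seconds(total_gb, gen, chips=CHIPS):
    """Time to stream total_gb, spread evenly over all chips, once at the effective rate."""
    return total_gb / chips / effective_ingress_gb_s(gen)


for gen in ETA:
    print(gen, round(effective_ingress_gb_s(gen), 1), round(full_pass_seconds(671, gen), 3))
```

Doubling η halves the time for any fixed byte volume, which is the practical meaning of the Gen2 CSL upgrade under this reading.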

Open Decisions (All Sources)

Items from Switch PRD, System PRD, and Gen1/Gen2 PRD that require architectural sign-off.

Sources: Switch PRD D-1..D-10 | System PRD D-S1..D-S9 | Gen1/Gen2 PRD Section 12

Switch PRD (D-1 to D-10)

#	Decision	Recommendation	When
D-1	Per-MPU hub port width	2 lanes/dir, single die; widen only on validated batch evidence	Before SOW-2
D-2	Board-level long-reach SerDes (16/32 Gbps)	Complete IP evaluation during SOW-1	SOW-1
D-3	Per-lane rate path (16 -> 32 Gbps)	Lock 16 Gbps Gen1; 32 Gbps as Gen2 headroom	Gen1 freeze
D-4	Source self-listen (loopback at hub)	Lean yes — simplifies compiler consistency checks	FPGA phase
D-5	Link retry vs. phase restart	Baseline phase restart; decide retry from measured FPGA BER	After FPGA measure
D-6	Flit length and field widths	Set with D-2 PHY flit granularity	After D-2
D-7	On-device management MCU	Lean no — pure register plane + BMC to shrink attack surface	Before SOW-2
D-8	Inter-domain rail topology	Line/ring baseline over copper; optical unnecessary (80 Mbps/line is trivial over copper)	Gen2 planning
D-9	Broadcast-only vs. subset-multicast (dst_mask)	Broadcast-only if every crossing is one row -> all chips	Gen1 freeze
D-10	Data-dependent routing (in-band route-id) for MoE dispatch	Out of Gen1; if ever needed, precompiled route-id indexing a route table	Gen2 planning

System PRD (D-S1 to D-S9)

#	Decision	Recommendation	When
D-S1	Root/Spine tiering	Converged to single hub — CLOSED	—
D-S2	Inter-domain rail medium	80 Mbps/line copper; confirm pinout/cable budget	SOW-1
D-S3	Host node spec	Commodity server + standard NIC; not custom	Before SOW-2
D-S4	Spare & repair strategy	Recompile re-route as baseline; hot spares per customer SLA	Gen1 system def
D-S5	Rack / power / cold-plate vendor	Taiwan-local chassis + cold-plate partner	SOW-2
D-S6	SIF / CIF naming	Standardize on SIF across doc set	Immediate
D-S7	Memory capacity solution	Capacity-bandwidth tension; decide with memory decision	Same window
D-S8	Pod chain-length ceiling	8-domain proposal baseline; frozen with D-S2 + SOW-1	After SOW-1
D-S9	Per-domain KV budget spec	Write as explicit spec; size memory by weights + KV together	Same window

SIF-Link: Flit Format & Protocol

Data-Plane Flit

Fixed-length flit; payload = FP16 value sequence of a row stream.

Field	Width	Purpose
Row tag	7 bit	One of 128 rows; consistency check only
Phase/slot	8–12 bit	Compiler slot ID; determinism check
Seq#	8 bit	Intra-phase sequence; recovery anchor
Payload	N × FP16	Row-stream data; N set by the flit-length decision (D-6)
CRC	16 bit	Link-layer error detection
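A fixed-length flit with these fields can be modeled in a few lines. This is a sketch under stated assumptions: Phase/slot is taken at its 12-bit maximum, the header is padded into a 32-bit word, N = 4 is a placeholder pending D-6, and CRC-16/CCITT-FALSE (via Python's `binascii.crc_hqx` with initial value 0xFFFF) stands in for the unspecified polynomial.

```python
import binascii
import struct

ROW_BITS, SLOT_BITS, SEQ_BITS = 7, 12, 8  # 27 header bits; 12-bit slot is assumed
N_PAYLOAD = 4                             # placeholder until the flit-length decision (D-6)


def crc16(data: bytes) -> int:
    # CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF); the real polynomial is not specified
    return binascii.crc_hqx(data, 0xFFFF)


def pack_flit(row, slot, seq, values):
    if len(values) != N_PAYLOAD:
        raise ValueError("payload length must equal N_PAYLOAD")
    if not (0 <= row < 1 << ROW_BITS and 0 <= slot < 1 << SLOT_BITS and 0 <= seq < 1 << SEQ_BITS):
        raise ValueError("header field out of range")
    header = (row << (SLOT_BITS + SEQ_BITS)) | (slot << SEQ_BITS) | seq
    body = struct.pack(">I", header) + struct.pack(f">{N_PAYLOAD}e", *values)  # 'e' = FP16
    return body + struct.pack(">H", crc16(body))


def unpack_flit(flit):
    body, (crc,) = flit[:-2], struct.unpack(">H", flit[-2:])
    if crc16(body) != crc:
        raise ValueError("CRC mismatch")
    (header,) = struct.unpack(">I", body[:4])
    row = header >> (SLOT_BITS + SEQ_BITS)
    slot = (header >> SEQ_BITS) & ((1 << SLOT_BITS) - 1)
    seq = header & ((1 << SEQ_BITS) - 1)
    values = list(struct.unpack(f">{N_PAYLOAD}e", body[4:]))
    return row, slot, seq, values


flit = pack_flit(row=127, slot=4095, seq=255, values=[1.0, -2.5, 0.5, 65504.0])
print(len(flit), unpack_flit(flit))
```

The row tag and slot ID are only checked against the schedule, never used to steer the flit, which matches the destination-free rule of the hub.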

Physical Layering

3D in-stack	MPU ↔ CSL; Witmem/eTopus 3D, 5 Gbps
In-package D2D	MPU ↔ egress chiplet; Xinyuan UCIe, 16 lanes × 16 Gbps
Board: MPU ↔ hub	64 MPUs to the domain switch; 16–32 Gbps NRZ long-reach (D-2)
Inter-domain rail	MPU i ↔ MPU i; 16-bit bus @ 5 MHz (80 Mbps), copper only

Validation Phases: Detailed Deliverables

SOW-1

90-day kickoff

SIF FPGA

Map reduced-scale single hub on FPGA. Measure source-to-hub-to-receiver broadcast latency against <200 ns target. Verify source order preservation across late-start/early-arrive cases.

✓FPGA hub platform (8–16 MPU scale)
✓Latency measurement vs. <200 ns target
✓Source order preservation proven
✓Inter-domain direct link proven
✓SerDes IP evaluation + link sim

SOW-2

Post-SIF closure

CSL Ingress

DDR/analog-DDR -> CSL prefetch/reorder/buffer -> MPU Column State Ingress, column-by-column. Verify ingress cadence matches compute cadence.

✓Single cluster (4-chip) ingress closed
✓Compute-ingress cadence match proven
✓Cold plate / rack physical design
✓Power delivery characterization

SOW-3

Post-tapeout

System Integration

Integrate on-chip compute, SIF traversal, and CSL ingress into a closed loop. 64-chip domain bring-up: ~25 kW rack, liquid cooling, complete management and telemetry stack.

✓64-chip domain bring-up
✓~25 kW rack operational
✓ASIC tape-out basis
✓RAS validation (CRC + restart)

Confidential & NDA-Protected

This information is provided under NDA to ecosystem partners only. Do not distribute. For questions, contact info@symaticslab.com.
