Intel® Core™ Ultra 200H and 200U Series Processors

Datasheet, Volume 1 of 2

ID Date Version Classification
842704 05/27/2025 Public

NCE Subsystem

The Neural Compute Subsystem is a hardware accelerator for Deep Neural Network (DNN) workloads. It features a highly configurable pipeline that supports DNN operations such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Local Response Normalization (LRN). It also leverages activation and weight sparsity for optimal performance.

The Neural Compute Subsystem is built from up to two NCE Tiles (fixed), where each Tile is the primary unit of compute. Each Tile supports 2K Multiply-Accumulate circuits (MACs) and two Activation SHAVE engines (ACTShave). Tiles can be deployed to operate independently across multiple networks (threads) or aggregated into a multi-cluster engine processing a single network (thread). Refer to the diagram below showing the 4K4M configuration.
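The two deployment modes above (independent Tiles per network versus Tiles aggregated for one network) can be sketched in Python. This is a hypothetical illustration of the mapping only, not a driver API; the `Tile` class, `deploy` function, and network names are assumptions, while the 2K-MAC-per-Tile figure comes from the text.

```python
from dataclasses import dataclass

@dataclass
class Tile:
    # One NCE Tile: 2K MACs plus two ACTShave engines (per the datasheet text).
    macs: int = 2048
    act_shaves: int = 2

def deploy(tiles, networks):
    """Map networks (threads) onto Tiles.

    One network: aggregate every Tile into a single multi-cluster engine.
    Several networks: each runs independently on its own Tile.
    """
    if len(networks) == 1:
        return {networks[0]: list(tiles)}
    return {net: [tile] for net, tile in zip(networks, tiles)}

tiles = [Tile(), Tile()]

# Aggregated mode: both Tiles serve one network, giving 4K MACs total,
# matching the 4K4M configuration the text refers to.
single = deploy(tiles, ["netA"])

# Independent mode: two networks, one Tile each.
multi = deploy(tiles, ["netA", "netB"])
```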

The NCE Subsystem supports two DMA engines. Each engine supports in-line weight decompression and can broadcast write data into the local Connection MatriX (CMX) memory, a dedicated SRAM.
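The two DMA features named above, in-line decompression and write-data broadcast, can be modeled in a short sketch. This is purely conceptual: the hardware's compression format is not specified in the text, so `zlib` stands in for it, and the `dma_transfer` function and CMX bank layout are assumptions.

```python
import zlib

def dma_transfer(compressed_weights: bytes, cmx_banks: list) -> None:
    # In-line decompression: weights expand as part of the transfer
    # itself, not in a separate software pass (zlib is a stand-in
    # for the unspecified hardware compression scheme).
    weights = zlib.decompress(compressed_weights)
    # Broadcast: one source read fills each Tile's local CMX copy.
    for bank in cmx_banks:
        bank[:] = weights

# Two CMX banks, one per Tile, each receiving the same decompressed weights.
banks = [bytearray(4), bytearray(4)]
dma_transfer(zlib.compress(b"\x01\x02\x03\x04"), banks)
```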

For hardware-assisted task synchronization, the NCE Subsystem provides barriers and workload FIFOs. Barriers remove as much software overhead as possible (ISR loops and programming sequences), keeping the compute and data-movement pipelines full.
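The barrier idea above can be sketched with a software analogy: a consumer task blocks on a barrier until its producer dependency signals completion, with no interrupt-service round-trip in between. This is a minimal illustration using Python's `threading.Barrier`; the task names and the `cmx` buffer are assumptions, not the hardware's actual programming model.

```python
import threading

# Two parties: the DMA (producer) and the compute task (consumer).
barrier = threading.Barrier(2)
cmx = []  # stands in for data landing in CMX

def dma_task():
    cmx.append("weights")  # data movement completes first
    barrier.wait()         # then the barrier is signaled

def compute_task(out):
    barrier.wait()         # compute blocks here until the DMA arrives
    out.append(f"ran with {cmx[0]}")

result = []
t_compute = threading.Thread(target=compute_task, args=(result,))
t_dma = threading.Thread(target=dma_task)
t_compute.start()
t_dma.start()
t_dma.join()
t_compute.join()
```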