Intel® Core™ Ultra 200V Series Processors

Datasheet, Volume 1 of 2

ID: 829568
Date: 05/27/2025
Version: 004
Classification: Confidential

Deep Learning Accelerators (NCE)

The Neural Compute Engine (NCE) is a hardware accelerator for Deep Neural Network (DNN) workloads. It features a highly configurable pipeline for maximum support of DNN operations, such as Long Short-Term Memory (LSTM) and Local Response Normalization (LRN). It also leverages sparsity and low precision for optimal performance.

The Neural Compute Engine is built from up to six Neural Compute Tiles, each of which is a primary unit of compute. Each NCE Tile incorporates its own memory along with DPU and ACT-SHAVE compute resources, and can either work independently on a single workload at a given time or be aggregated into a cluster of tiles running the same workload.

The NCE Subsystem features an efficient Inter-Tile Interconnect (ITI) with multicasting and broadcasting capability, allowing NCE Tiles to share data efficiently when they are aggregated to split a workload.

The NCE Subsystem incorporates a DMA engine with a compression unit and broadcasting/multicasting capability for populating the memories of multiple NCE Tiles concurrently.

For hardware-assisted task synchronization, the NCE Subsystem provides barriers and workload FIFOs. Barriers remove as much software overhead as possible, such as Interrupt Service Routine (ISR) loops and programming sequences, keeping the compute and data-movement pipelines full.