Intel® Core™ Ultra Processor
Datasheet, Volume 1 of 2
Supporting Intel® Core™ Ultra Processor for U/H/U-Type4-series Platforms, formerly known as Meteor Lake
| ID | Date | Version | Classification |
|---|---|---|---|
| 792044 | 03/05/2024 | | Public |
NCE Subsystem
The Neural Compute Subsystem is a hardware accelerator for Deep Neural Network (DNN) workloads. It features a highly configurable pipeline supporting DNN operations such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Local Response Normalization (LRN). It also leverages activation and weight sparsity for optimal performance.
The Neural Compute Subsystem is built from up to two NCE Tiles (fixed), where each Tile is the primary unit of compute. Each Tile supports 2K Multiply-Accumulate circuits (MACs) and two Activation SHAVE engines (ACTShave). Tiles can be deployed to operate independently across multiple networks (threads) or aggregated into a multi-cluster engine processing a single network (thread). Refer to the diagram below showing the 4K4M configuration.
The NCE Subsystem includes two DMA engines. Each engine supports in-line weight decompression and can broadcast write data into the local Connection MatriX (CMX) memory (dedicated SRAM).
For hardware-assisted task synchronization, the NCE Subsystem provides barriers and workload FIFOs. Barriers remove as much software overhead as possible (ISR loops and programming sequences), keeping the compute and data-movement pipelines full.