Edge Developer Toolbox Developer Guide
Quantize Model
In the Quantize Model section of the Deep Learning Workbench, models can be quantized post-training to improve inference performance. The Deep Learning Workbench uses the Neural Network Compression Framework (NNCF) to optimize neural network inference.
NNCF modifies deep learning model objects to enable faster inference, reducing memory footprint with minimal accuracy loss. Although NNCF supports PyTorch, TensorFlow, ONNX, and the OpenVINO™ toolkit, the Deep Learning Workbench currently supports only PyTorch models that have been converted to the OpenVINO™ IR format.
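Outside the Workbench UI, the same post-training quantization flow can be sketched with NNCF's Python API. This is a minimal sketch, assuming `nncf` and `openvino` are installed; the model path and calibration data are placeholders, not values from this guide:

```python
def quantize_ir_model(ir_xml_path, calibration_images):
    """Post-training INT8 quantization of an OpenVINO IR model with NNCF.

    A minimal sketch: assumes `nncf` and `openvino` are installed and that
    `calibration_images` is an iterable of preprocessed input arrays.
    """
    import nncf                    # Neural Network Compression Framework
    import openvino as ov

    model = ov.Core().read_model(ir_xml_path)       # load the IR (.xml + .bin)
    calibration = nncf.Dataset(calibration_images)  # wrap samples for calibration
    quantized = nncf.quantize(model, calibration)   # post-training quantization (INT8)
    ov.save_model(quantized, "model_int8.xml")      # write the quantized IR to disk
    return quantized
```

The Workbench performs these same steps behind the scenes: it loads the IR model, calibrates on the uploaded dataset, and saves a quantized IR.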
On the Workbench tab, click Quantize Model:
The list on this page shows all models available for quantization. To add your own model to this list, go to the ‘Import Custom Model’ section, import the model, and convert it to the IR format compatible with the OpenVINO™ toolkit; it will then appear in the custom model list.
Select a model from the list and click Next to continue:
The quantization procedure requires a dataset. You can either upload a dataset as a ZIP archive containing JPG or PNG image files, or use one of your previously uploaded datasets.
Set the file type to ZIP, then upload your dataset archive. Give the dataset a name for future reference and versioning, then click Submit to proceed.
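Before uploading, an archive of the expected shape (JPG/PNG files in a ZIP) can be assembled with a short script. This is an illustrative sketch using only the Python standard library; the paths and function name are hypothetical:

```python
from pathlib import Path
import zipfile

def make_dataset_zip(image_dir: str, zip_path: str) -> int:
    """Pack all JPG/PNG images under `image_dir` into a flat ZIP archive.

    Returns the number of images added. Paths are illustrative.
    """
    root = Path(image_dir)
    images = [p for p in sorted(root.rglob("*"))
              if p.suffix.lower() in {".jpg", ".jpeg", ".png"}]
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in images:
            zf.write(p, arcname=p.name)  # flat layout: images at the archive root
    return len(images)
```

The resulting ZIP file is what you select in the upload dialog above.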
After the dataset is uploaded, click Next.
In the next step, set the Compression Algorithm to INT8, as only INT8 is currently supported. Select the Target Device for Optimization from the available hardware options. Then specify the desired preset and list any layer names or types to be ignored during quantization.
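The form fields in this step map onto NNCF's post-training quantization parameters. The sketch below is a hypothetical configuration fragment showing that mapping; the layer name and type in the ignored scope are illustrative examples, not values from this guide:

```python
# Hypothetical mapping of the Quantize Model form fields to NNCF-style
# quantization parameters (values shown are illustrative).
quantization_config = {
    "compression_algorithm": "INT8",   # only INT8 is currently supported
    "target_device": "CPU",            # chosen from the available hardware options
    "preset": "performance",           # or "mixed" for accuracy-sensitive models
    "ignored_scope": {
        "names": ["final_softmax"],    # hypothetical layer name left unquantized
        "types": ["Softmax"],          # hypothetical layer type left unquantized
    },
}
```

Layers in the ignored scope are left in their original precision, which can help when a specific layer is sensitive to quantization error.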
Detailed quantization information and logs are available, offering useful insight for debugging and clarifying the steps involved in the quantization process.
Proceed to the Select Target Environment and Benchmarking section.