Edge Developer Toolbox Developer Guide

ID 783775
Date 06/07/2024
Version 24.05
Confidential

Quantize Model

In the quantization section of the Deep Learning Workbench, models can be quantized post-training for improved performance. The Deep Learning Workbench utilizes the Neural Network Compression Framework (NNCF) to optimize neural network inference.

NNCF modifies deep learning network model objects to enable faster inference, improving performance and reducing memory footprint with minimal accuracy loss. While NNCF is compatible with PyTorch, TensorFlow, ONNX, and the OpenVINO™ toolkit, only PyTorch models converted to the OpenVINO IR format are currently supported in the Deep Learning Workbench.
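For a rough picture of what runs under the hood, NNCF post-training quantization can also be driven from Python. This is a sketch, not the Workbench's exact code: the IR path and the identity transform are illustrative assumptions, and a real pipeline would preprocess each calibration sample to match the model's input layout.

```python
def quantize_ir_model(ir_xml_path, calibration_images):
    """Post-training INT8 quantization of an OpenVINO IR model via NNCF.

    `ir_xml_path` and the identity transform are illustrative
    assumptions for this sketch.
    """
    import nncf            # Neural Network Compression Framework
    import openvino as ov

    core = ov.Core()
    model = core.read_model(ir_xml_path)   # IR = .xml topology + .bin weights

    # NNCF draws calibration samples through an nncf.Dataset wrapper.
    calibration = nncf.Dataset(calibration_images, lambda sample: sample)

    # Default post-training quantization produces an INT8 model.
    return nncf.quantize(model, calibration)
```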

In the workbench tab, click Quantize Model:

Click Quantize Model in the Deep Learning Workbench.

This page lists all models available for quantization. To add your own model to the list, go to the Import Custom Model section, import the model, and convert it to the IR format compatible with the OpenVINO™ toolkit; it will then appear in the custom model list.
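If you prefer scripting the conversion step instead of using the UI, OpenVINO's Python API exposes `ov.convert_model`. The file names below are hypothetical; this is a minimal sketch assuming an ONNX (or similar) export as input.

```python
def convert_to_ir(source_model_path, output_xml="model.xml"):
    """Convert a framework model export to OpenVINO IR format.

    File names are illustrative assumptions.
    """
    import openvino as ov

    model = ov.convert_model(source_model_path)  # e.g. an ONNX file
    ov.save_model(model, output_xml)             # writes model.xml + model.bin
    return output_xml
```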

Select a model from the list and click Next to continue:

Select a model from the list.

The quantization procedure requires a dataset. You can either upload a dataset as a ZIP archive containing JPG or PNG image files or use one of your pre-loaded datasets.

Set the file type to ZIP and upload your dataset archive. Name the dataset for future reference and versioning, then click Submit to proceed.

Upload your dataset and name it.

After the dataset is uploaded, click Next.
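The upload step above can be scripted with the standard library. The sketch below packages image files into a flat ZIP archive (files at the archive root, an assumed layout) and enforces the JPG/PNG requirement; the file names are illustrative.

```python
import zipfile

def build_dataset_zip(images, zip_path):
    """Package images into a ZIP archive for the dataset upload.

    `images` is an iterable of (filename, bytes) pairs. Only JPG and
    PNG files are accepted, matching the upload requirement; the flat
    layout is an assumption for illustration.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in images:
            if not name.lower().endswith((".jpg", ".png")):
                raise ValueError(f"unsupported file type: {name}")
            zf.writestr(name, data)
    return zip_path
```

For example, `build_dataset_zip([("img_0.jpg", raw_bytes)], "calibration.zip")` produces an archive ready for upload.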

In the following step, set the Compression Algorithm to INT8 (currently the only supported option). Select the Target Device for Optimization from the available hardware options, specify the desired preset, and list any layer names and types to ignore.

Choose a compression algorithm and a target device for optimization.
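The form fields above (preset, ignored layer names and types) correspond roughly to keyword arguments of NNCF's quantize call. The mapping below is a sketch: the preset labels and the correspondence between UI fields and API parameters are assumptions, though `preset` and `ignored_scope` are NNCF's actual parameter names.

```python
def quantization_options(preset_name="performance",
                         ignored_names=(), ignored_types=()):
    """Translate Workbench-style choices into nncf.quantize keyword
    arguments. A sketch; the UI-to-API mapping is an assumption."""
    import nncf

    presets = {
        "performance": nncf.QuantizationPreset.PERFORMANCE,
        "mixed": nncf.QuantizationPreset.MIXED,
    }
    return {
        "preset": presets[preset_name],
        "ignored_scope": nncf.IgnoredScope(names=list(ignored_names),
                                           types=list(ignored_types)),
    }
```

These options would then be passed along as `nncf.quantize(model, dataset, **quantization_options(...))`.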

Detailed quantization information and logs are available, providing useful insight for debugging and clarifying each step of the quantization process.

Proceed to the Select Target Environment and Benchmarking section.