Edge Developer Toolbox Developer Guide

ID 783775
Date 06/07/2024
Version 24.05
Confidential

Select Target Environment and Benchmarking

In this step, you can choose the processor that will be used to run the benchmark.

The Intel® Core™ i9-12900K processor is selected from the processor options for this example. Under Target Architecture, choose the CPU option. Click Next to continue:

Choose the CPU option.

Proceed to the next section.

Benchmark the Model

Note:

For Hugging Face text_gen and code_gen models, Benchmark Model measures metrics such as Model Load Time, Output Size, Generation Time, and First Token Latency, which are available upon benchmark completion. For image_gen models, generation time, steps, height, and width are available as benchmark metrics. For these model types, the Throughput, Latency, and Advanced settings options are therefore not available.

For vision-based models, continue with the steps below.

You can choose to optimize the benchmark for either latency or throughput.

For advanced users, an advanced mode allows adjusting settings such as the sync or async API, batch size, number of inference requests, and number of streams: parameters that may already be familiar from the OpenVINO™ benchmark app.
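To relate these settings to code, the sketch below shows roughly how the same choices map onto the OpenVINO™ Python API. This is a minimal illustration, not the toolbox's implementation; the model path model.xml is a placeholder.

    import openvino as ov

    core = ov.Core()
    model = core.read_model("model.xml")  # placeholder path to an OpenVINO IR model

    # A performance hint selects the optimization goal: "THROUGHPUT" favors
    # many parallel infer requests, "LATENCY" the fastest single request.
    compiled = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})

    # Async mode: a queue of infer requests, analogous to the
    # "number of inference requests" setting in advanced mode.
    n_requests = compiled.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
    queue = ov.AsyncInferQueue(compiled, n_requests)

With a hint set, OpenVINO derives sensible defaults for streams and infer-request counts; the advanced mode roughly corresponds to setting such values explicitly.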

Throughput Optimization is chosen for this example. Click Benchmark to continue:

Choose Throughput Optimization and click Benchmark.

The benchmarking begins, and you can check its progress in the status box. After some time, the benchmarking completes and the results are shown. Click Next to continue:

The results after benchmarking.

In the following step, the benchmarking results are presented in tabular form. The display includes throughput and latency values, as well as system telemetry details such as the average CPU and memory usage of the selected model on the chosen hardware. For quantized models, the selected precision is also shown in the table. Additionally, the platform provides per-layer performance counters, offering a detailed breakdown of each layer's computational efficiency.

Benchmarking results presented in tabular form.
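Outside the toolbox, a comparable per-layer breakdown can be obtained from OpenVINO's profiling counters. Below is a minimal sketch, again assuming a placeholder model.xml, a static input shape, and zero-filled input data for illustration:

    import numpy as np
    import openvino as ov

    core = ov.Core()
    model = core.read_model("model.xml")  # placeholder path
    # PERF_COUNT enables per-layer performance counters.
    compiled = core.compile_model(model, "CPU", {"PERF_COUNT": "YES"})

    request = compiled.create_infer_request()
    # Zero-filled input for illustration; assumes a static input shape.
    data = np.zeros(tuple(compiled.input(0).shape), dtype=np.float32)
    request.infer({0: data})

    # Each entry reports one layer's timing and the kernel it ran on.
    for info in request.profiling_info:
        print(info.node_name, info.node_type, info.real_time, info.exec_type)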

For models imported from Hugging Face, continue with the steps below.

The example below shows a text_gen model from Hugging Face. Benchmarking results on the chosen hardware show metrics such as Model Load Time, Output Size, Generation Time, First Token Latency, and other token latency. This example uses an Intel® Xeon® Platinum 8480+ processor.

Example text_gen model from Hugging Face.
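To make these metric names concrete, the sketch below times the same quantities with the Hugging Face transformers API. It is a hand-rolled approximation rather than the toolbox's measurement code; the model ID and prompt are placeholders.

    import time
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "gpt2"  # placeholder; any causal LM from Hugging Face works

    t0 = time.perf_counter()
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    load_time = time.perf_counter() - t0  # Model Load Time

    inputs = tokenizer("What is edge computing?", return_tensors="pt")

    t0 = time.perf_counter()
    model.generate(**inputs, max_new_tokens=1)
    first_token_latency = time.perf_counter() - t0  # First Token Latency (incl. prefill)

    t0 = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=64)
    generation_time = time.perf_counter() - t0  # Generation Time

    n_new = output.shape[-1] - inputs["input_ids"].shape[-1]  # Output Size in tokens
    # Rough per-token latency for the remaining tokens.
    other_token_latency = (generation_time - first_token_latency) / max(n_new - 1, 1)

    print(f"load {load_time:.2f}s, first token {first_token_latency*1000:.0f}ms, "
          f"{n_new} tokens in {generation_time:.2f}s, "
          f"other tokens {other_token_latency*1000:.0f}ms each")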

The model's answer to the input prompt can also be viewed by clicking View Results/Output. In addition to the metrics above, the accuracy of the result can be used as additional data for judging model performance on the hardware of choice.

The model's answer to the prompt input.

Telemetry data for average CPU usage, memory usage, power usage, CPU temperature, and total energy are also available.
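A rough way to collect similar telemetry yourself is to sample system counters while the workload runs. The sketch below uses the third-party psutil package and a hypothetical run_benchmark() placeholder; power draw, CPU temperature, and total energy need platform-specific counters (for example, Intel RAPL on Linux) and are omitted here.

    import threading
    import time

    import psutil  # third-party: pip install psutil

    def sample_telemetry(stop, samples, period=0.5):
        """Record (CPU %, memory %) pairs until asked to stop."""
        psutil.cpu_percent(interval=None)  # prime the CPU counter
        while not stop.is_set():
            time.sleep(period)
            samples.append((psutil.cpu_percent(interval=None),
                            psutil.virtual_memory().percent))

    stop, samples = threading.Event(), []
    sampler = threading.Thread(target=sample_telemetry, args=(stop, samples))
    sampler.start()

    run_benchmark()  # hypothetical placeholder for the workload under test

    stop.set()
    sampler.join()
    n = max(len(samples), 1)
    avg_cpu = sum(c for c, _ in samples) / n
    avg_mem = sum(m for _, m in samples) / n
    print(f"Average CPU: {avg_cpu:.1f}%  Average memory: {avg_mem:.1f}%")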

At this point, you have a set of benchmarks for one model. You can continue by benchmarking another model, or you can return to Edge Developer Toolbox to use the model in any of the workflows.