Intel® Ethernet Controller E810 Application Device Queues (ADQ)

Configuration Guide

ID 609008
Date 04/03/2023
Version 2.8

wrk Clients Configuration

Note: ADQ is not required on the clients. The following example sets up clients without ADQ for use in benchmarking the performance of an ADQ or non-ADQ NGINX SUT using the wrk benchmark.

The following variables are used in the examples in this section:

$iface The interface in use.
$duration Test time in format <value><unit> (example: 60s for 60 seconds, 3m for 3 minutes).
$client_num The total number of physical systems being used as clients.
$pathtoicepackage The path to the ice driver package.
$connection_scale The number of connections per client thread.
$num_queues_tc1 From the SUT, the number of queues assigned to the application traffic class (TC1).
$ipaddr From the SUT, the IP address of the application server's interface under test.
$app_port From the SUT, the TCP port of the NGINX application being run on the SUT.
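As an illustration only, these variables might be set on each client as in the sketch below. Every value shown is a placeholder assumption (interface name, path, and counts are not taken from this guide and must be replaced with values from your own SUT and client setup).

  iface=ens801f0                 # client interface used for the test (placeholder name)
  duration=60s                   # test time
  client_num=10                  # total number of physical client systems
  pathtoicepackage=/opt/ice      # path to the unpacked ice driver package (placeholder)
  connection_scale=4             # connections per client thread (illustrative)
  num_queues_tc1=110             # TC1 queue count taken from the SUT configuration
  ipaddr=192.168.1.10            # IP address of the SUT interface under test (placeholder)
  app_port=80                    # TCP port of the NGINX server on the SUT (placeholder)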
  1. Perform wrk build:
    1. Download wrk.

      git clone --branch 4.1.0 https://github.com/wg/wrk.git wrk

    2. Install wrk.

      yum install -y openssl-devel
      cd wrk
      make
      cp wrk /usr/local/bin
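      As an optional check (not part of the original steps) that the binary is installed and on the PATH, wrk can print its version banner, which should report 4.1.0 given the branch cloned above:

      wrk --version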
  2. Perform Client configuration.
    1. Enable the throughput-performance tuned profile.

      tuned-adm profile throughput-performance

      Note: The tuned service is not installed by default on RHEL 9.0 systems. Install it with the command yum install tuned.

      Check that the settings are applied correctly:

      cat /etc/tuned/active_profile

      Output: throughput-performance

      cat /etc/tuned/profile_mode

      Output: manual

    2. Stop the irqbalance service.

      systemctl stop irqbalance

    3. Run the set_irq_affinity script for all interfaces (included in the scripts folder of the ice package).

      ${pathtoicepackage}/scripts/set_irq_affinity -x all $iface
    4. Set the limits file in /etc/security/limits.conf to include:

      hard nofile 32768
      soft nofile 32768
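
      Before moving on, an optional sanity check like the sketch below can confirm that the client settings took effect. It is not part of the original procedure; it assumes $iface is set and that a new login session has been opened so the limits.conf change is active.

      # Optional verification sketch (assumptions noted above).
      ulimit -n                        # expect 32768
      systemctl is-active irqbalance   # expect "inactive" after stopping the service
      # Spot-check the IRQ affinity assigned to the interface's vectors:
      for irq in $(awk -v dev="$iface" -F: '$0 ~ dev {gsub(/ /, "", $1); print $1}' /proc/interrupts); do
          printf 'IRQ %s -> CPUs %s\n' "$irq" "$(cat /proc/irq/$irq/smp_affinity_list)"
      done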
  3. Perform wrk benchmarking.

    A test run consists of one wrk instance per physical client system, each with multiple application threads. Due to the design of wrk itself, it is recommended to keep the number of wrk threads per client system less than or equal to the number of local physical CPU cores on that client. For more predictable results, it is also recommended to run the client benchmark with the total number of threads across all client systems equal to an even multiple of the number of application threads on the SUT. This ensures an even distribution of client load.

    For example, in the case of an ADQ enabled NGINX server with 110 TC1 queues and 10 physical client systems each with 24 physical CPU cores per socket, a recommended configuration would be 10 instances of wrk (one running on each physical client), each with 22 threads. The number of connections can then be linearly scaled to increase the workload evenly across the SUT.
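
    For that example, the per-client thread and connection counts work out as in the sketch below. The connection scale of 4 is an arbitrary value for illustration, not a recommendation from this guide.

    # Worked example: 110 TC1 queues on the SUT, 10 physical client systems.
    num_queues_tc1=110
    client_num=10
    connection_scale=4                               # illustrative value only
    threads=$(( 2 * num_queues_tc1 / client_num ))   # 2 * 110 / 10 = 22 threads per client
    conns=$(( threads * connection_scale ))          # 22 * 4 = 88 connections per client
    echo "threads=$threads conns=$conns"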

    Note: Average latency is just a measure of how fast the SUT can respond. If the SUT is not overloaded, it does not need ADQ to have a decent latency number. ADQ shows its benefit when the system is scaled up beyond what it would normally be able to handle.

    A wrk instance can be started on each client concurrently via a command-line script (example below), and the results from all clients are then added together.

    Example:

    Note: $num_queues_tc1, $ipaddr, and $app_port are the same values taken from the SUT configuration.

    file="test${buffer_size}.dat"
    threads=$(( 2 * $num_queues_tc1 / $client_num ))
    conns=$(( $threads * $connection_scale ))
    numactl --cpunodebind=netdev:$iface --membind=netdev:$iface wrk --latency -t ${threads} -c ${conns} -d ${duration} "http://${ipaddr}:${app_port}/${file}"
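
    One way to drive all clients at once is a small wrapper run from a control node, sketched below. The client hostnames (client01 through client10), the SSH fan-out, and the log file names are assumptions for illustration only; the inner wrk invocation is the one from the example above, and the script assumes $iface, $threads, $conns, $duration, $ipaddr, $app_port, and $file are already set in the launching shell and that the interface name is the same on every client.

    #!/bin/bash
    # Illustrative launcher sketch: start one wrk instance per client over SSH in
    # parallel, then collect each client's output so the results can be summed.
    # Hostnames are placeholders; passwordless SSH to the clients is assumed.
    clients="client01 client02 client03 client04 client05 client06 client07 client08 client09 client10"
    for c in $clients; do
        ssh "$c" "numactl --cpunodebind=netdev:$iface --membind=netdev:$iface \
            wrk --latency -t ${threads} -c ${conns} -d ${duration} \
            \"http://${ipaddr}:${app_port}/${file}\"" > "wrk_${c}.log" 2>&1 &
    done
    wait                                 # block until every client finishes its run
    grep -H 'Requests/sec' wrk_*.log     # per-client throughput; add these up for the total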