Intel® Ethernet Controller E810 Application Device Queues (ADQ)

Configuration Guide

ID 609008
Date 04/03/2023
Version 2.8

wrk Clients Configuration

Note: ADQ is not required on the clients. The following example sets up clients without ADQ for use in benchmarking the performance of an ADQ or non-ADQ NGINX SUT using the wrk benchmark.

The following variables are used in the examples in this section:

$iface The interface in use.
$duration Test time in format <value><unit> (example: 60s for 60 seconds, 3m for 3 minutes).
$client_num The total number of physical systems being used as clients.
$pathtoicepackage The path to the ice driver package.
$connection_scale The number of connections per client thread.
$num_queues_tc1 From the SUT, the number of queues assigned to the application traffic class (TC1).
$ipaddr From the SUT, the IP address of the application server's interface under test.
$app_port From the SUT, the TCP port of the NGINX application being run on the SUT.
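As an illustration only, these variables might be set on each client as in the sketch below. Every value shown is a placeholder assumption (interface name, path, and counts are not taken from this guide and must be replaced with values from your own SUT and client setup).

  iface=ens801f0                 # client interface used for the test (placeholder name)
  duration=60s                   # test time
  client_num=10                  # total number of physical client systems
  pathtoicepackage=/opt/ice      # path to the unpacked ice driver package (placeholder)
  connection_scale=4             # connections per client thread (illustrative)
  num_queues_tc1=110             # TC1 queue count taken from the SUT configuration
  ipaddr=192.168.1.10            # IP address of the SUT interface under test (placeholder)
  app_port=80                    # TCP port of the NGINX server on the SUT (placeholder)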
  1. Perform wrk build:
    1. Download wrk.

      git clone --branch 4.1.0 https://github.com/wg/wrk.git wrk

    2. Install wrk.

      yum install -y openssl-devel
      cd wrk
      make
      cp wrk /usr/local/bin
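      As an optional check (not part of the original steps) that the binary is installed and on the PATH, wrk can print its version banner, which should report 4.1.0 given the branch cloned above:

      wrk --version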
  2. Perform Client configuration.
    1. Enable the throughput-performance tuned profile.

      tuned-adm profile throughput-performance

      Note: The tuned service is not installed by default on RHEL 9.0 systems. Install it with the command yum install tuned.

      Check that the settings are applied correctly:

      cat /etc/tuned/active_profile

      Output: throughput-performance

      cat /etc/tuned/profile_mode

      Output: manual

    2. Stop the irqbalance service.

      systemctl stop irqbalance

    3. Run the set_irq_affinity script for all interfaces (included in the scripts folder of the ice package).

      ${pathtoicepackage}/scripts/set_irq_affinity -x all $iface
    4. Set the limits file in /etc/security/limits.conf to include:

      hard nofile 32768
      soft nofile 32768
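
      Before moving on, an optional sanity check like the sketch below can confirm that the client settings took effect. It is not part of the original procedure; it assumes $iface is set and that a new login session has been opened so the limits.conf change is active.

      # Optional verification sketch (assumptions noted above).
      ulimit -n                        # expect 32768
      systemctl is-active irqbalance   # expect "inactive" after stopping the service
      # Spot-check the IRQ affinity assigned to the interface's vectors:
      for irq in $(awk -v dev="$iface" -F: '$0 ~ dev {gsub(/ /, "", $1); print $1}' /proc/interrupts); do
          printf 'IRQ %s -> CPUs %s\n' "$irq" "$(cat /proc/irq/$irq/smp_affinity_list)"
      done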
  3. Perform wrk benchmarking.

    A test run consists of one wrk instance per physical client system, each with multiple application threads. Due to the design of wrk itself, it is recommended to keep the number of wrk threads per client system less than or equal to the number of local physical CPU cores on that client. For more predictable results, it is also recommended to run the client benchmark with the total number of threads across all client systems equal to an even multiple of the number of application threads on the SUT. This ensures an even distribution of client load.

    For example, in the case of an ADQ enabled NGINX server with 110 TC1 queues and 10 physical client systems each with 24 physical CPU cores per socket, a recommended configuration would be 10 instances of wrk (one running on each physical client), each with 22 threads. The number of connections can then be linearly scaled to increase the workload evenly across the SUT.
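
    For that example, the per-client thread and connection counts work out as in the sketch below. The connection scale of 4 is an arbitrary value for illustration, not a recommendation from this guide.

    # Worked example: 110 TC1 queues on the SUT, 10 physical client systems.
    num_queues_tc1=110
    client_num=10
    connection_scale=4                               # illustrative value only
    threads=$(( 2 * num_queues_tc1 / client_num ))   # 2 * 110 / 10 = 22 threads per client
    conns=$(( threads * connection_scale ))          # 22 * 4 = 88 connections per client
    echo "threads=$threads conns=$conns"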

    Note: Average latency is just a measure of how fast the SUT can respond. If the SUT is not overloaded, it does not need ADQ to have a decent latency number. ADQ shows its benefit when the system is scaled up beyond what it would normally be able to handle.

    A wrk instance can be started on each client concurrently via a command-line script (example below), and the results from all clients are then added together.

    Example:

    Note: $num_queues_tc1, $ipaddr, and $app_port are the same values taken from the SUT configuration.

    file="test${buffer_size}.dat"
    threads=$(( 2 * $num_queues_tc1 / $client_num ))
    conns=$(( $threads * $connection_scale ))
    numactl --cpunodebind=netdev:$iface --membind=netdev:$iface wrk --latency -t ${threads} -c ${conns} -d ${duration} "http://${ipaddr}:${app_port}/${file}"
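
    One way to drive all clients at once is a small wrapper run from a control node, sketched below. The client hostnames (client01 through client10), the SSH fan-out, and the log file names are assumptions for illustration only; the inner wrk invocation is the one from the example above, and the script assumes $iface, $threads, $conns, $duration, $ipaddr, $app_port, and $file are already set in the launching shell and that the interface name is the same on every client.

    #!/bin/bash
    # Illustrative launcher sketch: start one wrk instance per client over SSH in
    # parallel, then collect each client's output so the results can be summed.
    # Hostnames are placeholders; passwordless SSH to the clients is assumed.
    clients="client01 client02 client03 client04 client05 client06 client07 client08 client09 client10"
    for c in $clients; do
        ssh "$c" "numactl --cpunodebind=netdev:$iface --membind=netdev:$iface \
            wrk --latency -t ${threads} -c ${conns} -d ${duration} \
            \"http://${ipaddr}:${app_port}/${file}\"" > "wrk_${c}.log" 2>&1 &
    done
    wait                                 # block until every client finishes its run
    grep -H 'Requests/sec' wrk_*.log     # per-client throughput; add these up for the total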