wrk Clients Configuration
The following variables are used in the examples in this section:
$iface | The interface in use. |
$duration | Test time in the format <value><unit> (for example, 60s for 60 seconds, 3m for 3 minutes). |
$client_num | The total number of physical systems being used as clients. |
$pathtoicepackage | The path to the ice driver package. |
$connection_scale | The number of connections per client thread. |
$num_queues_tc1 | The number of queues for the application traffic class (TC1) on the SUT. |
$ipaddr | The IP address of the application server's interface under test on the SUT. |
$app_port | The TCP port of the nginx application running on the SUT. |
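As a convenience, these variables can be exported in the client shell before running the commands in this section. The values in the sketch below are placeholders only (the interface name, package path, IP address, and port are assumptions); substitute the values from your own SUT and client configuration.
# Example only: placeholder values, substitute your own.
export iface=ens801f0
export duration=60s
export client_num=10
export pathtoicepackage=/opt/ice-1.9.11
export connection_scale=4
export num_queues_tc1=110
export ipaddr=192.168.1.100
export app_port=80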
- Perform wrk build:
  - Download wrk.
    git clone --branch 4.1.0 https://github.com/wg/wrk.git wrk
  - Install wrk.
    yum install -y openssl-devel
    cd wrk
    make
    cp wrk /usr/local/bin
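After the build, a quick check that the binary is on the PATH and reports the expected version can save debugging time later. This verification step is an assumption, not part of the original procedure; the localhost smoke test assumes some HTTP server is already reachable on the client.
# Confirm the wrk binary is installed and report its version.
wrk -v
# Optional smoke test (assumes an HTTP server is reachable at localhost:80).
# wrk -t1 -c1 -d1s http://localhost:80/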
- Perform client configuration. (A combined script sketch covering these steps follows the list.)
  - Enable the throughput-performance tuned profile.
    tuned-adm profile throughput-performance
    Note: The tuned daemon (which provides the tuned-adm command) is not installed by default on RHEL 9.0 systems. Install it with the command yum install tuned.
    Check that the settings are applied correctly:
    cat /etc/tuned/active_profile
    Output:
    throughput-performance
    cat /etc/tuned/profile_mode
    Output:
    manual
  - Stop the irqbalance service.
    systemctl stop irqbalance
  - Run the set_irq_affinity script for all interfaces (included in the scripts folder of the ice package).
    ${pathtoicepackage}/scripts/set_irq_affinity -x all $iface
  - Set the limits file in /etc/security/limits.conf to include:
    hard nofile 32768
    soft nofile 32768
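The client configuration steps above can be wrapped in a small script and run on every client. The sketch below is an assumption, not part of the original procedure: it assumes tuned and the ice driver package are already installed, that $pathtoicepackage and $iface are set (see the variable list above), and it uses a '*' (all users) domain for the limits.conf entries, which the guide does not specify.
#!/bin/bash
# Hedged sketch: one-shot client preparation.
set -e
tuned-adm profile throughput-performance                          # apply the tuned profile
systemctl stop irqbalance                                         # stop the irqbalance service
"${pathtoicepackage}/scripts/set_irq_affinity" -x all "$iface"    # pin interface IRQs
# Raise the open-file limits; the '*' domain is an assumption, since the guide
# does not name a domain for the limits.conf entries.
cat >> /etc/security/limits.conf <<'EOF'
* hard nofile 32768
* soft nofile 32768
EOF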
- Perform wrk benchmarking.
A test run consists of one wrk instance per physical client system, each with multiple application threads. Due to the design of wrk itself, it is recommended to keep the number of wrk threads per client system less than or equal to the number of local physical CPU cores on that client. For more predictable results, it is also recommended that the total number of threads across all client systems be an even multiple of the number of application threads on the SUT. This ensures an even distribution of client load.
For example, in the case of an ADQ enabled NGINX server with 110 TC1 queues and 10 physical client systems each with 24 physical CPU cores per socket, a recommended configuration would be 10 instances of wrk (one running on each physical client), each with 22 threads. The number of connections can then be linearly scaled to increase the workload evenly across the SUT.
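The thread and connection counts from the example above can be derived directly from the formulas used in the benchmark command later in this section. In the sketch below, num_queues_tc1 and client_num come from the example text, while connection_scale=4 is only an assumed illustration.
# Worked example (num_queues_tc1=110 and client_num=10 from the text; connection_scale=4 is assumed):
num_queues_tc1=110
client_num=10
connection_scale=4
threads=$(( 2 * num_queues_tc1 / client_num ))    # 2 * 110 / 10 = 22 threads per client
conns=$(( threads * connection_scale ))           # 22 * 4 = 88 connections per client
echo "threads=${threads} conns=${conns}"          # prints: threads=22 conns=88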
Note: Average latency is simply a measure of how quickly the SUT can respond. If the SUT is not overloaded, it does not need ADQ to achieve a reasonable latency number. ADQ shows its benefit when the system is scaled beyond what it would normally be able to handle.
A wrk instance can be started on each client concurrently via a command-line script (example below), and the per-client results are then added together.
Example:
Note: $num_queues_tc1, $ipaddr, and $app_port are the same values used in the SUT configuration.
file="test${buffer_size}.dat"
threads=$(( 2 * $num_queues_tc1 / $client_num ))
conns=$(( $threads * $connection_scale ))
numactl --cpunodebind=netdev:$iface --membind=netdev:$iface wrk --latency -t ${threads} -c ${conns} -d ${duration} "http://${ipaddr}:${app_port}/${file}"
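One way to start the benchmark on all clients concurrently, as described above, is to drive the per-client command over SSH from a control node. The sketch below is an assumption rather than part of the guide: the clients.txt host list, passwordless SSH access, identical interface names on every client, and the per-client output files are all hypothetical, and the requests-per-second numbers still have to be summed from the collected outputs.
#!/bin/bash
# Hedged sketch: launch one wrk instance per client concurrently over SSH and
# collect per-client output. Assumes passwordless SSH, a hypothetical clients.txt
# listing client hostnames, and that $iface, $threads, $conns, $duration, $ipaddr,
# $app_port, and $file are set in this shell.
while read -r client; do
    ssh "$client" "numactl --cpunodebind=netdev:$iface --membind=netdev:$iface \
        wrk --latency -t ${threads} -c ${conns} -d ${duration} \
        'http://${ipaddr}:${app_port}/${file}'" > "wrk_${client}.out" &
done < clients.txt
wait   # when all clients finish, add together the requests/sec reported in the wrk_*.out files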