Intel® Ethernet 700/800 Series

Windows Performance Tuning Guide

ID Date Version Classification
784543 02/06/2024 1.1 Public

Low Receive (RX) Performance and/or Discarded Packets

Discarded or dropped RX packets can indicate system or resource contention. For example, packets can be discarded if the CPU is not servicing buffers fast enough, or if PCIe slot, memory, or other system resources are limited.

If discarded packets are observed in Task Manager, in PowerShell output (such as the Get-Counter cmdlet examples in Analyze Performance Counters (typeperf)), or in the output of the netstat -e and netsh interface IPv4 show ipstats commands, the settings below may help to alleviate resource contention.

Note: These same tunings can also help in environments where there are no discarded packets but RX performance is lower than expected due to similar resource constraints.
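As a rough illustration of checking for discards from a script, the sketch below pulls the Discards row out of netstat -e text. It assumes the English-language column layout (a "Discards" label followed by Received and Sent counts). The function names and the SAMPLE text are hypothetical and are not measurements from a real system.

```python
import re
import subprocess


def parse_discards(netstat_e_output):
    """Return (received, sent) discard counts from `netstat -e` text, or None."""
    for line in netstat_e_output.splitlines():
        # Assumed row shape: "Discards   <received>   <sent>"
        match = re.match(r"\s*Discards\s+(\d+)\s+(\d+)\s*$", line)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


def read_live_discards():
    """Run `netstat -e` on the local machine and parse its output."""
    result = subprocess.run(["netstat", "-e"], capture_output=True, text=True)
    return parse_discards(result.stdout)


# Hypothetical sample text, for illustration only.
SAMPLE = """Interface Statistics

                           Received            Sent

Bytes                    1234567890       987654321
Unicast packets             1234567          765432
Non-unicast packets           12345            1234
Discards                         42               0
Errors                            0               0
Unknown protocols                 0
"""

print(parse_discards(SAMPLE))  # (42, 0)
```

The counters are cumulative, so comparing two readings taken before and after a test run is usually more informative than a single value.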

Apply these suggestions one at a time, so that the performance difference introduced by each change can be measured:

  1. Increase the Receive Buffers size to 4096. See Receive Buffers.
  2. Increase the Transmit Buffers size to 4096. See Transmit Buffers. Note: Increasing the TX buffer size is not typically needed, because bottlenecks are usually caused by RX packets coming into the adapter and being processed by the CPU. However, the TX buffer setting is worth testing in case it does benefit your workload.
  3. Verify the CPU Affinity for the device, and use the CPU cores local to the adapter. Best performance is typically achieved using CPU cores local to the device. However, there are situations where using remote cores or all cores may be beneficial, depending on the application and workload. See CPU Affinity.
  4. Update the RSS Base Processor Number (if needed). See RSS Base Processor Number:
    1. If the device is on a NUMA node other than NUMA 0, set the RSS Base Processor Number to the first physical CPU core of the local NUMA node. For example, on a server with 2 CPUs of 40 cores each and a consecutive CPU core layout, cores 0 to 39 belong to NUMA 0 and cores 40 to 79 belong to NUMA 1.
    2. In this example, if the device is installed on NUMA 1, set the RSS Base Processor Number to 40, the first CPU core on NUMA 1.
    3. If the device is installed on NUMA 0, avoid using Core 0 for RSS or for application traffic, because it is used for OS administration tasks. Use Core 1 or 2 as the RSS Base Processor, depending on the CPU layout.
  5. Reduce the Maximum Number of RSS Queues to 8 (or fewer for lower core count servers). See Maximum Number of RSS Queues.
  6. Experiment with the Interrupt Moderation Rate threshold. See Interrupt Moderation Rate.
    1. Experiment with the values Adaptive, High, and Low to see whether performance or the number of discarded packets changes. Note: When a higher ITR setting is used, the interrupt rate is lower, which can result in better CPU performance at higher link speeds. At lower link speeds, a lower ITR setting may give the best performance because interrupts are serviced faster.
  7. For latency-sensitive workloads, try disabling Interrupt Moderation altogether. See Interrupt Moderation.
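The RSS Base Processor Number example in step 4 can be worked through as simple arithmetic. The sketch below assumes a consecutive core layout with the same number of cores on every NUMA node, and it ignores any logical-processor numbering introduced by Hyper-Threading. The function name and its parameters are hypothetical and are not part of any Intel or Windows API.

```python
def rss_base_processor(numa_node, cores_per_node, first_usable_on_node0=1):
    """Suggest an RSS base processor for an adapter on the given NUMA node.

    Assumes consecutive core numbering: node N owns cores
    N * cores_per_node through (N + 1) * cores_per_node - 1.
    """
    if numa_node < 0 or cores_per_node < 1:
        raise ValueError("numa_node must be >= 0 and cores_per_node >= 1")
    if not 0 < first_usable_on_node0 < cores_per_node:
        raise ValueError("first_usable_on_node0 must be a non-zero core on node 0")

    first_core = numa_node * cores_per_node
    if first_core == 0:
        # NUMA 0: skip Core 0, which the guide reserves for OS tasks.
        return first_usable_on_node0
    return first_core


# Two CPUs with 40 cores each, as in the example in step 4.
print(rss_base_processor(numa_node=1, cores_per_node=40))  # 40
print(rss_base_processor(numa_node=0, cores_per_node=40))  # 1
```

Pass first_usable_on_node0=2 when the CPU layout calls for Core 2 rather than Core 1 on NUMA 0.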