Intel® Ethernet 700/800 Series
Windows Performance Tuning Guide
Low Receive (RX) Performance and/or Discarded Packets
Discarded or dropped RX packets can indicate system or resource contention. For example, packets can be discarded if the CPU is not servicing buffers fast enough, or if PCIe slot, memory, or other system resources are limited.
If discarded packets are observed in Task Manager, in the output of PowerShell commands such as the Get-Counter cmdlet examples in Analyze Performance Counters (typeperf), or in the output of the netstat -e and netsh interface IPv4 show ipstats commands, the settings below may help alleviate resource contention.
Apply each of these suggestions one at a time to measure any differences in performance introduced by each change:
- Increase Receive Buffers size to 4096. See Receive Buffers.
- Increase Transmit Buffers size to 4096. See Transmit Buffers.
Note: Increasing the TX buffer size is not typically needed. Bottlenecks are usually caused by RX packets coming into the adapter and being processed by the CPU. However, the TX buffer setting is worth testing in case it benefits your workload.
- Verify the CPU Affinity for the device. Best performance is typically achieved using CPU cores local to the adapter. However, there are situations where using remote cores or all cores may be beneficial, depending on the application and workload. See CPU Affinity.
- Update the RSS Base Processor Number (if needed). See RSS Base Processor Number:
  - If the device is on a NUMA node other than NUMA 0, set the RSS Base Processor Number to the first physical CPU core of the local NUMA node. For example, on a server with 2 CPUs of 40 cores each and a consecutive CPU core layout, cores 0 to 39 belong to NUMA 0 and cores 40 to 79 belong to NUMA 1.
  - If the device is installed on NUMA 1 in that example, set the RSS Base Processor Number to 40, the first CPU core on NUMA 1.
  - If the device is installed on NUMA 0, avoid using Core 0 for RSS or for application traffic, because Core 0 is used for OS administration tasks. Use Core 1 or 2 as the RSS Base Processor instead, depending on the CPU layout.
- Reduce the Maximum Number of RSS Queues to 8 (or fewer for lower core count servers). See Maximum Number of RSS Queues.
- Experiment with the Interrupt Moderation Rate threshold. See Interrupt Moderation Rate.
  - Experiment with the Adaptive, High, and Low values to see whether performance or the number of discarded packets changes.
Note: A higher ITR setting lowers the interrupt rate, which can result in better CPU performance at higher link speeds. At lower link speeds, a lower ITR setting may give the best performance because interrupts are serviced faster.
- For latency-sensitive workloads, try disabling Interrupt Moderation altogether. See Interrupt Moderation.
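The RSS Base Processor Number guidance above can be sketched as a small calculation. This is a minimal illustration only, assuming the consecutive CPU core layout used in the example (each NUMA node owns one contiguous block of cores). The function name rss_base_processor and its parameters are hypothetical and are not part of any Intel or Windows tool. The node0_first_core default of 1 is also an assumption; the guidance allows Core 1 or 2 depending on the CPU layout.

```python
def rss_base_processor(numa_node, cores_per_node, node0_first_core=1):
    """Return a candidate RSS Base Processor Number for an adapter.

    Assumes a consecutive core layout: NUMA node N owns cores
    N * cores_per_node through (N + 1) * cores_per_node - 1.
    """
    if numa_node < 0 or cores_per_node < 1:
        raise ValueError("numa_node must be >= 0 and cores_per_node >= 1")
    if numa_node == 0:
        # Avoid Core 0 on NUMA 0; it is used for OS administration tasks.
        if not 0 < node0_first_core < cores_per_node:
            raise ValueError("node0_first_core must be a core on NUMA 0 other than Core 0")
        return node0_first_core
    # On any other node, start at the first core local to that node.
    return numa_node * cores_per_node


# Example from the text: 2 CPUs with 40 cores each.
print(rss_base_processor(numa_node=1, cores_per_node=40))  # 40
print(rss_base_processor(numa_node=0, cores_per_node=40))  # 1
```

Confirm the actual NUMA node of the adapter and the core numbering on your system before applying the result, because not every server uses a consecutive core layout.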