Intel® Ethernet 800 Series
Linux Performance Tuning Guide
4th Gen Intel® Xeon® Scalable Processors
Power management on 4th Gen Intel® Xeon® Scalable processors is extremely aggressive compared to 3rd Gen Intel® Xeon® Scalable processors. To keep cores from entering low-power states, try reducing the number of cores in use so that each one stays busy and awake for longer.
Recommended BIOS settings for the highest performance:
- Enable or disable Hyper-Threading on the CPU, depending on the workload requirements and performance goals.
- Set the system profile to “Performance” for maximum performance. (Note: this results in higher power consumption.)
- Set the CPU power management to “Maximum Performance” to prioritize maximum CPU performance over power efficiency.
- Enable Turbo Boost. Disabling Turbo Boost in the system BIOS settings typically prevents the CPU from dynamically increasing its clock speed beyond its base frequency.
Note: Disabling Turbo Boost may be suitable for certain use cases where consistent performance, power efficiency, or thermal management are prioritized over maximum performance.
- Turn off the Single Root I/O Virtualization (SR-IOV) feature if the system is not utilizing virtualization technologies.
- Disable C-states to instruct the CPU to stay active and prevent entering deeper idle states.
- Disable C1E, to ensure that the CPU remains active and does not enter the C1E idle state.
- Set the uncore frequency to “maximum” to instruct the system to operate at the highest available frequency.
- On Dell platforms, set MADT (Multiple APIC Description Table) core emulation to “Linear” (or “Round-Robin” depending on BIOS) to provide a clear and predictable mapping of CPU cores.
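After a reboot, some of these BIOS choices can be spot-checked from Linux. The sketch below is one possible way to do that, not part of the original guide. It assumes the standard sysfs layout under /sys/devices/system/cpu; the `show` helper and the `CPU_ROOT` variable are hypothetical names introduced here. The `no_turbo` file exists only when the intel_pstate driver is in use, and the Linux idle driver may expose C-states independently of the BIOS setting, so treat the output as a hint rather than proof.

```shell
#!/bin/sh
# Print "label: value" for a sysfs file, or "label: unavailable" if it cannot be read.
show() {
    label=$1
    path=$2
    if [ -r "$path" ]; then
        printf '%s: %s\n' "$label" "$(cat "$path")"
    else
        printf '%s: unavailable\n' "$label"
    fi
}

# Assumed default location of the CPU sysfs tree (override for testing).
CPU_ROOT=${CPU_ROOT:-/sys/devices/system/cpu}

# 1 = Hyper-Threading (SMT) active, 0 = inactive.
show "SMT active" "$CPU_ROOT/smt/active"

# 0 = turbo allowed, 1 = turbo disabled (intel_pstate driver only).
show "intel_pstate no_turbo" "$CPU_ROOT/intel_pstate/no_turbo"

# List the idle states that the OS still exposes for CPU 0.
for state in "$CPU_ROOT"/cpu0/cpuidle/state*; do
    [ -d "$state" ] || continue
    show "cpu0 $(basename "$state")" "$state/name"
done
```

If a line reports "unavailable", the corresponding driver or feature is not exposed on that system, and the BIOS setup screen remains the authoritative place to check.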
Recommended OS level tunings for optimized performance:
- Set CPU frequency scaling governor to “performance”.
cpupower frequency-set -g performance
cpupower frequency-info
- Disable C-States.
cpupower idle-set -D0
- Set core Rx (rmem) and Tx (wmem) buffers to max value.
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
- Set network device backlog.
sysctl -w net.core.netdev_max_backlog=8192
- Set tuned profile (workload dependent for throughput/latency).
tuned-adm profile network-throughput
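Values set with `sysctl -w` do not survive a reboot. A minimal sketch of one way to keep them is shown below, assuming a distribution that reads /etc/sysctl.d at boot. The file name `90-net-tuning.conf` and the function `write_sysctl_conf` are hypothetical names chosen for illustration; the values are the ones listed above.

```shell
#!/bin/sh
# Hypothetical file name: any *.conf file in /etc/sysctl.d would serve.
CONF=${CONF:-/etc/sysctl.d/90-net-tuning.conf}

# Write the three network settings from the list above to the file given as $1.
write_sysctl_conf() {
    cat > "$1" <<'EOF'
# Core socket buffer ceilings and device backlog (values from the tuning list)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 8192
EOF
}

# Example (as root): write the file, then load it without rebooting.
#   write_sysctl_conf "$CONF" && sysctl -p "$CONF"
```

The block only defines the function, so sourcing it changes nothing until the example line is run as root.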
Recommended adapter level tunings for optimized performance:
- Limit number of queues to use for application traffic. Use the minimum number of queues required to keep the associated CPU cores active to prevent them from entering deeper idle states (adjust for the workload):
ethtool -L <ethX> combined 32
- Set interrupt moderation rates.
ethtool -C <ethX> adaptive-rx off adaptive-tx off rx-usecs-high 50 rx-usecs 50 tx-usecs 50
Try adjusting the transmit/receive/high-priority coalescing timer higher (80/100/150/200) or lower (25/20/10/5) to find the optimal value for the workload.
- Set the Rx/Tx ring sizes.
ethtool -G <ethX> rx 4096 tx 4096
Note: If you see Rx packet drops with "ethtool -S <ethX> | grep drop", try reducing the ring size to less than 4096. Try to find the optimal value for the workload where packets aren't dropped.
- Set IRQ affinity. Use cores local to the NIC, or a specific core mapping (where the number of cores equals the number of queues set in the first step of this list).
systemctl stop irqbalance
set_irq_affinity -X local <ethX>
OR
set_irq_affinity -X <cores> <ethX>
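The coalescing search described above can be scripted. The following is a rough sketch under stated assumptions, not a procedure from the guide. The helper names (`run`, `set_coalesce`, `sweep_coalesce`, `local_cpus`), the `DRY_RUN` and `SYSFS_NET` variables, and the commented-out `run_benchmark` step are all hypothetical. It applies the same value to rx-usecs, tx-usecs, and rx-usecs-high, as the example command does, and by default only prints the commands. The `local_cpus` helper reads the standard `local_cpulist` sysfs attribute to show which cores are local to the NIC.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints commands; set DRY_RUN=0 to execute them as root.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Apply one fixed coalescing value (microseconds) to Rx, Tx, and rx-usecs-high.
set_coalesce() {
    iface=$1
    usecs=$2
    run ethtool -C "$iface" adaptive-rx off adaptive-tx off \
        rx-usecs-high "$usecs" rx-usecs "$usecs" tx-usecs "$usecs"
}

# Step through the candidate values suggested in the text, lowest to highest.
sweep_coalesce() {
    iface=$1
    for usecs in 5 10 20 25 50 80 100 150 200; do
        set_coalesce "$iface" "$usecs"
        # run_benchmark "$iface" "$usecs"   # hypothetical: measure the workload here
    done
}

# Print the CPUs local to the NIC (empty output if the attribute is missing).
local_cpus() {
    cat "${SYSFS_NET:-/sys/class/net}/$1/device/local_cpulist" 2>/dev/null
}
```

For example, `sweep_coalesce eth0` prints nine `ethtool -C` commands for review, and `local_cpus eth0` prints a core list that can inform the `<cores>` argument of `set_irq_affinity`.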