Intel® Ethernet Adapters and Devices User Guide

ID: 705831
Date: 08/30/2024
Classification: Public

Optimizing Performance

You can configure advanced settings on Intel network adapters to help optimize server performance. This section provides general optimization tips and tips for specific usage models.

Note:
  • Linux users: See the README file in the Linux driver package for Linux-specific performance enhancement details.

  • The recommendations below are guidelines and should be treated as such. Additional factors such as installed applications, bus type, network topology, and operating system also affect system performance.

  • These adjustments should be performed by a highly skilled network administrator. They are not guaranteed to improve performance. Not all settings shown here may be available through network driver configuration, operating system or system BIOS.

  • When using performance test software, refer to the documentation of the application for optimal results.

General Optimization

  • Install the adapter in an appropriate slot.

    Note:

    Some PCIe x8 slots are actually configured as x4 slots. These slots have insufficient bandwidth for full line rate with some dual port devices. The driver can detect this situation and will write the following message in the system log: “PCI-Express bandwidth available for this card is not sufficient for optimal performance. For optimal performance a x8 PCI-Express slot is required.” If this error occurs, moving your adapter to a true x8 slot will resolve the issue.

  • For an Intel Ethernet 700 Series adapter to reach its full potential, you must install it in a PCIe Gen3 x8 slot. Installing it in a shorter slot, or a Gen2 or Gen1 slot, will impact the throughput the adapter can attain.

  • Use the proper cabling for your device.

  • Increase the number of TCP and Socket resources from the default value. For Windows-based systems, we have not identified system parameters other than the TCP Window Size that significantly impact performance.

  • Increase the allocation size of Driver Resources (transmit/receive buffers). However, most TCP traffic patterns work best with the transmit buffer set to its default value, and the receive buffer set to its minimum value.
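
On Windows, the slot and buffer points above can be inspected from PowerShell. The following is a minimal sketch rather than a definitive procedure. The adapter name "Ethernet 1" is a placeholder, and `*ReceiveBuffers` and `*TransmitBuffers` are the standardized NDIS keywords, which a given driver may or may not expose.

```powershell
# Placeholder adapter name; list the real names with Get-NetAdapter.
$adapter = 'Ethernet 1'

# Show the negotiated PCIe link. A width below 8 for an x8 adapter
# suggests the slot is wired with fewer lanes than its physical size.
Get-NetAdapterHardwareInfo -Name $adapter |
    Select-Object Name, PcieLinkSpeed, PcieLinkWidth

# Show the current receive and transmit buffer settings.
# This reports an error if the driver does not expose these keywords.
Get-NetAdapterAdvancedProperty -Name $adapter -RegistryKeyword '*ReceiveBuffers', '*TransmitBuffers' |
    Select-Object DisplayName, DisplayValue
```

Both commands only read settings, so they are safe to run before deciding whether to move the adapter or change buffer sizes.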

Jumbo Frames

Enabling jumbo frames may increase throughput. You must enable jumbo frames on all of your network components to get any benefit.
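
As a hedged illustration for Windows, the sketch below lists the jumbo packet sizes a driver accepts, previews a change, and then checks the path to a peer. The adapter name, the value 9014, and the peer address 192.0.2.10 are assumptions for illustration. `*JumboPacket` is the standardized NDIS keyword, and the sizes a driver accepts vary.

```powershell
$adapter = 'Ethernet 1'   # placeholder adapter name

# List the jumbo packet sizes this driver accepts before changing anything.
Get-NetAdapterAdvancedProperty -Name $adapter -RegistryKeyword '*JumboPacket' |
    Select-Object DisplayValue, RegistryValue, ValidRegistryValues

# Preview the change. 9014 is an assumed value; choose one the driver lists.
# Remove -WhatIf to apply it.
Set-NetAdapterAdvancedProperty -Name $adapter -RegistryKeyword '*JumboPacket' -RegistryValue 9014 -WhatIf

# Check the path end to end: 8972 bytes of ICMP payload plus 28 bytes of
# IP and ICMP headers make a 9000-byte packet, sent with Don't Fragment set.
ping -f -l 8972 192.0.2.10   # placeholder peer address
```

If the ping reports that the packet needs to be fragmented, at least one component on the path is not passing jumbo frames.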

RSS Queues

If you have multiple 10Gbps (or faster) ports installed in a system, the RSS queues of each adapter port can be adjusted to use non-overlapping sets of processors within the adapter’s local Non-Uniform Memory Access (NUMA) Node/Socket. Change the RSS Base Processor Number for each adapter port so that the combination of the base processor and the max number of RSS processors settings ensures non-overlapping cores.

For Microsoft Windows systems, do the following:

  1. Identify the adapter ports to be adjusted and inspect their RssProcessorArray using the Get-NetAdapterRSS PowerShell cmdlet.

  2. Identify the processors with NUMA distance 0. These are the cores in the adapter’s local NUMA Node/Socket and will provide the best performance.

  3. Adjust the RSS Base processor on each port to use a non-overlapping set of processors within the local set of processors. You can do this manually or using the following PowerShell command:

    Set-NetAdapterAdvancedProperty -Name <Adapter Name> -DisplayName "RSS Base Processor Number" -DisplayValue <RSS Base Proc Value>
    
  4. Use the Get-NetAdapterAdvancedProperty cmdlet to check that the right values have been set:

    Get-NetAdapterAdvancedProperty -Name <Adapter Name>
    

For example, for a 4-port adapter with local processors 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 and a Max RSS processors setting of 8, set the RSS base processors to 0, 8, 16, and 24.
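
The arithmetic in this example can be sketched in PowerShell. The function name `Get-RssBaseProcessor` and the port names are hypothetical. Spacing the bases by the Max RSS processors value reproduces the example, but treating that as a general rule is an assumption, so check the result against the output of Get-NetAdapterRSS on your system.

```powershell
# Return one RSS base processor per port, spaced so the ranges do not overlap.
function Get-RssBaseProcessor {
    param(
        [int]$PortCount,          # number of adapter ports to configure
        [int]$MaxRssProcessors,   # "Max RSS processors" setting per port
        [int]$FirstProcessor = 0  # lowest processor number in the local NUMA node
    )
    for ($port = 0; $port -lt $PortCount; $port++) {
        $FirstProcessor + ($port * $MaxRssProcessors)
    }
}

$bases = Get-RssBaseProcessor -PortCount 4 -MaxRssProcessors 8
$bases -join ', '   # 0, 8, 16, 24

# Hypothetical port names; replace them with the names from Get-NetAdapter.
$ports = 'Ethernet 1', 'Ethernet 2', 'Ethernet 3', 'Ethernet 4'

# Preview the change on systems that have the NetAdapter cmdlets.
# Remove -WhatIf to apply it.
if (Get-Command Set-NetAdapterAdvancedProperty -ErrorAction SilentlyContinue) {
    for ($i = 0; $i -lt $ports.Count; $i++) {
        Set-NetAdapterAdvancedProperty -Name $ports[$i] -DisplayName 'RSS Base Processor Number' -DisplayValue $bases[$i] -WhatIf
    }
}
```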

CPU Affinity

When passing traffic on multiple network ports using an I/O application that runs on most or all of the cores in your system, consider setting the CPU Affinity for that application to fewer cores. This should reduce CPU utilization and in some cases may increase throughput for the device. The cores selected for CPU Affinity must be local to the affected network device’s Processor Node/Group. You can use the PowerShell command Get-NetAdapterRSS to list the cores that are local to a device. You may need to increase the number of cores assigned to the application to maximize throughput. Refer to your operating system documentation for more details on setting the CPU Affinity.
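
One way to set the affinity from PowerShell is through the `ProcessorAffinity` property of a process object, which takes a bitmask with one bit per logical processor. The sketch below is illustrative only. The helper `Get-AffinityMask`, the process name `MyIoApp`, and the core list 0, 2, 4, 6 are assumptions, and the mask covers processors within a single processor group.

```powershell
# Build a processor affinity bitmask from a list of logical processor numbers.
function Get-AffinityMask {
    param([int[]]$Processor)
    [long]$mask = 0
    foreach ($p in $Processor) {
        # Set the bit that corresponds to this logical processor.
        $mask = $mask -bor ([long]1 -shl $p)
    }
    $mask
}

# Cores assumed to be local to the adapter, as listed by Get-NetAdapterRSS.
$mask = Get-AffinityMask -Processor 0, 2, 4, 6   # 85 (0x55)

# Apply the mask to each instance of a hypothetical application, if running.
Get-Process -Name 'MyIoApp' -ErrorAction SilentlyContinue |
    ForEach-Object { $_.ProcessorAffinity = [IntPtr]::new($mask) }
```

An affinity set this way lasts only for the lifetime of the process, so it must be reapplied each time the application starts.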

Optimization for Specific Usage Models

The following sections describe possible tasks you can try to optimize performance for specific server usage models.

Optimize for Quick Response and Low Latency

Useful for: Video, audio, and High Performance Computing Cluster (HPCC) servers

  • Minimize or disable interrupt moderation rate.

  • Disable TCP segmentation offload.

  • Disable jumbo packets.

  • Increase transmit descriptors.

  • Increase receive descriptors.

  • Increase RSS queues.
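
Some of the items above can be previewed from PowerShell on Windows. This is a sketch under stated assumptions: the adapter name and the queue count of 8 are placeholders, and `*InterruptModeration` is the standardized NDIS keyword, which only turns moderation on or off. A driver may expose a finer-grained moderation rate under its own display name.

```powershell
$adapter = 'Ethernet 1'   # placeholder adapter name

# Disable interrupt moderation (0 = disabled for the standardized keyword).
Set-NetAdapterAdvancedProperty -Name $adapter -RegistryKeyword '*InterruptModeration' -RegistryValue 0 -WhatIf

# Disable TCP segmentation offload (large send offload).
Disable-NetAdapterLso -Name $adapter -WhatIf

# Raise the number of RSS queues. 8 is an assumed value; the supported
# range depends on the adapter and driver.
Set-NetAdapterRss -Name $adapter -NumberOfReceiveQueues 8 -WhatIf
```

Each command uses -WhatIf so that nothing changes until the switch is removed.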

Optimize for Throughput

Useful for: Data backup/retrieval and file servers

  • Enable jumbo packets.

  • Increase transmit descriptors.

  • Increase receive descriptors.

  • On systems that support NUMA, set the Preferred NUMA Node on each adapter to achieve better scaling across NUMA nodes.
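
For the NUMA item, the following sketch shows which node each adapter is attached to and previews a preferred node for one port. The adapter name and node number are placeholders. Whether `Set-NetAdapterRss -NumaNode` corresponds exactly to a driver's Preferred NUMA Node setting is an assumption to verify on your system.

```powershell
# Show the NUMA node that each adapter's PCIe slot is attached to.
Get-NetAdapterHardwareInfo | Select-Object Name, NumaNode

# Preview setting the preferred NUMA node for one port (node 0 assumed).
# Remove -WhatIf to apply it.
Set-NetAdapterRss -Name 'Ethernet 1' -NumaNode 0 -WhatIf
```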

Optimize for CPU Utilization

Useful for: Application, web, mail, and database servers

  • Maximize interrupt moderation rate.

  • Keep the default setting for the number of receive descriptors; avoid setting large numbers of receive descriptors.

  • Decrease RSS queues.

  • In Hyper-V environments, decrease the max number of RSS CPUs.