Intel® Ethernet 800 Series Linux Flow Control

Configuration Guide for RDMA Use Cases

ID 635330
Date 07/13/2023
Version 1.3

Example 5 - PCP PFC with Multiple TCs (1 for RDMA, 1 for LAN) – with VLANs

This example describes how to run both RDMA and LAN traffic on the same link using VLANs.

These steps can be used in a back-to-back configuration. If you are using a switch, configure the neighboring switch ports for the same configuration (consult the appropriate switch manual for more detail).

Settings in this example:

  • Non-willing mode — In this example, adapter settings are configured explicitly using lldptool (vs. configuring DCB on a switch and using willing mode on adapters).
  • Software DCB — Required to use non-willing mode.
  • Three traffic classes:
    • 1 lossy TC for general LAN traffic on the parent interface.
    • 1 loss-less TC for RDMA traffic on VLAN 100.
    • 1 lossy TC for LAN traffic on VLAN 200.
  • PFC enabled for only the RDMA traffic class (this makes it loss-less).

Perform the following steps on both servers:

  1. Disable LFC (LFC and PFC cannot co-exist).

    # ethtool -A <interface> rx off tx off

  2. Verify that LFC is disabled.

    # ethtool -a <interface>
    Pause parameters for <interface>:
    Autonegotiate: on
    RX: off
    TX: off
    RX negotiated: off
    TX negotiated: off

  3. Configure the adapter for software DCB mode by disabling firmware DCB mode.

    # ethtool --set-priv-flags <interface> fw-lldp-agent off

  4. Verify that firmware DCB is disabled.

    # ethtool --show-priv-flags <interface> | grep fw-lldp-agent
    fw-lldp-agent : off
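
The two verification steps above can be scripted. The following is a minimal sketch, assuming the ethtool output has the same layout as the samples in steps 2 and 4. The function names lfc_is_off and fw_lldp_is_off are hypothetical helpers, not part of ethtool.

```shell
#!/bin/sh
# lfc_is_off: read `ethtool -a <interface>` output on stdin and succeed only
# if both the "RX:" and "TX:" pause settings are reported as "off".
# Lines such as "RX negotiated: off" are ignored (first field is "RX", not "RX:").
lfc_is_off() {
    awk '
        $1 == "RX:" { rx = $2 }
        $1 == "TX:" { tx = $2 }
        END { exit !(rx == "off" && tx == "off") }
    '
}

# fw_lldp_is_off: read `ethtool --show-priv-flags <interface>` output on stdin
# and succeed only if the fw-lldp-agent flag is reported as "off".
fw_lldp_is_off() {
    awk '
        $1 == "fw-lldp-agent" { state = $NF }
        END { exit !(state == "off") }
    '
}

# Example usage (run as root on the host being configured):
#   ethtool -a eth0 | lfc_is_off && echo "LFC is disabled"
#   ethtool --show-priv-flags eth0 | fw_lldp_is_off && echo "Firmware LLDP agent is off"
```

Both helpers only parse text, so they can be checked against saved output before being run on a live system.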
  5. Install OpenLLDP (the software that controls PFC and other DCB settings), if not already installed.
    • RHEL: # yum install lldpad
    • SLES or Ubuntu: installing lldpad with zypper or apt-get might work (untested).
    • All operating systems:

      Download and build from source from https://github.com/intel/openlldp.

  6. Start the LLDP daemon.

    # lldpad -d

  7. Verify LLDP is active by showing current LLDP settings on the interface.

    The following example shows the OpenLLDP default:

    # lldptool -ti <interface>
    Chassis ID TLV
        MAC: 68:05:ca:a3:89:78
    Port ID TLV
        MAC: 68:05:ca:a3:89:78
    Time to Live TLV
        120
    IEEE 8021QAZ ETS Configuration TLV
        Willing: yes
        CBS: not supported
        MAX_TCS: 8
        PRIO_MAP: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
        TC Bandwidth: 0% 0% 0% 0% 0% 0% 0% 0%
        TSA_MAP: 0:strict 1:strict 2:strict 3:strict 4:strict 5:strict 6:strict 7:strict
    IEEE 8021QAZ PFC TLV
        Willing: yes
        MACsec Bypass Capable: no
        PFC capable traffic classes: 8
        PFC enabled: none
    End of LLDPDU TLV

  8. Plan your DCB configuration.
    • RDMA is loss-less (PFC enabled).
    • LAN traffic is lossy (PFC disabled).

    Traffic Stream    Loss-less  TC       Priority    ToS    Bandwidth  Interface
    General Traffic   No         0        0           0      10%        Parent
    RDMA Application  Yes        1        2           8      50%        VLAN 100
    LAN Application   No         2        3           0 (a)  40%        VLAN 200
    Unused            No         Any (b)  All Others  N/A    0%         N/A

    Notes:
    a. LAN traffic can set VLAN priority directly using egress-qos-map when configuring the interface, so ToS mappings are not required.
    b. Unused priorities can be mapped to any TC (no traffic is being steered to specific priorities). Leaving them mapped to TC 0 is acceptable.
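
Before applying a plan like the one above, it can help to sanity-check the bandwidth column. The following is a minimal sketch, assuming the list is written in the comma-separated form used by the tcbw= argument in the next step, and assuming that ETS expects eight entries that add up to 100%. The function names tcbw_total and tcbw_is_valid are hypothetical helpers, not part of lldptool.

```shell
#!/bin/sh
# tcbw_total: print the sum of a comma-separated bandwidth list,
# for example "10,50,40,0,0,0,0,0".
tcbw_total() {
    echo "$1" | awk -F, '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }'
}

# tcbw_is_valid: succeed only if the list has exactly 8 entries (one per TC)
# and the entries add up to 100. Both conditions are assumptions of this sketch.
tcbw_is_valid() {
    [ "$(echo "$1" | awk -F, '{ print NF }')" -eq 8 ] &&
        [ "$(tcbw_total "$1")" -eq 100 ]
}

# Example usage:
#   tcbw_is_valid 10,50,40,0,0,0,0,0 && echo "bandwidth plan looks consistent"
```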
  9. Configure ETS.
    • Map priorities to traffic classes.
    • Allocate bandwidth.
    Note: The following is a single long command line:

    # lldptool -Ti <interface> -V ETS-CFG willing=no \
        up2tc=0:0,1:0,2:1,3:2,4:0,5:0,6:0,7:0 \
        tsa=0:ets,1:ets,2:ets,3:strict,4:strict,5:strict,6:strict,7:strict \
        tcbw=10,50,40,0,0,0,0,0

    Output:

    willing = no
    up2tc = 0:0,1:0,2:1,3:2,4:0,5:0,6:0,7:0
    TSA = 0:ets 1:ets 2:ets 3:strict 4:strict 5:strict 6:strict 7:strict
    tcbw = 10% 50% 40% 0% 0% 0% 0% 0%
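
The up2tc string is easy to misread, so a small lookup can confirm which traffic class a given priority lands in. This is a minimal sketch that only parses the string. The function name tc_for_prio is a hypothetical helper, not part of lldptool.

```shell
#!/bin/sh
# tc_for_prio: given an up2tc map ("prio:tc" pairs separated by commas) and a
# priority, print the traffic class that priority is mapped to.
tc_for_prio() {
    echo "$1" | tr ',' '\n' | awk -F: -v p="$2" '$1 == p { print $2 }'
}

# Example usage with the map from this step:
#   tc_for_prio 0:0,1:0,2:1,3:2,4:0,5:0,6:0,7:0 2    # RDMA priority
#   tc_for_prio 0:0,1:0,2:1,3:2,4:0,5:0,6:0,7:0 3    # VLAN 200 LAN priority
```

With the map used in this example, priority 2 resolves to TC 1 and priority 3 resolves to TC 2, which matches the plan in step 8.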
  10. Enable PFC on Priority 2 (non-willing).

    # lldptool -Ti <interface> -V PFC willing=no enabled=2

    Output:

    willing = no
    prio = 2

  11. Verify new settings.

    # lldptool -ti <interface>
    Chassis ID TLV
        MAC: 68:05:ca:a3:89:78
    Port ID TLV
        MAC: 68:05:ca:a3:89:78
    Time to Live TLV
        120
    IEEE 8021QAZ ETS Configuration TLV
        Willing: no
        CBS: not supported
        MAX_TCS: 8
        PRIO_MAP: 0:0 1:0 2:1 3:2 4:0 5:0 6:0 7:0
        TC Bandwidth: 10% 50% 40% 0% 0% 0% 0% 0%
        TSA_MAP: 0:ets 1:ets 2:ets 3:strict 4:strict 5:strict 6:strict 7:strict
    IEEE 8021QAZ PFC TLV
        Willing: no
        MACsec Bypass Capable: no
        PFC capable traffic classes: 8
        PFC enabled: 0x4
    End of LLDPDU TLV

    Note: The PFC enabled value is a mask. 0x4 = 0b0000_0100, meaning PFC is enabled on priority 2 (which this example maps to TC 1).
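
The PFC mask can be decoded with a few lines of shell arithmetic. The following is a minimal sketch, assuming bit n of the mask corresponds to priority n (consistent with enabled=2 producing 0x4 in this example). The function name pfc_mask_to_prios is a hypothetical helper.

```shell
#!/bin/sh
# pfc_mask_to_prios: print the priorities (0-7) whose bits are set in a
# PFC-enabled mask. The mask may be decimal or hexadecimal (for example 0x4).
pfc_mask_to_prios() {
    mask=$(($1))
    prios=""
    p=0
    while [ "$p" -le 7 ]; do
        # Test bit p of the mask.
        if [ $(( (mask >> p) & 1 )) -eq 1 ]; then
            prios="${prios:+$prios }$p"
        fi
        p=$((p + 1))
    done
    echo "$prios"
}

# Example usage:
#   pfc_mask_to_prios 0x4
```

For the mask reported in this step, the helper prints a single priority, 2.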
  12. Repeat DCB settings on the neighbor node:
    • If using a back-to-back configuration, either repeat the DCB configuration on the other host or enable willing mode on that host.
    • If using a switch, configure the same DCB scheme on the switch port. Consult the appropriate switch manual for details.
  13. Create VLAN 100 for RDMA traffic:
    1. Create VLAN 100 as a part of the parent interface (this example uses parent interface eth0, which has an IP address of 192.168.0.3).

      # ip link add eth0.100 link eth0 type vlan id 100

    2. Bring the new interface up.

      # ip link set eth0.100 up

    3. Set the IP address on the new interface. In the example address, the third octet is 100, the same as the VLAN ID, but the values do not need to match. However, the VLAN IP address does need to be in a different subnet than the parent address.

      # ip address add dev eth0.100 192.168.100.3/24

  14. Create VLAN 200 (for LAN traffic) with egress-qos-map set:
    1. Create VLAN 200 as a part of the same parent interface (still using eth0 in the example).
    2. Use egress-qos-map to map all VLAN 200 LAN traffic to priority 3 in the VLAN header (see man ip-link for documentation).

      # ip link add eth0.200 link eth0 type vlan id 200 egress-qos-map 0:3 1:3 2:3 3:3 4:3 5:3 6:3 7:3

    3. Bring the new interface up.

      # ip link set eth0.200 up

    4. Set the IP address on the new interface. In the example address, the third octet is 200, the same as the VLAN ID, but the values do not need to match.

      # ip address add dev eth0.200 192.168.200.3/24

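
Because steps 13 and 14 repeat the same three ip commands with different values, a dry-run generator can reduce copy-and-paste mistakes, such as assigning an address to the wrong VLAN interface. The following is a minimal sketch that only prints the commands and does not execute them. The function name vlan_setup_cmds and its argument order are hypothetical.

```shell
#!/bin/sh
# vlan_setup_cmds PARENT VLAN_ID ADDRESS [PRIORITY]
# Print (do not run) the ip commands that create the VLAN interface, bring it
# up, and assign its address. If PRIORITY is given, every egress priority
# 0-7 is mapped to it with egress-qos-map, as in step 14.
vlan_setup_cmds() {
    parent=$1
    vid=$2
    addr=$3
    prio=${4:-}
    cmd="ip link add $parent.$vid link $parent type vlan id $vid"
    if [ -n "$prio" ]; then
        cmd="$cmd egress-qos-map"
        p=0
        while [ "$p" -le 7 ]; do
            cmd="$cmd $p:$prio"
            p=$((p + 1))
        done
    fi
    echo "$cmd"
    echo "ip link set $parent.$vid up"
    echo "ip address add dev $parent.$vid $addr"
}

# Example usage (review the printed commands, then run them as root):
#   vlan_setup_cmds eth0 100 192.168.100.3/24
#   vlan_setup_cmds eth0 200 192.168.200.3/24 3
```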
  15. Verify new interfaces:
    1. Examine the output of ip address show and verify that both new VLANs are up and have the right IP address. In this sample output, the parent interface is named enp175s0f0 rather than eth0.

      10: enp175s0f0.100@enp175s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
          link/ether 68:05:ca:a3:89:78 brd ff:ff:ff:ff:ff:ff
          inet 192.168.100.1/24 scope global enp175s0f0.100
             valid_lft forever preferred_lft forever
          inet6 fe80::6a05:caff:fea3:8978/64 scope link
             valid_lft forever preferred_lft forever
      12: enp175s0f0.200@enp175s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
          link/ether 68:05:ca:a3:89:78 brd ff:ff:ff:ff:ff:ff
          inet 192.168.200.3/24 scope global enp175s0f0.200
             valid_lft forever preferred_lft forever
          inet6 fe80::6a05:caff:fea3:8978/64 scope link
             valid_lft forever preferred_lft forever

    2. Verify the egress mappings for VLAN 200.

      # cat /proc/net/vlan/eth0.200 | grep EGRESS
      EGRESS priority mappings: 0:3 1:3 2:3 3:3 4:3 5:3 6:3 7:3

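
The egress mapping check can also be automated. The following is a minimal sketch, assuming the EGRESS line has the same layout as the sample above (a three-word label followed by eight from:to pairs). The function name egress_all_map_to is a hypothetical helper.

```shell
#!/bin/sh
# egress_all_map_to PRIORITY
# Read the contents of /proc/net/vlan/<vlan-interface> on stdin and succeed
# only if the EGRESS line lists 8 mappings that all point to PRIORITY.
egress_all_map_to() {
    awk -v want="$1" '
        /EGRESS priority mappings:/ {
            found = 1
            # Fields 1-3 are the label; fields 4 onward are "from:to" pairs.
            if (NF != 11) bad = 1
            for (i = 4; i <= NF; i++) {
                split($i, kv, ":")
                if (kv[2] != want) bad = 1
            }
        }
        END { exit !(found && !bad) }
    '
}

# Example usage:
#   egress_all_map_to 3 < /proc/net/vlan/eth0.200 && echo "VLAN 200 egress map OK"
```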
  16. Repeat VLAN settings on the neighbor node:
    • If using a back-to-back configuration, configure the same VLANs on the other host.
    • If using a switch, consult the appropriate switch manual for details.

    Sample commands from an Arista 7060CX:

    1. Create VLANs 100 and 200.

      switch>enable
      switch#config
      switch(config)#vlan 100
      switch(config)#vlan 200

    2. Set the switch ports where adapters are connected to trunk mode. For example, for port 21/1:

      switch(config)#interface Et21/1
      switch(config-if-Et21/1)#switchport mode trunk

    3. Add the VLANs to the switch ports where adapters are connected.

      switch(config-if-Et21/1)#switchport trunk allowed vlan 1,100,200

    4. Show current VLANs (VLAN 1 always exists by default).

      switch> show vlan
      VLAN  Name      Status  Ports
      1     default   active  Et21/1, Et23/1
      100   VLAN0100  active  Et21/1, Et23/1
      200   VLAN0200  active  Et21/1, Et23/1

      Note: To undo a setting, preface the command with “no”.
      • To delete a VLAN: switch(config)#no vlan 100
      • To remove trunk mode: switch(config-if-Et23/1)#no switchport mode trunk
  17. Run the applications.
    Traffic Stream    Interface  Example IP Address  TC  Priority  ToS
    General Traffic   Parent     192.168.0.1         0   0         0
    RDMA Application  VLAN 100   192.168.100.1       1   2         8
    LAN Application   VLAN 200   192.168.200.1       2   3         0

    Set Application Priority:
    • General Traffic: Run normally on 192.168.0.1; no ToS options are needed. Priority 0 and TC 0 are the defaults.
    • RDMA Application: Run on 192.168.100.1 and set ToS=8 on the application command line. Alternatively, if using RoCEv2, set default_roce_tos=8 (search this article for default_roce_tos to find the syntax). This sets ToS=8 for all RoCEv2 traffic, so you do not need the application command-line option.
    • LAN Application: Run on 192.168.200.1 normally. No command-line ToS options are needed because egress-qos-map is set to use priority 3.