Performance Index

ID Date Classification
615781 12/05/2024 Public
3rd Generation Intel® Xeon® Scalable Processors

Performance varies by use, configuration and other factors.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See configuration disclosure for details. No product or component can be absolutely secure.

Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.

Estimates of SPECrate®2017_int_base and SPECrate®2017_fp_base based on Intel internal measurements. SPEC®, SPECrate® and SPEC CPU® are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.

Claim Processor Family System Configuration Measurement Measurement Period
[146] ICL-8380 shows performance improvements from 1.7x up to 2.5x on OpenVINO™ 2024.2 versus release 2024.1. Mistral-7b-v0.1: 2.3x; Llama-2-7b-chat: 1.7x; Falcon-7b-instruct: 2.5x; ChatGLM2-6b: 2.3x

Intel® Xeon® Platinum 8380

Motherboard: M50CYP2SB1U Coyote Pass
CPU: Intel® Xeon® Platinum 8380 CPU @ 2.30GHz
Hyper Threading: on
Turbo Setting: on
Memory: 16 x 16 GB DDR4 3200MHz
Operating System: Ubuntu* 22.04.4 LTS
Kernel version: 6.5.0-28-generic
BIOS Vendor: Intel Corporation
BIOS Version: SE5C620.86B.01.01.0006.2207150335
BIOS Release: 7/15/2022
NUMA nodes: 2
Precision: INT8/FP32
Number of concurrent inference requests: 80
Test Date: 6/3/2024

Intel® Xeon® Platinum 8380, INT8
Model name | Gain factor for INT8 | OV-2024.2 | OV-2024.1 | Metric | Input token length | Output token length
ChatGLM2-6B | 2.3 | 21.10 | 9.02 | tokens/sec | 1024 | 128
Falcon-7b-instruct | 2.5 | 18.14 | 7.35 | tokens/sec | 32 | 128
Llama-2-7b-chat | 1.7 | 16.61 | 10.06 | tokens/sec | 1024 | 128
Mistral-7b-v0.1 | 2.3 | 16.76 | 7.21 | tokens/sec | 1024 | 128

OV-2024.1: April 2024
OV-2024.2: June 2024
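The gain factors in claim [146] appear to be the OV-2024.2 throughput divided by the OV-2024.1 throughput for each model. A minimal Python sketch of that arithmetic, using the INT8 tokens/sec figures listed for claim [146]; the names `THROUGHPUT` and `gain_factor` are illustrative and not part of any Intel or OpenVINO tool, and rounding to one decimal place is an assumption about how the table was prepared:

```python
# Tokens/sec from the claim [146] INT8 table, as (OV-2024.2, OV-2024.1).
THROUGHPUT = {
    "ChatGLM2-6B": (21.10, 9.02),
    "Falcon-7b-instruct": (18.14, 7.35),
    "Llama-2-7b-chat": (16.61, 10.06),
    "Mistral-7b-v0.1": (16.76, 7.21),
}


def gain_factor(new, old):
    """Return new/old throughput, rounded to one decimal (assumed rounding)."""
    return round(new / old, 1)


# Gain factor per model, keyed by model name.
gains = {model: gain_factor(new, old) for model, (new, old) in THROUGHPUT.items()}

for model, gain in gains.items():
    print(f"{model}: {gain}x")
```

The same helper can be pointed at any other pair of throughput figures measured under identical settings.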
[145] Servers powered by Intel® Xeon® Processors will benefit from up to 2.5x performance boost for CPUs on 2nd token throughput with OpenVINO 2024.2

3rd generation Intel® Xeon® Platinum 8380 CPU

Inference Engines: Intel® Xeon® Platinum 8380
Motherboard: M50CYP2SB1U Coyote Pass
CPU: Intel® Xeon® Platinum 8380 CPU @ 2.30GHz
Hyper Threading: on
Turbo Setting: on
Memory: 16 x 16 GB DDR4 3200MHz
Operating System: Ubuntu* 22.04.4 LTS
Kernel version: 6.5.0-28-generic
BIOS Vendor: Intel Corporation
BIOS Version: SE5C620.86B.01.01.0006.2207150335
BIOS Release: 7/15/2022
NUMA nodes: 2
Precision: INT8/FP32
Number of concurrent inference requests: 80
Test Date: 6/3/2024

Model name | Gain factor for INT8 | OV-2024.2 | OV-2024.1 | Metric | Input token length | Output token length
ChatGLM2-6B | 2.3 | 28.72 | 9.02 | tokens/sec | 1024 | 128
Falcon-7b-instruct | 2.5 | 27.08 | 7.35 | tokens/sec | 32 | 128
Llama-2-7b-chat | 1.7 | 23.14 | 10.06 | tokens/sec | 1024 | 128
Mistral-7b-v0.1 | 2.3 | 25.08 | 7.21 | tokens/sec | 1024 | 128

OV-2024.1: April 2024
OV-2024.2: June 2024

[144] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton2 by up to 1.31x in Server-side Java (critical) (Intel Xeon based AWS c6i instance outperforms Graviton2 c6g for 16vCPU, 32vCPU, and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

1.27x more operations per second in Server-side Java workload (critical) for 16vCPU, 1.31x for 32vCPU, and 1.04x for 64vCPU, Workload Server-Side Java, Other SW: OpenJDK 17.0.1, Ubuntu 20.04.4 LTS, Kernel 5.19.0-1025-aws.

New: c6i.4xlarge 32 GB memory capacity/instance, c6i.8xlarge 64 GB memory capacity/instance, c6i.16xlarge 128 GB memory capacity/instance (Xeon).

Baseline: c6g.4xlarge 32 GB memory capacity/instance, c6g.8xlarge 64 GB memory capacity/instance, c6g.16xlarge 128 GB memory capacity/instance (Graviton2).

Server-side Java (Operations per second)

Test by Intel.

New: May 19, 2023

Baseline: May 19, 2023

[143] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton2 by up to 1.50x in database computations (Intel Xeon based AWS c6i instance outperforms Graviton2 c6g for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

1.44x more operations per second in MongoDB for 4vCPU, 1.50x for 8vCPU, 1.39x for 16vCPU, 1.40x for 32vCPU, and 1.19x for 64vCPU, Workload Intel Mongo Perf 4.4.10, Other SW: Ubuntu 20.04.4 LTS, Kernel 5.13.0-1019-aws, 5.13.0-1025-aws, 5.13.0-1017-aws.

New : c6i.xlarge 7 GB memory capacity/instance, c6i.2xlarge 16 GB memory capacity/instance, c6i.4xlarge 32 GB memory capacity/instance, c6i.8xlarge 64 GB memory capacity/instance, c6i.16xlarge 129 GB memory capacity/instance (Xeon).

Baseline : c6g.xlarge 7 GB memory capacity/instance, c6g.2xlarge 16 GB memory capacity/instance, c6g.4xlarge 32 GB memory capacity/instance, c6g.8xlarge 64 GB memory capacity/instance, c6g.16xlarge 129 GB memory capacity/instance (Graviton2).

Test by Intel on Apr 12, 2022.

Intel Mongo Perf (Operations per second) Test by Intel.

New: April 12, 2022

Baseline: April 12, 2022

[142] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton3 by up to 1.14x in database computations (Intel Xeon based AWS c6i instance outperforms Graviton3 c7g for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

1.05x more operations per second in MongoDB for 4vCPU, 1.14x for 8vCPU, 1.07x for 16vCPU, 1.10x for 32vCPU, and 1.01x for 64vCPU, Workload Intel Mongo Perf 4.4.10, Other SW: Ubuntu 20.04.4 LTS, Kernel 5.13.0-1019-aws, 5.13.0-1025-aws, 5.13.0-1017-aws.

New : c6i.xlarge 7 GB memory capacity/instance, c6i.2xlarge 16 GB memory capacity/instance, c6i.4xlarge 32 GB memory capacity/instance, c6i.8xlarge 64 GB memory capacity/instance, c6i.16xlarge 129 GB memory capacity/instance (Xeon).

Baseline : c7g.xlarge 7 GB memory capacity/instance, c7g.2xlarge 16 GB memory capacity/instance, c7g.4xlarge 32 GB memory capacity/instance, c7g.8xlarge 64 GB memory capacity/instance, c7g.16xlarge 129 GB memory capacity/instance (Graviton3).

Test by Intel on Apr 12 and May 25, 2022.

Intel Mongo Perf (Operations per second) Test by Intel.

New: April 12, 2022

Baseline: May 25, 2022

[141] 3rd Gen Intel® Xeon® Scalable processor delivers better scale-up performance than Graviton2 for MySQL new orders per minute (Intel Xeon based AWS m6i instance outperforms Graviton2 m6g for 8vCPU, 16vCPU, 32vCPU, 48vCPU, 64vCPU, 96vCPU, and 128vCPU).

3rd Generation Intel® Xeon® Platinum processor

1.3x the new orders per minute in MySQL for 8vCPU (Graviton2 = 1x relative perf), 2.5x for 16vCPU (Graviton2 = 1.8x 8vCPU relative perf), 4.7x for 32vCPU (Graviton2 = 2.9x 8vCPU relative perf), 6.2x for 48vCPU (Graviton2 = 3.4x 8vCPU relative perf), 7.8x for 64vCPU (Graviton2 = 3.5x 8vCPU relative perf), 11.9x for 96vCPU (M6g does not offer Graviton2 in 96vCPU size), and 14.3x for 128vCPU (M6g does not offer Graviton2 in 128vCPU size), Workload MySQL-8.0.25, Ubuntu 20.04.3 LTS, Kernel 5.11.0-1017-aws.

New : m6i.2xlarge 32 GB memory capacity/instance, m6i.4xlarge 64 GB memory capacity/instance, m6i.8xlarge 128 GB memory capacity/instance, m6i.12xlarge 192 GB memory capacity/instance, m6i.16xlarge 256 GB memory capacity/instance, m6i.24xlarge 384 GB memory capacity/instance, m6i.32xlarge 512 GB memory capacity/instance (Xeon).

Baseline : m6g.2xlarge 32 GB memory capacity/instance, m6g.4xlarge 64 GB memory capacity/instance, m6g.8xlarge 128 GB memory capacity/instance, m6g.12xlarge 192 GB memory capacity/instance, m6g.16xlarge 256 GB memory capacity/instance (Graviton2). M6g does not offer Graviton2 in 96vCPU or 128vCPU sizes.

Test by Intel on Oct 16, 2021.

HammerDB MySQL (New orders per minute) Test by Intel October 16, 2021.
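The scale-up figures in claim [141] are all normalized to the Graviton2 8vCPU result (1x). Dividing the Xeon figure by the Graviton2 figure at the same vCPU count gives a per-size advantage, and dividing a figure by the same platform's 8vCPU value shows how that platform scales. A short Python sketch of this arithmetic, using only the relative values quoted in the claim; the names are illustrative and the derived ratios are not published Intel figures:

```python
# New orders per minute relative to Graviton2 at 8 vCPUs (= 1x), from claim [141].
XEON = {8: 1.3, 16: 2.5, 32: 4.7, 48: 6.2, 64: 7.8, 96: 11.9, 128: 14.3}
GRAVITON2 = {8: 1.0, 16: 1.8, 32: 2.9, 48: 3.4, 64: 3.5}  # no 96/128 vCPU sizes


def advantage(vcpus):
    """Xeon-to-Graviton2 ratio at one size, or None when no baseline exists."""
    if vcpus not in GRAVITON2:
        return None
    return XEON[vcpus] / GRAVITON2[vcpus]


def self_scaling(series, vcpus):
    """Throughput at a given size relative to the same platform at 8 vCPUs."""
    return series[vcpus] / series[8]


for vcpus in XEON:
    ratio = advantage(vcpus)
    label = f"{ratio:.2f}x vs Graviton2" if ratio is not None else "no Graviton2 baseline"
    print(f"{vcpus} vCPU: Xeon {XEON[vcpus]}x, {label}")
```

Because the inputs are already rounded to one decimal place, the derived ratios should be read as approximate.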
[140] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton2, delivering up to 2.91x more connections per second (Intel Xeon based AWS c6i instance outperforms Graviton2 c6g for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

2.87x more connections per second in NGINX for 4vCPU, 2.85x for 8vCPU, 2.90x for 16vCPU, 2.91x for 32vCPU, and 2.76x for 64vCPU, Workload NGINX OpenSSL RSA2K Handshakes v1.24.2.intel-13-g5ae1948f, Other SW: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, ldd (Ubuntu GLIBC 2.31-0ubuntu9.7) 2.31, Ubuntu 20.04.4 LTS, Kernel 5.13.0-1019-aws.

New : c6i.xlarge 8 GB memory capacity/instance, c6i.2xlarge 16 GB memory capacity/instance, c6i.4xlarge 32 GB memory capacity/instance, c6i.8xlarge 64 GB memory capacity/instance, c6i.16xlarge 128 GB memory capacity/instance (Xeon).

Baseline : c6g.xlarge 8 GB memory capacity/instance, c6g.2xlarge 16 GB memory capacity/instance, c6g.4xlarge 32 GB memory capacity/instance, c6g.8xlarge 64 GB memory capacity/instance, c6g.16xlarge 128 GB memory capacity/instance (Graviton2).

Test by Intel on Mar 25-29, Apr 18, 2022.

NGINX OpenSSL (Connections per second) Test by Intel.

New: March 25, March 29, and April 18, 2022

Baseline: March 25, 2022

[139] 3rd Gen Intel® Xeon® Scalable processor with acceleration outperforms Graviton2, delivering up to 8.42x more connections per second (Intel Xeon based AWS c6i instance with QuickAssist Technology (QAT) engine outperforms Graviton2 c6g for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

7.95x more connections per second in NGINX for 4vCPU, 7.75x for 8vCPU, 7.85x for 16vCPU, 8.06x for 32vCPU, and 8.42x for 64vCPU, Workload NGINX OpenSSL RSA2K Handshakes v1.24.2.intel-13-g5ae1948f, Other SW: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, ldd (Ubuntu GLIBC 2.31-0ubuntu9.7) 2.31, Ubuntu 20.04.4 LTS, Kernel 5.13.0-1019-aws.

New : c6i.xlarge 8 GB memory capacity/instance, c6i.2xlarge 16 GB memory capacity/instance, c6i.4xlarge 32 GB memory capacity/instance, c6i.8xlarge 64 GB memory capacity/instance, c6i.16xlarge 128 GB memory capacity/instance (Xeon).

Baseline : c6g.xlarge 8 GB memory capacity/instance, c6g.2xlarge 16 GB memory capacity/instance, c6g.4xlarge 32 GB memory capacity/instance, c6g.8xlarge 64 GB memory capacity/instance, c6g.16xlarge 128 GB memory capacity/instance (Graviton2).

Test by Intel on Mar 25, Apr 18, 2022.

NGINX OpenSSL (Connections per second) Test by Intel.

New: March 25 and April 18, 2022

Baseline: March 25, 2022

[138] 3rd Gen Intel® Xeon® Scalable processor with acceleration outperforms Graviton3, delivering up to 2.67x more connections per second (Intel Xeon based AWS c6i instance with Intel® QuickAssist Technology (QAT) engine outperforms Graviton3 c7g for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

2.45x more connections per second in NGINX for 4vCPU, 2.40x for 8vCPU, 2.47x for 16vCPU, 2.60x for 32vCPU, and 2.67x for 64vCPU, Workload NGINX OpenSSL RSA2K Handshakes v1.24.2.intel-13-g5ae1948f, Other SW: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, ldd (Ubuntu GLIBC 2.31-0ubuntu9.7) 2.31, Ubuntu 20.04.4 LTS, Kernel 5.13.0-1023-aws.

New : c6i.xlarge 8 GB memory capacity/instance, c6i.2xlarge 16 GB memory capacity/instance, c6i.4xlarge 32 GB memory capacity/instance, c6i.8xlarge 64 GB memory capacity/instance, c6i.16xlarge 128 GB memory capacity/instance (Xeon).

Baseline : c7g.xlarge 8 GB memory capacity/instance, c7g.2xlarge 16 GB memory capacity/instance, c7g.4xlarge 32 GB memory capacity/instance, c7g.8xlarge 64 GB memory capacity/instance, c7g.16xlarge 128 GB memory capacity/instance (Graviton3).

Test by Intel on Mar 25, Apr 18, May 23, 2022.

NGINX OpenSSL (Connections per second) Test by Intel.

New: March 25 and April 18, 2022

Baseline: May 23, 2022

[137] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton2, delivering up to 2.93x more connections per second (Intel Xeon based AWS r6i instance outperforms Graviton2 r6g for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

2.84x more connections per second in NGINX for 4vCPU, 2.89x for 8vCPU, 2.93x for 16vCPU, 2.92x for 32vCPU, and 2.88x for 64vCPU, Workload NGINX OpenSSL RSA2K Handshakes v1.24.2.intel-13-g5ae1948f, Other SW: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, ldd (Ubuntu GLIBC 2.31-0ubuntu9.7) 2.31, Ubuntu 20.04.4 LTS, Kernel 5.13.0-1019-aws.

New : r6i.xlarge 32 GB memory capacity/instance, r6i.2xlarge 64 GB memory capacity/instance, r6i.4xlarge 128 GB memory capacity/instance, r6i.8xlarge 256 GB memory capacity/instance, r6i.16xlarge 512 GB memory capacity/instance (Xeon).

Baseline : r6g.xlarge 32 GB memory capacity/instance, r6g.2xlarge 65 GB memory capacity/instance, r6g.4xlarge 130 GB memory capacity/instance, r6g.8xlarge 260 GB memory capacity/instance, r6g.16xlarge 521 GB memory capacity/instance (Graviton2).

Test by Intel on Mar 25-29, Apr 18, 2022.

NGINX OpenSSL (Connections per second) Test by Intel.

New: March 29 and April 18, 2022

Baseline: March 25, 2022

[136] 3rd Gen Intel® Xeon® Scalable processor with acceleration outperforms Graviton2, delivering up to 8.45x more connections per second (Intel Xeon based AWS r6i instance with Intel® QuickAssist Technology (QAT) engine outperforms Graviton2 r6g for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

7.82x more connections per second in NGINX for 4vCPU, 7.83x for 8vCPU, 7.89x for 16vCPU, 8.15x for 32vCPU, and 8.45x for 64vCPU, Workload NGINX OpenSSL RSA2K Handshakes v1.24.2.intel-13-g5ae1948f, Other SW: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, ldd (Ubuntu GLIBC 2.31-0ubuntu9.7) 2.31, Ubuntu 20.04.4 LTS, Kernel 5.13.0-1019-aws.

New : r6i.xlarge 32 GB memory capacity/instance, r6i.2xlarge 64 GB memory capacity/instance, r6i.4xlarge 128 GB memory capacity/instance, r6i.8xlarge 256 GB memory capacity/instance, r6i.16xlarge 512 GB memory capacity/instance (Xeon).

Baseline : r6g.xlarge 32 GB memory capacity/instance, r6g.2xlarge 65 GB memory capacity/instance, r6g.4xlarge 130 GB memory capacity/instance, r6g.8xlarge 260 GB memory capacity/instance, r6g.16xlarge 521 GB memory capacity/instance (Graviton2).

Test by Intel on Mar 25-29, Apr 18, 2022.

NGINX OpenSSL (Connections per second) Test by Intel.

New: March 29 and April 18, 2022

Baseline: March 25, 2022

[135] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton2 by up to 1.54x in Server-side Java (critical) (Intel Xeon based AWS c6i instance outperforms Graviton2 c6g for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

1.10x more operations per second in Server-side Java workload (critical) for 4vCPU, 1.18x for 8vCPU, 1.39x for 16vCPU, 1.47x for 32vCPU, and 1.54x for 64vCPU, Workload Server-Side Java, Other SW: OpenJDK "16.0.1" 2021-04-20, Ubuntu 20.04.4 LTS, Kernel 5.13.0-1017-aws.

New : c6i.xlarge 8 GB memory capacity/instance, c6i.2xlarge 16 GB memory capacity/instance, c6i.4xlarge 32 GB memory capacity/instance, c6i.8xlarge 64 GB memory capacity/instance, c6i.16xlarge 128 GB memory capacity/instance (Xeon).

Baseline : c6g.xlarge 8 GB memory capacity/instance, c6g.2xlarge 16 GB memory capacity/instance, c6g.4xlarge 32 GB memory capacity/instance, c6g.8xlarge 64 GB memory capacity/instance, c6g.16xlarge 128 GB memory capacity/instance (Graviton2).

Test by Intel on Nov 30, 2021, Mar 14-22, 2022.

Server-side Java (Operations per second) Test by Intel.

New: November 30, 2021 and March 22, 2022

Baseline: March 14 and 15, 2022

[134] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton2 by up to 1.43x in Server-side Java (max) (Intel Xeon based AWS c6i instance outperforms Graviton2 c6g for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

1.43x more operations per second in Server-side Java workload (max) for 4vCPU, 1.14x for 8vCPU, 1.18x for 16vCPU, 1.10x for 32vCPU, and 1.08x for 64vCPU, Workload jbb103, Other SW: OpenJDK "16.0.1" 2021-04-20, Ubuntu 20.04.4 LTS, Kernel 5.13.0-1017-aws.

New : c6i.xlarge 8 GB memory capacity/instance, c6i.2xlarge 16 GB memory capacity/instance, c6i.4xlarge 32 GB memory capacity/instance, c6i.8xlarge 64 GB memory capacity/instance, c6i.16xlarge 128 GB memory capacity/instance (Xeon).

Baseline : c6g.xlarge 8 GB memory capacity/instance, c6g.2xlarge 16 GB memory capacity/instance, c6g.4xlarge 32 GB memory capacity/instance, c6g.8xlarge 64 GB memory capacity/instance, c6g.16xlarge 128 GB memory capacity/instance (Graviton2).

Test by Intel on Nov 30, 2021, Mar 14-22, 2022.

Server-side Java (Operations per second) Test by Intel.

New: November 30, 2021 and March 22, 2022

Baseline: March 14 and 15, 2022

[133] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton3 by up to 1.12x in Server-side Java (critical) (Intel Xeon based AWS c6i instance outperforms Graviton3 c7g for 32vCPU and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

1.12x more operations per second in Server-side Java workload (critical) for 32vCPU, 1.01x for 64vCPU, Workload Server-Side Java, Other SW: OpenJDK "16.0.1" 2021-04-20, Ubuntu 20.04.4 LTS, Kernel 5.13.0-1017-aws, 5.13.0-1025-aws.

New : c6i.8xlarge 64 GB memory capacity/instance, c6i.16xlarge 128 GB memory capacity/instance (Xeon).

Baseline : c7g.8xlarge 64 GB memory capacity/instance, c7g.16xlarge 129 GB memory capacity/instance (Graviton3).

Test by Intel on Nov 30, 2021, Mar 22, May 24-25, 2022.

Server-side Java (Operations per second) Test by Intel.

New: November 30, 2021 and March 22, 2022

Baseline: May 24 and 25, 2022

[132] 3rd Gen Intel® Xeon® Scalable processor with acceleration outperforms Graviton2, delivering up to 1.50x more transactions per second in WordPress (Intel Xeon based AWS c6i instance with Crypto NI outperforms Graviton2 c6g for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

1.42x more transactions per second in WordPress for 4vCPU, 1.41x for 8vCPU, 1.36x for 16vCPU, 1.50x for 32vCPU, and 1.33x for 64vCPU, Workload WordPress 5.9.3 Single-tier with PHP 8.0.18, HTTPS TLSv1.3, Other SW: Ubuntu 20.04.4 LTS, Crypto NI, Kernel 5.13.0-1025-aws.

New : c6i.xlarge 8 GB memory capacity/instance, c6i.2xlarge 16 GB memory capacity/instance, c6i.4xlarge 32 GB memory capacity/instance, c6i.8xlarge 64 GB memory capacity/instance, c6i.16xlarge 129 GB memory capacity/instance (Xeon).

Baseline : c6g.xlarge 8 GB memory capacity/instance, c6g.2xlarge 16 GB memory capacity/instance, c6g.4xlarge 32 GB memory capacity/instance, c6g.8xlarge 64 GB memory capacity/instance, c6g.16xlarge 129 GB memory capacity/instance (Graviton2).

Test by Intel on May 27-28, 2022.

WordPress (Transactions per second) Test by Intel.

New: May 27 and 28, 2022

Baseline: May 27 and 28, 2022

[131] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton2, delivering up to 1.34x more transactions per second in WordPress (Intel Xeon based AWS c6i instance outperforms Graviton2 c6g for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU).

3rd Generation Intel® Xeon® Platinum processor

1.22x more transactions per second in WordPress for 4vCPU, 1.19x for 8vCPU, 1.20x for 16vCPU, 1.34x for 32vCPU, and 1.19x for 64vCPU, Workload WordPress 5.9.3 Single-tier with PHP 8.0.18, HTTPS TLSv1.3, Other SW: Ubuntu 20.04.4 LTS, Kernel 5.13.0-1025-aws.

New : c6i.xlarge 8 GB memory capacity/instance, c6i.2xlarge 16 GB memory capacity/instance, c6i.4xlarge 32 GB memory capacity/instance, c6i.8xlarge 64 GB memory capacity/instance, c6i.16xlarge 129 GB memory capacity/instance (Xeon).

Baseline : c6g.xlarge 8 GB memory capacity/instance, c6g.2xlarge 16 GB memory capacity/instance, c6g.4xlarge 32 GB memory capacity/instance, c6g.8xlarge 64 GB memory capacity/instance, c6g.16xlarge 129 GB memory capacity/instance (Graviton2).

Test by Intel on May 26-28, 2022.

WordPress (Transactions per second) Test by Intel.

New: May 26 and 28, 2022

Baseline: May 27 and 28, 2022

[130] Up to 26x better performance per watt vs. AMD Milan for object detection

3rd Generation Intel® Xeon® Platinum processor

Up to 26x better performance per watt vs. AMD Milan for object detection (SSD-ResNet34)

New: 1-node Supermicro 220U-TNR, 2x Intel® Xeon® Platinum 8358 Scalable processor (32C), HT on, Turbo on, SNC off, Total Memory: 1 TB (16 slots/ 64GB/ 3200), ucode: x2a0, Intel X540-T2, 1 SATA SSD, 1x P5510 NVMe, SSD-RESNET34, Intel optimized TensorFlow v2.7, Ubuntu 20.04LTS (5.4.0-120-generic).

Baseline: 1-node Supermicro AS-2124US-TNRP, 2x AMD EPYC Processor (32C 7543), SMT On, Boost ON, NPS=1, Total Memory: 1 TB (16 slots/ 64GB/ 3200), ucode: 0xa00111d, Intel X540-T2, 1 SATA SSD, 1x P5510 NVMe, SSD-RESNET34, TF_v2.7_ZenDNN_v3.2, Ubuntu 20.04LTS (5.4.0-120-generic). Test by Intel on April 2022.

SSDResNet34 Batch Size = 1, INT8 Test by Intel on April 2022

[129] Up to 1.4x better performance per watt vs. AMD Milan for life science applications (geomean of LINPACK, NAMD, LAMMPS with Intel AVX-512)

3rd Generation Intel® Xeon® Gold processor

Up to 1.4x better performance per watt vs. AMD Milan for life science applications (geomean of LINPACK, NAMD, LAMMPS with Intel® AVX-512)

LINPACK:

New: 1-node Supermicro 220U-TNR, 2x Intel® Xeon® Platinum 8358 Scalable processor (32C), HT on (1T/core), Turbo on, SNC on, Total Memory: 1 TB (16 slots/ 64GB/ 3200), ucode: x2a0, Intel X540-T2, 1 SATA SSD, 1x P5510 NVMe, App Version: The Intel Distribution for LINPACK Benchmark; Build notes: Tools: Intel MPI 2019u7; threads/core: 1; Turbo: used; Build: build script from Intel Distribution for LINPACK package; 1 rank per NUMA node: 1 rank per socket, Ubuntu 20.04.3 (5.4.0-92-generic).

Baseline: 1-node Supermicro AS-2124US-TNRP, 2x AMD EPYC Processor (24C 7443, 32C 7543), SMT On (1T/core), Boost ON, NPS=4, Total Memory: 1 TB (16 slots/ 64GB/ 3200), ucode: 0xa00111d, Intel X540-T2, 1 SATA SSD, 1x P5510 NVMe, App Version: AMD official HPL 2.3 MT version with BLIS 2.1; Build notes: Tools: hpc-x 2.7.0; threads/core: 1; Turbo: used; Build: pre-built binary (gcc built) from https://developer.amd.com/amd-aocl/blas-library/; 1 rank per L3 cache, 4 threads per rank, Ubuntu 20.04.3 (5.4.0-92-generic).

NAMD:

New: 1-node Supermicro 220U-TNR, 2x Intel® Xeon® Platinum 8358 Scalable processor (32C), HT on, Turbo on, SNC on, Total Memory: 1 TB (16 slots/ 64GB/ 3200), ucode: x2a0, Intel X540-T2, 1 SATA SSD, 1x P5510 NVMe, App Version: 2.15-Alpha1 (includes AVX tiles algorithm); Build notes: Tools: Intel MKL, Intel C Compiler 2020u4, Intel MPI 2019u8, Intel Threading Building Blocks 2020u4; Build knobs: -ip -fp-model fast=2 -no-prec-div -qoverride-limits -qopenmp-simd -O3 -xCORE-AVX512 -qopt-zmm-usage=high, Red Hat Enterprise Linux 8.5 (4.18.0-348.12.2.el8_5.x86_64).

Baseline: 1-node Supermicro AS-2124US-TNRP, 2x AMD EPYC Processor (24C 7443, 32C 7543), SMT On, Boost ON, NPS=4, Total Memory: 1 TB (16 slots/ 64GB/ 3200), ucode: 0xa00111d, Intel X540-T2, 1 SATA SSD, 2x P5510 NVMe, App Version: 2.15-Alpha1 (includes AVX tiles algorithm); Build notes: Tools: Intel MKL, AOCC 2.2.0, gcc 9.3.0, Intel MPI 2019u8; Build knobs: -O3 -fomit-frame-pointer -march=znver1 -ffast-math, Red Hat Enterprise Linux 8.5 (4.18.0-348.12.2.el8_5.x86_64).

LAMMPS:

New: 1-node Supermicro 220U-TNR, 2x Intel® Xeon® Platinum 8358 Scalable processor (32C), HT on, Turbo on, SNC on, Total Memory: 1 TB (16 slots/ 64GB/ 3200), ucode: x2a0, Intel X540-T2, 1 SATA SSD, 1x P5510 NVMe, App Version: v2020-10-29; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4, Intel MPI 2019u8; Build knobs: -O3 -ip -xCORE-AVX512 -qopt-zmm-usage=high, Red Hat Enterprise Linux 8.5 (4.18.0-348.12.2.el8_5.x86_64).

Baseline: 1-node Supermicro AS-2124US-TNRP, 2x AMD EPYC Processor (32C 7543), SMT On, Boost ON, NPS=4, Total Memory: 1 TB (16 slots/ 64GB/ 3200), ucode: 0xa00111d, Intel X540-T2, 1 SATA SSD, 2x P5510 NVMe, App Version: v2020-10-29; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4, Intel MPI 2019u8; Build knobs: -O3 -ip -march=core-avx2, Red Hat Enterprise Linux 8.5 (4.18.0-348.12.2.el8_5.x86_64).

Tested by Intel on January-February 2022.

Geomean of LINPACK, NAMD, LAMMPS Test by Intel on January-February 2022

[128] Up to 2x better performance per watt vs. AMD Milan for networks (key exchange NGINX RSA2K)

3rd Generation Intel® Xeon® Gold processor

Up to 2x better performance per watt vs. AMD Milan for networks (key exchange NGINX RSA2K)

New: 1-node Supermicro X12DPG-QT6, 1x Intel® Xeon® Gold 6338N Scalable processor (32C), HT off, Turbo off, SNC off, Total Memory: 128 GB (8 slots/ 16GB/ 3200), ucode: x332, 1x Intel E810 2x100G, 1 SATA SSD, Ubuntu 20.04LTS (5.4.0-67-generic), NGINX 1.20.1, OpenSSL 1.1.1f, RSA2K, TLS Handshake.

Baseline: 1-node Supermicro H12DSi-N6, 1x AMD EPYC Processor (32C 7513), SMT off, Boost off, NPS=2, Total Memory: 128 GB (8 slots/ 16GB/ 3200), ucode: 0xa001143, 1x Intel E810 2x100G, 1 SATA SSD, Ubuntu 20.04LTS (5.4.0-67-generic), NGINX 1.20.1, OpenSSL 1.1.1f, RSA2K, TLS Handshake. Tested by Intel on May 2022.

NGINX Web Secure Key Exchange Test by Intel on May 2022

[127] Up to 1.07x better performance per watt vs. AMD Milan for new orders per minute (MySQL)

3rd Generation Intel® Xeon® Platinum processor

Up to 1.07x better performance per watt vs. AMD Milan for new orders per minute (MySQL)

New: 1-node Supermicro 220U-TNR, 2x Intel® Xeon® Platinum 8358 Scalable processor (32C), HT on, Turbo on, SNC off, Total Memory: 1 TB (16 slots/ 64GB/ 3200), ucode: x2a0, Intel X540-T2, 1 SATA SSD, 2x P5510 NVMe, HammerDB 4.3, MySQL 8.0.27, Ubuntu 20.04LTS (5.4.0-120-generic).

Baseline: 1-node Supermicro AS-2124US-TNRP, 2x AMD EPYC Processor (32C 7543), SMT On, Boost ON, NPS=1, Total Memory: 1 TB (16 slots/ 64GB/ 3200), ucode: 0xa00111d, Intel X540-T2, 1 SATA SSD, 2x P5510 NVMe, HammerDB 4.3, MySQL 8.0.27, Ubuntu 20.04LTS (5.4.0-120-generic). Tested by Intel Feb-Mar 2022.

HammerDB w/MySQL Test by Intel Feb-Mar 2022

[126] Up to 1.05x better performance per watt vs. AMD Milan for authentication and encryption (protocol security: IPSec)

3rd Generation Intel® Xeon® Gold processor

Up to 1.05x better performance per watt vs. AMD Milan for authentication and encryption (protocol security: IPSec).

New: 1-node Supermicro X12DPG-QT6, 1x Intel® Xeon® Gold 6338N Scalable processor (32C), HT off, Turbo off, SNC off, Total Memory: 128 GB (8 slots/ 16GB/ 3200), ucode: x332, 3x Intel E810 2x100G, 1 SATA SSD, Ubuntu 20.04LTS (5.4.0-67-generic), VPP IPSec 21.01.

Baseline: 1-node Supermicro H12DSi-N6, 1x AMD EPYC Processor (32C 7513), SMT off, Boost off, NPS=2, Total Memory: 128 GB (8 slots/ 16GB/ 3200), ucode: 0xa001143, 1x Intel E810 2x100G, 2x E810 1x100G, 1 SATA SSD, Ubuntu 20.04LTS (5.4.0-67-generic), VPP IPSec 21.01. Tested by Intel May 2022.

Crypto Test by Intel May 2022

[125] 1.46x average performance gains with 3rd Gen Intel Xeon Platinum 8380 processor vs. prior generation

3rd Generation Intel® Xeon® Platinum processor

1.46x average performance gain - Ice Lake vs. Cascade Lake: Geomean of 1.5x SPECrate2017_int_base (est), 1.52x SPECrate2017_fp_base (est), 1.47x STREAM Triad, 1.38x Intel Distribution of LINPACK.

New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on (SPECcpu2017), off (others), Turbo on, Ubuntu 20.04, 5.4.0-66-generic, 1x S4610 SSD 960G, SPECcpu2017 (est) v1.1.0, STREAM Triad, LINPACK, ic19.1u2, MPI: Version 2019u9; MKL:2020.4.17, test by Intel on 3/15/2021.

Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on (SPECcpu2017), off (others), Turbo on, Ubuntu 20.04, 5.4.0-62-generic, 1x S3520 SSD 480G, SPECcpu2017 (est) v1.1.0, STREAM Triad, Intel distribution of LINPACK, ic19.1u2, MPI: Version 2019u9; MKL:2020.4.17, test by Intel on 2/4/2021.

Geomean of Integer throughput/Floating Point throughput/STREAM/LINPACK

New: March 15, 2021

Baseline: Feb 04, 2021
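Claim [125] describes its headline figure as the geometric mean of four per-benchmark gains. A minimal Python sketch of that calculation, using the rounded gains quoted in the claim; the `GAINS` mapping and `average_gain` name are illustrative. Because the inputs are already rounded, the result should land close to, but not necessarily exactly on, the published 1.46x:

```python
from statistics import geometric_mean

# Per-benchmark gains quoted in claim [125] (Platinum 8380 vs. Platinum 8280).
GAINS = {
    "SPECrate2017_int_base (est)": 1.50,
    "SPECrate2017_fp_base (est)": 1.52,
    "STREAM Triad": 1.47,
    "Intel Distribution of LINPACK": 1.38,
}

# Geometric mean: the n-th root of the product of the n gains.
average_gain = geometric_mean(list(GAINS.values()))

print(f"Geomean gain: {average_gain:.3f}x")
```

A geometric mean is the usual choice for averaging ratios, since it treats a 2x gain and a 0.5x loss as cancelling out.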

[124] 1.27X average performance gains on Indexing Intensive-Medium Search Splunk workload with 3rd Gen Intel Xeon Platinum 8360Y processor vs. prior generation

3rd Generation Intel® Xeon® Platinum processor

1.27x average performance gain on Indexing intensive - Medium Search Splunk workload - Ice Lake vs. Cascade Lake:

New Config: 5-node, 2x Intel Xeon Platinum 8360Y processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200[3200]) total DDR4 memory, ucode 0x8d05a260, HT on, Turbo on, CentOS 7.9.2009, 1x Intel S4510 SSD 1.92TB, Intel P4510 SSD 2TB, 1x Intel X540-T2 10Gb, Splunk Perf Kit 0.4.21, Splunk 8.1.3, Splunk Operator 1.0.0, Minio Operator v2.0.9, Splunk Cluster - 3 Indexers, 2 search heads, test by Intel on 04/16/2021.

Baseline: 5-node, 2x Intel Xeon Platinum 8260L processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933[2933]) total DDR4 memory, ucode 0x4003006, HT on, Turbo on, CentOS 7.9.2009, 1x Intel S4510 SSD 1.92TB, Intel P4510 SSD 2TB, 1x Intel 10GbE integrated X722, Splunk Perf Kit 0.4.21, Splunk 8.1.3, Splunk Operator 1.0.0, Minio Operator v2.0.9, Splunk Cluster - 3 Indexers, 2 search heads, test by Intel on 04/16/2021.

Indexing intensive - Medium Search Splunk workload Test by Intel on April 16, 2021

[123] 1.45x higher INT8 real-time inference throughput with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. prior generation

1.74x higher INT8 batch inference throughput on BERT-Large SQuAD with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. prior generation

3rd Generation Intel® Xeon® Platinum processor

BERT-Large SQuAD: 1.45x higher INT8 real-time inference throughput & 1.74x higher INT8 batch inference throughput on Ice Lake vs. prior generation Cascade Lake.

New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X261, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_SSDSC2KG96, Intel SSDPE2KX010T8, BERT-Large SQuAD, gcc-9.3.0, oneDNN 1.6.4, BS=1,128 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow 2.5 (container: intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on 3/12/2021.

Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-48-generic, 1x Samsung_SSD_860, Intel SSDPE2KX040T8, BERT-Large SQuAD, gcc-9.3.0, oneDNN 1.6.4, BS=1,128 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow 2.5 (container: intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on 2/17/2021.

BERT-Large SQuAD

New: March 12, 2021

Baseline: Feb 17, 2021

[122] 1.59x higher INT8 real-time inference throughput with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. prior generation.

1.66x higher INT8 batch inference throughput on MobileNet-v1 with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. prior generation

3rd Generation Intel® Xeon® Platinum processor

MobileNet-v1: 1.59x higher INT8 real-time inference throughput & 1.66x higher INT8 batch inference throughput on Ice Lake vs. prior generation Cascade Lake.

New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X261, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_SSDSC2KG96, Intel SSDPE2KX010T8, MobileNet-v1, gcc-9.3.0, oneDNN 1.6.4, BS=1,56 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow 2.5 (container: intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on 3/12/2021.

Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-48-generic, 1x Samsung_SSD_860, Intel SSDPE2KX040T8, MobileNet-v1, gcc-9.3.0, oneDNN 1.6.4, BS=1,56 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow 2.5 (container: intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on 2/17/2021.

MobileNet-v1

New: March 12, 2021

Baseline: Feb 17, 2021

[121] 1.52x higher INT8 real-time inference throughput with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. prior generation

1.56x higher INT8 batch inference throughput on ResNet50 with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. prior generation

3rd Generation Intel® Xeon® Platinum processor ResNet-50 v1.5: 1.52x higher INT8 real-time inference throughput & 1.56x higher INT8 batch inference throughput on Ice Lake vs. prior generation Cascade Lake. New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X261, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_​SSDSC2KG96, Intel SSDPE2KX010T8, ResNet-50 v1.5, gcc-9.3.0, oneDNN 1.6.4, BS=1,128 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on 3/12/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-48-generic, 1x Samsung_​SSD_​860, Intel SSDPE2KX040T8, ResNet-50 v1.5, gcc-9.3.0, oneDNN 1.6.4, BS=1,128 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on 2/17/2021. ResNet50 v1.5 New: March 12, 2021

Baseline: Feb 17, 2021

[120] 1.39x higher INT8 real-time inference throughput on SSD-ResNet34 with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. prior generation 3rd Generation Intel® Xeon® Platinum processor SSD-ResNet34: 1.39x higher INT8 real-time inference throughput on Ice Lake vs. prior generation Cascade Lake. New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X261, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_SSDSC2KG96, Intel SSDPE2KX010T8, SSD-ResNet34, gcc-9.3.0, oneDNN 1.6.4, BS=1 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on 3/12/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-48-generic, 1x Samsung_SSD_860, Intel SSDPE2KX040T8, SSD-ResNet34, gcc-9.3.0, oneDNN 1.6.4, BS=1 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on 2/17/2021. SSD-ResNet34 New: March 12, 2021

Baseline: Feb 17, 2021

[119] 1.35x higher INT8 real-time inference throughput with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. prior generation

1.42x higher INT8 batch inference throughput on SSD-MobileNet-v1 with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. prior generation

3rd Generation Intel® Xeon® Platinum processor SSD-MobileNet-v1: 1.35x higher INT8 real-time inference throughput & 1.42x higher INT8 batch inference throughput on Ice Lake vs. prior generation Cascade Lake. New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X261, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_SSDSC2KG96, Intel SSDPE2KX010T8, SSD-MobileNet-v1, gcc-9.3.0, oneDNN 1.6.4, BS=1,448 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on 3/12/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-48-generic, 1x Samsung_SSD_860, Intel SSDPE2KX040T8, SSD-MobileNet-v1, gcc-9.3.0, oneDNN 1.6.4, BS=1,448 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on 2/17/2021. SSD-MobileNet-v1 New: March 12, 2021

Baseline: Feb 17, 2021

[118] Ice Lake customers who use Intel optimizations for TensorFlow and Intel DL Boost (VNNI) will gain over 11x higher batch AI inference performance on ResNet50 compared with stock Cascade Lake FP32 configuration 3rd Generation Intel® Xeon® Platinum processor 11x higher batch AI inference performance with Intel-optimized TensorFlow vs. stock Cascade Lake FP32 configuration New: 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X261, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_SSDSC2KG96, Intel SSDPE2KX010T8, ResNet-50 v1.5, gcc-9.3.0, oneDNN 1.6.4, BS=128 FP32,INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, Unoptimized model: TensorFlow- 2.4.1, Modelzoo:https://github.com/IntelAI/models -b master, test by Intel on 3/12/2021. Baseline: 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-48-generic, 1x Samsung_SSD_860, Intel SSDPE2KX040T8, ResNet-50 v1.5, gcc-9.3.0, oneDNN 1.6.4, BS=128 FP32,INT8, Optimized model: TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, Unoptimized model: TensorFlow- 2.4.1, Modelzoo:https://github.com/IntelAI/models -b master, test by Intel on 2/17/2021. ResNet50 v1.5 - opt/unopt New: March 12, 2021

Baseline: Feb 17, 2021

[117] Up to 100x gains due to software improvement on SciKit learn workloads: linear regression fit, SVC inference, kdtree_​knn inference and elastic-net fit on Ice Lake with Daal4py optimizations compared with stock Scikit-learn 3rd Generation Intel® Xeon® Platinum processor Up to 100x gains due to software improvement on SciKit learn workloads: linear regression fit, SVC inference, kdtree_​knn inference and elastic-net fit on Ice Lake with Daal4py optimizations compared with stock Scikit-learn New: 8380: 1-node, 2x Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X55260, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-64-generic, 2x Intel_​SSDSC2KG96, Unoptimized: Python: Python 3.7.9, SciKit-Learn: Sklearn 0.24.1, Optimized: oneDAL: Daal4py 2021.2, Benchmarks: https://github.com/IntelPython/scikit-learn_bench, tested by Intel, and results as of March 2021 Scikit-learn Software optimizations New: March 12, 2021
[116] Intel Xeon Scalable offers competitive performance vs. NVIDIA’s latest GPU (~2 sec) without the likely added cost and complexity 3rd Generation Intel® Xeon® Platinum processor NVIDIA A100 is 1.9 seconds faster than 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost on Census end-to-end machine Learning performance. Hardware configuration for Intel® Xeon® Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X55260, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 4x INTEL SSDSC2KG019T8, tested by Intel, and results as of March 2021. Hardware configuration for NVIDIA A100: 1-node, 2-socket AMD EPYC 7742 (64C) with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x8301034, HT on, Turbo on, Ubuntu 18.04.5 LTS, 5.4.0-42-generic,NVIDIA A100 (DGX-A100) , 1.92TB M.2 NVMe, 1.92TB M.2 NVMe RAID. Software configuration for Intel® Xeon® Platinum 8380: Python 3.7.9, Pre-processing Modin 0.8.3, Omniscidbe v5.4.1, Intel Optimized Scikit-Learn 0.24.1, OneDAL Intel® Extension for Scikit-Learn 2021.2, XGBoost 1.3.3. Software configuration for NVIDIA A100: Python 3.7.9, Pre-processing CuDF 0.17, Intel Optimized Scikit-Learn Sklearn 0.24, OneDAL CuML 0.17, XGBoost 1.3.0dev.rapidsai0.17, NVIDIA RAPIDS 0.17, CUDA Toolkit CUDA 11.0.221. Dataset source : IPUMS USA: https://usa.ipums.org/usa/, Dataset (size, shape) : (21721922, 45), Datatypes int64 and float64, Dataset size on disk 362.07 MB, Dataset format .csv.gz, Accuracy metric MSE: mean squared error; COD: coefficient of determination, tested by Intel, and results as of March 2021. Transactions per second Tested by Intel, March 2021
[115] Complete graph analytics computations used in search, social networks, recommender systems, bioinformatics, and fraud detection 2X faster on average when using 3rd Gen Intel Xeon Scalable processors with Intel Optane persistent memory 200 series. 3rd Generation Intel® Xeon® Platinum processor and Intel® Optane™ persistent memory 200 series. Katana Graph: New: Platinum 8368: 1-node, 2x Intel Xeon Platinum 8368 processor on Coyote Pass with 1024 GB (16 slots/ 64GB/ 3200) total DDR4 memory, 8192 GB (16 slots/ 512 GB/ 3200) total PMem, ucode 0x261, HT off, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-65-generic, 1x Intel 480GB SSD, 2x Intel 2TB SSD, 1x Intel XC710, Galois https://github.com/IntelligentSoftwareSystems/Galois, GCC 9.3.0, Algorithms: Betweenness Centrality, Breadth First Search, Connected Components, test by Intel on 3/15/2021. Baseline: Platinum 8260: 1-node, 2x Intel Xeon Platinum 8260 processor on Wolf Pass with 768 GB (12 slots/ 64GB/ 2666) total DDR4 memory, 6144 GB (12 slots/ 512 GB/ 2666) total PMem, ucode 0x5003003, HT off, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-65-generic, 1x Intel 480GB SSD, 2x Intel 2TB SSD, 1x Intel XC710, Galois https://github.com/IntelligentSoftwareSystems/Galois, GCC 9.3.0, Algorithms: Betweenness Centrality, Breadth First Search, Connected Components, test by Intel on 3/15/2021. Katana Graph New: March 15, 2021

Baseline: March 15, 2021

[114]

OpenVINO FP32 model running on Intel® Xeon® Platinum 8380 CPU @ 2.30GHz gives 2.2X latency improvement over baseline on Intel® Xeon® Platinum 8280 CPU @ 2.70GHz.

OpenVINO FP32 model running on Intel® Xeon® Platinum 8380 CPU @ 2.30GHz gives 1.6X throughput improvement over baseline on Intel® Xeon® Platinum 8280 CPU @ 2.70GHz.

3rd Generation Intel® Xeon® Platinum processor

Custom Deep Learning based Encoder Decoder model: Optimized:New: Tested by Intel as of 03/25/2021. 2 socket Intel® Xeon® Platinum 8380 Processor, 40 cores per socket, Ucode 0xd000270, HT On, Turbo On, OS Ubuntu 18.04.5 LTS, Kernel 5.4.0-65-generic, Total Memory 256GB, BIOS SE5C6200.86B.0022.D08.2103221623, Framework: Intel OpenVINO toolkit 2021.2.185, Python 3.6.13, Intel-openmp 2021.1.2, Numpy 1.19.5, GCC 7.5.0, model – custom Autoencoder

Baseline: Tested by Intel as of 03/25/2021. 2 socket Intel® Xeon® Platinum 8280 Processor, 28 cores per socket, Ucode 0x5003003, HT On, Turbo On, OS Ubuntu 18.04.5 LTS, Kernel 5.4.0-65-generic, Total Memory 384GB, BIOS SE5C620.86B.02.01.0011.032620200659, Framework: Intel OpenVINO toolkit 2021.2.185, Python 3.6.13, Intel-openmp 2021.1.2, Numpy 1.19.5, GCC 7.5.0, model – custom Autoencoder

Custom Deep Learning based Encoder Decoder model - Fujitsu New: March 25, 2021

Baseline: March 25, 2021

[113] 1.4X improvement in hyperparameter tuning during training with Intel® Xeon® Platinum 8380 CPU vs. Intel® Xeon® Platinum 8280 CPU. 3rd Generation Intel® Xeon® Platinum processor

Predictive Analytics using XGBoost

Optimized:New: Tested by Intel as of 02/24/2021. 2 socket Intel® Xeon® Platinum 8380 Processor, 40 cores per socket, Ucode 0x8d05a260, HT On, Turbo On, OS Ubuntu 18.04.5 LTS, Kernel 5.4.0-65-generic, Total Memory 256GB, BIOS SE5C6200.86B.3021.D40.2103160200, Framework: XGBoost 1.3.3, Intel-openmp 2020.2, Intel MKL 2020.2, Numpy 1.19.2 (Intel), Pandas 1.2.1 (Intel), scikit-learn 0.23.2 (Intel), Anaconda Python 3.7.9, GCC 7.5.0, model trained – GBT Classifier, custom train data

Baseline: Tested by Intel as of 02/24/2021. 2 socket Intel® Xeon® Platinum 8280 Processor, 28 cores per socket, Ucode 0x5003003, HT On, Turbo On, OS Ubuntu 18.04.5 LTS, Kernel 5.4.0-65-generic, Total Memory 384GB, BIOS SE5C620.86B.02.01.0011.032620200659, Framework: XGBoost 1.3.3, Intel-openmp 2020.2, Intel MKL 2020.2, Numpy 1.19.2 (Intel), Pandas 1.2.1 (Intel), scikit-learn 0.23.2 (Intel), Anaconda Python 3.7.9, GCC 7.5.0, model trained – GBT Classifier, custom train data

Nordigen Predictive Analytics using XGBoost New: Feb 25, 2021

Baseline: Feb 25, 2021

[112] 1.61X average performance gains on Search Intensive-Medium Indexing Splunk workload with 3rd Gen Intel Xeon Platinum 8360Y processor vs. prior generation 3rd Generation Intel® Xeon® Platinum processor 1.61x average performance gain on Search intensive - Medium Indexing Splunk workload - Ice Lake vs. Cascade Lake: New Config: 5-node, 2x Intel Xeon Platinum 8360Y processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200[3200]) total DDR4 memory, ucode 0x8d05a260, HT on, Turbo on, CentOS 7.9.2009, 1x Intel S4510 SSD 1.92TB, Intel P4510 SSD 2TB, 1x Intel X540-T2 10Gb, Splunk Perf Kit 0.4.21, Splunk 8.1.3, Splunk Operator 1.0.0, Minio Operator v2.0.9, Splunk Cluster - 3 Indexers, 2 search heads, test by Intel on 04/16/2021.

Baseline: 5-node, 2x Intel Xeon Platinum 8260L processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933[2933]) total DDR4 memory, ucode 0x4003006, HT on, Turbo on, CentOS 7.9.2009, 1x Intel S4510 SSD 1.92TB, Intel P4510 SSD 2TB, 1x Intel 10GbE integrated X722, Splunk Perf Kit 0.4.21, Splunk 8.1.3, Splunk Operator 1.0.0, Minio Operator v2.0.9, Splunk Cluster - 3 Indexers, 2 search heads, test by Intel on 04/16/2021.

Search intensive - Medium Indexing Splunk workload Test by Intel on 04/16/2021
[111] 1.58X average performance gains on Splunk Indexing workload with 3rd Gen Intel Xeon Platinum 8360Y processor vs. prior generation 3rd Generation Intel® Xeon® Platinum processor 1.58x average performance gain on Splunk Indexing workload with containers - Ice Lake vs. Cascade Lake: New Config: 5-node, 2x Intel Xeon Platinum 8360Y processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200[3200]) total DDR4 memory, ucode 0x8d05a260, HT on, Turbo on, CentOS 7.9.2009, 1x Intel S4510 SSD 1.92TB, Intel P4510 SSD 2TB, 1x Intel X540-T2 10Gb, Splunk Perf Kit 0.4.21, Splunk 8.1.3, Splunk Operator 1.0.0, Minio Operator v2.0.9, Splunk Cluster - 3 Indexers, 2 search heads, test by Intel on 04/16/2021.

Baseline: 5-node, 2x Intel Xeon Platinum 8260L processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933[2933]) total DDR4 memory, ucode 0x4003006, HT on, Turbo on, CentOS 7.9.2009, 1x Intel S4510 SSD 1.92TB, Intel P4510 SSD 2TB, 1x Intel 10GbE integrated X722, Splunk Perf Kit 0.4.21, Splunk 8.1.3, Splunk Operator 1.0.0, Minio Operator v2.0.9, Splunk Cluster - 3 Indexers, 2 search heads, test by Intel on 04/16/2021.

Splunk Indexing workload with containers Test by Intel on 04/16/2021
[110] Up to 4.8X higher Splunk indexing performance scaling containers and up to 5.1X better Splunk search performance scaling containers 3rd Generation Intel® Xeon® Platinum processor Up to 4.8X higher Splunk indexing performance scaling containers and up to 5.1X better Splunk search performance scaling containers. Ice Lake Config: 5-node, 2x Intel Xeon Platinum 8360Y processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200[3200]) total DDR4 memory, ucode 0x8d05a260, HT on, Turbo on, CentOS 7.9.2009, 1x Intel S4510 SSD 1.92TB, Intel P4510 SSD 2TB, 1x Intel X540-T2 10Gb, Splunk Perf Kit 0.4.21, Splunk 8.1.3, Splunk Operator 1.0.0, Minio Operator v2.0.9, test by Intel on 04/16/2021. Indexing Performance and Search Performance with containers Test by Intel on 04/16/2021
[109] 1.48X average performance gains on Splunk Parallel Search workload with 3rd Gen Intel Xeon Platinum 8380 processor vs. prior generation 3rd Generation Intel® Xeon® Platinum processor

1.48X average performance gains on Splunk Parallel Search workload - Ice Lake vs. Cascade Lake: New Config: 5-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200[3200]) total DDR4 memory, ucode 0x8d05a260, HT on, Turbo on, Ubuntu 18.04.5, 5.4.0-65-generic, 1x Intel S4510 SSD 1.92TB, Intel P4510 SSD 2TB, 1x Intel X540-T2 10Gb, Dogfood 2007 dataset 103gb, Splunk 8.1.3, batch_​search_​max_​pipeline=16, test by Intel on 04/16/2021.

Baseline: 5-node, 2x Intel Xeon Gold 6258R processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933[2933]) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 18.04.5, 5.4.0-65-generic, 1x Intel S4510 SSD 1.92TB, Intel P4510 SSD 2TB, 1x Intel 10GbE integrated X722, Dogfood 2007 dataset 103gb, Splunk 8.1.3, batch_search_max_pipeline=16, test by Intel on 04/22/2021.

Splunk Parallel Search workload Test by Intel on 04/22/2021
[108] Up to 1.53x higher HPC performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.53x higher FSI Kernel performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.60x higher Life and Material Science performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.41x higher HPCG performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.38x higher HPL performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.47x higher STREAM Triad Performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.58x higher WRF performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.28x higher Binomial Options performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.67x higher Black Scholes performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.70x higher Monte Carlo performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.51x higher OpenFOAM performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.64x higher GROMACS performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.60x higher LAMMPS performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.57x higher NAMD performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

Up to 1.61x higher RELION Plasmodium Ribosome performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen

3rd Generation Intel® Xeon® Platinum processor

New: 8380: 1-node, 2x Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x055261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_3.crt1.x86_64, 1x Intel_SSDSC2KG96. Tested by Intel between March 12, 2021 and March 29, 2021.

Baseline: 8280: 1-node, 2x Intel Xeon Platinum 8280 (28C/2.7GHz, 205W TDP) processor on Intel Software Development Platform with 192GB (12 slots/ 16GB/ 2933) total DDR4 memory, ucode 0x4002f01, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_3.crt1.x86_64, 1x Intel_SSDSC2KG48. Tested by Intel between February 1, 2021 and February 20, 2021.

1.53x higher HPC performance (geomean HPL, HPCG, STREAM Triad, WRF, Binomial Options, Black Scholes, Monte Carlo, OpenFOAM, GROMACS, LAMMPS, NAMD, RELION)

1.53x higher FSI Kernel performance (geomean Binomial Options, Black Scholes, Monte Carlo)

1.60x higher Life and Material Science performance (geomean GROMACS, LAMMPS, NAMD, RELION)

1.41x higher HPCG performance App Version: 2019u5 MKL; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel MPI 2019u8; threads/core: 1; Turbo: used; Build knobs: -O3 -ip -xCORE-AVX512

1.38x higher HPL performance App Version: The Intel Distribution for LINPACK Benchmark 2019u5; Build notes: threads/core: 1; Turbo: used; Build: build script from Intel Distribution for LINPACK package; 1 rank per NUMA node: 1 rank per socket

1.47x higher STREAM Triad Performance App Version: McCalpin_​STREAM_​OMP-version; Build notes: Tools: Intel C Compiler 2019u5; threads/core: 1; Turbo: used; BIOS settings: HT=off Turbo=On SNC=On

1.58x higher WRF performance (geomean Conus-12km, Conus-2.5km, NWSC-3-NA-3km) App Version: 4.2.2; Build notes: Intel Fortran Compiler 2020u4, Intel MPI 2020u4; threads/core: 1; Turbo: used; Build knobs: -ip -w -O3 -xCORE-AVX2 -vec-threshold0 -ftz -align array64byte -qno-opt-dynamic-align -fno-alias $(FORMAT_FREE) $(BYTESWAPIO) -fp-model fast=2 -fimf-use-svml=true -inline-max-size=12000 -inline-max-total-size=30000

1.28x higher Binomial Options performance App Version: v1.0; Build notes: Tools: Intel C Compiler 2020u4, Intel Threading Building Blocks; threads/core: 2; Turbo: used; Build knobs: -O3 -xCORE-AVX512 -qopt-zmm-usage=high -fimf-domain-exclusion=31 -fimf-accuracy-bits=11 -no-prec-div -no-prec-sqrt

1.67x higher Black Scholes performance App Version: v1.3; Build notes: Tools: Intel MKL, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4; threads/core: 1; Turbo: used; Build knobs: -O3 -xCORE-AVX512 -qopt-zmm-usage=high -fimf-precision=low -fimf-domain-exclusion=31 -no-prec-div -no-prec-sqrt -fimf-domain-exclusion=31

1.70x higher Monte Carlo performance App Version: v1.1; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4; threads/core: 1; Turbo: used; Build knobs: -O3 -xCORE-AVX512 -qopt-zmm-usage=high -fimf-precision=low -fimf-domain-exclusion=31 -no-prec-div -no-prec-sqrt

1.51x higher OpenFOAM performance (geomean 20M_​cell_​motorbike, 42M_​cell_​motorbike) App Version: v8; Build notes: Tools: Intel FORTRAN Compiler 2020u4, Intel C Compiler 2020u4, Intel MPI 2019u8; threads/core: 1; Turbo: used; Build knobs: -O3 -ip -xCORE-AVX512

OpenFOAM Disclaimer: This offering is not approved or endorsed by OpenCFD Limited, producer and distributor of the OpenFOAM software via www.openfoam.com, and owner of the OPENFOAM® and OpenCFD® trademark

1.64x higher GROMACS performance (geomean ion_​channel_​pme, lignocellulose_​rf, water_​pme, water_​rf) App Version: v2020.5_​SP; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel MPI 2019u8; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -xCORE-AVX512

1.60x higher LAMMPS performance (geomean Polyethylene, Stillinger-Weber, Tersoff, Water) App Version: v2020-10-29; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4, Intel MPI 2019u8; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -xCORE-AVX512 -qopt-zmm-usage=high

1.57x higher NAMD performance (geomean Apoa1, f1atpase, STMV) App Version: 2.15-Alpha1 (includes AVX tiles algorithm); Build notes: Tools: Intel MKL, Intel C Compiler 2020u4, Intel MPI 2019u8, Intel Threading Building Blocks 2020u4; threads/core: 2; Turbo: used; Build knobs: -ip -fp-model fast=2 -no-prec-div -qoverride-limits -qopenmp-simd -O3 -xCORE-AVX512 -qopt-zmm-usage=high

1.61x higher RELION Plasmodium Ribosome performance App Version: 3_1_1; Build notes: Tools: Intel C Compiler 2020u4, Intel MPI 2019u9; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -g -debug inline-debug-info -xCOMMON-AVX512 -qopt-report=5 -restrict

HPC, HPCG, HPL, STREAM, WRF, FSI, Binomial Options, Black Scholes, Monte Carlo, OpenFOAM, Life and Material Science, GROMACS, LAMMPS, NAMD, RELION New: March 2021

Baseline: February 2021
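The category figures in claim [108] are described as geometric means of the per-benchmark ratios. A minimal sketch of that aggregation follows, assuming the published two-decimal ratios are used as inputs (the underlying unrounded measurements are not disclosed, so the results should only be expected to land within about 0.01 of the published category figures). The names `SPEEDUPS`, `geomean`, and `group_geomean` are illustrative and do not come from the source.

```python
import math

# Per-benchmark speedups (Platinum 8380 vs. Platinum 8280) as published in
# claim [108]. These are rounded values, so recomputed means are approximate.
SPEEDUPS = {
    "HPL": 1.38, "HPCG": 1.41, "STREAM Triad": 1.47, "WRF": 1.58,
    "Binomial Options": 1.28, "Black Scholes": 1.67, "Monte Carlo": 1.70,
    "OpenFOAM": 1.51, "GROMACS": 1.64, "LAMMPS": 1.60, "NAMD": 1.57,
    "RELION": 1.61,
}


def geomean(values):
    """Geometric mean, computed as exp of the arithmetic mean of the logs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def group_geomean(names):
    """Geometric mean over a named subset of the published speedups."""
    return geomean(SPEEDUPS[name] for name in names)


hpc = geomean(SPEEDUPS.values())
fsi = group_geomean(["Binomial Options", "Black Scholes", "Monte Carlo"])
life_sci = group_geomean(["GROMACS", "LAMMPS", "NAMD", "RELION"])

print(f"HPC: {hpc:.2f}x, FSI: {fsi:.2f}x, Life/Material Science: {life_sci:.2f}x")
```

A geometric mean is the usual choice for averaging ratios because it treats a 2x gain and a 0.5x loss symmetrically, which an arithmetic mean does not.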

[107]

1.54x higher NAMD STMV performance using the AVX Tiles Algorithm on Platinum 8380 vs. without AVX Tiles

2.43x higher NAMD STMV performance using the AVX Tiles Algorithm on Platinum 8380 vs. prior gen without AVX Tiles

1.57x higher NAMD STMV performance on Platinum 8380 vs. prior gen without AVX Tiles

3rd Generation Intel® Xeon® Platinum processor

New: 8380: 1-node, 2x Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, Tested by Intel between March 12, 2021 and March 29, 2021.

Baseline: 8280: 1-node, 2x Intel Xeon Platinum 8280 (28C/2.7GHz, 205W TDP) processor on Intel Software Development Platform with 192GB (12 slots/ 16GB/ 2933) total DDR4 memory, ucode 0x4002f01, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_3.crt1.x86_64, 1x Intel_SSDSC2KG48. Tested by Intel between February 1, 2021 and February 20, 2021.

1.54x higher performance on NAMD STMV from using the AVX Tiles Algorithm vs. without AVX Tiles

2.43x higher performance on NAMD STMV with AVX Tiles Algorithm vs. prior gen without AVX Tiles

1.57x higher performance on NAMD STMV without AVX Tiles Algorithm vs. prior gen

NAMD with Tiles: App Version: 2.15-Alpha1 (includes AVX tiles algorithm); Build notes: Tools: Intel MKL, Intel C Compiler 2020u4, Intel MPI 2019u8, Intel Threading Building Blocks 2020u4; threads/core: 2; Turbo: used; Build knobs: -ip -fp-model fast=2 -no-prec-div -qoverride-limits -qopenmp-simd -O3 -xCORE-AVX512 -qopt-zmm-usage=high

NAMD without Tiles: App Version: 2.15-Alpha1 (built without AVX tiles algorithm); Build notes: Tools: Intel MKL, Intel C Compiler 2020u4, Intel MPI 2019u8, Intel Threading Building Blocks 2020u4; threads/core: 2; Turbo: used; Build knobs: -ip -fp-model fast=2 -no-prec-div -qoverride-limits -qopenmp-simd -O3 -xCORE-AVX512 -qopt-zmm-usage=high -DNAMD_​KNL

tested by Intel and results as of March 2021

NAMD 3rd Gen Intel Xeon and 2nd Gen Intel Xeon claims for demo New: March 2021

Baseline: February 2021
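The three NAMD STMV ratios in claim [107] can be cross-checked against each other. The sketch below assumes the ratios share the same underlying measurements, so that the generation-over-generation gain and the AVX Tiles gain chain multiplicatively; the source does not state this explicitly, and the inputs are the published two-decimal values. The helper name `compose` is illustrative.

```python
# Published (rounded) ratios from claim [107], NAMD STMV.
TILES_VS_NO_TILES_ON_8380 = 1.54  # AVX Tiles vs. no AVX Tiles, both on Platinum 8380
GEN_OVER_GEN_NO_TILES = 1.57      # Platinum 8380 vs. Platinum 8280, no AVX Tiles
PUBLISHED_COMBINED = 2.43         # 8380 with AVX Tiles vs. 8280 without AVX Tiles


def compose(*factors):
    """Chain independent speedup factors by multiplying them."""
    result = 1.0
    for factor in factors:
        result *= factor
    return result


combined = compose(GEN_OVER_GEN_NO_TILES, TILES_VS_NO_TILES_ON_8380)
print(f"recomputed: {combined:.2f}x, published: {PUBLISHED_COMBINED:.2f}x")
# prints "recomputed: 2.42x, published: 2.43x"
```

The 0.01 gap is what one would expect when the published figure is derived from unrounded measurements while the inputs here are already rounded.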

[106] 1.42x higher OpenFOAM performance on Gold 6354 vs. Gold 6154 3rd Generation Intel® Xeon® Platinum processor

1.42x higher performance on OpenFOAM Motorbike 42M

New: 6354: 1-node, 2x Intel Xeon Gold 6354 (18C/3.0GHz, 205W TDP) processor on Intel Software Development Platform with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3, 4.18.0-240.10.1.el8_3.x86_64, 1x Intel_SSDSC2KG96. Tested by Intel between March 12, 2021 and March 29, 2021.

Baseline: 6154: 1-node, 2x Intel Xeon Gold 6154 (18C/3.0GHz, 200W TDP) processor on Intel Software Development Platform with 192GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x2006a0a, HT on, Turbo on, CentOS Linux 8.3, 4.18.0-240.10.1.el8_3.x86_64, 1x Intel_SSDSC2KG96. Tested by Intel between February 1, 2021 and February 20, 2021.

App Version: v8; Build notes: Tools: Intel FORTRAN Compiler 2020u4, Intel C Compiler 2020u4, Intel MPI 2019u8; threads/core: 1; Turbo: used; Build knobs: -O3 -ip -xCORE-AVX512

tested by Intel and results as of March 2021

OpenFOAM Disclaimer: This offering is not approved or endorsed by OpenCFD Limited, producer and distributor of the OpenFOAM software via www.openfoam.com, and owner of the OPENFOAM® and OpenCFD® trademark

OpenFOAM for Oracle New: March 2021

Baseline: February 2021
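The memory descriptions in these configurations follow a recurring pattern such as "512 GB (16 slots/ 32GB/ 3200)". A minimal sketch of a consistency check follows, assuming the parenthesized triple means populated DIMM slots / capacity per DIMM in GB / memory speed; that reading is an assumption, not something the source defines. The regular expression and the function names `parse_memory` and `is_consistent` are illustrative, and variants such as "3200[3200]" used in the Splunk entries are not handled.

```python
import re

# Assumed layout: "<total> GB (<slots> slots/ <per-DIMM> GB/ <speed>)".
MEMORY_RE = re.compile(
    r"(\d+)\s*GB\s*\(\s*(\d+)\s*slots/\s*(\d+)\s*GB/\s*(\d+)\s*\)"
)


def parse_memory(text):
    """Extract total GB, slot count, per-DIMM GB, and speed from a config string."""
    match = MEMORY_RE.search(text)
    if match is None:
        raise ValueError("no memory description found")
    total_gb, slots, per_dimm_gb, speed = (int(g) for g in match.groups())
    return {
        "total_gb": total_gb,
        "slots": slots,
        "per_dimm_gb": per_dimm_gb,
        "speed": speed,
    }


def is_consistent(cfg):
    """True when slots x per-DIMM capacity equals the stated total."""
    return cfg["slots"] * cfg["per_dimm_gb"] == cfg["total_gb"]


new_cfg = parse_memory("512 GB (16 slots/ 32GB/ 3200) total DDR4 memory")
base_cfg = parse_memory("192GB (16 slots/ 16GB/ 3200) total DDR4 memory")

for label, cfg in (("Gold 6354", new_cfg), ("Gold 6154", base_cfg)):
    status = "consistent" if is_consistent(cfg) else "does not reconcile"
    print(f"{label}: {cfg['slots']} x {cfg['per_dimm_gb']} GB vs {cfg['total_gb']} GB -> {status}")
```

Under this reading, the Gold 6154 string does not reconcile (16 x 16 GB is 256 GB, not 192 GB), which may point to a transcription slip in the disclosure rather than to anything about the test itself.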

[105] up to 1.52x higher manufacturing performance on 3rd Gen Intel Xeon Scalable platform vs. prior gen 3rd Generation Intel® Xeon® Platinum processor

1.52x higher manufacturing performance (geomean Altair RADIOSS, Ansys Fluent, Ansys LS-DYNA, Converge, Numeca, OpenFOAM)

New: 8380: 1-node, 2x Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_3.crt1.x86_64, 1x Intel_SSDSC2KG96. Tested by Intel between March 12, 2021 and March 29, 2021.

Baseline: 8280: 1-node, 2x Intel Xeon Platinum 8280 (28C/2.7GHz, 205W TDP) processor on Intel Software Development Platform with 192GB (12 slots/ 16GB/ 2933) total DDR4 memory, ucode 0x4002f01, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_3.crt1.x86_64, 1x Intel_SSDSC2KG48. Tested by Intel between February 1, 2021 and February 20, 2021.

1.47x higher Altair RADIOSS performance (geomean Neon1M/80ms, T10M/8ms) App Version: 2020; Build notes: Tools: Intel FORTRAN Compiler 2021u1, Intel C Compiler 2021u1, Intel MPI 2021u1; threads/core: 1; Turbo: used;

1.54x higher Ansys Fluent performance (geomean aircraft_wing_14m, aircraft_wing_2m, combustor_12m, combustor_16m, combustor_71m, exhaust_system_33m, fluidized_bed_2m, ice_2m, landing_gear_15m, oil_rig_7m, pump_2m, rotor_3m, sedan_4m) App Version: 2021 R1; Build notes: One thread per core; Multi-threading Enabled; Turbo Boost Enabled; Intel FORTRAN Compiler 19.5.0; Intel C/C++ Compiler 19.5.0; Intel Math Kernel Library 2020.0.0; Intel MPI Library 2019 Update 8

1.48x higher Ansys LS-DYNA performance (geomean 3cars-150ms, car2car-120ms, ODB_​10M-30ms) App Version: R11; Build notes: Tools: Intel Compiler 2019u5 (AVX512), Intel MPI 2019u9; threads/core: 1; Turbo: used

1.52x higher Converge SI8_​engine_​PFI_​SAGE_​transient_​RAN performance App Version: 3.0.17; Build notes: Tools: Intel MPI 2019u9; threads/core: 1; Turbo: used; 3.0.17 Converge official converge-intelmp binary

1.61x higher Numeca performance (geomean FO_​hpcc_​single_​passage, FT_​hpcc_​single_​passage) FineOpen App Version: v10.1; Build notes: Tools: Customer pre-built binaries (Intel Fortran Compiler 2019, Intel C Compiler 2015), Intel MPI 2019u9; threads/core:1 ; Turbo: used; Build knobs: Fortran = -O2 -fp-model precise, C = -O2 -fPIC -pipe -Wno-deprecated -Wreturn-type -fp-model precise -std=c++11 FineTurbo App Version: v15.1; Build notes: Tools: Customer pre-built binaries (Intel Fortran Compiler 2015, Intel C Compiler 2015), Intel MPI 2019u4; threads/core:1 ; Turbo: used; Build knobs: Fortran = -O2 -fp-model precise, C = -O2 -fPIC -pipe -Wno-deprecated -Wreturn-type -fp-model precise -std=c++11

1.51x higher OpenFOAM performance (geomean 20M_​cell_​motorbike, 42M_​cell_​motorbike) App Version: v8; Build notes: Tools: Intel FORTRAN Compiler 2020u4, Intel C Compiler 2020u4, Intel MPI 2019u8; threads/core: 1; Turbo: used; Build knobs: -O3 -ip -xCORE-AVX512

OpenFOAM Disclaimer: This offering is not approved or endorsed by OpenCFD Limited, producer and distributor of the OpenFOAM software via www.openfoam.com, and owner of the OPENFOAM® and OpenCFD® trademark

tested by Intel and results as of March 2021

Manufacturing New: March 2021

Baseline: February 2021

[104] 3rd Gen Intel Xeon Platinum 8358 vs AMD EPYC 7543: Xeon performs 23% better across 12 leading HPC applications and benchmarks 3rd Generation Intel® Xeon® Platinum processor

HPCG: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: 2019u5 MKL; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel MPI 2019u8; threads/core: 1; Turbo: used; Build knobs: -O3 -ip -xCORE-AVX512. EPYC 7543: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: 2019u5 MKL; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel MPI 2019u8; threads/core: 1; Turbo: used; Build knobs: -O3 -ip -march=core-avx2, tested by Intel and results as of April 2021

HPL: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: The Intel Distribution for LINPACK Benchmark; Build notes: Tools: Intel MPI 2019u7; threads/core: 1; Turbo: used; Build: build script from Intel Distribution for LINPACK package; 1 rank per NUMA node: 1 rank per socket, EPYC 7543: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: AMD official HPL 2.3 MT version with BLIS 2.1; Build notes: Tools: hpc-x 2.7.0; threads/core: 1; Turbo: used; Build: pre-built binary (gcc built) from https://developer.amd.com/amd-aocl/blas-library/; 1 rank per L3 cache, 4 threads per rank, tested by Intel and results as of April 2021

STREAM Triad: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: McCalpin_​STREAM_​OMP-version; Build notes: Tools: Intel C Compiler 2019u5; threads/core: 1; Turbo: used; BIOS settings: HT=on Turbo=On SNC=On. EPYC 7543: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: McCalpin_​STREAM_​OMP-version; Build notes: Tools: Intel C Compiler 2019u5; threads/core: 1; Turbo: used; BIOS settings: HT=on Turbo=On SNC=On, tested by Intel and results as of April 2021

WRF Geomean of Conus-12km, Conus-2.5km, NWSC-3 NA-3km: Platinum 8358: 1-node 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: 4.2.2; Build notes: Intel Fortran Compiler 2020u4, Intel MPI 2020u4; threads/core: 1; Turbo: used; Build knobs:-ip -w -O3 -xCORE-AVX2 -vec-threshold0 -ftz -align array64byte -qno-opt-dynamic-align -fno-alias $(FORMAT_​FREE) $(BYTESWAPIO) -fp-model fast=2 -fimf-use-svml=true -inline-max-size=12000 -inline-max-total-size=30000. EPYC 7543: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: 4.2.2; Build notes: Intel Fortran Compiler 2020u4, Intel MPI 2020u4; threads/core: 1; Turbo: used; Build knobs: -ip -w -O3 -march=core-avx2 -ftz -align all -fno-alias $(FORMAT_​FREE) $(BYTESWAPIO) -fp-model fast=2 -inline-max-size=12000 -inline-max-total-size=30000, tested by Intel and results as of April 2021

Binomial Options: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: v1.0; Build notes: Tools: Intel C Compiler 2020u4, Intel Threading Building Blocks ; threads/core: 2; Turbo: used; Build knobs: -O3 -xCORE-AVX512 -qopt-zmm-usage=high -fimf-domain-exclusion=31 -fimf-accuracy-bits=11 -no-prec-div -no-prec-sqrt. EPYC 7543: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: v1.0; Build notes: Tools: Intel C Compiler 2020u4, Intel Threading Building Blocks ; threads/core: 2; Turbo: used; Build knobs: -O3 -march=core-avx2 -fimf-domain-exclusion=31 -fimf-accuracy-bits=11 -no-prec-div -no-prec-sqrt, tested by Intel and results as of April 2021

Monte Carlo: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: v1.1; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4; threads/core: 1; Turbo: used; Build knobs: -O3 -xCORE-AVX512 -qopt-zmm-usage=high -fimf-precision=low -fimf-domain-exclusion=31 -no-prec-div -no-prec-sqrt. EPYC 7543: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: v1.1; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4; threads/core: 2; Turbo: used; Build knobs: -O3 -march=core-avx2 -fimf-precision=low -fimf-domain-exclusion=31 -no-prec-div -no-prec-sqrt, tested by Intel and results as of April 2021

Ansys Fluent Geomean of aircraft_​wing_​14m, aircraft_​wing_​2m, combustor_​12m, combustor_​16m, combustor_​71m, exhaust_​system_​33m, fluidized_​bed_​2m, ice_​2m, landing_​gear_​15m, oil_​rig_​7m, pump_​2m, rotor_​3m, sedan_​4m: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: 2021 R1; Build notes: One thread per core; Multi-threading Enabled; Turbo Boost Enabled; Intel FORTRAN Compiler 19.5.0; Intel C/C++ Compiler 19.5.0; Intel Math Kernel Library 2020.0.0; Intel MPI Library 2019 Update 8. EPYC 7543: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: 2021 R1; Build notes: One thread per core; Multi-threading Enabled; Turbo Boost Enabled; Intel FORTRAN Compiler 19.5.0; Intel C/C++ Compiler 19.5.0; Intel Math Kernel Library 2020.0.0; Intel MPI Library 2019 Update 8, tested by Intel and results as of April 2021

Ansys LS-DYNA Geomean of car2car-120ms, ODB_​10M-30ms: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: R11; Build notes: Tools: Intel Compiler 2019u5 (AVX512), Intel MPI 2019u9; threads/core: 1; Turbo: used. EPYC 7543: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: R11; Build notes: Tools: Intel Compiler 2019u5 (AMDAVX2), Intel MPI 2019u9; threads/core: 1; Turbo: used, tested by Intel and results as of April 2021

OpenFOAM 42M_​cell_​motorbike: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: v8; Build notes: Tools: Intel FORTRAN Compiler 2020u4, Intel C Compiler 2020u4, Intel MPI 2019u8; threads/core: 1; Turbo: used; Build knobs: -O3 -ip -xCORE-AVX512. EPYC 7543: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: v8; Build notes: Tools: Intel FORTRAN Compiler 2020u4, Intel C Compiler 2020u4, Intel MPI 2019u8; threads/core: 1; Turbo: used; Build knobs: -O3 -ip -march=core-avx2, tested by Intel and results as of April 2021

LAMMPS Geomean of Polyethylene, Stillinger-Weber, Tersoff, Water: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: v2020-10-29; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4, Intel MPI 2019u8; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -xCORE-AVX512 -qopt-zmm-usage=high. EPYC 7543: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: v2020-10-29; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4, Intel MPI 2019u8; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -march=core-avx2, tested by Intel and results as of April 2021

NAMD Geomean of Apoa1, STMV: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: 2.15-Alpha1 (includes AVX tiles algorithm); Build notes: Tools: Intel MKL , Intel C Compiler 2020u4, Intel MPI 2019u8, Intel Threading Building Blocks 2020u4; threads/core: 2; Turbo: used; Build knobs: -ip -fp-model fast=2 -no-prec-div -qoverride-limits -qopenmp-simd -O3 -xCORE-AVX512 -qopt-zmm-usage=high. EPYC 7543: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: 2.15-Alpha1 (includes AVX tiles algorithm); Build notes: Tools: Intel MKL , AOCC 2.2.0, gcc 9.3.0, Intel MPI 2019u8; threads/core: 2; Turbo: used; Build knobs: -O3 -fomit-frame-pointer -march=znver1 -ffast-math, tested by Intel and results as of April 2021

RELION Plasmodium Ribosome: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_3.crt1.x86_64, 1x Intel_SSDSC2KG96, App Version: 3_1_1; Build notes: Tools: Intel C Compiler 2020u4, Intel MPI 2019u9; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -g -debug inline-debug-info -xCOMMON-AVX512 -qopt-report=5 -restrict. EPYC 7543: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: 3_1_1; Build notes: Tools: Intel C Compiler 2020u4, Intel MPI 2019u9; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -g -debug inline-debug-info -march=core-avx2 -qopt-report=5 -restrict, tested by Intel and results as of April 2021

Manufacturing New: April 2021

Baseline: April 2021

[103] 3rd Gen Intel Xeon processor outperforms Graviton2 by up to 1.53x for MongoDB maximum throughput (Intel Xeon based AWS M6i instance outperforms Graviton2 M6g for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU) 3rd Generation Intel® Xeon® Platinum processor 1.17x relative performance on MongoDB max throughput for 4vCPU, 1.24x for 8vCPU, 1.53x for 16vCPU, 1.03x for 32vCPU, and 1.06x for 64vCPU. Storage/instance: io2 EBS, 1024G, Workload: MongoDB v5.0.3 Ent, Other SW: Stress-NG v0.10.05, YCSB v0.17.0, Ubuntu 20.04 LTS, Kernel 5.11.0-1020-aws.

New: m6i.xlarge 16 GB memory capacity/instance, m6i.2xlarge 32 GB memory capacity/instance, m6i.4xlarge 64 GB memory capacity/instance, m6i.8xlarge 128 GB memory capacity/instance, m6i.16xlarge 256 GB memory capacity/instance (Xeon).

Baseline: m6g.xlarge 16 GB memory capacity/instance, m6g.2xlarge 32 GB memory capacity/instance, m6g.4xlarge 64 GB memory capacity/instance, m6g.8xlarge 128 GB memory capacity/instance, m6g.16xlarge 256 GB memory capacity/instance (Graviton2).

Throughput per second New: October 2021

Baseline: October 2021

[102] 3rd Gen Intel Xeon processor outperforms Graviton2 by up to 1.27x for SPECrate 2017_​fp_​base (est) (Intel Xeon based AWS M6i instance ICC compiler outperforms Graviton2 M6g GCC compiler for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU) 3rd Generation Intel® Xeon® Platinum processor 1.16x higher SPECrate 2017_​fp_​base (est) on ICC 2021.1 revB than GCC 11.1 Ofast compiler for 4vCPU, 1.21x for 8vCPU, 1.22x for 16vCPU and 1.27x for 64vCPU. Storage/instance: Amazon Elastic Block Store 512GB, Kernel: 5.11.0-1017-aws, Workload: SPECcpu2017 v1.1.8, Other SW: GCC 11.1.

New: m6i.xlarge 16 GB memory capacity/instance, m6i.2xlarge 32 GB memory capacity/instance, m6i.4xlarge 64 GB memory capacity/instance, m6i.8xlarge 128 GB memory capacity/instance (Xeon).

Baseline: m6g.xlarge 16 GB memory capacity/instance, m6g.2xlarge 32 GB memory capacity/instance, m6g.4xlarge 64 GB memory capacity/instance, m6g.8xlarge 128 GB memory capacity/instance (Graviton2).

Operations per second New: November 2021

Baseline: November 2021

[101] 3rd Gen Intel Xeon processor outperforms Graviton2 by up to 1.22x for SPECrate 2017_int_base (est) (Intel Xeon based AWS M6i instance ICC compiler outperforms Graviton2 M6g GCC compiler for 4vCPU, 8vCPU, 16vCPU, 32vCPU, and 64vCPU) 3rd Generation Intel® Xeon® Platinum processor 1.19x higher SPECrate 2017_int_base (est) on ICC 2021.1 revB than GCC 11.1 Ofast compiler for 4vCPU, 1.17x for 8vCPU, 1.20x for 16vCPU, and 1.22x for 64vCPU. Storage/instance: Amazon Elastic Block Store 512GB, Kernel: 5.11.0-1017-aws, Workload: SPECcpu2017 v1.1.8, Other SW: GCC 11.1.

New: m6i.xlarge 16 GB memory capacity/instance, m6i.2xlarge 32 GB memory capacity/instance, m6i.4xlarge 64 GB memory capacity/instance, m6i.8xlarge 128 GB memory capacity/instance (Xeon).

Baseline: m6g.xlarge 16 GB memory capacity/instance, m6g.2xlarge 32 GB memory capacity/instance, m6g.4xlarge 64 GB memory capacity/instance, m6g.8xlarge 128 GB memory capacity/instance (Graviton2).

Operations per second New: November 2021

Baseline: November 2021

[100] Intel Xeon Scalable systems are lower cost (up to 17%) without the added GPU complexity 3rd Generation Intel® Xeon® Platinum processor System pricing is estimated and based on an average of comparable configurations as the test systems as priced on www.colfax-intl.com and www.thinkmate.com on September 20, 2021. 4U rackmount systems used for 3rd Gen Intel® Xeon® Platinum 8380 Scalable processors: Thinkmate GPX XN6-24S3-10GPU and Colfax CX41060s-XK8. 4U rackmount servers used for NVIDIA A100 with AMD EPYC 7742 host CPUs: Thinkmate GPX QT24-24E2-8GPU and Colfax CX4860s-EK8. See www.colfax-intl.com and www.thinkmate.com for more details. Price/performance Test by Intel, September 20, 2021
[99] 3rd Gen Intel Xeon Platinum 8380 processor delivers up to 1.65x higher performance on cloud data analytics usage vs. prior generation platform enabling faster business decisions 3rd Generation Intel® Xeon® Platinum processor 1.65x higher responses with CloudXPRT - Data Analytics: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Ubuntu 20.04, 5.4.0-65-generic​, 1x S4610 SSD 960G, CloudXPRT v1.0, Data Analytics (Analytics per minute @ p.95 <= 90s), test by Intel on 3/12/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04, 5.4.0-65-generic​, 1x S3520 SSD 480G, CloudXPRT v1.0, test by Intel on 2/4/2021. Intel contributes to the development of benchmarks by participating in, sponsoring, and/or contributing technical support to various benchmarking groups, including the BenchmarkXPRT Development Community administered by Principled Technologies. CloudXPRT Data Analytics New: March 12, 2021

Baseline: Feb 04, 2021

[98,97,81] Over 50% higher performance on latency-sensitive workloads such as database, e-commerce, and web server applications with 3rd Gen Intel Xeon Scalable platform 3rd Generation Intel® Xeon® Platinum processor Geomean of (HammerDB MySQL, Server Side Java, WordPress with HTTPS)

1.64x HammerDB MySQL: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Redhat 8.3, 4.18.0-240.el8.x86_​64 x86_​64, 1x Intel SSD 960GB OS Drive, 1x Intel P5800 1.6T, x Onboard 1G/s, HammerDB 4.0, MySQL 8.0.22, test by Intel on 3/11/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Redhat 8.3, 4.18.0-240.el8.x86_​64 x86_​64, 1x Intel 240GB SSD OS Drive, 1x Intel 6.4T P4610, x Onboard 1G/s, HammerDB 4.0, MySQL 8.0.22, test by Intel on 2/5/2021.

1.6x higher throughput under SLA and 1.4x higher throughput for Server Side Java: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-64-generic, 1x SSDSC2BA40, Java workload, JDK 1.15.0.1, test by Intel on 3/15/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-64-generic, 1x INTEL_​SSDSC2KG01, Java workload, JDK 1.15.0.1, test by Intel on 2/18/2021.

1.48x higher responses on WordPress with HTTPS: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Ubuntu 20.04, 5.4.0-65-generic, 1x Intel 895GB SSDSC2KG96, 1x XL710-Q2, WordPress 4.2 with HTTPS, gcc 9.3.0, GLIBC 2.31-0ubuntu9.1, mysqld Ver 10.3.25-MariaDB-0ubuntu0.20.04.1, PHP 7.4.9-dev (fpm-fcgi), Zend Engine v3.4.0, test by Intel on 3/15/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04, 5.4.0-65-generic, 1x Intel 1.8T SSDSC2KG01, 1x Intel X722, test by Intel on 2/5/2021.
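The "over 50%" figure can be sanity-checked against the three per-workload factors quoted above. The sketch below is illustrative only: the disclosure does not state which of the two Server Side Java figures (1.6x under SLA or 1.4x without) enters the geomean, so both are tried, and the helper name `geomean` and the variable names are assumptions introduced for this example.

```python
import math


def geomean(values):
    """Geometric mean: the n-th root of the product of n positive values."""
    values = list(values)
    return math.prod(values) ** (1.0 / len(values))


# Speedup factors quoted in the configuration text (new vs. baseline platform).
hammerdb_mysql = 1.64
java_under_sla = 1.6    # Server Side Java, throughput under SLA
java_throughput = 1.4   # Server Side Java, throughput without the SLA constraint
wordpress_https = 1.48

# Assumption: the source does not say which Java figure was used, so compute both.
gm_with_sla = geomean([hammerdb_mysql, java_under_sla, wordpress_https])
gm_with_throughput = geomean([hammerdb_mysql, java_throughput, wordpress_https])

print(f"geomean using 1.6x Java figure: {gm_with_sla:.2f}x")
print(f"geomean using 1.4x Java figure: {gm_with_throughput:.2f}x")
```

Under either choice the geomean stays above 1.5x (roughly 1.57x and 1.50x respectively), which is consistent with the headline claim.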

geomean of MySQL DB, Server Side Java, WordPress New: March 15, 2021

Baseline: Feb 05, 2021

[98] 3rd Gen Intel Xeon Platinum 8380 processor delivers 1.58x higher performance on cloud microservices usage vs. prior generation platform enabling faster business decisions 3rd Generation Intel® Xeon® Platinum processor 1.58x higher responses with CloudXPRT Web Microservices: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Ubuntu 20.04, 5.4.0-65-generic​, 1x S4610 SSD 960G, CloudXPRT v1.0, Web Microservices (Requests per minute @ p.95 latency <= 3s), test by Intel on 3/12/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04, 5.4.0-54-generic, 1x S3520 SSD 480G, CloudXPRT v1.0, test by Intel on 2/4/2021. Intel contributes to the development of benchmarks by participating in, sponsoring, and/or contributing technical support to various benchmarking groups, including the BenchmarkXPRT Development Community administered by Principled Technologies. CloudXPRT Web Microservices New: March 12, 2021

Baseline: Feb 04, 2021

[97] 3rd Gen Intel Xeon Platinum 8380 processor can process up to 1.48x more secure requests to a content management system vs. prior generation platform 3rd Generation Intel® Xeon® Platinum processor 1.48x higher responses on WordPress with HTTPS: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Ubuntu 20.04, 5.4.0-65-generic, 1x Intel 895GB SSDSC2KG96, 1x XL710-Q2, WordPress 4.2 with HTTPS, gcc 9.3.0, GLIBC 2.31-0ubuntu9.1, mysqld Ver 10.3.25-MariaDB-0ubuntu0.20.04.1, PHP 7.4.9-dev (fpm-fcgi), Zend Engine v3.4.0, test by Intel on 3/15/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04, 5.4.0-65-generic, 1x Intel 1.8T SSDSC2KG01, 1x Intel X722, test by Intel on 2/5/2021. WordPress New: March 15, 2021

Baseline: Feb 05, 2021

[96] Up to 1.6x higher Server Side Java throughput performance within a given SLA with 3rd Gen Intel® Xeon® Platinum 8380 processor vs. prior generation platform.

Up to 1.4x higher Server Side Java throughput performance with 3rd Gen Intel® Xeon® Platinum 8380 processor vs. prior generation platform

3rd Generation Intel® Xeon® Platinum processor 1.6x higher throughput under SLA and 1.4x higher throughput for Server Side Java: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-64-generic, 1x SSDSC2BA40, Java workload, JDK 1.15.0.1, test by Intel on 3/15/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-64-generic, 1x INTEL_​SSDSC2KG01, Java workload, JDK 1.15.0.1, test by Intel on 2/18/2021. Server Side Java New: March 15, 2021

Baseline: Feb 18, 2021

[94] Up to 21% more PostgreSQL database transactions vs. AMD EPYC 3rd Generation Intel® Xeon® Platinum processor

Up to 21% more PostgreSQL database transactions vs. AMD EPYC.

New: 1-node, Intel Software Development Platform, 2x Intel® Xeon Platinum 8358 (32C, 2.6GHz, 250W TDP), HT On, Turbo ON, Total Memory: 512 GB (16 slots/ 32GB/ 3200 MHz), ucode: x260, 1x SSD boot, 1xP5800x 1.6TB, Red Hat 8.3, kernel: 4.18.0-240.el8.x86_​64, HammerDB v4.1, PostgreSQL 13.0. Tested by Intel as of April 2021.

AMD EPYC: 1-node, Dell PowerEdge R7525, 2x AMD 7543 (32C, 2.8GHz, 240W cTDP), SMT On, Boost ON, Total Memory: 512 GB (16 slots/ 32GB/ 3200 MHz), ucode: 0xa001119, 1x NVMe boot, 1x PM1733 3.84TB, Red Hat 8.3, kernel: 4.18.0-240.el8.x86_​64, HammerDB v4.1, PostgreSQL 13.0. Tested by Intel as of April 2021.

HammerDB 4.1 w/PostgreSQL New: April 2021

Baseline: April 2021

[93] Up to 12% more MySQL transactions and 34% lower average new transaction database latency vs. AMD EPYC 3rd Generation Intel® Xeon® Platinum processor

Up to 12% more MySQL transactions and 34% lower average new transaction database latency vs. AMD EPYC.

New: 1-node, Intel Software Development Platform, 2x Intel® Xeon Platinum 8358 (32C, 2.6GHz, 250W TDP), HT On, Turbo ON, Total Memory: 512 GB (16 slots/ 32GB/ 3200 MHz), ucode: x260, 1x SSD boot, 1xP5800x 1.6TB, Red Hat 8.3, kernel: 4.18.0-240.el8.x86_​64, HammerDB v4, MySQL 8.0.22. Average P95 new order (NEWORD) latency as measured by HammerDB. Tested by Intel as of April 2021.

AMD EPYC: 1-node, Dell PowerEdge R7525, 2x AMD 7543 (32C, 2.8GHz, 240W cTDP), SMT On, Boost ON, Total Memory: 512 GB (16 slots/ 32GB/ 3200 MHz), ucode: 0xa001119, 1x NVMe boot, 1x PM1733 3.84TB, Red Hat 8.3, kernel: 4.18.0-240.el8.x86_64, HammerDB v4, MySQL 8.0.22. Average P95 new order (NEWORD) latency as measured by HammerDB. Tested by Intel as of April 2021.

HammerDB 4 w/MySQL New: April 2021

Baseline: April 2021

[91,92] 1.62x average performance gains across network and communications workloads on 3rd Gen Intel Xeon Scalable "N" processors and Intel Ethernet 800 series compared to prior generation platform

With Intel® 3rd Gen Xeon® Scalable processors, CoSPs can achieve up to a 21% boost in vBNG performance, while enabling increased flexibility for fixed and mobile convergence, manageability and scalability to expand use cases to address both mobile and broadband workloads.

With Intel® 3rd Gen Xeon® Scalable processors, CoSPs can increase 5G UPF performance by 42%. Combined with Intel Ethernet 800 series adapters, they can deliver the performance, efficiency and trust for use cases that require low latency, including augmented reality, cloud-based gaming, discrete automation and even robotic-aided surgery.

With Intel® 3rd Gen Xeon® Scalable processors and the latest Intel® Optane™ Persistent Memory you can get up to 63% higher throughput and 33% more memory capacity, enabling you to serve the same number of subscribers at higher resolution or a greater number of subscribers at the same resolution.

With Intel® 3rd Gen Xeon® Scalable processors, you can support up to 94% more secure networking connections and achieve significantly faster speeds to support cloud, edge and work-from-home use cases.

With the higher core performance and new crypto acceleration of Intel® 3rd Gen Xeon® Scalable processors, CoSPs can achieve 72% better CMTS platform performance. Additional QAT offload can add up to another 10% boost.

With Intel® 3rd Gen Xeon® Scalable processors, enhance Vector Packet Processing - Forward Information Base performance by 66% vs. the prior generation.

With Intel® 3rd Gen Xeon® Scalable processors, enhance DPDK L3 Forwarding performance by 88% vs. the prior generation.

With Intel® 3rd Gen Xeon® Scalable processors, Ethernet 800 series and vRAN dedicated accelerators, CoSPs can get to 2x Massive MIMO throughput in a similar power envelope for a best-in-class 3x100MHz 64T64R configuration.

3rd Generation Intel® Xeon® Platinum processor & Intel® Ethernet 800 Series Network Adapters ​

1.62x average network performance gains: geomean of Virtual Broadband Network Gateway, 5G User Plane Function, Virtual Cable Modem Termination System, Vector Packet Processing - Forward Information Base 512B, DPDK L3 Forward 512B, CDN-Live, Vector Packet Processing - IP Security 1420B.

1.2x  Virtual Broadband Network Gateway: New: Gold 6338N: 1-node, 2(1 socket used)x Intel Xeon Gold 6338N on Intel* Whitley with 256 GB (16 slots/ 16GB/ 2666) total DDR4 memory, ucode 0x261, HT on, Turbo off, Ubuntu 20.04 LTS (Focal Fossa)​, 5.4.0-40-generic, 1x INTEL* 240G SSD , 3x E810-CQDA2 (Tacoma Rapids), vBNG 20.07, Gcc 9.3.0​, test by Intel on 3/11/2021. Baseline: Gold 6252N: 1-node, 2(1 socket used)x Intel Xeon Gold 6252N on SuperMicro* X11DPG-QT with 192 GB (12 slots/ 16GB/ 2933)  total DDR4 memory, ucode 0x5002f01, HT on, Turbo off, Ubuntu 20.04 LTS (Focal Fossa)​, 5.4.0-40-generic, 1x INTEL* 240G SSD , 3x E810-CQDA2 (Tacoma Rapids), vBNG 20.07, Gcc 9.3.0​,  test by Intel on 2/2/2021.

1.42x 5G User Plane Function: New: 1-node, 2(1 socket used)x Intel Xeon Gold 6338N on Whitley Coyote Pass 2U with 128 GB (8 slots/ 16GB/ 2666) total DDR4 memory, ucode 0x261, HT on, Turbo off, Ubuntu 18.04.5 LTS, 4.15.0-134-generic, 1x Intel 810 (Columbiaville), FlexCore 5G UPF, Jan 2021, MD5 checksum: c4ad7f8422298ceb69d01e67419ff1c1, GCC 7.5.0, 5G UPF 228 Gbps / 294 Gbps, test by Intel on 3/16/2021. Baseline: 1-node, 2(1 socket used)x Intel Xeon Gold 6252N on SuperMicro* X11DPG-QT with 96 GB (6 slots/ 16GB/ 2934) total DDR4 memory, ucode 0x5003003, HT on, Turbo off, Ubuntu 18.04.5 LTS, 4.15.0-132-generic, 1x Intel 810 (Columbiaville), FlexCore 5G UPF, Jan 2021, MD5 checksum: c4ad7f8422298ceb69d01e67419ff1c1, GCC 7.5.0, 5G UPF 161 Gbps / 213 Gbps, test by Intel on 2/12/2021.

1.63x CDN Live: New: 1 node, 2x Intel® Xeon® Gold 6338N Processor, 32 core HT ON Turbo ON, Total DRAM 256GB (16 slots/16GB/2666MT/s), Total Optane Persistent Memory 200 Series 2048GB (16 slots/128GB/2666MT/s), BIOS SE5C6200.86B.2021.D40.2103100308 (ucode: 0x261), 4x Intel® E810, Ubuntu 20.04, kernel 5.4.0-65-generic, gcc 9.3.0 compiler, openssl 1.1.1h, varnish-plus 6.0.7r2. 2 clients, Test by Intel as of 3/11/2021. Baseline: Gold 6252N: 2x Intel® Xeon® Gold 6252N Processor, 24 core HT ON Turbo ON, Total DRAM 192GB (12 slots/16GB/2666MT/s), Total Optane Persistent Memory 100 Series 1536GB(12 slots/128GB/2666MT/s), 1x Mellanox MCX516A-CCAT, BIOS: SE5C620.86B.02.01.0013.121520200651 (ucode: 0x5003003), Ubuntu 20.04, kernel 5.4.0-65-generic, wrk master 4/17/2019. Test by Intel as of 2/15/2021. Throughput measured with 100% Transport Layer Security (TLS) traffic with 93.3% target cache hit ratio and keep alive on, 512 total connections.

1.94x Vector Packet Processing - IP Security 1420B: New: 1-node, 2(1 socket used)x Intel Xeon Gold 6338N on Intel* Whitley with 128 GB (8 slots/ 16GB/ 2666) total DDR4 memory, ucode 0x261, HT on, Turbo off, Ubuntu 20.04 LTS (Focal Fossa), 5.4.0-40-generic, 1x INTEL* 240G SSD, 1x E810-2CQDA2 (Chapman Beach), v21.01-release, Gcc 9.3.0, VPPIPSEC (24c24t), test by Intel on 3/17/2021. Baseline: 1-node, 2(1 socket used)x Intel Xeon Gold 6252N on SuperMicro* X11DPG-QT with 96 GB (6 slots/ 16GB/ 2933) total DDR4 memory, ucode 0x5002f01, HT off, Turbo off, Ubuntu 20.04 LTS (Focal Fossa), 5.4.0-40-generic, 1x INTEL* 240G SSD, 1x E810-CQDA2 (Tacoma Rapids), v21.01-release, Gcc 9.3.0, VPPIPSEC (18c18t), test by Intel on 2/2/2021.

1.72x Virtual Cable Modem Termination System: New: Gold 6338N: 1-node, 2(1 socket used)x Intel Xeon Gold 6338N on Coyote Pass with 256 GB (16 slots/ 16GB/ 2666) total DDR4 memory, ucode 0x261, HT on, Turbo off (no SST-BF)/on (SST-BF), Ubuntu 20.04 LTS (Focal Fossa), 5.4.0-40-generic, 1x INTEL* 240G SSD, 3x E810-CQDA2 (Tacoma Rapids), vCMTS 20.10, Gcc 9.3.0, SST-BF (2.4 GHz and 1.9 GHz frequencies for the priority cores and the other cores respectively), test by Intel on 3/11/2021. Baseline: Gold 6252N: 1-node, 2(1 socket used)x Intel Xeon Gold 6252N on SuperMicro* X11DPG-QT with 192 GB (12 slots/ 16GB/ 2933) total DDR4 memory, ucode 0x5002f01, HT on, Turbo off, Ubuntu 20.04 LTS (Focal Fossa), 5.4.0-40-generic, 1x INTEL* 240G SSD, 2x E810-CQDA2 (Tacoma Rapids), vCMTS 20.10, Gcc 9.3.0, vCMTS90 (14 instances), test by Intel on 2/2/2021.

1.66x Vector Packet Processing - Forward Information Base 512B: New: 1-node, 2(1 socket used)x Intel Xeon Gold 6338N on Intel* Whitley with 128 GB (8 slots/ 16GB/ 2666) total DDR4 memory, ucode 0x261, HT on, Turbo off, Ubuntu 20.04 LTS (Focal Fossa), 5.4.0-40-generic, 1x INTEL* 240G SSD, 1x E810-2CQDA2 (Chapman Beach), v20.05.1-release, Gcc 9.3.0, VPPFIB (24c24t), test by Intel on 3/17/2021. Baseline: 1-node, 2(1 socket used)x Intel Xeon Gold 6252N on SuperMicro* X11DPG-QT with 96 GB (6 slots/ 16GB/ 2933) total DDR4 memory, ucode 0x5002f01, HT off, Turbo off, Ubuntu 20.04 LTS (Focal Fossa), 5.4.0-40-generic, 1x INTEL* 240G SSD, 1x E810-CQDA2 (Tacoma Rapids), v20.05.1-release, Gcc 9.3.0, VPPFIB (18c18t), test by Intel on 2/2/2021.

1.88x DPDK L3 Forward 512B: New: 1-node, 2(1 socket used)x Intel Xeon Gold 6338N on Intel* Whitley with 128 GB (8 slots/ 16GB/ 2666)  total DDR4 memory, ucode 0x261, HT on, Turbo off, Ubuntu 20.04 LTS (Focal Fossa)​, 5.4.0-40-generic, 1x INTEL* 240G SSD , 1x E810-2CQDA2 (Chapman Beach), v20.08.0, Gcc 9.3.0​, DPDKL3FWD (24c24t), test by Intel on 3/17/2021, Baseline: 2(1 socket used)x Intel Xeon Gold 6252N on SuperMicro* X11DPG-QT with 96 GB (6 slots/ 16GB/ 2933)  total DDR4 memory, ucode 0x5002f01, HT off, Turbo off, Ubuntu 20.04 LTS (Focal Fossa)​, 5.4.0-40-generic, 1x INTEL* 240G SSD , 1x E810-CQDA2 (Tacoma Rapids), v20.08.0, Gcc 9.3.0​​, DPDKL3FWD (12c12t),  test by Intel on 2/2/2021.

FlexRAN: 2x MIMO Throughput: Results have been estimated or simulated. Based on 2x estimated throughput from 32Tx32R (5Gbps) on 2nd Gen Intel® Xeon® Gold 6212U processor to 64Tx64R (10Gbps) on 3rd Gen Intel Xeon Gold 6338N processor at similar power ~185W.

geomean of Virtual Broadband Network Gateway, 5G User Plane Function, CDN Video-on-Demand, Virtual Cable Modem Termination System, Vector Packet Processing - Forward Information Base 512B, DPDK L3 Forward 512B, CDN-Live, Vector Packet Processing - IP Security 1420B.

FlexRAN

New: March 17, 2021

Baseline: Feb 02, 2021
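The geomean named in the entry above is the n-th root of the product of the per-workload speedups. The Python sketch below illustrates the calculation using only the four ratios quoted in this section (1.94x, 1.72x, 1.66x, 1.88x). The other four workloads in the list are not quoted here, so the result is illustrative and is not the published figure. The dictionary keys and the function name are assumptions made for this example.

```python
import math

# Speedups quoted in this section only (a subset of the eight workloads
# that make up the published geomean).
speedups = {
    "VPP IPSec 1420B": 1.94,
    "vCMTS": 1.72,
    "VPP FIB 512B": 1.66,
    "DPDK L3 Forward 512B": 1.88,
}


def geomean(values):
    """Geometric mean, computed as exp(mean(log(v))) for numerical stability."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


subset_geomean = geomean(speedups.values())
print(f"Geomean of the four quoted speedups: {subset_geomean:.2f}x")
```

A geometric mean is the usual choice for averaging ratios because it treats a 2x gain and a 0.5x loss as cancelling out, which an arithmetic mean does not.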

[91] Up to 1.74x higher performance with the new 3rd Gen Intel Xeon Scalable Platform supporting Intel® Optane™ PMem 200 Series for CDN Live use case vs. prior generation Intel Xeon® Scalable platform supporting Intel® Optane™ Persistent Memory 100 Series

3rd Generation Intel® Xeon® Platinum processor 1.74x CDN-Live-Linear with Intel PMem. New: 1 node, 2x Intel® Xeon® Gold 6338 Processor, 32 core HT ON Turbo ON, Total DRAM 256GB (16 slots/16GB/3200MT/s), Total Optane Persistent Memory 200 Series 2048GB (16 slots/128GB Optane PMem 200 Series/3200MT/s), BIOS 1.1 (ucode: 0xd000280), 4x Intel® E810, Ubuntu 20.04, kernel 5.4.0-65-generic, gcc 9.3.0 compiler, openssl 1.1.1h, varnish-plus-6.0.8r1 revision 96a565db42d1ba89c21a3caa1b06a42d296581f3. 2 clients, wrk master 4/17/2019 (keep alive off, 3000 total connections). Test by Intel as of 5/10/2021. Baseline: 1 node, 2x Intel® Xeon® Gold 6252N Processor, 28 core HT ON Turbo ON, Total DRAM (192GB 12 slots/16GB/2666MT/s), Total Optane Persistent Memory 100 Series 1536GB (12 slots/128GB/2666MT/s), BIOS 2.10.2 (ucode: 0x5003003), 2x Intel® E810, Ubuntu 20.04, kernel 5.4.0-65-generic, gcc 9.3.0 compiler, openssl 1.1.1h, varnish-plus-6.0.8r1 revision 96a565db42d1ba89c21a3caa1b06a42d296581f3. 2 clients, wrk master 4/17/2019 (keep alive off, 3000 total connections). Test by Intel as of 5/10/2021. Throughput measured with 100% Transport Layer Security (TLS) traffic with 93.3% cache hit ratio. CDN -Live Linear with Intel Optane PMem 200 Series New: May 10, 2021

Baseline: May 10, 2021

[91] Up to 191.58Gbps network throughput with 1 socket 3rd Gen Intel Xeon Scalable platform featuring gen 4 Intel DC P5510 SSD for Video-On-Demand CDN use case using Varnish

Up to 383.33Gbps network throughput with 2 socket 3rd Gen Intel Xeon Scalable platform featuring gen 4 Intel DC P5510 SSD for Video-On-Demand CDN use case using Varnish

3rd Generation Intel® Xeon® Platinum processor

191.58Gbps CDN Video On Demand with Intel NVMe: 1 node, 1x Intel® Xeon® Gold 6338 Processor, 32 core HT ON Turbo ON, Total Memory 256GB (8 slots/32GB/3200MT/s), BIOS 1.1 (ucode: 0xd000280), 8x Intel® P5510, 1x Intel® E810, Ubuntu 20.04, kernel 5.4.0-65-generic, gcc 9.3.0 compiler, openssl 1.1.1h, varnish-plus-6.0.8r1 revision 96a565db42d1ba89c21a3caa1b06a42d296581f3. 2 clients, wrk master 4/17/2019 (keep alive on, 3000 total connections). Test by Intel as of 5/10/2021. Throughput measured with 100% Transport Layer Security (TLS) traffic with 100% target cache hit ratio.

383.33Gbps CDN Video On Demand with Intel NVMe: 1 node, 2x Intel® Xeon® Gold 6338 Processor, 32 core HT ON Turbo ON, Total Memory 256GB (16 slots/16GB/3200MT/s), BIOS 1.1 (ucode: 0xd000280), 12x Intel® P5510, 4x Intel® E810, Ubuntu 20.04, kernel 5.4.0-65-generic, gcc 9.3.0 compiler, openssl 1.1.1h, varnish-plus-6.0.8r1 revision 9ec6a3b587aa2500a6f7dd3cc7510d617ebb0c4e. 4 clients, wrk master 4/17/2019 (keep alive on, 3000 total connections). Test by Intel as of 5/10/2021. Throughput measured with 100% Transport Layer Security (TLS) traffic with 100% target cache hit ratio.

CDN Video On Demand with Intel NVMe New: May 10, 2021

Baseline: May 10, 2021
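The one-socket and two-socket throughput figures quoted above can be compared directly. The sketch below computes the ratio between them from the published numbers. The two configurations also differ in SSD count (8 vs. 12) and NIC count (1 vs. 4), so the ratio should not be read as a pure socket-scaling measurement. Variable names are illustrative.

```python
# Throughput figures quoted in the two claims above, in Gbps.
one_socket_gbps = 191.58
two_socket_gbps = 383.33

# Ratio of two-socket to one-socket throughput.
scaling = two_socket_gbps / one_socket_gbps

# Average throughput per socket in the two-socket run.
per_socket_gbps = two_socket_gbps / 2

print(f"2-socket vs. 1-socket throughput: {scaling:.2f}x")
print(f"Per-socket throughput in the 2-socket run: {per_socket_gbps:.2f} Gbps")
```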

[90] Up to 4.2x more TLS-encrypted web server connections per second with NGINX on 3rd Gen Intel Xeon Scalable processor with built-in enhanced crypto acceleration and E810 compared to prior generation platform. 3rd Generation Intel® Xeon® Platinum processor 4.2x NGINX (TLS 1.2 Handshake) web server connections/sec with ECDHE-X25519-RSA2K Multi-buffer: New: 1-node, 2x Intel® Xeon® Gold 6338N processor on Coyote Pass with 256 GB (16 slots/ 16GB/ 2666) total DDR4 memory, ucode 0x261, HT on, Turbo off, Ubuntu 20.04.1 LTS, 5.4.0-65-generic, 3x Quad Ethernet Controller E810-C for SFP 25 GBE, Async NGINX v0.4.3, OpenSSL 1.1.1h, QAT Engine v0.6.4, Crypto MB-ippcp_2020u3, GCC 9.3.0, GLIBC 2.31, test by Intel on 3/22/2021. Baseline: 1-node, 2x Intel® Xeon® Gold 6252N processor on Supermicro X11DPG-QT with 192 GB (12 slots/ 16GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo off, Ubuntu 20.04.1 LTS, 5.4.0-65-generic, 2x Quad Ethernet Controller XXV710 for 25GbE SFP28, 1x Dual Ethernet Controller XXV710 for 25GbE SFP28, Async NGINX v0.4.3, OpenSSL 1.1.1h, GCC 9.3.0, GLIBC 2.31, test by Intel on 1/17/2021. NGINX v0.4.3, OpenSSL 1.1.1h New: March 22, 2021

Baseline: January 17, 2021

[89] Up to 2.4x better performance on NGINX Web Secure Key Exchange vs. AMD EPYC 3rd Generation Intel® Xeon® Platinum processor

Up to 2.4x better performance on NGINX Web Secure Key Exchange vs. AMD EPYC.

New: 1-node, 1x Intel® Xeon® Gold 6338N (32C/2.2GHz) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x260, HT on, Turbo off, 1xSSD, 3 x QUAD E810-XXVDA4 (12 x 25 GbE), Ubuntu* 20.04, 5.4.0.40-generic, Glib 2.31, Async NGINX* v0.4.3, QAT Engine v0.6.4, Crypto MB ippcp_​2020u3, Intel® Multi-Buffer Crypto for IPsec v0.54, GCC 9.3.0, OpenSSL* 1.1.1h, tested by Intel as of March 2021.

AMD EPYC: 1-node,1-socket AMD EPYC 7513 (32C/2.6GHz) on Supermicro H12SSL-CT server with 128 GB (8 slots/ 16GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost off, 1xSSD, 6 x 100GbE Controller E810-C for QSFP, Ubuntu* 20.04, 5.4.0.40-generic, Glib 2.31, nginx/1.18.0 async v0.4.5, GCC 9.3.0, OpenSSL 1.1.1j, tested by Intel as of July 2021.

NGINX Web Secure Key Exchange New: March 2021

Baseline: July 2021

[86] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton2 by up to 4.96x in BERT-Large inference work with INT8 precision (Intel Xeon based AWS m6i instance outperforms Graviton2 m6g with FP32 precision for 32vCPU and 64vCPU). 3rd Generation Intel® Xeon® Platinum processor

4.96x more sentences per second in BERT-Large with INT8 precision for 32vCPU and 3.07x for 64vCPU, Workload BERT, Other SW (Graviton with FP32 precision): Ubuntu 20.04.2 LTS, Kernel 5.4.0-1060-aws, GCC 9.3.0, library TensorFlow 2.5.0, Docker 20.10.7, containerd 1.3.7, other SW (Intel with INT8 precision): Ubuntu 20.04 LTS, Kernel 5.11.0-1022-aws, GCC 8.4.0, libraries Python 3.6.9, TensorFlow 2.5.0, Docker 20.10.7, containerd 1.5.5.

New: m6i.8xlarge 128 GB memory capacity/instance, m6i.16xlarge 256 GB memory capacity/instance (Xeon).

Baseline: m6g.8xlarge 128 GB memory capacity/instance, m6g.16xlarge 256 GB memory capacity/instance (Graviton2).

Test by Intel. New: December 1, 2021; Baseline: November 10, 2021

BERT-Large inference - Natural language processing speedup (sentences per second)

New: December 1, 2021

Baseline: November 10, 2021

[85] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton2 by up to 2.04x in BERT-Large inference work with FP32 precision (Intel Xeon based AWS m6i instance outperforms Graviton2 m6g with FP32 precision for 32vCPU and 64vCPU). 3rd Generation Intel® Xeon® Platinum processor

2.04x more sentences per second in BERT-Large with FP32 precision for 32vCPU and 1.38x for 64vCPU, Workload BERT, Other SW (Graviton): Ubuntu 20.04.2 LTS, Kernel 5.4.0-1060-aws, GCC 9.3.0, library TensorFlow 2.5.0, Docker 20.10.7, containerd 1.3.7, other SW (Intel): Ubuntu 20.04 LTS, Kernel 5.11.0-1022-aws, GCC 8.4.0, libraries Python 3.6.9, TensorFlow 2.5.0, Docker 20.10.7, containerd 1.5.5.

New: m6i.8xlarge 128 GB memory capacity/instance, m6i.16xlarge 256 GB memory capacity/instance (Xeon).

Baseline: m6g.8xlarge 128 GB memory capacity/instance, m6g.16xlarge 256 GB memory capacity/instance (Graviton2).

Test by Intel. New: December 1, 2021; Baseline: November 10, 2021

BERT-Large inference - Natural language processing speedup (sentences per second)

New: December 1, 2021

Baseline: November 10, 2021

[84] Up to 1.72x higher virtualization performance with 3rd Gen Intel® Xeon® Scalable processor with Intel® SSD D5-P5510 Series and Intel® Ethernet Network Adapter E810 vs. prior generation platform 3rd Generation Intel® Xeon® Platinum processor 1.72x higher virtualization performance vs. prior generation: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 2048 GB (32 slots/ 64GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, RedHat 8.3, 4.18.0-240.el8.x86_​64, 1x S4610 SSD 960G, 4x P5510 3.84TB NVME, 2x Intel E810, Virtualization workload, Qemu-kvm 4.2.0-34 (inbox), WebSphere 8.5.5, DB2 v9.7, Nginx 1.14.1, test by Intel on 3/14/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 1536 GB (24 slots/ 64GB/ 2933[2666]) total DDR4 memory, ucode 0x5003005, HT on, Turbo on, RedHat 8.1 (Note: selected higher of RedHat 8.1 and 8.3 scores for baseline), 4.18.0-147.el8.x86_​64, 1x S4510 SSD 240G, 4x P4610 3.2TB NVME, 2x Intel XL710, Virtualization workload, Qemu-kvm 4.2.0-34 (inbox), WebSphere 8.5.5, DB2 v9.7, Nginx 1.14.1, test by Intel on 12/22/2020.

Virtualization workload

New: March 14, 2021

Baseline: Dec 22, 2020

[83] Process up to 1.55x higher transactions per minute with the 3rd Gen Intel Xeon Platinum 8380 processor and Intel® Optane™ SSD P5800X series vs. prior generation platform 3rd Generation Intel® Xeon® Platinum processor 1.55x higher Transactions on OLTP Database: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Redhat 8.3, 4.18.0-240.el8.x86_​64 x86_​64, 1x Intel SSD 960GB OS Drive, 4x Intel® Optane™ SSD P5800X Series 1.6T (2xDATA, 2XREDO), x Onboard 1G/s, HammerDB 4.0, Oracle 19c, test by Intel on 3/16/2021. Baseline: Platinum 8280 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Redhat 8.3, 4.18.0-240.el8.x86_​64 x86_​64, 1x Intel 240GB SSD OS Drive, 4x Intel 3.2T P4610 (2xDATA, 2xREDO), x Onboard 1G/s, HammerDB 4.0, Oracle 19c, test by Intel on 11/30/2020. HammerDB OLTP w/Oracle New: March 16, 2021

Baseline: Nov 30, 2020

[82] Support your growing business needs with the new 3rd Gen Intel® Xeon® Scalable Platform and realize up to 1.53x higher OLTP database transactions on Microsoft SQL Server compared to prior generation 3rd Generation Intel® Xeon® Platinum processor 1.53x higher OLTP brokerage performance: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Wilson City with 1536 GB (24 slots/ 64GB/ 2933) total DDR4 memory, ucode 0x261, HT on, Turbo on, Windows Server 2019, 10.0.17763 Build 17763.1339, 1x Intel 1.6TB SSD OS Drive, db & logx (69x Intel SSD D3-S4510 (960GB), 5x Intel SSD D3 S4510 (960GB), 28x Intel SSD DC S4600 (1.92TB), 8x Intel SSD D3-S4510 (960GB) ), 2x Intel X520-2 10GBASE-T, OLTP brokerage, Microsoft SQL Server 2019 RTM Cumulative Update 8, test by Intel on 3/10/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on S2600WFT with 1536 GB (24 slots/ 64GB/ 2933[2666]) total DDR4 memory, ucode 0x00B001008D, HT on, Turbo on, Windows Server 2019, 10.0.17763 Build 17763.1339, 1x Intel 1.6TB SSD OS Drive, db & logx (69x Intel SSD D3-S4510 (960GB), 5x Intel SSD D3 S4510 (960GB), 28x Intel SSD DC S4600 (1.92TB), 8x Intel SSD D3-S4510 (960GB) ), 2x Intel X520-2 10GBASE-T, OLTP brokerage, Microsoft SQL Server 2019 RTM Cumulative Update 8, test by Intel on 2/6/21. OLTP brokerage w/Microsoft New: March 10, 2021

Baseline: Feb 06, 2021

[81] Process up to 1.64x higher transactions per minute with the 3rd Gen Intel Xeon Platinum 8380 processor and Intel® Optane™ SSD P5800X Series vs. prior generation platform 3rd Generation Intel® Xeon® Platinum processor 1.64x HammerDB MySQL: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Redhat 8.3, 4.18.0-240.el8.x86_​64 x86_​64, 1x Intel SSD 960GB OS Drive, 1x Intel® Optane™ SSD P5800X Series 1.6T, x Onboard 1G/s, HammerDB 4.0, MySQL 8.0.22, test by Intel on 3/11/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Redhat 8.3, 4.18.0-240.el8.x86_​64 x86_​64, 1x Intel 240GB SSD OS Drive, 1x Intel 6.4T P4610, x Onboard 1G/s, HammerDB 4.0, MySQL 8.0.22, test by Intel on 2/5/2021. HammerDB w/MySQL New: March 11, 2021

Baseline: Feb 05, 2021

[80] Up to 2.5x higher transactions on the new 3rd Gen Intel Xeon Scalable processor with Intel Optane PMem 200 and Intel Ethernet E810 Network Adaptor running Aerospike with index and data in PMem vs. prior generation platform

Up to 1.43x higher transactions on the new 3rd Gen Intel Xeon Scalable processor with Intel Optane PMem 200, Intel P5510 SSD and Intel Ethernet E810 Network Adaptor running Aerospike with index in PMem and data in SSD vs. prior generation platform

3rd Generation Intel® Xeon® Platinum processor and Intel® Optane™ persistent memory 200 series . 2.5x higher transactions with Index+Data in PMem and 1.43x with Index(PMem)+Data(SSD) for Aerospike Database: New: Platinum 8368: 1-node, 2x Intel Xeon Platinum 8368 processor on Coyote Pass with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, 8192 GB (16 slots/ 512 GB/ 3200) total Pmem, ucode x261, HT on, Turbo on, CentOS 8.3.2011, 4.18.0-193.el8.x86_​64, 1x Intel 960GB SSD, 7x P5510 3.84TB, 2x Intel E810-C 100Gb/s, Aerospike Enterprise Edition 5.5.0.2; Aerospike C Client 5.1.0 Benchmark Tool; 70R/30W. Dataset size: 1.1TB, 9.3 billion 64B records, PMDK libPMem, Index (PMem)+data (SSD) and Index+data (PMem), test by Intel on 3/16/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280L processor on Wolf Pass with 768 GB (12 slots/ 64GB/ 2666) total DDR4 memory, 3072 GB (12 slots/ 256 GB/ 2666) total PMem, ucode 0x5003003, HT on, Turbo on, CentOS 8.3.2011, 4.18.0-193.el8.x86_​64, 7x P4510 1.8TB PCIe 3. 1, 2x Intel XL710 40Gb/s, Aerospike Enterprise Edition 5.5.0.2; Aerospike C Client 5.1.0 Benchmark Tool; 70R/30W. Dataset size: 1.1TB, 9.3 billion 64B records, PMDK libpmem, Index (PMem)+data (SSD), test by Intel on 3/16/2021. Aerospike New: March 16, 2021

Baseline: March 16, 2021

[79] Up to 1.41x faster performance for Online Analytical Processing workloads running with Microsoft SQL Server 2019 on the new 3rd Gen Intel® Xeon® Scalable Platform compared to prior generation 3rd Generation Intel® Xeon® Platinum processor 1.41x higher OLAP Decision Support: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Wilson City with 2048 GB (32 slots/ 64GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Windows Server 2019, 17763.rs5_​release.180914-1434, 1x Intel 200GB SSD OS Drive, 2x P4608 6.4TB PCIe NVME, 1x Intel X520-2, OLAP workload (3TB dataset), Microsoft SQL Server 2019 RTM Cumulative Update 8, test by Intel on 3/10/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280L processor on Wolf Pass with 1536 GB (24 slots/ 64GB/ 2933[2666]) total DDR4 memory, ucode 0x003300005, HT on, Turbo on, Windows Server 2019, 17763.rs5_​release.180914-1434, 1x Intel 200GB SSD OS Drive, 2x P4608 6.4TB PCIe NVME, 1x Intel X520-2, OLAP workload (3TB dataset), Microsoft SQL Server 2019 RTM Cumulative Update 8, test by Intel on 1/31/21. Decision Support New: March 10, 2021

Baseline: Jan 31, 2021

[78] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton2 by up to 1.69x for MySQL new orders per minute (Intel Xeon based AWS M6i instance outperforms Graviton2 M6g for 8vCPU, 16vCPU, and 64vCPU) 3rd Generation Intel® Xeon® Platinum processor 1.25x more new orders per minute in MySQL HammerDB for 8vCPU, 1.36x for 16vCPU, and 1.69x for 64vCPU, Workload: MySQL-8.0.25, Other SW: HammerDB-v4.2, Ubuntu 20.04.3 LTS, Kernel 5.11.0-1017-aws.

New: m6i.2xlarge 32 GB memory capacity/instance, m6i.4xlarge 64 GB memory capacity/instance, m6i.16xlarge 256 GB memory capacity/instance (Xeon).

Baseline: m6g.2xlarge 32 GB memory capacity/instance, m6g.4xlarge 64 GB memory capacity/instance, m6g.16xlarge 256 GB memory capacity/instance (Graviton2).

HammerDB w/ MySQL New: October 2021

Baseline: October 2021

[77] 3rd Gen Intel® Xeon® Scalable processor outperforms Graviton2 by up to 3.84x for ResNet50 inference throughput (Intel Xeon based AWS M6i 16vCPU instance outperforms Graviton2 M6g 16vCPU instance) 3rd Generation Intel® Xeon® Platinum processor 3.84x higher relative images per second in ResNet50 for 16vCPU, Workload: Intel TF 2.6, batch size 1, single instance, precision FP32, Other SW: Intel Optimized TensorFlow with oneDNN, Ubuntu 18.04, Kernel 5.4.0-1045-aws.

New: M6i.4xlarge 64 GB memory capacity/instance (Xeon).

Baseline: M6g.4xlarge 64 GB memory capacity/instance (Graviton2).

ResNet50 New: November 2021

Baseline: November 2021

[71] 3.34x higher IPSec AES-GCM performance, 3.78x higher IPSec AES-CMAC performance, 3.84x higher IPSec AES-CTR performance, 1.5x higher IPSec ZUC performance on 3rd Gen Intel® Xeon® Platinum 8380 processor 3rd Generation Intel® Xeon® Platinum processor 3.34x higher IPSec AES-GCM performance, 3.78x higher IPSec AES-CMAC performance, 3.84x higher IPSec AES-CTR performance, 1.5x higher IPSec ZUC performance: New: 8380: 1-node, 2x Intel® Xeon® Platinum 8380 CPU on M50CYP2SB2U with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x8d055260, HT On, Turbo Off, Ubuntu 20.04.2 LTS, 5.4.0-66-generic, 1x Intel 1.8TB SSD OS Drive, intel-ipsec-mb v0.55, gcc 9.3.0, Glibc 2.31, test by Intel on 3/17/2021. Baseline: 8280M: 1-node, 2x Intel® Xeon® Platinum 8280M CPU on S2600WFT with 384 GB (12 slots/ 32GB/ 2934) total DDR4 memory, ucode 0x4003003, HT On, Turbo Off, Ubuntu 20.04.2 LTS, 5.4.0-66-generic, 1x Intel 1.8TB SSD OS Drive, intel-ipsec-mb v0.55, gcc 9.3.0, Glibc 2.31, test by Intel on 3/8/2021. Crypto New: March 17, 2021

Baseline: March 08, 2021

[70] 5.63x higher OpenSSL RSA Sign 2048 performance,1.90x higher OpenSSL ECDSA Sign p256 performance,4.12x higher OpenSSL ECDHE x25519 performance,2.73x higher OpenSSL ECDHE p256 performance 3rd Gen Intel® Xeon® Platinum 8380 processor 3rd Generation Intel® Xeon® Platinum processor 5.63x higher OpenSSL RSA Sign 2048 performance,1.90x higher OpenSSL ECDSA Sign p256 performance,4.12x higher OpenSSL ECDHE x25519 performance,2.73x higher OpenSSL ECDHE p256 performance, New: 8380: 1-node, 2x Intel® Xeon® Platinum 8380 CPU on M50CYP2SB2U with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0xd000270, HT On, Turbo Off, Ubuntu 20.04.1 LTS, 5.4.0-65-generic, 1x INTEL_​SSDSC2KG01, OpenSSL 1.1.1j, GCC 9.3.0, QAT Engine v0.6.4, test by Intel on 3/24/2021. 8380: 1-node, 2x Intel® Xeon® Platinum 8380 CPU on M50CYP2SB2U with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0xd000270, HT On, Turbo Off, Ubuntu 20.04.1 LTS, 5.4.0-65-generic, 1x INTEL_​SSDSC2KG01, OpenSSL 1.1.1j, GCC 9.3.0, QAT Engine v0.6.5, test by Intel on 3/24/2021. Baseline: 8280M:1-node, 2x Intel® Xeon® Platinum 8280M CPU on S2600WFT with 384 GB (12 slots/ 32GB/ 2934) total DDR4 memory, ucode 0x5003003, HT On, Turbo Off, Ubuntu 20.04.1 LTS, 5.4.0-65-generic, 1x INTEL_​SSDSC2KG01, OpenSSL 1.1.1j, GCC 9.3.0, test by Intel on 3/5/2021. Crypto New: March 24, 2021

Baseline: March 05, 2021

[69] Up to 1.15x higher Compression performance, up to 1.09x Hashing, 2.3x Data Integrity, up to 3.9x Encryption performance on 3rd Gen Intel Xeon Platinum 8380 vs. prior generation 3rd Generation Intel® Xeon® Platinum processor ISA-L: New: 1-node, 2x Intel® Xeon® Platinum 8380 Processor, 40 cores HT On Turbo OFF Total Memory 512 GB (16 slots/ 32GB/ 3200 MHz), Data protection (Reed Solomon EC (10+4)), Data integrity (CRC64), Hashing (Multibuffer MD5),Data encryption (AES-XTS 128 Expanded Key), Data Compression (Level 3 Compression (Calgary Corpus)), BIOS: SE5C6200.86B.3021.D40.2103160200 (ucode: 0x8d05a260), Ubuntu 20.04.2, 5.4.0-67-generic, gcc 9.3.0 compiler, yasm 1.3.0, nasm 2.14.02, isal 2.30, isal_​crypto 2.23, OpenSSL 1.1.1.i, zlib 1.2.11, Test by Intel as of 03/19/2021. Baseline: 1-node, 2x Intel® Xeon® Platinum 8280 Processor, 28 cores HT On Turbo OFF Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.02.01.0013.121520200651 (ucode:0x4003003), Ubuntu 20.04.2, 5.4.0-67-generic,, gcc 9.3.0 compiler, yasm 1.3.0, nasm 2.14.02, isal 2.30, isal_​crypto 2.23, OpenSSL 1.1.1.i, zlib 1.2.11 Test by Intel as of 2/9/2021. Gen on gen comparison based on cycle/Byte performance measured on single core. ISA-L New: March 19, 2021

Baseline: February 09, 2021

[63]

Up to 3x higher 4KB Rand Read/Write 70/30 IOPS performance with 3rd Gen Intel Xeon® scalable platform supporting PCIe Gen4 Intel Optane™ SSDs vs. prior generation Intel Xeon® Scalable platform supporting Intel DC P4610 SSDs

Up to 1.37x higher 4KB Rand Read/Write 70/30 IOPS performance with 3rd Gen Intel Xeon Scalable platform with Intel® SSD DC 5510 series vs. prior generation Intel Xeon® Scalable platform supporting Intel DC P4610 SSDs

Up to 2.6x higher 4KB Random Read IOPS performance with 3rd Gen Intel Xeon® scalable platform supporting PCIe Gen4 Intel Optane™ SSDs vs. prior generation Intel Xeon® Scalable platform supporting Intel DC P4610 SSDs

Up to 1.5x higher 4KB Random Read IOPS performance with 3rd Gen Intel Xeon® scalable platform and Intel DC P5510 SSDs vs. prior generation Intel Xeon® Scalable platform supporting Intel DC P4610 SSDs

Upgrade to the latest 3rd Gen Intel Xeon Scalable family and Latest Intel® SSDs and benefit from significantly lower latency and enhanced performance Up to 15% Lower latency with Intel DC P5510 SSDs and up to 94% lower latency with Intel Optane™ SSDs

3rd Generation Intel® Xeon® Platinum processor and Intel® Optane™ persistent memory 200 series . Local IOPS: New:1-node, 2x Intel® Xeon® Platinum 8380 Processor, 40 cores HT On Turbo ON Total Memory 1024 GB (16 slots/ 64GB/ 3200 MHz), BIOS:SE5C6200.86B.2021.D40.2103100308 (ucode:0x261), Fedora 30, Linux Kernel 5.7.12, gcc 9.3.1 compiler, fio 3.20, SPDK 21.01, Storage: 16x Intel® SSD D7-P5510 7.68 TB (QD = 256) or 16x Intel® Optane™ SSD 800GB P5800X (QD = 128), Network: 2x 100GbE Intel E810-C, Test by Intel as of 3/17/2021. Baseline:1-node, 2x Intel® Xeon® Platinum 8280 Processor, 28 cores HT On Turbo ON Total Memory 768 GB (24 slots/ 32GB/ 2666 MHz), BIOS: SE5C620.86B.02.01.0013.121520200651 (ucode:0x4003003), Fedora 30, Linux Kernel 5.7.12, gcc 9.3.1 compiler, fio 3.20, SPDK 21.01, Storage: 16x Intel® SSD DC P4610 1.6TB, Network: 1x 100GbE Intel E810-C, Test by Intel as of 2/10/2021.  FIO 3.20 New: March 17, 2021

Baseline: February 10, 2021

[62]

Up to 1.5x more 4KB Random Read IOPS/VM performance with 3rd Gen Intel Xeon® scalable platform supporting PCIe Gen4 Intel Optane™ SSDs vs. prior generation Intel Xeon® Scalable platform supporting Intel DC P4610 SSDs

Up to 1.3x more 4KB Random Read IOPS/VM performance with 3rd Gen Intel Xeon® scalable platform and Intel DC P5510 SSDs vs. prior generation Intel Xeon® Scalable platform supporting Intel DC P4610 SSDs

Up to 286K IOPS/VM on 3rd Gen Intel Xeon® scalable platform and PCIe Gen4 Intel Optane™ SSDs for 4KB Random Read vs. prior generation Intel Xeon® Scalable platform supporting Intel DC P4610 SSDs

Up to 192K IOPS/VM on 3rd Gen Intel Xeon® scalable platform and Intel DC P5510 SSDs for 4KB Random Read vs. prior generation Intel Xeon® Scalable platform supporting Intel DC P4610 SSDs

3rd Generation Intel® Xeon® Platinum processor and Intel® Optane™ persistent memory 200 series. Storage Virtualization: New: 1-node, 2x Intel® Xeon® Platinum 8380 Processor, 40 cores HT On Turbo ON Total Memory 1024 GB (16 slots/ 64GB/ 3200 MHz), BIOS: SE5C6200.86B.2021.D40.2103100308 (ucode:0x261), Fedora 30, Linux Kernel 5.7.12, gcc 9.3.1 compiler, fio 3.20, SPDK 21.01, Storage: 16x Intel® SSD D7-P5510 7.68 TB (QD = 256) or 16x Intel® Optane™ SSD 800GB P5800X (QD = 128), Network: 2x 100GbE Intel E810-C, Test by Intel as of 3/17/2021. Baseline: 1-node, 2x Intel® Xeon® Platinum 8280 Processor, 28 cores HT On Turbo ON Total Memory 768 GB (24 slots/ 32GB/ 2666 MHz), BIOS: SE5C620.86B.02.01.0013.121520200651 (ucode:0x4003003), Fedora 30, Linux Kernel 5.7.12, gcc 9.3.1 compiler, fio 3.20, SPDK 21.01, Storage: 16x Intel® SSD DC P4610 1.6TB, Network: 1x 100GbE Intel E810-C, Test by Intel as of 2/10/2021. FIO 3.20 New: March 17, 2021

Baseline: February 10, 2021

[61]

Up to 2.7x higher IOPS throughput (4K random 70R/30W) for NVMe-over-TCP with the 3rd Gen Intel Xeon Scalable platform with Intel® Optane™ SSD P5800X Series vs. prior generation Intel Xeon® Scalable platform supporting Intel DC P4610 SSDs

Up to 2.7x higher IOPS throughput (4K random 70R/30W) for NVMe-over-TCP with the 3rd Gen Intel Xeon Scalable platform with Intel® SSD DC 5510 series vs. prior generation Intel Xeon® Scalable platform supporting Intel DC P4610 SSDs

3rd Generation Intel® Xeon® Platinum processor & Intel® Optane™ persistent memory 200 series NVMe-over-TCP IOPS Throughput: New: 1-node, 2x Intel® Xeon® Platinum 8380 Processor, 40 cores HT On Turbo ON Total Memory 1024 GB (16 slots/ 64GB/ 3200 MHz), BIOS:SE5C6200.86B.2021.D40.2103100308 (ucode:0x261), Fedora 30, Linux Kernel 5.7.12, gcc 9.3.1 compiler, fio 3.20, SPDK 21.01, Storage: 16x Intel® SSD D7-P5510 7.68 TB (QD = 256) or 16x Intel® Optane™ SSD 800GB P5800X (QD = 128), Network: 2x 100GbE Intel E810-C, Test by Intel as of 3/17/2021. Baseline: 1-node, 2x Intel® Xeon® Platinum 8280 Processor, 28 cores HT On Turbo ON Total Memory 768 GB (24 slots/ 32GB/ 2666 MHz), BIOS: SE5C620.86B.02.01.0013.121520200651 (ucode:0x4003003), Fedora 30, Linux Kernel 5.7.12, gcc 9.3.1 compiler, fio 3.20, SPDK 21.01, Storage: 16x Intel® SSD DC P4610 1.6TB, Network: 1x 100GbE Intel E810-C, Test by Intel as of 2/10/2021. FIO 3.20 New: March 17, 2021

Baseline: February 10, 2021

[60] Up to 1.54x higher IOPS throughput (4K random 70R/30W) for CEPH with the 3rd Gen Intel Xeon Scalable platform with Intel® Optane™ SSD DC 5800X series along with Intel® SSD DC 5510 series vs. prior generation Intel Xeon® Scalable platform supporting Intel DC P4510 SSDs along with Intel SSD DC P4800X series 3rd Generation Intel® Xeon® Platinum processor 1.54x Ceph: New: 8368: 5-node, 2x Intel Xeon Platinum 8368 cpu on Coyote Pass with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x8d055260, HT on, Turbo on, RHEL 8.3, 4.18.0-240.10.1.el8_3.x86_64, 1x Intel SSD 535 256GB M.2, 6x Intel SSD DC P5510 3.84TB, 2x Intel SSD DC P5800X 400GB, 1x Intel E810-C 100GbE, FIO 3.19, 8.3.1 20191121 (Red Hat 8.3.1-5), Podman 2.0.5, Ceph Octopus 15.2.8, test by Intel on 3/16/2021. Baseline: 8280: 5-node, 2x Intel Xeon Platinum 8280 cpu on Wolf Pass with 192 GB (12 slots/ 16GB/ 2666) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, RHEL 8.3, 4.18.0-240.10.1.el8_3.x86_64, 1x Intel SSD DC S3700 200GB, 6x Intel SSD DC P4510 4TB, 2x Intel SSD DC P4800X 375GB, 2x Intel XXV710 2x25GbE (100GbE bond), FIO 3.19, 8.3.1 20191121 (Red Hat 8.3.1-5), Podman 2.0.5, Ceph Octopus 15.2.8, test by Intel on 3/26/2021. CEPH New: March 16, 2021

Baseline: March 26, 2021

[59] Up to 1.91x higher performance with the new 3rd Gen Intel Xeon Scalable Platform featuring gen 4 Intel DC P5510 SSD for Video-On-Demand CDN use case vs. prior generation Intel Xeon® Scalable platform supporting Intel® P4510 SSDs 3rd Generation Intel® Xeon® Platinum processor 1.91x CDN-Video-on-Demand with Intel SSD: New: 1 node, 2x Intel® Xeon® Platinum 8380 Processor, 40 core HT ON Turbo ON, Total Memory 256GB (16 slots/16GB/2666MT/s), BIOS SE5C6200.86B.2021.D40.2103100308 (ucode: 0x261), 8x Intel® P5510, 4x Intel® E810, Ubuntu 20.04, kernel 5.4.0-65-generic, gcc 9.3.0 compiler, openssl 1.1.1h, varnish-plus-6.0.7r2 revision eab14f54182a8cfe32e7db037050f246740452d8, wrk master 4/17/2019 (keep alive on, 512 total connections). Test by Intel as of 3/17/2021. Baseline: 1 node, 2x Intel® Xeon® Gold 6258R Processor, 28 core HT ON Turbo ON, Total Memory 192GB (12 slots/16GB/2666MT/s), BIOS Dell 2.10.0 (ucode: 0x5003003), 10x Intel® P4510, 2x Intel® E810, Ubuntu 20.04, kernel 5.4.0-65-generic, gcc 9.3.0 compiler, openssl 1.1.1h, varnish-plus-6.0.7r2 revision eab14f54182a8cfe32e7db037050f246740452d8, wrk master 4/17/2019 (keep alive on, 512 total connections). Test by Intel as of 2/15/2021. Throughput measured with 100% Transport Layer Security (TLS) traffic with 100% target cache hit ratio. CDN-Video-on-Demand with Varnish plus New: March 17, 2021

Baseline: February 15, 2021

[58] Up to 1.72x higher performance with the new 3rd Gen Intel Xeon Scalable Platform supporting Intel® Optane™ PMem 200 Series for CDN Live use case vs. prior generation Intel Xeon® Scalable platform supporting Intel® Optane™ Persistent Memory 100 Series 3rd Generation Intel® Xeon® Platinum processor 1.72x CDN-Live with Intel PMem: New: 1 node, 2x Intel® Xeon® Platinum 8380 Processor, 40 core HT ON Turbo ON, Total DRAM 256GB (16 slots/16GB/2666MT/s), Total Optane Persistent Memory 200 Series 2048GB (16 slots/128GB/2666MT/s), BIOS SE5C6200.86B.2021.D40.2103100308 (ucode: 0x261), 4x Intel® E810, Ubuntu 20.04, kernel 5.4.0-65-generic, gcc 9.3.0 compiler, openssl 1.1.1h, varnish-plus 6.0.7r2 (keep alive off, 512 total connections). Test by Intel as of 3/17/2021. Baseline: 1 node, 2x Intel® Xeon® Gold 6258R Processor, 28 core HT ON Turbo ON, Total DRAM 192GB (12 slots/16GB/2666MT/s), Total Optane Persistent Memory 100 Series 1536GB (12 slots/128GB/2666MT/s), BIOS Dell 2.10.0 (ucode: 0x5003003), 2x Intel® E810, Ubuntu 20.04, kernel 5.4.0-65-generic, gcc 9.3.0 compiler, openssl 1.1.1h, varnish-plus-6.0.7r2 revision eab14f54182a8cfe32e7db037050f246740452d8 (keep alive off, 512 total connections), wrk master 4/17/2019. Test by Intel as of 2/15/2021. Throughput measured with 100% Transport Layer Security (TLS) traffic with 93.3% target cache hit ratio. CDN-Live Linear with Varnish plus New: March 17, 2021

Baseline: February 15, 2021

[56] End-to-end Census demo with 20% higher workload performance on 3rd Gen Intel® Xeon® Scalable vs. AMD Milan 3rd Generation Intel® Xeon® Platinum processor End-to-end Census demo with 20% higher workload performance on 3rd Gen Intel® Xeon® Scalable vs. AMD Milan.Baseline: 1-node, each node 2x Intel® Xeon® Platinum 8380 processor on Intel SDP with 512 GB (16 slots/ 32GB/ 3200[3200]) total DDR4 memory, ucode 0x55260, HT on, Turbo on, RedHat Enterprise Linux 8.2, 4.18.0-193.28.1.el8_​2.x86_​64, x 2x Intel_​SSDSC2KG019T8, End-to-end Census Workload, Python 3.7.9, Pre-processing Modin 0.8.3, Omniscidbe v5.4.1, Intel Optimized Scikit-Learn 0.24.1, Intel oneAPI Data Analytics Library (oneDAL) daal4py 2021.2, XGBoost 1.3.3, Dataset source: IPUMS USA: https://usa.ipums.org/usa/, Dataset (size, shape): (21721922, 45), Datatypes int64 and float64, Dataset size on disk 362.07 MB, Dataset format .csv.gz, Accuracy metric MSE: mean squared error; COD: coefficient of determination, test by Intel on 3/15/2021. AMD Config: 1-node, each node 2x AMD EPYC 7763 on 3rd party server with 512 GB (16 slots/ 32GB/ 3200[3200]) total DDR4 memory, ucode 0xa001119, HT on, NPS=2, Turbo on, RedHat Enterprise Linux 8.3 (Ootpa), 4.18.0-240.el8.x86_​64, x 2x Intel_​SSDSC2KG019T8, End-to-end Census Workload, Python 3.7.9, Pre-processing Modin 0.8.3, Omniscidbe v5.4.1, Intel Optimized Scikit-Learn 0.24.1, Intel oneAPI Data Analytics Library (oneDAL) daal4py 2021.2, XGBoost 1.3.3, Dataset source: IPUMS USA: https://usa.ipums.org/usa/, Dataset (size, shape): (21721922, 45), Datatypes int64 and float64, Dataset size on disk 362.07 MB, Dataset format .csv.gz, Accuracy metric MSE: mean squared error; COD: coefficient of determination, test by Intel on 5/11/2021. Demo: End-to-end Census workload performance New: May 15, 2021

Baseline: March 15, 2021

[55] Federated Learning Training: Penn's 3DResUnet tumor segmentation model - 11% accuracy improvement detecting tumor boundaries using a model trained with data from 23 hospitals over a single hospital's data. 3rd Generation Intel® Xeon® Platinum processor Federated Learning Training: Penn's 3DResUnet tumor segmentation model: This demo uses data from the public BraTS data set. Led by the Perelman School of Medicine, University of Pennsylvania, the Federated Tumor Segmentation (FeTS) project was deployed to a total of 23 locations representing 29 institutions' data (of 64 committed), with 1653/9000 patient data samples. UPenn successfully deployed batch Graphene-SGX protected OpenFL workloads to HPC nodes based on 3rd Gen Intel® Xeon® Scalable servers using their existing job management infrastructure, enabling access to the medical datasets needed for their contributions to the FeTS project. Training of Penn's 3DResUnet tumor segmentation model yields results demonstrating the following: i) The validation score of the model pretrained on a small dataset dropped when validated against the larger federation validation dataset (from 0.759 to 0.724). ii) Once trained on the federation training dataset, the validation score increases and surpasses the original model (from 0.724 to 0.805, an improvement of 11.18%). Demo: Privacy Preserving Analytics New: March 19, 2021

Baseline: March 19, 2021
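
The percentages in items i) and ii) are relative changes between two validation scores. A minimal sketch of that arithmetic, using only the three scores quoted above; the function name `relative_change` is illustrative and is not part of OpenFL or the FeTS tooling:

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Validation scores quoted in the disclosure above.
pretrained_small = 0.759  # pretrained model, original small dataset
pretrained_fed = 0.724    # same model, federation validation dataset
federated = 0.805         # after training on the federation training dataset

drop = relative_change(pretrained_small, pretrained_fed)
gain = relative_change(pretrained_fed, federated)

print(f"drop on federation validation set: {drop:.2f}%")
print(f"gain after federated training:     {gain:.2f}%")
```

Computed this way the gain comes to roughly 11.19%, so the 11.18% quoted above looks like the same figure truncated rather than rounded.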

[54] Up to 4.23X increase in images per second - Tencent PRNet Model on Intel-Tensorflow 2.4.0 Throughput Performance on 3rd Generation Intel® Xeon® Processor Scalable Family.

Up to 5.13x increase in connections per second - Tencent TGW NGINX TLS1.2 Webserver Connection-Per-Second Performance on 3rd Generation Intel® Xeon® Processor Scalable Family

3rd Generation Intel® Xeon® Platinum processor Tencent PRNet Model:

New: Test by Intel as of 03/19/2021. 2-node, 2x 3rd Gen Intel® Xeon® Scalable Processor, 36 cores HT On Turbo ON Total Memory 256 GB (16 slots/ 16GB/ 3200 MHz), BIOS: SE5C6200.86B.3020.P19.2103170131 (ucode: 0x8d05a260), CentOS 8.3, 4.18.0-240.1.1.el8_​3.x86_​64, gcc 8.3.1 compiler, PRNet Model, Deep Learning Framework: Intel-Tensorflow 2.4.0, https://github.com/Intel-tensorflow/tensorflow/releases/tag/v2.4.0, BS=1, Dummy Data, 18 instances/2 sockets, Datatype: FP32/INT8

Baseline: Test by Intel as of 03/19/2021. 2-node, 2x 2nd Gen Intel® Xeon® Scalable Processor, 24 cores HT On Turbo ON Total Memory 192 GB (12 slots/ 16GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0438.032620191658(ucode:0x5003003), CentOS 8.3, 4.18.0-240.10.1.el8_​3.x86_​64, gcc 8.3.1 compiler, PRNet Model, Deep Learning Framework: Intel-Tensorflow 2.4.0, https://github.com/Intel-tensorflow/tensorflow/releases/tag/v2.4.0, BS=1, Dummy Data, 12 instances/2 sockets, Datatype: FP32/INT8

Tencent TGW:

New: Test by Intel as of 3/19/2021. 1-node, 2x 3rd Gen Intel® Xeon® Scalable Processor, 36 cores HT On Turbo ON Total Memory 256 GB (16 slots/ 16GB/ 3200 MHz), BIOS: SE5C6200.86B.3020.P19.2103170131 (ucode: 0x8d05a260), CentOS 8.3, 4.18.0-240.1.1.el8_​3.x86_​64, gcc 8.3.1 compiler, NGINX 1.18, OpenSSL 1.1.1f, QAT Engine 0.6.4, Ipp Crypto MB 2020 update3

Baseline: Test by Intel as of 3/13/2021. 1-node, 2x 2nd Gen Intel® Xeon® Scalable Processor, 24 cores HT On Turbo ON Total Memory 192 GB (12 slots/ 16GB/ 2933 MHz), BIOS: SE5C620.86B.02.01.0013.121520200651 (ucode:0x5003003), CentOS 8.3, 4.18.0-240.10.1.el8_​3.x86_​64, gcc 8.3.1 compiler, NGINX 1.18, OpenSSL 1.1.1f

Demo: Performance Made Flexible - Tencent New: March 19, 2021

Baseline: March 19, 2021

[53] Up to 34% overall gen-to-gen improvement in Images Per Second processed with Intel® 3rd Gen Xeon® Scalable processors 3rd Generation Intel® Xeon® Platinum processor Claro360 social_distance_V1.0, Person-detection-retail-0013 (INT8): New: Test by Intel as of 03/19/2021. 1-node, 2x Intel® Xeon® Platinum 8380 Processor, 40 cores HT On Turbo ON Total Memory 512 GB (16 slots/ 32GB/ 3200 MT/s), BIOS: SE5C6200.86B.3021.D40.2103160200 (ucode:0x261), Ubuntu 18.04.5 LTS, 5.4.0-66-generic, claro360 workload not public, score=3659ips. Baseline: Test by Intel as of 3/19/2021. 1-node, 2x Intel® Xeon® Platinum 8280 Processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MT/s), BIOS: SE5C620.86B.02.01.0013.121520200651 (ucode:0x5003003), Ubuntu 18.04.5 LTS, 5.4.0-66-generic, claro360 workload not public, score=2716ips. Demo: Pandemic Safety Solution New: March 19, 2021

Baseline: March 19, 2021
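
The generation-over-generation figure follows from the two images-per-second scores in the configuration text. A short sketch of that ratio, assuming the claim is simply new score divided by baseline score; the helper name `speedup` is illustrative:

```python
def speedup(new_score: float, baseline_score: float) -> float:
    """Ratio of new to baseline throughput (higher is better)."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return new_score / baseline_score


new_ips = 3659.0       # Platinum 8380 score quoted above
baseline_ips = 2716.0  # Platinum 8280 score quoted above

ratio = speedup(new_ips, baseline_ips)
improvement_pct = (ratio - 1.0) * 100.0

print(f"speedup: {ratio:.3f}x, improvement: {improvement_pct:.1f}%")
```

The ratio is about 1.35x, an improvement of a little under 35%, which is in line with the "up to 34%" wording of the claim.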

[52] 10x performance boost due to using Intel® oneAPI AI Analytics Toolkit optimizations with Intel Distribution of Modin and Intel Extension for Scikit-Learn 2nd and 3rd Generation Intel® Xeon® processors Baseline: 1-node, each node 2x 2nd Gen Intel® Xeon® Platinum 8280L processor on Intel SDP with 384 GB (12 slots/ 32GB/ 2933[2933]) total DDR4 memory, ucode 0x4003003, CPU governor: performance, HT on, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-65-generic, x 2x Intel_​SSDSC2KG019T8, End-to-end Census Workload (Stock), Scikit-learn 0.24.1, Pandas 1.2.2, Python 3.9.7, Census Data, (21721922, 45) Dataset is from IPUMS USA, University of Minnesota, www.ipums.org [Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas and Matthew Sobek. IPUMS USA: Version 10.0 [dataset]. Minneapolis, MN: IPUMS, 2020. https://doi.org/10.18128/D010.V10.0], test by Intel on 2/19/2021. Optimized Config: 1-node, each node 2x 2nd Gen Intel® Xeon® Platinum 8280L processor on Intel SDP with 384 GB (12 slots/ 32GB/ 2933[2933]) total DDR4 memory, ucode 0x4003003, CPU governor: performance, HT on, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-65-generic, x 2x Intel_​SSDSC2KG019T8, End-to-end Census Workload (Optimized), Scikit-learn 0.24.1 accelerated by daal4py 2021.2, modin 0.8.3, omniscidbe v5.4.1 (Intel® oneAPI AI Analytics Toolkit optimizations - Intel Distribution of Modin and Intel Extension for Scikit-Learn), Pandas 1.2.2, Python 3.9.7, Census Data, (21721922, 45) Dataset is from IPUMS USA, University of Minnesota, www.ipums.org [Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas and Matthew Sobek. IPUMS USA: Version 10.0 [dataset]. Minneapolis, MN: IPUMS, 2020. https://doi.org/10.18128/D010.V10.0], test by Intel on 2/19/2021. Demo: End-to-end Census workload performance New: February 19, 2021

Baseline: February 19, 2021

[51] Delivers up to 50% performance increase and 31% total cost reduction with Intel Xeon Platinum 8360Y vs. the prev gen Intel Xeon Gold 5218 3rd Generation Intel® Xeon® Platinum processor Lightbits FIO: New: (3rd Gen Intel Xeon): Test by Lightbits as of 3/23/2021. 5-node, Intel® Xeon® Platinum 8360Y Processor, 36 cores, Utilized: 24 cores, HT On Turbo ON Total Memory 2560 GB (16 slots/ 32GB/ 3200 MHz, 16 slots/ DCPMM 128GB/ 2666 MHz), BIOS: SE5C6200.86B.3021.D40.2103160200, CentOS 7.8, 4.14.216-41421769bde239058b6e-rel-lb, fio-3.1 Baseline(CLX): Test by Lightbits as of 3/23/2021. 8-node, Intel® Xeon® Gold 5218 Processor, 16 cores, Utilized: 16 cores, HT On Turbo ON Total Memory 704 GB (20 slots/ 32GB/ 2666 MHz, 4 slots/ NVDIMM 16GB/ 2666 MHz), BIOS: 3.3a, CentOS 7.8, 4.14.216-41421769bde239058b6e-rel-lb, fio-3.1

Intel® Optane™ persistent memory pricing & DRAM pricing referenced in TCO calculations is provided for guidance and planning purposes only and does not constitute a final offer. Pricing guidance is subject to change and may revise up or down based on market dynamics. Please contact your OEM/distributor for actual pricing. Pricing guidance as of March, 2021.

Demo: Built for the Cloud: Lightbits FIO New: March 23, 2021

Baseline: March 23, 2021

[50] Up to 18% better performance and 20% lower average latency on Cloud Microservices vs. AMD EPYC 3rd Generation Intel® Xeon® Platinum processor

Up to 18% better performance and 20% lower average latency on Cloud Microservices vs. AMD EPYC.

New: Super Micro SYS-220U-TNR, 2x Intel® Xeon Platinum 8358 (32C, 2.6GHz, 250W TDP), HT On, Turbo ON, SNC OFF, Total Memory: 512 GB (16 slots/ 32GB/ 3200 MHz), ucode: x280, 100GbE Network, RHEL 8.4, Kernel: 4.18.0-305.el8.x86_​64. Load generator: wrk, DeathStarBench: https://github.com/delimitrou/DeathStarBench, B509c933faca3e5b4789c6707d3b3976537411a9 (base-commit+ Intel Optimizations that are not released as of July 2021). Geomean of Hotel Resv, Social Net, and Media workloads. Tested by Intel as of July 2021.

AMD EPYC: Dell PowerEdge R7525, 2x AMD 7543 (32C, 2.8GHz, 240W cTDP), SMT On, Boost ON, NPS=1, Total Memory: 1 TB (16 slots/ 64GB/ 3200 MHz), ucode: 0xa00111d, 100GbE Network, RHEL 8.4, Kernel: 4.18.0-305.el8.x86_​64. Load generator: wrk, DeathStarBench: https://github.com/delimitrou/DeathStarBench, B509c933faca3e5b4789c6707d3b3976537411a9 (base-commit+ Intel Optimizations that are not released as of July 2021). Geomean of Hotel Resv, Social Net, and Media workloads. Tested by Intel as of July 2021.

DeathStarBench New: July 2021

Baseline: July 2021
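
Several entries, including this one, report a geometric mean over multiple workloads rather than a single result. A minimal sketch of how such a combined ratio can be computed; the per-workload ratios below are hypothetical placeholders, since the disclosure publishes only the combined figure:

```python
import math
from typing import Sequence


def geomean(ratios: Sequence[float]) -> float:
    """Geometric mean of positive ratios, computed in log space."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# HYPOTHETICAL per-workload throughput ratios (new / baseline).
# These values are made up for illustration and are not measured results.
example_ratios = {
    "hotel_reservation": 1.05,
    "social_network": 1.30,
    "media": 1.10,
}

combined = geomean(list(example_ratios.values()))
print(f"geomean speedup: {combined:.3f}x")
```

A geometric mean is the usual choice for averaging ratios because a 2x gain on one workload and a 0.5x result on another cancel out to 1.0x, which an arithmetic mean would not show.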

[49] Up to 6% lower cost per user with Intel® Optane™ PMem on VDI vs. AMD EPYC at equal software VMware licensing 3rd Generation Intel® Xeon® Platinum processor

Up to 6% lower cost per user with Intel® Optane™ PMem on VDI vs. AMD EPYC at equal software VMware licensing.

New: 4-node, Each node: Intel Software Development Platform 1TB PMem, 2x Intel® Xeon Platinum 8358 (32C, 2.6GHz, 250W TDP), HT On, Turbo ON, SNC OFF, Total Memory: 1 TB (16 slots/ 16GB/ 3200 MHz + 8 slots/128GB/PMem), ucode: x280, 2x 25GbE Intel Ethernet E810-XXVDA2, Gen4 x8, Per node cache tier: 2x 400GB Optane P5800x, Gen4, Per node capacity tier: 6x Intel P5510 3.84TB , Gen4, ESXi 7.0u2 17630552, vCenter 7.0u2 17694817, Citrix Virtual Apps and Desktops 7 2103, LoginVSI 4.1.40. Knowledge worker profile 2vCPU/4GB. 4-node solution cost $379K (Per node: base system @ $13757,https://www.thinkmate.com/system/rax-xt10-21s3-aiom, 1TB of Optane: 16x16GB DDR4 + 8x128GB Optane PMem 200 $12136, Cache tier: 2x Optane P5800X drives $1162.79/each, https://www.cdw.com/product/intel-optane-ssd-dc-p5800x-series-solid-state-drive-400-gb-pci-expres/6457148?pfm=srh, Capacity tier: 6x P5510 drives $758.09/each, https://www.cdw.com/product/intel-solid-state-drive-d7-p5510-series-solid-state-drive-3.84-tb-u.2/6509262?pfm=srh, 3 year of VMware S/W: ESXi License, https://www.cdw.com/product/vmware-vsan-enterprise-plus-v.-7-license-1-processor/6030177?pfm=srh, vSAN License, https://www.cdw.com/product/vmware-support-and-subscription-production-technical-support-for-vmware/6030176?pfm=srh, ESXi Support, https://www.cdw.com/product/vmware-support-and-subscription-production-technical-support-for-vmware/6030336?pfm=srh, vSAN support, https://www.cdw.com/product/vmware-vsphere-enterprise-plus-v.-7-license-1-processor/6030328?pfm=srh, $28938.50, Citrix per user: $144.39, https://www.cdw.com/product/citrix-virtual-apps-and-desktops-advanced-edition-license-1-user-device/3625316?pfm=srh). Tested by Evaluator Group as of July 2021 and prices as of July 2021.

AMD EPYC: 4-node, Each node: Dell PowerEdge R7525, 2x AMD 7543 (32C, 2.8GHz, 240W cTDP), SMT On, Boost ON, NPS=1, Total Memory: 1 TB (16 slots/ 64GB/ 3200 MHz), ucode: 0x1911000A00, 2x25GbE Mellanox Connectx-5, Gen4 x8, Per node cache tier: 2x 1.92TB Samsung PM1733 Gen4, Per node capacity tier: 6x Intel P5510 3.84TB, Gen4, ESXi 7.0u2 17630552, vCenter 7.0u2 17694817, Citrix Virtual Apps and Desktops 7 2103, LoginVSI 4.1.40. Knowledge worker profile 2vCPU/4GB. 4-node solution cost $410K (Per node: base system @ $13414, https://www.thinkmate.com/system/rax-qn10-21e2, 1TB DDR memory: 16x64GB DDR4 $21200, Cache tier: 2x Samsung PM1733 $478.79/each, https://www.cdw.com/product/samsung-pm1733-mzwlj1t9hbjr-solid-state-drive-1.92-tb-pci-express-4.0/6247494?pfm=srh, Capacity tier: 6x P5510 drives $758.09/each, https://www.cdw.com/product/intel-solid-state-drive-d7-p5510-series-solid-state-drive-3.84-tb-u.2/6509262?pfm=srh, 3 year of VMware S/W: ESXi License, https://www.cdw.com/product/vmware-vsan-enterprise-plus-v.-7-license-1-processor/6030177?pfm=srh, vSAN License, https://www.cdw.com/product/vmware-support-and-subscription-production-technical-support-for-vmware/6030176?pfm=srh, ESXi Support, https://www.cdw.com/product/vmware-support-and-subscription-production-technical-support-for-vmware/6030336?pfm=srh, vSAN support, https://www.cdw.com/product/vmware-vsphere-enterprise-plus-v.-7-license-1-processor/6030328?pfm=srh, $28938.50, Citrix per user: $144.39, https://www.cdw.com/product/citrix-virtual-apps-and-desktops-advanced-edition-license-1-user-device/3625316?pfm=srh). Tested by Evaluator Group as of July 2021 and prices as of July 2021.

LoginVSI/Citrix/VMware New: July 2021

Baseline: July 2021
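
The per-node prices listed in the two configurations can be rolled up as a rough cross-check. The sketch below rests on two assumptions that the source does not state explicitly: that every listed price is per node, and that the quoted $28938.50 is the per-node total for three years of VMware software and support. The `NodeBill` class and its field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeBill:
    """Per-node line items in USD, taken from the configuration text above."""
    base_system: float
    memory: float
    cache_drive_each: float
    cache_drive_count: int
    capacity_drive_each: float
    capacity_drive_count: int
    vmware_3yr: float  # ASSUMPTION: the quoted $28938.50 is a per-node total

    def per_node_total(self) -> float:
        return (
            self.base_system
            + self.memory
            + self.cache_drive_each * self.cache_drive_count
            + self.capacity_drive_each * self.capacity_drive_count
            + self.vmware_3yr
        )


NODES = 4

intel = NodeBill(13757.00, 12136.00, 1162.79, 2, 758.09, 6, 28938.50)
amd = NodeBill(13414.00, 21200.00, 478.79, 2, 758.09, 6, 28938.50)

for name, bill in (("Intel", intel), ("AMD", amd)):
    subtotal = bill.per_node_total() * NODES
    print(f"{name}: per node ${bill.per_node_total():,.2f}, "
          f"{NODES}-node subtotal ${subtotal:,.2f} (excl. per-user Citrix)")
```

These subtotals leave out the per-user Citrix licences, so they are not expected to match the quoted $379K and $410K solution costs. The user counts needed to finish a cost-per-user figure are not given in the source.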

[48] Up to 39% higher throughput and 26% lower latency on VMware vSAN hyper-converged infrastructure performance vs. AMD EPYC 3rd Generation Intel® Xeon® Platinum processor

Up to 39% higher throughput and 26% lower latency on VMware vSAN hyper-converged infrastructure performance vs. AMD EPYC.

New: 4-node, Each node: Intel Software Development Platform, 2x Intel® Xeon Platinum 8358 (32C, 2.6GHz, 250W TDP), HT On, Turbo ON, SNC OFF, Total Memory: 1 TB (16 slots/ 64GB/ 3200 MHz), ucode: x280, 2x 25GbE Intel Ethernet E810-XXVDA2, Gen4 x8, Per node cache tier: 2x 400GB Optane P5800x, Gen4, Per node capacity tier: 6x Intel P5510 3.84TB , Gen4, ESXi 7.0u2 17630552, vCenter 7.0u2 17694817, HCIBench 2.5.3, Vdbench 50407. Geomean of 7 test cases (R, R/W, W, varying block sizes, random and sequential). Tested by Evaluator Group as of May 2021.

AMD EPYC: 4-node, Each node: Dell PowerEdge R7525, 2x AMD 7543 (32C, 2.8GHz, 240W cTDP), SMT On, Boost ON, NPS=1, Total Memory: 1 TB (16 slots/ 64GB/ 3200 MHz), ucode: 0x1911000A00, 2x25GbE Mellanox Connectx-5, Gen4 x8, Per node cache tier: 2x 1.92TB Samsung PM1733 Gen4, Per node capacity tier: 6x Intel P5510 3.84TB, Gen4, ESXi 7.0u2 17630552, vCenter 7.0u2 17694817, HCIBench 2.5.3, Vdbench 50407. Geomean of 7 test cases (R, R/W, W, varying block sizes, random and sequential). Tested by Evaluator Group as of May 2021.

HCIBench New: May 2021

Baseline: May 2021

[47] 1.57x better performance on LAMMPS Geomean of Polyethylene, Stillinger-Weber, Tersoff, Water; 1.62x better performance on NAMD Geomean of Apoa1, STMV; 1.68x better performance on RELION Plasmodium Ribosome; 1.37x better performance on Binomial Options, 2.05x better performance on Monte Carlo vs. AMD EPYC* 3rd Generation Intel® Xeon® Platinum processor

1.57x better performance on LAMMPS Geomean of Polyethylene, Stillinger-Weber, Tersoff, Water: New: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: v2020-10-29; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4, Intel MPI 2019u8; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -xCORE-AVX512 -qopt-zmm-usage=high. AMD EPYC: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: v2020-10-29; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4, Intel MPI 2019u8; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -march=core-avx2, tested by Intel and results as of April 2021.

1.62x better performance on NAMD Geomean of Apoa1, STMV: New: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: 2.15-Alpha1 (includes AVX tiles algorithm); Build notes: Tools: Intel MKL, Intel C Compiler 2020u4, Intel MPI 2019u8, Intel Threading Building Blocks 2020u4; threads/core: 2; Turbo: used; Build knobs: -ip -fp-model fast=2 -no-prec-div -qoverride-limits -qopenmp-simd -O3 -xCORE-AVX512 -qopt-zmm-usage=high. AMD EPYC: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: 2.15-Alpha1 (includes AVX tiles algorithm); Build notes: Tools: Intel MKL, AOCC 2.2.0, gcc 9.3.0, Intel MPI 2019u8; threads/core: 2; Turbo: used; Build knobs: -O3 -fomit-frame-pointer -march=znver1 -ffast-math, tested by Intel and results as of April 2021.

1.68x better performance on RELION Plasmodium Ribosome: New:Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: 3_​1_​1; Build notes: Tools: Intel C Compiler 2020u4, Intel MPI 2019u9; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -g -debug inline-debug-info -xCOMMON-AVX512 -qopt-report=5 -restrict. AMD EPYC: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: 3_​1_​1; Build notes: Tools: Intel C Compiler 2020u4, Intel MPI 2019u9; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -g -debug inline-debug-info -march=core-avx2 -qopt-report=5 -restrict, tested by Intel and results as of April 2021.

1.37x better performance on Binomial Options: New: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: v1.0; Build notes: Tools: Intel C Compiler 2020u4, Intel Threading Building Blocks ; threads/core: 2; Turbo: used; Build knobs: -O3 -xCORE-AVX512 -qopt-zmm-usage=high -fimf-domain-exclusion=31 -fimf-accuracy-bits=11 -no-prec-div -no-prec-sqrt. AMD EPYC: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: v1.0; Build notes: Tools: Intel C Compiler 2020u4, Intel Threading Building Blocks ; threads/core: 2; Turbo: used; Build knobs: -O3 -march=core-avx2 -fimf-domain-exclusion=31 -fimf-accuracy-bits=11 -no-prec-div -no-prec-sqrt, tested by Intel and results as of April 2021.

2.05x better performance on Monte Carlo: New: Platinum 8358: 1-node, 2x Intel® Xeon® Platinum 8358 (32C/2.6GHz, 250W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: v1.1; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4; threads/core: 1; Turbo: used; Build knobs: -O3 -xCORE-AVX512 -qopt-zmm-usage=high -fimf-precision=low -fimf-domain-exclusion=31 -no-prec-div -no-prec-sqrt. AMD EPYC: 1-node, 2-socket AMD EPYC 7543 (32C/2.8GHz, 240W cTDP) on Dell PowerEdge R7525 server with 1024 GB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xa001119, SMT on, Boost on, Power deterministic mode, NPS=4, Red Hat Enterprise Linux 8.3, 4.18, 2x Micron 5300 Pro, App Version: v1.1; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4; threads/core: 2; Turbo: used; Build knobs: -O3 -march=core-avx2 -fimf-precision=low -fimf-domain-exclusion=31 -no-prec-div -no-prec-sqrt, tested by Intel and results as of April 2021.

Binomial Options, LAMMPS, NAMD, RELION, Monte Carlo New: April 2021

Baseline: April 2021

[46] 3X Web Microservices performance (1 second SLA) with 3rd Gen Intel Xeon Scalable processor vs. AMD EPYC* Milan 3rd Generation Intel® Xeon® Platinum processor 3.0x higher CloudXPRT Web Microservices with SLA < 1 second. New: 2-socket Intel® Xeon® Platinum 8380 (40C/2.3GHz, 270W TDP) on Intel Software Development, 512GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode x270, HT on, Turbo on, SNC off, Ubuntu 20.04 LTS, 5.8.0-40-generic, CloudXPRT version 1.1, Tested by Intel and results as of February 2021. AMD EPYC Milan: 2-socket AMD EPYC 7763 (64C/2.45GHz, 280W cTDP) on GIGABYTE R282-Z92, SMT on, Boost on, Power deterministic mode, NPS=1, 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0xa001114, Ubuntu 20.04 LTS, 5.8.0-40-generic, CloudXPRT version 1.1. Tested by Intel and results as of March 2021. Intel contributes to the development of benchmarks by participating in, sponsoring, and/or contributing technical support to various benchmarking groups, including the BenchmarkXPRT Development Community administered by Principled Technologies. CloudXPRT Web Microservices New: March 2021

Baseline: February 2021

[45] 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost INT8 delivers up to 25x better inference throughput vs. AMD Milan FP32 across a diverse set of AI workloads that include Image Classification, Object Detection, Natural Language Processing and Image Recognition 3rd Generation Intel® Xeon® Platinum processor Up to 25x higher AI performance with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC 7763 (64C Milan):

New: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X55260, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_​SSDSC2KG96, Intel SSDPE2KX010T8, MobileNet-v1, gcc-9.3.0, oneDNN 1.6.4, BS=1,56, INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on March 2021.

Baseline: 1-node, 2x AMD EPYC 7763 on GigaByte with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0xa001114, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Samsung_MZ7LH3T8, MobileNet-v1, gcc-9.3.0, oneDNN 1.6.4, BS=1,56, FP32, TensorFlow- 2.4.1, Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/image_recognition/tensorflow/mobilenet_v1, tested by Intel and results as of March 2021.

MobileNet-v1 (Up to 25x better inference performance) New: March 2021

Baseline: March 2021

[44] 1.3x higher AI performance with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. NVIDIA A100 (geomean of 20 workloads including logistic regression inference, logistic regression fit, ridge regression inference, ridge regression fit, linear regression inference, linear regression fit, elastic net inference, XGBoost Fit, XGBoost predict, SSD-ResNet34 inference, Resnet50-v1.5 inference, Resnet50-v1.5 training, BERT Large SQuaD inference, kmeans inference, kmeans fit, brute_​knn inference, SVC inference, SVC fit, dbscan fit, traintestsplit) 3rd Generation Intel® Xeon® Platinum processor 1.3x higher AI performance with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. NVIDIA A100 GPU: (geomean of 20 workloads including logistic regression inference, logistic regression fit, ridge regression inference, ridge regression fit, linear regression inference, linear regression fit, elastic net inference, XGBoost Fit, XGBoost predict, SSD-ResNet34 inference, Resnet50-v1.5 inference, Resnet50-v1.5 training, BERT Large SQuaD inference, kmeans inference, kmeans fit, brute_​knn inference, SVC inference, SVC fit, dbscan fit, traintestsplit)

3rd Gen Intel Xeon Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X55260, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_​SSDSC2KG96, Intel SSDPE2KX010T8, tested by Intel, and results as of March 2021.

DL Measurements on A100: 1-node, 2-socket AMD EPYC 7742 (64C) with 256GB (8 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x8301038, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-42-generic, INTEL SSDSC2KB01, NVIDIA A100-PCIe-40GB, HBM2-40GB, Accelerator per node =1, tested by Intel, and results as of March 2021.

ML Measurements on A100: 1-node, 2-socket AMD EPYC 7742 (64C) with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x8301034, HT on, Turbo on, Ubuntu 18.04.5 LTS, 5.4.0-42-generic,NVIDIA A100 (DGX-1), 1.92TB M.2 NVMe, 1.92TB M.2 NVMe RAID tested by Intel, and results as of March 2021.

ResNet50-v1.5 Intel: gcc-9.3.0, oneDNN 1.6.4, BS=1, INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart

ResNet50-v1.5 NVIDIA:A100 (7 instance/GPU), BS=1,TensorFlow - 1.5.5 (NGC: tensorflow:21.02-tf1-py3), https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Classification/ConvNets/resnet50v1.5, TF AMP (FP16+TF32);

ResNet50-v1.5 Training Intel: gcc-9.3.0, oneDNN 1.6.4, BS=256, FP32, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart.

ResNet50-v1.5 Training NVIDIA: A100, BS=256, TensorFlow - 1.5.5 (NGC: tensorflow:21.02-tf1-py3), https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Classification/ConvNets/resnet50v1.5, TF32;

BERT-Large SQuAD Intel: gcc-9.3.0, oneDNN 1.6.4, BS=1, INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/

BERT-Large SQuAD NVIDIA: A100 (7 instance/GPU), BS=1, TensorFlow - 1.5.5 (NGC: tensorflow:20.11-tf1-py3), https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT, TF AMP (FP16+TF32);

SSD-ResNet34 Intel: gcc-9.3.0, oneDNN 1.6.4, BS=1, INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/

SSD-ResNet34 NVIDIA: A100 (7 instance/GPU), BS=1, PyTorch - 1.8.0a0 (NGC Container, latest supported): A100: SSD-ResNet34 (NGC: pytorch:20.11-py3), https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD, AMP (FP16 +TF32);

Python: Intel: Python 3.7.9, Scikit-Learn: Sklearn 0.24.1, OneDAL: Daal4py 2021.2, XGBoost: XGBoost 1.3.3

Python: NVIDIA A100: Python 3.7.9, Scikit-Learn: Sklearn 0.24.1, CuML 0.17, XGBoost 1.3.0dev.rapidsai0.17, Nvidia RAPIDS: RAPIDS 0.17, CUDA Toolkit: CUDA 11.0.221

Benchmarks: https://github.com/IntelPython/scikit-learn_bench

geomean of 20 workloads including logistic regression inference, logistic regression fit, ridge regression inference, ridge regression fit, linear regression inference, linear regression fit, elastic net inference, XGBoost Fit, XGBoost predict, SSD-ResNet34 inference, Resnet50-v1.5 inference, Resnet50-v1.5 training, BERT Large SQuaD inference, kmeans inference, kmeans fit, brute_​knn inference, SVC inference, SVC fit, dbscan fit, traintestsplit New: March 2021

Baseline: March 2021

[43] 1.5x higher AI performance with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC Milan (geomean of 20 workloads including logistic regression inference, logistic regression fit, ridge regression inference, ridge regression fit, linear regression inference, linear regression fit, elastic net inference, XGBoost Fit, XGBoost predict, SSD-ResNet34 inference, Resnet50-v1.5 inference, Resnet50-v1.5 training, BERT Large SQuaD inference, kmeans inference, kmeans fit, brute_​knn inference, SVC inference, SVC fit, dbscan fit, traintestsplit) 3rd Generation Intel® Xeon® Platinum processor 1.5x higher AI performance with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC 7763 (64C Milan): (geomean of 20 workloads including logistic regression inference, logistic regression fit, ridge regression inference, ridge regression fit, linear regression inference, linear regression fit, elastic net inference, XGBoost Fit, XGBoost predict, SSD-ResNet34 inference, Resnet50-v1.5 inference, Resnet50-v1.5 training, BERT Large SQuaD inference, kmeans inference, kmeans fit, brute_​knn inference, SVC inference, SVC fit, dbscan fit, traintestsplit)

3rd Gen Intel Xeon: 8380: 1-node, 2x Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X55260, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic/5.4.0-64-generic, 1x Intel_​SSDSC2KG96, Intel SSDPE2KX010T8, tested by Intel, and results as of March 2021.

AMD: EPYC 7763: 1-node, 2-socket AMD EPYC 7763 (64C/2.45GHz, 280W cTDP) on GIGABYTE R282-Z92 server with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0xa001114, SMT on, Boost on, Power deterministic mode, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Samsung_​MZ7LH3T8/INTEL SSDSC2KG019T8, tested by Intel, and results as of March 2021.

ResNet50-v1.5 Intel: gcc-9.3.0, oneDNN 1.6.4, BS=128, INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart, ResNet50-v1.5 AMD: gcc-9.3.0, oneDNN 1.6.4, BS=128, FP32, TensorFlow- 2.4.1, Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/image_recognition/tensorflow/resnet50v1_5

ResNet50-v1.5 Training Intel: gcc-9.3.0, oneDNN 1.6.4, BS=256, FP32, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart, ResNet50-v1.5 Training AMD: gcc-9.3.0, oneDNN 1.6.4, BS=256, FP32, TensorFlow- 2.4.1, Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/image_recognition/tensorflow/resnet50v1_5

SSD-ResNet34 Intel: gcc-9.3.0, oneDNN 1.6.4, BS=1, INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, AMD: SSD-ResNet34, gcc-9.3.0, oneDNN 1.6.4, BS=1, FP32, TensorFlow- 2.4, Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/object_detection/tensorflow/ssd-resnet34

BERT-Large SQuAD Intel: gcc-9.3.0, oneDNN 1.6.4, BS=1, INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, AMD: BERT-Large SQuAD, gcc-9.3.0, oneDNN 1.6.4, BS=1, FP32, TensorFlow- 2.4.1, Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/language_modeling/tensorflow/bert_large

Python: Python 3.7.9, SciKit-Learn: Sklearn 0.24.1, oneDAL: Daal4py 2021.2, XGBoost: XGBoost 1.3.3. Benchmarks: https://github.com/IntelPython/scikit-learn_bench

Geomean of 20 workloads including logistic regression inference, logistic regression fit, ridge regression inference, ridge regression fit, linear regression inference, linear regression fit, elastic net inference, XGBoost fit, XGBoost predict, MobileNet-v1 inference, ResNet50-v1.5 inference, ResNet50-v1.5 training, BERT Large SQuAD inference, kmeans inference, kmeans fit, brute_knn inference, SVC inference, SVC fit, dbscan fit, traintestsplit New: March 2021

Baseline: March 2021

[42] With Intel® 3rd Gen Xeon® Scalable processors and the latest Intel® Optane™ Persistent Memory you can get up to 63% higher throughput and 33% more memory capacity, enabling you to serve the same number of subscribers at higher resolution or a greater number of subscribers at the same resolution.

With Intel® 3rd Gen Xeon® Scalable processors, CoSPs can increase 5G UPF performance by 42%. Combined with Intel Ethernet 800 series adapters, they can deliver the performance, efficiency and trust for use cases that require low latency, including augmented reality, cloud-based gaming, discrete automation and even robotic-aided surgery.

With 3rd Gen Intel® Xeon® Scalable processors, Ethernet 800 series and vRAN dedicated accelerators, CoSPs can get up to 1.81x MIMO Midhaul Throughput in a similar power envelope for a best-in-class 3x100MHz 64T64R configuration.

3rd Generation Intel® Xeon® Platinum processor 1.63x CDN-Live Linear: New: 1 node, 2x Intel® Xeon® Gold 6338N Processor, 32 core HT ON Turbo ON, Total DRAM 256GB (16 slots/16GB/2666MT/s), Total Optane Persistent Memory 200 Series 2048GB (16 slots/128GB/2666MT/s), BIOS SE5C6200.86B.2021.D40.2103100308 (ucode: 0x261), 4x Intel® E810, Ubuntu 20.04, kernel 5.4.0-65-generic, gcc 9.3.0 compiler, openssl 1.1.1h, varnish-plus 6.0.7r2. 2 clients, Test by Intel as of 3/11/2021. Baseline: Gold 6252N: 2x Intel® Xeon® Gold 6252N Processor, 24 core HT ON Turbo ON, Total DRAM 192GB (12 slots/16GB/2666MT/s), Total Optane Persistent Memory 100 Series 1536GB(12 slots/128GB/2666MT/s), 1x Mellanox MCX516A-CCAT, BIOS: SE5C620.86B.02.01.0013.121520200651 (ucode: 0x5003003), Ubuntu 20.04, kernel 5.4.0-65-generic, wrk master 4/17/2019. Test by Intel as of 2/15/2021. Throughput measured with 100% Transport Layer Security (TLS) traffic with 93.3% target cache hit ratio and keep alive on, 512 total connections.
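The "33% more memory capacity" figure in claim [42] appears consistent with the slot counts listed for the two CDN-Live Linear systems. The following is a minimal sketch of that arithmetic, assuming capacity is simply slots multiplied by module size; the function and variable names are illustrative and do not come from the test report.

```python
def capacity_gb(slots: int, gb_per_module: int) -> int:
    """Total capacity of one memory tier, assuming every slot is populated."""
    return slots * gb_per_module

# Values taken from the configuration text above.
new_dram = capacity_gb(16, 16)      # Gold 6338N: 16 slots x 16 GB DRAM
new_pmem = capacity_gb(16, 128)     # Gold 6338N: 16 slots x 128 GB PMem 200 series
base_dram = capacity_gb(12, 16)     # Gold 6252N: 12 slots x 16 GB DRAM
base_pmem = capacity_gb(12, 128)    # Gold 6252N: 12 slots x 128 GB PMem 100 series

def percent_increase(new: float, baseline: float) -> float:
    """Relative increase of new over baseline, in percent."""
    return (new / baseline - 1.0) * 100.0

dram_gain = percent_increase(new_dram, base_dram)
pmem_gain = percent_increase(new_pmem, base_pmem)
total_gain = percent_increase(new_dram + new_pmem, base_dram + base_pmem)

print(f"DRAM +{dram_gain:.0f}%, PMem +{pmem_gain:.0f}%, total +{total_gain:.0f}%")
```

Under this assumption the DRAM tier, the persistent memory tier, and the combined total all grow by the same ratio, because both tiers go from 12 to 16 populated slots.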

1.42x 5G UPF: New: 1-node, 2x (1 socket used) 3rd Gen Intel Xeon Gold 6338N on Whitley Coyote Pass 2U with 128 GB (8 slots/ 16GB/ 2666) total DDR4 memory, ucode 0x261, HT on, Turbo off, Ubuntu 18.04.5 LTS, 4.15.0-134-generic, 1x Intel 810 (Columbiaville), FlexCore 5G UPF, Jan' 2021 MD5 checksum: c4ad7f8422298ceb69d01e67419ff1c1, GCC 7.5.0, 5G UPF 228 Gbps / 294 Gbps, test by Intel on 3/16/2021. Baseline: 1-node, 2x (1 socket used) Intel Xeon Gold 6252N on SuperMicro* X11DPG-QT with 96 GB (6 slots/ 16GB/ 2934) total DDR4 memory, ucode 0x5003003, HT on, Turbo off, Ubuntu 18.04.5 LTS, 4.15.0-132-generic, 1x Intel 810 (Columbiaville), FlexCore 5G UPF, Jan' 2021 MD5 checksum: c4ad7f8422298ceb69d01e67419ff1c1, GCC 7.5.0, 5G UPF 161 Gbps / 213 Gbps, test by Intel on 2/12/2021.
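The 1.42x 5G UPF figure can be reproduced from the throughput values quoted in the two configurations. This is a sketch only: the text does not say what the two numbers in each "X Gbps / Y Gbps" pair represent, so the code pairs them positionally as an assumption, and the names are illustrative.

```python
def speedup(new: float, baseline: float) -> float:
    """Ratio of new throughput to baseline throughput."""
    return new / baseline

# Throughput pairs in Gbps, copied from the configuration text above.
# Assumption: the first value of each pair corresponds to the same test
# condition on both systems, and likewise for the second value.
new_gbps = (228.0, 294.0)        # Xeon Gold 6338N
baseline_gbps = (161.0, 213.0)   # Xeon Gold 6252N

ratios = [round(speedup(n, b), 2) for n, b in zip(new_gbps, baseline_gbps)]
print(ratios)
```

With this pairing, the first ratio rounds to the 1.42x headline figure, which corresponds to the 42% gain stated in the claim.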

FlexRAN: New: 1 node, 1 socket Intel Xeon Gold 6338N Processor, 32 core HT ON Turbo ON, Total DRAM 128 GB (8 slots/16GB/2666), BIOS WLYDCRB.SYS.WR.64.2021.09.4.04.0636_0020.P86_P80260_LBG_SPS_8d055260_EARLYG (ucode 0x261), Intel Mount Bryce (ACC100), CentOS 7.8.2003, 3.10.0-1127.19.1.rt56.1116.el7.x86_64, FlexRAN L1 Massive MIMO, tested by Intel on 3/18/2021. Baseline: 1 node, 1 socket Intel Xeon Gold 6212U Processor, 24 core HT ON Turbo ON, Total DRAM 96 GB (6 slots/16GB/2993), BIOS SE5C620.86B.02.01.0012.070720200218, Intel Mount Bryce (ACC100), CentOS 7.8.2003, 3.10.0-1127.19.1.rt56.1116.el7.x86_64, FlexRAN L1 Massive MIMO, tested by Intel on 2/23/2021.

Demo: VRAN, 5G UPF, CDN-Live Linear New: March 11, 2021

Baseline: February 15, 2021

[41] Up to 53% better performance, 66% higher VM density, and 46% better performance per dollar on VMware ESXi/vSAN virtualization vs. AMD EPYC 3rd Generation Intel® Xeon® Platinum processor

Up to 53% better performance, 66% higher VM density, and 46% better performance per dollar on VMware ESXi/vSAN virtualization vs. AMD EPYC.

New: 4-node, Each node: Intel Software Development Platform 2TB PMem , 2x Intel® Xeon Platinum 8358 (32C, 2.6GHz, 250W TDP), HT On, Turbo ON, SNC OFF, Total Memory: 2 TB (16 slots/ 32GB/ 3200 MHz + 16 slots/128GB/PMem), ucode: x280, 2x 25GbE Intel Ethernet E810-XXVDA2, Gen4 x8, Per node cache tier: 2x 400GB Optane P5800x, Gen4, Per node capacity tier: 6x Intel P5510 3.84TB , Gen4, ESXi 7.0u2 17630552, vCenter 7.0u2 17694817, 4-node solution cost $293K (Per node: base system @ $13757, https://www.thinkmate.com/system/rax-xt10-21s3-aiom, 2TB of Optane: 16x32GB DDR4 + 16x128GB Optane PMem 200 $23728, Cache tier: 2x Optane P5800X drives $1162.79/each, https://www.cdw.com/product/intel-optane-ssd-dc-p5800x-series-solid-state-drive-400-gb-pci-expres/6457148?pfm=srh, Capacity tier: 6x P5510 drives $758.09/each, https://www.cdw.com/product/intel-solid-state-drive-d7-p5510-series-solid-state-drive-3.84-tb-u.2/6509262?pfm=srh, 3 year of VMware S/W: ESXi License https://www.cdw.com/product/vmware-vsan-enterprise-plus-v.-7-license-1-processor/6030177?pfm=srh, vSAN License https://www.cdw.com/product/vmware-support-and-subscription-production-technical-support-for-vmware/6030176?pfm=srh, ESXi Support, https://www.cdw.com/product/vmware-support-and-subscription-production-technical-support-for-vmware/6030336?pfm=srh, vSAN support, https://www.cdw.com/product/vmware-vsphere-enterprise-plus-v.-7-license-1-processor/6030328?pfm=srh, $28938.50). Tested by Evaluator Group as of July 2021 and prices as of July 2021.

AMD EPYC: 4-node, Each node: Dell PowerEdge R7525, 2x AMD 7543 (32C, 2.8GHz, 240W cTDP), SMT On, Boost ON, NPS=1, Total Memory: 1 TB (16 slots/ 64GB/ 3200 MHz), ucode: 0x1911000A00, 2x25GbE Mellanox Connectx-5, Gen4 x8, Per node cache tier: 2x 1.92TB Samsung PM1733 Gen4, Per node capacity tier: 6x Intel P5510 3.84TB , Gen4, ESXi 7.0u2 17630552, vCenter 7.0u2 17694817, 4-node solution cost $276K (Per node: base system @ $13414, https://www.thinkmate.com/system/rax-qn10-21e2, 1TB DDR memory: 16x64GB DDR4 $21200, Cache tier: 2x Samsung PM1733 $478.79/each, https://www.cdw.com/product/samsung-pm1733-mzwlj1t9hbjr-solid-state-drive-1.92-tb-pci-express-4.0/6247494?pfm=srh, Capacity tier: 6x P5510 drives $758.09/each, https://www.cdw.com/product/intel-solid-state-drive-d7-p5510-series-solid-state-drive-3.84-tb-u.2/6509262?pfm=srh, 3 year of VMware S/W: ESXi License, https://www.cdw.com/product/vmware-vsan-enterprise-plus-v.-7-license-1-processor/6030177?pfm=srh, vSAN License, https://www.cdw.com/product/vmware-support-and-subscription-production-technical-support-for-vmware/6030176?pfm=srh, ESXi Support, https://www.cdw.com/product/vmware-support-and-subscription-production-technical-support-for-vmware/6030336?pfm=srh, vSAN support, https://www.cdw.com/product/vmware-vsphere-enterprise-plus-v.-7-license-1-processor/6030328?pfm=srh, $28938.50). Tested by Evaluator Group as of July 2021 and prices as of July 2021.
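The 4-node solution costs quoted for the two clusters ($293K and $276K) can be rebuilt from the per-node line items. The sketch below assumes the final $28938.50 figure is the per-node total for three years of VMware software and support, which the text implies but does not state outright. The dictionary keys are illustrative labels.

```python
NODES = 4

# Per-node line items in USD, copied from the two configurations above.
intel_node = {
    "base_system": 13757.00,
    "memory_dram_plus_pmem": 23728.00,   # 16x32GB DDR4 + 16x128GB PMem 200
    "cache_tier": 2 * 1162.79,           # 2x Optane P5800X
    "capacity_tier": 6 * 758.09,         # 6x P5510
    "vmware_3yr": 28938.50,              # assumed to be a per-node figure
}
amd_node = {
    "base_system": 13414.00,
    "memory_dram": 21200.00,             # 16x64GB DDR4
    "cache_tier": 2 * 478.79,            # 2x Samsung PM1733
    "capacity_tier": 6 * 758.09,         # 6x P5510
    "vmware_3yr": 28938.50,              # assumed to be a per-node figure
}

def solution_cost(node_items: dict, nodes: int = NODES) -> float:
    """Sum the per-node line items and scale to the whole cluster."""
    return nodes * sum(node_items.values())

intel_total = solution_cost(intel_node)
amd_total = solution_cost(amd_node)
print(f"Intel: ${intel_total / 1000:.0f}K, AMD: ${amd_total / 1000:.0f}K")
```

Under that assumption both totals round to the figures given in the text. The sketch does not attempt to reproduce the performance-per-dollar claim, since the source does not state how that ratio was derived.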

Leading virtualization benchmark New: July 2021

Baseline: July 2021

[40] 3.20x higher OpenSSL RSA Sign 2048 performance 3rd Gen Intel® Xeon® Scalable vs. AMD Milan

2.03x higher OpenSSL ECDHE x25519 performance 3rd Gen Intel® Xeon® Scalable vs. AMD Milan

3rd Generation Intel® Xeon® Platinum processor 3.20x higher OpenSSL RSA Sign 2048 performance

2.03x higher OpenSSL ECDHE x25519 performance

3rd Gen Intel Xeon: 8380: 1-node, 2x Intel® Xeon® Platinum 8380 CPU on M50CYP2SB2U with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0xd000270, HT On, Turbo Off, Ubuntu 20.04.1 LTS, 5.4.0-65-generic, 1x INTEL_​SSDSC2KG01, OpenSSL 1.1.1j, GCC 9.3.0, QAT Engine v0.6.4, Tested by Intel and results as of March 2021.

AMD: 7763: 1-node, 2x AMD EPYC 7763 64-Core Processor on R282-Z92-00 with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0xa001114, HT On, Turbo Off, Ubuntu 20.04.1 LTS, 5.4.0-65-generic, 1x SAMSUNG_​MZ7LH3T8, OpenSSL 1.1.1j, GCC 9.3.0, Tested by Intel and results as of March 2021.

OpenSSL New: March 2021

Baseline: March 2021

[39] 1.18x higher LINPACK performance with 3rd Gen Intel® Xeon® Scalable vs. AMD Milan 3rd Generation Intel® Xeon® Platinum processor 1.18x higher performance on LINPACK

3rd Gen Intel Xeon: 8380: 1-node, 2x Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: The Intel Distribution for LINPACK Benchmark; Build notes: Tools: Intel MPI 2019u7; threads/core: 1; Turbo: used; Build: build script from Intel Distribution for LINPACK package; 1 rank per NUMA node: 1 rank per socket

AMD: 7763: 1-node, 2-socket AMD EPYC 7763 (64C/2.45GHz, 280W cTDP) on GIGABYTE R282-Z92 server with 512 GB (16 slots/ 32GB/3200) total DDR4 memory, ucode 0xa001114, SMT on, Boost on, Power deterministic mode, NPS=4, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Samsung_​MZ7LH3T8, App Version: AMD official HPL 2.3 MT version with BLIS 2.1; Build notes: Tools: hpc-x 2.7.0; threads/core: 1; Turbo: used; Build: pre-built binary (gcc built) from https://developer.amd.com/amd-aocl/blas-library/; 1 rank per L3 cache, 4 threads per rank Tested by Intel and results as of March 2021

LINPACK New: March 2021

Baseline: March 2021

[38] 1.32x higher RELION performance with 3rd Gen Intel® Xeon® Scalable vs. AMD Milan 3rd Generation Intel® Xeon® Platinum processor 1.32x higher performance on RELION Plasmodium Ribosome

3rd Gen Intel Xeon: 8380: 1-node, 2x 3rd Gen Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: 3_​1_​1; Build notes: Tools: Intel C Compiler 2020u4, Intel MPI 2019u9; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -g -debug inline-debug-info -xCOMMON-AVX512 -qopt-report=5 -restrict

AMD: 7763: 1-node, 2-socket AMD EPYC 7763 (64C/2.45GHz, 280W cTDP) on GIGABYTE R282-Z92 server with 512 GB (16 slots/ 32GB/3200) total DDR4 memory, ucode 0xa001114, SMT on, Boost on, Power deterministic mode, NPS=4, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Samsung_​MZ7LH3T8, App Version: 3_​1_​1; Build notes: Tools: Intel C Compiler 2020u4, Intel MPI 2019u9; threads/core: 2; Turbo: used; Build knobs: -O3 -ip -g -debug inline-debug-info -march=core-avx2 -qopt-report=5 -restrict Tested by Intel and results as of March 2021

RELION Plasmodium Ribosome New: March 2021

Baseline: March 2021

[37] 1.50x higher Monte Carlo FSI performance with 3rd Gen Intel® Xeon® Scalable vs. AMD Milan 3rd Generation Intel® Xeon® Platinum processor 1.50x higher performance on Monte Carlo FSI Kernel

3rd Gen Intel Xeon:8380: 1-node, 2x 3rd Gen Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: v1.1; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4; threads/core: 1; Turbo: used; Build knobs: -O3 -xCORE-AVX512 -qopt-zmm-usage=high -fimf-precision=low -fimf-domain-exclusion=31 -no-prec-div -no-prec-sqrt

AMD: 7763: 1-node, 2-socket AMD EPYC 7763 (64C/2.45GHz, 280W cTDP) on GIGABYTE R282-Z92 server with 512 GB (16 slots/ 32GB/3200) total DDR4 memory, ucode 0xa001114, SMT on, Boost on, Power deterministic mode, NPS=4, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Samsung_​MZ7LH3T8, App Version: v1.1; Build notes: Tools: Intel MKL 2020u4, Intel C Compiler 2020u4, Intel Threading Building Blocks 2020u4; threads/core: 2; Turbo: used; Build knobs: -O3 -march=core-avx2 -fimf-precision=low -fimf-domain-exclusion=31 -no-prec-div -no-prec-sqrt Tested by Intel and results as of March 2021

Monte Carlo FSI Kernel New: March 2021

Baseline: March 2021

[36] 1.27x higher NAMD performance on 3rd Gen Intel® Xeon® Scalable vs. AMD Milan 3rd Generation Intel® Xeon® Platinum processor 1.27x higher performance on NAMD STMV 1.27x higher performance on NAMD (geomean of Apoa1, STMV)

3rd Gen Intel Xeon: 8380: 1-node, 2x 3rd Gen Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 256 GB (16 slots/ 16GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Intel_​SSDSC2KG96, App Version: 2.15-Alpha1 (includes AVX tiles algorithm); Build notes: Tools: Intel MKL, Intel C Compiler 2020u4, Intel MPI 2019u8, Intel Threading Building Blocks 2020u4; threads/core: 2; Turbo: used; Build knobs: -ip -fp-model fast=2 -no-prec-div -qoverride-limits -qopenmp-simd -O3 -xCORE-AVX512 -qopt-zmm-usage=high

AMD: 7763: 1-node, 2x AMD EPYC 7763 (64C/2.45GHz, 280W cTDP) on GIGABYTE R282-Z92 server with 512 GB (16 slots/ 32GB/3200) total DDR4 memory, ucode 0xa001114, SMT on, Boost on, Power deterministic mode, NPS=4, CentOS Linux 8.3.2011, 4.18.0-240.1.1.el8_​3.crt1.x86_​64, 1x Samsung_​MZ7LH3T8, App Version: 2.15-Alpha1 (includes AVX tiles algorithm); Build notes: Tools: Intel MKL, AOCC 2.2.0, gcc 9.3.0, Intel MPI 2019u8; threads/core: 2; Turbo: used; Build knobs: -O3 -fomit-frame-pointer -march=znver1 -ffast-math Tested by Intel and results as of March 2021

NAMD New: March 2021

Baseline: March 2021

[35] 4.5x higher INT8 real-time inference throughput on SSD-ResNet34 with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC Milan 3rd Generation Intel® Xeon® Platinum processor 4.5x higher INT8 real-time inference throughput on SSD-ResNet34 with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC Milan

3rd Gen Intel Xeon:8380: 1-node, 2x 3rd Gen Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X55260, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_​SSDSC2KG96, Intel SSDPE2KX010T8, SSD-ResNet34, gcc-9.3.0, oneDNN 1.6.4, BS=1 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, tested by Intel, and results as of March 2021.

AMD: 7763: 1-node, 2-socket AMD EPYC 7763 (64C/2.45GHz, 280W cTDP) on GIGABYTE R282-Z92 server with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0xa001114, SMT on, Boost on, Power deterministic mode, NPS=1, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Samsung_​MZ7LH3T8, SSD-ResNet34, gcc-9.3.0, oneDNN 1.6.4, BS=1 FP32, TensorFlow- 2.4.1, Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/object_detection/tensorflow/ssd-resnet34, tested by Intel, and results as of March 2021.

SSD-ResNet34 New: March 2021

Baseline: March 2021

[34] 3.18x higher INT8 real-time inference throughput & 2.17x higher INT8 batch inference throughput on BERT Large SQuAD with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC Milan 3rd Generation Intel® Xeon® Platinum processor 3.18x higher INT8 real-time inference throughput & 2.17x higher INT8 batch inference throughput on BERT Large SQuAD with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC Milan

3rd Gen Intel Xeon: 8380: 1-node, 2x Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X55260, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_​SSDSC2KG96, Intel SSDPE2KX010T8, BERT Large SQuAD, gcc-9.3.0, oneDNN 1.6.4, BS=1,128, INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, tested by Intel, and results as of March 2021.

AMD: 7763: 1-node, 2-socket AMD EPYC 7763 (64C/2.45GHz, 280W cTDP) on GIGABYTE R282-Z92 server with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0xa001114, SMT on, Boost on, Power deterministic mode, NPS=1, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Samsung_​MZ7LH3T8, BERT Large SQuAD, gcc-9.3.0, oneDNN 1.6.4, BS=1,128, FP32, TensorFlow- 2.4.1, Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/language_modeling/tensorflow/bert_large, tested by Intel, and results as of March 2021.

BERT-Large SQuAD New: March 2021

Baseline: March 2021

[33] 4.01x higher INT8 real-time inference throughput & 25.05x higher INT8 batch inference throughput on MobileNet-v1 with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC Milan 3rd Generation Intel® Xeon® Platinum processor 4.01x higher INT8 real-time inference throughput & 25.05x higher INT8 batch inference throughput on MobileNet-v1 with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC Milan

3rd Gen Intel Xeon: 8380: 1-node, 2x Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X55260, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_​SSDSC2KG96, Intel SSDPE2KX010T8, MobileNet-v1, gcc-9.3.0, oneDNN 1.6.4, BS=1,56, INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, tested by Intel, and results as of March 2021.

AMD: 7763: 1-node, 2-socket AMD EPYC 7763 (64C/2.45GHz, 280W cTDP) on GIGABYTE R282-Z92 server with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0xa001114, SMT on, Boost on, Power deterministic mode, NPS=1, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Samsung_​MZ7LH3T8, MobileNet-v1, gcc-9.3.0, oneDNN 1.6.4, BS=1, 56, FP32, TensorFlow- 2.4.1, Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/image_recognition/tensorflow/mobilenet_v1, tested by Intel, and results as of March 2021.

MobileNet-v1 New: March 2021

Baseline: March 2021

[32] 2.79x higher INT8 real-time inference throughput & 12x higher INT8 batch inference throughput on SSD-MobileNet-v1 with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC Milan 3rd Generation Intel® Xeon® Platinum processor 2.79x higher INT8 real-time inference throughput & 12x higher INT8 batch inference throughput on SSD-MobileNet-v1 with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC Milan

3rd Gen Intel Xeon: 8380: 1-node, 2x 3rd Gen Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X55260, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_SSDSC2KG96, Intel SSDPE2KX010T8, SSD-MobileNet-v1, gcc-9.3.0, oneDNN 1.6.4, BS=1,448, INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, tested by Intel, and results as of March 2021.

AMD: 7763: 1-node, 2-socket AMD EPYC 7763 (64C/2.45GHz, 280W cTDP) on GIGABYTE R282-Z92 server with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0xa001114, SMT on, Boost on, Power deterministic mode, NPS=1, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Samsung_​MZ7LH3T8, SSD-MobileNet-v1, gcc-9.3.0, oneDNN 1.6.4, BS=1,448, FP32, TensorFlow- 2.4.1, Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/object_detection/tensorflow/ssd-mobilenet, tested by Intel, and results as of March 2021.

SSD-MobileNet-v1 New: March 2021

Baseline: March 2021

[31] 3.88x higher INT8 real-time inference throughput & 22.09x higher INT8 batch inference throughput on ResNet-50 with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC Milan 3rd Generation Intel® Xeon® Platinum processor 3.88x higher INT8 real-time inference throughput & 22.09x higher INT8 batch inference throughput on ResNet-50 with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost vs. FP32 AMD EPYC Milan.

3rd Gen Intel Xeon: 8380: 1-node, 2x 3rd Gen Intel Xeon Platinum 8380 (40C/2.3GHz, 270W TDP) processor on Intel Software Development Platform with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X55260, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_​SSDSC2KG96, Intel SSDPE2KX010T8, ResNet50-v1.5, gcc-9.3.0, oneDNN 1.6.4, BS=1,128, INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, tested by Intel, and results as of March 2021.

AMD: 7763: 1-node, 2-socket AMD EPYC 7763 (64C/2.45GHz, 280W cTDP) on GIGABYTE R282-Z92 server with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0xa001114, SMT on, Boost on, Power deterministic mode, NPS=1, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Samsung_MZ7LH3T8, ResNet50-v1.5, gcc-9.3.0, oneDNN 1.6.4, BS=1,128, FP32, TensorFlow- 2.4.1, Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/image_recognition/tensorflow/resnet50v1_5, tested by Intel, and results as of March 2021.

ResNet50 New: March 2021

Baseline: March 2021

[30] Intel® Xeon® processor outperforms Graviton2 by up to 1.48x in WordPress (Intel Xeon based AWS M6i instance with Crypto NI acceleration outperforms Graviton2 M6g instances for 16 vCPU, up to 1.43x for 4 vCPU, and up to 1.27x for 64 vCPU) 3rd Generation Intel® Xeon® Platinum processor 1.43x higher WordPress transactions per second for 4 vCPU, 1.48x for 16 vCPU, and 1.27x for 64 vCPU.

New: M6i.xlarge 4 vCPU, 16 GB memory capacity/instance, M6i.4xlarge 16 vCPU, 64 GB memory capacity/instance, M6i.16xlarge 64 vCPU, 256 GB memory capacity/instance (Xeon) with Crypto NI acceleration, WordPress v5.2, Storage/instance: Amazon Elastic Block Store 8GB, Compiler: GCC 9.3.0, Other SW: MariaDB v10.3.31, Nginx v1.18.0, PHP v7.3.30-1, Ubuntu 20.04.3 LTS, Kernel 5.11.0-1017-aws.

Baseline: M6g.xlarge 4 vCPU, 16 GB memory capacity/instance, m6g.4xlarge 16 vCPU, 64 GB memory capacity/instance, m6g.16xlarge 64 vCPU, 256 GB memory capacity/instance (Graviton2), WordPress v5.2, Storage/instance: Amazon Elastic Block Store 8GB, Compiler: GCC 9.3.0, Other SW: MariaDB v10.3.31, Nginx v1.18.0, PHP v7.3.30-1, Ubuntu 20.04.3 LTS, Kernel 5.11.0-1017-aws.

WordPress New: September 2021

Baseline: September-October 2021

[29] Intel® Xeon® processor outperforms Graviton2 by up to 1.19x in Server-Side Java (Intel Xeon based AWS M6i instance outperforms Graviton2 M6g instances for 4 vCPU, up to 1.13x for 16 vCPU, and up to 1.06x for 64 vCPU) 3rd Generation Intel® Xeon® Platinum processor 1.19x higher Server Side Java operations per second for 4 vCPU, 1.13x for 16 vCPU, and 1.06x for 64 vCPU.

New: M6i.xlarge 4 vCPU, 16 GB memory capacity/instance, M6i.4xlarge 16 vCPU, 64 GB memory capacity/instance, M6i.16xlarge 64 vCPU, 256 GB memory capacity/instance (Xeon), Ubuntu 20.04.3 LTS, Kernel 5.11.0-1016-aws.

Baseline: M6g.xlarge 4 vCPU, 16 GB memory capacity/instance, m6g.4xlarge 16 vCPU, 64 GB memory capacity/instance, m6g.16xlarge 64 vCPU, 256 GB memory capacity/instance (Graviton2), Ubuntu 20.04.3 LTS, Kernel 5.11.0-1016-aws.

Server Side Java New: September 2021

Baseline: September 2021

[28] Intel® Xeon® processor outperforms Graviton2 by up to 1.51x in Java Server latency bound throughput (Intel Xeon based AWS M6i instance outperforms Graviton2 M6g instances for 16 vCPU, 1.34x for 4 vCPU, and 1.51x for 64 vCPU) 3rd Generation Intel® Xeon® Platinum processor 1.34x higher Java Server latency bound throughput on 4 vCPU, 1.51x on 16 vCPU, and 1.27x on 64 vCPU.

New: M6i.xlarge 4 vCPU, 16 GB memory capacity/instance, M6i.4xlarge 16 vCPU, 64 GB memory capacity/instance, M6i.16xlarge 64 vCPU, 256 GB memory capacity/instance (Xeon), Ubuntu 20.04.3 LTS, Kernel 5.11.0-1016-aws.

Baseline: M6g.xlarge 4 vCPU, 16 GB memory capacity/instance, m6g.4xlarge 16 vCPU, 64 GB memory capacity/instance, m6g.16xlarge 64 vCPU, 256 GB memory capacity/instance (Graviton2), Ubuntu 20.04.3 LTS, Kernel 5.11.0-1016-aws.

Server Side Java New: October 2021

Baseline: October 2021

[27] Intel® Xeon® processor outperforms Graviton2 by up to 1.23x in Redis/Memtier workloads (Intel Xeon based AWS M6i instance outperforms Graviton2 M6g instances for 16 vCPU and up to 1.02x for 64 vCPU) 3rd Generation Intel® Xeon® Platinum processor 1.23x higher Redis throughput for 16 vCPU and 1.02x for 64 vCPU.

New: M6i.4xlarge 16 vCPU 64 GB memory capacity/instance, M6i.16xlarge 64 vCPU, 256 GB memory capacity/instance (Xeon), Redis 6.0.8, memtier 1.2.15, Network BW/Instance 12.5Gbps for M6i, Ubuntu 20.04 LTS, Kernel 5.11.0-1019-aws.

Baseline: M6g.4xlarge 16 vCPU 64 GB memory capacity/instance, M6g.16xlarge 64 vCPU, 256 GB memory capacity/instance (Graviton2), Redis 6.0.8, memtier 1.2.15, Network BW/Instance 10Gbps for M6g, Ubuntu 20.04 LTS, Kernel 5.11.0-1019-aws.

Redis New: September 2021

Baseline: September 2021

[25] 1.54x average performance gains with 3rd Gen Intel Xeon Platinum 8380 processor vs legacy Xeon Platinum 8180 server

2.65x average performance gains with 3rd Gen Intel Xeon Platinum 8380 processor vs legacy E5-v4 server

3.1x average performance gains with 3rd Gen Intel Xeon Platinum 8380 processor vs legacy E5-v3 server

3rd Generation Intel® Xeon® Platinum processor 1.54x average performance gain - Ice Lake vs Skylake: Geomean of 1.6x SPECrate2017_​int_​base (est), 1.62x SPECrate2017_​fp_​base (est), 1.52x Stream Triad, 1.44x Intel distribution of LINPACK.

2.65x average performance gain - Ice Lake vs Broadwell: Geomean of 2.34x SPECrate2017_​int_​base (est), 2.6x SPECrate2017_​fp_​base (est), 2.55x Stream Triad, 3.18x Intel distribution of LINPACK.

3.1x average performance gain - Ice Lake vs Haswell: Geomean of 2.85x SPECrate2017_​int_​base (est), 3.08x SPECrate2017_​fp_​base (est), 2.8x Stream Triad, 3.97x Intel distribution of LINPACK.
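The three "average performance gain" figures above are geometric means of the four per-benchmark ratios listed with each one. A minimal sketch of that calculation follows, using only the ratios quoted in the text; the dictionary layout and variable names are illustrative.

```python
from statistics import geometric_mean

# Per-benchmark gains of the Platinum 8380 system over each baseline, in the
# order: SPECrate2017_int_base (est), SPECrate2017_fp_base (est),
# Stream Triad, Intel distribution of LINPACK.
gains = {
    "Skylake (8180)": [1.60, 1.62, 1.52, 1.44],
    "Broadwell (E5-2699v4)": [2.34, 2.60, 2.55, 3.18],
    "Haswell (E5-2699v3)": [2.85, 3.08, 2.80, 3.97],
}

# The geometric mean is the n-th root of the product of n ratios.
averages = {name: geometric_mean(ratios) for name, ratios in gains.items()}

for name, value in averages.items():
    print(f"{name}: {value:.2f}x")
```

The geometric mean is the usual choice for averaging speedup ratios because a single large ratio shifts it less than it would shift an arithmetic mean. The results round to the 1.54x, 2.65x and 3.1x figures stated in the claim.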

New: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on (SPECcpu2017), off (others), Turbo on, Ubuntu 20.04, 5.4.0-66-generic, 1x S4610 SSD 960G, SPECcpu2017 (est) v1.1.0, Stream Triad, Linpack, ic19.1u2, MPI: Version 2019u9; MKL:2020.4.17, test by Intel on 3/15/2021.

Skylake Baseline: 1-node, 2x Intel Xeon Platinum 8180 processor on Wolf Pass with 192 GB (12 slots/ 16GB/ 2933[2666]) total DDR4 memory, ucode 0x2006a08, HT on (SPECcpu2017), off (others), Turbo on, Ubuntu 20.04, 5.4.0-62-generic, SPECcpu2017 (est) v1.1.0, Stream Triad, Intel distribution of LINPACK, ic19.1u2, MPI: Version 2019 Update 9 Build 20200923; MKL: psxe_​runtime_​2020.4.17, test by Intel on 1/27/21.

Broadwell Baseline: 1-node, 2x Intel Xeon processor E5-2699v4 on Wildcat Pass with 256 GB (8 slots/ 32GB/ 2400) total DDR4 memory, ucode 0x038, HT on (SPECcpu2017), off (others), Turbo on, Ubuntu 20.04, 5.4.0-62-generic, 1x S3700 400GB SSD, SPECcpu2017 (est) v1.1.0, Stream Triad, Intel distribution of LINPACK, ic19.1u2, MPI: Version 2019 Update 9 Build 20200923; MKL: psxe_​runtime_​2020.4.17, test by Intel on 1/17/21.

Haswell Baseline: 1-node, 2x Intel Xeon processor E5-2699v3 on Wildcat Pass with 256 GB (8 slots/ 32GB/ 2666[2133]) total DDR4 memory, ucode 0x44, HT on (SPECcpu2017), off (others), Turbo on, Ubuntu 20.04, 5.4.0-62-generic, 1x S3700 400GB SSD, SPECcpu2017 (est) v1.1.0, Stream Triad, Intel distribution of LINPACK, ic19.1u2, MPI: Version 2019 Update 9 Build 20200923; MKL: psxe_​runtime_​2020.4.17, test by Intel on 2/3/21.

Geomean of

Integer throughput/Floating Point throughput/STREAM/LINPACK

New: March 15, 2021

Baseline: Jan 17, 2021

[24] 3rd Gen Intel Xeon Platinum 8380 processor delivers 4.5x performance on cloud data microservices usage vs. legacy Intel Xeon E5-v4 platform enabling faster business decisions. 3rd Generation Intel® Xeon® Platinum processor 4.5x higher responses with CloudXPRT Web Microservices vs. legacy Intel Xeon E5-v4 server: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Ubuntu 20.04, 5.4.0-65-generic, 1x S4610 SSD 960G, CloudXPRT v1.0, Web Microservices (Requests per minute @ p.95 latency <= 3s), test by Intel on 3/12/2021. Baseline: Intel Xeon E5-v4: 1-node, 2x Intel Xeon processor E5-2699v4 on Wildcat Pass with 256 GB (8 slots/ 32GB/ 2400) total DDR4 memory, ucode 0x038, HT on, Turbo on, Ubuntu 20.04, 5.4.0-65-generic, 1x S3700 400GB SSD, CloudXPRT v1.0, test by Intel on 1/17/21. Intel contributes to the development of benchmarks by participating in, sponsoring, and/or contributing technical support to various benchmarking groups, including the BenchmarkXPRT Development Community administered by Principled Technologies. CloudXPRT Web Microservices New: March 12, 2021

Baseline: Jan 17, 2021

[23] 3rd Gen Intel Xeon Platinum 8380 processor delivers 2.3x performance on cloud data analytics usage vs. legacy Intel Xeon E5-v4 platform enabling faster business decisions 3rd Generation Intel® Xeon® Platinum processor 2.3x higher responses with CloudXPRT - Data Analytics vs. legacy Intel Xeon E5-v4 server: New: Platinum 8380:1-node, 2x 3rd Gen Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Ubuntu 20.04, 5.4.0-65-generic​, 1x S4610 SSD 960G, CloudXPRT v1.0, Data Analytics (Analytics per minute @ p.95 <= 90s), test by Intel on 3/12/2021. Baseline: Intel Xeon E5-v4: 1-node, 2x Intel Xeon processor E5-2699v4 on Wildcat Pass with 256 GB (8 slots/ 32GB/ 2400) total DDR4 memory, ucode 0x038, HT on, Turbo on, Ubuntu 20.04, 5.4.0-65-generic​, 1x S3700 400GB SSD, CloudXPRT v1.0, test by Intel on 1/17/21. Intel contributes to the development of benchmarks by participating in, sponsoring, and/or contributing technical support to various benchmarking groups, including the BenchmarkXPRT Development Community administered by Principled Technologies. CloudXPRT Data Analytics New: March 12, 2021

Baseline: Jan 17, 2021

[22] Up to 2.95x virtualization performance with 3rd Gen Intel® Xeon® Scalable processor with Intel® SSD D5-P5510 Series and Intel® Ethernet Network Adapter E810 vs. legacy Intel Xeon E5 v4 platform 3rd Generation Intel® Xeon® Platinum processor 2.95x higher virtualization performance vs. legacy Intel Xeon E5-v4 server: New: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 2048 GB (32 slots/ 64GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, RedHat 8.3, 4.18.0-240.el8.x86_64, 1x S4610 SSD 960G, 4x P5510 3.84TB NVME, 2x Intel E810, Virtualization workload, Qemu-kvm 4.2.0-34 (inbox), WebSphere 8.5.5, DB2 v9.7, Nginx 1.14.1, test by Intel on 3/14/2021. Baseline: Intel Xeon E5-v4: 1-node, 2x Intel Xeon processor E5-2699v4 on Wildcat Pass with 768 GB (24 slots/ 32GB/ 2666[1600]) total DDR4 memory, ucode 0xb000038, HT on, Turbo on, RedHat 8.3, 4.18.0-240.el8.x86_64, 1x S3700 400GB SSD, 2x P3700 2TB NVME, 2x Intel XL710, Virtualization workload, Qemu-kvm 4.2.0-34 (inbox), WebSphere 8.5.5, DB2 v9.7, Nginx 1.14.1, test by Intel on 1/14/2021.

Virtualization workload

New: March 14, 2021

Baseline: Jan 14, 2021

[21] Process up to 2.4x transactions with the new 3rd Gen Intel Xeon Platinum 8380 platform vs. legacy Intel Xeon E5-v4 platform 3rd Generation Intel® Xeon® Platinum processor 2.4x higher Transactions on OLTP Database vs. legacy Intel Xeon E5-v4 server: New: Platinum 8380: 1-node, 2x 3rd Gen Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode 0x261, HT on, Turbo on, Redhat 8.3, 4.18.0-240.el8.x86_64 x86_64, 1x Intel SSD 960GB OS Drive, 4x Intel P5800 1.6T (2xDATA, 2xREDO), x Onboard 1G/s, HammerDB 4.0, Oracle 19c, test by Intel on 3/16/2021. Baseline: Intel Xeon E5-2699v4: 1-node, 2x Intel Xeon processor E5-2699v4 on Wildcat Pass with 384 GB (24 slots/ 16GB/ 2133[1600]) total DDR4 memory, ucode 0x038, HT on, Turbo on, Redhat 8.3, 4.18.0-240.el8.x86_64 x86_64, 1x Intel 200GB SSD OS Drive, 4x 2.0T P3700 (2xDATA, 2xREDO), x Onboard 1G/s, HammerDB 4.0, Oracle 19c, test by Intel on 1/27/2021. HammerDB OLTP w/Oracle New: March 16, 2021

Baseline: Jan 27, 2021

[14] Up to 39% More Bandwidth with Intel® Optane™ PMem 200 series 512GB module vs. Intel® Optane™ PMem 100 series 512GB module 3rd Generation Intel® Xeon® Platinum processor & Intel® Optane™ persistent memory 200 series New: 1-node, 1x pre-production CPX6 Processor @ 2.9GHz on Intel - Cedar Island Customer Reference Board (CRB) with DRAM: (per socket) 6 slots / 32GB / 2666 MT/s, PMem: (per socket) 1x 512GB Intel® Optane™ PMem 200 series module at 15W (192GB DRAM, 512GB PMem) total memory, ucode WW12'20 (pre-production), running Fedora 30 kernel 5.1.18-200.fc29.x86_64, using MLC v3.8 with App-Direct. Source: 2020ww18_CPX_BPS_BG, test by Intel on 31 Mar 2020. Baseline: 1-node, 1x Intel® Xeon® Platinum 8280L processor @ 2.7GHz on Intel - Purley Customer Reference Board (CRB) with DRAM: (per socket) 6 slots / 32GB / 2666 MT/s, PMem: (per socket) 1x 512GB Intel® Optane™ PMem 100 series module at 15W (192GB DRAM, 512GB PMem) total memory, ucode 0x04002F00, running Fedora 29 kernel 5.1.18-200.fc29.x86_64, using MLC v3.8 with App-Direct workload. Source: 2020ww22_CPX_BPS_BG, test by Intel on 27 Apr 2020. MLC ver3.8 with App Direct (from Intel Optane PMem 200 series demo) New: March 31, 2020

Baseline: April 27, 2020

[13] Multi-generation ResNet-50 Training Throughput Performance Improvement with Intel DL Boost supporting INT8 and BF16 3rd Generation Intel® Xeon® Platinum processor

New- 3rd Gen Intel Xeon Scalable Processor: 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 384 GB (24 slots / 16GB / 3200) total memory, ucode 0x700001b, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-29-generic, Intel SSD 800GB OS Drive, ResNet-50 v1.5 Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#6ef2116e6a09, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Imagenet dataset, oneDNN 1.4, FP32, BF16, global BS=1024, 4 instances, 28-cores/instance, test by Intel on 06/01/2020.

2nd Gen Intel Xeon Scalable Processor: 1-node, 4x Intel® Xeon® Platinum 8280 processor (28C, 205W) on Intel Reference Platform (Lightning Ridge) with 768 GB (24 slots / 32 GB / 2933 ) total memory, ucode 0x4002f00, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-29-generic, Intel SSD 800GB OS Drive, ResNet-50 v1.5 Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit# 6ef2116e6a09, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Imagenet dataset, oneDNN 1.4, FP32, global BS=1024, 4 instances, 28-cores/instance, test by Intel on 06/01/2020.

Intel Xeon Scalable Processor: 1-node, 4x Intel® Xeon® Platinum 8180 processor (28C, 205W) on Intel Reference Platform (Lightning Ridge) with 768 GB (24 slots / 32 GB / 2666 ) total memory, ucode 0x2000069, HT on, Turbo on, with Ubuntu 20.04 LTS, 5.4.0-26-generic, Intel SSD 800GB OS Drive, Training: ResNet-50-v1.5, Inference: ResNet-50-v1.5 Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#6ef2116e6a09, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Imagenet dataset, oneDNN 1.4, FP32, global BS=1024, 4 instances, 28-cores/instance, test by Intel on 6/02/2020.

Baseline- Intel Xeon processor E7 v4: 1-node, 4x Intel® Xeon® processor E7-8890 v4 (24C, 165W) on Intel Reference Platform (Brickland) with 512 GB (32 slots /16GB/ 1600) total memory, ucode 0xb000038, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-29-generic, Intel SSD 800GB OS Drive, Training: ResNet-50-v1.5, Inference: ResNet-50-v1.5 Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#6ef2116e6a09, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Imagenet dataset, oneDNN 1.4, FP32, global BS=1024, 4 instances, 24-cores/instance, test by Intel on 6/08/2020.

Training: ResNet-50 v1.5 Throughput New: June 01, 2020

2nd Gen: June 01, 2020

1st Gen: June 02, 2020

Baseline: June 08, 2020

[12] Multi-generation ResNet-50 Inference Throughput Performance Improvement with Intel DL Boost supporting INT8 and BF16 3rd Generation Intel® Xeon® Platinum processor

New 3rd Gen Intel Xeon Scalable Processor (Cooper Lake): 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 384 GB (24 slots / 16GB / 3200) total memory, ucode 0x700001b, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-29-generic, Intel SSD 800GB OS Drive, Inference: ResNet-50 v1.5 Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#6ef2116e6a09, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Imagenet dataset, oneDNN 1.4, FP32, INT8-VNNI, BF16, BS=128, 4 instances, 28-cores/instance, test by Intel on 06/01/2020.

2nd Gen Intel Xeon Scalable Processor (Cascade Lake): 1-node, 4x Intel® Xeon® Platinum 8280 processor (28C, 205W) on Intel Reference Platform (Lightning Ridge) with 768 GB (24 slots / 32 GB / 2933 ) total memory, ucode 0x4002f00, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-29-generic, Intel SSD 800GB OS Drive, Inference: ResNet-50 v1.5 Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit# 6ef2116e6a09, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Imagenet dataset, oneDNN 1.4, FP32, INT8-VNNI, BS=128, 4 instances, 28-cores/instance, test by Intel on 06/01/2020.

Intel Xeon Scalable Processor (Skylake): 1-node, 4x Intel® Xeon® Platinum 8180 processor (28C, 205W) on Intel Reference Platform (Lightning Ridge) with 768 GB (24 slots / 32 GB / 2666 ) total memory, ucode 0x2000069, HT on, Turbo on, with Ubuntu 20.04 LTS, 5.4.0-26-generic, Intel SSD 800GB OS Drive, Inference: ResNet-50-v1.5 Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#6ef2116e6a09, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Imagenet dataset, oneDNN 1.4, FP32, INT8, BS=128, 4 instances, 28-cores/instance, test by Intel on 6/02/2020.

Baseline: Intel Xeon processor E7 v4 (Broadwell): 1-node, 4x Intel® Xeon® processor E7-8890 v4 (24C, 165W) on Intel Reference Platform (Brickland) with 512 GB (32 slots /16GB/ 1600) total memory, ucode 0xb000038, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-29-generic, Intel SSD 800GB OS Drive, Inference: ResNet-50-v1.5 Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#6ef2116e6a09, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Imagenet dataset, oneDNN 1.4, FP32, BS=128, 4 instances, 24-cores/instance, test by Intel on 6/08/2020.

Inference: ResNet-50 v1.5 Throughput New: June 01, 2020

2nd Gen: June 01, 2020

1st Gen: June 02, 2020

Baseline: June 08, 2020

[11] 1.9x average performance gain on popular workloads with the new 3rd Gen Intel® Xeon® Platinum 8380H processor vs. 5-year old platform 3rd Generation Intel® Xeon® Platinum processor

Average performance based on Geomean of est SPECrate®2017_​int_​base 1-copy, est SPECrate®2017_​fp_​base 1-copy, est SPECrate®2017_​int_​base, est SPECrate®2017_​fp_​base, STREAM Triad, Intel distribution of LINPACK, Virtualization and OLTP Database workloads. Results have been estimated or simulated.

New: SPECcpu_2017, STREAM, LINPACK Performance: 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 768 GB (24 slots / 32 GB / 3200) total memory, microcode 0x87000016, HT on for SPECcpu (off for STREAM, LINPACK), Turbo on, with Ubuntu 19.10, 5.3.0-48-generic, 1x Intel 240GB SSD OS Drive, est SPECcpu_2017, STREAM Triad, Intel distribution of LINPACK, test by Intel on 5/15/2020. HammerDB OLTP Database Performance: New: 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 768 GB (24 slots / 32 GB / 3200) total memory, microcode 0x700001b, HT on, Turbo on, with Redhat 8.1, 4.18.0-147.3.1.el8_1.x86_64, 1x Intel 240GB SSD OS Drive, 2x6.4T P4610 for DATA, 2x3.2T P4610 for REDO, 1Gbps NIC, HammerDB 3.2, Popular Commercial Database, test by Intel on 5/13/2020. Virtualization Performance: New: 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 1536 GB (48 slots / 32 GB / 3200 (@2933)) total memory, microcode 0x700001b, HT on, Turbo on, with RHEL-8.1 GA, 4.18.0-147.3.1.el8_1.x86_64, 1x Intel 240GB SSD OS Drive, 4x P4610 3.2TB PCIe NVMe, 4 x 40 GbE x710 dual port, Virtualization workload, test by Intel on 5/20/2020.

Baseline: SPECcpu_2017, STREAM, LINPACK Performance: 1-node, 4x Intel® Xeon® processor E7-8890 v3 on Intel Reference Platform (Brickland) with 512 GB (32 slots / 16 GB / 2133 (@1600)) total memory, microcode 0x16, HT on for SPECcpu (off for STREAM, LINPACK), Turbo on, with Ubuntu 20.04 LTS, 5.4.0-29-generic, 1x Intel 480GB SSD OS Drive, est SPECcpu_2017, STREAM Triad, Intel distribution of LINPACK, test by Intel on 5/15/2020. HammerDB OLTP Database Performance: 1-node, 4x Intel® Xeon® processor E7-8890 v3 on Intel Reference Platform (Brickland) with 1024 GB (64 slots / 16GB / 1600) total memory, microcode 0x16, HT on, Turbo on, with Redhat 8.1, 4.18.0-147.3.1.el8_1.x86_64, 1x Intel 800GB SSD OS Drive, 1x1.6T P3700 for DATA, 1x1.6T P3700 for REDO, 1Gbps NIC, HammerDB 3.2, Popular Commercial Database, test by Intel on 4/20/2020. Virtualization Performance: 1-node, 4x Intel® Xeon® processor E7-8890 v3 on Intel Reference Platform (Brickland) with 1024 GB (64 slots / 16GB / 1600) total memory, microcode 0x0000016, HT on, Turbo on, with RHEL-8.1 GA, 4.18.0-147.3.1.el8_1.x86_64, 1x Intel 240GB SSD OS Drive, 4x P3700 2TB PCIe NVMe, 4 x 40 GbE x710 dual port, Virtualization workload, test by Intel on 5/20/2020.

Geomean of est SPECrate®2017_​int_​base(1-copy), est SPECrate®2017_​fp_​base(1-copy), est SPECrate®2017_​int_​base, est SPECrate®2017_​fp_​base, STREAM Triad, Intel distribution of LINPACK, Virtualization and OLTP Database workloads New: May 20, 2020

Baseline: May 20, 2020
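
Claim [11] averages per-workload gains with a geometric mean, which is the usual way to average ratios because it is insensitive to which system is taken as the reference. The following is a minimal sketch of that calculation. The function name `geomean` and the input values are illustrative assumptions; the per-workload ratios behind the 1.9x figure are not published in this table.

```python
import math


def geomean(ratios):
    """Geometric mean of positive speedup ratios (new score / baseline score)."""
    ratios = list(ratios)
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    # Averaging logarithms avoids overflow when many ratios are multiplied.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Placeholder ratios chosen only to make the arithmetic easy to follow.
# They are NOT Intel's measured or estimated per-workload results.
placeholder_ratios = [1.0, 4.0]
print(f"geomean = {geomean(placeholder_ratios):.2f}")
```

With the placeholder inputs the arithmetic mean would be 2.5, while the geometric mean is about 2.0, which illustrates why a single large gain moves a geomean less than it moves a simple average.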

[10] Process up to 1.98x more OLTP database transactions per minute with the new 3rd Gen Intel® Xeon® Scalable platform vs. 5-year old 4-socket platform 3rd Generation Intel® Xeon® Platinum processor New: 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 768 GB (24 slots / 32 GB / 3200) total memory, microcode 0x700001b, HT on, Turbo on, with Redhat 8.1, 4.18.0-147.3.1.el8_​1.x86_​64, 1x Intel 240GB SSD OS Drive, 2x6.4T P4610 for DATA, 2x3.2T P4610 for REDO, 1Gbps NIC, HammerDB 3.2, Popular Commercial Database, test by Intel on 5/13/2020.

Baseline: 1-node, 4x Intel® Xeon® processor E7-8890 v3 on Intel Reference Platform (Brickland) with 1024 GB (64 slots / 16GB / 1600) total memory, microcode 0x16, HT on, Turbo on, with Redhat 8.1, 4.18.0-147.3.1.el8_​1.x86_​64, 1x Intel 800GB SSD OS Drive, 1x1.6T P3700 for DATA, 1x1.6T P3700 for REDO, 1Gbps NIC, HammerDB 3.2, Popular Commercial Database, test by Intel on 4/20/2020.

HammerDB OLTP Database New: May 13, 2020

Baseline: April 20, 2020

[9] Up to 1.93x higher AI training performance with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost with BF16 vs. prior generation on ResNet50 throughput for image classification 3rd Generation Intel® Xeon® Platinum processor New: 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 384 GB (24 slots / 16GB / 3200) total memory, ucode 0x700001b, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, Intel 800GB SSD OS Drive, ResNet-50 v1.5 Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Imagenet dataset, oneDNN 1.4, BF16, BS=512, test by Intel on 5/18/2020.

Baseline: 1-node, 4x Intel® Xeon® Platinum 8280 processor on Intel Reference Platform (Lightning Ridge) with 768 GB (24 slots / 32 GB / 2933 ) total memory, ucode 0x4002f00, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, Intel 800GB SSD OS Drive, ResNet-50 v1.5 Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Imagenet dataset, oneDNN 1.4, FP32, BS=512, test by Intel on 5/18/2020.

ResNet-50 v1.5 Image Classification Training Throughput New: May 18, 2020

Baseline: May 18, 2020

[8] Up to 1.9x higher AI inference performance with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost with BF16 vs. prior generation with FP32 on BERT throughput for natural language processing 3rd Generation Intel® Xeon® Platinum processor New: 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 384 GB (24 slots / 16GB / 3200) total memory, ucode 0x700001b, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, Intel 800GB SSD OS Drive, BERT-Large (QA) Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Squad 1.1 dataset, oneDNN 1.4, BF16, BS=32, 4 instances, 28-cores/instance, test by Intel on 5/18/2020. Baseline: 1-node, 4x Intel® Xeon® Platinum 8280 processor on Intel Reference Platform (Lightning Ridge) with 768 GB (24 slots / 32 GB / 2933 ) total memory, ucode 0x4002f00, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, Intel 800GB SSD OS Drive, BERT-Large (QA) Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Squad 1.1 dataset, oneDNN 1.4, FP32, BS=32, 4 instances, 28-cores/instance, test by Intel on 5/18/2020. BERT-Large (QA) Squad Inference Throughput New: May 18, 2020

Baseline: May 18, 2020

[7] 225X faster access to data with Intel® Optane™ persistent memory 200 series vs. NVMe SSD Intel® Optane™ persistent memory 200 series New: Intel® Optane™ persistent memory idle read latency compared to Baseline: Intel® SSD DC P4610 Series TLC NAND solid state drive idle read latency. Memory idle read latency N/A
[6] Average of 25% higher memory bandwidth vs. prior gen 3rd Generation Intel® Xeon® Platinum processor & Intel® Optane™ persistent memory 200 series New: 1-node, 1x Intel® Xeon® pre-production CPX6 28C @ 2.9GHz processor on Cooper City with Single PMem module config (6x32GB DRAM; 1x{128GB,256GB,512GB} Intel® Optane™ PMem 200 series module at 15W), ucode pre-production running Fedora 29 kernel 5.1.18-200.fc29.x86_​64, and MLC ver 3.8 with App-Direct. Source: 2020ww18_​CPX_​BPS_​BG. Tested by Intel, on 31 Mar 2020.

Baseline: 1-node, 1x Intel® Xeon® 8280L 28C @ 2.7GHz processor on Neon City with Single PMem module config (6x32GB DRAM; 1x{128GB,256GB,512GB} Intel® Optane™ PMem 100 series module at 15W), ucode Rev: 04002F00 running Fedora 29 kernel 5.1.18-200.fc29.x86_64, and MLC ver 3.8 with App-Direct. Source: 2020ww18_CPX_BPS_DI. Tested by Intel, on 27 Apr 2020.

MLC ver 3.8 with App-Direct New: March 31, 2020

Baseline: April 27, 2020

[5] Up to 1.92x higher performance on cloud data analytics usage models with the new 3rd Gen Intel® Xeon® Scalable processor vs. 5-year old 4-socket platform 3rd Generation Intel® Xeon® Platinum processor

New: 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 1536 GB (48 slots / 32 GB / 3200 (@2933)) total memory, microcode 0x700001b, HT on, Turbo on, with Ubuntu 18.04.4 LTS, 5.3.0-53-generic, 1x Intel 240GB SSD OS Drive, 4x P4610 3.2TB PCIe NVMe, 4 x 40 GbE x710 dual port, CloudXPRT vCP - Data Analytics, Kubernetes, Docker, Kafka, MinIO, Prometheus, XGBoost workload, Higgs dataset, test by Intel on 5/27/2020.

Baseline: 1-node, 4x Intel® Xeon® processor E7-8890 v3 on Intel Reference Platform (Brickland) with 1024 GB (64 slots / 16GB / 1600) total memory, microcode 0x0000016, HT on, Turbo on, with Ubuntu 18.04.4 LTS, 5.3.0-53-generic, 1x Intel 400GB SSD OS Drive, 4x P3700 2TB PCIe NVMe, 4 x 40 GbE x710 dual port, CloudXPRT vCP - Data Analytics, Kubernetes, Docker, Kafka, MinIO, Prometheus, XGBoost workload, Higgs dataset, test by Intel on 5/27/2020.

Intel contributes to the development of benchmarks by participating in, sponsoring, and/or contributing technical support to various benchmarking groups, including the BenchmarkXPRT Development Community administered by Principled Technologies.

CloudXPRT vCP- Data Analytics, Kubernetes, Docker, Kafka, MinIO, Prometheus, XGBoost workload, Higgs dataset New: May 27, 2020

Baseline: May 27, 2020

[4] Up to 2.2x more Virtual Machines with the new 3rd Gen Intel® Xeon® Scalable platform and Intel® SSD Data Center Family vs. 5-year old 4-socket platform 3rd Generation Intel® Xeon® Platinum processor New: 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 1536 GB (48 slots / 32 GB / 3200 (@2933)) total memory, microcode 0x700001b, HT on, Turbo on, with RHEL-8.1 GA, 4.18.0-147.3.1.el8_​1.x86_​64, 1x Intel 240GB SSD OS Drive, 4x P4610 3.2TB PCIe NVMe, 4 x 40 GbE x710 dual port, Virtualization workload, test by Intel on 5/20/2020.

Baseline: 1-node, 4x Intel® Xeon® processor E7-8890 v3 on Intel Reference Platform (Brickland) with 1024 GB (64 slots / 16GB / 1600) total memory, microcode 0x0000016, HT on, Turbo on, with RHEL-8.1 GA, 4.18.0-147.3.1.el8_1.x86_64, 1x Intel 240GB SSD OS Drive, 4x P3700 2TB PCIe NVMe, 4 x 40 GbE x710 dual port, Virtualization workload, test by Intel on 5/20/2020.

Virtualization workload New: May 20, 2020

Baseline: May 20, 2020

[3] The new Intel® 3D NAND SSDs deliver an improved balance of performance and capacity for your storage requirements-including up to 33% better performance and 40% lower latency. Intel® SSD D7-P5500 series

33% better performance when using Intel® SSD D7-P5500 series: Source - Intel. Comparing datasheet figures for 4KB Random Read QD256 performance between the Intel® SSD D7-P5500 Series 7.68TB and Intel® SSD DC P4510 Series 8TB with both drives running on PCIe 3.1. Measured performance was 854K IOPS and 641.8K IOPS for the D7-P5500 and DC P4510, respectively. Performance for both drives measured using FIO Linux CentOS 7.2 kernel 4.8.6 with 4KB (4,096 bytes) of transfer size with Queue Depth 64 (4 workers). Measurements are performed on a full Logical Block Address (LBA) span of the drive once the workload has reached steady state but including all background activities required for normal operation and data reliability. Power mode set at PM0. Any differences in your system hardware, software or configuration may affect your actual performance. Intel expects to see a certain level of variation in data measurement across multiple drives.

40% lower latency when using Intel® SSD D7-P5500 series: Source - Intel. Comparing datasheet figures for 4KB Random Write QD1 latency between the Intel® SSD D7-P5500 Series 7.68TB and Intel® SSD DC P4510 Series 8TB with both drives running on PCIe 3.1. Measured latency was 15μs and 25μs for the D7-P5500 and DC P4510, respectively. Performance for both drives measured using FIO Linux CentOS 7.2 kernel 4.8.6 with 4KB (4,096 bytes) of transfer size with Queue Depth 1 (1 worker). Measurements are performed on a full Logical Block Address (LBA) span of the drive once the workload has reached steady state but including all background activities required for normal operation and data reliability. Power mode set at PM0. Any differences in your system hardware, software or configuration may affect your actual performance. Intel expects to see a certain level of variation in data measurement across multiple drives.

Read/Write IOPS, Latency N/A
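
The 33% and 40% figures in claim [3] follow directly from the datasheet numbers quoted above (854K vs. 641.8K IOPS, and 15μs vs. 25μs). The following is a minimal sketch of that arithmetic; the helper names `percent_higher` and `percent_lower` are assumptions for illustration only.

```python
def percent_higher(new: float, baseline: float) -> float:
    """Percentage by which `new` exceeds `baseline` (higher is better)."""
    return (new / baseline - 1.0) * 100.0


def percent_lower(new: float, baseline: float) -> float:
    """Percentage by which `new` falls below `baseline` (lower is better)."""
    return (1.0 - new / baseline) * 100.0


# 4KB random read, thousands of IOPS: D7-P5500 vs. DC P4510 (from the text above).
iops_gain = percent_higher(854.0, 641.8)
# 4KB random write QD1 latency in microseconds: D7-P5500 vs. DC P4510.
latency_drop = percent_lower(15.0, 25.0)

print(f"IOPS gain: about {iops_gain:.0f}%")
print(f"Latency reduction: about {latency_drop:.0f}%")
```

Note that the two metrics use different directions: the IOPS gain is relative growth over the baseline, while the latency figure is relative reduction from the baseline.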
[2] Up to 1.87x higher AI Inference performance with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost with BF16 vs. prior generation using FP32 on ResNet50 throughput for image classification 3rd Generation Intel® Xeon® Platinum processor New: 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 384 GB (24 slots / 16GB / 3200) total memory, ucode 0x700001b, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, Intel 800GB SSD OS Drive, ResNet-50 v1.5 Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Imagenet dataset, oneDNN 1.4, BF16, BS=56, 4 instances, 28-cores/instance, test by Intel on 5/18/2020.

Baseline: 1-node, 4x Intel® Xeon® Platinum 8280 processor on Intel Reference Platform (Lightning Ridge) with 768 GB (24 slots / 32 GB / 2933 ) total memory, ucode 0x4002f00, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, Intel 800GB SSD OS Drive, ResNet-50 v1.5 Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Imagenet dataset, oneDNN 1.4, FP32, BS=56, 4 instances, 28-cores/instance, test by Intel on 5/18/2020.

ResNet-50 Inference Throughput New: May 18, 2020

Baseline: May 18, 2020

[1] Up to 1.7x more AI training performance with 3rd Gen Intel® Xeon® Scalable processor supporting Intel® DL Boost with BF16 vs. prior generation on BERT throughput for natural language processing 3rd Generation Intel® Xeon® Platinum processor New: 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 384 GB (24 slots / 16GB / 3200) total memory, ucode 0x700001b, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, Intel 800GB SSD OS Drive, BERT-Large (QA) Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Squad 1.1 dataset, oneDNN 1.4, BF16, BS=12, test by Intel on 5/18/2020.

Baseline: 1-node, 4x Intel® Xeon® Platinum 8280 processor on Intel Reference Platform (Lightning Ridge) with 768 GB (24 slots / 32 GB / 2933 ) total memory, ucode 0x4002f00, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, Intel 800GB SSD OS Drive, BERT-Large (QA) Throughput, https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Modelzoo: https://github.com/IntelAI/models/ -b v1.6.1, Squad 1.1 dataset, oneDNN 1.4, FP32, BS=12, test by Intel on 5/18/2020.

BERT-Large (QA) Squad Training Throughput New: May 18, 2020

Baseline: May 18, 2020