Performance Index

ID 615781
Date 03/28/2024

Supercomputing 22

Keynotes

Session, Speaker | Claim | Claim Details

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 1.7x Higher HPL performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 1.4x Higher HPL performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/7/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s DDR4), BIOS Version SE5C620.86B.01.01.0006.2207150335, ucode revision=0xd000375, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, HPL from MKL_​v2022.1.0

AMD EPYC 7773X: Test by Intel as of 10/7/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, cTDP – 280, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10 rev5.22, ucode revision=0xa001224, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, HPL v2.3_BLIS-3.0_AMD_OFFICIAL

Intel® Xeon® CPU Max Series: Test by Intel as of 9/2/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, HPL from MKL_​v2022.1.0

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 3.2x Higher HPCG performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 3.8x Higher HPCG performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/7/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s DDR4), BIOS Version SE5C620.86B.01.01.0006.2207150335, ucode revision=0xd000375, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, HPCG from MKL_​v2022.1.0

AMD EPYC 7773X: Test by Intel as of 10/7/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, cTDP – 280, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10 rev5.22, ucode revision=0xa001224, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, HPCG from MKL_v2022.1.0

Intel® Xeon® CPU Max Series: Test by Intel as of 9/2/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, HPCG from MKL_​v2022.1.0

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 5.0x Higher Stream Triad performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 5.3x Higher Stream Triad performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/7/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s DDR4), BIOS Version SE5C620.86B.01.01.0006.2207150335, ucode revision=0xd000375, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, Stream v5.10

AMD EPYC 7773X: Test by Intel as of 10/7/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, cTDP – 280, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10 rev5.22, ucode revision=0xa001224, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, Stream v5.10

Intel® Xeon® CPU Max Series: Test by Intel as of 9/2/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, Stream v5.10

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 4.8x Higher CosmoFlow (training on 8192 image batches) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 3.0x Higher CosmoFlow (training on 8192 image batches) performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 06/07/2022. 1-node, 2x Intel® Xeon® Scalable Processor 8380, HT On, Turbo On, Total Memory 512 GB (16 slots/ 32 GB/ 3200 MHz, DDR4), BIOS SE5C6200.86B.0022.D64.2105220049, ucode 0xd0002b1, OS Red Hat Enterprise Linux 8.5 (Ootpa), kernel 4.18.0-348.7.1.el8_​5.x86_​64, https://github.com/mlcommons/hpc/tree/main/cosmoflow, AVX-512, FP32, Tensorflow 2.9.0, horovod 0.23.0, keras 2.6.0, oneCCL-2021.4, oneAPI MPI 2021.4.0, Python 3.8

AMD EPYC 7773X: Test by Intel as of 10/7/2022. 1 node, 2x AMD EPYC 7773X, HT On, AMD Turbo Core On, Total Memory 512 GB (16 slots/ 32 GB/ 3200 MHz), BIOS M10, ucode 0xa001229, OS CentOS Stream 8, kernel 4.18.0-383.el8.x86_​64, https://github.com/mlcommons/hpc/tree/main/cosmoflow, Intel TensorFlow 2.8.0, horovod 0.22.1, keras 2.8.0, OpenMPI 4.1.0, Python 3.8

Intel® Xeon® CPU Max Series: Test by Intel as of 10/18/2022. 1 node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, Total Memory 128 GB HBM and 512 GB DDR (16 slots/ 32 GB/ 4800 MHz), BIOS SE5C7411.86B.8424.D03.2208100444, ucode 0x2c000020, CentOS Stream 8, kernel 5.19.0-rc6.0712.intel_next.1.x86_64+server, https://github.com/mlcommons/hpc/tree/main/cosmoflow, AMX, BF16, TensorFlow 2.9.1, horovod 0.24.0, keras 2.9.0.dev2022021708, oneCCL 2021.5, Python 3.8

Unverified performance gains on the MLPerf™ HPC-AI v0.7 CosmoFlow training benchmark using optimized TensorFlow 2.6 (FP32, FP16) and 2.9.1 (FP32, BF16). Result not verified by MLCommons Association. Unverified results have not been through an MLPerf™ review and may use measurement methodologies and/or workload implementations that are inconsistent with the MLPerf™ specification for verified results. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 3.2x Higher ISO3DFD performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 3.4x Higher ISO3DFD performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/7/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s DDR4), BIOS Version SE5C620.86B.01.01.0006.2207150335, ucode revision=0xd000375, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, YASK v3.05.07

AMD EPYC 7773X: Test by Intel as of 10/7/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, cTDP – 280, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10 rev5.22, ucode revision=0xa001224, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, YASK v3.05.07

Intel® Xeon® CPU Max Series: Test by Intel as of ww36’22. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_next.1.x86_64+server, YASK v3.05.07

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 1.5x Higher ROMS (Geomean of benchmark3 (2048x256x30), benchmark3 (8192x256x30)) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 2.7x Higher ROMS (Geomean of benchmark3 (2048x256x30), benchmark3 (8192x256x30)) performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/12/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, NUMA configuration SNC2, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C620.86B.01.01.0006.2207150335, ucode revision=0xd000375, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, ROMS V4 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags “-ip -O3 -heap-arrays -xCORE-AVX512 -qopt-zmm-usage=high -align array64byte -fimf-use-svml=true -fp-model fast=2 -no-prec-div -no-prec-sqrt -fimf-precision=low”, ROMS V4, benchmark3 (2048x256x30), benchmark3 (8192x256x30)

AMD EPYC 7773X: Test by Intel as of 10/12/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, NUMA configuration NPS=4, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10, ucode revision=0xa001224, Rocky Linux 8.6 (Green Obsidian), Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, ROMS V4 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags “-ip -O3 -march=core-avx2 -heap-arrays -align array64byte -fimf-use-svml=true -fp-model fast=2 -no-prec-div -no-prec-sqrt -fimf-precision=low”, ROMS V4, benchmark3 (2048x256x30), benchmark3 (8192x256x30)

Intel® Xeon® CPU Max Series: Test by Intel as of 10/12/22. 1-node, 2x Intel® Xeon® CPU Max Series, HT ON, Turbo ON, NUMA configuration SNC4, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, ROMS V4 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags “-ip -O3 -heap-arrays -xCORE-AVX512 -qopt-zmm-usage=high -align array64byte -fimf-use-svml=true -fp-model fast=2 -no-prec-div -no-prec-sqrt -fimf-precision=low”, ROMS V4, benchmark3 (2048x256x30), benchmark3 (8192x256x30)

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 2.1x Higher NEMO (Geomean of GYRE_​PISCES_​25, BENCH ORCA-1) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 3.3x Higher NEMO (Geomean of GYRE_​PISCES_​25, BENCH ORCA-1) performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/12/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, NUMA configuration SNC2, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C620.86B.01.01.0006.2207150335, ucode revision=0xd000375, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, NEMO v4.2 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags ”-i4 -r8 -O3 -fno-alias -march=core-avx2 -fp-model fast=2 -no-prec-div -no-prec-sqrt -align array64byte -fimf-use-svml=true”

AMD EPYC 7773X: Test by Intel as of 10/12/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, NUMA configuration NPS=4, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10, ucode revision=0xa001224, Rocky Linux 8.6 (Green Obsidian), Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, NEMO v4.2 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags “-i4 -r8 -O3 -fno-alias -march=core-avx2 -fp-model fast=2 -no-prec-div -no-prec-sqrt -align array64byte -fimf-use-svml=true”.

Intel® Xeon® CPU Max Series: Test by Intel as of 10/12/22. 1-node, 2x Intel® Xeon® CPU Max Series, HT ON, Turbo ON, NUMA configuration SNC4, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, NEMO v4.2 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags “-i4 -r8 -O3 -fno-alias -march=core-avx2 -fp-model fast=2 -no-prec-div -no-prec-sqrt -align array64byte -fimf-use-svml=true”.

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 2.2x Higher WRF (CONUS 2.5KM) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 2.7x Higher WRF (CONUS 2.5KM) performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/12/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, NUMA configuration SNC2, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C620.86B.01.01.0006.2207150335, ucode revision=0xd000375, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, WRF v3.9.1.1 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags ”-ip -O3 -xCORE-AVX512 -fp-model fast=2 -no-prec-div -no-prec-sqrt -fimf-precision=low -w -ftz -align array64byte -fno-alias -fimf-use-svml=true -inline-max-size=12000 -inline-max-total-size=30000 -vec-threshold0 -qno-opt-dynamic-align”.

AMD EPYC 7773X: Test by Intel as of 10/12/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, NUMA configuration NPS=4, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10, ucode revision=0xa001224, Rocky Linux 8.6 (Green Obsidian), Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, WRF v3.9.1.1 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags “-ip -O3 -march=core-avx2 -fp-model fast=2 -no-prec-div -no-prec-sqrt -w -ftz -align array64byte -fno-alias -fimf-use-svml=true -inline-max-size=12000 -inline-max-total-size=30000 -vec-threshold0 -qno-opt-dynamic-align”.

Intel® Xeon® CPU Max Series: Test by Intel as of 10/12/22. 1-node, 2x Intel® Xeon® CPU Max Series, HT ON, Turbo ON, NUMA configuration SNC4, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, WRF v3.9.1.1 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags ”-ip -O3 -xCORE-AVX512 -fp-model fast=2 -no-prec-div -no-prec-sqrt -fimf-precision=low -w -ftz -align array64byte -fno-alias -fimf-use-svml=true -inline-max-size=12000 -inline-max-total-size=30000 -vec-threshold0 -qno-opt-dynamic-align”.

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 2.4x Higher MPAS-A (MPAS-A V7.3 60-km dynamical core) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 3.5x Higher MPAS-A (MPAS-A V7.3 60-km dynamical core) performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/12/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, NUMA configuration SNC2, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C620.86B.01.01.0006.2207150335, ucode revision=0xd000375, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, MPAS-A V7.3 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags “-O3 -march=core-avx2 -convert big_​endian -free -align array64byte -fimf-use-svml=true -fp-model fast=2 -no-prec-div -no-prec-sqrt -fimf-precision=low”, MPAS-A V7.3, benchmark_​60km dycore

AMD EPYC 7773X: Test by Intel as of 10/12/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, NUMA configuration NPS=4, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10, ucode revision=0xa001224, Rocky Linux 8.6 (Green Obsidian), Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, MPAS-A V7.3 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags “-O3 -march=core-avx2 -convert big_endian -free -align array64byte -fimf-use-svml=true -fp-model fast=2 -no-prec-div -no-prec-sqrt -fimf-precision=low”, MPAS-A V7.3, benchmark_60km dycore

Intel® Xeon® CPU Max Series: Test by Intel as of 10/12/22. 1-node, 2x Intel® Xeon® CPU Max Series, HT ON, Turbo ON, NUMA configuration SNC4, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, MPAS-A V7.3 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags “-O3 -march=core-avx2 -convert big_​endian -free -align array64byte -fimf-use-svml=true -fp-model fast=2 -no-prec-div -no-prec-sqrt -fimf-precision=low”, MPAS-A V7.3, benchmark_​60km dycore.

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 1.9x Higher Black Scholes performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 1.6x Higher Black Scholes performance vs. Intel® Xeon® 8380

Intel® Xeon® CPU Max Series has 1.5x Higher Binomial Options performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 1.4x Higher Binomial Options performance vs. Intel® Xeon® 8380

Intel® Xeon® CPU Max Series has 1.5x Higher Monte Carlo performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 1.3x Higher Monte Carlo performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/7/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s DDR4), BIOS Version SE5C620.86B.01.01.0006.2207150335, ucode revision=0xd000375, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, Binomial Options v1.1, Black Scholes v1.4, Monte Carlo v1.2

AMD EPYC 7773X: Test by Intel as of 10/7/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, cTDP – 280, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10 rev5.22, ucode revision=0xa001224, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, Binomial Options v1.1, Black Scholes v1.4, Monte Carlo v1.2

Intel® Xeon® CPU Max Series: Test by Intel as of 9/2/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_next.1.x86_64+server, Binomial Options v1.1, Black Scholes v1.4, Monte Carlo v1.2

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 1.9x Higher LAMMPS (Geomean of Atomic Fluid, Copper, DPD, Liquid_​crystal, Polyethylene, Protein, Stillinger-Weber, Tersoff, Water) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 1.6x Higher LAMMPS (Geomean of Atomic Fluid, Copper, DPD, Liquid_​crystal, Polyethylene, Protein, Stillinger-Weber, Tersoff, Water) performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/11/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, NUMA configuration SNC2, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C620.86B.01.01.0006.2207150335, ucode revision=0xd000375, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, LAMMPS v2021-09-29 cmkl:2022.1.0, icc:2021.6.0, impi:2021.6.0, tbb:2021.6.0; threads/core:; Turbo:on; BuildKnobs:-O3 -ip -xCORE-AVX512 -g -debug inline-debug-info -qopt-zmm-usage=high;

AMD EPYC 7773X: Test by Intel as of 10/6/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, NUMA configuration NPS=4, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10, ucode revision=0xa001224, Rocky Linux 8.6 (Green Obsidian), Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, LAMMPS v2021-09-29 cmkl:2022.1.0, icc:2021.6.0, impi:2021.6.0, tbb:2021.6.0; threads/core:; Turbo:on; BuildKnobs:-O3 -ip -g -debug inline-debug-info -axCORE-AVX2 -march=core-avx2;

Intel® Xeon® CPU Max Series: Test by Intel as of 8/31/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT ON, Turbo ON, NUMA configuration SNC4, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, LAMMPS v2021-09-29 cmkl:2022.1.0, icc:2021.6.0, impi:2021.6.0, tbb:2021.6.0; threads/core:; Turbo:off; BuildKnobs:-O3 -ip -xCORE-AVX512 -g -debug inline-debug-info -qopt-zmm-usage=high;

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 1.9x Higher AlphaFold 2 (Inference on Eight Streams) performance vs. AMD EPYC 7773X

AMD EPYC 7773X (Milan-X): Test by Intel as of 10/18/2022. 1-node, 2x AMD EPYC 7773X, 64 cores, HT On, AMD Turbo Core On, Total Memory 512 GB (16 slots/ 32 GB/ 3200 MHz), BIOS M10, ucode 0xa001229, OS CentOS Stream 8, kernel 4.18.0-383.el8.x86_64, https://github.com/intelxialei/intel-alphafold2, AVX-512, FP32, PyTorch 1.11.0, Intel Extension for PyTorch 1.11.200, jax 0.3.14

Intel® Xeon® CPU Max Series: Test by Intel as of 9/16/2022, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, Total Memory 128GB (HBM2e at 3200 MHz), BIOS SE5C7411.86B.8424.D03.2208100444, ucode 0x2c000020, CentOS Stream 8, kernel 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, https://github.com/intelxialei/intel-alphafold2, AVX-512, FP32, PyTorch 1.11.0, Intel Extension for PyTorch 1.11.200, jax 0.3.14

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 2.8x Higher DeePMD (Multi-Instance Training) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 2.9x Higher DeePMD (Multi-Instance Training) performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/20/2022. 1-node, Intel® Xeon® 8380 processor, Total Memory 256 GB, kernel 4.18.0-372.26.1.el8_6.crt1.x86_64, compiler gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), https://github.com/deepmodeling/deepmd-kit, Tensorflow 2.9, Horovod 0.24.0, oneCCL-2021.5.2, Python 3.9

AMD EPYC 7773X: Test by Intel as of 10/25/2022. 1-node, AMD EPYC 7773X, 64 Cores, Total Memory 256 GB, kernel 4.18.0-372.26.1.el8_​6.crt1.x86_​64, compiler gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), https://github.com/deepmodeling/deepmd-kit, Tensorflow 2.9, Horovod 0.24.0, oneCCL-2021.5.2, Python 3.9

Intel® Xeon® CPU Max Series: Test by Intel as of 10/12/2022. 1-node, Intel® Xeon® CPU Max Series, Total Memory 128 GB (HBM2e at 3200 MHz), kernel 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, compiler gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-13), https://github.com/deepmodeling/deepmd-kit, Tensorflow 2.9, Horovod 0.24.0, oneCCL-2021.5.2, Python 3.9

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 1.7x Higher Quantum Espresso (Geomean of AUSURF112, Water_​EXX) performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 9/30/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), ucode revision=0xd000375, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, Quantum Espresso 7.0, AUSURF112, Water_​EXX

Intel® Xeon® CPU Max Series: Test by Intel as of 9/2/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, Quantum Espresso 7.0, AUSURF112, Water_​EXX

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 1.2x Higher Siemens Simcenter Star-CCM+ (Geomean of civil, HlMach10AoA10Sou, kcs_​with_​physics, lemans_​poly_​17m.amg, reactor, TurboCharger7M) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 1.7x Higher Siemens Simcenter Star-CCM+ (Geomean of civil, HlMach10AoA10Sou, kcs_​with_​physics, lemans_​poly_​17m.amg, reactor, TurboCharger7M) performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 25-Oct-22. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode revision=0xd000270, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, StarCCM+ 17.04.007, reactor 9m @ 20 iterations, lemans_​poly_​17m @ 20 iterations, civil 20m @ 20 iterations, TurboCharger7M @ 20 iterations, HlMach10AoA10Sou 6.4m @ 20 iterations, kcs_​with_​physics 3m @ 20 iterations

AMD EPYC 7773X: Test by Intel as of 25-Oct-22. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10 rev5.22, ucode revision=0xa001224, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, StarCCM+ 17.04.007, reactor 9m @ 20 iterations, lemans_poly_17m @ 20 iterations, civil 20m @ 20 iterations, TurboCharger7M @ 20 iterations, HlMach10AoA10Sou 6.4m @ 20 iterations, kcs_with_physics 3m @ 20 iterations

Intel® Xeon® CPU Max Series: Test by Intel as of 14-Sep-22. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, StarCCM+ 17.04.007, reactor 9m @ 20 iterations, lemans_​poly_​17m @ 20 iterations, civil 20m @ 20 iterations, TurboCharger7M @ 20 iterations, HlMach10AoA10Sou 6.4m @ 20 iterations, kcs_​with_​physics 3m @ 20 iterations

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 1.2x Higher Ansys LS-DYNA (ODB-10M) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 1.6x Higher Ansys LS-DYNA (ODB-10M) performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/7/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s DDR4), BIOS Version SE5C620.86B.01.01.0006.2207150335, ucode revision=0xd000375, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, LS-DYNA R11

AMD EPYC 7773X: Test by Intel as of 10/7/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, cTDP – 280, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10 rev5.22, ucode revision=0xa001224, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, LS-DYNA R11

Intel® Xeon® CPU Max Series: Test by Intel as of ww36’22. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_next.1.x86_64+server, LS-DYNA R11

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 1.3x Higher Ansys Fluent (Geomean of pump_​2m, sedan_​4m, rotor_​3m, aircraft_​wing_​14m, combustor_​12m, exhaust_​system_​33m)​ performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 2.1x Higher Ansys Fluent (Geomean of pump_​2m, sedan_​4m, rotor_​3m, aircraft_​wing_​14m, combustor_​12m, exhaust_​system_​33m)​ performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 08/24/2022. 1-node, 2x Intel® Xeon® 8380, HT ON, Turbo ON, Quad, Total Memory 256 GB, BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode 0xd000270, Rocky Linux 8.6, kernel version 4.18.0-372.19.1.el8_​6.crt1.x86_​64, Ansys Fluent 2022R1

AMD EPYC 7773X: Test by Intel as of 8/24/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, NPS4, Total Memory 256 GB, BIOS ver. M10, ucode 0xa001224, Rocky Linux 8.6, kernel version 4.18.0-372.19.1.el8_6.crt1.x86_64, Ansys Fluent 2022R1

Intel® Xeon® CPU Max Series: Test by Intel as of 08/31/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo ON, SNC4, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode 2c000020, CentOS Stream 8, kernel version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, Ansys Fluent 2022R1

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 1.5x Higher Ansys Mechanical (Geomean of V22iter-1, V22iter-2, V22iter-3, V22iter-4, V22direct-1, V22direct-2, V22direct-3) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 1.6x Higher Ansys Mechanical (Geomean of V22iter-1, V22iter-2, V22iter-3, V22iter-4, V22direct-1, V22direct-2, V22direct-3)​ performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 08/24/2022. 1-node, 2x Intel® Xeon® 8380, HT ON, Turbo ON, Quad, Total Memory 256 GB, BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode 0xd000270, Rocky Linux 8.6, kernel version 4.18.0-372.19.1.el8_​6.crt1.x86_​64, Ansys Mechanical 2022 R2

AMD EPYC 7773X: Test by Intel as of 8/24/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, NPS4, Total Memory 512 GB, BIOS ver. M10, ucode 0xa001229, CentOS Stream 8, kernel version 4.18.0-383.el8.x86_64, Ansys Mechanical 2022 R2

Intel® Xeon® CPU Max Series: Test by Intel as of 08/31/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo ON, SNC4, Total Memory 512 GB DDR5 4800 MT/s, 128 GB HBM in cache mode (HBM2e at 3200 MHz), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode 2c000020, CentOS Stream 8, kernel version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, Ansys Mechanical 2022 R2

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 2.1x Higher Altair AcuSolve (HQ Model) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 2.4x Higher Altair AcuSolve (HQ Model) performance vs. Intel® Xeon® 8380

Intel® Xeon® CPU Max Series has 2.5x Higher Altair AcuSolve (HQ Model) performance vs. Intel® Xeon® 8358

1 node of Intel® Xeon® CPU Max Series can replace 4 nodes of Intel® Xeon® 6346 and reduce CPU power to 43% of the original, from 1640W to 700W

Intel® Xeon® 8380: Test by Intel as of 09/28/2022. 1-node, 2x Intel® Xeon® 8380, HT ON, Turbo ON, Quad, Total Memory 256 GB, BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode 0xd000270, Rocky Linux 8.6, kernel version 4.18.0-372.19.1.el8_​6.crt1.x86_​64, Altair AcuSolve 2021R2

Intel® Xeon® 6346: Test by Intel as of 10/08/2022. 4-nodes connected via HDR-200, 2x Intel® Xeon® 6346, 16 cores, HT ON, Turbo ON, Quad, Total Memory 256 GB, BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode 0xd000270, Rocky Linux 8.6, kernel version 4.18.0-372.19.1.el8_​6.crt1.x86_​64, Altair AcuSolve 2021R2

Intel® Xeon® 8358: Test by Intel as of 09/26/2022. 1-node, 2x Intel® Xeon® 8358, HT ON, Turbo ON, Quad, Total Memory 256 GB, BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode 0xd000270, Rocky Linux 8.6, kernel version 4.18.0-372.19.1.el8_6.crt1.x86_64, Altair AcuSolve 2021R2

AMD EPYC 7773X: Test by Intel as of 9/27/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, NPS4, Total Memory 256 GB, BIOS ver. M10, ucode 0xa001224, Rocky Linux 8.6, kernel version 4.18.0-372.19.1.el8_6.crt1.x86_64, Altair AcuSolve 2021R2

Intel® Xeon® CPU Max Series: Test by Intel as of 10/03/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT ON, Turbo ON, SNC4, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode 2c000020, CentOS Stream 8, kernel version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, Altair AcuSolve 2021R2
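The 1640W and 700W figures in the node-replacement claim are consistent with a simple CPU-only TDP tally. The per-socket values below are assumptions chosen to match the stated totals (roughly 205W per Xeon 6346 socket and 350W per Xeon CPU Max Series socket); this is a minimal sketch of the arithmetic, not Intel's measurement methodology:

    # Hedged arithmetic behind the node-replacement power claim (CPU power only).
    xeon_6346_cluster_w = 4 * 2 * 205      # 4 nodes x 2 sockets x ~205 W -> 1640 W
    xeon_max_node_w = 1 * 2 * 350          # 1 node  x 2 sockets x ~350 W ->  700 W
    print(xeon_max_node_w / xeon_6346_cluster_w)   # ~0.43, i.e. about 43% of the original CPU power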

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 2.1x Higher ParSeNet (SplineNet) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 2.1x Higher ParSeNet (SplineNet) performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 10/18/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode revision=0xd000270, Rocky Linux 8.6, Linux version 4.18.0-372.19.1.el8_​6.crt1.x86_​64, ParSeNet (SplineNet), PyTorch 1.11.0, Torch-CCL 1.2.0, IPEX 1.10.0, MKL (20220804), oneDNN (v2.6.0)

AMD EPYC 7773X: Test by Intel as of 10/19/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10 rev5.22, ucode revision=0xa001224, Rocky Linux 8.6, Linux version 4.18.0-372.19.1.el8_6.crt1.x86_64, ParSeNet (SplineNet), PyTorch 1.13.0, IPEX 1.13.0-cpu, MKL (20220804), oneDNN (v2.6.0), score=0.91 (31.1 sec per epoch)

Intel® Xeon® CPU Max Series: Test by Intel as of 09/12/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, ParSeNet (SplineNet), PyTorch 1.11.0, Torch-CCL 1.2.0, IPEX 1.10.0, MKL (20220804), oneDNN (v2.6.0)

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 2.5x Higher OpenFOAM (Geomean of Motorbike 20M, Motorbike 42M) performance vs. AMD EPYC 7773X

Intel® Xeon® CPU Max Series has 3.2x Higher OpenFOAM (Geomean of Motorbike 20M, Motorbike 42M) performance vs. Intel® Xeon® 8380

Intel® Xeon® 8380: Test by Intel as of 9/2/2022. 1-node, 2x Intel® Xeon® 8380 CPU, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode revision=0xd000270, Rocky Linux 8.6, Linux version 4.18.0-372.19.1.el8_​6.crt1.x86_​64, OpenFOAM 8, Motorbike 20M @ 250 iterations, Motorbike 42M @ 250 iterations

AMD EPYC 7773X: Test by Intel as of 9/2/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10 rev5.22, ucode revision=0xa001224, Rocky Linux 8.6, Linux version 4.18.0-372.19.1.el8_6.crt1.x86_64, OpenFOAM 8, Motorbike 20M @ 250 iterations, Motorbike 42M @ 250 iterations

Intel® Xeon® CPU Max Series: Test by Intel as of 9/2/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, OpenFOAM 8, Motorbike 20M @ 250 iterations, Motorbike 42M @ 250 iterations

This offering is not approved or endorsed by OpenCFD Limited, producer and distributor of the OpenFOAM software via www.openfoam.com, and owner of the OPENFOAM® and OpenCFD® trademark

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series has 2.1x Higher 3D-GAN for Particle Shower Simulations performance vs. AMD EPYC 7773X

AMD EPYC 7773X: Test by Intel as of 10/18/2022. 1 node, 2x AMD EPYC 7773X, HT On, AMD Turbo Core On, Total Memory 512 GB (16 slots/ 32 GB/ 3200 MHz), BIOS M10, ucode 0xa001229, OS CentOS Stream 8, kernel 4.18.0-383.el8.x86_64, https://github.com/svalleco/3Dgan/, Intel TensorFlow 2.10.0, Keras 2.10.0, GCC 11.2.0, Intel Neural Compressor: https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html, TF Wheel intel_tensorflow_avx512-2.10.0.202230-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, Python 3.8.13

Intel® Xeon® CPU Max Series: Test by Intel as of 10/18/2022. 1 node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, Total Memory 128 GB HBM and 512 GB DDR (16 slots/ 32 GB/ 4800 MHz), BIOS SE5C7411.86B.8424.D03.2208100444, ucode 0x2c000020, CentOS Stream 8, kernel 5.19.0-rc6.0712.intel_next.1.x86_64+server, https://github.com/svalleco/3Dgan/, Intel TensorFlow 2.10.0, Keras 2.10.0, GCC 11.2.0, Intel Neural Compressor: https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html, TF Wheel intel_tensorflow_avx512-2.10.0.202230-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, Python 3.8.13

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Data Center GPU Max Series has 2.4x Higher Riskfuel Credit Option Pricing Training performance than NVIDIA A100

Intel® Data Center GPU Max Series: Test by Intel as of 10/25/2022. 1-node, 2x 4th Gen Intel® Xeon® Scalable Processor, HT On, Turbo On, Total Memory 512 GB DDR5-4800, BIOS: EGSDCRB1.SYS.007.D01.2203211346, ucode 0x8f000300, 1x Intel® Data Center GPU Max Series, Ubuntu 20.04, Kernel 5.15; oneAPI_20220630, Intel Python 3.9, intel-extension-for-pytorch 1.10.100, torch 1.10; Run config: --layers 8 --width 1024 --input_dims 5

NVIDIA A100: Test by Intel as of 10/5/2022. 1-node, 2x Intel® Xeon® 8360Y, HT On, Turbo On, Total Memory 512 GB DDR4-3200, BIOS SE5C6200.86B.0021.D40.2101090208, ucode = 0xd000363, 1x NVIDIA A100 80G PCIe, CentOS Stream 8, Kernel 4.18; CUDA 11.7, Python 3.9, torch 1.12.1+cu113; Run config: --layers 8 --width 1024 --input_dims 5
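The run configuration above (--layers 8 --width 1024 --input_dims 5) only describes a network shape. The sketch below is a hypothetical PyTorch stand-in for that shape, assuming fully connected ReLU layers and a single scalar output; it is not the actual Riskfuel pricing model:

    # Minimal sketch of an MLP matching the stated run configuration (assumed topology).
    import torch
    import torch.nn as nn

    def make_pricing_mlp(layers=8, width=1024, input_dims=5):
        blocks = [nn.Linear(input_dims, width), nn.ReLU()]
        for _ in range(layers - 1):
            blocks += [nn.Linear(width, width), nn.ReLU()]
        blocks.append(nn.Linear(width, 1))          # scalar option price (assumption)
        return nn.Sequential(*blocks)

    model = make_pricing_mlp()
    prices = model(torch.randn(32, 5))              # batch of 32 synthetic 5-feature inputs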

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Data Center GPU Max Series sees a 2x performance increase in 4Kx4K 2D-FFT from its larger cache

Intel® Data Center GPU Max Series: Test by Intel as of 8/1/2022. 1-node, 2x 4th Gen Intel® Xeon Scalable Processor, 1x Intel Datacenter GPU Max Series, Ubuntu 20.04, pre-production oneAPI, configured GPU L2 Cache to 408MB, 80MB, and 32MB. 2D-FFT configuration was with 4K*4K DP 2D FFT  
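A back-of-envelope working-set estimate suggests why only the largest cache setting captures the whole transform; the 16-byte complex double-precision element size is an assumption, since the footnote states only a 4K*4K DP 2D FFT:

    # Hedged working-set arithmetic for the 4K x 4K double-precision 2D FFT.
    n = 4096
    bytes_per_element = 16                  # complex double precision (assumed)
    working_set_mib = n * n * bytes_per_element / 2**20
    print(working_set_mib)                  # 256 MiB: fits the 408 MB L2 setting,
                                            # but not the 80 MB or 32 MB settings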

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Data Center GPU Max Series has 2.0x Higher MiniBUDE performance than NVIDIA A100

Intel® Data Center GPU Max Series: Test by Intel as of 8/1/2022. 1-node, 2x Intel® Xeon® 8360Y, 1x Intel® Data Center GPU Max Series, Ubuntu 20.04, pre-production oneAPI software, BIG5 dataset (2672 Ligands, 2672 proteins, and 983040 poses), https://github.com/cschpc/epmhpcgpu/tree/main/miniBUDE/big5

NVIDIA A100: Test by Intel as of 8/1/2022. 1-node, 2x Intel® Xeon® 8360Y, 1x NVIDIA A100 8G PCIe, Ubuntu 20.04, CUDA 11.4, BIG5 dataset (2672 Ligands, 2672 proteins, and 983040 poses), https://github.com/cschpc/epmhpcgpu/tree/main/miniBUDE/big5

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Data Center GPU Max Series has 1.5x Higher NekRS (Geomean AxHelm(FP64), AxHelm(FP32), FDM(FP32), AdvSub) performance than NVIDIA A100

AxHelm(FP64) is 1.5x, AxHelm(FP32) is 1.3x, FDM(FP32) is 1.7x, AdvSub is 1.4x

Intel® Data Center GPU Max Series: Test by Intel as of 8/1/2022. 1-node, 2x Intel® Xeon® 8360Y, Total Memory: 256 GB DDR4-3200, 1x Intel® Data Center GPU Max Series, Ubuntu 20.04, pre-production oneAPI; Benchmarks: NekRS AxHelm(BK5) FP64, AxHelm(BK5) FP32, FDM FP32 and advSub with problem size of 8192 (E = 2x16^3).

NVIDIA A100: Test by Intel as of 8/1/2022, 1-node, 2x AMD EPYC 7532, 1x NVIDIA A100 80GB PCIe, Total Memory 256 GB DDR4-3200, Ubuntu 20.04, CUDA 11.4; Benchmarks: NekRS AxHelm(BK5) FP64, AxHelm(BK5) FP32, FDM FP32 and advSub with problem size of 8192 (E = 2x16^3).
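The 1.5x headline is the geometric mean of the four per-kernel ratios listed above; a quick check with the numbers as stated:

    # Geometric mean of the per-kernel speedups quoted above.
    import math
    ratios = [1.5, 1.3, 1.7, 1.4]           # AxHelm(FP64), AxHelm(FP32), FDM(FP32), AdvSub
    geomean = math.prod(ratios) ** (1 / len(ratios))
    print(round(geomean, 2))                # ~1.47, reported as 1.5x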

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Data Center GPU Max Series has 2.0x Higher OpenMC performance than NVIDIA A100

Intel® Data Center GPU Max Series: Test by Intel as of 5/25/2022, 1-node, 2x Intel® Xeon® Scalable Processor 8360Y, 256GB DDR4 3200, HT On, Turbo On, ucode 0xd0002c1. 1x Intel Data Center GPU Max Series. Ubuntu 20.04, Linux Version 5.10.54, agama 434, Build Knobs: cmake -DCMAKE_CXX_COMPILER="mpiicpc" -DCMAKE_C_COMPILER="mpiicc" -DCMAKE_CXX_FLAGS="-cxx=icpx -mllvm -indvars-widen-indvars=false -Xclang -fopenmp-declare-target-global-default-no-map -std=c++17 -Dgsl_CONFIG_CONTRACT_CHECKING_OFF -fsycl -DSYCL_SORT -D_GLIBCXX_USE_TBB_PAR_BACKEND=0" --preset=spirv -DCMAKE_UNITY_BUILD=ON -DCMAKE_UNITY_BUILD_MODE=BATCH -DCMAKE_UNITY_BUILD_BATCH_SIZE=1000 -DCMAKE_INSTALL_PREFIX=../install -Ddebug=off -Doptimize=on -Dopenmp=on -Dnew_w=on -Ddevice_history=off -Ddisable_xs_cache=on -Ddevice_printf=off. Benchmark: Depleted Fuel Inactive Batch Performance on HM-Large Reactor with 40M particles

NVIDIA A100: Test by Argonne National Laboratory as of 5/23/2022, 2x AMD EPYC 7532, 256 GB DDR4 3200, HT On, Turbo On, ucode 0x8301038. 1x A100 40GB PCIe. OpenSUSE Leap 15.3, Linux Version 5.3.18, Libraries: CUDA 11.6 with OpenMP clang compiler. Build Knobs: cmake --preset=llvm_a100 -DCMAKE_UNITY_BUILD=ON -DCMAKE_UNITY_BUILD_MODE=BATCH -DCMAKE_UNITY_BUILD_BATCH_SIZE=1000 -DCMAKE_INSTALL_PREFIX=./install -Ddebug=off -Doptimize=on -Dopenmp=on -Dnew_w=on -Ddevice_history=off -Ddisable_xs_cache=on -Ddevice_printf=off. Benchmark: Depleted Fuel Inactive Batch Performance on HM-Large Reactor with 40M particles

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Six Intel® Data Center GPU Max Series GPUs have a 6.0x performance uplift on LAMMPS (Liquid Crystal) when hosted by Intel® Xeon® CPU Max Series processors using DDR5 only, and a 10x uplift when hosted by Intel® Xeon® CPU Max Series processors with HBM enabled.

The high bandwidth memory on the host CPUs enables a 1.57x performance uplift for the GPUs running LAMMPS compared to DDR5.

AMD EPYC 7773X: Test by Intel as of 10/28/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, NUMA configuration NPS=4, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10, ucode revision=0xa001224, Rocky Linux 8.6 (Green Obsidian), Linux version 4.18.0-372.26.1.el8_​6.crt1.x86_​64, LAMMPS built with the Intel package

Intel® Xeon® CPU Max Series : Test by Intel as of 10/28/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, Total Memory 128 GB HBM2e, BIOS EGSDCRB1.DWR.0085.D12.2207281916, ucode 0xac000040, SUSE Linux Enterprise Server 15 SP3, Kernel 5.3.18, oneAPI 2022.3.0, LAMMPS built with the Intel package

Intel® Data Center GPU Max Series with DDR Host: Test by Intel as of 10/28/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, Total Memory 1024 GB DDR5-4800 + 128 GB HBM2e, Memory Mode: Flat, HBM2e not used, 6x Intel® Data Center GPU Max Series, BIOS EGSDCRB1.DWR.0085.D12.2207281916, ucode 0xac000040, Agama pvc-prq-54, SUSE Linux Enterprise Server 15 SP3, Kernel 5.3.18, oneAPI 2022.3.0, LAMMPS built with the Intel package

Intel® Data Center GPU Max Series with HBM Host: Test by Intel as of 10/28/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, Total Memory 128 GB HBM2e, 6x Intel® Data Center GPU Max Series, BIOS EGSDCRB1.DWR.0085.D12.2207281916, ucode 0xac000040, Agama pvc-prq-54, SUSE Linux Enterprise Server 15 SP3, Kernel 5.3.18, oneAPI 2022.3.0, LAMMPS built with the Intel package

Maximize Possibilities for High Performance Computing & AI

Jeff McVeigh

Intel® Xeon® CPU Max Series-based clusters with DDR5 consume 63% lower power for their CPUs and memory to achieve equivalent performance to a cluster using AMD EPYC 7773X

Intel® Xeon® CPU Max Series-based clusters using HBM only consume 68% lower power for their CPUs and memory to achieve equivalent performance to a cluster using AMD EPYC 7773X

AMD EPYC 7773X: Test by Intel as of 10/7/2022. 1-node, 2x AMD EPYC 7773X, HT On, Turbo On, cTDP – 280, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10 rev5.22, ucode revision=0xa001224, Rocky Linux 8.6, Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, HPCG from MKL_v2022.1.0. Power calculated using CPU TDP and 7W per DIMM for 100 nodes for equal performance

Intel® Xeon® CPU Max Series: Test by Intel as of 9/2/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT On, Turbo On, SNC4, Total Memory 128 GB (8x16GB HBM2 3200MT/s), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, HPCG from MKL_​v2022.1.0. Power calculated using CPU TDP and 7W per DIMM for 31 nodes for equal performance
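The 68% figure for the HBM-only configuration follows from the node-count-and-TDP model described in these footnotes (CPU TDP plus 7W per DIMM, scaled by the number of nodes needed for equal HPCG performance). The 350W per-socket figure for the Xeon CPU Max Series is an assumption; the 280W cTDP, 16 DIMMs, and node counts come from the footnotes above. The DDR5-based 63% claim uses the same model with its own node and DIMM counts, which are not spelled out here:

    # Hedged sketch of the cluster power model used for the HBM-only comparison.
    def cluster_power_w(nodes, cpu_tdp_w, dimms, dimm_w=7, sockets=2):
        return nodes * (sockets * cpu_tdp_w + dimms * dimm_w)

    epyc_7773x = cluster_power_w(nodes=100, cpu_tdp_w=280, dimms=16)   # 67,200 W
    xeon_max_hbm = cluster_power_w(nodes=31, cpu_tdp_w=350, dimms=0)   # 21,700 W (HBM only)
    print(1 - xeon_max_hbm / epyc_7773x)                               # ~0.68, i.e. roughly 68% lower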

Demos

Demo | Claim | Claim Details

Acceleration with High Bandwidth Memory

Intel® Xeon® CPU Max Series is 2.3x faster than a Milan-X CPU with traditional DDR memory

Intel® Xeon® CPU Max Series: Test by Intel as of 10/12/22. 1-node, 2x Intel® Xeon® Sapphire Rapids Plus HBM (Family 6 Model 143 Stepping 6), 112 cores, HT ON, Turbo ON, NUMA configuration SNC4, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode revision=0x2c000020, CentOS Stream 8, Linux version 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, MPAS-A V7.3 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags “-O3 -march=core-avx2 -convert big_​endian -free -align array64byte -fimf-use-svml=true -fp-model fast=2 -no-prec-div -no-prec-sqrt -fimf-precision=low”, MPAS-A V7.3, benchmark_​60km dycore, score=0.3856 s/timestep.

Milan-X 7773X: Test by Intel as of 10/12/22. 1-node, 2x AMD EPYC 7773X 64-Core Processor @ 2.2GHz, 128 cores, HT On, Turbo On, NUMA configuration NPS=4, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version M10, ucode revision=0xa001224, Rocky Linux 8.6 (Green Obsidian), Linux version 4.18.0-372.26.1.el8_6.crt1.x86_64, MPAS-A V7.3 build with Intel® Fortran Compiler Classic and Intel® MPI from 2022.3 Intel® oneAPI HPC Toolkit with compiler flags “-O3 -march=core-avx2 -convert big_endian -free -align array64byte -fimf-use-svml=true -fp-model fast=2 -no-prec-div -no-prec-sqrt -fimf-precision=low”, MPAS-A V7.3, benchmark_60km dycore, score=0.9204 s/timestep.

AI Accelerated HPC: LAMMPS + DeePMD

Intel® Xeon® CPU Max Series is 3x faster than the Intel® Xeon® 8380 CPU with traditional DDR memory

Intel® Xeon® CPU Max Series with BF16/AMX: Test by Intel® as of 10/12/2022. 1-node, Intel® Xeon® CPU Max Series, 56 cores, Total Memory 128 GB (8x16GB HBM2 3200MT/s [3200MT/s]), kernel 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, compiler gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-13), BIOS version SE5C7411.86B.8424.D03.2208100444, CentOS Stream 8, 1.9 GHz Base Frequency, 2.4 GHz All-core Max Frequency, https://github.com/deepmodeling/deepmd-kit, Tensorflow 2.9, Horovod 0.24.0, oneCCL-2021.5.2, Python 3.9, Single Instance Training: ppn=1, omp=2, intra=14, inter=14, bfloat16 precision in neural network ; output to FP32, time to complete training=29 seconds.

Intel® Xeon® Platinum 8380: Test by Intel® as of 10/20/2022. 1-node, Intel® Xeon® Platinum 8380 processor, 40 cores, Total Memory 256 GB (16x16GB DDR4 3200 MT/s [3200MT/s]), kernel 4.18.0-372.26.1.el8_6.crt1.x86_64, compiler gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), BIOS Version SE5C620.86B.01.01.0006.2207150335, Rocky Linux 8.6 (Green Obsidian), 2.3 GHz Base Frequency, 3.0 GHz All Core Frequency, https://github.com/deepmodeling/deepmd-kit, Tensorflow 2.9, Horovod 0.24.0, oneCCL-2021.5.2, Python 3.9, Single Instance Training: ppn=1, omp=2, intra=14, inter=14, bfloat16 precision in neural network; output to FP32, time to complete training=88 seconds.
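The 3x claim above follows directly from the two time-to-train figures in these footnotes:

    # Single-instance DeePMD training times from the two footnotes above.
    xeon_max_seconds, xeon_8380_seconds = 29, 88
    print(xeon_8380_seconds / xeon_max_seconds)     # ~3.0x faster on the Xeon CPU Max Series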

AI Accelerated HPC: LAMMPS + DeePMD

Intel® Xeon® CPU Max Series observes a LAMMPS + DeePMD simulation speedup of up to 135x compared with an equivalent LAMMPS (ReaxFF) simulation

Intel® Xeon® Platinum 8380 with traditional DDR memory observes a LAMMPS + DeePMD simulation speedup of up to 100x compared with an equivalent LAMMPS (ReaxFF) simulation

Intel® Xeon® CPU Max Series with BF16/AMX: Test by Intel® as of 10/12/2022. 1-node, Intel® Xeon® CPU Max Series, 56 cores, Total Memory 128 GB (8x16GB HBM2 3200MT/s [3200MT/s]), kernel 5.19.0-rc6.0712.intel_​next.1.x86_​64+server, compiler gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-13), BIOS version SE5C7411.86B.8424.D03.2208100444, CentOS Stream 8, 1.9 GHz Base Frequency, 2.4 GHz All-core Max Frequency, https://github.com/deepmodeling/deepmd-kit, Tensorflow 2.9, Horovod 0.24.0, oneCCL-2021.5.2, Python 3.9, LAMMPS 23-Jun2022, 2 nanosecond simulation of 4032 Si atoms: ppn=112, omp=1, time to complete DeePMD+LAMMPS simulation is 0.882 hours. For ReaxFF simulation, 2 nanosecond simulation of 4032 atoms: ppn=16, omp=4, time to complete simulation is 115 hours.

Intel® Xeon® Platinum 8380: Test by Intel as of 10/20/2022. 1-node, Intel® Xeon® Platinum 8380 processor, 40 cores, Total Memory 256 GB (16x16GB DDR4 3200 MT/s [3200MT/s]), kernel 4.18.0-372.26.1.el8_6.crt1.x86_64, compiler gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), BIOS Version SE5C620.86B.01.01.0006.2207150335, Rocky Linux 8.6 (Green Obsidian), 2.3 GHz Base Frequency, 3.0 GHz All Core Frequency, https://github.com/deepmodeling/deepmd-kit, Tensorflow 2.9, Horovod 0.24.0, oneCCL-2021.5.2, Python 3.9, LAMMPS 23-Jun2022, 2 nanosecond simulation of 4032 Si atoms: ppn=80, omp=1, time to complete DeePMD+LAMMPS simulation is 1.386 hours. For ReaxFF simulation, 2 nanosecond simulation of 4032 atoms: ppn=16, omp=4, time to complete simulation is 139 hours.
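As a worked example, the Xeon 8380 footnote reproduces its up-to-100x figure from the stated wall-clock times:

    # DeePMD+LAMMPS vs. ReaxFF time-to-solution on the Xeon 8380 configuration.
    reaxff_hours, deepmd_hours = 139, 1.386
    print(reaxff_hours / deepmd_hours)              # ~100x faster with the DeePMD+LAMMPS workflow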