Performance Index
Vision 2022
Number | Session | Speaker | Claim | Claim Details/Citation | Testing Date |
---|---|---|---|---|---|
1 | Day 1 Keynote | Eitan Medina | Gaudi2 BERT Phase-2 Training - sequences-per-second throughput: 2.8x relative to A100 (80GB); 3.3x relative to A100 (40GB); 7.7x relative to V100. | Gaudi2 sequences-per-second throughput on BERT Phase-2 Training: - A100-80GB: Measured by Habana on Azure instance Standard_ND96amsr_A100_v4 using single A100-80GB with TF docker 21.02-tf2-py3 from NGC (Phase-1: Seq len=128, BS=312, accu steps=1024; Phase-2: seq len=512, BS=40, accu steps=3072) - A100-40GB: Measured by Habana on DGX-A100 using single A100-40GB with TF docker 21.12-tf2-py3 from NGC (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=16, accu steps=2048) - V100-32GB: Measured by Habana on p3dn.24xlarge using single V100-32GB with TF docker 21.12-tf2-py3 from NGC (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=8, accu steps=4096) - Gaudi2: Measured by Habana on Gaudi2-HLS system using single Gaudi2 with SynapseAI TF docker 1.4.0-435 (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=16, accu steps=2048) Results may vary. | Apr-22 |
2 | Day 1 Keynote | Eitan Medina | Gaudi2 BERT Effective Throughput Combining Phase-1 and Phase-2 (per standard industry practice) - sequences-per-second: 2.x relative to A100 (80GB); 2.4x relative to A100 (40GB); 5.3x relative to V100. | Gaudi2 sequences-per-second on BERT Effective Throughput combining Phase-1 and Phase-2: - A100-80GB: Measured by Habana on Azure instance Standard_ND96amsr_A100_v4 using single A100-80GB with TF docker 21.02-tf2-py3 from NGC (Phase-1: Seq len=128, BS=312, accu steps=1024; Phase-2: seq len=512, BS=40, accu steps=3072) - A100-40GB: Measured by Habana on DGX-A100 using single A100-40GB with TF docker 21.12-tf2-py3 from NGC (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=16, accu steps=2048) - V100-32GB: Measured by Habana on p3dn.24xlarge using single V100-32GB with TF docker 21.12-tf2-py3 from NGC (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=8, accu steps=4096) - Gaudi2: Measured by Habana on Gaudi2-HLS system using single Gaudi2 with SynapseAI TF docker 1.4.0-435 (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=16, accu steps=2048) Results may vary. | Apr-22 |
3 | Day 1 Keynote | Eitan Medina | Enterprises will increasingly rely on deep learning; 2021-2026 projections indicate: data center accelerator market CAGR of 36.7%; 1/3 of servers shipped in 2026 will run DL training or inference; DL to account for the majority of cloud workloads; training applications to be the majority of server apps by 2026. | Source: https://www.businesswire.com/news/home/20210819005361/en/Global-Data-Center-Accelerator-Market-Forecast-to-2026-Artificial-Intelligence-to-Drive-the-Growth-of-Cloud-Data-Center-Market---ResearchAndMarkets.com | |
4 | Day 1 Keynote | Eitan Medina | "On our own models the increase in price performance met and even exceeded the published 40% mark." | Quote by Rand Chaim, Mobileye, ML Algorithm Engineer, based on Mobileye evaluation of Gaudi-based DL1; https://towardsdatascience.com/training-on-aws-with-habana-gaudi-3126e183048 | |
5 | Day 1 Keynote | Eitan Medina | Gaudi2 images-per-second throughput on ResNet-50: 1.9x relative to A100 (80GB); 2.0x relative to A100 (40GB); 4.1x relative to V100. | RESNET50 CLAIM: Sources for performance substantiation for ResNet-50 (note that the ResNet-50 model script is also run as a live demonstration to show the Gaudi2 performance, which conforms with the test configuration noted below): - A100-80GB: Measured by Habana on Azure instance Standard_ND96amsr_A100_v4 using single A100-80GB using TF docker 21.12-tf2-py3 from NGC (optimizer=sgd, BS=256) - A100-40GB: Measured by Habana on DGX-A100 using single A100-40GB using TF docker 21.12-tf2-py3 from NGC (optimizer=sgd, BS=256) - V100-32GB: Measured by Habana on p3dn.24xlarge using single V100-32GB using TF docker 21.12-tf2-py3 from NGC (optimizer=sgd, BS=256) - Gaudi2: Measured by Habana on Gaudi2-HLS system using single Gaudi2 using SynapseAI TF docker 1.4.0-435 (BS=256) Results may vary. | Apr-22 |
6 | Day 1 Keynote | Eitan Medina | Customer savings with Gaudi-based Amazon DL1 instances. ResNet-50 $/image throughput cost: DL1 - 46% lower than A100-based P4d; DL1 - 60% lower than V100-based P3. BERT-Large Pre-Training Phase-1 $/sequence throughput cost: DL1 - 31% lower than A100-based P4d; DL1 - 54% lower than V100-based P3. BERT-Large Pre-Training Phase-2 $/sequence throughput cost: DL1 - 57% lower than A100-based P4d; DL1 - 75% lower than V100-based P3. | Cost savings based on Amazon EC2 On-Demand pricing for P3, P4d and DL1 instances respectively. Performance data was collected and measured using the following resources. Results may vary. (A worked sketch of the $/throughput cost arithmetic follows this table.) Habana BERT-Large Model: https://github.com/HabanaAI/Model-References/tree/master/TensorFlow/nlp/bert Habana ResNet50 Model: https://github.com/HabanaAI/Model-References/tree/master/TensorFlow/computer_vision/Resnets/resnet_keras Habana SynapseAI Container: https://vault.habana.ai/ui/repos/tree/General/gaudi-docker/1.2.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.7.0 Habana Gaudi Performance: https://developer.habana.ai/resources/habana-training-models/ A100 / V100 Performance: https://ngc.nvidia.com/catalog/resources/nvidia:bert_for_tensorflow/performance, https://ngc.nvidia.com/catalog/resources/nvidia:resnet_50_v1_5_for_tensorflow/performance, results published for DGX A100-40G and DGX V100-32G | Sep-21 |
7 | Day 1 Keynote | Eitan Medina | While Gaudi2 is implemented in the same 7nm process as the A100, it delivers twice the throughput for both ResNet50 and BERT models, the two most popular vision and language models. | RESNET50 CLAIM: Sources for performance substantiation for ResNet-50 (note that the ResNet-50 model script is also run as a live demonstration to show the Gaudi2 performance, which conforms with the test configuration noted below): - A100-80GB: Measured by Habana on Azure instance Standard_ND96amsr_A100_v4 using single A100-80GB using TF docker 21.12-tf2-py3 from NGC (optimizer=sgd, BS=256) - A100-40GB: Measured by Habana on DGX-A100 using single A100-40GB using TF docker 21.12-tf2-py3 from NGC (optimizer=sgd, BS=256) - V100-32GB: Measured by Habana on p3dn.24xlarge using single V100-32GB using TF docker 21.12-tf2-py3 from NGC (optimizer=sgd, BS=256) - Gaudi2: Measured by Habana on Gaudi2-HLS system using single Gaudi2 using SynapseAI TF docker 1.4.0-435 (BS=256) Results may vary. BERT CLAIM: Effective throughput combining Phase-1 and Phase-2 - A100-80GB: Measured by Habana on Azure instance Standard_ND96amsr_A100_v4 using single A100-80GB with TF docker 21.02-tf2-py3 from NGC (Phase-1: Seq len=128, BS=312, accu steps=1024; Phase-2: seq len=512, BS=40, accu steps=3072) - A100-40GB: Measured by Habana on DGX-A100 using single A100-40GB with TF docker 21.12-tf2-py3 from NGC (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=16, accu steps=2048) - V100-32GB: Measured by Habana on p3dn.24xlarge using single V100-32GB with TF docker 21.12-tf2-py3 from NGC (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=8, accu steps=4096) - Gaudi2: Measured by Habana on Gaudi2-HLS system using single Gaudi2 with SynapseAI TF docker 1.4.0-435 (Phase-1: Seq len=128, BS=64, accu steps=1024; Phase-2: seq len=512, BS=16, accu steps=2048) Results may vary. | Apr-22 |
8 | Day 1 Keynote | Michelle Johnston Holthaus | There are nearly 140M commercial devices being used globally that are more than four years old. | Source: Internal, Intel | |
9 | Day 1 Keynote | Michelle Johnston Holthaus | AMT lets our IT team manage each of our system's mobile cart devices, resulting in improved HW uptime, increased employee & patient satisfaction, and reduced troubleshooting times by 50%. | Source: Intermountain Healthcare | |
10 | Day 1 Keynote | Michelle Johnston Holthaus | 12th Gen HX delivers unrivaled mobile performance. | Based on unique features and estimates derived from SPECworkstation™ v3.1 CPU Scores: media and entertainment, product development, life sciences, financial services and energy measurements on 12th Gen Intel Core i9-12900HX with NVIDIA RTX 3080 Ti vs 11th Gen Intel Core i9-11980HK with RTX 3080 and AMD Ryzen R9 6900HX with RTX 3060. OS: Win 11 | |
11 | Day 1 Keynote | Michelle Johnston Holthaus | 12th Gen HX is the world's best mobile workstation platform. | Source: Intel. Based on superior performance of 12th Gen Intel Core i9-12900HX against Intel Core i9-11980HK, Intel Core i9-12900HK, AMD Ryzen 9 5900HX, and Apple M1 Max. Intel processor performance is estimated based on measurements with Intel pre-production platforms. AMD processor performance is estimated based on measurements on an ASUS ROG G713 Ryzen R9-6900HX with RTX 3060. The metric used is the geometric mean of C/C++ integer benchmarks in SPEC*int_rate_base2017 IC 2021.2 (1-copy) and SPEC*int_rate_base2017 IC 2021.2 (n-copy). (A short sketch of the geometric-mean aggregation follows this table.) For workload and configuration details, see www.intel.com/PerformanceIndex. Results may vary. | |
12 | Day 1 Keynote | Michelle Johnston Holthaus | • Telehealth use by physicians jumped from 25 percent to almost 80 percent • Remote patient monitoring jumped as well, with almost twice as many physicians using it • Finally, some 26.2 percent of healthcare professionals worked in a practice that used videoconferencing to consult with colleagues in 2020, up from 11.6 percent in 2018 | Source: https://www.ama-assn.org/system/files/2020-prp-telehealth.pdf | |
13 | Day 1 Keynote | Sandra Rivera | StubHub's database deployment became more economical running on our latest Xeon processors. They saw a 64% performance increase running database workloads with lower licensing costs. | Source: StubHub internal measurements. https://www.intel.com/content/www/us/en/customer-spotlight/stories/salesforce-customer-story.html | |
14 | Day 1 Keynote | Sandra Rivera | 3rd Gen Xeon Scalable processors provide up to a 53% performance gain on raw input, compared to the previous generation. | Results provided by Salesforce and were based on its internal tests, as of November 2021. Contact Salesforce for further details. https://www.intel.com/content/www/us/en/customer-spotlight/stories/salesforce-customer-story.html | |
15 | Day 1 Keynote | Sandra Rivera | On average, customers have realized 60% performance improvements and 20-30% cost reductions after deploying Granulate's real-time optimization software | Granulate gCenter data taken February 22, 2022. For more information contact Granulate. Your costs and results may vary. | |
16 | Day 1 Keynote | Sandra Rivera | Deploying Granulate software allowed Nylas to cut 35% off the total cost of their compute spend. | Results provided by Nylas. Contact Nylas for further details. (we have a PMR in place with Nylas for this claim.) | |
17 | Day 1 Keynote | Sandra Rivera | By 2025, Gartner predicts that more than 50% of enterprise-generated data will be outside of central data centers. | Source: Gartner, Predicts 2022: The Distributed Enterprise Drives Computing to the Edge. October 20, 2021 | |
18 | Day 1 Keynote | Sandra Rivera | Gaudi is deployed today in the AWS EC2 cloud and provides up to 40% better price/performance as compared to leading Nvidia A100-based solutions. | The price/performance claim is made by AWS and based on AWS's internal testing. Habana Labs does not control or audit third-party data. More information can be found at: habana.ai/AWS-launches-ec2-dl1-instances/. | |
19 | Day 1 Keynote | Sandra Rivera | We recently worked with Salesforce to optimize one of their proxy workloads to our hardware. This optimization work delivered an overall gen-on-gen throughput gain of 53%. Salesforce was also able to cut platform qualification time from 1 year to 3 months, which helped them accelerate adoption of their services both on-prem and in the public cloud. | This claim is made by Salesforce and based on Salesforce's internal testing. Intel does not control or audit third-party data. | |
20 | Day 1 Keynote | Nick McKeown | 300+ market ready solutions. Since we launched these solutions about four years ago, we have seen more than 45,000 deployments across 160 countries. And what is most telling is that more than 20,000 of those deployments took place in 2021 alone. | Solutions are a part of Intel's IoT Market Ready Solutions & IoT RFP Ready Kits programs, submitted by Intel Partners and approved by our partner program team. Number of deployments and countries of deployment are reported directly from our partners through an online reporting tool in a quarterly process. | |
21 | Day 1 Keynote | Nick McKeown | 87% of consumers prefer to shop in stores with touchless or self-checkout options | This came directly from Nourish and Bloom - please see statement at about 00:50 in their founder's video here - https://www.youtube.com/watch?v=M0_56KApryI | |
22 | Day 1 Keynote | Christoph Schell | The amount of data generated really makes you wonder, how do we secure all this data and make sure confidential information is kept private? We recently conducted a study exploring how organizations approach security innovation in an increasingly digital world to stay ahead of the evolving threat landscape. Based on what we heard, deploying hardware-assisted security solutions is a critical part of building a robust security strategy. • 36% of respondents have adopted hardware-assisted security solutions and 47% of respondents say their organizations will adopt these solutions in the next six months (24%) or 12 months (23%). o There is a growing awareness that hardware-assisted security capabilities are critical to a robust security strategy. • Of those same 36% of respondents using hardware-assisted security solutions, 85% say hardware and/or firmware-based security is a high or very high priority in their organization • 64% say it is important for a vendor to offer both hardware- and software-assisted security capabilities. | Intel sponsored study: Newsbyte https://www.intel.com/content/www/us/en/newsroom/news/study-secure-systems-start-hardware.html#gs.z99xy4 | 4/12/2022 |
23 | Day 1 Keynote | Christoph Schell | Last year, according to IBM Security, the average total cost of a data breach to a corporation increased by nearly 10% YoY to $4.2M. This was the highest average total cost in the history of IBM's report. | Cost of a Data Breach Report 2021: https://www.ibm.com/downloads/cas/OJDVQGRY | 7/28/2021 |
24 | Day 1 Keynote | Raja Koduri | Video accounts for over 80% of the global internet traffic and is not going to slow down. This is driving significant demand for compute resources for graphics workloads in the data center. | https://www.cisco.com/c/dam/m/en_us/solutions/service-provider/vni-forecast-highlights/pdf/Global_2021_Forecast_Highlights.pdf | |
25 | Day 1 Keynote | Rick Stevens (Argonne) | The high memory bandwidth that you've integrated into Sapphire Rapids, and the acceleration from Ponte Vecchio will be a game changer for scientific applications - like 10x or more speed up over current supercomputers. | This claim is made by Argonne National Laboratory and is based on Argonne's internal testing. Intel does not control or audit third-party data. | |
26 | Day 2 Keynote | Greg Lavender | BeeKeeper AI is using Intel technologies to accelerate healthcare AI development 30-40%. | BeeKeeper AI uses Intel SGX hardware-based security capabilities and Microsoft Azure's confidential computing infrastructure to provide a zero-trust platform. It enables an AI algorithm to compute against multiple real-world clinical data sets without compromising the privacy of the data or the intellectual property of the algorithm model. This is accelerating healthcare AI development and deployment innovation by more than 30-40% when compared to the current method. These statements were provided by BeeKeeper AI. Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy. | |
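The row 6 cost-savings figures reduce to one piece of arithmetic: instance dollars per hour divided by items processed per hour, compared across instance types. Below is a minimal sketch of that calculation; the prices and throughputs are illustrative placeholders, not the measured values from the cited Amazon and Habana/NGC sources.

```python
# Sketch of the $/throughput cost math behind the DL1 vs. P4d/P3 savings in
# row 6. All prices and throughputs below are ILLUSTRATIVE PLACEHOLDERS, not
# the measured values; substitute the EC2 On-Demand rates and the throughputs
# from the cited Habana/NGC sources to reproduce the published percentages.

def cost_per_million_items(price_per_hour: float, items_per_second: float) -> float:
    """Dollars to process one million images (or sequences) at a sustained rate."""
    items_per_hour = items_per_second * 3600
    return price_per_hour / items_per_hour * 1_000_000

def savings_vs(baseline_cost: float, candidate_cost: float) -> float:
    """Percent cost reduction of the candidate instance relative to the baseline."""
    return (1 - candidate_cost / baseline_cost) * 100

# Hypothetical hourly prices ($/hr) and ResNet-50 throughputs (images/sec):
dl1 = cost_per_million_items(price_per_hour=13.0, items_per_second=12_000)
p4d = cost_per_million_items(price_per_hour=33.0, items_per_second=16_000)

print(f"DL1: ${dl1:.2f} per 1M images")
print(f"P4d: ${p4d:.2f} per 1M images")
print(f"DL1 saves {savings_vs(p4d, dl1):.0f}% vs. P4d")
```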
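Row 11 scores the platforms on the geometric mean of SPECint_rate_base2017 sub-benchmark ratios. For reference, a minimal sketch of that aggregation; the per-benchmark ratios below are invented for illustration and are not SPEC results.

```python
import math

def geometric_mean(scores):
    """Geometric mean, the aggregation SPEC CPU 2017 uses across sub-benchmarks."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

# Made-up per-benchmark ratios purely for illustration; a real run reports one
# ratio per SPECint workload (500.perlbench_r, 502.gcc_r, ...), in both
# 1-copy and n-copy modes, and the platform score is the geometric mean.
one_copy = [9.8, 11.2, 10.5, 12.0, 9.1]
n_copy = [310.0, 295.0, 340.0, 305.0, 288.0]

print(f"1-copy geomean: {geometric_mean(one_copy):.2f}")
print(f"n-copy geomean: {geometric_mean(n_copy):.2f}")
```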
Number | Session | Speaker | Claim | Claim Details/Citation |
---|---|---|---|---|
1 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Jennifer Talerico | 3rd party stat: By 2025, an estimated 40% of employees will work remotely. | https://www.consultancy.eu/news/5273/research-40-of-employees-will-work-from-home-by-2025 |
2 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Jennifer Talerico | 58% of the workforce now needs new skill sets in order to do their jobs successfully. | https://www.gartner.com/en/newsroom/press-releases/2021-02-03-gartner-hr-research-finds-fifty-eight-percent-of-the-workforce-will-need-new-skill-sets-to-do-their-jobs-successfully |
3 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Jennifer Talerico | In addition, our real-world testing gave us the data we needed to justify not only refreshing sooner, but also increasing the computing capability given the shift to the latest OS and modern software applications...This data showed a faster refresh to a higher performing PC can pay for itself in less than a year. | Source: Internal, Intel |
4 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Chris Walker | Built on our Intel 7 process node, these processors are Intel's first high performance hybrid design featuring TWO core architectures: performance core and efficient core | Performance hybrid architecture combines two new core microarchitectures, Performance-cores (P-cores) and Efficient-cores (E-cores), on a single processor die. Select 12th Gen Intel® Core™ processors (certain 12th Gen Intel Core i5 processors and lower) do not have performance hybrid architecture, only P-cores. |
5 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Chris Walker | With Intel Thread Director built in, we can ensure that the right workloads move to the right cores to help deliver the best PC experiences possible. | Built into the hardware, Intel® Thread Director is provided only in performance hybrid architecture configurations of 12th Gen Intel® Core™ processors; OS enablement is required. Available features and functionality vary by OS. |
6 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Chris Walker | Overclocking | Overclocking may void warranty or affect system health. Learn more at intel.com/overclocking. Results may vary. |
7 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Chris Walker | With today's launch, we are delivering the World's Best Mobile Workstation Platform… | The 12th Generation Intel® Core™ i9-12900HX is the world's best mobile workstation platform based on unique features, including: Broad memory support • First to industry to enable DDR5-4800, DDR4-3200, LPDDR5-5200, LPDDR4x-4267 Best in class connectivity - Wi-Fi 6E (Gig+), Thunderbolt 4 • Intel® Killer™ Wi-Fi 6E: Low Latency Gameplay • Intel® Killer™ Wi-Fi 6E (Gig+): Intel® Double Connect • Thunderbolt™ 4: 40Gbps • Thunderbolt™ 4: Mandatory Certification Industry-pioneering PCIe Gen 4 (best in class) and superior CPU performance of 12th Gen Intel Core i9-12900HX with RTX 3080 Ti vs 11th Gen Intel Core i9-11980HK with RTX 3080 and vs AMD Ryzen R9-6900HX with RTX 3060. Performance results are based on testing as of 4/28/2022. Full configuration details available at www.intel.com/PerformanceIndex. |
8 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Chris Walker | Unrivaled mobile performance. (12th Gen HX) | Source: Intel. Based on unique features and estimates derived from SPECworkstation™ v3.1 CPU Scores: media and entertainment, product development, life sciences, financial services and energy measurements on 12th Gen Intel Core i9-12900HX with RTX 3080 Ti vs 11th Gen Intel Core i9-11980HK with RTX 3080, vs 12th Gen Intel Core i9-12900HK with RTX 3080 Ti and AMD Ryzen R9 6900HX with RTX 3060. OS: Win 11. For all workload and configuration details, see www.intel.com/PerformanceIndex. Results may vary. |
9 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Stephanie Hallford | Intel brings together security, manageability, and performance into every aspect of the vPro portfolio. | As measured by each Intel vPro platform's tailored combination of performance, security, manageability, and stability solutions designed, integrated, and fine-tuned for particular business needs. All Intel vPro versions feature an eligible high performing Intel® Core™ processor, supported operating system, Intel LAN and/or WLAN silicon, firmware enhancements, and other hardware and software necessary to deliver the manageability use cases, security features, system performance and stability that define the platform. By validating business PCs against a rigorous specification defined for each product version, Intel vPro delivers tangible advantages for any business user. See www.intel.com/PerformanceIndex (platforms) for details. |
10 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Stephanie Hallford | Featuring Intel Threat Detection, we are the first and only business PC with hardware-based ransomware detection. | The Intel vPro platform delivers the first and only silicon-enabled AI threat detection to help stop ransomware and cryptojacking attacks for Windows-based systems. Intel TDT Anomalous Behavior Detection (ABD) is a hardware-based control flow monitoring and anomaly detection solution able to monitor business apps for early indicators of compromise, leveraging the Intel CPU to build dynamic AI models of "good" application behavior. See www.intel.com/PerformanceIndex (platforms) for details. No product or component can be absolutely secure. |
11 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Stephanie Hallford | In fact, in a survey of businesses that have deployed Intel vPro, they report close to a 200% return on investment | A Forrester Total Economic Impact™ Study Commissioned By Intel, January 2021 https://tools.totaleconomicimpact.com/go/intel/vproplatform/ From the information provided in the interviews and survey, Forrester constructed a Total Economic Impact™ framework for those organizations considering an investment in the Intel vPro® platform. The objective of the framework is to identify the cost, benefit, flexibility, and risk factors that affect the investment decision. Forrester took a multistep approach to evaluate the impact that the Intel vPro platform can have on an organization. |
12 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Stephanie Hallford | The 12th Gen Intel Core i9-12900 desktop processor provides up to 23% faster application performance than the competition when using Microsoft Excel during a Zoom video conference call, and up to 46% faster with Power BI while on a Zoom call. | As measured by Collaboration with Excel workflow as of Feb. 9, 2022. For workloads and configurations visit www.intel.com/PerformanceIndex. Results may vary. As measured by Collaboration with Power BI workflow as of Feb. 9, 2022. For workloads and configurations visit www.intel.com/PerformanceIndex. Results may vary. |
13 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Stephanie Hallford | I encourage you to review our CoalFire White Paper. CoalFire White Paper Link | |
14 | Day 2 Business Insights - NEX | Sachin Katti | NOTE: THIS CLAIM IS MADE BY AN INTEL PARTNER WHO IS PROVIDING A TESTIMONIAL VIDEO AS PART OF THIS SESSION: Speaker, Anand Oswal, Palo Alto Networks: We've recently announced PAN-OS 10.2 Nebula - the industry's first inline deep learning protection for network security, providing six times faster prevention and 48% more detection of evasive threats, by using AI software optimization and libraries Intel developed to utilize readily available HW acceleration for AI in Intel Xeon Scalable processors | Palo Alto Networks has provided this detail in support of the claim: Short explanation: These performance improvements are a comparison of their current 10.2 software product vs. the previous 10.1 release of the same product. Detailed technical explanation: "6x is compared to the default FP32 (32-bit floating point) implementation for the inference. This 6x is achieved by quantization into an INT8 model with Intel Neural Compressor and the utilization of VNNI instruction sets. This 6x is happening on both Cascade Lake CPUs and Ice Lake CPUs." The 48% is more detection that is introduced in the 10.2 software, which wasn't present in the prior SW. Note: ATP (Advanced Threat Prevention) was not called "Advanced" prior to 10.2. So essentially, 10.1 was using FP32 instructions and, due to Intel's tools & instructions, 10.2 now uses INT8 instructions leveraging VNNI, which delivers the 6x performance improvement; the 48% is a detection gain. Bottom line: everything is compared to the previous SW of the 10.1 release. (A minimal sketch of the INT8 quantization arithmetic follows this table.) |
15 | BI_CCG_001 - Day 1 Business Insight: Modern Client Computing | Jennifer Talerico | The 12th Gen Intel® Core™ Mobile HX workstation outperformed the AMD-based desktop system in InvMark. The 12th Gen Intel® Core™ HX-based workstation scored 66984 on InvMark while the best AMD desktop in the leaderboards was 59687. | Based on InvMark benchmark within Autodesk Inventor; AMD score is available online at https://invmark.cadac.com/#/, Intel score was gathered on MSI laptop. Intel benchmark was completed on 5/09/22; AMD score was captured online on 05/09/22 based on report to database from 04/07/22. |
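Row 14 above attributes the 6x inference speedup to quantizing the FP32 model to INT8 with Intel Neural Compressor and executing it with VNNI instruction sets. The sketch below is not Palo Alto Networks' or Intel's actual pipeline; it is a minimal NumPy illustration of symmetric INT8 quantization with int32 accumulation, the arithmetic pattern VNNI accelerates in hardware.

```python
import numpy as np

def quantize_symmetric(x: np.ndarray):
    """Map an FP32 tensor to int8 with a single per-tensor scale (symmetric scheme)."""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
activations = rng.standard_normal((4, 64)).astype(np.float32)
weights = rng.standard_normal((64, 8)).astype(np.float32)

qa, sa = quantize_symmetric(activations)
qw, sw = quantize_symmetric(weights)

# Integer matrix multiply with int32 accumulation (the pattern VNNI fuses in
# hardware), then rescale the int32 result back to the FP32 domain.
int8_result = (qa.astype(np.int32) @ qw.astype(np.int32)) * (sa * sw)
fp32_result = activations @ weights

print("max abs error vs. FP32:", np.abs(int8_result - fp32_result).max())
```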
Number | Session | Speaker | Claim | Claim Details/Citation | Testing Date |
---|---|---|---|---|---|
1 | AIML001 - Accelerating Risk Calculations | Parviz Peiravi, Mahesh Bhat | Performance increase of more than 1000X for a solution based on 2nd Generation Intel® Xeon® Scalable processors using AVX-512 acceleration provided by the Matlogica library. | https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xva-pricing-application-financial-services-white-papers.pdf | Claims via Matlogica |
2 | AIML001 - Accelerating Risk Calculations | Parviz Peiravi, Mahesh Bhat | Intel Xeon Platinum 8380 is 1.72x better performance than AMD EPYC 7763 for XVA pricing ; Intel Xeon Platinum 8380 is 1.67x better performance than AMD EPYC 7763 for XVA Pricing & Greeks | Slide 12: 1.72x average performance gain - Intel Xeon Platinum 8380 vs AMD EPYC 7763: Intel® Xeon® Platinum 8380 40 cores @ 2.30GHz Mem: 16x SK Hynix 1.2v 32GB DDR4 @ 3200 MT/sec BIOS: SE5C6200.86B.0020.P23.2103261309 (Turbo ON, HT ON) OS: CentOS Linux release 8.4.2105 | 10/14/2021 |
3 | AIML001 - Accelerating Risk Calculations | Parviz Peiravi, Mahesh Bhat | Intel Xeon Platinum 8380 is 1.72x better performance than AMD EPYC 7763 for XVA pricing ; Intel Xeon Platinum 8380 is 1.67x better performance than AMD EPYC 7763 for XVA Pricing & Greeks | 4.18.0-240.22.1.el8_3.crt4.x86_64; AMD EPYC™ 7763 64 cores @ 2.45GHz Mem: 16x SK Hynix 1.2v 32GB DDR4 @ 3200 MT/sec BIOS: GIGABYTE M06 Turbo ON, HT ON, Numa Node Per Socket =4 | 10/14/2021 |
4 | AIML001 - Accelerating Risk Calculations | Parviz Peiravi, Mahesh Bhat | Intel Xeon Platinum 8380 is 1.72x better performance than AMD EPYC 7763 for XVA pricing ; Intel Xeon Platinum 8380 is 1.67x better performance than AMD EPYC 7763 for XVA Pricing & Greeks | OS: CentOS Linux release 8.4.2105 4.18.0-305.19.1.el8_4.x86_64 | 10/14/2021 |
5 | AIML001 - Accelerating Risk Calculations | Parviz Peiravi, Mahesh Bhat | Intel Xeon Platinum 8380 is 1.72x better performance than AMD EPYC 7763 for XVA pricing ; Intel Xeon Platinum 8380 is 1.67x better performance than AMD EPYC 7763 for XVA Pricing & Greeks | ICC: 2021.4.0 20210910 | 10/14/2021 |
6 | AIML001 - Accelerating Risk Calculations | Parviz Peiravi, Mahesh Bhat | Intel Xeon Platinum 8380 is 1.72x better performance than AMD EPYC 7763 for XVA pricing ; Intel Xeon Platinum 8380 is 1.67x better performance than AMD EPYC 7763 for XVA Pricing & Greeks | GCC: 8.4.1 20200928 (Red Hat 8.4.1-1) | 03/10/2022 |
7 | AIML001 - Accelerating Risk Calculations | Parviz Peiravi, Mahesh Bhat | Intel Xeon Platinum 8380 is 1.72x better performance than AMD EPYC 7763 for XVA pricing ; Intel Xeon Platinum 8380 is 1.67x better performance than AMD EPYC 7763 for XVA Pricing & Greeks | AADC version: AADC-demo-2021-10-01-cd0737f-M9s6 | 03/10/2022 |
8 | AIML001 - Accelerating Risk Calculations | Parviz Peiravi, Mahesh Bhat | Intel Xeon Platinum 8380 is 1.72x better performance than AMD EPYC 7763 for XVA pricing ; Intel Xeon Platinum 8380 is 1.67x better performance than AMD EPYC 7763 for XVA Pricing & Greeks | Threads/core: 2 Threads/Core for XVA-(Pricing only) & XVA-(Pricing + Greeks) Calculation | 03/10/2022 |
9 | AIML001 - Accelerating Risk Calculations | Parviz Peiravi, Mahesh Bhat | Performance of the Quantifi conventional model vs AI model based on 3rd generation Intel® Xeon® Scalable processor with multiple batch sizes delivers up to 700x speed up | https://www.intel.com/content/dam/www/central-libraries/us/en/documents/quantifi-derivatives-pricing-white-paper-0621.pdf | 3/4/2021 |
10 | AIML001 - Accelerating Risk Calculations | Parviz Peiravi, Mahesh Bhat | Intel Xeon delivers up to 95% increased performance over AMD EPYC 7763 for Quantifi Credit Option Pricing Throughput | Slide 16: Up to 1.95x average performance gain - Intel Xeon Platinum 8380 vs. AMD EPYC 7763: 2S Intel Xeon 8380 CPU @ 2.30 GHz (40 cores/processor) Mem: 512GB DDR4-3200 BIOS: SE5C6200.86B.0022.D08.2103221623 (Turbo ON, HT ON) OS: CentOS Linux Version 8; 2S AMD EPYC 7763 @ 2.45 GHz (64 cores/processor) Mem: 512GB DDR4-3200 BIOS: M06 OS: CentOS Linux Version 8 Libraries: Python 3.8.11 Intel-tensorflow 2.6.0 Threads/core: 2 (all platforms) | 10/12/2021 |
11 | AIML001 - Accelerating Risk Calculations | Parviz Peiravi, Mahesh Bhat | I/O acceleration can deliver up to 20% faster calculations for XVA | https://www.quantifisolutions.com/accelerating-the-performance-of-large-scale-xva-workloads/ | 3/10/2022 |
12 | AIML001 - Accelerating Risk Calculations | Parviz Peiravi, Mahesh Bhat | 143% higher throughput than a solution based on NVMe SSDs | https://www.quantifisolutions.com/accelerating-the-performance-of-large-scale-xva-workloads/ | 3/10/2022 |
13 | AIML002 DCAI Advancing Deep Learning in the Data Center - Habana | Sachin Katti | 3rd Party Claim made in a Video provided by Palo Alto Networks which says: "And we've recently announced PAN-OS 10.2 Nebula - the Industry's first inline deep learning protection for network security, providing six times faster prevention and 48% more detection of evasive threats, by using AI software optimization and libraries Intel developed to utilize readily available HW acceleration for AI in Intel Xeon Scalable processors." | Source: Palo Alto Networks disclosure as part of a blog they posted. https://www.paloaltonetworks.com/blog/2022/03/network-security-innovation-and-prevention/ | 3/10/2022 |
14 | DTCC010: Data Center of the Future | Jen Huffstetler | 29 federated international medical centers. 80K brain tumor diagnoses each year worldwide. 99% accuracy of model trained for brain tumor detection | Venture Beat: Intel partners with Penn Medicine to develop brain tumor classifier | 11-May-20 |
15 | DTCC010: Data Center of the Future | Jen Huffstetler | Xeon continues to deliver big generational gains for healthcare workloads: 57% for NAMD vs previous gen; 60% for GROMACS vs previous gen; 64% for LAMMPS vs previous gen; 61% for RELION vs previous gen | See [108] at www.intel.com/3gen-xeon-config. Results may vary. | 20-Feb-21 |
16 | DTCC010: Data Center of the Future | Jen Huffstetler | 66% higher AI inference performance | See [122] at www.intel.com/3gen-xeon-config. Results may vary. | |
17 | DTCC010: Data Center of the Future | Jen Huffstetler | Up to 50% reduction in CAPEX build costs. Up to 95% reduction in cooling OPEX. Up to 10x increase in computing density with liquid immersion cooling | Source: Submer. https://submer.com/business-cases/ | 25-Mar-22 |
18 | DTTC003 Maximizing Performance, Scalability and Operational Efficiency with Kubernetes and Splunk SmartStore | Murali Madhanagopal, Sr. Architect | Up to 2.95X higher Splunk indexing performance (886 MBps) scaling containers and up to 4.4X better Splunk search performance (400 searches) scaling containers. | See the bare metal and Kubernetes container configurations in rows 19 and 20 below. | 03/10/2022 |
19 | Bare metal Config : 9-nodes, 2x Intel Xeon Platinum 8360Y processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200[3200]) total DDR4 memory, HT on, Turbo on, CentOS 8.2.2004, 1x Intel S4610 SSD 960GB, Intel P4510 SSD 4.0TB, 1x P5800X SSD 1.6TB, 1x 25GbE Intel Ethernet adapter E810, Splunk Enterprise v8.2.0, Confluent Platform v7.0.1, Splunk 5 indexers and 1 node hosting 3 search heads | ||||
20 | Kubernetes Container Config 2: 9-nodes, 2x Intel Xeon Platinum 8360Y processor, 512 GB DDR4 memory, HT on, Turbo on, CentOS 8.2.2004, 1x Intel S4610 SSD 960 GB, Intel P4510 SSD 4.0TB, 1x P5800X SSD 1.6TB , 1x 25GbE Intel Ethernet adapter E810, Splunk Enterprise v8.2.0, Confluent Platform v7.0.1, Confluent for Kubernetes 2.2.0, Splunk Operator for Kubernetes v1.0.1, Kubernetes v1.23.0, Splunk Container Indexers: 35; 24 CPU, 48 GB memory each, Splunk Container Search Heads: 9; 24 CPU, 48GB memory, 300GB SSD, Number of Kafka Container Brokers: 18, no resource limits, # Splunk Ingestion Pipelines: 8, Object Store: Pure Storage FlashBlade;search index="*" earliest=-660s latest=-600s | stats count by splunk_server (in cache search) ; 75% searches; search index="*" earliest=-21660s latest=-21600s | stats count by splunk_server (from Pure Storage s3) - 25% of searches | ||||
21 | DTTC003 Maximizing Performance, Scalability and Operational Efficiency with Kubernetes and Splunk SmartStore | Murali Madhanagopal, Sr. Architect | A super-sparse search downloading 376 GB in only 84 seconds | Kubernetes Container Config 2: 9-nodes, 2x Intel Xeon Platinum 8360Y processor, 512 GB DDR4 memory, HT on, Turbo on, CentOS 8.2.2004, 1x Intel S4610 SSD 960 GB, Intel P4510 SSD 4.0TB, 1x P5800X SSD 1.6TB , 1x 25GbE Intel Ethernet adapter E810, Splunk Enterprise v8.2.0, Confluent Platform v7.0.1, Confluent for Kubernetes 2.2.0, Splunk Operator for Kubernetes v1.0.1, Kubernetes v1.23.0, Splunk Container Indexers: 35; 24 CPU, 48 GB memory each, Splunk Container Search Heads: 9; 24 CPU, 48GB memory, 300GB SSD, Number of Kafka Container Brokers: 18, no resource limits, # Splunk Ingestion Pipelines: 8, Object Store: Pure Storage FlashBlade;search index="*" earliest=-660s latest=-600s | stats count by splunk_server (in cache search) ; 75% searches; search index="*" earliest=-21660s latest=-21600s | stats count by splunk_server (from Pure Storage s3) - 25% of searches | 03/10/2022 |
22 | DTCC002 - Workload Placement | Kevin Johnson, Christine McMonigal | Up to 11X Higher batch AI inference perf on ResNet50 using TensorFlow and Intel DL Boost | https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [118] | 2/17/2021 |
23 | DTCC002 - Workload Placement | Kevin Johnson, Christine McMonigal | Up to 1.65X Higher performance on CloudXPRT cloud data analytics usage | https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [99] | 2/4/2021 |
24 | DTCC002 - Workload Placement | Kevin Johnson, Christine McMonigal | Up to 1.64X Higher MySQL transactions per minute | https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [81] | 2/5/2021 |
25 | DTCC002 - Workload Placement | Kevin Johnson, Christine McMonigal | Up to 1.48X Higher secure requests to content management systems (Wordpress with HTTPS) | https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [97] | 3/15/2021 |
26 | DTCC002 - Workload Placement | Kevin Johnson, Christine McMonigal | Up to 5.1X Higher Splunk search performance scaling containers | https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [110] | 4/16/2021 |
27 | DTCC002 - Workload Placement | Kevin Johnson, Christine McMonigal | Up to 4.2X Higher NGINX web server connections with Intel Crypto Acceleration | https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [90] | 1/17/2021 |
28 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | Up to 11X Higher batch AI inference perf on ResNet50 using TensorFlow and Intel DL Boost | https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [118] | 2/17/2021 |
29 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | Up to 1.65X Higher performance on CloudXPRT cloud data analytics usage | https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [99] | 2/4/2021 |
30 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | Up to 1.64X Higher MySQL transactions per minute | https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [81] | 2/5/2021 |
31 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | Up to 1.48X Higher secure requests to content management systems (Wordpress with HTTPS) | https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [97] | 3/15/2021 |
32 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | Up to 5.1X Higher Splunk search performance scaling containers | https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [110] | 4/16/2021 |
33 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | Up to 4.2X Higher NGINX web server connections with Intel Crypto Acceleration | https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [90] | 1/17/2021 |
34 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | Intel-Optimized Open-Source Software Business Value | https://www.intel.com/content/dam/www/central-libraries/us/en/documents/idc-business-value-of-optimized-software.pdf | |
35 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | Open Source at Intel | https://www.linuxfoundation.org/wp-content/uploads/2020_kernel_history_report_082720.pdf and https://download.intel.com/newsroom/2021/client-computing/OSS-Fact-sheet.pdf | |
36 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | 22-44% perf gains out of the box | Wordpress gen/gen gains per vCPU with crypto acceleration across a range of instance sizes: Wordpress v5.2 measured by Intel, Sept 23, 2021 using 3rd Gen Intel Xeon Scalable processor-based M6i (us-east-2 region) instances comparing to 2nd Gen Intel Xeon Scalable processor-based M5 (us-east-2 region). Common config: 16GB, 64GB, 256GB across 4, 16, 64 vCPU instance sizes. Amazon Elastic Block Store 8GB. Ubuntu 20.04.3 LTS, Kernel: 5.11.0-1017-aws, GCC 9.3.0 compiler. Other software: MariaDB v10.3.31, Nginx v1.18.0, PHP v7.3.30-1. Curve exchange key: TLSv1.3-secp384r1-secp384r1; TLS_AES_256_GCM_SHA384. | 9/23/2021 |
37 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | TensorFlow with oneDNN Default! 45-74% perf gains | Tensorflow gen/gen gains using oneDNN: BERT-Large SQuAD: 1.45x higher INT8 real-time inference throughput & 1.74x higher INT8 batch inference throughput on Ice Lake vs. prior generation Cascade Lake Platinum 8380: New:1-node, 2x Intel Xeon Platinum 8380 processor on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, ucode X261, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-65-generic, 1x Intel_SSDSC2KG96, Intel SSDPE2KX010T8, BERT - Large SQuAD, gcc-9.3.0, oneDNN 1.6.4, BS=1,128 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on 3/12/2021. Baseline: Platinum 8280: 1-node, 2x Intel Xeon Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/ 32GB/ 2933) total DDR4 memory, ucode 0x5003003, HT on, Turbo on, Ubuntu 20.04 LTS, 5.4.0-48-generic, 1x Samsung_SSD_860, Intel SSDPE2KX040T8, BERT - Large SQuAD, gcc-9.3.0, oneDNN 1.6.4, BS=1,128 INT8, TensorFlow 2.4.1 with Intel optimizations for 3rd Gen Intel Xeon Scalable processor, upstreamed to TensorFlow- 2.5 (container- intel/intel-optimized-tensorflow:tf-r2.5-icx-b631821f), Model zoo: https://github.com/IntelAI/models/tree/icx-launch-public/quickstart/, test by Intel on 2/17/2021. | 2/17/2021 |
38 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | Intel M6i WordPress performance (TPS) with crypto acceleration optimizations relative to unoptimized | Wordpress v5.2 measured by Intel, Sept 23, 2021 using 3rd Gen Intel Xeon Scalable processor-based M6i (us-east-2 region) comparing crypto performance with and without qatengine using IFMA crypto instructions. Amazon Elastic Block Store 8GB. Ubuntu 20.04.3 LTS, Kernel: 5.11.0-1017-aws, GCC 9.3.0 compiler. Other software: MariaDB v10.3.31, Nginx v1.18.0, PHP v7.3.30-1. Curve exchange key: TLSv1.3-secp384r1-secp384r1; TLS_AES_256_GCM_SHA384. | 9/23/2021 |
39 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | NGINX - 7x more connections/second compared to previous generation | NGINX 1.20.1 measured by Intel, Sept 13, 2021 using 3rd Gen Intel® Xeon® Scalable Processor (Ice Lake) based n2-standard-1 and n2-standard-16 (us-central1-a region) instances comparing crypto performance with and without qatengine using IFMA crypto instructions. Measured with TLS version 1.2, ECDH Curves used: secp384r1, Cipher: ECDHE-RSA-AES128-GCM-SHA256 | 9/13/2021 |
40 | DTCC005 - Accelerate Your Workloads | Femi Oluwafemi | WordPress - 22-44% performance improvement over previous generation | Wordpress gen/gen gains per vCPU with crypto acceleration across a range of instance sizes: Wordpress v5.2 measured by Intel, Sept 23, 2021 using 3rd Gen Intel Xeon Scalable processor-based M6i (us-east-2 region) instances comparing to 2nd Gen Intel Xeon Scalable processor-based M5 (us-east-2 region). Common config: 16GB, 64GB, 256GB across 4, 16, 64 vCPU instance sizes. Amazon Elastic Block Store 8GB. Ubuntu 20.04.3 LTS, Kernel: 5.11.0-1017-aws, GCC 9.3.0 compiler. Other software: MariaDB v10.3.31, Nginx v1.18.0, PHP v7.3.30-1. Curve exchange key: TLSv1.3-secp384r1-secp384r1; TLS_AES_256_GCM_SHA384. | 9/23/2021 |
41 | DTCC009 - Workload-driven Performance with the Upcoming 4th Gen Intel® Xeon® Scalable Processor (formerly codenamed Sapphire Rapids) | Don Cunningham | 4th Gen Xeon processor delivers up to 4.5x more images per second on SSD-ResNet34 real-time inference with new Intel AMX (INT8) | Estimated performance comparing 4th Gen Xeon pre-production silicon vs. 40C, 270W 3rd Gen Xeon, using TensorFlow framework | |
42 | DTCC009 - Workload-driven Performance with the Upcoming 4th Gen Intel® Xeon® Scalable Processor (formerly codenamed Sapphire Rapids) | Don Cunningham | 4th Gen Xeon processor delivers up to 6x more images per second running SSD-ResNet34 real-time inference (TensorFlow) with AMX vs CPX2 (BF16) | Estimated performance comparing 4th Gen Xeon pre-production silicon vs. 28C, 250W 3rd Gen Xeon, using TensorFlow framework | Jan, 2022 |
43 | DTCC009 - Workload-driven Performance with the Upcoming 4th Gen Intel® Xeon® Scalable Processor (formerly codenamed Sapphire Rapids) | Don Cunningham | With 4th Gen Xeon and Intel® QuickAssist Technology running a common web server (NGINX), the ability to manage a large number of clients while allowing for performance bursts to defend against Distributed Denial of Service (DDoS) attacks is critical for SSL/TLS applications. Intel® QAT offloads the public key encryption, enabling 50K client handshakes per second, approximately 4.2x the performance of default software running on CPU cores only, while simultaneously freeing up 4 CPU cores in the process. It is worth noting that 4th Gen Intel® Xeon® Scalable CPU cores offer inherent crypto instructions which by themselves can provide a 3x performance gain compared to default software or competitive offerings. The addition of the Intel® QAT accelerator provides greater core efficiency by offloading CPU cores while providing an additional 1.4x performance boost over even the optimized software libraries using the latest Intel® Xeon® Scalable family crypto instructions. (This 3x × 1.4x ≈ 4.2x decomposition is reproduced in the sketch after this table.) | Configuration: 1-node, 2x Next Gen Intel Xeon Scalable processor (codenamed Sapphire Rapids, > 40 cores) on Intel pre-production platform with 512 GB DDR memory, HT ON, Turbo OFF, EGSDCRB1.E9I.0075.D01.2202251835, ucode 0x8e000210, 1x Intel 240G SSD, 1x Intel® Ethernet Network Adapter E810-CQDA2, Ubuntu* 20.04.3 LTS, 5.15.13-051513-generic, GCC 9.4.0, workload Async NGINX 0.4.7, OpenSSL 1.1.1l, 12 cores/24 threads. Test by Intel as of 03/31/2022. CPU using New Crypto Instructions & Optimizations: QAT Engine v0.6.11, Intel IPsec MB v1.1, IPP Crypto ippcp_2021.5. With Intel® QAT: QAT Engine v0.6.11, QAT20.L.2201.0.0-00028 | 3/31/2022 |
44 | DTCC009 - Workload-driven Performance with the Upcoming 4th Gen Intel® Xeon® Scalable Processor (formerly codenamed Sapphire Rapids) | Don Cunningham | Utilizing the 4th Gen Intel® Xeon® Scalable processor as part of an NVMe-over-TCP target stack deployment, the CRC32C data integrity checks accelerated by Intel® DSA, compared to computing on the CPU cores without acceleration, reduce the CRC32C processing overhead, achieving up to 79% higher storage I/O per second (IOPS) when making 128KB I/O requests to NVMe devices. At the same time, Intel® DSA provides as much as 45% lower latency using the standard 'fio' Flexible I/O tester benchmark with 128KB blocks. | Config with baseline software: Test by Intel as of 03/31/2022. 1-node, 2x Intel® Xeon® 2S, > 40 cores, HT ON, Turbo ON, Total Memory 512 GB (12 slots/ 16 GB/ 4400 MHz [run @ 1800 MHz]), EGSDCRB1.86B.0072.D01.2201101353, ucode 0x8e0001a0, Ubuntu 21.04, 5.11.0-16-generic, gcc (Ubuntu 10.3.0-1ubuntu1) 10.3.0, workload SPDK v22.01 NVMe over TCP, FIO version fio-3.29-38-g52a0, DSA configuration. 4th Gen Xeon with Intel DSA: Test by Intel as of 03/31/2022. 1-node, 2x Intel® Xeon® 2S, > 40 cores, HT ON, Turbo ON, Total Memory 512 GB (12 slots/ 16 GB/ 4400 MHz [run @ 1800 MHz]), EGSDCRB1.86B.0072.D01.2201101353, ucode 0x8e0001a0, Ubuntu 21.04, 5.11.0-16-generic, gcc (Ubuntu 10.3.0-1ubuntu1) 10.3.0, DSA driver IDXD-CONFIG-ACCEL-CONFIG-V3.4.5, 1 DSA device used out of 8 available in the 2-socket system, workload SPDK v22.01 NVMe over TCP, FIO version fio-3.29-38-g52a0; versus baseline same configuration with workload SPDK v22.01 NVMe over TCP, FIO version fio-3.29-38-g52a0, DSA configuration. | 3/31/2022 |
45 | DTCC009 - Workload-driven Performance with the Upcoming 4th Gen Intel® Xeon® Scalable Processor (formerly codenamed Sapphire Rapids) | Don Cunningham | On microservices performance, we show an improvement in throughput per core (under a latency SLA of p99 <30ms) of: up to 24% comparing Ice Lake 3rd Gen Xeon to 2nd Gen Xeon; up to 69% comparing 4th Gen to 2nd Gen Xeon. | Workloads: DeathStarBench 'hotelReservation', 'socialNetwork' ( https://github.com/delimitrou/DeathStarBench) and Google Microservices demo ( https://github.com/GoogleCloudPlatform/microservices-demo) OS: Ubuntu 20.04 with kernel version v5.10, Kubernetes v1.21.0; Testing as of July 2021. 2nd Gen Xeon Measurements on 3-node Kubernetes setup on AWS M5.metal instances (2S 24 core 8259CL with 384GB DDR4 RAM and 25Gbps network) in us-west2b. 3rd Gen Xeon (codenamed Ice Lake) Measurements on 3-node 2S 32 core, 2.5GHz, 300W TDP SKU with 512GB DDR4 RAM and 40Gbps network. | Jul, 2021 |
46 | DTCC010: Data Center of the Future | Jen Huffstetler | 29 federated international medical centers. 80K brain tumor diagnoses each year worldwide. 99% accuracy of model trained for brain tumor detection | Venture Beat: Intel partners with Penn Medicine to develop brain tumor classifier | 11-May-20 |
47 | DTCC010: Data Center of the Future | Jen Huffstetler | Xeon continues to deliver big generational gains for healthcare workloads: 57% for NAMD vs previous gen; 60% for GROMACS vs previous gen; 64% for LAMMPS vs previous gen; 61% for RELION vs previous gen | See [108] at www.intel.com/3gen-xeon-config. Results may vary. | 20-Feb-21 |
48 | DTCC010: Data Center of the Future | Jen Huffstetler | 66% higher AI inference performance | See [122] at www.intel.com/3gen-xeon-config. Results may vary. | |
49 | DTCC010: Data Center of the Future | Jen Huffstetler | Up to 50% reduction in CAPEX build costs. Up to 95% reduction in cooling OPEX. Up to 10x increase in computing density with liquid immersion cooling | Source: Submer. https://submer.com/business-cases/ | 25-Mar-22 |
50 | DTCC013 - Reaping Business Advantages with Affordable Deep Learning Training | Shawna Meyer-Ravelli | With Granulate activated, the first thing we see is the latency dropping, in this case we have a 33% drop in response time going from 33 down to 22 milliseconds. The tail latency also drops from 44 to 35 milliseconds, roughly 20% drop. Along with lower response time, the throughput increases from 114 requests per second, up to 143, an almost 30% increase. | Source: Granulate gCenter database. Obtaining relevant data from Granulate team on application version, python version, etc used in demo to demonstrate data proof points. | |
51 | DTCC013 - Reaping Business Advantages with Affordable Deep Learning Training | Sree Ganesan | 3rd party stat: 74% of IDC ML practitioner respondents indicate running 5-10 iterations of training; 50% of ML practitioner respondents rebuilt models weekly or more often; 26% rebuild daily or hourly; 56% cite cost of AI training as the most significant challenge to implementing AI/ML solutions | Source: IDC Semiannual Artificial Intelligence Tracker (2020H1) | |
52 | DTCC013 - Reaping Business Advantages with Affordable Deep Learning Training | Sree Ganesan | "Best price performance for training deep learning models in the cloud." - AWS | Third-party statement by AWS based on their assessment of price and performance for instances. | |
53 | DTCC013 - Reaping Business Advantages with Affordable Deep Learning Training | Sree Ganesan | In reference to the Gaudi-based DL1 instance: Up to 40% better price performance than latest GPU-based instances | The price/performance claim is made by AWS and based on AWS's internal testing. Habana Labs does not control or audit third-party data. More information can be found at: habana.ai/AWS-launches-ec2-dl1-instances/. Customer claim: https://press.aboutamazon.com/news-releases/news-release-details/aws-announces-general-availability-amazon-ec2-dl1-instances | |
54 | DTCC013 - Reaping Business Advantages with Affordable Deep Learning Training | Sree Ganesan | DL1 ResNet-50 Performance vs. A100 and V100: we compare training throughput running ResNet-50 using TensorFlow. The GPU performance and configuration are reported by Nvidia on DGX machines that are similar (but not identical) to the instances offered by AWS. | Configuration of performance test: Habana ResNet50 Model: https://github.com/HabanaAI/Model-References/tree/master/TensorFlow/computer_vision/Resnets/resnet_keras. Habana SynapseAI Container: https://vault.habana.ai/ui/repos/tree/General/gaudi-docker/1.2.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.7.0. Habana Gaudi Performance: https://developer.habana.ai/resources/habana-training-models/. A100 / V100 Performance Source: https://ngc.nvidia.com/catalog/resources/nvidia:resnet_50_v1_5_for_tensorflow/performance, results published for DGX A100-40G and DGX V100-32G. Results may vary. | Sep-21 |
55 | DTCC013 - Reaping Business Advantages with Affordable Deep Learning Training | Sree Ganesan | DL1 NLP BERT Performance vs. A100 and V100: we compare training throughput running BERT using TensorFlow. (The GPU numbers are reported by Nvidia on DGX machines that are similar (but not identical) to the instances offered by AWS.) | Habana BERT-Large Model: https://github.com/HabanaAI/Model-References/tree/master/TensorFlow/nlp/bert. Habana SynapseAI Container: https://vault.habana.ai/ui/repos/tree/General/gaudi-docker/1.2.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.7.0. Habana Gaudi Performance: https://developer.habana.ai/resources/habana-training-models/. A100 / V100 Performance Sources: https://ngc.nvidia.com/catalog/resources/nvidia:bert_for_tensorflow/performance, results published for DGX A100-40G and DGX V100-32G | Sep-21 |
56 | DTCC013 - Reaping Business Advantages with Affordable Deep Learning Training | Sree Ganesan | Customer savings with Gaudi-based Amazon DL1 instances. ResNet-50 $/image throughput cost: DL1 - 46% lower than A100-based P4d; DL1 - 60% lower than V100-based P3. BERT-Large Pre-Training Phase-1 $/sequence throughput cost: DL1 - 31% lower than A100-based P4d; DL1 - 54% lower than V100-based P3. BERT-Large Pre-Training Phase-2 $/sequence throughput cost: DL1 - 57% lower than A100-based P4d; DL1 - 75% lower than V100-based P3. | Cost savings based on Amazon EC2 On-Demand pricing for P3, P4d and DL1 instances respectively. Performance data was collected and measured using the following resources. Results may vary. Habana ResNet50 Model: https://github.com/HabanaAI/Model-References/tree/master/TensorFlow/computer_vision/Resnets/resnet_keras Habana BERT-Large Model: https://github.com/HabanaAI/Model-References/tree/master/TensorFlow/nlp/bert. Habana SynapseAI Container: https://vault.habana.ai/ui/repos/tree/General/gaudi-docker/1.2.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.7.0. Habana Gaudi Performance: https://developer.habana.ai/resources/habana-training-models/. A100 / V100 Performance: https://ngc.nvidia.com/catalog/resources/nvidia:bert_for_tensorflow/performance, https://ngc.nvidia.com/catalog/resources/nvidia:resnet_50_v1_5_for_tensorflow/performance, results published for DGX A100-40G and DGX V100-32G | |
57 | DTCC013 - Reaping Business Advantages with Affordable Deep Learning Training | Sree Ganesan | DL1 Cost savings calculated by Leidos in conducting POC on medical imaging workloads; Cost savings of 59% with DL1 on ChexNET-Keras model | Source: Leidos Configuration: Pre-training model: CheXNet-Keras; Dataset: ChestXray - NIHCC; batch size: 32; Precision: FP32; Device count: 8 Gaudi-based DL1.24xlarge instances vs. 8x V100-32 GB (p3dn.24xlarge) | |
58 | DTCC013 - Reaping Business Advantages with Affordable Deep Learning Training | Sree Ganesan | DL1 Cost savings calculated by Leidos in conducting POC on medical imaging workloads; cost savings of 67% with DL1 on COVID-CXNet | Source: Leidos Configuration: Pre-training model: COVID-CXNet; Dataset: COVID-CXNet; Batch size: 16; Precision: BF16; Device count: 1; | |
59 | MOD003 - High Performance Computing Solutions for Biomedical Research | Marcus Piper, Jon Bach (external) | Feature available on select CPU SKUs when paired with the W680 PCH. ECC routing supported in 4L for all DDR4 and DDR5 configurations | ||
60 | MOD003 - High Performance Computing Solutions for Biomedical Research | Marcus Piper, Jon Bach (external) | Feature available on select CPU SKUs when paired with the W680 PCH. ECC routing supported in 4L for all DDR4 and DDR5 configurations | ||
63 | MOD010 - Small businesses deserve more - Affordable deployment, security, and manageability now extends to PCs anywhere | Kate Porter, Thomas Koll (external) | For details on performance claims, learn more at www.intel.com/PerformanceIndex (processors - Intel(R) CoreTM Processors & connectivity). | ||
64 | MOD010 - Small businesses deserve more - Affordable deployment, security, and manageability now extends to PCs anywhere | Kate Porter, Thomas Koll (external) | Revolutionary performance and elevated connections with the best Wi-Fi technologies for video conferencing. 4 | Yahoo, Consumer needs and expectations for home wifi | Sep-21 |
65 | MOD010 - Small businesses deserve more - Affordable deployment, security, and manageability now extends to PCs anywhere | Kate Porter, Thomas Koll (external) | See www.intel.com/PerformanceIndex (platforms) for details. No product or component can be absolutely secure. | The Intel vPro platform delivers the first and only silicon-enabled AI threat detection to help stop ransomware and cryptomining attacks for Windows-based systems. See www.intel.com/PerformanceIndex (platforms) for details. No product or component can be absolutely secure. | |
68 | NET012 Cloud-to-Edge Programmable Networks | Ed Doe | Up to 2.5X more requests per connection using EN + INT | Transaction (RPC) rate using the netperf TCP_RR benchmark on a network equipped with 6x Arista 7170-32CD-C32 switches in a 2-TOR and 4-spine configuration: 13720 transactions per second with TCP-INT / 5325 transactions per second with DCTCP = 2.576x more requests per connection (the ratio is worked through after this table). | Based on Intel performance measurements as of April 22, 2022 and may not reflect all publicly available updates. |
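A quick sketch of the cost-per-throughput arithmetic behind claim 56 above, assuming the general method (hourly On-Demand price divided by measured throughput); the prices and throughputs in the example are hypothetical placeholders, not the measured figures behind the claim:

```python
# Minimal sketch of the $/throughput savings calculation (claim 56).
# All numeric inputs below are hypothetical placeholders; substitute
# actual EC2 On-Demand rates and measured images/sec or sequences/sec.

def cost_per_unit(dollars_per_hour: float, units_per_second: float) -> float:
    """Dollars spent per processed image/sequence at steady throughput."""
    return dollars_per_hour / (units_per_second * 3600.0)

def savings(base_price, base_tput, dl1_price, dl1_tput) -> float:
    """Fractional $/throughput reduction of DL1 vs. a baseline instance."""
    return 1.0 - cost_per_unit(dl1_price, dl1_tput) / cost_per_unit(base_price, base_tput)

# Hypothetical: a $13/hr instance at 5,000 img/s vs. a $32/hr instance at 6,500 img/s.
print(f"{savings(32.0, 6500.0, 13.0, 5000.0):.0%} lower cost per image")  # -> 47%
```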
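Claim 68's ratio uses only the two published netperf TCP_RR transaction rates; the check below reproduces it:

```python
# Reproducing the requests-per-connection ratio in claim 68.
tcp_int_tps = 13_720  # transactions/sec with TCP-INT (from the citation)
dctcp_tps = 5_325     # transactions/sec with DCTCP (from the citation)
print(f"{tcp_int_tps / dctcp_tps:.4f}x")  # 2.5765x, quoted as 2.576x and claimed as "up to 2.5X"
```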
Number | Session | Speaker | Claim | Claim Details/Citation | Testing Date |
---|---|---|---|---|---|
1 | AI for Building a Trustful Media Ecosystem | Ilke Demir | We can run up to 72 concurrent deepfake detection streams on a 3rd Gen Intel® Xeon® Scalable processor (a pipeline sketch appears after this table). | Configurations: 1-node, 2x Intel® Xeon® Platinum 8380 on Intel® Reference platform with 384 GB (16 slots / 32 GB / 3200) total memory, ucode 0xd000280, HT on, Turbo on, with Ubuntu 20.04.2 LTS, 5.4.0-73-generic, OpenVINO™ 2021.2 and OpenVINO™ DL Streamer 1.3. | N/A |
2 | Improving Sustainability in the Network | Muhammad Siddiqui | Reduction in power consumption of about 30% per server. | Based on findings from Intracom: because power consumption is directly related to CPU frequency, the reduction in average power consumption from 610 W to 430 W represents roughly 30% power savings per server. Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy. | N/A |
3 | Human-Robot Collaboration: Task Co-execution | Javier Felip Leon | We show how to accelerate ABC by about 5 orders of magnitude, to run at interactive frame rates on desktop processors. | See the materials located at http://arxiv.org/abs/2205.08657 | N/A |
4 | Human-Robot Collaboration: Task Co-execution | Javier Felip Leon | The neural surrogate is able to sample and evaluate 9,100 samples in 0.01 s, instead of 1,365 s with the original single-threaded physics-simulation generative model; this is more than a 100,000x improvement (see the arithmetic check after this table). | See the materials located at http://arxiv.org/abs/2205.08657 | N/A |
5 | Mount Evans reduces Kubernetes overhead | Kelley Mullick | Showcasing the performance gains from moving the microservices load balancer to the Intel® IPU ES2000 (aka Mount Evans): a 30% performance improvement moving from Ice Lake to Sapphire Rapids, and on Sapphire Rapids a further 30% increase when the load balancer is moved from the CPU to the IPU (exact data being finalized). | For backup, please visit https://cdrdv2.intel.com/v1/dl/getContent/730671 | Apr-22 |
6 | Demo E2E Challenge | Demo Staff | Lower time to completion for End-to-End AI workflows on 3rd Gen Xeon versus Nvidia Ampere A100 and AMD EPYC for DLSA | 3rd Gen Intel Xeon Platinum 8380 CPU: 2x 3rd Gen Intel Xeon Platinum 8380 with 512GB (16 slots/ 32GB/ 3200MHz) total DDR4 memory, microcode 0xd0002b1, HT off, Turbo on, Ubuntu 20.04 LTS, 5.4.0-84-generic kernel, 1x Intel 960GB SSD, Intel® Extension for PyTorch v1.8.1, Transformers 4.6.1, MKL 2021.3.0, Bert-large-uncased (https://huggingface.co/bert-large-uncased) model, BS=1 per instance, 20 instances/node, 4 cores/instance, test by Intel on 09/17/2021. Nvidia Ampere A100 GPU: Nvidia Ampere A100 GPU hosted on 2x AMD EPYC 7742 CPU with 1024GB (16 slots/ 64GB/ 3200MHz) total DDR4 memory, microcode 0x8301034, HT off, Turbo on, Ubuntu 20.04 LTS, 5.4.0-80-generic kernel, 1x SAMSUNG 3.5TB SSD, PyTorch 1.8.1, Transformers 4.6.1, CUDA 11.1, Bert-large-uncased (https://huggingface.co/bert-large-uncased) model, BS=1 per instance, 7 total instances with MIG enabled, test by Intel on 09/22/2021 | 09/17/2021 and 09/22/2021 |
7 | Demo E2E Challenge | Demo Staff | Lower time to completion for End-to-End AI workflows on 3rd Gen Xeon versus Nvidia Ampere A100 and AMD EPYC for DIEN | 3rd Gen Intel Xeon Platinum 8380 CPU: 1-node, 2x 3rd Gen Intel Xeon Platinum 8380 on Coyote Pass with 512 GB (16 slots/ 32GB/ 3200) total DDR4 memory, microcode 0xd0002b1, HT off, Turbo on, Ubuntu 20.04 LTS, 5.4.0-84-generic, 1x Intel 960GB SSD OS Drive, Modin 0.10.2, Intel-tensorflow-avx512 2.6.0, oneDNN v2.3, test by Intel on 09/29/2021. Nvidia Ampere A100 GPU: 1-node, 2x AMD EPYC 7742 on Nvidia DGXA100 920-23687-2530-000 utilizing 1x A100 GPU with 1024 GB (16 slots/ 64GB/ 3200) total DDR4 memory, microcode 0x8301034, HT off, Turbo on, Ubuntu 20.04 LTS, 5.4.0-84-generic, 1x SAMSUNG 3.5TB SSD OS Drive, Modin 0.10.2, tensorflow 2.6.0+nv, CUDA 11.4, test by Intel on 09/29/2021 | 09/29/2021 |
8 | Demo E2E Challenge | Demo Staff | A single-socket server with a 3rd Gen Xeon Scalable general-purpose CPU finishes the end-to-end single-cell genomics sequencing workload in 489 seconds, compared with 686 seconds on an Nvidia A100 GPU; 3rd Gen Xeon Scalable is thus 1.4x faster than the A100, which equates to over 1.6x better TCO. A single-socket server with the next-gen general-purpose CPU finishes the same workload in 370 seconds versus 686 seconds on the A100, delivering nearly 2x the performance of Nvidia's mainstream training GPU for 2022. (The cost and speedup arithmetic is worked through after this table.) | Baseline: Testing as of Dec 16th 2020. Google Cloud instance a2-highgpu-1g, 1x Tesla A100 GPU, 40GB HBM2 memory, 12 vCPUs, $3.78 cost per hour, dedicated access, Single-cell RNA-seq of 1.3 Million Mouse Brain Cells using SCANPY 1.8.1 Toolkit, score = 686 seconds to complete, total cost to complete $0.70. Source: https://github.com/clara-parabricks/rapids-single-cell-examples#example-2-single-cell-rna-seq-of-13-million-mouse-brain-cells New-1: Testing as of Feb 5th 2022. Google Cloud instance n2-standard-64, 3rd Gen Intel Xeon Scalable, 64 vCPUs, 256GB memory, 257GB persistent disk, NIC bandwidth 32Gbps, $3.10 cost per hour, dedicated access, Rocky Linux 8.5, Linux version 4.18.0-240.22.1.el8_3.crt6.x86_64, Single-cell RNA-seq of 1.3 Million Mouse Brain Cells using SCANPY 1.8.1 Toolkit, score = 489.1 seconds to complete, total cost to complete $0.42. New-2: Testing as of Jan 20th 2022. 1-node, 1x Next Gen Intel Xeon Scalable processor (codenamed Sapphire Rapids, > 40 cores) on Intel pre-production platform with 512 GB DDR memory (8(1DPC)/64GB/4800 MT/s), HT on, Turbo on, CentOS Linux 8.3, internal pre-production BIOS, Single-cell RNA-seq of 1.3 Million Mouse Brain Cells using SCANPY 1.8.1 Toolkit, score = 370.2 seconds to complete. | |
9 | Demo E2E Challenge | Demo Staff | On Census training+inference, Xeon 8380 is >5x faster on ML vs DGX A100 (utilizing 1x A100). | 3rd Gen Intel Xeon Platinum 8380 CPU: 2x 3rd Gen Intel Xeon Platinum 8380 with 512GB (16 slots/ 32GB/ 3200MHz) total DDR4 memory, microcode 0x8d055260, HT on, Turbo on, Ubuntu 20.04.2 LTS, 5.4.0-65-generic kernel, 1x INTEL SSDSC2KG960G8, Python 3.7.9, Modin 0.8.3, OmniSciDB v5.4.1, scikit-learn v0.24.1 accelerated by daal4py v2021.2, test by Intel on 03/15/2021. Nvidia Ampere A100 GPU: Nvidia Ampere A100 GPU hosted on 2x AMD EPYC 7742 CPU with 512GB (16 slots/ 32GB/ 3200MHz) total DDR4 memory, microcode 0x8301034, HT on, Turbo on, Ubuntu 18.04.5 LTS, 5.4.0-42-generic kernel, 1x SAMSUNG 3.5TB SSD, Python 3.7.9, RAPIDS 0.17, cuDF 0.17, cuML 0.17, scikit-learn v0.24.1, CUDA 11.0.221, test by Intel on 02/04/2021. Census Data [21721922, 45]: Dataset is from IPUMS USA, University of Minnesota, www.ipums.org [Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas and Matthew Sobek. IPUMS USA: Version 10.0 [dataset]. Minneapolis, MN: IPUMS, 2020. https://doi.org/10.181 | |
10 | Demo E2E Challenge | Demo Staff | The Payoff: Higher Performance/$ | 3rd Gen Intel Xeon Platinum 8380 CPU: 2x 3rd Gen Intel Xeon Platinum 8380 with 512GB (16 slots/ 32GB/ 3200MHz) total DDR4 memory, microcode 0x8d055260, HT on, Turbo on, Ubuntu 20.04.2 LTS, 5.4.0-65-generic kernel, 1x INTEL SSDSC2KG960G8, Python 3.7.9, Modin 0.8.3, OmniSciDB v5.4.1, scikit-learn v0.24.1 accelerated by daal4py v2021.2, test by Intel on 03/15/2021. Nvidia Ampere A100 GPU: Nvidia Ampere A100 GPU hosted on 2x AMD EPYC 7742 CPU with 512GB (16 slots/ 32GB/ 3200MHz) total DDR4 memory, microcode 0x8301034, HT on, Turbo on, Ubuntu 18.04.5 LTS, 5.4.0-42-generic kernel, 1x SAMSUNG 3.5TB SSD, Python 3.7.9, RAPIDS 0.17, cuDF 0.17, cuML 0.17, scikit-learn v0.24.1, CUDA 11.0.221, test by Intel on 02/04/2021. Census Data [21721922, 45]: Dataset is from IPUMS USA, University of Minnesota, www.ipums.org [Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas and Matthew Sobek. IPUMS USA: Version 10.0 [dataset]. Minneapolis, MN: IPUMS, 2020. https://doi.org/10.18128/D010.V10.0] Disclaimer (for pricing only): System pricing is based on an average of configurations comparable to the test systems as priced on www.colfax-intl.com and www.thinkmate.com on September 20, 2021. 4U rackmount systems used for 3rd Gen Intel® Xeon® Scalable 8380 processors: Thinkmate GPX XN6-24S3-10GPU and Colfax CX41060s-XK8. 4U rackmount servers used for AMD EPYC 7742 with Nvidia A100 GPU: Thinkmate GPX QT24-24E2-8GPU and Colfax CX4860s-EK8. See www.colfax-intl.com and www.thinkmate.com for more details. | |
11 | Demo E2E Challenge | Demo Staff | Processing times speedup with Intel-optimized scikit-learn (a patching sketch appears after this table). | Processing Times Speedup with Intel Optimized Scikit-learn: Azure-US-West, Standard_F16s_V2, 16 vCPUs, 1 instance, Platinum 8168 @ 2.70 GHz / Platinum 8272CL @ 2.60 GHz, 32GB Memory Capacity/Instance, Direct Attached Storage, Ubuntu 18.04.5 LTS, 5.4.0-1051-azure, Databricks 9.0 ML Runtime, Stock scikit-learn 0.22.1 vs Intel scikit-learn 0.24.2, Tested by Intel on 23-September-2021 | |
12 | Demo E2E Challenge | Demo Staff | Processing times speedup with Intel optimized TensorFlow/BERT-large. | Baseline: Processing Times Speedup: Intel Optimized TensorFlow/BERT-large: Azure-US-West, Standard_F32s_V2, 32 vCPUs, 1 instance, Platinum 8168 @ 2.70 GHz / Platinum 8272CL @ 2.60 GHz, 64GB Memory Capacity/Instance, Direct Attached Storage, Ubuntu 18.04.5 LTS, 5.4.0-1051-azure, Databricks 9.0 ML Runtime, Stock TensorFlow 2.3.1 vs Intel TensorFlow 2.3.0, Tested by Intel on 23-September-2021 New: Processing Times Speedup: Intel Optimized TensorFlow/BERT-large: Azure-US-West, Standard_F64s_V2, 64 vCPUs, 1 instance, Platinum 8168 @ 2.70 GHz / Platinum 8272CL @ 2.60 GHz, 128GB Memory Capacity/Instance, Direct Attached Storage, Ubuntu 18.04.5 LTS, 5.4.0-1051-azure, Databricks 9.0 ML Runtime, Stock TensorFlow 2.3.1 vs Intel TensorFlow 2.3.0 New: Processing Times Speedup: Intel Optimized TensorFlow/BERT-large: Azure-US-West, Standard_F72s_V2, 72 vCPUs, 1 instance, Platinum 8168 @ 2.70 GHz / Platinum 8272CL @ 2.60 GHz, 144GB Memory Capacity/Instance, Direct Attached Storage, Ubuntu 18.04.5 LTS, 5.4.0-1051-azure, Databricks 9.0 ML Runtime, Stock TensorFlow 2.3.1 vs Intel TensorFlow 2.3.0, Tested on 23-September-2021 | |
13 | Demo: Leadership Performance on Xe-HPC | Demo Staffer | Ponte Vecchio outperforms the competition in Financial Services by 2.6x on Binomial Options, 1.9x on Black-Scholes, and 1.7x on Monte Carlo | Testing as of 2/14/2022. Intel Platform: 1-node 1x Intel® Xeon® 6336Y, HT On, Turbo Off, total memory 128GB DDR, BIOS Version WLYDCRB1.SYS.0021.P16.2105280638, Ubuntu 20.04, Linux Version 5.10.54+pvc-xtb-po67perf, Ucode 0x8d0002c1, 1x Intel pre-production Ponte Vecchio GPU; Competing Platform: 1-node 2x Intel® Xeon® 8360Y, HT On, Turbo On, total memory 256GB DDR, BIOS Version SE5C6200.86B.0022.D08.2103221623, Ubuntu 21.10, Linux Version 5.13.0-27-generic, Ucode 0xd0002a0, 1x NVIDIA A100 80GB PCIe; Intel Binomial Options Build notes: Tools: Intel oneAPI 2022.1, Build knobs: -g -fdebug-info-for-profiling -gline-tables-only -fsycl-targets=spir64_gen -Xsycl-target-backend "-device 0x0bd5 -revision_id 3" -O3 -fp-model precise -std=c++17 -flto -o binomial.sycl.gpu.precise -I -lpthread Intel Black-Scholes Build notes: Tools: Intel oneAPI 2022.1, Build knobs: -g -O2 -I/opt/intel/opencl/include/ -L/opt/intel/opencl/lib64/ -ltbb -ltbbmalloc -lOpenCL Intel Monte Carlo Build notes: Tools: Intel oneAPI 2022.1, Build knobs: -DUSE_VML=0 -DUSE_MCG59 -DVEC_SIZE=8 -DMKL_ILP64 -Iinclude -I"${MKLROOT}/include" -L"${MKLROOT}/lib/intel64" -lpthread -lmkl_core -lmkl_intel_ilp64 -lmkl_sequential -lm -ldl -fsycl -fsycl-unnamed-lambda -O /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_sycl.a Competing Platform Binomial Options Build notes: Tools: CUDA SDK 11.4, Build knobs: -I../../common/inc -m64 --threads 0 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 Competing Platform Black-Scholes Build notes: Tools: CUDA SDK 11.4, Build knobs: -ccbin g++ -I/usr/local/cuda-11.4/samples/common/inc -m64 -maxrregcount=16 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 Competing Platform Monte Carlo Build notes: Tools: CUDA SDK 11.4, Build knobs: -ccbin g++ -I/usr/local/cuda-11.4/samples/common/inc -m64 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 | |
14 | Leadership Performance on MLPerf Benchmarks | Demo Staffer | 3rd Gen Intel® Xeon® Scalable processor outperforms AMD EPYC by 1.2x. 4th Gen Intel® Xeon® Scalable processor outperforms AMD EPYC by 1.9x. 4th Gen Intel® Xeon® Scalable processors using mixed precision with FP32 and BFloat16, enabled with AMX and TMUL instructions, show a scaling improvement of up to 3.2x over AMD EPYC. 4th Gen Intel® Xeon® Scalable processor outperforms NVIDIA A100 by 1.3x. | BASELINE: Test by Intel as of 04/07/2022. 1-node, 2x AMD EPYC 7763, 64 cores, HT On, Turbo Off, Total Memory 512 GB (16 slots/ 32 GB/ 3200 MHz, DDR4), BIOS AMI 1.1b, ucode 0xa001144, OS Red Hat Enterprise Linux 8.5 (Ootpa), kernel 4.18.0-348.7.1.el8_5.x86_64, compiler gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), MLPerf HPC-AI v0.7, DeepCAM DeepLabv3+, torch1.11.0a0+git13cdb98 AVX2, torch-1.11.0a0+git13cdb98-cp38-cp38-linux_x86_64.whl, torch_ccl-1.2.0+44e473a-cp38-cp38-linux_x86_64.whl, intel_extension_for_pytorch-1.10.0+cpu-cp38-cp38-linux_x86_64.whl (AVX-2), Intel MPI 2021.5, Python3.8, score=560 secs/1024 samples. Test by Intel as of 04/07/2022. 1-node, 2x Intel® Xeon® Platinum 8380 processor, 40 cores, HT On, Turbo Off, Total Memory 512 GB (16 slots/ 32 GB/ 3200 MHz, DDR4), BIOS SE5C6200.86B.0022.D64.2105220049, ucode 0xd0002b1, OS Red Hat Enterprise Linux 8.5 (Ootpa), kernel 4.18.0-348.7.1.el8_5.x86_64, compiler gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), MLPerf HPC-AI v0.7, DeepCAM DeepLabv3+, torch1.11.0a0+git13cdb98 AVX-2, torch-1.11.0a0+git13cdb98-cp38-cp38-linux_x86_64.whl, torch_ccl-1.2.0+44e473a-cp38-cp38-linux_x86_64.whl, intel_extension_for_pytorch-1.10.0+cpu-cp38-cp38-linux_x86_64.whl (AVX-512), Intel MPI 2021.5, Python3.8, score=470 secs/1024 samples. Test by Intel as of 05/05/2022. 1-node, 2x 4th Gen Intel® Xeon® Scalable processor (codenamed Sapphire Rapids, > 40 cores), HT On, Turbo Off, Total Memory 512 GB (16 slots/ 32 GB/ 4800 MHz, DDR5), BIOS EGSDCRB1.86B.0078.D10.2204072027, ucode 0x8f000320, OS CentOS Stream 8, kernel 5.15.0-spr.bkc.pc.4.24.0.x86_64, compiler gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), MLPerf HPC-AI v0.7, DeepCAM DeepLabv3+, torch1.11.0a0+git13cdb98 AVX-512, FP32, torch-1.11.0a0+git13cdb98-cp38-cp38-linux_x86_64.whl, torch_ccl-1.2.0+44e473a-cp38-cp38-linux_x86_64.whl, intel_extension_for_pytorch-1.10.0+cpu-cp38-cp38-linux_x86_64.whl (AVX-512), Intel MPI 2021.5, Python3.8, score=299 secs/1024 samples. Test by Intel as of 04/13/2022. 1-node, 2x Intel® Xeon® Platinum 8360Y, 36 cores, HT On, Turbo On, Total Memory 256 GB (16 slots/ 16 GB/ 3200 MHz), Nvidia GPU A100, 80GB HBM, PCIe ID 20B5, BIOS AMI 1.1b, ucode 0xd000311, OS Red Hat Enterprise Linux 8.4 (Ootpa), kernel 4.18.0-305.el8.x86_64, compiler gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), MLPerf HPC-AI v0.7, DeepCAM DeepLabv3+, pytorch1.11.0 py3.7_cuda11.3_cudnn8.2.0_0, cudnn 8.2.1, cuda11.3_0, intel-openmp 2022.0.1 h06a4308_3633, python3.7, score=233 secs/1024 samples. Test by Intel as of 05/05/2022. 1-node, 2x 4th Gen Intel® Xeon® Scalable processor (codenamed Sapphire Rapids, > 40 cores), HT On, Turbo Off, Total Memory 512 GB (16 slots/ 32 GB/ 4800 MHz, DDR5), BIOS EGSDCRB1.86B.0078.D10.2204072027, ucode 0x8f000320, OS CentOS Stream 8, kernel 5.15.0-spr.bkc.pc.4.24.0.x86_64, compiler gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), MLPerf HPC-AI v0.7, DeepCAM DeepLabv3+, torch1.11.0a0+git13cdb98 AVX-512 FP32, AMX, BFloat16 Enabled, torch-1.11.0a0+git13cdb98-cp38-cp38-linux_x86_64.whl, torch_ccl-1.2.0+44e473a-cp38-cp38-linux_x86_64.whl, intel_extension_for_pytorch-1.10.0+cpu-cp38-cp38-linux_x86_64.whl (AVX-512), Intel MPI 2021.5, Python3.8, score=176 secs/1024 samples. Test by Intel as of 04/09/2022. 16-node cluster, each node 2x 4th Gen Intel® Xeon® Scalable processor (codenamed Sapphire Rapids, > 40 cores), HT On, Turbo On, Total Memory 256 GB (16 slots/ 16 GB/ 4800 MHz, DDR5), BIOS Intel SE5C6301.86B.6712.D23.2111241351, ucode 0x8d000360, OS Red Hat Enterprise Linux 8.4 (Ootpa), kernel 4.18.0-305.el8.x86_64, compiler gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), MLPerf HPC-AI v0.7, DeepCAM DeepLabv3+, torch1.11.0a0+git13cdb98 AVX-512, FP32, torch-1.11.0a0+git13cdb98-cp38-cp38-linux_x86_64.whl, torch_ccl-1.2.0+44e473a-cp38-cp38-linux_x86_64.whl, intel_extension_for_pytorch-1.10.0+cpu-cp38-cp38-linux_x86_64.whl (AVX-512), Intel MPI 2021.5, Python3.8, 16-node score=30 secs/1024 samples. MLPerf™ HPC-AI v0.7 Training benchmark DeepCAM Performance. Result not verified by MLCommons Association. Unverified results have not been through an MLPerf™ review and may use measurement methodologies and/or workload implementations that are inconsistent with the MLPerf™ specification for verified results. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information | |
15 | Performance delivered for Memory Hungry Workloads | Demo Staffer | 4th Gen Intel® Xeon® Scalable processors (codenamed Sapphire Rapids) with high bandwidth memory (HBM) outperform AMD Milan by 2.8x and AMD Milan-X by 2.1x; they also outperform 3rd Gen Xeon Scalable by 2.8x | Test by Intel as of 01/26/2022. 1-node, 2x Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Family 6 Model 106 Stepping 6), 80 cores, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode revision=0xd000270, Rocky Linux 8.5, Linux version 4.18.0-240.22.1.el8_3.crt6.x86_64, OpenFOAM® v1912, Motorbike 28M @ 250 iterations; Build notes: Tools: Intel Parallel Studio 2020u4, Build knobs: -O3 -ip -xCORE-AVX512 Test by Intel as of 01/26/2022. 1-node, 2x AMD EPYC 7763 64-Core Processor @ 2.45GHz (Family 25 Model 1 Stepping 1), 128 cores, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version 2.1, ucode revision=0xa00111d, Rocky Linux 8.5, Linux version 4.18.0-240.22.1.el8_3.crt6.x86_64, OpenFOAM® v1912, Motorbike 28M @ 250 iterations; Build notes: Tools: Intel Parallel Studio 2020u4, Build knobs: -O3 -ip -xCORE-AVX2 Test by Microsoft® Azure as of 11/08/21. 1-node, 2x AMD EPYC 7V73X on Azure HBv3, 128 cores (120 available), HT Off, Total Memory 448 GB, CentOS 8.1 HPC Image, GNU compiler 9.2.0, OpenFOAM® v1912, Motorbike 28M @ 250 iterations Test by Intel as of 01/26/2022. 1-node, 2x 4th Gen Intel Xeon Scalable processor (codenamed Sapphire Rapids, > 40 cores), HT On, Turbo On, Total Memory 512 GB (16x32GB 4800MT/s, Dual-Rank), preproduction platform and BIOS, Red Hat Enterprise Linux 8.4, Linux version 4.18.0-305.el8.x86_64, OpenFOAM® v1912, Motorbike 28M @ 250 iterations; Build notes: Tools: Intel Parallel Studio 2020u4, Build knobs: -O3 -ip -xCORE-AVX512 Test by Intel as of 01/26/2022. 1-node, 2x 4th Gen Intel® Xeon® Scalable (codenamed Sapphire Rapids, > 40 cores) plus HBM, HT Off, Turbo Off, Total Memory 128 GB (HBM2e at 3200 MHz), preproduction platform and BIOS, CentOS 8, Linux version 5.12.0-0507.intel_next.06_02_po.5.x86_64+server, OpenFOAM® v1912, Motorbike 28M @ 250 iterations; Build notes: Tools: Intel Parallel Studio 2020u4, Build knobs: -O3 -ip -xCORE-AVX512 OPENFOAM® is a registered trade mark of OpenCFD Limited, producer and distributor of the OpenFOAM software via www.openfoam.com. | |
16 | Performance delivered for Memory Hungry Workloads | Demo Staffer | 4th Gen Intel® Xeon® Scalable processors (codenamed Sapphire Rapids) with high bandwidth memory (HBM) outperform AMD Milan by 1.99x | Baseline: Test by Intel as of 05/03/22. 1-node, 2x AMD EPYC 7773X 64-Core Processor @ 2.2GHz (Family 25, Model 1, Stepping 2), 128 cores, HT On, Turbo On, Total Memory 1024 GB (16x64GB 3200MT/s, Dual-Rank), BIOS Version M10, ucode revision=0xa001224, CentOS Stream 8, Linux version 4.18.0-383.el8.x86_64, WRF v4.2.2, CONUS-2.5km workload, score=2.9711 seconds per timestep; CONUS-12km workload, score=0.2465 seconds per timestep Milan 7763: Test by Intel as of 05/03/22. 1-node, 2x AMD EPYC 7763 64-Core Processor @ 2.45GHz (Family 25, Model 1, Stepping 1), 128 cores, HT Off, Turbo On, Total Memory 1024 GB (16x64GB 3200MT/s, Dual-Rank), BIOS Version M10, ucode revision=0xa001144, CentOS Stream 8, Linux version 4.18.0-383.el8.x86_64, WRF v4.2.2, CONUS-2.5km workload, score=3.5416 seconds per timestep; CONUS-12km workload, score=0.2930 seconds per timestep Test by Intel as of 05/03/22. 1-node, 2x Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Family 6 Model 106 Stepping 6), 80 cores, HT On, Turbo On, Total Memory 256 GB (16x16GB 3200MT/s, Dual-Rank), BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode revision=0xd000270, Rocky Linux 8.5, Linux version 4.18.0-348.23.1.el8_5.crt1.x86_64, WRF v4.2.2, CONUS-2.5km workload, score=3.8846 seconds per timestep; CONUS-12km workload, score=0.3483 seconds per timestep Test by Intel as of 05/03/22. 1-node, 2x 4th Gen Intel Xeon Scalable processor (codenamed Sapphire Rapids, > 40 cores), HT On, Turbo On, Total Memory 1024 GB (16x64GB 4800MT/s, Dual-Rank), BIOS Version EGSDCRB1.SYS.0077.D01.2203211346, ucode revision=0xf0002e1, Ubuntu 22.04 LTS, Linux version 5.15.0-25-generic, WRF v4.2.2, CONUS-2.5km workload, score=2.4744 seconds per timestep; CONUS-12km workload, score=0.2265 seconds per timestep Test by Intel as of 05/03/22. 1-node, 2x 4th Gen Intel® Xeon® processor, codenamed Sapphire Rapids with High Bandwidth Memory, >40 cores, HT On, Turbo On with ITP recipe applied with power limit of 380W/socket, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version EGSDCRB1.86B.0077.D11.2203281354, ucode revision=0x83000200, CentOS Stream 8, Linux version 5.16.0-0121.intel_next.1.x86_64+server, WRF v4.2.2, CONUS-2.5km workload, score=1.7803 seconds per timestep; CONUS-12km workload, score=0.1876 seconds per timestep | |
17 | MOD010 - Small businesses deserve more - Affordable deployment, security, and manageability now extends to PCs anywhere | Kate Porter, Thomas Koll (external) | Intel vPro is the only business platform with built-in hardware security to detect ransomware and software supply chain attacks. 5 | |
18 | MOD010 - Small businesses deserve more - Affordable deployment, security, and manageability now extends to PCs anywhere | Kate Porter, Thomas Koll (external) | For details on performance claims, learn more at www.intel.com/PerformanceIndex (processors - Intel® Core™ processors & connectivity). | |
19 | MOD010 - Small businesses deserve more - Affordable deployment, security, and manageability now extends to PCs anywhere | Kate Porter, Thomas Koll (external) | The Intel vPro platform delivers the first and only silicon-enabled AI threat detection to help stop ransomware and cryptomining attacks for Windows-based systems. See www.intel.com/PerformanceIndex (platforms) for details. No product or component can be absolutely secure. | |
20 | Demo | Demo Staffer | ATSM-150 outperforms the NVIDIA A10 for media analytics by 1.48x with AVC and 1.14x with HEVC | Intel Platform: 2S Intel® Xeon® 6342, 256GB DDR4-3200, Ubuntu 20.04, kernel 5.10.54 with prerelease features, hosting 1x ATSM-150. Media Delivery and Media Analytics Solution Stacks: Agama 407 running HEVC Decode and ResNet50 v1.5. Tested by Intel as of 5/1/2022. NVIDIA Platform: 2S AMD EPYC 7742, 256GB DDR4-3200, Ubuntu 20.04, hosting 1x NVIDIA A10. Media Delivery and Media Analytics Solution Stacks: Deepstream 6.0 NGC Container running HEVC Decode and ResNet50 v1.5. Tested by Intel as of 3/30/2022. | 3/30/2022 and 5/1/2022 |
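For context on claim 1 (concurrent deepfake detection streams), OpenVINO DL Streamer pipelines are assembled from GStreamer elements such as gvadetect. A minimal single-stream sketch follows; the input file and model IR path are hypothetical placeholders, and whether the demo used this exact pipeline shape is an assumption. One such pipeline (or pipeline branch) would be launched per concurrent stream.

```python
# Hypothetical single-stream DL Streamer pipeline (claim 1). The file
# "input.mp4" and model "deepfake-detector.xml" are placeholders only.
import subprocess

pipeline = (
    "filesrc location=input.mp4 ! decodebin ! videoconvert ! "
    "gvadetect model=deepfake-detector.xml device=CPU ! "  # OpenVINO inference element
    "gvafpscounter ! fakesink sync=false"                  # report FPS, discard frames
)
subprocess.run(["gst-launch-1.0", *pipeline.split()], check=True)
```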
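The figures in claims 3 and 4 are mutually consistent; the check below uses only the published timings:

```python
# Arithmetic check for claims 3 and 4 (neural surrogate vs. simulator).
import math

surrogate_s = 0.01     # 9,100 samples with the neural surrogate
simulation_s = 1365.0  # same samples with the single-threaded simulator

speedup = simulation_s / surrogate_s
print(f"{speedup:,.0f}x")                                # 136,500x -> "more than a 100,000x improvement"
print(f"{math.log10(speedup):.1f} orders of magnitude")  # ~5.1 -> "5 orders of magnitude"
```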
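The runtimes and cloud costs in claim 8 can likewise be re-derived from the published per-hour prices and times to completion (the citation rounds the A100 cost to $0.70):

```python
# Worked check of claim 8 using only the published runtimes and prices.
runs = {
    "A100 (a2-highgpu-1g)":          (686.0, 3.78),  # seconds, $/hour
    "3rd Gen Xeon (n2-standard-64)": (489.1, 3.10),
}
for name, (seconds, dollars_per_hour) in runs.items():
    print(f"{name}: ${seconds / 3600 * dollars_per_hour:.2f} to complete")  # ~$0.72, ~$0.42

print(f"speedup: {686.0 / 489.1:.2f}x")           # 1.40x -> "1.4x faster"
print(f"TCO ratio: {0.70 / 0.42:.2f}x")           # 1.67x -> "over 1.6x better TCO"
print(f"next-gen speedup: {686.0 / 370.2:.2f}x")  # 1.85x -> "nearly 2x"
```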
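On claim 11's Intel-optimized scikit-learn: a common way to enable the accelerated code paths is the Intel Extension for Scikit-learn (scikit-learn-intelex); whether the Databricks demo used this exact patching mechanism or a pre-built daal4py-accelerated package is an assumption. A minimal sketch:

```python
# pip install scikit-learn-intelex
from sklearnex import patch_sklearn
patch_sklearn()  # patch before importing estimators from sklearn

import numpy as np
from sklearn.cluster import KMeans

# Synthetic data stands in for the demo's workload.
X = np.random.default_rng(0).random((100_000, 16), dtype=np.float32)
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)  # dispatched to oneDAL where supported
print(km.inertia_)
```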