Intel® System Debugger User Guide

ID 648476
Date 06/13/2024
Confidential

Intel(R) Crash Log Framework

Overview

The Intel(R) Crash Log Framework lets you create or update software that works with Intel(R) Crash Log Technology, which supports the triage and failure analysis process.

Intel(R) Crash Log Technology captures and preserves hardware state so that it survives a set of platform reset types. It creates a crash log that can be used by humans and external systems.

The Intel(R) Crash Log Framework includes software components to decode and analyze the log, providing immediate natural language diagnostics of the crash. Crash data survives resets, so System Firmware can notify the Operating System and save the log to permanent storage.

Below is a diagram showing the Crash Log flow and usage.

[Figure: Crash Log flow and usage (flow.png)]

Frequently Asked Questions

Am I required to enable Crash Log?

Crash Log collection is enabled by default on supported platforms. Extract the data after the crash, or view the crash record extracted by the OS-based extraction component.

The Crash Log feature can be enabled or disabled in the System Firmware settings.

Are Crash Logs stored for all crashes?

No. The exact details depend on the Crash Log capabilities of the specific Intel architecture-based platform.

Do Crash Logs persist for all crashes?

No. Unfortunately, some scenarios trigger a power cycle, and no log of the crash is saved.

What is in a Crash Log Report?

The Crash Log Report contains data and analysis. The scope of the data is specific to the platform. In general, the trigger of the crash, state of platform-based components, and analysis information to help diagnose the crash are included in the report.

How do I get the Crash Log?

The exact procedure for extracting a Crash Log into a file depends on the platform model, firmware support and operating system.

  • During early platform boot phases, or when the firmware does not support Crash Log extraction, the Crash Log may be obtained by reading the Crash Log hardware storage using out-of-band extraction methods (JTAG, eSPI, …).

  • When the firmware has already extracted the Crash Log from the hardware storage, the Crash Log may be obtained by reading the BERT structure of the platform ACPI tables (in-band extraction).

For more details about the Crash Log extraction procedures for a specific platform, please contact your Intel representative.
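
As an illustration of the in-band path, the sketch below parses the fixed part of an ACPI BERT table to locate the Boot Error Region. It relies only on the generic BERT layout from the ACPI specification (a 36-byte header followed by a 4-byte region length and an 8-byte region address). The function name, the Linux sysfs path in the comment, and the assumption that the host exposes the table there are illustrative and not part of the Intel(R) Crash Log Framework. How the Crash Log is wrapped inside the region is platform-specific and is not covered here.

```python
import struct
from typing import NamedTuple


class BertInfo(NamedTuple):
    table_length: int    # length of the whole BERT table, in bytes
    region_length: int   # length of the Boot Error Region, in bytes
    region_address: int  # physical address of the Boot Error Region


def parse_bert_table(table: bytes) -> BertInfo:
    """Parse the fixed 48-byte part of an ACPI BERT table (little-endian)."""
    if len(table) < 48:
        raise ValueError("BERT table must be at least 48 bytes long")
    signature, table_length = struct.unpack_from("<4sI", table, 0)
    if signature != b"BERT":
        raise ValueError("unexpected ACPI signature: %r" % signature)
    # The standard ACPI header is 36 bytes; the two BERT fields follow it.
    region_length, region_address = struct.unpack_from("<IQ", table, 36)
    return BertInfo(table_length, region_length, region_address)


# Hypothetical usage on a Linux host (typically requires root privileges):
# with open("/sys/firmware/acpi/tables/BERT", "rb") as f:
#     print(parse_bert_table(f.read()))
```

The returned address and length only indicate where the firmware placed the boot error data; reading that region and turning it into a Crash Log Raw File remains a platform-specific step.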

Prerequisites

This section discusses the prerequisites for obtaining a Crash Log.

Platform

The platform must support Crash Log as a platform feature. When available, the feature is enabled by default but may be disabled by the IAFW.

Extraction
  • Debug/Test Connection [1] (e.g., Intel(R) Direct Connect Interface (Intel(R) DCI))

  • Debug/Test Software Framework [1] (e.g., OpenIPC)

Decoding

Installation

The Intel(R) Crash Log Framework is distributed as a wheel file. The package contains the decoder engine and the platform-specific collateral. Before installing the package, make sure the Python* wheel package is installed on the system:

$ pip install wheel

The package can then be installed with pip. The following command automatically installs the required dependencies and replaces any previously installed version of the decoder.

$ pip install intel.crashlog-...200-py3-none-any.*

If used behind a proxy, pip must be configured to use it with the --proxy=https://user@proxy_addr:port option.

Verifying Setup and Usage

This section provides examples of expected behavior of the Intel(R) Crash Log Framework.

Note: The Intel(R) Crash Log Framework package installs the intel_crashlog executable in the <PYTHON_HOME>\Scripts directory on Windows* and in <PYTHON_HOME>/bin on Linux* and macOS*. It is recommended to add this directory to the PATH environment variable.

If the package is correctly installed, the following command should print the version number of the Intel(R) Crash Log Framework:

$ intel_crashlog --version
Intel(R) Crash Log Framework, Version 3.x Beta
Copyright (C) 2017-2019 Intel Corporation. All rights reserved.
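
The same check can be scripted, for example in a test harness. The following is a minimal sketch that looks for the intel_crashlog executable on PATH and, if found, returns the first line printed by --version. The function name crashlog_version is an illustrative assumption and not part of the framework.

```python
import shutil
import subprocess
from typing import Optional


def crashlog_version(executable: str = "intel_crashlog") -> Optional[str]:
    """Return the first line of `<executable> --version`, or None if not on PATH."""
    path = shutil.which(executable)
    if path is None:
        # The Scripts (Windows*) or bin (Linux*, macOS*) directory is probably
        # missing from PATH, or the package is not installed.
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    print(crashlog_version() or "intel_crashlog was not found on PATH")
```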

The command-line tool consists of the intel_crashlog executable. It takes two parameters: a command (the action to perform) and the Crash Log file to act on.

$ intel_crashlog <command> <crashlog_file>

The complete list of supported commands can be obtained by using:

$ intel_crashlog --help

Some Crash Log samples are provided with the framework. They are installed in Python's site-packages directory and can be used to verify the Intel(R) Crash Log Framework setup:

$ cd /path/to/python/site-packages
$ cd intel/crashlog/collateral/ICL/all/all/white/crashlog/tests/samples
$ intel_crashlog analyze ./3strike_timeout.crashlog
Three-strike timeout - TIMEOUT.MLC.WDT
======================================

Three-strike timeout detected.
Machine-Check Address: 0xFFFFF802ACA8FE34.
Outstanding transaction(s):

- Thread 1 access (Memory read) to address: 0xfe1fe204, State: 6
...

Obtaining a Crash Log

Enabling the Crash Log Feature

The System Firmware controls the Crash Log platform feature. An entry in the IAFW settings allows the Crash Log feature to be configured. Depending on the platform, it can usually be found on the Debug Settings page.

[Figure: IAFW Debug Settings page (iafw_debug_settings.png)]

Extraction

The Intel(R) Crash Log Framework provides a set of extraction scripts. They can be invoked by using the extract subcommand.

$ intel_crashlog extract <method> <output_file>
  • <method>: extraction method to use. Multiple methods may be specified. If none is specified, the default extraction method is used.

  • <output_file>: Crash Log Raw File to produce.

Out-of-Band Extraction

The out-of-band extraction methods use a debug connection for the extraction and are based on Intel(R) In-Target Probe (Intel(R) ITP) scripting. As most of the out-of-band extraction methods are defined per platform, it is recommended to explicitly specify the platform model using the --product <product>, --variant <variant>, and --stepping <stepping> options.

Depending on the platform, multiple out-of-band extraction methods may be available. One of the most common extraction methods consists of reading the Crash Log Storage from the memory-mapped input/output (MMIO) space.

To use the out-of-band extraction flow with a debug connection based on Intel(R) In-Target Probe, specify the location of the OpenIPC installation folder in the IPC_PATH environment variable.

Assuming that an OpenIPC process is already configured and running on the host, the default extraction method can be invoked by the following command:

$ intel_crashlog extract sample.crashlog

Note: This script requires the Crash Log Storage to be already mapped into the address space.
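
When the extraction is driven from a script, the IPC_PATH variable and the command line can be prepared together. The sketch below only builds the argument list and the environment; it does not talk to the target. The helper name build_extract_call is hypothetical, and placing the --product, --variant, and --stepping options after the extract subcommand is an assumption: confirm the accepted position with intel_crashlog --help.

```python
import os
from typing import Dict, List, Optional, Tuple


def build_extract_call(
    output_file: str,
    ipc_path: str,
    product: Optional[str] = None,
    variant: Optional[str] = None,
    stepping: Optional[str] = None,
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for an out-of-band extraction using the default method."""
    env = dict(os.environ if base_env is None else base_env)
    env["IPC_PATH"] = ipc_path  # location of the OpenIPC installation folder
    argv = ["intel_crashlog", "extract"]
    # Assumption: the platform options are accepted after the subcommand.
    for flag, value in (
        ("--product", product),
        ("--variant", variant),
        ("--stepping", stepping),
    ):
        if value is not None:
            argv += [flag, value]
    argv.append(output_file)
    return argv, env


# Hypothetical usage, with an OpenIPC process already running on the host:
# import subprocess
# argv, env = build_extract_call("sample.crashlog", "/path/to/openipc")
# subprocess.run(argv, env=env, check=True)
```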

Out-of-Band Manual Trigger

It is sometimes useful to manually trigger the collection of a Crash Log. When an out-of-band trigger method is available in the platform under test, it can be called from the command line tool by using the trigger subcommand:

$ intel_crashlog trigger

If this command completes successfully, a Crash Log is generated on the platform and can be extracted using out-of-band extraction.

Decoding and Analyzing a Crash Log Raw File

Once the Crash Log Raw File has been extracted, it can be decoded and analyzed. This section provides examples of how to interface with the Crash Log Raw File using a command line interface or a Python* API.

Command-Line Interface

The command-line tool is implemented through the intel_crashlog executable. It aims to provide a simple way to invoke the functionality of the Intel(R) Crash Log Framework. The following sections describe how it can be used to interact with a pre-extracted Crash Log raw file.

Decoding

The decode subcommand produces JSON-formatted output representing the values of the fields contained in the Crash Log. The output can be redirected to a file by using the -o <file> option. If no file is specified, the JSON is printed to standard output.

$ intel_crashlog decode /path/to/crashlog_file
{
    "crashlog_data": {
        "PMC": {
            "art_timestamp": "0x4e476eec78l",
            "crashlog_reason": "0x8",
            "crashlog_version": "0x1001001",
            "crashlog_completion_status": "0x800001ffl",
            "pmc_fw_engineering_version": "0x0",
            "pmc_fw_release_version": "0x4e",
            "reset_sequence_id": "0x0",
            ...
        },
        ...
    }
}
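
In the excerpt above, field values are hexadecimal strings, and some carry a trailing l suffix. A script that consumes the decoded JSON may want plain integers instead. The following is a minimal sketch, assuming every value that starts with 0x follows the format shown in the excerpt; the helper names are illustrative and not part of the framework.

```python
import json
from typing import Any


def hex_to_int(text: str) -> int:
    """Convert a value such as '0x8' or '0x4e476eec78l' to an integer."""
    # Drop the optional trailing 'l'/'L' suffix seen in the sample output.
    return int(text.rstrip("lL"), 16)


def convert_fields(node: Any) -> Any:
    """Recursively replace hexadecimal strings with integers."""
    if isinstance(node, dict):
        return {key: convert_fields(value) for key, value in node.items()}
    if isinstance(node, str) and node.lower().startswith("0x"):
        return hex_to_int(node)
    return node


# A fragment of the decode output shown above.
decoded = json.loads(
    '{"crashlog_data": {"PMC": {"art_timestamp": "0x4e476eec78l",'
    ' "crashlog_reason": "0x8"}}}'
)
pmc = convert_fields(decoded)["crashlog_data"]["PMC"]
print(pmc["crashlog_reason"])
```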

The decoding of two Crash Log files can be compared with the diff command:

$ intel_crashlog diff example1.crashlog example2.crashlog
--- example1.crashlog
+++ example2.crashlog
@@ -21,5 +21,5 @@
-        "art_timestamp": "0x6737b32eL",
+        "art_timestamp": "0x1be83d64L",
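
A comparable comparison can be produced in a script from two decoded JSON documents. The sketch below uses Python's standard difflib module as a stand-in; it is not the implementation of the diff subcommand, and the function name diff_decoded is hypothetical.

```python
import difflib
import json
from typing import Any, Dict


def diff_decoded(
    first: Dict[str, Any], second: Dict[str, Any], name_a: str, name_b: str
) -> str:
    """Return a unified diff of two decoded Crash Logs (empty if identical)."""
    # Serialize with a stable key order so that only value changes show up.
    lines_a = json.dumps(first, indent=4, sort_keys=True).splitlines()
    lines_b = json.dumps(second, indent=4, sort_keys=True).splitlines()
    diff = difflib.unified_diff(
        lines_a, lines_b, fromfile=name_a, tofile=name_b, lineterm=""
    )
    return "\n".join(diff)


a = {"crashlog_data": {"PMC": {"art_timestamp": "0x6737b32eL"}}}
b = {"crashlog_data": {"PMC": {"art_timestamp": "0x1be83d64L"}}}
print(diff_decoded(a, b, "example1.crashlog", "example2.crashlog"))
```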

Analysis

The complete analysis of a Crash Log file can be displayed with the following command:

$ intel_crashlog analyze /path/to/crashlog_file

The goal of the analysis is to provide diagnostic information to characterize the crash based on the Crash Log data.

The output of this command is formatted in Markdown* and adopts the following structure:

Subsystem Name - BUCKET
=======================

Subsystem analysis

where:

  • Subsystem Name: Name of the system analyzed in the section.

  • BUCKET: The field representing the category of the analysis. It can be:

    • NO_ERROR_DETECTED: the subsystem analyzed is healthy and is highly unlikely to be involved in the platform crash.

    • UNKNOWN: the analysis could not make a conclusion about the state of the subsystem.

    • FAILED: Analysis failed due to an internal error.

    • Other values represent the type of the error detected in the subsystem.

  • Subsystem analysis: A Markdown-formatted explanation of the state of the subsystem during the platform crash.
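
If the report has to be post-processed, the section headings can be recovered from the structure described above. The following is a minimal sketch, assuming each section title is immediately followed by an underline made only of = characters and that the bucket, when present, follows the last " - " separator. The function name parse_sections is illustrative.

```python
from typing import List, Optional, Tuple


def parse_sections(report: str) -> List[Tuple[str, Optional[str]]]:
    """Return (subsystem name, bucket) for every underlined heading."""
    lines = report.splitlines()
    sections = []
    for title, underline in zip(lines, lines[1:]):
        mark = underline.strip()
        if not title.strip() or not mark or set(mark) != {"="}:
            continue  # not a heading followed by its underline
        name, separator, bucket = title.rpartition(" - ")
        if separator:
            sections.append((name.strip(), bucket.strip()))
        else:
            # Heading without a bucket, e.g. a single analyzer's output.
            sections.append((title.strip(), None))
    return sections


report = (
    "Three-strike timeout - TIMEOUT.MLC.WDT\n"
    "======================================\n"
    "\n"
    "Three-strike timeout detected.\n"
)
print(parse_sections(report))
```

For automated classification, the triage subcommand is usually a better fit than parsing the natural-language report.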

Display the summary of the analysis with the summary command. The output is expected to be empty if no platform errors have been identified in the Crash Log file:

$ intel_crashlog summary /path/to/crashlog_file

Triage

Since the analyze subcommand is intended to produce a report in a natural language, it may not be suitable for automatic processing. For a given Crash Log raw file, the triage subcommand aggregates the buckets returned by the analyzers and prints the most relevant ones, sorted by level of severity:

$ intel_crashlog triage /path/to/crashlog_file
HW_EXCEPTION_DETECTED.PCIE.TL
TIMEOUT.PUNIT.DISPLAY

When a Crash Log sample does not report any error, the NO_ERROR_DETECTED bucket is returned by the triage command.

$ intel_crashlog triage manually_triggered.crashlog
NO_ERROR_DETECTED
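
Because the triage output is one bucket per line, it is straightforward to consume from a script. The sketch below splits the output into buckets and flags whether any of them reports an error. Treating every bucket that starts with NO_ERROR_DETECTED as "no error" is an assumption, and both helper names are illustrative.

```python
from typing import List


def parse_buckets(triage_output: str) -> List[str]:
    """Split the triage output into buckets, keeping the printed order."""
    return [line.strip() for line in triage_output.splitlines() if line.strip()]


def has_platform_error(buckets: List[str]) -> bool:
    """Return True if at least one bucket is not a NO_ERROR_DETECTED bucket."""
    # Assumption: sub-buckets of NO_ERROR_DETECTED are also harmless.
    return any(not bucket.startswith("NO_ERROR_DETECTED") for bucket in buckets)


# Hypothetical usage:
# import subprocess
# output = subprocess.run(
#     ["intel_crashlog", "triage", "/path/to/crashlog_file"],
#     capture_output=True, text=True, check=True,
# ).stdout
# print(has_platform_error(parse_buckets(output)))
```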

Using a Collateral Patch

The Intel(R) Crash Log Framework can be dynamically extended by using Collateral Patches. These are occasionally used to provide support for a Crash Log layout that is uncommon or not yet officially supported in the mainstream release.

The typical steps to use a collateral patch are:

  1. If archived, extract the collateral patch. Example:

$ unzip ICPN_patch_rev02.zip
$ tree ./ICPN_patch_rev02
./ICPN_patch_rev02
`-- ICP
    |-- N
    |   `-- all
    |       `-- white
    |           `-- crashlog
    |               `-- decode-defs
    |                   |-- PMC
    |                   |   |-- 1
    |                   |   |   `-- layout.xml
    |                   |   `-- 2
    |                   |       `-- layout.xml
    |                   `-- PMC_FW_Trace
    |                       |-- 1
    |                       |   `-- layout.xml
    |                       `-- 2
    |                           `-- layout.xml
    `-- all
        `-- all
            `-- white
                `-- crashlog
                    `-- extractors
                        `-- tap2sb.py
  2. Specify the location of the extracted patch using the --ct option before the command argument:

$ intel_crashlog --ct <patch> <command> <crashlog_file>

Example:

$ intel_crashlog --ct ./ICPN_patch_rev02 info /path/to/crashlog_file
- 0000-027f: PMC_FW_Trace - ICP/N (RT: 0x02, PID: 0x005, Rev:0x02)

Miscellaneous Commands

List available analyzers for the specified Crash Log file:

$ intel_crashlog list /path/to/crashlog_file
global_reset
============

- Path: CNP\all\all\red\crashlog\analyzers\global_reset.py
- Full name: Global Reset
- Full version: 1.4.0
- Description: Analyze global reset cause
...

Manually specify which analyzer to run (here, system state):

$ intel_crashlog analyze /path/to/crashlog_file system_state
System State
============

Current state: S0
Attempted state: UNKNOWN
System state transition: UNKNOWN

Display the cause of the Crash Log trigger:

$ intel_crashlog analyze /path/to/crashlog_file crashlog_reason
PMC Crashlog Reason - NO_ERROR_DETECTED.CPU_TRIG
================================================

This Crash Log was triggered by the CPU.

Punit Crashlog Reason - NO_ERROR_DETECTED.MANUAL.PUNIT
======================================================

Punit Crash Log has been manually triggered.

Export the Crash Log Report of the analysis to JSON:

$ intel_crashlog report -o crashlog_report.json /path/to/crashlog_file

Python* API

The Intel(R) Crash Log Framework can also be called from Python*. This section presents a typical usage flow using the Python* API.

The Crash Log module can be loaded into a Python* environment using:

>>> import intel.crashlog as crashlog

Decoding

First, the Crash Log has to be loaded from a file into a bytearray object:

>>> crashlog_dump = crashlog.extract_from_file("gblrst.crashlog")

The bytearray object can then be decoded to a register object:

>>> crashlog_regs = crashlog.decode(crashlog_dump)

The decoded values can be accessed from the register object as regular Python attributes:

>>> crashlog_regs.PMC.hests1
hests1 = 0x2112080
  batlow_sts = 0x0
  crda_sent = 0x0
  cts = 0x0
  flex_sku_done = 0x1
  grst_2_host = 0x0
  host_prim_rst_sts = 0x1
  ...
>>> int(crashlog_regs.PMC.hests1)
34676864
>>> int(crashlog_regs.PMC.hests1.host_prim_rst_sts)
1
>>> crashlog_regs.PMC.revision
2

Analysis

Finally, the decoded registers can be analyzed. The analyze function takes the register object as an argument and returns a dict containing the analysis and the metadata of the scripts used.

>>> crashlog_analysis = crashlog.analyze(crashlog_regs)
>>> crashlog_analysis
{
    'System State': [
        {
            'Autorun': False,
            'Description': 'Analyze system state',
            'Full name': 'System State',
            'Path': 'path/to/system_state.py',
            'Version': '1.1.0',
            'html': u'...',
            'markdown': [
                'System State',
                '------------',
                '',
                '- `Current state`: S0'
             ],
            'bucket': 'TIMEOUT'
        }
    ],
    ...
}
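
The returned dict can be walked like any other Python structure. The following is a minimal sketch that collects the buckets reported by each analyzer, assuming the layout shown above (analyzer name mapped to a list of result dicts with a 'bucket' key). The function name buckets_by_analyzer is illustrative.

```python
from typing import Any, Dict, List


def buckets_by_analyzer(
    analysis: Dict[str, List[Dict[str, Any]]]
) -> Dict[str, List[str]]:
    """Map each analyzer name to the list of buckets it reported."""
    return {
        name: [entry["bucket"] for entry in entries if "bucket" in entry]
        for name, entries in analysis.items()
    }


# A reduced copy of the structure shown above.
sample_analysis = {
    "System State": [
        {
            "Full name": "System State",
            "markdown": ["System State", "------------", "", "- `Current state`: S0"],
            "bucket": "TIMEOUT",
        }
    ]
}
print(buckets_by_analyzer(sample_analysis))
# The Markdown text of an entry can be rebuilt by joining its lines:
print("\n".join(sample_analysis["System State"][0]["markdown"]))
```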

Triage

Use the triage function to obtain the list of the buckets suitable for the analyzed Crash Log from the analysis report. The list is ordered according to the severity level of each bucket.

>>> buckets = crashlog.triage(crashlog_analysis)
>>> buckets
["HW_EXCEPTION_DETECTED.PCIE.TL.PL", "TIMEOUT.PUNIT.DISPLAY"]
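
The four calls shown in this section can be chained into a single helper. The sketch below takes the module as a parameter, so that it can be exercised without the package installed; the function name triage_file is hypothetical, while extract_from_file, decode, analyze, and triage are the functions described above.

```python
from typing import Any, List


def triage_file(path: str, api: Any) -> List[str]:
    """Load, decode, analyze, and triage one Crash Log Raw File."""
    dump = api.extract_from_file(path)  # bytearray with the raw Crash Log
    registers = api.decode(dump)        # register object
    analysis = api.analyze(registers)   # dict with analysis and metadata
    return api.triage(analysis)         # buckets ordered by severity


# Hypothetical usage with the framework installed:
# import intel.crashlog as crashlog
# print(triage_file("gblrst.crashlog", crashlog))
```

Keeping the intermediate objects in separate variables makes it easy to return the analysis as well when the natural-language report is also needed.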

Glossary

Collateral Patch

A set of files used to extend the Intel(R) Crash Log Framework in order to support additional hardware targets.

Crash Log Extraction

Refers to the reading or transformation of a Crash Log from a Crash Log Storage to a computer file (Crash Log Raw File).

Crash Log Raw File

The Crash Log Raw File is a computer file representation of the crash data.

Crash Log Report

JSON file produced by the Intel(R) Crash Log Framework containing the complete interpretation of a Crash Log Raw File.

Crash Log Storage

Persistent memory storage containing the Crash Log.