[PATCH v3 1/8] trace: ras: add ARM processor error information trace event
Baicar, Tyler
tbaicar at codeaurora.org
Fri Apr 14 13:36:06 PDT 2017
On 3/30/2017 4:31 AM, Xie XiuQi wrote:
> Add a new trace event for ARM processor error information, so that
> the user will know what error occurred. With this information the
> user may take appropriate action.
>
> These trace events are consistent with the ARM processor error
> information table which defined in UEFI 2.6 spec section N.2.4.4.1.
>
> ---
> v2: add trace enabled condition as Steven's suggestion.
> fix a typo.
> ---
>
> Cc: Steven Rostedt <rostedt at goodmis.org>
> Cc: Tyler Baicar <tbaicar at codeaurora.org>
> Signed-off-by: Xie XiuQi <xiexiuqi at huawei.com>
> ---
...
>
> +#define ARM_PROC_ERR_TYPE \
> + EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" ) \
> + EM ( CPER_ARM_INFO_TYPE_TLB, "TLB error" ) \
> + EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" ) \
> + EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )
> +
> +#define ARM_PROC_ERR_FLAGS \
> + EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" ) \
> + EM ( CPER_ARM_INFO_FLAGS_LAST, "Last error captured" ) \
> + EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" ) \
> + EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" )
> +
Hello Xie XiuQi,
This isn't compiling for me because of these definitions. Here you are
using ARM_*, but below in the TP_printk you are using ARCH_*. The
compiler complains the ARCH_* ones are undefined:
./include/trace/../../include/ras/ras_event.h:278:37: error: 'ARCH_PROC_ERR_TYPE' undeclared (first use in this function)
   __print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE),
./include/trace/../../include/ras/ras_event.h:280:38: error: 'ARCH_PROC_ERR_FLAGS' undeclared (first use in this function)
   __print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS),
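To illustrate why the names have to match, here is a minimal user-space sketch of the EM()/EMe() list pattern. It is not the kernel's tracing code: the DEMO_* enum, the demo_sym table and demo_type_name() are hypothetical stand-ins for the CPER_ARM_INFO_TYPE_* values and for what __print_symbolic() does with the list. The point is only that the list macro is expanded by name, so the name used at the point of expansion must be the one that was #defined.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CPER_ARM_INFO_TYPE_* values. */
enum demo_err_type {
	DEMO_TYPE_CACHE,
	DEMO_TYPE_TLB,
	DEMO_TYPE_BUS,
	DEMO_TYPE_UARCH,
};

/* One list macro: EM() for every entry but the last, EMe() for the last. */
#define DEMO_PROC_ERR_TYPE						\
	EM(DEMO_TYPE_CACHE, "cache error")				\
	EM(DEMO_TYPE_TLB, "TLB error")					\
	EM(DEMO_TYPE_BUS, "bus error")					\
	EMe(DEMO_TYPE_UARCH, "micro-architectural error")

struct demo_sym {
	int value;
	const char *name;
};

/* Map each list entry to a { value, string } initializer. */
#define EM(a, b)	{ a, b },
#define EMe(a, b)	{ a, b }

/*
 * The list is expanded here by name. Writing a name that was never
 * #defined (e.g. a differently prefixed one) leaves a plain undeclared
 * identifier in the initializer, which is the error shown above.
 */
static const struct demo_sym demo_type_syms[] = { DEMO_PROC_ERR_TYPE };

#undef EM
#undef EMe

/* Look up the string for a value, falling back to "unknown". */
static const char *demo_type_name(int value)
{
	size_t i;

	for (i = 0; i < sizeof(demo_type_syms) / sizeof(demo_type_syms[0]); i++)
		if (demo_type_syms[i].value == value)
			return demo_type_syms[i].name;
	return "unknown";
}
```

With this sketch, demo_type_name(DEMO_TYPE_TLB) returns the "TLB error" string, and any value not in the list falls back to "unknown".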
> +/*
> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
> + * via TRACE_DEFINE_ENUM().
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
> +
> +ARM_PROC_ERR_TYPE
> +ARM_PROC_ERR_FLAGS
Are the above two lines supposed to be here?
> +
> +/*
> + * Now redefine the EM() and EMe() macros to map the enums to the strings
> + * that will be printed in the output.
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) { a, b },
> +#define EMe(a, b) { a, b }
> +
> +TRACE_EVENT(arm_proc_err,
I think it would be better to keep this consistent with the naming of the
current RAS trace events (right now we have mc_event, arm_event,
aer_event, etc.). I would suggest using "arm_err_info_event" since this
is handling the error information structures of the ARM errors.
> +
> + TP_PROTO(const struct cper_arm_err_info *err),
> +
> + TP_ARGS(err),
> +
> + TP_STRUCT__entry(
> + __field(u8, type)
> + __field(u16, multiple_error)
> + __field(u8, flags)
> + __field(u64, error_info)
> + __field(u64, virt_fault_addr)
> + __field(u64, physical_fault_addr)
The validation bits should also be part of this structure, so that user
space tools can tell which of these fields are valid.
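As a rough user-space sketch of what I mean (not the actual trace entry layout): the DEMO_VALID_* bit values, struct demo_err_info_entry and demo_field_valid() below are hypothetical names for illustration, and the real bit definitions are the CPER_ARM_INFO_VALID_* ones. Without the validation bits, a consumer cannot tell an address that really is 0 from one that was zeroed because it was not valid.

```c
#include <stdint.h>

/* Hypothetical bit values standing in for CPER_ARM_INFO_VALID_*. */
#define DEMO_VALID_MULTI_ERR		(1u << 0)
#define DEMO_VALID_FLAGS		(1u << 1)
#define DEMO_VALID_ERR_INFO		(1u << 2)
#define DEMO_VALID_VIRT_ADDR		(1u << 3)
#define DEMO_VALID_PHYSICAL_ADDR	(1u << 4)

/* Same fields as the proposed entry, plus the validation bits. */
struct demo_err_info_entry {
	uint16_t validation_bits;
	uint8_t  type;
	uint16_t multiple_error;
	uint8_t  flags;
	uint64_t error_info;
	uint64_t virt_fault_addr;
	uint64_t physical_fault_addr;
};

/* Return nonzero if the field guarded by 'bit' was reported as valid. */
static int demo_field_valid(const struct demo_err_info_entry *e,
			    unsigned int bit)
{
	return (e->validation_bits & bit) != 0;
}
```

A consumer would then check, for example, demo_field_valid(&entry, DEMO_VALID_PHYSICAL_ADDR) before trusting physical_fault_addr, rather than guessing from a zero or all-ones placeholder.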
> + ),
> +
> + TP_fast_assign(
> + __entry->type = err->type;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
> + __entry->multiple_error = err->multiple_error;
> + else
> + __entry->multiple_error = ~0;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
> + __entry->flags = err->flags;
> + else
> + __entry->flags = ~0;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
> + __entry->error_info = err->error_info;
> + else
> + __entry->error_info = 0ULL;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
> + __entry->virt_fault_addr = err->virt_fault_addr;
> + else
> + __entry->virt_fault_addr = 0ULL;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
> + __entry->physical_fault_addr = err->physical_fault_addr;
> + else
> + __entry->physical_fault_addr = 0ULL;
> + ),
> +
> + TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
I think the "ARM Processor Error:" part of this should just be removed.
Here's the output with this removed and the trace event renamed to
arm_err_info_event. I think this looks much cleaner and matches the
style used with the arm_event.
<idle>-0 [020] .ns. 366.592434: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510f8000; running state: 1; PSCI state: 0
<idle>-0 [020] .ns. 366.592437: arm_err_info_event: type cache error; count: 0; flags: 0x3; error info: 0000000000c20058; virtual address: 0000000000000000; physical address: 0000000000000000
Thanks,
Tyler
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.