[PATCH v3 1/8] trace: ras: add ARM processor error information trace event

Baicar, Tyler tbaicar at codeaurora.org
Fri Apr 14 13:36:06 PDT 2017


On 3/30/2017 4:31 AM, Xie XiuQi wrote:
> Add a new trace event for ARM processor error information, so that
> the user will know what error occurred. With this information the
> user may take appropriate action.
>
> These trace events are consistent with the ARM processor error
> information table which defined in UEFI 2.6 spec section N.2.4.4.1.
>
> ---
> v2: add trace enabled condition as Steven's suggestion.
>      fix a typo.
> ---
>
> Cc: Steven Rostedt <rostedt at goodmis.org>
> Cc: Tyler Baicar <tbaicar at codeaurora.org>
> Signed-off-by: Xie XiuQi <xiexiuqi at huawei.com>
> ---
...
>   
> +#define ARM_PROC_ERR_TYPE	\
> +	EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" )	\
> +	EM ( CPER_ARM_INFO_TYPE_TLB,  "TLB error" )	\
> +	EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" )	\
> +	EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )
> +
> +#define ARM_PROC_ERR_FLAGS	\
> +	EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" )	\
> +	EM ( CPER_ARM_INFO_FLAGS_LAST,  "Last error captured" )	\
> +	EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" )	\
> +	EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" )
> +
Hello Xie XiuQi,

This isn't compiling for me because of these definitions. Here you are 
using ARM_*, but below in the TP_printk you are using ARCH_*. The 
compiler complains the ARCH_* ones are undefined:

./include/trace/../../include/ras/ras_event.h:278:37: error: 
'ARCH_PROC_ERR_TYPE' undeclared (first use in this function)
      __print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE),
./include/trace/../../include/ras/ras_event.h:280:38: error: 
'ARCH_PROC_ERR_FLAGS' undeclared (first use in this function)
      __print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS),

> +/*
> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
> + * via TRACE_DEFINE_ENUM().
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
> +#define EMe(a, b)	TRACE_DEFINE_ENUM(a);
> +
> +ARM_PROC_ERR_TYPE
> +ARM_PROC_ERR_FLAGS
Are the above two lines supposed to be here?
> +
> +/*
> + * Now redefine the EM() and EMe() macros to map the enums to the strings
> + * that will be printed in the output.
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b)		{ a, b },
> +#define EMe(a, b)	{ a, b }
> +
> +TRACE_EVENT(arm_proc_err,
I think it would be better to keep this similar to the naming of the 
current RAS trace events (right now we have mc_event, arm_event, 
aer_event, etc.). I would suggest using "arm_err_info_event" since this 
is handling the error information structures of the arm errors.
> +
> +	TP_PROTO(const struct cper_arm_err_info *err),
> +
> +	TP_ARGS(err),
> +
> +	TP_STRUCT__entry(
> +		__field(u8, type)
> +		__field(u16, multiple_error)
> +		__field(u8, flags)
> +		__field(u64, error_info)
> +		__field(u64, virt_fault_addr)
> +		__field(u64, physical_fault_addr)
Validation bits should also be a part of this structure that way user 
space tools will know which of these fields are valid.
> +	),
> +
> +	TP_fast_assign(
> +		__entry->type = err->type;
> +
> +		if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
> +			__entry->multiple_error = err->multiple_error;
> +		else
> +			__entry->multiple_error = ~0;
> +
> +		if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
> +			__entry->flags = err->flags;
> +		else
> +			__entry->flags = ~0;
> +
> +		if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
> +			__entry->error_info = err->error_info;
> +		else
> +			__entry->error_info = 0ULL;
> +
> +		if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
> +			__entry->virt_fault_addr = err->virt_fault_addr;
> +		else
> +			__entry->virt_fault_addr = 0ULL;
> +
> +		if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
> +			__entry->physical_fault_addr = err->physical_fault_addr;
> +		else
> +			__entry->physical_fault_addr = 0ULL;
> +	),
> +
> +	TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
I think the "ARM Processor Error:" part of this should just be removed. 
Here's the output with this removed and the trace event renamed to 
arm_err_info_event. I think this looks much cleaner and matches the 
style used with the arm_event.

           <idle>-0     [020] .ns.   366.592434: arm_event: affinity 
level: 2; MPIDR: 0000000000000000; MIDR: 00000000510f8000; running 
state: 1; PSCI state: 0
           <idle>-0     [020] .ns.   366.592437: arm_err_info_event: 
type cache error; count: 0; flags: 0x3; error info: 0000000000c20058; 
virtual address: 0000000000000000; physical address: 0000000000000000

Thanks,
Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.




More information about the linux-arm-kernel mailing list