[PATCH V17 01/11] acpi: apei: read ack upon ghes record consumption

Baicar, Tyler tbaicar at codeaurora.org
Fri Jun 30 09:47:17 PDT 2017


On 6/30/2017 4:10 AM, Robert Richter wrote:
> Tyler,
>
> On 19.05.17 14:32:03, Tyler Baicar wrote:
>> A RAS (Reliability, Availability, Serviceability) controller
>> may be a separate processor running in parallel with OS
>> execution, and may generate error records for consumption by
>> the OS. If the RAS controller produces multiple error records,
>> then they may be overwritten before the OS has consumed them.
>>
>> The Generic Hardware Error Source (GHES) v2 structure
>> introduces the capability for the OS to acknowledge the
>> consumption of the error record generated by the RAS
>> controller. A RAS controller supporting GHESv2 shall wait for
>> the acknowledgment before writing a new error record, thus
>> eliminating the race condition.
>>
>> Add support for parsing of GHESv2 sub-tables as well.
>>
>> Signed-off-by: Tyler Baicar <tbaicar at codeaurora.org>
>> CC: Jonathan (Zhixiong) Zhang <zjzhang at codeaurora.org>
>> Reviewed-by: James Morse <james.morse at arm.com>
>> ---
>>   drivers/acpi/apei/ghes.c | 59 +++++++++++++++++++++++++++++++++++++++++++++---
>>   drivers/acpi/apei/hest.c |  7 ++++--
>>   include/acpi/ghes.h      |  5 +++-
>>   3 files changed, 65 insertions(+), 6 deletions(-)
>>   static int ghes_proc(struct ghes *ghes)
>>   {
>>   	int rc;
>> @@ -661,6 +704,16 @@ static int ghes_proc(struct ghes *ghes)
>>   			ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>>   	}
>>   	ghes_do_proc(ghes, ghes->estatus);
>> +
>> +	/*
>> +	 * GHESv2 type HEST entries introduce support for error acknowledgment,
>> +	 * so only acknowledge the error if this support is present.
>> +	 */
>> +	if (is_hest_type_generic_v2(ghes)) {
>> +		rc = ghes_ack_error(ghes->generic_v2);
>> +		if (rc)
>> +			return rc;
>> +	}
>>   out:
>>   	ghes_clear_estatus(ghes);
>>   	return rc;
> was there any specific reason why the ack is sent before clearing the
> block status? Spec says the ack should be sent at last.
>
> Also, the block is never cleared if ghes_ack_error() returns an error.
> IMO we should fall through and clear the block status (this will
> change anyway if the bloc status is cleared first).
Hello Robert,

Thank you for pointing this out. I will send a patch to move the ack 
after the ghes_clear_estatus. This is probably the right thing to do 
since right now if the FW populates an invalid estatus, we will fail to 
read the estatus, jump to 'out:', and never send the ack.

Thanks,
Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.




More information about the linux-arm-kernel mailing list