[bug report] iommu/arm-smmu-v3: Event cannot be printed in some scenarios

Jason Gunthorpe jgg at ziepe.ca
Tue Aug 6 05:49:43 PDT 2024


On Mon, Aug 05, 2024 at 03:32:50PM +0000, Pranjal Shrivastava wrote:
> Here's the updated diff:
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index a31460f9f3d4..ed2b106e02dd 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1777,7 +1777,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
>  		goto out_unlock;
>  	}
>  
> -	iommu_report_device_fault(master->dev, &fault_evt);
> +	ret = iommu_report_device_fault(master->dev, &fault_evt);
>  out_unlock:
>  	mutex_unlock(&smmu->streams_mutex);
>  	return ret;
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 0e3a9b38bef2..7684e7562584 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -532,6 +532,9 @@ void intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
>  	bool last_page;
>  	u16 sid;
>  
> +	if (!evt)
> +		return;
> +

I'm not sure this makes sense?

The point of this path is for the driver to retire the fault with a
failure. This prevents that from happening on Intel, and we are back to
losing track of a fault.

All calls to iommu_report_device_fault() must result in
page_response() properly retiring whatever the event was.

> +static void iopf_error_response(struct device *dev, struct iommu_fault *fault)
> +{
> +	const struct iommu_ops *ops = dev_iommu_ops(dev);
> +	struct iommu_page_response resp = {
> +		.pasid = fault->prm.pasid,
> +		.grpid = fault->prm.grpid,
> +		.code = IOMMU_PAGE_RESP_INVALID
> +	};
> +
> +	ops->page_response(dev, NULL, &resp);
> +}

The issue originates here, why is this NULL?

void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
{

The caller has an evt? I think we should pass it down.

Looking at the abort_group path that is effectively what we do, but
the evt is copied to the group's evt first.
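
A rough userspace sketch of passing the evt down, in case it helps. This
is not kernel code: every mock_* type, field, and function here is a
made-up stand-in for illustration, and the mock driver only models "needs
the evt to find the fault it must retire".

```c
#include <stddef.h>

/* Userspace stand-ins only; these are NOT the kernel definitions. */
struct mock_fault {
	unsigned int pasid;
	unsigned int grpid;
};

struct mock_iopf_fault {
	struct mock_fault fault;
};

enum mock_resp_code { MOCK_RESP_SUCCESS, MOCK_RESP_INVALID };

struct mock_page_response {
	unsigned int pasid;
	unsigned int grpid;
	enum mock_resp_code code;
};

struct mock_ops {
	void (*page_response)(struct mock_iopf_fault *evt,
			      struct mock_page_response *resp);
};

static int mock_retired; /* number of faults the mock driver retired */

/* Hypothetical driver callback that needs evt to locate the fault. */
static void mock_driver_page_response(struct mock_iopf_fault *evt,
				      struct mock_page_response *resp)
{
	if (!evt)
		return; /* with a NULL evt the fault is never retired */
	if (evt->fault.grpid == resp->grpid)
		mock_retired++;
}

/* Error path: build the failure response and hand the caller's evt down. */
static void mock_error_response(const struct mock_ops *ops,
				struct mock_iopf_fault *evt)
{
	struct mock_page_response resp = {
		.pasid = evt->fault.pasid,
		.grpid = evt->fault.grpid,
		.code = MOCK_RESP_INVALID,
	};

	ops->page_response(evt, &resp);
}
```

With the evt passed through, the mock driver callback has what it needs
and does not have to bail out on a NULL check.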

I also noticed we have another similar issue with
report_partial_fault() losing the fault if memory allocation
fails. A goto to your new err label after report_partial_fault()
would be appropriate too.
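
The control flow I have in mind looks roughly like the userspace mock
below. It is only a sketch under assumptions: the sketch_* names are
invented, the allocation failure is simulated with a flag, and the error
response is reduced to a counter rather than a real page_response() call.

```c
#include <errno.h>
#include <stdlib.h>

static int sketch_fail_alloc;      /* set to 1 to simulate -ENOMEM */
static int sketch_error_responses; /* error responses sent so far */

/* Stand-in for sending a failure response for the fault. */
static void sketch_error_response(void)
{
	sketch_error_responses++;
}

/* Stand-in for report_partial_fault(): may fail to allocate. */
static int sketch_report_partial_fault(void)
{
	void *copy = sketch_fail_alloc ? NULL : malloc(16);

	if (!copy)
		return -ENOMEM;
	free(copy); /* real code would queue the copy; the mock drops it */
	return 0;
}

static int sketch_report_fault(int is_last_page)
{
	int ret = 0;

	if (!is_last_page) {
		ret = sketch_report_partial_fault();
		if (ret)
			goto err; /* do not silently lose the fault */
		return 0;
	}

	/* last-page handling elided in this mock */
	return 0;

err:
	sketch_error_response();
	return ret;
}
```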

Jason
