[bug report] iommu/arm-smmu-v3: Event cannot be printed in some scenarios
Jason Gunthorpe
jgg at ziepe.ca
Tue Aug 6 05:49:43 PDT 2024
On Mon, Aug 05, 2024 at 03:32:50PM +0000, Pranjal Shrivastava wrote:
> Here's the updated diff:
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index a31460f9f3d4..ed2b106e02dd 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1777,7 +1777,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
> goto out_unlock;
> }
>
> - iommu_report_device_fault(master->dev, &fault_evt);
> + ret = iommu_report_device_fault(master->dev, &fault_evt);
> out_unlock:
> mutex_unlock(&smmu->streams_mutex);
> return ret;
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 0e3a9b38bef2..7684e7562584 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -532,6 +532,9 @@ void intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
> bool last_page;
> u16 sid;
>
> + if (!evt)
> + return;
> +
I'm not sure this makes sense?
The point of this path is for the driver to retire the fault with a
failure. This prevents that from happening on Intel and we are back to
losing track of a fault.
All calls to iommu_report_device_fault() must result in
page_response() properly retiring whatever the event was.
> +static void iopf_error_response(struct device *dev, struct iommu_fault *fault)
> +{
> + const struct iommu_ops *ops = dev_iommu_ops(dev);
> + struct iommu_page_response resp = {
> + .pasid = fault->prm.pasid,
> + .grpid = fault->prm.grpid,
> + .code = IOMMU_PAGE_RESP_INVALID
> + };
> +
> + ops->page_response(dev, NULL, &resp);
> +}
The issue originates here, why is this NULL?
void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
{
The caller has an evt? I think we should pass it down.
Looking at the abort_group path that is effectively what we do, but
the evt is copied to the group's evt first.
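To illustrate the difference, here is a minimal userspace sketch, not kernel code. All mock_* types and functions are hypothetical stand-ins, trimmed to the fields used in the quoted patch, and the "retired" flag is an assumption made purely so the outcome can be observed. It only shows that an error response which forwards the caller's evt lets the driver callback retire the fault, while a NULL evt combined with the early return leaves it pending.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct mock_fault {
	unsigned int pasid;
	unsigned int grpid;
};

struct mock_iopf_fault {
	struct mock_fault fault;
	bool retired;		/* assumption: lets the sketch observe retirement */
};

struct mock_page_response {
	unsigned int pasid;
	unsigned int grpid;
	int code;
};

enum { MOCK_RESP_SUCCESS = 0, MOCK_RESP_INVALID = 1 };

/* Stand-in for a driver's page_response callback. */
void mock_page_response(struct mock_iopf_fault *evt,
			const struct mock_page_response *resp)
{
	if (!evt)
		return;		/* early return from the quoted diff: nothing is retired */

	if (resp->pasid == evt->fault.pasid && resp->grpid == evt->fault.grpid)
		evt->retired = true;
}

/* Error response that passes the caller's evt down instead of NULL. */
void mock_error_response(struct mock_iopf_fault *evt)
{
	struct mock_page_response resp = {
		.pasid = evt->fault.pasid,
		.grpid = evt->fault.grpid,
		.code = MOCK_RESP_INVALID,
	};

	mock_page_response(evt, &resp);
}
```

With the evt forwarded, the callback has what it needs to match the response to the fault; with NULL it can only bail out.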
I also noticed we have another similar issue with
report_partial_fault() losing the fault if memory allocation
fails. A goto to your new err label after report_partial_fault()
would be appropriate too.
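The control flow suggested here could look roughly like the following userspace sketch. Every name in it (partial_evt, sketch_report_partial(), sketch_error_response(), sketch_report_device_fault()) is hypothetical, and the alloc_ok parameter is an assumption used only to simulate the allocation failure. It illustrates the goto-to-err shape, not the actual body of iommu_report_device_fault().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fault event; not a kernel structure. */
struct partial_evt {
	bool last_page;
	bool retired;		/* assumption: records that an error response was sent */
};

/* Stand-in for report_partial_fault(); alloc_ok simulates the allocation result. */
int sketch_report_partial(struct partial_evt *evt, bool alloc_ok)
{
	(void)evt;
	return alloc_ok ? 0 : -ENOMEM;
}

/* Stand-in for the error response added by the patch. */
void sketch_error_response(struct partial_evt *evt)
{
	evt->retired = true;
}

int sketch_report_device_fault(struct partial_evt *evt, bool alloc_ok)
{
	int ret;

	if (!evt->last_page) {
		ret = sketch_report_partial(evt, alloc_ok);
		if (ret)
			goto err;	/* do not silently drop the fault */
		return 0;
	}

	/* last-page handling is out of scope for this sketch */
	return 0;

err:
	sketch_error_response(evt);
	return ret;
}
```

On the failure path the caller gets the error code back and the event is still answered; on success nothing changes.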
Jason