[PATCH v2 00/17] Solve iommu probe races around iommu_fwspec

Jason Gunthorpe jgg at nvidia.com
Tue Nov 21 09:55:57 PST 2023


On Tue, Nov 21, 2023 at 04:06:15PM +0000, Robin Murphy wrote:

> > Obviously. I rejected that right away because of how incredibly
> > wrongly layered and hacky it is to do something like that.
> 
> What, and dressing up the fundamental layering violation by baking it even
> further into the API flow, while still not actually fixing it or any of its
> *other* symptoms, is somehow better?

It puts the locks and the controlled data in the right place, in the
right layer.

> Ultimately, this series is still basically doing the same thing my patch
> does - extending the scope of the existing iommu_probe_device_lock hack to
> cover fwspec creation. A hack is a hack, so frankly I'd rather it be simple
> and obvious and look like one, and being easy to remove again is an obvious
> bonus too.

Not entirely, as I've said this series removes the use of the dev->iommu
during fwspec creation. This is different than just extending the lock.

> > So, I would move toward having the driver's probe invoke a helper like:
> > 
> >     iommu_of_xlate(dev, fwspec, &per_fwnode_function, &args);
> > 
> > Which generates the same list of (fwnode_handle, of_phandle_args) that
> > was passed to of_xlate today, but is ordered sensibly within the
> > sequence of probe for what many drivers seem to want to do.
> 
> Grep for of_xlate. It is a standard and well-understood callback pattern for
> a subsystem to parse a common DT binding and pass a driver-specific
> specifier to a driver to interpret. Or maybe you just have a peculiar
> definition of what you think "micro-optimisation" means? :/

Don't know about other subsystems, here is making a mess. Look at what
the drivers are doing. The two SMMU drivers are sort of sane, but
everything else has coded half their pobe into of_xlate. It doesn't
make alot of sense.

> > So, it is not so much that that the idea of of_xlate goes away, but
> > the specific op->of_xlate does, it gets shifted into a helper that
> > invokes the same function in a more logical spot.
> 
> I'm curious how you imagine an IOMMU driver's ->probe function could be
> called *before* parsing the firmware to work out what, if any, IOMMU, and
> thus driver, a device is associated with.

You've jumped ahead here, I'm talking about removing ops->of_xlate.

With everything kept the same as today this means we'd scan the FW
description twice. Once to find the ops and once inside the driver to
parse it.

When I say micro-optimization this is what I mean - structuring the
code to try to avoid multiple-scans of the FW table.

Once the drivers are self-parsing I see there are two paths to keep it
at one FW parse:

 1) Have the core code parse and cache common cases in the iommu_fwspec.
    Driver then pulls out the common cases from the iommu_fwspec and
    reparsed in uncommon cases.

 2) Accept we have only a single ops in all real systems and just
    invoke the driver to parse it. That parse will cache enough
    information to decide if EPROBE_DEFER is needed.

In either case the drivers would call the same APIs and have the same
logic. The choice is just infrastructure-side stuff.

It is a different view than trying to do everything up front, but I'm
not seeing that it is so differently efficient as to be a performance
problem.

On the plus side it looks to make the drivers alot simpler and more
logical.

> And then every driver has to have
> identical boilerplate to go off and parse the generic "iommus" binding...
> which is the whole damn reason for *not* going down that route and instead
> using an of_xlate mechanism in the first place.

Let's not guess. I've attached below a sketch conversion for
apple-dart.

Diffstat says it *saves* 1 line. But also we fix several bugs!

 - iommu_of_xlate() will check for NULL fwspec and reject the bus
   probe path

 - iommu_of_xlate() can match the iommu's from the iommu list and
   check that the OF actually points only to iommus with this driver's
   ops (missed sanity check)

 - The of parsing machinery already computes the iommu_driver but throws
   it out. This forces all of the drivers to do their own thing. Pass
   it in and thus apple_dart_of_xlate() doesn't need to mess around
   with of_find_device_by_node() and we fix the bug where it leaks the
   reference on the struct device (woops!)

 - We don't leak the cfg allocation that apple_dart_of_xlate() did on
   various error paths. All error paths logically free in probe. We
   don't have to think about what happens if of_xlate is called twice
   for the same args on error/reprobe paths.

Many drivers follow this pattern of apple-dart and would end up like this.

Drivers that just need the u32 array would be simpler, more like:

   struct iommu_driver *instance;
   unsigned int num_ids;

   instance = iommu_of_get_iommu(..., &num_ids);
   if (IS_ERR(instance))
      return ERR_CAST(instance);
   cfg = kzalloc(struct_size(cfg, sids, num_ids), GFP_KERNEL);
   if (!cfg)
      return -ENOMEM;
   iommu_of_read_u32_ids(..., &cfg->sids);
   [..]
   return instance;

I haven't sketched *every* driver yet, but I've sketched enough to be
confident.

Robin, I know it is not what you have in your head, but you should
stop with the insults and be more open to other perspectives.

> > The per-device data can be allocated at the top of probe and passed
> > through args to fix the lifetime bugs.
> > 
> > It is pretty simple to do.
> 
> I believe the kids these days would say "Say you don't understand the code
> without saying you don't understand the code."

I think you are too fixated on what you have in your mind. It will
take me a bit but I will come with a series to move all the FW parsing
into probe along this model. Once done it is trivial to make bus probe
work like it should.

Regarding this series, I don't really see a problem. There is no
"concrete" or anything like that.

> > Not quite. This decouples two unrelated things into seperate
> > concerns. It is not so much about the concurrency but removing the
> > abuse of dev->iommu by code that has no need to touch it at all.
> 
> Sorry, the "abuse" of storing IOMMU-API-specific data in the place we
> intentionally created to consolidate all the IOMMU-API-specific data
> into?

The global state should not be filled until the driver is probed. It
is an abuse to use a global instead of an on-stack variable when
building it. Publishing only fully initialized data to global
visibility is concurrency 101. :(

Jason

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index ee05f4824bfad1..476938722460b8 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -721,19 +721,29 @@ static struct iommu_domain apple_dart_blocked_domain = {
 
 static struct iommu_device *apple_dart_probe_device(struct device *dev)
 {
-	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
+	struct apple_dart_master_cfg *cfg;
 	struct apple_dart_stream_map *stream_map;
+	int ret;
 	int i;
 
+	cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
 	if (!cfg)
-		return ERR_PTR(-ENODEV);
+		return ERR_PTR(-ENOMEM);
+	ret = iommu_of_xlate(dev_iommu_fwspec_get(dev), &apple_dart_iommu_ops,
+			     1, &apple_dart_of_xlate, cfg);
+	if (ret)
+		goto err_free;
 
 	for_each_stream_map(i, cfg, stream_map)
 		device_link_add(
 			dev, stream_map->dart->dev,
 			DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
 
+	dev_iommu_priv_set(dev, cfg);
 	return &cfg->stream_maps[0].dart->iommu;
+err_free:
+	kfree(cfg);
+	return ret;
 }
 
 static void apple_dart_release_device(struct device *dev)
@@ -777,25 +787,15 @@ static void apple_dart_domain_free(struct iommu_domain *domain)
 	kfree(dart_domain);
 }
 
-static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args)
+static int apple_dart_of_xlate(struct iommu_device *iommu,
+			       struct of_phandle_args *args, void *priv)
 {
-	struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev);
-	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
-	struct apple_dart *dart = platform_get_drvdata(iommu_pdev);
-	struct apple_dart *cfg_dart;
-	int i, sid;
+	struct apple_dart *dart = container_of(iommu, struct apple_dart, iommu);
+	struct apple_dart_master_cfg *cfg = priv;
+	struct apple_dart *cfg_dart = cfg->stream_maps[0].dart;
+	int sid = args->args[0];
+	int i;
 
-	if (args->args_count != 1)
-		return -EINVAL;
-	sid = args->args[0];
-
-	if (!cfg)
-		cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
-	if (!cfg)
-		return -ENOMEM;
-	dev_iommu_priv_set(dev, cfg);
-
-	cfg_dart = cfg->stream_maps[0].dart;
 	if (cfg_dart) {
 		if (cfg_dart->supports_bypass != dart->supports_bypass)
 			return -EINVAL;
@@ -980,7 +980,6 @@ static const struct iommu_ops apple_dart_iommu_ops = {
 	.probe_device = apple_dart_probe_device,
 	.release_device = apple_dart_release_device,
 	.device_group = apple_dart_device_group,
-	.of_xlate = apple_dart_of_xlate,
 	.def_domain_type = apple_dart_def_domain_type,
 	.get_resv_regions = apple_dart_get_resv_regions,
 	.pgsize_bitmap = -1UL, /* Restricted during dart probe */



More information about the linux-riscv mailing list