[PATCH] amd iommu: force flush of iommu prior during shutdown

Thu Apr 1 10:47:36 EDT 2010

On Thu, Apr 01, 2010 at 04:29:02PM +0200, Joerg Roedel wrote:
> Hi Neil,
> 
> first some general words about the problem you discovered: The problem
> is not caused by in-flight DMA. The problem is that the IOMMU hardware
> has cached the old DTE entry for the device (including the old
> page-table root pointer) and is using it still when the kdump kernel has
> booted. We had this problem once and fixed it by flushing a DTE in the
> IOMMU before it is used for the first time. This seems to be broken
> now. Which kernel have you seen this on?
> 
First noted on 2.6.32 (the RHEL6 beta kernel), but I've reproduced it with the
latest Linus tree as well.
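
For anyone following the thread, the stale-DTE behaviour described above can be
sketched as a toy model. This is not kernel code: the struct layout and the
toy_* names are made up for illustration, and the "cache" is just a
per-device copy standing in for whatever the IOMMU hardware really caches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_DEVICES 4

/* Toy device table entry: only the page-table root matters here. */
struct toy_dte {
    uint64_t pt_root;
};

/* Toy IOMMU: the in-memory device table plus a per-device cached copy. */
struct toy_iommu {
    struct toy_dte table[NUM_DEVICES];  /* what the kernel wrote to memory */
    struct toy_dte cache[NUM_DEVICES];  /* what the "hardware" has cached */
    bool cached[NUM_DEVICES];
};

/* "Hardware" lookup: a cached entry wins over the in-memory table. */
static uint64_t toy_lookup_root(struct toy_iommu *iommu, unsigned devid)
{
    assert(devid < NUM_DEVICES);
    if (!iommu->cached[devid]) {
        iommu->cache[devid] = iommu->table[devid];
        iommu->cached[devid] = true;
    }
    return iommu->cache[devid].pt_root;
}

/* The kernel writes a new DTE to memory; the cached copy is untouched. */
static void toy_set_dte(struct toy_iommu *iommu, unsigned devid, uint64_t root)
{
    assert(devid < NUM_DEVICES);
    iommu->table[devid].pt_root = root;
}

/* Stand-in for invalidating one device's cached DTE. */
static void toy_flush_dte(struct toy_iommu *iommu, unsigned devid)
{
    assert(devid < NUM_DEVICES);
    iommu->cached[devid] = false;
}
```

If the old kernel's entry has been looked up once, a later toy_set_dte() from
the kdump kernel has no effect on toy_lookup_root() until toy_flush_dte() is
called for that device, which is the "flush before first use" requirement.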

> I am back in office next tuesday and will look into this problem too.
> 
Thank you.

> On Wed, Mar 31, 2010 at 04:27:45PM -0400, Neil Horman wrote:
> > So I'm officially rescinding this patch.
> 
> Yeah, the right solution to this problem is to find out why every DTE is
> no longer flushed before first use.
> 
Right, I've checked the commits that Chris noted in his previous email and
they're in place, so I'm not sure how we're getting stale DTEs.

> > It apparently just covered up the problem, rather than solved it
> > outright.  This is going to take some more thought on my part.  I read
> > the code a bit closer, and the amd iommu on boot up currently marks
> > all its entries as valid and having a valid translation (because if
> > they're marked as invalid they're passed through untranslated which
> > strikes me as dangerous, since it means a dma address treated as a bus
> > address could lead to memory corruption).  The saving grace is that
> > they are marked as non-readable and non-writeable, so any device doing
> > a dma after the reinit should get logged (which it does), and then
> > target aborted (which should effectively squash the translation).
> 
> Right. The default for all devices is to forbid DMA.
> 
Thanks, glad to know I read that right, took me a bit to understand it :)
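
A minimal sketch of that default, as I understand it from the discussion above:
the entry is marked valid with a valid translation, but with neither read nor
write permission, so any DMA is aborted rather than passed through. The TOY_*
names and bit positions are assumptions for illustration only (check the AMD
IOMMU specification for the real layout), and the passthrough rule is
simplified to "anything not fully valid passes through".

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; positions are assumed for this sketch. */
#define TOY_DTE_V   (1ULL << 0)   /* entry valid */
#define TOY_DTE_TV  (1ULL << 1)   /* translation information valid */
#define TOY_DTE_IR  (1ULL << 61)  /* device may read */
#define TOY_DTE_IW  (1ULL << 62)  /* device may write */

enum toy_result { TOY_PASSTHROUGH, TOY_TRANSLATE, TOY_ABORT };

/* Boot-time default: valid, translated, but no read/write permission. */
static uint64_t toy_default_dte(void)
{
    return TOY_DTE_V | TOY_DTE_TV;
}

/* Decide what happens to a DMA request from a device with this entry. */
static enum toy_result toy_check_dma(uint64_t dte, bool is_write)
{
    /* Simplification: an entry that is not fully valid is passed
     * through untranslated, which is the dangerous case. */
    if (!(dte & TOY_DTE_V) || !(dte & TOY_DTE_TV))
        return TOY_PASSTHROUGH;

    /* Missing permission: the request is logged and target aborted. */
    if (is_write ? !(dte & TOY_DTE_IW) : !(dte & TOY_DTE_IR))
        return TOY_ABORT;

    return TOY_TRANSLATE;
}
```

With toy_default_dte(), both reads and writes come back as TOY_ABORT; only
after a driver's mapping sets the permission bits does a request get
translated.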

> > I'm starting to wonder if:
> > 
> > 1) some dmas are so long lived they start aliasing new dmas that get mapped in
> > the kdump kernel leading to various erroneous behavior
> 
> At least not in this case. Even when this is true the DMA would target
> memory of the crashed kernel and not the kdump area. This is not even
> memory corruption because the device will write to memory the driver has
> allocated for it.
> 
Yeah, I figured that old DMAs going to old locations were ok.  I was more
concerned about an 'old' DMA living through our reset of the iommu page
table, leaving us pointing an old dma address at a new physical address
within the new kernel's memory space.  Although, given the reset state of the
tables, for that to happen a DMA would have to not attempt a memory transaction
until sometime later in the boot, which seems...unlikely to say the least, so I
agree this is almost certainly not happening.

> > 2) a slew of target aborts to some hardware results in them being in an
> > inconsistent state
> 
> That's indeed true. I have seen that with ixgbe cards for example. They
> seem to be really confused after a target abort.
> 
Yeah, this part worries me; target aborts leave various pieces of hardware
brain dead.  What are your thoughts on leaving the iommu on through a reboot to
avoid this issue (possibly resetting any pci device that encounters a target
abort, as noted in the error log on the iommu)?

> > I'm going to try marking the dev table on shutdown such that all devices have no
> > read/write permissions to see if that changes the situation.  It should I think
> > give me a pointer as to whether (1) or (2) is the more likely problem.
> 
> Probably not. You still need to flush the old entries out of the IOMMU.
> 
Yeah, after reading your explanation above, I agree.
Neil
