Trying to test my gart/iommu vmcore problem on RH

Thu Sep 4 19:28:48 EDT 2008

On Mon, 2008-08-25 at 13:46 +0000, Eric W. Biederman wrote:
> Vivek Goyal <vgoyal at redhat.com> writes:
> 
> > On Fri, Aug 22, 2008 at 04:48:10PM -0700, Eric W. Biederman wrote:
> >>
> >> Hmm.  Thinking about this we actually have 2 problems.
> >> - Communication about what is going on.
> >> - How to handle an iommu in the event of a crash dump scenario.
> >>
> >> The current solution is to ignore the iommu, and use swiotlb.  This
> >> solution does not look like it will work for future iommus.

Howdy all,

There are several aspects to this problem that make solutions come in
and out of contention:

1.  Kexec vs Kdump

	Kexec: If we are kexec'ing normally, we assume that the shutdown has
successfully stopped DMAs prior to starting our new kernel, and if not,
it's a bug in the previous kernel's driver shutdown.  So no issue here,
right?

	Kdump: The driver shutdown has been skipped as we go down during a
crash, so assume that leftover DMA operations might be in progress as
the kdump kernel comes up.  BUT!  They will be targeting some area of
memory other than the memory being used by the kdump kernel (it has
its own crashkernel sandbox).  And on my 2.6.18-based system, with an
AMD64 NB GART-acting-as-IOMMU, the kdump kernel *does not* try to
initialize or use an IOMMU when it comes up because its memory size is
too small to need one (no one is setting crashkernel=4G@4G).  So the
kdump kernel can successfully ignore the old IOs using the old GART
aperture IOMMU.

EXCEPT(!) for the fact that we've left CPU-side translations turned on in
the GART NB hardware, and the kdump kernel will currently read through
that zone using /proc/vmcore or /dev/oldmem.  That's why I like fixing
my stone-age problem by turning off CPU-side access.

Note that real (future?) IOMMUs don't even have the concept of
translating accesses from the CPU side.  They only work on IO requests.
So reading old memory areas from the crashed kernel shouldn't cause an
IOMMU to "do" anything.

2.  GART vs Calgary vs "new AMD IOMMU" vs "new Intel IOMMU"

	The GART-as-IOMMU thing is not a "real" IOMMU.  It doesn't offer much
of the interesting protection of a real IOMMU, just "valid", "coherent"
and a translation address.  An IO card is still free to screw up and hit
other addresses outside the aperture if it wants to, or hit other pages
in the aperture that really belong to some other driver, or to write to
a page that it should only read, etc.  Consequently, there isn't much
desire to utilize the GART thing unless I really need 32-bit IO card
access to 40-bit address space.  Since I don't need that in the kdump
kernel (currently), there's no reason to try to use the GART there, so
it's safe to ignore it, as long as I don't provoke it :-)

BUT, if I had a real IOMMU that provided cool protection stuff and
domain stuff, and not just address range expansion for old IO cards,
then I might want to (or be forced to) use it all the time, independent
of memory size, and then the kdump kernel might really need to deal with
sharing it in some way with old leftover DMAs from the crashed kernel
that we're dumping.  And this, I think, is the only real issue looming.

But this should only be a kdump issue, and not a kexec issue (see #1
above), because the previous kernel should have shut all that down
before it kexec'd, right?

3.  IOMMU vs swiotlb

Isn't swiotlb just a way of hiding bounce buffer copies and management
inside of the dma_map_single and dma_unmap_single calls?  If so, it's
just software(TM) and it just uses addresses in the kdump kernel
sandbox, which (by definition) are not addresses that could have been
used in the old kernel that crashed.  There shouldn't be any conflict
between the kdump kernel and the old crashed kernel if one or both are
using swiotlb.  Once again, in *my* current situation, there's no reason
to use swiotlb in the kdump kernel because my memory range is restricted
to my crashkernel sandbox and I don't need any IOMMU address translation
capability.

If the original kernel had been using swiotlb, then there's really no
issue, because any leftover DMAs are just writing to the old bounce
buffers anyway, and there's no driver left waiting to call
dma_unmap_single to copy the result into the real buffer.

What considerations have I missed?

Bob Montgomery
(vacation last week)