[PATCH] amd iommu: force flush of iommu prior during shutdown
Eric W. Biederman
ebiederm at xmission.com
Wed Mar 31 14:43:33 EDT 2010
Vivek Goyal <vgoyal at redhat.com> writes:
> On Wed, Mar 31, 2010 at 11:24:17AM -0400, Neil Horman wrote:
>> Flush iommu during shutdown
>> When using an iommu, it's possible, if a kdump kernel boot follows a primary
>> kernel crash, that dma operations might still be in flight from the previous
>> kernel during the kdump kernel boot. This can lead to memory corruption,
>> crashes, and other erroneous behavior, specifically I've seen it manifest during
>> a kdump boot as endless iommu error log entries of the form:
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d
>> address=0x000000000245a0c0 flags=0x0070]
>> Followed by an inability to access hard drives, and various other resources.
>> I've written this fix for it. In short it just forces a flush of the in flight
>> dma operations on shutdown, so that the new kernel is certain not to have any
>> in-flight dmas trying to complete after we've reset all the iommu page tables,
>> causing the above errors. I've tested it and it fixes the problem for me quite
> CCing Eric also.
> Neil, this is interesting. In the past we noticed similar issues,
> especially on PPC. But I was told that we could not clear the iommu
> mapping entries because we had no control over in-flight DMA, and if a DMA
> arrives after an entry has been cleared and the entry is not present, it is an error.
Which is exactly what the reported error looks like.
> Hence one of the suggestions was not to clear the iommu mapping entries but
> to reserve some for kdump operation and use those in the kdump kernel.
> So this call to amd_iommu_flush_all_devices() will be able to tell devices
> not to do any more DMA, and hence it is safe to reprogram the iommu
> mapping entries.
I took a quick look at our crash shutdown path and I am very disappointed
with the way it has gone lately.
Regardless of the merits of flushing an iommu versus doing other things with an
iommu, I don't see how we are in any better position in the crashing kernel
than we are in the kdump kernel. So what are we doing touching it
in the kdump path?
Likewise for the HPET.
We also seem to be at a point where, if we have a TSC, we don't need to
enable interrupts until we are ready to enable them in native mode. And
except for a few weird SMP 486s, the TSC and APICs came in at the same time.
So my grumpy code review says we should gut crash.c (like below) and
fix the initialization paths so they do the right thing.
arch/x86/kernel/crash.c | 18 ------------------
1 files changed, 0 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index a4849c1..8e33c50 100644
@@ -22,12 +22,10 @@
#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
@@ -56,15 +54,11 @@ static void kdump_nmi_callback(int cpu, struct die_args *args)
static void kdump_nmi_shootdown_cpus(void)
@@ -96,17 +90,5 @@ void native_machine_crash_shutdown(struct pt_regs *regs)