[PATCH] intel-iommu: Synchronize gcmd value with global command register

Takao Indoh indou.takao at jp.fujitsu.com
Thu Apr 4 01:48:25 EDT 2013

(2013/04/03 17:24), David Woodhouse wrote:
> On Wed, 2013-04-03 at 16:11 +0900, Takao Indoh wrote:
>> (2013/04/02 23:05), Joerg Roedel wrote:
>>> On Mon, Apr 01, 2013 at 02:45:18PM +0900, Takao Indoh wrote:
>>>> <Current flow on kdump boot>
>>>> enable_IR
>>>>     intel_enable_irq_remapping
>>>>       iommu_disable_irq_remapping  <== IRES/QIES/TES disabled here
>>>>       dmar_disable_qi              <== do nothing
>>>>       dmar_enable_qi               <== QIES enabled
>>>>       intel_setup_irq_remapping    <== IRES enabled
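The flow quoted above can be modeled in user space. What follows is only a minimal sketch, not kernel code: `struct fake_iommu`, `hw_status`, `sync_gcmd()` and `run_kdump_flow()` are hypothetical names, the "hardware" is a plain variable that mirrors the last command written, and the sync step is one reading of the patch subject rather than the patch itself. The bit positions follow the VT-d global command register (TE = bit 31, QIE = bit 26, IRE = bit 25).

```c
#include <stdint.h>

#define GCMD_TE  (1u << 31)  /* translation enable */
#define GCMD_QIE (1u << 26)  /* queued invalidation enable */
#define GCMD_IRE (1u << 25)  /* interrupt remapping enable */

/* Hypothetical stand-in for one IOMMU unit. */
struct fake_iommu {
    uint32_t hw_status;  /* what the hardware currently has enabled */
    uint32_t gcmd;       /* software's cached copy of the command value */
};

/* Simplification: the status simply mirrors the last command written. */
static void write_gcmd(struct fake_iommu *iommu)
{
    iommu->hw_status = iommu->gcmd;
}

/* Assumed sync step: seed the cache from what the hardware reports. */
static void sync_gcmd(struct fake_iommu *iommu)
{
    iommu->gcmd = iommu->hw_status & (GCMD_TE | GCMD_QIE | GCMD_IRE);
}

static void disable_irq_remapping(struct fake_iommu *iommu)
{
    /* With a stale (zero) cache this write clears TE and QIE as well. */
    iommu->gcmd &= ~GCMD_IRE;
    write_gcmd(iommu);
}

static void disable_qi(struct fake_iommu *iommu)
{
    if (!(iommu->hw_status & GCMD_QIE))
        return;  /* already off: nothing to do */
    iommu->gcmd &= ~GCMD_QIE;
    write_gcmd(iommu);
}

static void enable_qi(struct fake_iommu *iommu)
{
    iommu->gcmd |= GCMD_QIE;
    write_gcmd(iommu);
}

static void setup_irq_remapping(struct fake_iommu *iommu)
{
    iommu->gcmd |= GCMD_IRE;
    write_gcmd(iommu);
}

/* Replays the quoted flow; returns the final simulated hardware status. */
uint32_t run_kdump_flow(int sync_first)
{
    /* The first kernel left everything enabled; the new kernel's cache is 0. */
    struct fake_iommu iommu = {
        .hw_status = GCMD_TE | GCMD_QIE | GCMD_IRE,
        .gcmd = 0,
    };

    if (sync_first)
        sync_gcmd(&iommu);

    disable_irq_remapping(&iommu);
    disable_qi(&iommu);
    enable_qi(&iommu);
    setup_irq_remapping(&iommu);
    return iommu.hw_status;
}
```

In this model, `run_kdump_flow(0)` ends with only QIE and IRE set, so translation has been switched off as a side effect of the stale cache, while `run_kdump_flow(1)` ends with TE, QIE and IRE all set.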
>>> But what we want to do here in the kdump case is to disable translation
>>> too, right? Because the former kernel might have translation and
>>> irq-remapping enabled and the kdump kernel might be compiled without
>>> support for dma-remapping. So if we don't disable translation here too
>>> the kdump kernel is unable to do DMA.
>> Yeah, you are right. I forgot such a case.
> If you disable translation and there's some device still doing DMA, it's
> going to scribble over random areas of memory. You really want to have
> translation enabled and all the page tables *cleared*, during kexec. I
> think it's fair to insist that the secondary kernel should use the IOMMU
> if the first one did.
>> To be honest, I also expected the side effect of this patch. As I wrote
>> in the previous mail, I'm working on a kdump problem with the IOMMU, that
>> is, ongoing DMA causes DMAR faults in the 2nd kernel and sometimes kdump
>> fails due to these faults.
> Here you've lost me. The DMAR fault is caught and reported, and how does
> this lead to a kdump failure? Are you using dodgy hardware that just
> keeps *trying* after an abort, and floods the system with a storm of
> DMAR faults? We've occasionally spoken about working around such a
> problem by setting a bit to make subsequent faults *silent*. Would that
> work?

There are several cases.
- DMAR fault messages flood and the second kernel does not boot. I
  recently saw a similar report: https://lkml.org/lkml/2013/3/8/120
- The igb driver detects an error on link-up and kdump via network fails.
- On a certain platform, though kdump itself works, a PCIe error such as
  Unexpected Completion is detected and the hardware gets degraded.

Takao Indoh

>>   What we have to do is stop DMA transactions
>> before DMA-remapping is disabled in the 2nd kernel.
> The IOMMU is there to stop DMA transactions. That is its *job*. :)
