[PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

Dave Young dyoung at redhat.com
Tue Apr 7 20:33:51 PDT 2015


On 04/07/15 at 05:55pm, Li, ZhenHua wrote:
> On 04/07/2015 05:08 PM, Dave Young wrote:
> >On 04/07/15 at 11:46am, Dave Young wrote:
> >>On 04/05/15 at 09:54am, Baoquan He wrote:
> >>>On 04/03/15 at 05:21pm, Dave Young wrote:
> >>>>On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
> >>>>>Hi Dave,
> >>>>>
> >>>>>It is possible that the old iommu data gets corrupted by some other
> >>>>>module. Currently we do not have a better solution for the dmar
> >>>>>faults.
> >>>>>
> >>>>>But I think when this happens, we need to fix the module that corrupted
> >>>>>the old iommu data. I once met a similar problem in the normal kernel:
> >>>>>the queue used by the qi_* functions was overwritten by another module.
> >>>>>The fix was in that module, not in the iommu module.
> >>>>
> >>>>By then it is too late; there will be no chance to save the vmcore.
> >>>>
> >>>>Also, if using the old iommu tables can keep corrupting other areas of
> >>>>oldmem, it will cause more problems.
> >>>>
> >>>>So I think the tables at least need some verification before being used.
> >>>>
> >>>
> >>>Yes, this is good thinking, and verification is also an interesting
> >>>idea. kexec/kdump does a sha256 calculation on the loaded kernel and
> >>>then verifies it again in purgatory when a panic happens. This checks
> >>>whether any code has stomped on the region reserved for kexec/kernel
> >>>and corrupted the loaded kernel.
> >>>
> >>>If we decide to do this, it should be an enhancement to the current
> >>>patchset, not a change of approach. Since this patchset is getting very
> >>>close to what the maintainers expect, maybe it can be merged first and
> >>>the enhancement considered afterwards. After all, without this patchset
> >>>vt-d often raises error messages and hangs.
> >>
> >>That does not convince me; we should do it right from the beginning
> >>instead of introducing something wrong.
> >>
> >>I wonder why the old dma cannot be remapped to a specific page in the
> >>kdump kernel so that it will not corrupt more memory. But I may have
> >>missed something; I will look for the old threads and catch up.
> >
> >I have read the old discussion; the above approach was dropped because it
> >could corrupt the filesystem. Apologies for the late comment.
> >
> >But the current solution sounds bad to me because it uses old memory,
> >which is not reliable.
> >
> >Thanks
> >Dave
> >
> It seems we do not have a better solution for the dmar faults.  But I
> believe we can find out how to verify the iommu data which is located in
> old memory.

That will be great, thanks.

So there are two things:
1) make sure the old page tables are right; this is what we were talking about.
2) avoid writing to old memory. I suppose only dma reads could corrupt the
filesystem, right? So how about creating a scratch page in 2nd kernel memory
for any dma writes, and using the old page tables only for dma reads.
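
To make the two points concrete, here is a rough user-space sketch, not
code from the patchset. It assumes the VT-d second-level entry layout with
the read permission in bit 0 and the write permission in bit 1, and it
simplifies the address field to bits 12..51. The names old_pte_sane(),
filter_old_pte() and struct mem_range are made up for illustration, and the
sanity rules (target must be in old RAM and outside the kdump kernel's
region) are only one possible choice. Note that a single entry cannot send
reads and writes to different pages, so in this sketch any entry that
permits writes is redirected to the scratch page as a whole.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Permission bits of a VT-d second-level page-table entry. */
#define PTE_READ      (1ULL << 0)
#define PTE_WRITE     (1ULL << 1)
/* Assumption: address field simplified to bits 12..51. */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL

/* Hypothetical helper type: physical range [start, end). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Point 1: sanity-check an entry copied from the old kernel.
 * A present entry must target old RAM and must not point into the
 * memory reserved for the kdump kernel itself.
 */
static bool old_pte_sane(uint64_t pte, struct mem_range old_ram,
			 struct mem_range kdump)
{
	uint64_t addr = pte & PTE_ADDR_MASK;

	if (!(pte & (PTE_READ | PTE_WRITE)))
		return true;	/* not present, nothing to check */
	if (addr < old_ram.start || addr >= old_ram.end)
		return false;
	if (addr >= kdump.start && addr < kdump.end)
		return false;
	return true;
}

/*
 * Point 2: decide what the kdump kernel installs for an old entry.
 * Insane entries are dropped, writable entries are pointed at the
 * scratch page, read-only entries are kept as they are.
 */
static uint64_t filter_old_pte(uint64_t pte, uint64_t scratch_phys,
			       struct mem_range old_ram,
			       struct mem_range kdump)
{
	/* The scratch page must be page aligned and fit the field. */
	assert((scratch_phys & ~PTE_ADDR_MASK) == 0);

	if (!old_pte_sane(pte, old_ram, kdump))
		return 0;
	if (pte & PTE_WRITE)
		return (pte & ~PTE_ADDR_MASK) | scratch_phys;
	return pte;
}
```

With something like this, in-flight dma writes land in one throwaway page
of the 2nd kernel instead of oldmem, while read-only mappings still return
the data the device expects.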

Thanks
Dave


