[PATCH v9 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

Li, ZhenHua zhen-hual at hp.com
Tue Apr 7 02:55:03 PDT 2015


On 04/07/2015 05:08 PM, Dave Young wrote:
> On 04/07/15 at 11:46am, Dave Young wrote:
>> On 04/05/15 at 09:54am, Baoquan He wrote:
>>> On 04/03/15 at 05:21pm, Dave Young wrote:
>>>> On 04/03/15 at 05:01pm, Li, ZhenHua wrote:
>>>>> Hi Dave,
>>>>>
>>>>> It is possible that the old iommu data has been corrupted by some
>>>>> other module. Currently we do not have a better solution for the
>>>>> dmar faults.
>>>>>
>>>>> But I think when this happens, we need to fix the module that corrupted
>>>>> the old iommu data. I once hit a similar problem in a normal kernel: the
>>>>> queue used by the qi_* functions was overwritten by another module.
>>>>> The fix was in that module, not in the iommu module.
>>>>
>>>> By then it is too late; there will be no chance to save the vmcore.
>>>>
>>>> Also, if using the old iommu tables lets DMA continue to corrupt other
>>>> areas of oldmem, that will cause even more problems.
>>>>
>>>> So I think the tables at least need some verification before being used.
>>>>
>>>
>>> Yes, this is a good point, and verification is an interesting idea.
>>> kexec/kdump does a sha256 calculation on the loaded kernel and then
>>> verifies it again in purgatory when a panic happens. This checks
>>> whether any code has stomped into the region reserved for kexec/kernel
>>> and corrupted the loaded kernel.
>>>
>>> If we decide to do this, it should be an enhancement to the current
>>> patchset rather than a change of approach. Since this patchset is very
>>> close to what the maintainers expected, maybe it can be merged first,
>>> and then we can think about the enhancement. After all, without this
>>> patchset vt-d often raises error messages and hangs.
>>
>> That does not convince me; we should do it right from the beginning
>> instead of introducing something wrong.
>>
>> I wonder why the old DMA cannot be remapped to a specific page in the
>> kdump kernel so that it will not corrupt more memory. But I may have
>> missed something; I will look through the old threads and catch up.
>
> I have read the old discussion; the above approach was dropped because it
> could corrupt the filesystem. Apologies for the late comment.
>
> But the current solution sounds bad to me because it uses old memory,
> which is not reliable.
>
> Thanks
> Dave
>
It seems we do not have a better solution for the dmar faults. But I
believe we can find a way to verify the iommu data located in old
memory.

Thanks
Zhenhua




More information about the kexec mailing list