[PATCH v2] x86/sev: Fix host kdump support for SNP

Ashish Kalra Ashish.Kalra at amd.com
Mon Sep 9 16:33:32 PDT 2024


Hello Sean,

On 9/4/2024 5:23 PM, Sean Christopherson wrote:
>> On Wed, Sep 04, 2024, Ashish Kalra wrote:
>>> On 9/4/2024 2:54 PM, Michael Roth wrote:
>>>>   - Sean inquired about making the target kdump kernel more agnostic to
>>>>     whether or not SNP_SHUTDOWN was done properly, since that might
>>>>     allow for capturing state even for edge cases where we can't go
>>>>     through the normal cleanup path. I mentioned we'd tried this to some
>>>>     degree but hit issues with the IOMMU, and when working around that
>>>>     there was another issue but I don't quite recall the specifics.
>>>>     Can you post a quick recap of what the issues are with that approach
>>>>     so we can determine whether or not this is still an option?
>>>
>>> Yes, i believe without SNP_SHUTDOWN, early_enable_iommus() configure the
>>> IOMMUs into an IRQ remapping configuration causing the crash in
>>> io_apic.c::check_timer().
>>>
>>> It looks like in this case, we enable IRQ remapping configuration *earlier*
>>> than when it needs to be enabled and which causes the panic as indicated:
>>>
>>> EMERGENCY [    1.376701] Kernel panic - not syncing: timer doesn't work
>>> through Interrupt-remapped IO-APIC
>>
>> I assume the problem is that IOMMU setup fails in the kdump kernel, not that it
>> does the setup earlier.  That's that part I want to understand.

>Here is a deeper understanding of this issue:

>What appears to be happening: when we do SNP_SHUTDOWN without IOMMU_SNP_SHUTDOWN during panic, the kdump kernel boots with IOMMU SNP
>enforcement still enabled, and the previous kernel's IOMMU completion wait buffers (CWB) are still locked with exclusivity still set
>on them. The kdump kernel then allocates new IOMMU completion wait buffers and tries to use them, but because the previous buffers
>are still locked in, the newly allocated buffers are not used for IOMMU command execution and completion indication, and we get an
>IOMMU command completion wait time-out:

>[    1.711588] AMD-Vi: early_amd_iommu_init: irq remaping enabled
>[    1.718972] AMD-Vi: in early_enable_iommus
>[    1.723543] AMD-Vi: Translation is already enabled - trying to copy translation structures
>[    1.733333] AMD-Vi: Copied DEV table from previous kernel.
>[    1.739566] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.11.0-rc6-next-20240903-snp-host-f2a41ff576cc+ #78
>[    1.750920] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM100AB 10/17/2022
>[    1.759950] Call Trace:
>[    1.762677]  <TASK>
>[    1.765018]  dump_stack_lvl+0x70/0x90
>[    1.769109]  dump_stack+0x14/0x20
>[    1.772809]  iommu_completion_wait.part.0.isra.0+0x38/0x140
>[    1.779035]  amd_iommu_flush_all_caches+0xa3/0x240
>[    1.784383]  ? memcpy_toio+0x25/0xc0
>[    1.788372]  early_enable_iommus+0x151/0x880
>[    1.793140]  state_next+0xe67/0x22b0
>[    1.797130]  ? __raw_callee_save___native_queued_spin_unlock+0x19/0x30
>[    1.804421]  amd_iommu_enable+0x24/0x60
>[    1.808702]  irq_remapping_enable+0x1f/0x50
>[    1.813371]  enable_IR_x2apic+0x155/0x260
>[    1.817848]  x86_64_probe_apic+0x13/0x70
>[    1.822226]  apic_intr_mode_init+0x39/0xf0
>[    1.826799]  x86_late_time_init+0x28/0x40
>[    1.831266]  start_kernel+0x6ad/0xb50
>[    1.835436]  x86_64_start_reservations+0x1c/0x30
>[    1.840591]  x86_64_start_kernel+0xbf/0x110
>[    1.845256]  ? setup_ghcb+0x12/0x130
>[    1.849247]  common_startup_64+0x13e/0x141
>[    1.853821]  </TASK>
>[    2.077901] AMD-Vi: Completion-Wait loop timed out
>...

>Because of this, the IOMMU command (in this case the one issued while enabling IRQ remapping) does not succeed, and that eventually
>causes the timer to fail, as IRQ remapping support is not enabled.

>Once IOMMU SNP support is enabled, to enforce RMP checks the IOMMU completion wait buffers are set up as read-only with exclusivity
>set on them, and the IOMMU registers used to mark exclusivity on the store addresses associated with these CWBs are also locked.
>This SNP enforcement in the IOMMU is only disabled by passing the IOMMU_SNP_SHUTDOWN parameter to the SNP_SHUTDOWN_EX command.

>From the AMD IOMMU specifications:

>2.12.2.2 SEV-SNP COMPLETION_WAIT Store Restrictions
>On systems that are SNP-enabled, the store address associated with any host COMPLETION_WAIT command (s=1) is restricted. The Store
>Address must fall within the address range specified by the Completion Store Base and Completion Store Limit registers. When the
>system is SNP-enabled, the memory within this range will be marked in the RMP using a special immutable state by the PSP. This
>memory region will be readable by the CPU but not writable.

>2.12.2.3 SEV-SNP Exclusion Range Restrictions
>The exclusion range feature is not supported on systems that are SNP-enabled. Additionally, the Exclusion Base and Exclusion Range
>Limit registers are re-purposed to act as the Completion Store Base and Limit registers.

>Therefore, we need to disable IOMMU SNP enforcement with the SNP_SHUTDOWN_EX command before the kdump kernel starts booting: we
>can't set up the IOMMU CWB again in kdump, because the SEV-SNP exclusion base and range limit registers stay locked while IOMMU SNP
>support is still enabled.
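To make the restriction quoted above from section 2.12.2.2 concrete, here is a minimal userspace sketch (not kernel code) of the address check it describes. The struct and function names are hypothetical, and treating the limit as inclusive is an assumption made only for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one IOMMU's completion-store window (illustration only). */
struct cwb_window {
	bool     snp_enabled;  /* IOMMU SNP enforcement still active */
	uint64_t store_base;   /* Completion Store Base */
	uint64_t store_limit;  /* Completion Store Limit (assumed inclusive here) */
};

/*
 * Return true if a COMPLETION_WAIT (s=1) store to 'addr' falls inside the
 * window, i.e. would satisfy the restriction in this simplified model.
 */
static bool cwb_store_allowed(const struct cwb_window *w, uint64_t addr)
{
	if (!w->snp_enabled)
		return true;	/* the restriction only applies with SNP enabled */

	return addr >= w->store_base && addr <= w->store_limit;
}
```

In this model, a buffer freshly allocated by the kdump kernel lies outside a window that was locked by the previous kernel, so its store address is rejected until SNP enforcement in the IOMMU is shut down.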

>I tried to use the previous kernel's CWB (cmd_sem) as below: 

>static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
>{
>        if (!is_kdump_kernel()) {
>                iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
>        } else if (check_feature(FEATURE_SNP)) {
>                u64 cwwb_sem_paddr;
>
>                /* Under SNP the Exclusion Base register holds the Completion Store Base */
>                cwwb_sem_paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET);
>                /* Try to reuse the previous kernel's CWB instead of allocating a new one */
>                iommu->cmd_sem = iommu_phys_to_virt(cwwb_sem_paddr);
>        }
>
>        return iommu->cmd_sem ? 0 : -ENOMEM;
>}

>This fails, I believe because the kdump kernel does not have the previous kernel's IOMMU CWB allocations in its kernel
>direct map:

>[    1.708959] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[    1.714327] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100805000, cmd_sem_vaddr 0xffff9f5340805000
>[    1.726309] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[    1.731676] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050051000, cmd_sem_vaddr 0xffff9f6290051000
>[    1.743742] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[    1.749109] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050052000, cmd_sem_vaddr 0xffff9f6290052000
>[    1.761177] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[    1.766542] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100808000, cmd_sem_vaddr 0xffff9f5340808000
>[    1.778509] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[    1.783877] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050053000, cmd_sem_vaddr 0xffff9f6290053000
>[    1.795942] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[    1.801300] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x100809000, cmd_sem_vaddr 0xffff9f5340809000
>[    1.813268] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[    1.818636] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x1050054000, cmd_sem_vaddr 0xffff9f6290054000
>[    1.830701] AMD-Vi: in alloc_cwwb_sem kdump kernel
>[    1.836069] AMD-Vi: in alloc_cwwb_sem SNP feature enabled, cmd_sem_paddr 0x10080a000, cmd_sem_vaddr 0xffff9f534080a000
>[    1.848039] AMD-Vi: early_amd_iommu_init: irq remaping enabled
>[    1.855431] AMD-Vi: in early_enable_iommus
>[    1.860032] AMD-Vi: Translation is already enabled - trying to copy translation structures
>[    1.869812] AMD-Vi: Copied DEV table from previous kernel.
>[    1.875958] AMD-Vi: in build_completion_wait, paddr = 0x100805000
>[    1.882766] BUG: unable to handle page fault for address: ffff9f5340805000
>[    1.890441] #PF: supervisor read access in kernel mode
>[    1.896177] #PF: error_code(0x0000) - not-present page

>....

>I think that memremap(..,..,MEMREMAP_WB) will also fail for the same reason, as memremap(.., MEMREMAP_WB) for a RAM region will
>again use the kernel direct map.

To follow up on this:

I am able to use memremap() to map the CWB buffers allocated by the previous kernel and try to reuse them in the kdump kernel.
As expected, memremap() does not return a pointer into the kernel direct map, since the previous kernel's CWB buffers are at
RAM addresses that are not mapped in the kdump kernel's direct map.
 
These memremap() mappings seem to be correct: if I do a memset(0) on them, I get an RMP #PF violation, which is expected
because these buffers are set up as read-only in the RMP table.

However, I am getting intermittent IOMMU command completion wait timeouts with these reused CWB buffers (which are used as
semaphores to indicate IOMMU command completions), and I am still debugging those issues.
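
For anyone following along, the semaphore role of the CWB can be pictured with a small userspace model. This is only a sketch under stated assumptions: fake_iommu, fake_iommu_complete() and wait_on_sem_model() are hypothetical names, not the driver's code, and the "hardware" is just a pointer to whichever buffer it is locked to:

```c
#include <stdint.h>

/* Hypothetical stand-in for the hardware: where it really stores completions. */
struct fake_iommu {
	volatile uint64_t *hw_store;	/* buffer the (locked) hardware writes to */
};

/* Model the IOMMU finishing a COMPLETION_WAIT by storing 'data'. */
static void fake_iommu_complete(struct fake_iommu *iommu, uint64_t data)
{
	*iommu->hw_store = data;
}

/*
 * Poll 'sem' until it holds 'data' or 'max_loops' polls have been made.
 * Returns 0 on completion, -1 on timeout.
 */
static int wait_on_sem_model(volatile uint64_t *sem, uint64_t data,
			     unsigned int max_loops)
{
	for (unsigned int i = 0; i < max_loops; i++) {
		if (*sem == data)
			return 0;
	}

	return -1;
}
```

If the model's hw_store still points at the previous kernel's buffer while the caller polls a newly allocated one, the poll can only time out, which matches the "Completion-Wait loop timed out" message in the earlier log. Polling the buffer the hardware actually writes to completes in this model, so it does not explain the intermittent timeouts seen with the reused buffers.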

Thanks,
Ashish


