[PATCH] irqchip/gic-v3-its: Mark some in-memory data structures as 'decrypted'

Robin Murphy robin.murphy at arm.com
Thu Dec 9 03:34:44 PST 2021


On 2021-12-09 09:10, Will Deacon wrote:
> On Wed, Dec 08, 2021 at 06:20:36PM +0000, Robin Murphy wrote:
>> On 2021-12-08 15:59, Will Deacon wrote:
>>> The GICv3 ITS driver allocates memory for its tables using alloc_pages()
>>> and performs explicit cache maintenance if necessary. On systems such
>>> as those running pKVM, where the memory encryption API is implemented,
>>> memory shared with the ITS must first be transitioned to the "decrypted"
>>> state, as it would be if allocated via the DMA API.
>>>
>>> Allow pKVM guests to interact with an ITS emulation by ensuring that the
>>> shared pages are decrypted at the point of allocation and encrypted
>>> again upon free().
>>>
>>> Cc: Thomas Gleixner <tglx at linutronix.de>
>>> Cc: Marc Zyngier <maz at kernel.org>
>>> Signed-off-by: Will Deacon <will at kernel.org>
>>> ---
>>>
>>> Although pKVM doesn't yet expose share/unshare hypercalls to the guest,
>>> this change is agnostic of the hypervisor and could be queued
>>> independently as it has no functional impact when the memory encryption
>>> API is not implemented.
>>>
>>>    drivers/irqchip/irq-gic-v3-its.c | 40 ++++++++++++++++++++++++++------
>>>    1 file changed, 33 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>>> index eb0882d15366..4559b8dfb9bc 100644
>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>> @@ -27,6 +27,7 @@
>>>    #include <linux/of_pci.h>
>>>    #include <linux/of_platform.h>
>>>    #include <linux/percpu.h>
>>> +#include <linux/set_memory.h>
>>>    #include <linux/slab.h>
>>>    #include <linux/syscore_ops.h>
>>> @@ -2166,6 +2167,7 @@ static void gic_reset_prop_table(void *va)
>>>    	/* Make sure the GIC will observe the written configuration */
>>>    	gic_flush_dcache_to_poc(va, LPI_PROPBASE_SZ);
>>> +	set_memory_decrypted((unsigned long)va, LPI_PROPBASE_SZ >> PAGE_SHIFT);
>>
>> Hmm, that would seem to imply we've just done the memset() and cache clean
>> on encrypted memory... that's not right, surely?
> 
> Actually, that's exactly what we want! The *crypted state is only of concern
> to the DMA accesses, so the idea is that you clear out / initialise the
> buffer and only then decrypt the memory for the device to access it.
> 
> The DMA code does the same sort of thing -- for example, see
> atomic_pool_expand() in kernel/dma/pool.c.

Sure, we allocate the pages, clean any dirty lines, then decrypt them. 
But the important point is that any actual initialisation happens 
*after* the decryption - the memset() is in dma_alloc_from_pool(), for 
that example. In general, changing the encryption state should be 
treated as changing the contents of the memory.

>>>    }
>>>    static struct page *its_allocate_prop_table(gfp_t gfp_flags)
>>> @@ -2183,8 +2185,10 @@ static struct page *its_allocate_prop_table(gfp_t gfp_flags)
>>>    static void its_free_prop_table(struct page *prop_page)
>>>    {
>>> -	free_pages((unsigned long)page_address(prop_page),
>>> -		   get_order(LPI_PROPBASE_SZ));
>>> +	unsigned long va = (unsigned long)page_address(prop_page);
>>> +
>>> +	set_memory_encrypted(va, LPI_PROPBASE_SZ >> PAGE_SHIFT);
>>
>> Similarly to [1], it might be worth at least pretending to consider failure
>> in these instances. (Side note, might set_memory_encrypted() logically
>> warrant a __must_check annotation?)
> 
> There's not much we can do, but it looks like the DMA allocator leaks the
> memory rather than freeing it, so I'll add that.
> 
>>> @@ -2934,6 +2944,7 @@ static int its_alloc_collections(struct its_node *its)
>>>    static struct page *its_allocate_pending_table(gfp_t gfp_flags)
>>>    {
>>>    	struct page *pend_page;
>>> +	void *va;
>>>    	pend_page = alloc_pages(gfp_flags | __GFP_ZERO,
>>>    				get_order(LPI_PENDBASE_SZ));
>>> @@ -2941,14 +2952,19 @@ static struct page *its_allocate_pending_table(gfp_t gfp_flags)
>>>    		return NULL;
>>>    	/* Make sure the GIC will observe the zero-ed page */
>>> -	gic_flush_dcache_to_poc(page_address(pend_page), LPI_PENDBASE_SZ);
>>> +	va = page_address(pend_page);
>>> +	gic_flush_dcache_to_poc(va, LPI_PENDBASE_SZ);
>>> +	set_memory_decrypted((unsigned long)va, LPI_PENDBASE_SZ >> PAGE_SHIFT);
>>
>> Again, it looks fundamentally sketchy to decrypt memory *after* already
>> doing something to it - under a "real" encryption scheme the cleaned out
>> zeros might now appear as a page full of gibberish ciphertext, which would
>> not be what you want.
>>
>> That said, aren't the pending tables the completely IMP-DEF ones that only
>> the ITS itself ever touches? As-is should we even need to "decrypt" them at
>> all?
> 
> We do, as we need the ITS to be able to access the pages. For pKVM, this
> is actually the virtual ITS emulation in the host.

Derp, sorry, somehow I completely overlooked "pKVM guests" and "ITS 
emulation" in the commit message, and assumed this was about passing 
the physical ITS through to host EL1. It does indeed make complete sense 
for emulation, so please forgive me the brain-fart there.

>> The overall feeling here is that we're not being particularly consistent
>> about whether we're readily leaning on the assumption of "encryption" only
>> ever meaning CPU stage 2 protection here, or whether we're trying to uphold
>> the illusion of the more general notion with the ITS only being able to
>> observe "plaintext" shared memory.
> 
> Really, I've just tried to follow the way this is used for DMA: pages which
> need to be accessed by the device must first be set as decrypted, and then
> set back to encrypted before freeing back to the page allocator. This maps
> directly to what we're doing with the stage-2 (i.e. decrypt means share, and
> encrypt means unshare) but it would equally apply to real memory encryption
> schemes along the lines of what x86, power and s390 are doing.

That's fine - TBH I'd much *prefer* for all usage of the memory 
encryption APIs to be consistent and not take shortcuts based on 
implementation-specific assumptions - but in that case, please do try to 
actually follow the usage model consistently. This patch as-is would 
*not* work under AMD SME, for example.

Thanks,
Robin.



More information about the linux-arm-kernel mailing list