[PATCH v3 5/7] ARM: KVM: rework HYP page table freeing
Christoffer Dall
cdall at cs.columbia.edu
Thu Apr 18 11:37:37 EDT 2013
On Thu, Apr 18, 2013 at 12:13 AM, Marc Zyngier <marc.zyngier at arm.com> wrote:
> On Wed, 17 Apr 2013 22:23:21 -0700, Christoffer Dall
> <cdall at cs.columbia.edu> wrote:
>> On Mon, Apr 15, 2013 at 1:00 AM, Marc Zyngier <marc.zyngier at arm.com>
>> wrote:
>>
>>> On Sun, 14 Apr 2013 23:51:55 -0700, Christoffer Dall
>>> <cdall at cs.columbia.edu> wrote:
>>> > On Fri, Apr 12, 2013 at 8:18 AM, Marc Zyngier <marc.zyngier at arm.com>
>>> > wrote:
>>> >> There is no point in freeing HYP page tables differently from Stage-2.
>>> >> They now have the same requirements, and should be dealt with the same
>>> >> way.
>>> >>
>>> >> Promote unmap_stage2_range to be The One True Way, and get rid of a
>>> >> number of nasty bugs in the process (goo thing we never actually
>>> >> called free_hyp_pmds
>>> >
>>> > could you remind me, did you already point out these nasty bugs
>>> > somewhere or did we discuss them in an older thread?
>>>
>>> No, I decided it wasn't worth the hassle when I spotted them (especially
>>> as we're moving away from the section-mapped idmap)... But for the record:
>>>
>>> <quote>
>>> static void free_hyp_pgd_entry(pgd_t *pgdp, unsigned long addr)
>>> {
>>>         pgd_t *pgd;
>>>         pud_t *pud;
>>>         pmd_t *pmd;
>>>
>>>         pgd = pgdp + pgd_index(addr);
>>>         pud = pud_offset(pgd, addr);
>>>
>>>         if (pud_none(*pud))
>>>                 return;
>>>         BUG_ON(pud_bad(*pud));
>>>
>>>         pmd = pmd_offset(pud, addr);
>>>         free_ptes(pmd, addr);
>>>         pmd_free(NULL, pmd);
>>>         ^^^^^^^^^^^^^^^^^^^^<-- BUG_ON(pmd not page aligned)
>>>
>>
>> This would never be non-page-aligned for Hyp page tables, where PMDs
>> are always 4K in size; that was the assumption.
>
> Well, look at the pmd variable. As the result of pmd_offset(), it gives
> you a pointer to a PMD *entry*, not to a PMD *page*. This entry can be
> anywhere in that page, depending on the 2M section in a 1GB table.
>
>>
>>>         pud_clear(pud);
>>> }
>>> </quote>
>>>
>>> Now, if you decide to fix the above by forcing the page alignment:
>>>
>>> <quote>
>>> static void free_ptes(pmd_t *pmd, unsigned long addr)
>>> {
>>>         pte_t *pte;
>>>         unsigned int i;
>>>
>>>         for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
>>>         ^^^^^^^^^^^^^<-- start freeing memory outside of the
>>>         (unaligned) pmd...
>>>
>>
>> freeing what outside the pmd? I don't see this.
>
> Remember the above (PMD entry vs PMD page)? This code assumes it gets a
> PMD page. To work properly, it would read:
>         for (i = pmd_index(addr); i < PTRS_PER_PMD; ...
>
Ok, but prior to the whole trampoline stuff it was never called that
way. Still, this code was definitely written for a much too specific
use case.
>>
>>>                 if (!pmd_none(*pmd) && pmd_table(*pmd)) {
>>>                         pte = pte_offset_kernel(pmd, addr);
>>>                         pte_free_kernel(NULL, pte);
>>>                 }
>>>                 pmd++;
>>>         }
>>> }
>>> </quote>
>>>
>>> Once you've fixed that as well, you end up noticing that if you have
>>> PTEs pointed to by the same PMD, you need to introduce some refcounting
>>> if you need to free one PTE and not the others.
>>>
>>
>> Huh? The code always freed all the hyp tables, and there would be no
>> sharing of ptes across multiple pmds. I'm confused.
>
> Not anymore. In the non-HOTPLUG_CPU case, we free the trampoline page
> from the runtime HYP page tables (it is worth it if we had to use a
> bounce page). If you free an entire PMD, odds are you'll also unmap
> some of the HYP code (been there...).
>
Yes, that was not supported before; it was merely a teardown mechanism
for the error path (and for module unloading, back when we had that).
>> For the record, I like your patch and we should definitely only have one
>> path for allocating and freeing the tables, but if my code was buggy I
>> want to learn from my mistakes and know exactly what went bad.
>
> Well, I hope I made clear what the problem was. The main reason we didn't
> notice this is because this code was only called on the error path, which
> is mostly untested.
>
I think it actually worked the way we were calling it; certainly, when
I was using kvm/arm as a module and unloading the module, things worked
just fine. But yeah, way too fragile.
Thanks for taking the time to explain.
>>>
>>> At this point, I had enough, and decided to reuse what we already had
>>> instead of reinventing the wheel.
>>
>>
>>> > nit: s/goo/good/
>>> >
>>> >> before...).
>>>
>>> Will fix.
>>>
>>
>> I made the adjustment in my queue so you don't need to send out a new
>> series, unless you want to do that for other reasons.
>
> No particular reason, no. Unless people have additional comments that
> would require a new version of the series.
>
> Thanks,
>
> M.
> --
> Fast, cheap, reliable. Pick two.