PMD update corruption (sync question)

Sun Mar 1 21:58:36 PST 2015

Test kernels with an explicit DSB in all PTE update cases are now running overnight, just in case.

-- 
Computer Architect | Sent from my #ARM Powered Mobile Device

On Mar 1, 2015 9:10 PM, Jon Masters <jcm at redhat.com> wrote:

Hi Folks,

I've pulled a couple of all-nighters reproducing this hard-to-trigger
issue and got some data. It looks like the high half of the PMD (note:
always a userspace one) is all zeros or all ones, which makes me wonder
if the logic in update_mmu_cache might be missing something on AArch64.

When a kernel is built with 64K pages and 2 levels, the PMD is
effectively updated using set_pte_at, which explicitly won't perform a
DSB if the address is userspace (it expects this to happen later, in
update_mmu_cache for example).
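
To make the ordering concern concrete, here is a minimal userspace sketch
of the pattern described above. It is a model under stated assumptions,
not the kernel source: every model_* name is hypothetical, the barrier is
replaced by a counter, and "userspace" is modeled by a user bit in the
entry rather than by the address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types and names; none of these are kernel identifiers. */
typedef uint64_t model_pte_t;

#define MODEL_PTE_VALID (1ULL << 0) /* entry is valid */
#define MODEL_PTE_USER  (1ULL << 6) /* entry is accessible from userspace */

/* Counts barriers instead of executing a real DSB instruction. */
static unsigned int model_dsb_count;

static void model_dsb(void)
{
    model_dsb_count++;
}

/* True only for a valid entry that is not a user mapping. */
static bool model_pte_valid_kernel(model_pte_t pte)
{
    return (pte & (MODEL_PTE_VALID | MODEL_PTE_USER)) == MODEL_PTE_VALID;
}

/*
 * Write the entry. The barrier is issued immediately only for kernel
 * mappings; for user mappings it is deferred to a later step.
 */
static void model_set_pte(model_pte_t *ptep, model_pte_t pte)
{
    assert(ptep != NULL);
    *ptep = pte;
    if (model_pte_valid_kernel(pte))
        model_dsb();
}

/* The later step that is expected to publish a deferred user update. */
static void model_update_mmu_cache(void)
{
    model_dsb();
}
```

In this model, a user entry written with model_set_pte has no barrier
behind it until model_update_mmu_cache runs. If a path consumes the entry
without reaching that later step, nothing orders the store against the
walker, which is the window the question above is asking about.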

Can anyone think of an obvious reason why we might not be properly
flushing the changes prior to them being consumed by a hardware walker?

Jon.

On 02/27/2015 07:42 AM, Jon Masters wrote:
> On 09/26/2014 10:03 AM, Steve Capper wrote:
> 
>> This series implements general forms of get_user_pages_fast and
>> __get_user_pages_fast in core code and activates them for arm and arm64.
>>
>> These are required for Transparent HugePages to function correctly, as
>> a futex on a THP tail will otherwise result in an infinite loop (due to
>> the core implementation of __get_user_pages_fast always returning 0).
>>
>> Unfortunately, a futex on THP tail can be quite common for certain
>> workloads; thus THP is unreliable without a __get_user_pages_fast
>> implementation.
>>
>> This series may also be beneficial for direct-IO heavy workloads and
>> certain KVM workloads.
>>
>> I appreciate that the merge window is coming very soon, and am posting
>> this revision on the off-chance that it gets the nod for 3.18. (The changes
>> thus far have been minimal and the feedback I've got has been mainly
>> positive).
> 
> Heads up: these patches are currently implicated in a rare-to-trigger
> hang that we are seeing on an internal kernel. An extensive effort is
> underway to confirm whether these are the cause. Will follow up.
> 
> Jon.