Re: ❌ FAIL: Test report for kernel 5.13.0-rc7 (arm-next, 8ab9b1a9)

Robin Murphy robin.murphy at arm.com
Tue Jun 29 10:27:57 PDT 2021


On 2021-06-29 18:03, Veronika Kabatova wrote:
> On Tue, Jun 29, 2021 at 5:15 PM Robin Murphy <robin.murphy at arm.com> wrote:
>>
>> On 2021-06-29 15:44, Lorenzo Pieralisi wrote:
>>> On Tue, Jun 29, 2021 at 12:48:14PM +0100, Robin Murphy wrote:
>>>> [ +ACPI audience ]
>>>>
>>>> On 2021-06-25 12:15, Robin Murphy wrote:
>>>>> On 2021-06-25 12:09, Catalin Marinas wrote:
>>>>>> On Fri, Jun 25, 2021 at 12:02:52PM +0100, Robin Murphy wrote:
>>>>>>> On 2021-06-25 10:52, Veronika Kabatova wrote:
>>>>>>> [...]
>>>>>>>>>>            ❌ stress: stress-ng
>>>>>>>>>
>>>>>>>>> Oh no, this looks like another alignment fault in memcpy:
>>>>>>>>>
>>>>>>>>> [13330.651903] Unable to handle kernel paging request at
>>>>>>>>> virtual address ffff8000534705ff
>>>>>>>>> [13330.651914] Mem abort info:
>>>>>>>>> [13330.651918]   ESR = 0x96000021
>>>>>>>>> [13330.651922]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>>>>> [13330.651928]   SET = 0, FnV = 0
>>>>>>>>> [13330.651931]   EA = 0, S1PTW = 0
>>>>>>>>> [13330.651933]   FSC = 0x21: alignment fault
>>>>>>>>> [13330.651938] Data abort info:
>>>>>>>>> [13330.651940]   ISV = 0, ISS = 0x00000021
>>>>>>>>> [13330.651941]   CM = 0, WnR = 0
>>>>>>>>> [13330.651943] swapper pgtable: 4k pages, 48-bit VAs,
>>>>>>>>> pgdp=00000000f3e6b000
>>>>>>>>> [13330.651945] [ffff8000534705ff] pgd=1000008ffcfff003,
>>>>>>>>> p4d=1000008ffcfff003, pud=100000088e57d003,
>>>>>>>>> pmd=10000008d0aeb003, pte=006800008021370f
>>>>>>>>> [13330.651956] Internal error: Oops: 96000021 [#1] SMP
>>>>>>>>> [13330.651961] Modules linked in: unix_diag binfmt_misc
>>>>>>>>> fcrypt sm4_generic crc32_generic md4 michael_mic
>>>>>>>>> nhpoly1305_neon nhpoly1305 poly1305_generic libpoly1305
>>>>>>>>> poly1305_neon rmd160 sha3_generic sm3_generic
>>>>>>>>> streebog_generic wp512 blowfish_generic blowfish_common
>>>>>>>>> cast5_generic des_generic libdes chacha_generic
>>>>>>>>> chacha_neon libchacha camellia_generic cast6_generic
>>>>>>>>> cast_common serpent_generic twofish_generic
>>>>>>>>> twofish_common dm_thin_pool dm_persistent_data
>>>>>>>>> dm_bio_prison nvme nvme_core ipmi_watchdog ipmi_poweroff
>>>>>>>>> loop tun af_key crypto_user scsi_transport_iscsi
>>>>>>>>> xt_multiport ip_gre ip_tunnel gre overlay xt_CONNSECMARK
>>>>>>>>> xt_SECMARK nft_counter xt_state xt_conntrack nft_compat
>>>>>>>>> ah6 ah4 nft_objref nft_ct nf_conntrack nf_defrag_ipv6
>>>>>>>>> nf_defrag_ipv4 nf_tables nfnetlink jfs sctp
>>>>>>>>> ip6_udp_tunnel udp_tunnel dm_log_writes dm_flakey rfkill
>>>>>>>>> mlx5_ib ib_uverbs ib_core sunrpc coresight_etm4x
>>>>>>>>> i2c_smbus coresight_replicator coresight_tpiu
>>>>>>>>> coresight_tmc joydev mlx5_core acpi_ipmi psample
>>>>>>>>> ipmi_ssif mlxfw !
>>>>>>>>>      ipmi_devintf
>>>>>>>>> [13330.652076]  ipmi_msghandler coresight_funnel
>>>>>>>>> thunderx2_pmu coresight vfat fat fuse zram ip_tables xfs
>>>>>>>>> ast crct10dif_ce i2c_algo_bit ghash_ce drm_vram_helper
>>>>>>>>> drm_kms_helper syscopyarea sysfillrect sysimgblt
>>>>>>>>> fb_sys_fops cec drm_ttm_helper ttm drm gpio_xlp
>>>>>>>>> i2c_xlp9xx uas usb_storage aes_neon_bs [last unloaded:
>>>>>>>>> nvmet]
>>>>>>>>> [13330.652123] CPU: 115 PID: 188446 Comm: stress-ng
>>>>>>>>> Tainted: G           OEL    5.13.0-rc7 #1
>>>>>>>>> [13330.652129] Hardware name: HPE Apollo 70
>>>>>>>>> /C01_APACHE_MB         , BIOS L50_5.13_1.15 05/08/2020
>>>>>>>>> [13330.652133] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
>>>>>>>>> [13330.652139] pc : __memcpy+0x168/0x250
>>>>>>>>> [13330.652150] lr : memory_read_from_buffer+0x58/0x80
>>>>>>>>> [13330.652161] sp : ffff800063ef3c20
>>>>>>>>> [13330.652163] x29: ffff800063ef3c20 x28:
>>>>>>>>> ffff0008b1380000 x27: 0000000000000000
>>>>>>>>> [13330.652170] x26: 0000000000000000 x25:
>>>>>>>>> 0000000000000000 x24: ffff00080a960fe0
>>>>>>>>> [13330.652176] x23: ffff800063ef3d28 x22:
>>>>>>>>> 000000000000063f x21: ffff800063ef3c88
>>>>>>>>> [13330.652181] x20: 000000000000063f x19:
>>>>>>>>> 000000000000063f x18: 0000000000000000
>>>>>>>>> [13330.652186] x17: 0000000000000000 x16:
>>>>>>>>> 0000000000000000 x15: 0000000000000000
>>>>>>>>> [13330.652191] x14: 0000000000000000 x13:
>>>>>>>>> 0000000000000000 x12: 0000000000000000
>>>>>>>>> [13330.652196] x11: 0000000000000000 x10:
>>>>>>>>> 0000000000000000 x9 : 0000000000000000
>>>>>>>>> [13330.652200] x8 : 0000000000000000 x7 :
>>>>>>>>> 0000000000000000 x6 : 0000000000000000
>>>>>>>>> [13330.652206] x5 : ffff000d0fb0063f x4 :
>>>>>>>>> ffff80005347063f x3 : ffff000d0fb005c0
>>>>>>>>> [13330.652212] x2 : ffffffffffffffef x1 :
>>>>>>>>> ffff800053470600 x0 : ffff000d0fb00000
>>>>>>>>> [13330.652218] Call trace:
>>>>>>>>> [13330.652221]  __memcpy+0x168/0x250
>>>>>>>>> [13330.652225]  acpi_data_show+0x5c/0x8c
>>>>>>>>> [13330.652232]  sysfs_kf_bin_read+0x78/0xa0
>>>>>>>>> [13330.652238]  kernfs_file_read_iter+0x9c/0x1a4
>>>>>>>>> [13330.652241]  kernfs_fop_read_iter+0x34/0x50
>>>>>>>>> [13330.652244]  new_sync_read+0xdc/0x154
>>>>>>>>> [13330.652253]  vfs_read+0x158/0x1e4
>>>>>>>>> [13330.652260]  ksys_read+0x64/0xec
>>>>>>>>> [13330.652266]  __arm64_sys_read+0x28/0x34
>>>>>>>>> [13330.652273]  invoke_syscall+0x50/0x120
>>>>>>>>> [13330.652280]  el0_svc_common.constprop.0+0x4c/0xd4
>>>>>>>>> [13330.652284]  do_el0_svc+0x30/0x9c
>>>>>>>>> [13330.652286]  el0_svc+0x2c/0x54
>>>>>>>>> [13330.652294]  el0t_64_sync_handler+0x1a4/0x1b0
>>>>>>>>> [13330.652296]  el0t_64_sync+0x19c/0x1a0
>>>>>>>>> [13330.652303] Code: a984346c a9c4342c f1010042 54fffee8 (a97c3c8e)
>>>>>>>>> [13330.652307] ---[ end trace 227d4380f57145d4 ]---
>>>>>>>>>
>>>>>>>>> So maybe this issue isn't limited to weird modules, after all...
>>>>>>>>>
>>>>>>>>
>>>>>>>> It ran on the machine from the same set that we were able to reproduce
>>>>>>>> it on previously. If you or anyone else has an idea on how to stabilize
>>>>>>>> the reproducibility, or has a debug patch, we'll be happy to try it.
>>>>>>>
>>>>>>> Possibly it depends on exactly how each individual machine's firmware
>>>>>>> happens to align the relevant bits of its ACPI tables in memory?
>>>>>>>
>>>>>>> I've started digging into that callstack - it may not be a
>>>>>>> "weird module"
>>>>>>> but it's definitely crusty ACPI code... a238317ce818 ("ACPI: Clean up
>>>>>>> acpi_os_map/unmap_memory() to eliminate __iomem.") looks frankly a bit
>>>>>>> questionable in its decision to blithely cast away __iomem, but then the
>>>>>>> rationale in aafc65c731fe ("ACPI: add arm64 to the platforms that use
>>>>>>> ioremap") seems particularly dubious on top of that (especially
>>>>>>> given this
>>>>>>> end result).
>>>>>>>
>>>>>>> At a wild guess, I'm wondering if this may be sufficient:
>>>>>>>
>>>>>>> ----->8-----
>>>>>>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>>>>>>> index 327e1b4eb6b0..f5d26b102fbe 100644
>>>>>>> --- a/drivers/acpi/osl.c
>>>>>>> +++ b/drivers/acpi/osl.c
>>>>>>> @@ -277,7 +277,7 @@ acpi_map_lookup_virt(void __iomem *virt,
>>>>>>> acpi_size size)
>>>>>>>            return NULL;
>>>>>>>     }
>>>>>>>
>>>>>>> -#if defined(CONFIG_IA64) || defined(CONFIG_ARM64)
>>>>>>> +#if defined(CONFIG_IA64)
>>>>>>>     /* ioremap will take care of cache attributes */
>>>>>>>     #define should_use_kmap(pfn)   0
>>>>>>>     #else
>>>>>>> -----8<-----
>>>>>>
>>>>>> I thought the same but shouldn't acpi_os_ioremap() map it with the right
>>>>>> attributes? It uses the EFI maps to check what kind of memory this is.
>>>>>
>>>>> Oh crikey, I missed that branch of the rabbit hole... I guess that must
>>>>> mean that the tables being poked here are *not* covered by the EFI
>>>>> memory map, so page_is_ram() is unlikely to help either :(
>>>>
>>>> After picking through the UEFI spec I think I've now got a clearer picture
>>>> of what's happening, but I'm not sure where it goes from here...
>>>>
>>>> The spec implies that it *is* legitimate for runtime-loaded ACPI tables to
>>>> lie outside the EFI memory map, and in that case they must be assumed to be
>>>> uncached, so the behaviour of acpi_os_ioremap() is correct.
>>>
>>> I'd agree with the reasoning, it would be good to pinpoint whether
>>> that's what actually triggers the issue.
>>>
>>> I'd like to replicate it if possible (it is TX2 HW, but the firmware
>>> config is likely to differ from the HW I have at hand); the test
>>> command line that triggers the fault would be useful as a starting
>>> point.
>>>
> 
> The failure is always triggered during stress testing, see source at
> 
> https://gitlab.com/cki-project/kernel-tests/-/tree/main/stress/stress-ng
> 
> The runtest file is specific to our lab, but all it does is run
> subsets of the upstream test (see the "*.stressors" files). The
> upstream test version is V0.12.05, and it hasn't changed since long
> before we started hitting the problem. The failures were observed on
> both the Fedora 33 and 34 releases; I don't think the distro choice
> matters, but I'm mentioning it just in case.
> 
> It doesn't reproduce 100% of the time; any time we tried to reproduce
> it on purpose we couldn't trigger the issue. As if it knew we were
> watching...

Ah, from that I can only assume that this must be stress-ng's --sysfs 
test reading things at random: not only would it have to be on a 
machine whose firmware presents the right thing in the right way, but 
the random test conditions would also have to line up to poke it the 
"right" (wrong) way too.
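
FWIW, since Lorenzo asked for a test command line: a deliberate read of 
the ACPI "data" file ought to exercise the same acpi_data_show() path 
without any of the randomness. This is only an untested sketch - the 
BERT path below is a guess at what usually lives under that directory 
on this box, and it may simply not exist elsewhere - but something like:

/* Untested sketch: read an ACPI "data" file so that acpi_data_show()
 * -> memory_read_from_buffer() ends up memcpy()ing an odd length from
 * an odd offset out of the mapped table. Path and sizes are guesses. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/firmware/acpi/tables/data/BERT";
	char buf[4096];
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		perror(path);
		return 1;
	}
	/* A non-zero offset feeds straight through to the memcpy() source
	 * pointer, so odd offsets and odd lengths both get a chance to
	 * upset an unaligned-load-happy implementation. */
	n = pread(fd, buf, sizeof(buf) - 1, 1);
	printf("read %zd bytes from %s\n", n, path);
	close(fd);
	return 0;
}

Even just cat-ing the file might do it when the table size happens to 
be odd, but forcing the offset takes some of the luck out of it.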

As a temporary workaround for the CI flakes, might it be possible to 
configure stress-ng to stay away from just these ACPI "data" files?

Robin.

>>> Furthermore, is this a v5.13-rc* regression? If so, it would be good
>>> to bisect it - I can't recollect any arm64 changes that could have
>>> introduced this regression in the last cycle, but I may have missed
>>> something.
>>
>> The actual change which has brought this to light is the update to
>> arm64's memcpy() routine for 5.13 - the new version is more aggressive
>> at making unaligned loads from the source buffer, so now triggers
>> alignment faults more readily when (wrongly) used on iomem mappings in
>> places that were getting away with it by chance under the previous
>> implementation (see also [1], for example).
>>
> 
> We dug into the history of runs; the first record we have is from
> mainline commit eb6bbacc46720b8b from April 28. The previously tested
> commit fafe1e39ed213221c0bce doesn't hit any problems when running
> stress-ng. Unfortunately we no longer have the logs from that first
> failed run, as they are only saved for 6 weeks; the first logs we still
> have are from May 11, from mainline commit c90b1834703f13b3a:
> 
> https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2021/05/11/301024644/build_aarch64_redhat:1253670447/tests/9969720_aarch64_2_console.log
> 
> 
> Veronika
> 
>> Thanks,
>> Robin.
>>
>> [1]
>> https://lore.kernel.org/linux-arm-kernel/20210608153344.3813661-1-narmstrong@baylibre.com/
>>
>>>> Given the definition of uncached for arm64 memory types though, that
>>>> means that callers of acpi_os_map_memory() still have to be prepared
>>>> to get an __iomem pointer back even if they know they're mapping a
>>>> table rather than some random bit of MMIO for an AML method.
>>>>
>>>> Therefore in this case it seems the blame lies partway between
>>>> acpi_os_map_memory() for casting away __iomem and acpi_data_show() for
>>>> letting an arbitrary offset lead to an arbitrarily-aligned memcpy(), but I
>>>> don't know what the best way to fix it is. Either way I've satisfied myself
>>>> that it's not an issue with the arm64 code itself - I do wonder whether this
>>>> might also be a problem on IA-64 given ACPI_MISALIGNMENT_NOT_SUPPORTED, and
>>>> I guess RISC-V may have alignment concerns as well.
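
Purely for illustration, one direction - a rough sketch only, with the 
drivers/acpi/sysfs.c details paraphrased from memory rather than 
copied, and the exact helpers assumed - would be to stop casting away 
__iomem at this level and let memcpy_fromio() do the bounded copy with 
alignment-safe accessors, roughly:

static ssize_t acpi_data_show(struct file *filp, struct kobject *kobj,
			      struct bin_attribute *bin_attr, char *buf,
			      loff_t offset, size_t count)
{
	struct acpi_data_attr *data_attr =
		container_of(bin_attr, struct acpi_data_attr, attr);
	void __iomem *base;

	if (offset >= data_attr->attr.size)
		return 0;
	count = min(count, (size_t)(data_attr->attr.size - offset));

	/* Keep the mapping as __iomem instead of casting it away... */
	base = acpi_os_map_iomem(data_attr->addr, data_attr->attr.size);
	if (!base)
		return -ENOMEM;

	/* ...and copy with accessors that stay aligned regardless of
	 * whether the table came back mapped as Normal or Device memory. */
	memcpy_fromio(buf, base + offset, count);

	acpi_os_unmap_iomem(base, data_attr->attr.size);

	return count;
}

Whether the right answer looks like that, or like teaching 
acpi_os_map_memory() about the caller's intent instead, is exactly the 
part I'm unsure about.
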
>>>
>>> Yes, agreed, but see above: this code has been there for aeons, so if
>>> it is a v5.13-rc* regression, something else must have actually
>>> triggered it (test/FW config).
>>>
>>> Thanks for looking into this.
>>>
>>> Lorenzo
>>>
>>
> 


