[bug report] kernel BUG at mm/hugetlb.c:5868! triggered by blktests nvme/tcp nvme/029
Yi Zhang
yi.zhang at redhat.com
Thu Nov 20 07:18:13 PST 2025
I did more testing today and found that this issue cannot be
reproduced if I don't set nr_hugepages:
https://github.com/linux-blktests/blktests/blob/master/tests/nvme/029#L71
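For anyone trying to reproduce this, here is a rough sketch of the toggle
I mean (run from a blktests checkout; the actual hugepage count in
nvme/029 is computed at the linked line, so the value 20 below is only
illustrative):

  # save, raise, and restore vm.nr_hugepages around the run;
  # the BUG only fires for me when this is set
  saved=$(cat /proc/sys/vm/nr_hugepages)
  echo 20 > /proc/sys/vm/nr_hugepages
  nvme_trtype=tcp ./check nvme/029
  echo "$saved" > /proc/sys/vm/nr_hugepages
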
On Wed, Nov 19, 2025 at 3:49 PM Yi Zhang <yi.zhang at redhat.com> wrote:
>
> Hi Jens,
>
> This is not a regression; I found that I already reported it in June
> this year, and the BUG was triggered during an "nvme write"
> operation [1].
>
> [1]
> + test_user_io /dev/nvme0n1 511 1024
> + local disk=/dev/nvme0n1
> + local start=511
> + local cnt=1024
> + local bs size img img1
> ++ blockdev --getss /dev/nvme0n1
> + bs=512
> + size=524288
> ++ mktemp /tmp/blk_img_XXXXXX
> + img=/tmp/blk_img_4aWO9O
> ++ mktemp /tmp/blk_img_XXXXXX
> + img1=/tmp/blk_img_mFMZKv
> + dd if=/dev/urandom of=/tmp/blk_img_4aWO9O bs=512 count=1024 status=none
> + (( cnt-- ))
> + nvme write --start-block=511 --block-count=1023 --data-size=524288 --data=/tmp/blk_img_4aWO9O /dev/nvme0n1
> failed to read data buffer from input file Bad address
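>
> For reference, here is the failing step distilled into a standalone
> sketch (the device path mirrors the trace above; adjust it for your own
> scratch namespace before running):
>
>   # hedged repro sketch: write 1024 sectors starting at LBA 511,
>   # matching the nvme/029 trace; needs nvme-cli and a test namespace
>   dev=/dev/nvme0n1
>   bs=$(blockdev --getss "$dev")        # 512 on this target
>   img=$(mktemp /tmp/blk_img_XXXXXX)
>   dd if=/dev/urandom of="$img" bs="$bs" count=1024 status=none
>   nvme write --start-block=511 --block-count=1023 \
>       --data-size=$((bs * 1024)) --data="$img" "$dev"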
>
> [2]
> https://lore.kernel.org/linux-block/CAHj4cs-C76gc67PhHGAE5dak-9AO4gAmRO=yEReWcm7Y+u6kHA@mail.gmail.com/
>
>
> On Wed, Nov 19, 2025 at 10:42 AM Yi Zhang <yi.zhang at redhat.com> wrote:
> >
> > On Tue, Nov 18, 2025 at 10:57 PM Jens Axboe <axboe at kernel.dk> wrote:
> > >
> > > On 11/18/25 7:51 AM, Yi Zhang wrote:
> > > > Hi
> > > >
> > > > The following BUG was triggered during CKI tests. Please help check it,
> > > > and let me know if you need any further info or testing. Thanks.
> > > >
> > > > commit: for-next - 5674abb82e2b
> > > >
> > > > [ 1486.502840] run blktests nvme/029 at 2025-11-17 21:34:13
> > > > [ 1486.551942] loop0: detected capacity change from 0 to 2097152
> > > > [ 1486.563593] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> > > > [ 1486.580648] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> > > > [ 1486.627702] nvmet: Created nvm controller 1 for subsystem
> > > > blktests-subsystem-1 for NQN
> > > > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > > > [ 1486.631269] nvme nvme0: creating 32 I/O queues.
> > > > [ 1486.639689] nvme nvme0: mapped 32/0/0 default/read/poll queues.
> > > > [ 1486.655324] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr
> > > > 127.0.0.1:4420, hostnqn:
> > > > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> > > > [ 1487.242297] ------------[ cut here ]------------
> > > > [ 1487.242945] kernel BUG at mm/hugetlb.c:5868!
> > > > [ 1487.243628] Oops: invalid opcode: 0000 [#1] SMP NOPTI
> > > > [ 1487.243923] CPU: 3 UID: 0 PID: 56899 Comm: nvme Not tainted
> > > > 6.18.0-rc5 #1 PREEMPT(lazy)
> > > > [ 1487.244450] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 03/14/2018
> > > > [ 1487.244807] RIP: 0010:__unmap_hugepage_range+0x79b/0x7f0
> > > > [ 1487.245098] Code: 89 ef 48 89 c6 e8 25 90 ff ff 48 8b 3c 24 e8 fc
> > > > c3 df 00 e9 d0 fb ff ff 0f 0b 49 8b 50 30 48 f7 d2 4c 85 e2 0f 84 ec
> > > > f8 ff ff <0f> 0b 0f 0b 65 48 8b 05 f1 4e 10 03 48 8b 10 f7 c2 00 00 00
> > > > 10 74
> > > > [ 1487.246461] RSP: 0018:ffffd4108e577a20 EFLAGS: 00010206
> > > > [ 1487.246784] RAX: 0000000000400000 RBX: 0000000000000000 RCX: 0000000000000009
> > > > [ 1487.247559] RDX: 00000000001fffff RSI: ffff8ca241389800 RDI: ffffd4108e577b98
> > > > [ 1487.248566] RBP: ffffffffffffffff R08: ffffffff963c0658 R09: 0000000000200000
> > > > [ 1487.249340] R10: 00007f6ee0c05000 R11: ffff8ca4772ec000 R12: 00007f6ee0a05000
> > > > [ 1487.250191] R13: ffffd4108e577b98 R14: ffff8ca241389800 R15: ffffd4108e577b40
> > > > [ 1487.250962] FS: 00007f6ee1bfa840(0000) GS:ffff8ca6a1838000(0000)
> > > > knlGS:0000000000000000
> > > > [ 1487.251416] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 1487.252127] CR2: 00007f6ee1a7ccf0 CR3: 0000000441bcf000 CR4: 00000000000406f0
> > > > [ 1487.252933] Call Trace:
> > > > [ 1487.253094] <TASK>
> > > > [ 1487.253638] ? unmap_page_range+0x257/0x400
> > > > [ 1487.253876] unmap_vmas+0xa6/0x180
> > > > [ 1487.254482] exit_mmap+0xf0/0x3b0
> > > > [ 1487.255095] __mmput+0x3e/0x140
> > > > [ 1487.255713] exit_mm+0xaf/0x110
> > > > [ 1487.256328] do_exit+0x1ad/0x450
> > > > [ 1487.256905] ? filemap_map_pages+0x27e/0x3d0
> > > > [ 1487.257540] do_group_exit+0x30/0x80
> > > > [ 1487.257789] __x64_sys_exit_group+0x18/0x20
> > > > [ 1487.258008] x64_sys_call+0x14fa/0x1500
> > > > [ 1487.258251] do_syscall_64+0x84/0x800
> > > > [ 1487.258472] ? do_read_fault+0xf5/0x220
> > > > [ 1487.258687] ? do_fault+0x156/0x280
> > > > [ 1487.259260] ? __handle_mm_fault+0x55c/0x6b0
> > > > [ 1487.259911] ? count_memcg_events+0xdd/0x1b0
> > > > [ 1487.260555] ? handle_mm_fault+0x220/0x340
> > > > [ 1487.260784] ? do_user_addr_fault+0x2c3/0x7f0
> > > > [ 1487.261419] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > > [ 1487.261712] RIP: 0033:0x7f6ee1a7cd08
> > > > [ 1487.261954] Code: Unable to access opcode bytes at 0x7f6ee1a7ccde.
> > > > [ 1487.262691] RSP: 002b:00007ffdb391b628 EFLAGS: 00000206 ORIG_RAX:
> > > > 00000000000000e7
> > > > [ 1487.263484] RAX: ffffffffffffffda RBX: 00007f6ee1ba7fc8 RCX: 00007f6ee1a7cd08
> > > > [ 1487.264266] RDX[ 1487.359221] R10: 00007ffdb391b420 R11:
> > > > 0000000000000206 R12: 0000000000000001
> > > > [ 1487.365268] R13: 0000000000000001 R14: 00007f6ee1ba6680 R15: 00007f6ee1ba7fe0
> > > > [ 1487.366071] </TASK>
> > > > [ 1487.366251] Modules linked in: nvmet_tcp nvmet nvme_tcp
> > > > nvme_fabrics nvme nvme_core nvme_keyring nvme_auth rtrs_core rdma_cm
> > > > iw_cm ib_cm ib_core hkdf rfkill sunrpc amd64_edac edac_mce_amd
> > > > ipmi_ssif acpi_power_meter acpi_ipmi ipmi_si ipmi_devintf kvm
> > > > irqbypass i2c_piix4 ipmi_msghandler hpilo tg3 acpi_cpufreq i2c_smbus
> > > > fam15h_power k10temp pcspkr loop fuse nfnetlink zram lz4hc_compress
> > > > lz4_compress xfs ata_generic pata_acpi polyval_clmulni
> > > > ghash_clmulni_intel hpsa mgag200 serio_raw i2c_algo_bit
> > > > scsi_transport_sas hpwdt sp5100_tco pata_atiixp i2c_dev [last
> > > > unloaded: nvmet]
> > > > [ 1487.369378] ---[ end trace 0000000000000000 ]---
> > > > [ 1487.373697] ERST: [Firmware Warn]: Firmware does not respond in time.
> > > > [ 1487.374212] pstoreffff R08: ffffffff963c0658 R09: 0000000000200000
> > > > [ 1487.775150] R10: 00007f6ee0c05000 R11: ffff8ca4772ec000 R12: 00007f6ee0a05000
> > > > [ 1487.776024] R13: ffffd4108e577b98 R14: ffff8ca241389800 R15: ffffd4108e577b40
> > > > [ 1487.776853] FS: 00007f6ee1bfa840(0000) GS:ffff8ca6a1838000(0000)
> > > > knlGS:0000000000000000
> > > > [ 1487.777313] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 1487.778210] CR2: 00007f6ee1a7ccf0 CR3: 0000000441bcf000 CR4: 00000000000406f0
> > > > [ 1487.778978] Kernel panic - not syncing: Fatal exception
> > > > [ 1487.779714] Kernel Offset: 0x11a00000 from 0xffffffff81000000
> > > > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > [ 1487.814610] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > > [-- MARK -- Mon Nov 17 21:35:00 2025]
> > >
> > > The usual:
> > >
> >
> > I did a 1000-cycle run of blktests nvme/029, but had no luck
> > reproducing it. I will try to find a reliable reproducer for it.
> > Thanks.
> >
> > > 1) is it reproducible just re-running the test?
> > > 2) if so, please bisect
> > >
> > > --
> > > Jens Axboe
> > >
> >
> > --
> > Best Regards,
> > Yi Zhang
>
>
>
> --
> Best Regards,
> Yi Zhang
--
Best Regards,
Yi Zhang