BUG Report: kernel NULL pointer dereference in bio_integrity_advance()

Mon Aug 26 07:32:31 PDT 2024

Hi,

I found that running the following command on the 5.4, 5.10, and 5.15
stable kernels crashes the system with a NULL pointer dereference:

 root@pjy:~# touch test.txt
 root@pjy:~# nvme io-passthru /dev/nvme0 --opcode=0x1 --input-file=test.txt --data-len=1 --write --namespace=1 --metadata-len=1
 nvme nvme0: using deprecated NVME_IOCTL_IO_CMD ioctl on the char device!
 Unable to handle kernel NULL pointer dereference at virtual address 000000000000000a
 Mem abort info:
   ESR = 0x96000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000106500000
 [000000000000000a] pgd=0000000000000000, p4d=0000000000000000
 Internal error: Oops: 96000004 [#1] PREEMPT SMP
 Modules linked in: crct10dif_ce nvme nvme_core fuse drm dm_mod ip_tables x_tables ipv6
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.1-gb6abb62daa55 #1
 Hardware name: linux,dummy-virt (DT)
 pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : bio_integrity_advance+0x4c/0x100
 lr : bio_advance+0x34/0x120
 sp : ffff800010003d90
 x29: ffff800010003d90 x28: ffff800011d234c0 x27: ffff80001151ff20
 x26: ffff0000be180310 x25: 00000000000000b1 x24: 0000000000000001
 x23: 0000000000000000 x22: 0000000000000000 x21: ffff0000c6733640
 x20: ffff0000c47afa00 x19: ffff0000c5f81108 x18: 0000000000000001
 x17: ffff8000edf70000 x16: ffff800010004000 x15: 0000000000004000
 x14: 0000000000000001 x13: 0000000000000002 x12: 0000000000000400
 x11: 0000000000000040 x10: ffff0000c0034168 x9 : ffff0000c0034160
 x8 : ffff0000c0424550 x7 : 0000000000000000 x6 : 0000000000000000
 x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
 x2 : ffff0000c5f81100 x1 : 0000000000000001 x0 : 0000000000000000
 Call trace:
  bio_integrity_advance+0x4c/0x100
  bio_advance+0x34/0x120
  blk_update_request+0x174/0x400
  blk_mq_end_request+0x2c/0x150
  nvme_complete_rq+0x4c/0x10c [nvme_core]
  nvme_pci_complete_rq+0x4c/0xa4 [nvme]
  nvme_process_cq+0x144/0x250 [nvme]
  nvme_irq+0x18/0x30 [nvme]
  __handle_irq_event_percpu+0x40/0x15c
  handle_irq_event+0x64/0x140
  handle_fasteoi_irq+0xa8/0x1a0
  handle_domain_irq+0x64/0x94
  gic_handle_irq+0xbc/0x140
  call_on_irq_stack+0x2c/0x60
  do_interrupt_handler+0x54/0x60
  el1_interrupt+0x30/0x80
  el1h_64_irq_handler+0x1c/0x2c
  el1h_64_irq+0x78/0x7c
  finish_task_switch.isra.0+0x98/0x260
  __schedule+0x2a4/0x714
  schedule_idle+0x2c/0x50
  do_idle+0x190/0x2cc
  cpu_startup_entry+0x28/0x80
  rest_init+0xe8/0x100
  arch_call_rest_init+0x14/0x20
  start_kernel+0x634/0x674
  __primary_switched+0xc0/0xc8
 Code: f9402800 f84c8c04 f100009f 9a9f1000 (39402804)
 ---[ end trace 515229a85ac6ccf1 ]---
 Kernel panic - not syncing: Oops: Fatal exception in interrupt
 SMP: stopping secondary CPUs
 Kernel Offset: disabled
 CPU features: 0x11000471,20000846
 Memory Limit: none
 ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

The crash happens in this function:

void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
{
	struct bio_integrity_payload *bip = bio_integrity(bio);
	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
	unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);

	bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9);
	bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}

Here blk_get_integrity() returns NULL, and bio_integrity_bytes()
dereferences that pointer without checking it.
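
The fault address 0xa in the oops fits this reading: it looks like a
small field offset applied to a NULL struct blk_integrity pointer. The
last word on the "Code:" line (39402804) appears to decode to a byte load
at offset 10 from x0, and x0 is 0 in the register dump. The sketch below
is a user-space model, not kernel code. The mock_* names are hypothetical,
and the field order is an assumption about how struct blk_integrity is
laid out in 5.15, so check it against your tree.

```c
#include <stddef.h>

/*
 * User-space stand-in for struct blk_integrity.
 * ASSUMPTION: field order as believed to be in 5.15's
 * include/linux/blkdev.h; this is not copied from a verified tree.
 */
struct mock_blk_integrity {
	const void *profile;        /* NULL when the disk has no integrity profile */
	unsigned char flags;
	unsigned char tuple_size;
	unsigned char interval_exp; /* read when computing integrity intervals */
	unsigned char tag_size;
};

/*
 * Address that a load of bi->interval_exp would touch if bi were NULL.
 * With 8-byte pointers this is 8 (profile) + 2 (flags, tuple_size).
 */
static size_t mock_fault_address_for_null_bi(void)
{
	return offsetof(struct mock_blk_integrity, interval_exp);
}
```

On a 64-bit target the helper returns 0xa, which matches the faulting
virtual address reported above, under the layout assumption stated in the
comment.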

The same issue exists in mainline, but this reproducer does not trigger
it there, because after

d4aa57a1cac3 ("block: don't bother iter advancing a fully done bio")

bio_advance() is no longer called for this reproducer. The bug might
still be triggerable through another path.

I want to send a patch to fix this but need some help to understand
where the change has to be made.

In 5.15, for example:

void bio_advance(struct bio *bio, unsigned bytes)
{
	if (bio_integrity(bio))
		bio_integrity_advance(bio, bytes);

	bio_crypt_advance(bio, bytes);
	bio_advance_iter(bio, &bio->bi_iter, bytes);
}

Here bio_integrity(bio) returns non-NULL, so bio_integrity_advance() is
called. In that function, however,
blk_get_integrity(bio->bi_bdev->bd_disk) returns NULL because
bi->profile is NULL for this disk.

So, the problem is that bi->profile is NULL for this disk but
bio->bi_integrity is non-NULL for this bio.
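
To make the mismatch concrete, here is a minimal user-space sketch of the
two checks involved. It is a model under stated assumptions, not a
proposed patch: all mock_* types and functions are hypothetical, the
interval and byte arithmetic is a simplified guess at what
bio_integrity_intervals() and bio_integrity_bytes() compute, and the NULL
guard only shows what tolerating the mismatch at this point would look
like.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mock_integrity {
	const void *profile;        /* NULL: disk has no integrity profile */
	unsigned char tuple_size;
	unsigned char interval_exp;
};
struct mock_disk { struct mock_integrity integrity; };
struct mock_bip  { unsigned long long sector; unsigned int bytes_advanced; };
struct mock_bio  { struct mock_bip *bip; struct mock_disk *disk; };

/* Models the behaviour described above: NULL when no profile is set. */
static struct mock_integrity *mock_get_integrity(struct mock_disk *disk)
{
	return disk->integrity.profile ? &disk->integrity : NULL;
}

/*
 * Advance the integrity payload by bytes_done bytes of data.
 * Returns 0 on success, -1 if there is nothing that can be advanced.
 */
static int mock_integrity_advance(struct mock_bio *bio, unsigned int bytes_done)
{
	struct mock_integrity *bi = mock_get_integrity(bio->disk);
	unsigned int sectors = bytes_done >> 9;
	unsigned int intervals;

	/* Hypothetical guard: a payload exists but the disk has no profile. */
	if (!bio->bip || !bi)
		return -1;

	/* ASSUMPTION: intervals = sectors scaled by (interval_exp - 9). */
	intervals = sectors >> (bi->interval_exp - 9);
	bio->bip->sector += intervals;
	bio->bip->bytes_advanced += intervals * bi->tuple_size;
	return 0;
}
```

With a non-NULL bip and a NULL profile, the model returns -1 instead of
dereferencing bi. A guard like this would only hide the symptom in the
completion path. Whether the change belongs here or at the point where
the integrity payload gets attached to the bio is the open question.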

Please help me debug this further.

P.S. I am reproducing this with QEMU, using the following command:

qemu-system-aarch64 -machine 'virt,gic-version=3' -cpu 'cortex-a57' -smp \
 2 -m 4G -drive format=raw,file=rootfs -device \
 virtio-net-device,netdev=net -netdev user,id=net,hostfwd=tcp::2222-:22 \
 -kernel linux/arch/arm64/boot/Image -nographic -append "root=/dev/vda rw \
 console=ttyAMA0 debug earlyprintk=serial slub_debug=UZ nokaslr" -gdb \
 tcp::1234 -d guest_errors,unimp -D log.txt -drive \
 file=nvm.img,if=none,id=nvm -device nvme,serial=deadbeef,drive=nvm

The kernel is v5.15.1, built with the arm64 defconfig.

The following commands will then crash the kernel:
# touch test.txt
# nvme io-passthru /dev/nvme0 --opcode=0x1 --input-file=test.txt --data-len=1 --write --namespace=1 --metadata-len=1

Thanks,
Puranjay