BUG Report: kernel NULL pointer dereference in bio_integrity_advance()
pjy at amazon.com
pjy at amazon.com
Mon Aug 26 07:32:31 PDT 2024
Hi,
I saw that running the following command on 5.4, 5.10, 5.15 stable
kernels crashes the system with a NULL pointer dereference:
root at pjy:~# touch test.txt
root at pjy:~# nvme io-passthru /dev/nvme0 --opcode=0x1 --input-file=test.txt --data-len=1 --write --namespace=1 --metadata-len=1
nvme nvme0: using deprecated NVME_IOCTL_IO_CMD ioctl on the char device!
Unable to handle kernel NULL pointer dereference at virtual address 000000000000000a
Mem abort info:
ESR = 0x96000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=0000000106500000
[000000000000000a] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in: crct10dif_ce nvme nvme_core fuse drm dm_mod ip_tables x_tables ipv6
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.1-gb6abb62daa55 #1
Hardware name: linux,dummy-virt (DT)
pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : bio_integrity_advance+0x4c/0x100
lr : bio_advance+0x34/0x120
sp : ffff800010003d90
x29: ffff800010003d90 x28: ffff800011d234c0 x27: ffff80001151ff20
x26: ffff0000be180310 x25: 00000000000000b1 x24: 0000000000000001
x23: 0000000000000000 x22: 0000000000000000 x21: ffff0000c6733640
x20: ffff0000c47afa00 x19: ffff0000c5f81108 x18: 0000000000000001
x17: ffff8000edf70000 x16: ffff800010004000 x15: 0000000000004000
x14: 0000000000000001 x13: 0000000000000002 x12: 0000000000000400
x11: 0000000000000040 x10: ffff0000c0034168 x9 : ffff0000c0034160
x8 : ffff0000c0424550 x7 : 0000000000000000 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
x2 : ffff0000c5f81100 x1 : 0000000000000001 x0 : 0000000000000000
Call trace:
bio_integrity_advance+0x4c/0x100
bio_advance+0x34/0x120
blk_update_request+0x174/0x400
blk_mq_end_request+0x2c/0x150
nvme_complete_rq+0x4c/0x10c [nvme_core]
nvme_pci_complete_rq+0x4c/0xa4 [nvme]
nvme_process_cq+0x144/0x250 [nvme]
nvme_irq+0x18/0x30 [nvme]
__handle_irq_event_percpu+0x40/0x15c
handle_irq_event+0x64/0x140
handle_fasteoi_irq+0xa8/0x1a0
handle_domain_irq+0x64/0x94
gic_handle_irq+0xbc/0x140
call_on_irq_stack+0x2c/0x60
do_interrupt_handler+0x54/0x60
el1_interrupt+0x30/0x80
el1h_64_irq_handler+0x1c/0x2c
el1h_64_irq+0x78/0x7c
finish_task_switch.isra.0+0x98/0x260
__schedule+0x2a4/0x714
schedule_idle+0x2c/0x50
do_idle+0x190/0x2cc
cpu_startup_entry+0x28/0x80
rest_init+0xe8/0x100
arch_call_rest_init+0x14/0x20
start_kernel+0x634/0x674
__primary_switched+0xc0/0xc8
Code: f9402800 f84c8c04 f100009f 9a9f1000 (39402804)
---[ end trace 515229a85ac6ccf1 ]---
Kernel panic - not syncing: Oops: Fatal exception in interrupt
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x11000471,20000846
Memory Limit: none
---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
This is because in the function:
void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
{
struct bio_integrity_payload *bip = bio_integrity(bio);
struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
bip->bip_iter.bi_sector += bio_integrity_intervals(bi, bytes_done >> 9);
bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}
Here blk_get_integrity() returns NULL and bio_integrity_bytes() uses it
without checking for NULL.
This issue is also present in mainline but doesn't trigger because after
d4aa57a1cac3 ("block: don't bother iter advancing a fully done bio")
bio_advance() is not called for this reproducer, but this bug might be
triggerable through another path.
I want to send a patch to fix this but need some help to understand
where the change has to be made.
in 5.15 for example:
void bio_advance(struct bio *bio, unsigned bytes)
{
if (bio_integrity(bio))
bio_integrity_advance(bio, bytes);
bio_crypt_advance(bio, bytes);
bio_advance_iter(bio, &bio->bi_iter, bytes);
}
Here bio_integrity(bio) returns non-null and therefore
bio_integrity_advance() is called but in that fuction,
blk_get_integrity(bio->bi_bdev->bd_disk) returns NULL because for this
disk bi->profile is NULL.
So, the problem is that bi->profile is NULL for this disk but
bio->bi_integrity is non-NULL for this bio.
Please help me debug this further.
P.S. - Reproducing using qemu.
Here are the commands I used:
qemu-system-aarch64 -machine 'virt,gic-version=3' -cpu 'cortex-a57' -smp \
2 -m 4G -drive format=raw,file=rootfs -device \
virtio-net-device,netdev=net -netdev user,id=net,hostfwd=tcp::2222-:22 \
-kernel linux/arch/arm64/boot/Image -nographic -append "root=/dev/vda rw \
console=ttyAMA0 debug earlyprintk=serial slub_debug=UZ nokaslr" -gdb \
tcp::1234 -d guest_errors,unimp -D log.txt -drive \
file=nvm.img,if=none,id=nvm -device nvme,serial=deadbeef,drive=nvm
Kernel is v5.15.1 compiled with arm64 defconfig
The following commands will then crash the kernel:
# touch test.txt
# nvme io-passthru /dev/nvme0 --opcode=0x1 --input-file=test.txt --data-len=1 --write --namespace=1 --metadata-len=1
Thanks,
Puranjay
More information about the Linux-nvme
mailing list