BUG: hot removal during writes on ext4 formatted nvme device
Dmitry Monakhov
dmonlist at gmail.com
Thu May 18 07:25:17 PDT 2017
Jon Derrick <jonathan.derrick at intel.com> writes:
> Hello,
>
> I've encountered a BUG that I've experienced during hot removal on an
> ext4-formatted nvme device undergoing writes. I have been able to verify
> that 4.5, 4.6, 4.10.12, 4.11, and 4.12-rc1 show similar issues (the v4.6
> trace below shows issues with block that have already been fixed). I'm
> using VMD hardware for my hotplug controller so 4.5 is as far back as I
> can go (maybe someone else can verify on non-VMD hardware?).
>
> To reproduce:
> 1) mkfs.ext4 <nvme>
> 2) mount <nvme> <mnt>
> 3) dd if=/dev/zero of=<mnt>/file bs=1M count=10000
> 4) Hot remove the drive while above is writing
>
> From what I can tell, the ext4 sb is trying to be committed in the error
> path. There is supposed to be a check if the device is still alive via
> block_device_ejected(), but my guess is that there is a race between the
> removal/deletion in genhd and this check. I would appreciate any help
> resolving this.
>
> Here are the traces for v4.11 and v4.6:
> v4.11:
> [ 217.509412] EXT4-fs (nvme1n1): mounted filesystem with ordered data
> mode. Opts: (null)
> [ 300.232914] pciehp 10000:0d:00.0:pcie204: Slot(0-1): Card not present
> [ 300.241468] nvme 10000:0f:00.0: PME# disabled
> [ 300.907560] nvme1n1: detected capacity change from 400088457216 to 0
> [ 300.916727] VFS: busy inodes on changed media or resized disk nvme1n1
> [ 300.941164] blk_update_request: I/O error, dev nvme1n1, sector 1048576
> [ 300.949889] blk_update_request: I/O error, dev nvme1n1, sector 1048832
> [ 300.958608] blk_update_request: I/O error, dev nvme1n1, sector 1049088
> [ 300.967327] blk_update_request: I/O error, dev nvme1n1, sector 1049344
> [ 300.976044] blk_update_request: I/O error, dev nvme1n1, sector 1049600
> [ 300.984762] blk_update_request: I/O error, dev nvme1n1, sector 1049856
> [ 300.993469] blk_update_request: I/O error, dev nvme1n1, sector 1050112
> [ 301.002563] blk_update_request: I/O error, dev nvme1n1, sector 1050368
> [ 301.011228] EXT4-fs warning (device nvme1n1): ext4_end_bio:313: I/O
> error -5 writing to inode 12 (offset 0 size 2101248 starting block 131328)
> [ 301.028172] Buffer I/O error on device nvme1n1, logical block 131072
> [ 301.036604] Buffer I/O error on device nvme1n1, logical block 131073
> [ 301.045047] Buffer I/O error on device nvme1n1, logical block 131074
> [ 301.053476] Buffer I/O error on device nvme1n1, logical block 131075
> [ 301.061903] Buffer I/O error on device nvme1n1, logical block 131076
> [ 301.070332] Buffer I/O error on device nvme1n1, logical block 131077
> [ 301.078760] Buffer I/O error on device nvme1n1, logical block 131078
> [ 301.122809] Buffer I/O error on device nvme1n1, logical block 131079
> [ 301.165647] Buffer I/O error on device nvme1n1, logical block 131080
> [ 301.208168] Buffer I/O error on device nvme1n1, logical block 131081
> [ 301.250531] blk_update_request: I/O error, dev nvme1n1, sector 1050624
> [ 301.292612] blk_update_request: I/O error, dev nvme1n1, sector 1050880
> [ 301.333576] EXT4-fs warning (device nvme1n1): ext4_end_bio:313: I/O
> error -5 writing to inode 12 (offset 0 size 3149824 starting block 131584)
> [ 301.417798] EXT4-fs warning (device nvme1n1): ext4_end_bio:313: I/O
> error -5 writing to inode 12 (offset 0 size 4198400 starting block 131840)
> [ 301.506349] EXT4-fs warning (device nvme1n1): ext4_end_bio:313: I/O
> error -5 writing to inode 12 (offset 0 size 5246976 starting block 132096)
> [ 301.600295] EXT4-fs warning (device nvme1n1): ext4_end_bio:313: I/O
> error -5 writing to inode 12 (offset 0 size 6295552 starting block 132352)
> [ 301.697846] EXT4-fs warning (device nvme1n1): ext4_end_bio:313: I/O
> error -5 writing to inode 12 (offset 0 size 7344128 starting block 132608)
> [ 301.800529] EXT4-fs warning (device nvme1n1): ext4_end_bio:313: I/O
> error -5 writing to inode 12 (offset 0 size 8388608 starting block 132864)
> [ 301.908730] EXT4-fs warning (device nvme1n1): ext4_end_bio:313: I/O
> error -5 writing to inode 12 (offset 0 size 8388608 starting block 133120)
> [ 302.019761] EXT4-fs warning (device nvme1n1): ext4_end_bio:313: I/O
> error -5 writing to inode 12 (offset 8388608 size 2101248 starting block
> 133376)
> [ 302.135856] EXT4-fs warning (device nvme1n1): ext4_end_bio:313: I/O
> error -5 writing to inode 12 (offset 8388608 size 3149824 starting block
> 133632)
> [ 302.324114] EXT4-fs error (device nvme1n1):
> ext4_wait_block_bitmap:503: comm kworker/u577:1: Cannot read block
> bitmap - block_group = 5, block_bitmap = 1077
> [ 302.457958] Buffer I/O error on dev nvme1n1, logical block 0, lost
> sync page write
> [ 302.524509] EXT4-fs (nvme1n1): Delayed block allocation failed for
> inode 12 at logical offset 32768 with max blocks 2048 with error 5
> [ 302.659921] EXT4-fs (nvme1n1): This should not happen!! Data will be lost
> [ 302.659921]
> [ 302.793498] JBD2: Detected IO errors while flushing file data on
> nvme1n1-8
> [ 302.797644] EXT4-fs error (device nvme1n1):
> ext4_wait_block_bitmap:503: comm kworker/u577:1: Cannot read block
> bitmap - block_group = 5, block_bitmap = 1077
> [ 302.797699] EXT4-fs (nvme1n1): previous I/O error to superblock detected
> [ 302.797709] ------------[ cut here ]------------
> [ 302.797711] kernel BUG at fs/buffer.c:3103!
This is common bug which happens if device dies under our feet.
bh becomes invalidated and unmapped.
My proposed fix is here:
https://www.spinics.net/lists/kernel/msg2483231.html
Full patchset was not accepted, I'll update it and try again soon.
> [ 302.797713] invalid opcode: 0000 [#1] SMP KASAN
> [ 302.797714] Modules linked in: ext4 jbd2 mbcache vfat fat iTCO_wdt
> iTCO_vendor_support intel_rapl x86_pkg_temp_thermal intel_powerclamp
> coretemp kvm_intel kvm btrfs xor irqbypass crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel pcbc aesni_intel crypto_simd cryptd glue_helper
> raid6_pq pcspkr i2c_i801 i2c_core lpc_ich shpchp sg ioatdma dca wmi
> ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_pad tpm_crb
> acpi_power_meter ip_tables xfs libcrc32c sd_mod crc32c_intel e1000e ahci
> nvme ptp libahci nvme_core pps_core libata
> [ 302.797762] CPU: 41 PID: 513 Comm: kworker/u577:1 Not tainted 4.11.0 #62
> [ 302.797764] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS
> PLYDCRB1.86B.0121.R04.1702012027 02/01/2017
> [ 302.797771] Workqueue: writeback wb_workfn (flush-259:2)
> [ 302.797774] task: ffff88016d490000 task.stack: ffff88016d498000
> [ 302.797778] RIP: 0010:submit_bh_wbc+0x277/0x2a0
> [ 302.797780] RSP: 0018:ffff88016d49f248 EFLAGS: 00010246
> [ 302.797783] RAX: 0000000000200005 RBX: ffff880155c2a498 RCX:
> ffffffff81384751
> [ 302.797785] RDX: dffffc0000000000 RSI: 0000000000020000 RDI:
> ffff880155c2a498
> [ 302.797787] RBP: ffff88016d49f290 R08: 0000000000000000 R09:
> ffffed00476881b7
> [ 302.797788] R10: ffff88016d49f3e0 R11: ffffed00476881b6 R12:
> 0000000000020000
> [ 302.797790] R13: 0000000000020000 R14: 0000000000000001 R15:
> 0000000000000000
> [ 302.797792] FS: 0000000000000000(0000) GS:ffff88017a440000(0000)
> knlGS:0000000000000000
> [ 302.797794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 302.797796] CR2: 00007faa8d9ae140 CR3: 000000000200d000 CR4:
> 00000000007406e0
> [ 302.797798] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 302.797799] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 302.797800] PKRU: 55555554
> [ 302.797801] Call Trace:
> [ 302.797806] ? __wake_up_bit+0x45/0x90
> [ 302.797810] __sync_dirty_buffer+0x84/0x130
> [ 302.797851] ext4_commit_super+0x3e8/0x4f0 [ext4]
> [ 302.797886] __ext4_error+0xa3/0x140 [ext4]
> [ 302.797889] ? autoremove_wake_function+0xa0/0xa0
> [ 302.797920] ext4_wait_block_bitmap+0xbd/0xe0 [ext4]
> [ 302.797956] ext4_mb_init_cache+0x35e/0xb10 [ext4]
> [ 302.797992] ext4_mb_init_group+0x1fc/0x330 [ext4]
> [ 302.798028] ext4_mb_good_group+0x274/0x280 [ext4]
> [ 302.798065] ext4_mb_regular_allocator+0x4d0/0x750 [ext4]
> [ 302.798101] ext4_mb_new_blocks+0x93f/0x1640 [ext4]
> [ 302.798105] ? kasan_kmalloc+0x93/0xc0
> [ 302.798108] ? __kmalloc+0x12e/0x230
> [ 302.798142] ? ext4_find_extent+0x3cf/0x450 [ext4]
> [ 302.798178] ? ext4_ext_search_right+0x108/0x490 [ext4]
> [ 302.798213] ext4_ext_map_blocks+0x1128/0x15c0 [ext4]
> [ 302.798245] ext4_map_blocks+0x1b7/0xa20 [ext4]
> [ 302.798277] ext4_writepages+0xa71/0x13e0 [ext4]
> [ 302.798282] do_writepages+0x4b/0x70
> [ 302.798284] ? do_writepages+0x4b/0x70
> [ 302.798288] __writeback_single_inode+0x6a/0x4a0
> [ 302.798292] writeback_sb_inodes+0x271/0x650
> [ 302.798296] wb_writeback+0x1db/0x430
> [ 302.798299] wb_workfn+0x19a/0x590
> [ 302.798302] ? wb_workfn+0x19a/0x590
> [ 302.798307] ? finish_task_switch+0x9b/0x330
> [ 302.798310] process_one_work+0x2a2/0x680
> [ 302.798313] worker_thread+0x89/0x790
> [ 302.798316] kthread+0x18c/0x1e0
> [ 302.798318] ? rescuer_thread+0x600/0x600
> [ 302.798321] ? kthread_park+0xd0/0xd0
> [ 302.798324] ret_from_fork+0x2c/0x40
> [ 302.798325] Code: 0f 45 e8 45 09 f5 e8 09 86 f7 ff 45 89 6c 24 14 4c
> 89 e7 e8 3c ca 13 00 48 83 c4 20 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3
> 0f 0b <0f> 0b 48 8d 43 20 48 89 45 d0 48 8d 43 10 48 89 45 b8 e9 a8 fe
> [ 302.798363] RIP: submit_bh_wbc+0x277/0x2a0 RSP: ffff88016d49f248
> [ 302.798531] ---[ end trace 273a42299ae29efd ]---
> [ 302.802839]
> ==================================================================
>
>
>
> v4.6:
> [ 401.567515] ------------[ cut here ]------------
> [ 401.567526] WARNING: CPU: 16 PID: 730 at lib/list_debug.c:33
> __list_add+0xbf/0xf0
> [ 401.567530] list_add corruption. prev->next should be next
> (ffffe8fffc600548), but was ffff88016e540000. (prev=ffff88016e540000).
> [ 401.567579] Modules linked in: ext4 jbd2 mbcache vfat fat
> x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul
> crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul
> iTCO_wdt ablk_helper cryptd iTCO_vendor_support btrfs xor pcspkr
> raid6_pq i2c_i801 sg i2c_core lpc_ich shpchp wmi ipmi_devintf ipmi_si
> ipmi_msghandler nfit fjes tpm_crb libnvdimm acpi_power_meter acpi_pad
> ip_tables xfs libcrc32c sd_mod crc32c_intel ahci e1000e nvme libahci ptp
> nvme_core pps_core libata
> [ 401.567584] CPU: 16 PID: 730 Comm: kworker/u898:2 Not tainted 4.6.0 #64
> [ 401.567586] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS
> PLYDCRB1.86B.0121.R04.1702012027 02/01/2017
> [ 401.567595] Workqueue: writeback wb_workfn (flush-259:2)
> [ 401.567600] ffffffff81e50fc1 00000000b7d5bb37 ffff88023207ee88
> ffffffff814b0578
> [ 401.567604] ffff88023207eee0 0000000000000000 ffff88023207eed0
> ffffffff810bf821
> [ 401.567608] ffff880230c9dd00 00000021fc6005c0 ffff88016e540000
> ffff88016e540000
> [ 401.567609] Call Trace:
> [ 401.567619] [<ffffffff814b0578>] dump_stack+0x63/0x8b
> [ 401.567626] [<ffffffff810bf821>] __warn+0x111/0x130
> [ 401.567630] [<ffffffff810bf89f>] warn_slowpath_fmt+0x5f/0x80
> [ 401.567633] [<ffffffff814ddbff>] __list_add+0xbf/0xf0
> [ 401.567640] [<ffffffff8147e008>] blk_mq_insert_requests+0x168/0x2a0
> [ 401.567645] [<ffffffff8147f41a>] blk_mq_flush_plug_list+0x1ca/0x1f0
> [ 401.567651] [<ffffffff8146d203>] blk_flush_plug_list+0x133/0x330
> [ 401.567659] [<ffffffff81a0871f>] io_schedule_timeout+0x6f/0x1a0
> [ 401.567663] [<ffffffff81a0a139>] bit_wait_io+0x29/0x80
> [ 401.567667] [<ffffffff81a09b0f>] __wait_on_bit+0x7f/0xd0
> [ 401.567671] [<ffffffff81a0a110>] ? bit_wait_timeout+0xd0/0xd0
> [ 401.567675] [<ffffffff81a0a110>] ? bit_wait_timeout+0xd0/0xd0
> [ 401.567679] [<ffffffff81a09be1>] out_of_line_wait_on_bit+0x81/0xb0
> [ 401.567686] [<ffffffff8112a3b0>] ? autoremove_wake_function+0x50/0x50
> [ 401.567692] [<ffffffff81343683>] __sync_dirty_buffer+0x103/0x120
> [ 401.567743] [<ffffffffa0dd9609>] ext4_commit_super+0x3a9/0x4a0 [ext4]
> [ 401.567794] [<ffffffffa0dd9b1f>] __ext4_error+0x7f/0x120 [ext4]
> [ 401.567827] [<ffffffffa0d98c7d>] ext4_wait_block_bitmap+0xbd/0xe0 [ext4]
> [ 401.567868] [<ffffffffa0dfd2ce>] ext4_mb_init_cache+0x35e/0xb00 [ext4]
> [ 401.567906] [<ffffffffa0dfdc6c>] ext4_mb_init_group+0x1fc/0x330 [ext4]
> [ 401.567944] [<ffffffffa0dfe656>] ext4_mb_load_buddy_gfp+0x636/0x680
> [ext4]
> [ 401.567982] [<ffffffffa0e01dee>] ext4_mb_find_by_goal+0x15e/0x640 [ext4]
> [ 401.568020] [<ffffffffa0e02ff6>]
> ext4_mb_regular_allocator+0xd6/0x770 [ext4]
> [ 401.568025] [<ffffffff812c1b36>] ? ___slab_alloc+0x146/0x450
> [ 401.568057] [<ffffffffa0d98a59>] ?
> ext4_get_group_no_and_offset+0x99/0xb0 [ext4]
> [ 401.568100] [<ffffffffa0e05eb1>] ext4_mb_new_blocks+0x6b1/0x960 [ext4]
> [ 401.568137] [<ffffffffa0deacc8>] ? ext4_ext_search_right+0x108/0x490
> [ext4]
> [ 401.568174] [<ffffffffa0ded672>] ? ext4_find_extent+0x472/0x4c0 [ext4]
> [ 401.568211] [<ffffffffa0df457b>] ext4_ext_map_blocks+0x112b/0x15c0
> [ext4]
> [ 401.568216] [<ffffffff8123a3f4>] ? find_get_pages_tag+0x214/0x230
> [ 401.568250] [<ffffffffa0da0800>] ? __ext4_new_inode+0xbb0/0x1d20 [ext4]
> [ 401.568283] [<ffffffffa0da6257>] ext4_map_blocks+0x1b7/0x800 [ext4]
> [ 401.568317] [<ffffffffa0dac151>] ext4_writepages+0x951/0x1280 [ext4]
> [ 401.568323] [<ffffffff8124e9db>] do_writepages+0x4b/0x70
> [ 401.568326] [<ffffffff8133373a>] __writeback_single_inode+0x6a/0x480
> [ 401.568330] [<ffffffff813342ad>] writeback_sb_inodes+0x26d/0x660
> [ 401.568334] [<ffffffff8133474c>] __writeback_inodes_wb+0xac/0x100
> [ 401.568338] [<ffffffff81334c6c>] wb_writeback+0x3fc/0x420
> [ 401.568341] [<ffffffff813355cb>] wb_workfn+0x32b/0x590
> [ 401.568347] [<ffffffff810e3bd6>] process_one_work+0x256/0x620
> [ 401.568351] [<ffffffff810e4c55>] worker_thread+0x85/0x750
> [ 401.568355] [<ffffffff810e4bd0>] ? rescuer_thread+0x520/0x520
> [ 401.568358] [<ffffffff810ece22>] kthread+0x122/0x140
> [ 401.568362] [<ffffffff81a0e582>] ret_from_fork+0x22/0x40
> [ 401.568365] [<ffffffff810ecd00>] ? kthread_park+0x80/0x80
> [ 401.568383] ---[ end trace 53be6edc49257ee0 ]---
> [ 401.568384] ------------[ cut here ]------------
More information about the Linux-nvme
mailing list