[PATCH net-next v3 10/18] nvme/host: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage

Aurelien Aptel aaptel at nvidia.com
Thu Jun 29 07:45:15 PDT 2023


Hi David,

David Howells <dhowells at redhat.com> writes:
> When transmitting data, call down into TCP using a single sendmsg with
> MSG_SPLICE_PAGES to indicate that content should be spliced rather than
> performing several sendmsg and sendpage calls to transmit header, data
> pages and trailer.

This series makes my kernel crash.

From the current net-next main branch:

commit 9ae440b8fdd6772b6c007fa3d3766530a09c9045 (HEAD)
Merge: b545a13ca9b2 b848b26c6672
Author: Jakub Kicinski <kuba at kernel.org>
Date:   Sat Jun 24 15:50:21 2023 -0700

    Merge branch 'splice-net-switch-over-users-of-sendpage-and-remove-it'


Steps to reproduce:

* connect a remote nvme null block device (nvmet) with 1 IO queue to keep
  things simple
* open /dev/nvme0n1 with O_RDWR|O_DIRECT|O_SYNC
* write() an 8k or 4k buffer
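
The steps above can be sketched as a small C program. This is a minimal
sketch, not the exact reproducer used for the trace below: the helper name
direct_write(), the 0xab fill pattern, the 4096-byte buffer alignment and the
REPRO_MAIN build switch are all assumptions made for illustration. It writes
to the start of the device, so it should only be pointed at a disposable
target such as the null block device described above.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes at offset 0 of `path` using
 * direct, synchronous I/O. Returns the number of bytes written, or -1
 * with errno set.
 */
static ssize_t direct_write(const char *path, size_t len)
{
    void *buf = NULL;
    ssize_t ret;
    int fd, err;

    /* O_DIRECT needs an aligned buffer; 4096 is an assumed safe value. */
    err = posix_memalign(&buf, 4096, len);
    if (err) {
        errno = err;
        return -1;
    }
    memset(buf, 0xab, len);     /* arbitrary fill pattern */

    fd = open(path, O_RDWR | O_DIRECT | O_SYNC);
    if (fd < 0) {
        err = errno;
        free(buf);
        errno = err;
        return -1;
    }

    ret = write(fd, buf, len);
    err = errno;                /* preserve errno across cleanup */
    close(fd);
    free(buf);
    errno = err;
    return ret;
}

#ifdef REPRO_MAIN               /* build with -DREPRO_MAIN for a binary */
int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/dev/nvme0n1";
    ssize_t n = direct_write(path, 8192);   /* 4096 is the other size */

    if (n < 0) {
        perror("direct_write");
        return 1;
    }
    printf("wrote %zd bytes to %s\n", n, path);
    return 0;
}
#endif
```

Built with something like "cc -DREPRO_MAIN repro.c -o repro" and run as root
against the connected namespace, this performs the single write() described
in the last step.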

Trace:

[  311.766163] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  311.768136] #PF: supervisor read access in kernel mode
[  311.769327] #PF: error_code(0x0000) - not-present page
[  311.770393] PGD 148988067 P4D 0
[  311.771074] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  311.771978] CPU: 0 PID: 180 Comm: kworker/0:1H Not tainted 6.4.0-rc7+ #27
[  311.773380] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[  311.774808] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
[  311.775547] RIP: 0010:skb_splice_from_iter+0xf1/0x370
[  311.776176] Code: 8b 45 88 4d 89 fa 4d 89 e7 45 89 ec 44 89 e3 41 83
               c4 01 83 fb 07 0f 87 56 02 00 00 48 8b 5c dd 90 41 bd 00 10 00 00 49 29
               c5 <48> 8b 53 08 4d 39 f5 4d 0f 47 ee f6 c2 01 0f 85 c7 01 00 00 66 90
[  311.778472] RSP: 0018:ff633e24c0747b08 EFLAGS: 00010206
[  311.779115] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[  311.780007] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff633e24c0747d30
[  311.780861] RBP: ff633e24c0747bb0 R08: ff633e24c0747d40 R09: 000000006db29140
[  311.781748] R10: ff3001bd00a22800 R11: 0000000008000000 R12: 0000000000000001
[  311.782631] R13: 0000000000001000 R14: 0000000000001000 R15: 0000000000000000
[  311.783506] FS:  0000000000000000(0000) GS:ff3001be77800000(0000) knlGS:0000000000000000
[  311.784494] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  311.785197] CR2: 0000000000000008 CR3: 0000000107f5c001 CR4: 0000000000771ef0
[  311.786076] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  311.786948] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  311.787822] PKRU: 55555554
[  311.788165] Call Trace:
[  311.788480]  <TASK>
[  311.788756]  ? show_regs+0x6e/0x80
[  311.789189]  ? __die+0x29/0x70
[  311.789577]  ? page_fault_oops+0x154/0x4a0
[  311.790097]  ? ip_output+0x7c/0x110
[  311.790541]  ? __sys_socketpair+0x1b4/0x280
[  311.791065]  ? __pfx_ip_finish_output+0x10/0x10
[  311.791640]  ? do_user_addr_fault+0x360/0x770
[  311.792184]  ? exc_page_fault+0x7d/0x190
[  311.792677]  ? asm_exc_page_fault+0x2b/0x30
[  311.793198]  ? skb_splice_from_iter+0xf1/0x370
[  311.793748]  ? skb_splice_from_iter+0xb7/0x370
[  311.794312]  ? __sk_mem_schedule+0x34/0x50
[  311.794824]  tcp_sendmsg_locked+0x3a6/0xdd0
[  311.795344]  ? tcp_push+0x10c/0x120
[  311.795789]  tcp_sendmsg+0x31/0x50
[  311.796213]  inet_sendmsg+0x47/0x80
[  311.796655]  sock_sendmsg+0x99/0xb0
[  311.797095]  ? inet_sendmsg+0x47/0x80
[  311.797557]  nvme_tcp_try_send_data+0x149/0x490 [nvme_tcp]
[  311.798242]  ? kvm_clock_get_cycles+0xd/0x20
[  311.799181]  nvme_tcp_try_send+0x1b7/0x300 [nvme_tcp]
[  311.800133]  nvme_tcp_io_work+0x40/0xc0 [nvme_tcp]
[  311.801044]  process_one_work+0x21c/0x430
[  311.801847]  worker_thread+0x54/0x3e0
[  311.802611]  ? __pfx_worker_thread+0x10/0x10
[  311.803433]  kthread+0xf8/0x130
[  311.804116]  ? __pfx_kthread+0x10/0x10
[  311.804865]  ret_from_fork+0x29/0x50
[  311.805596]  </TASK>
[  311.806165] Modules linked in: mlx5_ib ib_uverbs ib_core nvme_tcp
 mlx5_core mlxfw psample pci_hyperv_intf rpcsec_gss_krb5 nfsv3
 auth_rpcgss nfs_acl nfsv4 nfs lockd grace fscache netfs nvme_fabrics
 nvme_core nvme_common intel_rapl_msr intel_rapl_common
 intel_uncore_frequency_common nfit kvm_intel kvm rapl input_leds
 serio_raw sunrpc binfmt_misc qemu_fw_cfg sch_fq_codel dm_multipath
 scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr ramoops reed_solomon
 efi_pstore virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic
 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
 async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
 hid_generic usbhid hid qxl drm_ttm_helper ttm crct10dif_pclmul
 crc32_pclmul ghash_clmulni_intel drm_kms_helper sha512_ssse3
 syscopyarea sysfillrect sysimgblt aesni_intel crypto_simd i2c_i801 ahci
 cryptd psmouse drm virtio_net i2c_smbus libahci lpc_ich net_failover
 xhci_pci virtio_blk failover xhci_pci_renesas [last unloaded: ib_core]
[  311.818698] CR2: 0000000000000008
[  311.819437] ---[ end trace 0000000000000000 ]---

Cheers,


