[PATCH] nvme: Boot as soon as the boot controller has been probed
Keith Busch
kbusch at kernel.org
Thu Nov 12 09:20:24 EST 2020
On Wed, Nov 11, 2020 at 08:26:12PM -0800, Bart Van Assche wrote:
> On 11/11/20 11:19 AM, Keith Busch wrote:
> > On Sat, Nov 07, 2020 at 08:09:03PM -0800, Bart Van Assche wrote:
> >> The following two issues have been introduced by commit 1811977568e0
> >> ("nvme/pci: Use async_schedule for initial reset work"):
> >> - The boot process waits until all NVMe controllers have been probed
> >> instead of only waiting until the boot controller has been probed.
> >> This slows down the boot process.
> >> - Some of the controller probing work happens asynchronously without
> >> the device core being aware of this.
> >>
> >> Hence this patch that makes all probing work happen from nvme_probe()
> >> and that tells the device core to probe multiple NVMe controllers
> >> concurrently by setting PROBE_PREFER_ASYNCHRONOUS.
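
For anyone following along, the mechanism in question is the .probe_type
field of the pci_driver's embedded struct device_driver. A sketch of the
idea (not the exact hunk from the patch):

static struct pci_driver nvme_driver = {
        .name   = "nvme",
        /* .id_table/.probe/.remove/etc. unchanged */
        .driver = {
                /* allow the core to probe matching devices concurrently */
                .probe_type = PROBE_PREFER_ASYNCHRONOUS,
        },
};

The flag only expresses a preference; whether a given probe actually runs
asynchronously is decided in the driver core, see below.
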
> >
> > I am finding that this setting probes devices in parallel on boot up,
> > but serially for devices added after boot. That's making this a rather
> > unappealing patch.
>
> Hi Keith,
>
> Thanks for having verified this. However, it is unexpected to me that my
> patch results in serial probing of NVMe devices added after boot. Could
> you share more details about how that test was run? I'm wondering
> whether the serialization was perhaps an artifact of the test itself.
> My understanding of the driver core is that the decision whether or not
> to probe concurrently is made inside __driver_attach() and that
> PROBE_PREFER_ASYNCHRONOUS should trigger a call to the following code:
>
> async_schedule_dev(__driver_attach_async_helper, dev);
>
> I expect that calling the above code should result in concurrent probing.
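
Right, and that is indeed what happens when the *driver* registers and
walks the devices already on the bus. Simplified (drivers/base/dd.c,
details vary by kernel version), __driver_attach() does something like:

static int __driver_attach(struct device *dev, void *data)
{
        struct device_driver *drv = data;

        if (driver_match_device(drv, dev) <= 0)
                return 0;

        if (driver_allows_async_probing(drv)) {
                /*
                 * PROBE_PREFER_ASYNCHRONOUS lands here: the probe is
                 * handed off to the async machinery instead of running
                 * in the registration context.
                 */
                device_lock(dev);
                if (!dev->driver) {
                        get_device(dev);
                        dev->p->async_driver = drv;
                        async_schedule_dev(__driver_attach_async_helper, dev);
                }
                device_unlock(dev);
                return 0;
        }

        device_driver_attach(drv, dev);
        return 0;
}

The interesting case is the other direction: a device that shows up after
the driver has already registered.
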
The easiest check is something like this (as long as you're not actively
using your nvme namespaces):
# echo 1 | tee /sys/class/nvme/nvme*/device/remove
# echo 1 > /sys/bus/pci/rescan
The subsequent probes don't go through the async_schedule() path. They have
this stack instead:
[ 44.528906] Call Trace:
[ 44.530512] dump_stack+0x6d/0x88
[ 44.532373] nvme_probe+0x2f/0x59b [nvme]
[ 44.534466] local_pci_probe+0x3d/0x70
[ 44.536471] pci_device_probe+0x107/0x1b0
[ 44.538562] really_probe+0x1be/0x410
[ 44.540564] driver_probe_device+0xe1/0x150
[ 44.542733] ? driver_allows_async_probing+0x50/0x50
[ 44.545170] bus_for_each_drv+0x7b/0xc0
[ 44.547172] __device_attach+0xeb/0x170
[ 44.549163] pci_bus_add_device+0x4a/0x70
[ 44.551182] pci_bus_add_devices+0x2c/0x70
[ 44.553229] pci_rescan_bus+0x25/0x30
[ 44.555126] rescan_store+0x61/0x90
[ 44.556975] kernfs_fop_write+0xcb/0x1b0
[ 44.558940] vfs_write+0xbe/0x200
[ 44.560710] ksys_write+0x5f/0xe0
[ 44.562483] do_syscall_64+0x2d/0x40
[ 44.564354] entry_SYSCALL_64_after_hwframe+0x44/0xa9
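
If I'm reading drivers/pci/bus.c and drivers/base/dd.c correctly (kernels
of that era, so take this as a sketch rather than gospel), that's because
pci_bus_add_device() binds hot-added devices through plain device_attach(),
and only the device_initial_probe() variant of __device_attach() is allowed
to hand the probe off to the async machinery:

/* Simplified; locking, PM runtime and error handling omitted. */
static int __device_attach(struct device *dev, bool allow_async)
{
        struct device_attach_data data = {
                .dev            = dev,
                .check_async    = allow_async,
                .want_async     = false,
        };
        int ret;

        /*
         * With check_async == false this tries each matching driver's
         * probe synchronously; with check_async == true, drivers that
         * prefer async probing are skipped here and only noted in
         * data.have_async.
         */
        ret = bus_for_each_drv(dev->bus, NULL, &data, __device_attach_driver);
        if (!ret && allow_async && data.have_async) {
                get_device(dev);
                async_schedule_dev(__device_attach_async_helper, dev);
        }
        return ret;
}

int device_attach(struct device *dev)
{
        return __device_attach(dev, false);   /* pci_bus_add_device() path */
}

void device_initial_probe(struct device *dev)
{
        __device_attach(dev, true);           /* the variant that may go async */
}

At boot the nvme driver typically registers after the PCI buses have been
scanned, so binding happens from the driver side (__driver_attach() above)
and gets the async treatment; anything added or rescanned afterwards comes
in through device_attach() and is probed synchronously, which matches what
I'm seeing.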