stack smashing detected with 'nvme sanitize-log /dev/nvme0'
Keith Busch
kbusch at kernel.org
Mon Aug 21 08:11:38 PDT 2023
On Mon, Aug 21, 2023 at 03:37:55PM +0200, Daniel Wagner wrote:
> On Wed, Jul 26, 2023 at 03:16:43PM +0200, Christoph Hellwig wrote:
> > On Wed, Jul 26, 2023 at 01:52:04PM +0200, Daniel Wagner wrote:
> > > FYI, I got a bug report [1] with a 'stack smashing detected' when running
> > > 'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against
> > > udisks. udisks recently added libnvme, which now does a sanitize-log call, so this
> > > problem might have existed for a while.
> > >
> > > We figured out that an older kernel such as 4.19.289 works but newer ones do
> > > not (it's a bit hard for the reporter to test all combinations on his setup
> > > due to compiler changes etc.).
> > >
> > > There was a bit of refactoring in v5.2 which could be the cause of the stack
> > > smash, because I saw this recent fix:
> > >
> > > b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data")
> > >
> > > [1] https://github.com/storaged-project/udisks/issues/1152
> >
> > If you think it is related to DMA, there are good ways to check for it:
> >
> > 1) force that an IOMMU is used for this device
> > 2) hack nvme or the blk-map code so that we never do the direct mapping
> >    to user space but always do the copy-based version, and then enable
> >    all kernel memory debugging helpers, most importantly KASAN
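For reference, the copy-based path can be forced with a one-line hack in
blk_rq_map_user_iov(); a rough, untested sketch (the exact declaration and
its surroundings vary between kernel versions):

--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ (inside blk_rq_map_user_iov(); line numbers vary by kernel version)
-	bool copy = false;
+	bool copy = true;	/* debug hack: always bounce through a kernel copy */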
>
> Collected some info:
>
> - this happens only with devices from MAXIO Technology
>
> vid : 0x1e4b
> ssvid : 0x1e4b
>
> - Oleg bisected the kernel and landed on 3b2a1ebceba3 ("nvme: set
>   dma alignment to qword"). He hacked the kernel and this made it
>   work again:
>
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1871,7 +1871,6 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
>  		blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX));
>  	}
>  	blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1);
> -	blk_queue_dma_alignment(q, 3);
>  	blk_queue_write_cache(q, vwc, vwc);
>  }
>
> - modified the reproducer so that it allocates a 4k buffer with a well-known
>   pattern and checks the buffer after fetching the sanitize log [1] (a sketch
>   follows below). The first invocation wrote 0x940 bytes and the second 0x5c0
>   bytes. Note we only asked for 0x200 bytes.
>
> - modified blk_rq_map_user_iov so that it always uses the copy path. The
>   problem went away. I forgot to ask him to turn on KASAN, but given that
>   we know the device writes too much data, it is likely also overwriting
>   some kernel memory.
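
A minimal sketch of such a reproducer (untested; the device path, the 0xa5
fill pattern, and going through the raw admin passthrough ioctl rather than
libnvme are illustrative choices):

/* Poison a 4k buffer, fetch the 512-byte sanitize log, and report how far
 * into the buffer the device actually wrote. */
#include <fcntl.h>
#include <linux/nvme_ioctl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
        struct nvme_admin_cmd cmd = { 0 };
        unsigned char *buf;
        size_t last = 0, i;
        int fd;

        if (posix_memalign((void **)&buf, 4096, 4096))
                return 1;
        memset(buf, 0xa5, 4096);                /* well-known pattern */

        fd = open("/dev/nvme0", O_RDONLY);
        if (fd < 0)
                return 1;

        cmd.opcode = 0x02;                      /* Get Log Page */
        cmd.nsid = 0xffffffff;                  /* controller-scope log */
        cmd.addr = (uintptr_t)buf;
        cmd.data_len = 0x200;                   /* ask for 512 bytes */
        /* cdw10: NUMDL in bits 31:16, LID 0x81 (Sanitize Status) in bits 7:0;
         * 0x200 bytes = 128 dwords -> NUMD (0-based) = 127 */
        cmd.cdw10 = (127 << 16) | 0x81;

        if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd))
                return 1;

        for (i = 0; i < 4096; i++)
                if (buf[i] != 0xa5)
                        last = i + 1;
        printf("device wrote at least 0x%zx bytes (asked for 0x200)\n", last);

        close(fd);
        free(buf);
        return 0;
}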
>
> So what's the best way forward from here? Introduce a quirk and always
> use a bounce buffer?
I don't think we want to bounce to kernel memory for the device to
overwrite it. I suggest just changing nvme-cli's stack-allocated sanitize
log to use a page-aligned and page-sized buffer, so the device's extra
writes land within the over-allocated buffer instead of smashing the stack.
---
diff --git a/nvme.c b/nvme.c
index 181141ad..7f98a506 100644
--- a/nvme.c
+++ b/nvme.c
@@ -2376,7 +2376,7 @@ ret:
 static int sanitize_log(int argc, char **argv, struct command *command, struct plugin *plugin)
 {
        const char *desc = "Retrieve sanitize log and show it.";
-       struct nvme_sanitize_log_page sanitize_log;
+       struct nvme_sanitize_log_page *sanitize_log;
        enum nvme_print_flags flags;
        struct nvme_dev *dev;
        int err;
@@ -2419,13 +2419,19 @@ static int sanitize_log(int argc, char **argv, struct command *command, struct p
        if (cfg.human_readable)
                flags |= VERBOSE;
 
-       err = nvme_cli_get_log_sanitize(dev, cfg.rae, &sanitize_log);
+       if (posix_memalign((void *)&sanitize_log, getpagesize(), 0x1000)) {
+               err = -1;
+               goto close_dev;
+       }
+
+       err = nvme_cli_get_log_sanitize(dev, cfg.rae, sanitize_log);
        if (!err)
-               nvme_show_sanitize_log(&sanitize_log, dev->name, flags);
+               nvme_show_sanitize_log(sanitize_log, dev->name, flags);
        else if (err > 0)
                nvme_show_status(err);
        else
                nvme_show_error("sanitize status log: %s", nvme_strerror(errno));
+       free(sanitize_log);
 close_dev:
        dev_close(dev);
 ret:
--