[LSF/MM/BPF TOPIC] nvme state machine refactoring

Daniel Wagner dwagner at suse.de
Thu Apr 27 00:27:45 PDT 2023


Hi,

I'd like to use the opportunity to align and discuss the nvme state machine
refactoring work in person. I don't think we need a lot of time for this topic,
so if we could just have the topic during a BOF it would be great.

Sagi proposed the following high-level API:

  ops.setup_transport(ctrl)
  ops.alloc_admin_queue(ctrl)
  ops.start_admin_queue(ctrl)
  ops.stop_admin_queue(ctrl)
  ops.free_admin_queue(ctrl)
  ops.alloc_io_queues(ctrl)
  ops.start_io_queues(ctrl)
  ops.stop_io_queues(ctrl)
  ops.free_io_queues(ctrl)
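
To make the discussion a bit more concrete, here is a rough user-space sketch
of how a shared connect path could drive these callbacks. Only the callback
names come from the proposal. The return types, the call order, the unwind
order, struct ctrl and the whole fake_* transport are assumptions made for
illustration, not what the actual patches do.

```c
#include <string.h>

struct ctrl;

/* Return types are an assumption; the proposal only lists the names. */
struct ctrl_ops {
    int  (*setup_transport)(struct ctrl *ctrl);
    int  (*alloc_admin_queue)(struct ctrl *ctrl);
    int  (*start_admin_queue)(struct ctrl *ctrl);
    void (*stop_admin_queue)(struct ctrl *ctrl);
    void (*free_admin_queue)(struct ctrl *ctrl);
    int  (*alloc_io_queues)(struct ctrl *ctrl);
    int  (*start_io_queues)(struct ctrl *ctrl);
    void (*stop_io_queues)(struct ctrl *ctrl);
    void (*free_io_queues)(struct ctrl *ctrl);
};

/* Hypothetical stand-in for struct nvme_ctrl. */
struct ctrl {
    const struct ctrl_ops *ops;
    char trace[256];    /* callback names in call order (demo only) */
    int step;           /* number of setup callbacks run so far */
    int fail_step;      /* 1-based setup step to fail, 0 = none */
};

/* Shared connect path: set up in order, unwind in reverse on error. */
int ctrl_connect(struct ctrl *ctrl)
{
    const struct ctrl_ops *ops = ctrl->ops;
    int ret;

    ret = ops->setup_transport(ctrl);
    if (ret)
        return ret;
    ret = ops->alloc_admin_queue(ctrl);
    if (ret)
        return ret;
    ret = ops->start_admin_queue(ctrl);
    if (ret)
        goto free_admin;
    ret = ops->alloc_io_queues(ctrl);
    if (ret)
        goto stop_admin;
    ret = ops->start_io_queues(ctrl);
    if (ret)
        goto free_io;
    return 0;

free_io:
    ops->free_io_queues(ctrl);
stop_admin:
    ops->stop_admin_queue(ctrl);
free_admin:
    ops->free_admin_queue(ctrl);
    return ret;
}

/* Fake transport: every callback only records its name in the trace. */
static void fake_note(struct ctrl *c, const char *name)
{
    strcat(c->trace, name);     /* trace[] is large enough for all names */
    strcat(c->trace, " ");
}

static int fake_step(struct ctrl *c, const char *name)
{
    fake_note(c, name);
    return ++c->step == c->fail_step ? -1 : 0;
}

#define FAKE_SETUP(fn) \
    static int fake_##fn(struct ctrl *c) { return fake_step(c, #fn); }
#define FAKE_TEARDOWN(fn) \
    static void fake_##fn(struct ctrl *c) { fake_note(c, #fn); }

FAKE_SETUP(setup_transport)
FAKE_SETUP(alloc_admin_queue)
FAKE_SETUP(start_admin_queue)
FAKE_TEARDOWN(stop_admin_queue)
FAKE_TEARDOWN(free_admin_queue)
FAKE_SETUP(alloc_io_queues)
FAKE_SETUP(start_io_queues)
FAKE_TEARDOWN(stop_io_queues)
FAKE_TEARDOWN(free_io_queues)

static const struct ctrl_ops fake_ops = {
    .setup_transport   = fake_setup_transport,
    .alloc_admin_queue = fake_alloc_admin_queue,
    .start_admin_queue = fake_start_admin_queue,
    .stop_admin_queue  = fake_stop_admin_queue,
    .free_admin_queue  = fake_free_admin_queue,
    .alloc_io_queues   = fake_alloc_io_queues,
    .start_io_queues   = fake_start_io_queues,
    .stop_io_queues    = fake_stop_io_queues,
    .free_io_queues    = fake_free_io_queues,
};

/* Connect a fresh fake controller and return the recorded call order. */
const char *fake_connect_trace(int fail_step)
{
    static struct ctrl c;

    memset(&c, 0, sizeof(c));
    c.ops = &fake_ops;
    c.fail_step = fail_step;
    ctrl_connect(&c);
    return c.trace;
}
```

Calling fake_connect_trace(0) walks the five setup callbacks in order, while a
non-zero fail_step shows which teardown callbacks the shared error path would
have to call for a failure at that step.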

Getting the queue functions done is fairly straightforward and I didn't run
into any problems in my experiments.

The trickier part is that the transports differ slightly in how many queues
they allocate for IO and where they place them. To keep the behavior exactly as
it is right now, I had to add a couple of additional callbacks alongside
setup_transport():

 - nr_io_queues(): limit the number of queues
 - set_io_queues(): map the queues to CPUs

The first one was mainly necessary for rdma, but IIRC Keith has done some work
there which could make the callback unnecessary. My question is: should we try
to unify this part as well?

Also I haven't really checked what pci does here.

The second callback should probably be replaced with something which is also
executed during runtime, e.g. for CPU hotplug events. I don't think it is
strictly necessary. At least it looks a bit suspicious that we only do the queue
cpu mapping when (re)connecting. But maybe I am just missing something.
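
For reference, a rough user-space sketch of how the core could use the two
callbacks. Only the names nr_io_queues() and set_io_queues() come from my
experiments as described above. The signatures, struct ctrl, the dev_limit
field, the round-robin placement and all demo_* helpers are assumptions made
for illustration and do not reflect what any transport really does.

```c
#include <stddef.h>

#define MAX_CPUS 64

struct ctrl;

/* Assumed signatures; only the callback names are taken from the text. */
struct queue_ops {
    /* Transport may lower the number of IO queues the core asks for. */
    unsigned int (*nr_io_queues)(struct ctrl *ctrl, unsigned int wanted);
    /* Transport decides which queue serves which CPU. */
    void (*set_io_queues)(struct ctrl *ctrl, unsigned int nr_cpus);
};

/* Hypothetical stand-in for struct nvme_ctrl. */
struct ctrl {
    const struct queue_ops *ops;
    unsigned int dev_limit;             /* made-up per-device queue limit */
    unsigned int nr_io_queues;
    unsigned int cpu_to_queue[MAX_CPUS];
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Core side: clamp the request, let the transport limit it, then map. */
unsigned int core_configure_io_queues(struct ctrl *ctrl,
                                      unsigned int requested,
                                      unsigned int nr_cpus)
{
    unsigned int nr = min_uint(requested, nr_cpus);

    if (ctrl->ops->nr_io_queues)
        nr = ctrl->ops->nr_io_queues(ctrl, nr);
    ctrl->nr_io_queues = nr;
    if (nr && ctrl->ops->set_io_queues)
        ctrl->ops->set_io_queues(ctrl, nr_cpus);
    return nr;
}

/* Demo transport: cap the queue count at a device limit. */
static unsigned int demo_nr_io_queues(struct ctrl *ctrl, unsigned int wanted)
{
    return min_uint(wanted, ctrl->dev_limit);
}

/* Demo transport: spread the CPUs round-robin over the queues. */
static void demo_set_io_queues(struct ctrl *ctrl, unsigned int nr_cpus)
{
    unsigned int cpu;

    for (cpu = 0; cpu < nr_cpus && cpu < MAX_CPUS; cpu++)
        ctrl->cpu_to_queue[cpu] = cpu % ctrl->nr_io_queues;
}

static const struct queue_ops demo_ops = {
    .nr_io_queues  = demo_nr_io_queues,
    .set_io_queues = demo_set_io_queues,
};

static struct ctrl demo_ctrl = { .ops = &demo_ops };

/* Configure the demo controller and return the resulting queue count. */
unsigned int demo_configure(unsigned int requested, unsigned int nr_cpus,
                            unsigned int dev_limit)
{
    demo_ctrl.dev_limit = dev_limit;
    return core_configure_io_queues(&demo_ctrl, requested, nr_cpus);
}

/* Look up which queue the last demo_configure() assigned to a CPU. */
unsigned int demo_queue_of_cpu(unsigned int cpu)
{
    return demo_ctrl.cpu_to_queue[cpu];
}
```

With 16 requested queues, 8 CPUs and a device limit of 4, this ends up with 4
queues and CPU 5 on queue 1. In this model, a hotplug handler would simply call
set_io_queues() again with the new CPU count, which is the part that today
only happens on (re)connect.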

There is also the question of how to handle the flags set by the core and the
ones set by the transports. Some, like NVME_TCP_Q_LIVE, are generic in nature
and can be translated into generic flags, so that part is fairly simple. But
there is one transport-specific flag in rdma: NVME_RDMA_Q_TR_READY. What to do
here?
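
One possible direction, purely as a strawman: keep the generic bits in a
core-owned enum and reserve the upper bits of the same flags word for the
transport. Everything in this sketch (the enum names, the bit split at 8,
struct queue and the helpers) is made up for illustration.
DEMO_RDMA_Q_TR_READY only stands in for NVME_RDMA_Q_TR_READY.

```c
#include <stdbool.h>

/* Bits owned by the core; names and values are hypothetical. */
enum generic_q_flag {
    Q_ALLOCATED   = 0,
    Q_LIVE        = 1,
    Q_GENERIC_MAX = 8,  /* first bit available to transports */
};

/* Transport-private bits start where the generic ones end. */
enum demo_rdma_q_flag {
    DEMO_RDMA_Q_TR_READY = Q_GENERIC_MAX,
};

/* Hypothetical queue with a single flags word shared by both sides. */
struct queue {
    unsigned long flags;
};

static inline void q_set(struct queue *q, unsigned int bit)
{
    q->flags |= 1UL << bit;
}

static inline void q_clear(struct queue *q, unsigned int bit)
{
    q->flags &= ~(1UL << bit);
}

static inline bool q_test(const struct queue *q, unsigned int bit)
{
    return (q->flags & (1UL << bit)) != 0;
}

/* The core only ever looks at the generic bits. */
bool queue_is_live(const struct queue *q)
{
    return q_test(q, Q_LIVE);
}

/* Demo: the core marks the queue live, the transport sets its own bit. */
unsigned long demo_flags(void)
{
    struct queue q = { 0 };

    q_set(&q, Q_ALLOCATED);
    q_set(&q, Q_LIVE);
    q_set(&q, DEMO_RDMA_Q_TR_READY);
    q_clear(&q, Q_ALLOCATED);
    return q.flags;
}
```

The core never needs to know what the private bits mean, and each transport can
define its own without colliding with the generic ones.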

In short, I don't think there are real blockers. The main question for me is, do
we want to unify all transports to the point that they behave exactly the same?

Required Attendees:
  - Chaitanya Kulkarni
  - Christoph Hellwig
  - Hannes Reinecke
  - James Smart
  - Keith Busch
  - Sagi Grimberg

Anyway, I think it is necessary to have tests in blktests up front. Hence my
current effort to enable the fc transport in blktests.

Thanks,
Daniel

https://lore.kernel.org/linux-nvme/20230301082737.10021-1-dwagner@suse.de/
https://lore.kernel.org/linux-nvme/20230306093244.20775-1-dwagner@suse.de/


