nvmet: race condition while CQE are getting processed concurrently with the DISCONNECTED event
Sagi Grimberg
sagi at grimberg.me
Wed Mar 8 11:34:17 PST 2017
>> For that perhaps you can try patch [1].
>
> Yes, I think we need that. Did I mention that the percpu
> refcounter API is a complete trainwreck a couple of times? :)
Heh, you probably did. I wonder what the use-case is
for percpu_ref_kill without the guarantee that subsequent
percpu_ref_tryget_live will fail...
>> 2. ib_free_cq does not really protect against a case where
>> the work requeues itself because it runs flush_work(). In this
>> case when the work re-executes it polls a cq array that is
>> already freed and sees a bogus successful completion. Perhaps
>> ib_free_cq should run cancel_work_sync() instead? see [2].
>
> Yeah, we'll probably need that as well, independent of whether it
> solves the problem reported here.
I'll send proper patches.
Would be nice if Raju or Yi can see if this helps
at all..
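
For readers following point 2, the difference between flush_work() and
cancel_work_sync() on a self-requeuing work item can be illustrated with
a toy, single-threaded userspace model. This is only a sketch under
stated assumptions: it is not the kernel workqueue API and not the
actual patch, and every name in it (toy_work, toy_queue, toy_complete,
toy_flush, toy_cancel_sync) is hypothetical. It models a handler that is
in flight when teardown starts and that tries to requeue itself once.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a work item. Hypothetical; NOT struct work_struct. */
struct toy_work {
    bool pending;    /* queued, waiting to run */
    bool in_flight;  /* handler is currently executing */
    bool canceling;  /* a cancel is in progress; requeue attempts are refused */
    int  rearm_left; /* how many more times the handler wants to requeue */
    int  runs;       /* completed handler executions */
};

/* Model of queueing: refused while canceling or already pending. */
static bool toy_queue(struct toy_work *w)
{
    if (w->canceling || w->pending)
        return false;
    w->pending = true;
    return true;
}

/* Finish the in-flight handler; like a CQ poll routine, it may requeue itself. */
static void toy_complete(struct toy_work *w)
{
    w->in_flight = false;
    w->runs++;
    if (w->rearm_left > 0) {
        w->rearm_left--;
        toy_queue(w);
    }
}

/*
 * Model of flush_work(): wait for the execution that exists at call time.
 * A requeue performed by that execution is left pending.
 * Returns true if the work is still pending on return.
 */
static bool toy_flush(struct toy_work *w)
{
    if (!w->in_flight && w->pending) {
        /* Queued but not started: let it start, then wait for it. */
        w->pending = false;
        w->in_flight = true;
    }
    if (w->in_flight)
        toy_complete(w);
    return w->pending;
}

/*
 * Model of cancel_work_sync(): drop any queued instance, refuse requeues
 * while waiting for the in-flight handler, and return with the work idle.
 * Returns true if the work is still pending on return.
 */
static bool toy_cancel_sync(struct toy_work *w)
{
    w->canceling = true;
    w->pending = false;      /* steal the queued instance, if any */
    if (w->in_flight)
        toy_complete(w);     /* its requeue attempt is refused */
    w->canceling = false;
    return w->pending;
}
```

In this model, toy_flush() on an in-flight handler that requeues itself
returns with the work still pending, so it would run again after the
resources it touches have been freed; toy_cancel_sync() returns with the
work idle.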