CQE corruption issue

Fri Mar 4 23:21:36 PST 2016

Hello,
We're seeing corruption of CQEs on a system under very heavy load.
We can't reproduce it on demand, but statistically it shows up during
long runs. Our investigation led us to nvme_process_cq and this
fragment:

struct nvme_completion cqe = nvmeq->cqes[head];
u16 status = le16_to_cpu(cqe.status);
if ((status & 1) != phase)
         break;

The cqe we're seeing seems to be partially updated: the status field is
new (so the code continues), but some of the other fields hold values
from the previous loop (previous phase). We've confirmed this by poisoning
the cqe after use: the new one has partially new values and partially the
poison.

We then looked at the assembly code: our gcc generates a memcpy for the
struct assignment. Since the memcpy copies from lower to higher addresses,
the CQE update may arrive in the middle of the copy. In that case the
earlier loads return old values while the later loads see new values
(including the status field, which is at the highest address).
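
To make the suspected interleaving concrete, here is a minimal user-space
sketch, not driver code. Everything in it is an assumption made for
illustration: struct cqe is a simplified stand-in for struct
nvme_completion, device_write() plays the role of the controller, and the
copy is split into two 8-byte halves to mimic a memcpy made of several
loads. Byte-order conversion is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct nvme_completion: 16 bytes, status last. */
struct cqe {
	uint32_t result;
	uint32_t rsvd;
	uint16_t sq_head;
	uint16_t sq_id;
	uint16_t command_id;
	uint16_t status;	/* bit 0 is the phase bit */
};

/* Hypothetical "device": overwrites the whole entry with a new completion. */
void device_write(struct cqe *slot, uint16_t command_id, uint16_t phase)
{
	slot->result = 0;
	slot->rsvd = 0;
	slot->sq_head = 0;
	slot->sq_id = 1;
	slot->command_id = command_id;
	slot->status = phase;
}

/*
 * Copy the entry lowest address first, in two halves, and let the
 * device update land between them.
 */
struct cqe torn_copy(struct cqe *slot, uint16_t new_id, uint16_t new_phase)
{
	struct cqe local;
	unsigned char *dst = (unsigned char *)&local;
	const unsigned char *src = (const unsigned char *)slot;

	assert(sizeof(struct cqe) == 16);
	memcpy(dst, src, 8);			/* first half: old data */
	device_write(slot, new_id, new_phase);	/* update arrives mid-copy */
	memcpy(dst + 8, src + 8, 8);		/* second half: new data */
	return local;
}

/* Returns 1 if the copy passes the phase check yet still holds poison. */
int torn_read_demo(void)
{
	struct cqe slot;
	struct cqe local;

	memset(&slot, 0xAA, sizeof(slot));	/* poison; phase bit is 0 */
	local = torn_copy(&slot, 7, 1);		/* we expect phase 1 */
	return (local.status & 1) == 1 &&
	       local.command_id == 7 &&
	       local.result == 0xAAAAAAAAu;
}
```

In this sketch the local copy carries the new status and command_id, so
the phase check passes, while result still holds the poison pattern. That
matches the mix of new values and poison we observed.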

We then rewrote the sequence to:

struct nvme_completion cqe;
u16 status = le16_to_cpu(nvmeq->cqes[head].status);
if ((status & 1) != phase)
         break;
cqe = nvmeq->cqes[head];

This sequence works fine over long runs and doesn't show the corruption.
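
The reordered sequence can be sketched in user space as follows. This is
only an illustration under assumptions: struct cqe is a simplified
stand-in for struct nvme_completion, read_cqe_checked() is a hypothetical
helper name, byte-order conversion is left out, and the entry is assumed
to be completely written by the time the new phase bit is visible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct nvme_completion: 16 bytes, status last. */
struct cqe {
	uint32_t result;
	uint32_t rsvd;
	uint16_t sq_head;
	uint16_t sq_id;
	uint16_t command_id;
	uint16_t status;	/* bit 0 is the phase bit */
};

/*
 * Read only the status word first. Copy the whole entry into *out only
 * after the phase bit matched. Returns 1 if an entry was copied, 0 if
 * the slot does not hold an entry of the expected phase yet.
 */
int read_cqe_checked(const struct cqe *slot, uint16_t phase, struct cqe *out)
{
	uint16_t status;

	assert(slot != NULL && out != NULL);

	status = slot->status;		/* one 16-bit load */
	if ((status & 1) != phase)
		return 0;		/* not ours yet, *out is untouched */

	*out = *slot;			/* full copy after the check */
	return 1;
}
```

With this ordering, a multi-load copy only starts once the status word
already shows the expected phase, so the copy no longer races with the
arrival of the status itself.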

We're running the driver on a non-mainline architecture (k1), but we
think this problem may also happen on other architectures if they use
multiple loads for the memcpy or if the compiler generates multiple
non-atomic loads at increasing addresses.

Has anyone faced a similar problem before? Do you think that our analysis
is correct?

The complete patch follows.

Regards,
Marta Rybczynska