nvme-rdma: possible issue around error injection debugfs entries
Marcin Dziegielewski
marcin.dziegielewski at dell.com
Wed Jan 8 06:30:45 PST 2025
Hi Experts,
While testing one of the corner cases of our NVMe RDMA use case, we discovered
that after many failed connection attempts, the inode_cache and dentry slabs
can grow to a huge size and are not reclaimable. Consequently, if we wait
long enough, we reach an OOM (Out of Memory) condition.
Example from crash:
ffff88dd2abfca80 592 29007830 29007936 537184 32k inode_cache
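To put that line in perspective, here is a rough sketch in C that parses it and
estimates the memory involved. It assumes the columns are cache address, object
size, allocated objects, total objects, slab count, slab size and cache name,
which is how the numbers appear to line up; the struct and function names are
made up for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of slab statistics, in the column order assumed above. */
struct slab_row {
    unsigned long long objsize;   /* bytes per object */
    unsigned long long allocated; /* objects currently in use */
    unsigned long long total;     /* objects the cache has room for */
    unsigned long long slabs;     /* number of slabs */
    unsigned long long ssize_kb;  /* slab size in KiB ("32k" -> 32) */
    char name[64];                /* cache name */
};

/* Parse one row. Returns 0 on success, -1 if the line does not match. */
int parse_slab_row(const char *line, struct slab_row *row)
{
    unsigned long long cache_addr; /* first column, not needed here */
    int n;

    assert(line != NULL && row != NULL);
    n = sscanf(line, "%llx %llu %llu %llu %llu %lluk %63s",
               &cache_addr, &row->objsize, &row->allocated,
               &row->total, &row->slabs, &row->ssize_kb, row->name);
    return n == 7 ? 0 : -1;
}

/* Bytes held by allocated objects alone. */
unsigned long long bytes_in_objects(const struct slab_row *row)
{
    return row->allocated * row->objsize;
}

/* Bytes held by the slabs backing the cache. */
unsigned long long bytes_in_slabs(const struct slab_row *row)
{
    return row->slabs * row->ssize_kb * 1024ULL;
}
```

Fed the line above, both estimates come out at roughly 16 GiB for inode_cache
alone, under the column assumption stated here.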
During debugging, we noticed that for RDMA connections, the debugfs entries
set up in nvme_fault_inject_init are created before the RDMA connection has
succeeded. So, with many failed attempts, these entries are created and removed
repeatedly. We suspect this repeated create/remove cycle is what drives the
inode_cache and dentry growth.
So far, we have worked around this by creating these entries only after a
successful connection, which fixed the problem for us.
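The difference between the two orderings can be illustrated with a small
userspace model in C. This is only a sketch, not the driver code: the counters
stand in for debugfs activity, and every name in it (fake_entries_create,
fake_connect and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters standing in for debugfs activity; purely illustrative. */
struct churn {
    unsigned long created;
    unsigned long removed;
};

/* Hypothetical stand-ins for creating and removing the entries. */
static void fake_entries_create(struct churn *c)
{
    c->created++;
}

static void fake_entries_remove(struct churn *c)
{
    assert(c->created > c->removed); /* cannot remove what was never created */
    c->removed++;
}

/* Hypothetical connect step: fails for the first `failures` attempts. */
static bool fake_connect(unsigned long attempt, unsigned long failures)
{
    return attempt >= failures;
}

/* Ordering from the report: entries exist before the connection succeeds. */
struct churn run_create_before_connect(unsigned long failures)
{
    struct churn c = {0, 0};

    for (unsigned long attempt = 0; ; attempt++) {
        fake_entries_create(&c);
        if (fake_connect(attempt, failures))
            return c;            /* connected, entries stay in place */
        fake_entries_remove(&c); /* failed attempt tears them down */
    }
}

/* Workaround ordering: entries are created only once connected. */
struct churn run_create_after_connect(unsigned long failures)
{
    struct churn c = {0, 0};

    for (unsigned long attempt = 0; ; attempt++) {
        if (!fake_connect(attempt, failures))
            continue;            /* failed attempt touches no entries */
        fake_entries_create(&c);
        return c;
    }
}
```

With 1000 failed attempts, the first ordering performs 1001 creations and 1000
removals in this model, while the second performs a single creation and no
removals.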
We are wondering whether a patch with the same or a similar approach could be
applied upstream, or whether another approach (for example, raising the issue
with the debugfs maintainers) should be chosen here.
Thanks,
Marcin