[PATCH v5 4/5] sd: limit to use write life hints

Kanchan Joshi joshi.k at samsung.com
Tue Sep 17 09:03:44 PDT 2024


On 9/17/2024 11:50 AM, Christoph Hellwig wrote:
>>> But if we increase this to a variable number of hints that don't have
>>> any meaning (and even if that is just the rough order of the temperature
>>> hints assigned to them), that doesn't really work.  We'll need an API
>>> to check if these stream hints are supported and how many of them,
>>> otherwise the applications can't make any sensible use of them.
>> - Since writes are backward compatible, nothing bad happens if the
>> passed placement-hint value is not supported. Maybe desired outcome (in
>> terms of WAF reduction) may not come but that's not a kernel problem
>> anyway. It's rather about how well application is segregating and how
>> well device is doing its job.
> What do you mean with "writes are backward compatible" ?
> 

Writes are not going to fail even if you don't pass a placement-id or 
pass one that is not valid. An FDP-enabled SSD will not complain and 
completes writes fine even with FDP-unaware software.

I think that part is the same as how Linux write hints behave ATM. Writes 
don't always have to carry the lifetime hint. And when they do, the hint 
value never becomes the reason for a failure (e.g. life hints on NVMe 
vanish into thin air rather than causing any failure).

>> - Device is perfectly happy to work with numbers (0 to 256 in current
>> spec) to produce some value (i.e., WAF reduction). Any extra
>> semantics/abstraction on these numbers only adds to the work without
>> increasing that value. If any application needs that, it's free to
>> attach any meaning/semantics to these numbers.
> If the device (or file system, which really needs to be in control
> for actual files vs just block devices) does not support all 256
> we need to reduce them to less than that.  The kernel can help with
> that a bit if the streams have meanings (collapsing temperature levels
> that are close), but not at all if they don't have meanings. 

The current patch (nvme) does what you mentioned above.
Pasting the fragment that clamps potentially large placement-hints to the 
last valid placement-id.

+static inline void nvme_assign_placement_id(struct nvme_ns *ns,
+					struct request *req,
+					struct nvme_command *cmd)
+{
+	u8 h = umin(ns->head->nr_plids - 1,
+				WRITE_PLACEMENT_HINT(req->write_hint));
+
+	cmd->rw.control |= cpu_to_le16(NVME_RW_DTYPE_DPLCMT);
+	cmd->rw.dsmgmt |= cpu_to_le32(ns->head->plids[h] << 16);
+}

But this was just an implementation choice, not a fallback to avoid 
failures.


