[PATCH v9 13/13] lpfc: vmid: Introducing vmid in io path.

Hannes Reinecke hare at suse.de
Mon Apr 12 06:27:14 BST 2021


On 4/10/21 5:00 PM, James Smart wrote:
> On 4/8/2021 1:46 AM, Hannes Reinecke wrote:
>> On 4/7/21 1:06 AM, Muneendra wrote:
>>> From: Gaurav Srivastava <gaurav.srivastava at broadcom.com>
> ...
>>> +        /* while the read lock was released, in case the entry was */
>>> +        /* added by other context or is in process of being added */
>>> +        if (vmp && vmp->flag & LPFC_VMID_REGISTERED) {
>>> +            lpfc_vmid_update_entry(vport, cmd, vmp, tag);
>>> +            write_unlock(&vport->vmid_lock);
>>> +            return 0;
>>> +        } else if (vmp && vmp->flag & LPFC_VMID_REQ_REGISTER) {
>>> +            write_unlock(&vport->vmid_lock);
>>> +            return -EBUSY;
>>> +        }
>>> +
>>> +        /* else search and allocate a free slot in the hash table */
>>> +        if (vport->cur_vmid_cnt < vport->max_vmid) {
>>> +            for (i = 0; i < vport->max_vmid; ++i) {
>>> +                vmp = vport->vmid + i;
>>> +                if (vmp->flag == LPFC_VMID_SLOT_FREE) {
>>> +                    vmp = vport->vmid + i;
> 
> delete this last line and adjust the braces - really odd that it 
> replicates the assignment 2 lines earlier.
> 
>>> +                    break;
>>> +                }
>>> +            }
> 
> I would prefer that if the table is exhausted and no slot is free, 
> -ENOMEM is returned right here, rather than falling through below and 
> qualifying by slot free, then by pending (set only if slot free).  I 
> can't believe there is a reason the idle timer has to be started if no 
> slots are free, as all the other failure cases don't bother with it.
> 
> This also helps the indentation levels below.
> 
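(Taking both of your points together, the slot search would then
collapse into something like this - untested sketch, with the
cur_vmid_cnt fast path left out for brevity:

		for (i = 0; i < vport->max_vmid; ++i) {
			vmp = vport->vmid + i;
			if (vmp->flag == LPFC_VMID_SLOT_FREE)
				break;
		}
		if (i == vport->max_vmid) {
			/* table full, no free slot; no point in
			 * starting the idle timer either */
			write_unlock(&vport->vmid_lock);
			return -ENOMEM;
		}
		/* vmp now points to a free slot, register it */

which also drops one indentation level for the registration block below.)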
>>> +        } else {
>>> +            write_unlock(&vport->vmid_lock);
>>> +            return -ENOMEM;
>>> +        }
>>> +
>>> +        if (vmp && (vmp->flag == LPFC_VMID_SLOT_FREE)) {
>>> +            /* Add the vmid and register  */
>>> +            lpfc_put_vmid_in_hashtable(vport, hash, vmp);
>>> +            vmp->vmid_len = len;
>>> +            memcpy(vmp->host_vmid, uuid, vmp->vmid_len);
>>> +            vmp->io_rd_cnt = 0;
>>> +            vmp->io_wr_cnt = 0;
>>> +            vmp->flag = LPFC_VMID_SLOT_USED;
>>> +
>>> +            vmp->delete_inactive =
>>> +                vport->vmid_inactivity_timeout ? 1 : 0;
>>> +
>>> +            /* if type priority tag, get next available vmid */
>>> +            if (lpfc_vmid_is_type_priority_tag(vport))
>>> +                lpfc_vmid_assign_cs_ctl(vport, vmp);
>>> +
>>> +            /* allocate the per cpu variable for holding */
>>> +            /* the last access time stamp only if vmid is enabled */
>>> +            if (!vmp->last_io_time)
>>> +                vmp->last_io_time =
>>> +                    __alloc_percpu(sizeof(u64),
>>> +                           __alignof__(struct
>>> +                                   lpfc_vmid));
>>> +
>>> +            /* registration pending */
>>> +            pending = 1;
>>> +        } else {
>>> +            rc = -ENOMEM;
>>> +        }
>>> +        write_unlock(&vport->vmid_lock);
>>> +
>>> +        /* complete transaction with switch */
>>> +        if (pending) {
>>> +            if (lpfc_vmid_is_type_priority_tag(vport))
>>> +                rc = lpfc_vmid_uvem(vport, vmp, true);
>>> +            else
>>> +                rc = lpfc_vmid_cmd(vport,
>>> +                           SLI_CTAS_RAPP_IDENT,
>>> +                           vmp);
>>> +            if (!rc) {
>>> +                write_lock(&vport->vmid_lock);
>>> +                vport->cur_vmid_cnt++;
>>> +                vmp->flag |= LPFC_VMID_REQ_REGISTER;
>>> +                write_unlock(&vport->vmid_lock);
>>> +            }
>>> +        }
>>> +
>>> +        /* finally, enable the idle timer once */
>>> +        if (!(vport->phba->pport->vmid_flag & LPFC_VMID_TIMER_ENBLD)) {
>>> +            mod_timer(&vport->phba->inactive_vmid_poll,
>>> +                  jiffies +
>>> +                  msecs_to_jiffies(1000 * LPFC_VMID_TIMER));
>>> +            vport->phba->pport->vmid_flag |= LPFC_VMID_TIMER_ENBLD;
>>> +        }
>>> +    }
>>> +    return rc;
>>> +}
>>> +
>>> +/*
>>> + * lpfc_is_command_vm_io - get the uuid from blk cgroup
>>> + * @cmd:Pointer to scsi_cmnd data structure
>>> + * Returns uuid if present if not null
>>> + */
>>> +static char *lpfc_is_command_vm_io(struct scsi_cmnd *cmd)
>>> +{
>>> +    char *uuid = NULL;
>>> +
>>> +    if (cmd->request) {
>>> +        if (cmd->request->bio)
>>> +            uuid = blkcg_get_fc_appid(cmd->request->bio);
>>> +    }
>>> +    return uuid;
>>> +}
>>> +
>>>   /**
>>>    * lpfc_queuecommand - scsi_host_template queuecommand entry point
>>>    * @shost: kernel scsi host pointer.
>>> @@ -5288,6 +5437,7 @@ lpfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd)
>>>       int err, idx;
>>>   #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
>>>       uint64_t start = 0L;
>>> +    u8 *uuid = NULL;
>>>       if (phba->ktime_on)
>>>           start = ktime_get_ns();
>>> @@ -5415,6 +5565,25 @@ lpfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd)
>>>       }
>>> +    /* check the necessary and sufficient condition to support VMID */
>>> +    if (lpfc_is_vmid_enabled(phba) &&
>>> +        (ndlp->vmid_support ||
>>> +         phba->pport->vmid_priority_tagging ==
>>> +         LPFC_VMID_PRIO_TAG_ALL_TARGETS)) {
>>> +        /* is the IO generated by a VM, get the associated virtual */
>>> +        /* entity id */
>>> +        uuid = lpfc_is_command_vm_io(cmnd);
>>> +
>>> +        if (uuid) {
>>> +            err = lpfc_vmid_get_appid(vport, uuid, cmnd,
>>> +                (union lpfc_vmid_io_tag *)
>>> +                    &lpfc_cmd->cur_iocbq.vmid_tag);
>>> +            if (!err)
>>> +                lpfc_cmd->cur_iocbq.iocb_flag |= LPFC_IO_VMID;
>>> +        }
>>> +    }
>>> +
>>> +    atomic_inc(&ndlp->cmd_pending);
>>>   #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
>>>       if (unlikely(phba->hdwqstat_on & LPFC_CHECK_SCSI_IO))
>>>           this_cpu_inc(phba->sli4_hba.c_stat->xmt_io);
>>>
>> And that's the bit which I don't particularly like.
>>
>> Essentially we'll have to inject additional ELS commands _on each I/O_ 
>> to get a valid VMID.
>> And there are _so_ many things which might go wrong there, causing an 
>> I/O stall.
> 
> I don't follow you - yes, ELS's are injected when there isn't an entry 
> for the VM, but once there is, there are no further ELS's. That is the 
> cost: as we don't know beforehand which uuids I/O will be sent for, it 
> has to be siphoned off during the I/O flow.  I/O's can be sent 
> non-tagged while the ELS's are completing (and there aren't multiple 
> sets of ELS's as long as it's the same uuid), which is fine.
> 
> So I disagree with "_on each I/O_".
> 
Yeah, that was an exaggeration.
Not on each I/O; I should've said 'on each unregistered I/O'.
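To spell out what 'unregistered' means here: condensed, the
lpfc_queuecommand hunk above boils down to a lookup per I/O, and the
command simply goes out untagged while the registration ELS is still
in flight (paraphrased, not the literal patch code):

	uuid = lpfc_is_command_vm_io(cmnd);
	if (uuid) {
		err = lpfc_vmid_get_appid(vport, uuid, cmnd,
				(union lpfc_vmid_io_tag *)
					&lpfc_cmd->cur_iocbq.vmid_tag);
		if (!err)
			lpfc_cmd->cur_iocbq.iocb_flag |= LPFC_IO_VMID;
		/* -EBUSY (registration pending) or any other error:
		 * send the command untagged */
	}

So agreed, the ELS overhead is per uuid, not per I/O.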

Still, my real unhappiness is with the VMID specification itself:
VMIDs can time out after a period of inactivity, so you can never be
sure whether the VMID for any particular I/O really _is_ still registered.

>> I would have vastly preferred if we could _avoid_ having to do 
>> additional ELS commands for VMID registration in the I/O path
>> (ie only allow for I/O with a valid VMID), and reject the I/O 
>> otherwise until VMID registration is complete.
>>
>> IE return 'BUSY' (or even a command retry?) when no valid VMID for 
>> this particular I/O is found, register the VMID (preferably in another 
>> thread), and restart the queue once the VMID is registered.
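(What I had in mind was roughly this instead of falling through
untagged - sketch only, and it assumes the registration itself is
kicked off from a worker rather than from the I/O context:

		if (err == -EBUSY)
			/* registration still in flight: push the
			 * command back, the midlayer will retry it */
			return SCSI_MLQUEUE_HOST_BUSY;

so the I/O would only ever go out with a valid tag.)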
> 
> Why does it bother you that this is in the I/O path?  The registration 
> is actually happening in parallel, with no real relation between the two.
> 
> I seriously disagree with rejecting the I/O if there is no vmid tag. 
> Why? What do you gain? This actually introduces more disruption than 
> the parallel flow with the ELSs.  Also, after a link bounce, where all 
> VMIDs have to be redone, it adds a stall window right after link up, 
> just when things are trying to resume after the rports rejoin. Why add 
> the I/O rebouncing? There's no real benefit. Issuing a few untagged 
> I/Os doesn't hurt.
> 
That indeed is a valid point. I retract my objection.

Reviewed-by: Hannes Reinecke <hare at suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


