[PATCH v16 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O

Manivannan Sadhasivam mani at kernel.org
Thu May 7 02:55:10 PDT 2026


On Mon, Apr 27, 2026 at 11:15:33AM +0200, Bartosz Golaszewski wrote:
> This missed the v7.1 cycle so let's try to get it in for v7.2.
> 
> Merging strategy: there are build-time dependencies between the crypto
> and DMA patches so the best approach is for Vinod to create an immutable
> branch with the DMA part pulled in by the crypto tree.
> 
> This iteration continues to build on top of v12 but uses the BAM's NWD
> bit on data descriptors as suggested by Stephan. To that end, there are
> some more changes like reversing the order of command and data
> descriptors queued by the QCE driver.
> 
> Currently the QCE crypto driver accesses the crypto engine registers
> directly via the CPU. TrustZone may perform crypto operations simultaneously,
> resulting in a race condition. To remedy that, let's introduce support
> for BAM locking/unlocking to the driver. The BAM driver will now wrap
> any existing issued descriptor chains with additional descriptors
> performing the locking when the client starts the transaction
> (dmaengine_issue_pending()). The client wanting to profit from locking
> needs to switch to performing register I/O over DMA and communicate the
> address to which to perform the dummy writes via a call to
> dmaengine_desc_attach_metadata().
> 
> In the specific case of the BAM DMA this translates to sending command
> descriptors performing dummy writes with the relevant flags set. The BAM
> will then lock all other pipes not related to the current pipe group, and
> keep handling the current pipe only until it sees the unlock bit.
> 
> In order for the locking to work correctly, we also need to switch to
> using DMA for all register I/O.
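To make the wrapping described above concrete, here is a minimal userspace sketch of the idea: a client's descriptor chain gets a LOCK command descriptor in front and an UNLOCK command descriptor behind, both doing a dummy write to an address the client supplied. This is only a model under stated assumptions, not the code from the series: the struct, the flag bits and model_wrap_with_lock() are hypothetical names invented for illustration and do not reflect the BAM hardware encoding or the bam_dma driver's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for this model only; NOT the BAM hardware encoding. */
#define MODEL_FLAG_CMD    (1u << 0) /* command descriptor (register write) */
#define MODEL_FLAG_LOCK   (1u << 1) /* lock the other pipe groups */
#define MODEL_FLAG_UNLOCK (1u << 2) /* release the lock */

/* Hypothetical, simplified descriptor: a target address plus flags. */
struct model_desc {
	uint32_t addr;
	uint32_t flags;
};

/*
 * Copy the client's chain into 'out', bracketed by a LOCK and an UNLOCK
 * command descriptor that both perform a dummy write to 'scratch_addr'.
 * 'out' must have room for n + 2 entries. Returns the number of
 * descriptors written to 'out'.
 */
size_t model_wrap_with_lock(const struct model_desc *chain, size_t n,
			    uint32_t scratch_addr,
			    struct model_desc *out, size_t out_cap)
{
	size_t i, k = 0;

	assert(out_cap >= n + 2);

	out[k].addr = scratch_addr;
	out[k].flags = MODEL_FLAG_CMD | MODEL_FLAG_LOCK;
	k++;

	for (i = 0; i < n; i++)
		out[k++] = chain[i];

	out[k].addr = scratch_addr;
	out[k].flags = MODEL_FLAG_CMD | MODEL_FLAG_UNLOCK;
	k++;

	return k;
}
```

In this model the client never sees the two extra descriptors; it only provides the scratch address, which mirrors the role the cover letter assigns to dmaengine_desc_attach_metadata().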
> 
> On top of this, the series contains some additional tweaks and
> refactoring.
> 
> The goal of this is not to improve the performance but to prepare the
> driver for supporting decryption into secure buffers in the future.
> 
> Tested with tcrypt.ko, kcapi and cryptsetup.
> 
> Shout out to Daniel and Udit from Qualcomm for helping me out with some
> DMA issues we encountered.
> 
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski at linaro.org>
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski at oss.qualcomm.com>

For the whole series,

Reviewed-by: Manivannan Sadhasivam <mani at kernel.org>

Thanks for incorporating all the comments, Bart!

- Mani

> ---
> Changes in v16:
> - Fix a reported race between dma_map_sg() called with spinlock taken
>   and the corresponding dma_unmap_sg() called without it by moving the
>   descriptor locking data into the descriptor struct
> - Also queue the TX data descriptors before the command descriptors to
>   match what downstream is doing
> - Tweak commit messages
> - Rebase on top of v7.1-rc1
> - Link to v15: https://patch.msgid.link/20260402-qcom-qce-cmd-descr-v15-0-98b5361f7ed7@oss.qualcomm.com
> 
> Changes in v15:
> - Extend the descriptor metadata struct to also carry the channel's
>   transfer direction and stop using dmaengine_slave_config() for that
> - Link to v14: https://patch.msgid.link/20260323-qcom-qce-cmd-descr-v14-0-f323af411274@oss.qualcomm.com
> 
> Changes in v14:
> - Don't return an error to a client which wants to use locking on BAM
>   that doesn't support it
> - Add a comment describing the DMA descriptor metadata structure
> - Fix memory leaks
> - Remove leftovers from previous iterations
> - Propagate errors from dma_cookie_assign() when setting up lock
>   descriptors
> - Link to v13: https://patch.msgid.link/20260317-qcom-qce-cmd-descr-v13-0-0968eb4f8c40@oss.qualcomm.com
> 
> Changes in v13:
> - As part of the DMA changes in the QCE driver: reverse the order of
>   queueing the descriptors in the QCE driver: queue command descriptors
>   with all the register writes first, followed by all the data descriptors,
>   this is in line with the recommendations from the BAM HPG
> - Set the NWD (notify-when-done) bit (DMA_PREP_FENCE in dmaengine
>   parlance) on the data descriptors to ensure that the UNLOCK descriptor
>   will not be processed until after they have been processed by the
>   engine. While technically the NWD bit is only needed on the final data
>   descriptor, it's hard to tell which one *will* be the last from the
>   driver's point-of-view and both the downstream driver as well as
>   the Qualcomm TZ against which we want to synchronize set NWD on every
>   data descriptor
> - Revert to creating the LOCK/UNLOCK command descriptor pair in one
>   place now that the NWD bit is in place,
> - Link to v12: https://patch.msgid.link/20260310-qcom-qce-cmd-descr-v12-0-398f37f26ef0@oss.qualcomm.com
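The NWD policy from the v13 notes (set the bit on every data descriptor rather than trying to find the last one) can be sketched in a few lines of userspace C. This is an illustrative model only: struct sketch_data_desc, SKETCH_FLAG_NWD and both helpers are hypothetical names, and the bit position is made up rather than taken from the BAM documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit for this sketch; not the real descriptor layout. */
#define SKETCH_FLAG_NWD (1u << 0)

/* Hypothetical, simplified data descriptor. */
struct sketch_data_desc {
	uint32_t addr;
	uint32_t len;
	uint32_t flags;
};

/* Set NWD on every data descriptor instead of guessing which one is last. */
void sketch_mark_nwd(struct sketch_data_desc *descs, size_t n)
{
	size_t i;

	assert(descs != NULL || n == 0);

	for (i = 0; i < n; i++)
		descs[i].flags |= SKETCH_FLAG_NWD;
}

/* Check that a chain follows the policy before an UNLOCK is appended. */
bool sketch_all_nwd(const struct sketch_data_desc *descs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!(descs[i].flags & SKETCH_FLAG_NWD))
			return false;

	return true;
}
```

Marking every descriptor costs nothing extra in the model and avoids any dependence on knowing where the chain ends, which is the trade-off the changelog entry describes.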
> 
> Changes in v12:
> - Wait until the transaction is done before queueing the UNLOCK command
>   descriptor
> - Use descriptor metadata for communicating the scratchpad address to
>   the BAM driver
> - To that end: reverse the order of the series (first BAM, then QCE) to
>   maintain bisectability
> - Unmap buffers used for dummy writes after the transaction
> - Link to v11: https://patch.msgid.link/20260302-qcom-qce-cmd-descr-v11-0-4bf1f5db4802@oss.qualcomm.com
> 
> Changes in v11:
> - Use new approach, not requiring the client to be involved in locking.
> - Add a patch constifying dma_descriptor_metadata_ops
> - Rebase on top of v7.0-rc1
> - Link to v10: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-0-ff7e4bf7dad4@oss.qualcomm.com
> 
> Changes in v10:
> - Move DESC_FLAG_(UN)LOCK BIT definitions from patch 2 to 3
> - Add a patch constifying the dma engine metadata as the first in the
>   series
> - Use the VERSION register for dummy lock/unlock writes
> - Link to v9: https://lore.kernel.org/r/20251128-qcom-qce-cmd-descr-v9-0-9a5f72b89722@linaro.org
> 
> Changes in v9:
> - Drop the global, generic LOCK/UNLOCK flags and instead use DMA
>   descriptor metadata ops to pass BAM-specific information from the QCE
>   to the DMA engine
> - Link to v8: https://lore.kernel.org/r/20251106-qcom-qce-cmd-descr-v8-0-ecddca23ca26@linaro.org
> 
> Changes in v8:
> - Rework the command descriptor logic and drop a lot of unneeded code
> - Use the physical address for BAM command descriptor access, not the
>   mapped DMA address
> - Fix the problems with iommu faults on newer platforms
> - Generalize the LOCK/UNLOCK flags in dmaengine and reword the docs and
>   commit messages
> - Make the BAM locking logic stricter in the DMA engine driver
> - Add some additional minor QCE driver refactoring changes to the series
> - Lots of small reworks and tweaks to rebase on current mainline and fix
>   previous issues
> - Link to v7: https://lore.kernel.org/all/20250311-qce-cmd-descr-v7-0-db613f5d9c9f@linaro.org/
> 
> Changes in v7:
> - remove unused code: writing to multiple registers was not used in v6,
>   neither were the functions for reading registers over BAM DMA
> - remove
> - don't read the SW_VERSION register needlessly in the BAM driver,
>   instead: encode the information on whether the IP supports BAM locking
>   in device match data
> - shrink code where possible with logic modifications (for instance:
>   change the implementation of qce_write() instead of replacing it
>   everywhere with a new symbol)
> - remove duplicated error messages
> - rework commit messages
> - a lot of shuffling code around for easier review and a more
>   streamlined series
> - Link to v6: https://lore.kernel.org/all/20250115103004.3350561-1-quic_mdalam@quicinc.com/
> 
> Changes in v6:
> - change "BAM" to "DMA"
> - Ensured this series is compilable with the current Linux-next tip of
>   the tree (TOT).
> 
> Changes in v5:
> - Added DMA_PREP_LOCK and DMA_PREP_UNLOCK flag support in separate patch
> - Removed DMA_PREP_LOCK & DMA_PREP_UNLOCK flag
> - Added FIELD_GET and GENMASK macro to extract major and minor version
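For readers unfamiliar with the FIELD_GET()/GENMASK() pattern mentioned in the v5 notes, a userspace approximation looks roughly like this. The macros below are stand-ins written for this sketch (the kernel's own versions live in its headers and are more general), and the field layout (major in bits 31:28, minor in bits 27:16) is an assumption picked for illustration, not the documented layout of any BAM register.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's GENMASK()/FIELD_GET(), 32-bit only. */
#define SKETCH_GENMASK(h, l) \
	(((~UINT32_C(0)) << (l)) & ((~UINT32_C(0)) >> (31 - (h))))
/* Dividing by the mask's lowest set bit shifts the field down to bit 0. */
#define SKETCH_FIELD_GET(mask, reg) \
	(((reg) & (mask)) / ((mask) & (~(mask) + 1)))

/* Assumed layout, for illustration only. */
#define SKETCH_VER_MAJOR SKETCH_GENMASK(31, 28)
#define SKETCH_VER_MINOR SKETCH_GENMASK(27, 16)

static_assert(SKETCH_VER_MAJOR == 0xF0000000u, "major mask covers bits 31:28");
static_assert(SKETCH_VER_MINOR == 0x0FFF0000u, "minor mask covers bits 27:16");

/* Extract the major version field from a raw register value. */
uint32_t sketch_ver_major(uint32_t reg)
{
	return SKETCH_FIELD_GET(SKETCH_VER_MAJOR, reg);
}

/* Extract the minor version field from a raw register value. */
uint32_t sketch_ver_minor(uint32_t reg)
{
	return SKETCH_FIELD_GET(SKETCH_VER_MINOR, reg);
}
```

Keeping the masks as named constants means the shift amounts never appear in the code, which is the main readability benefit of the pattern.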
> 
> Changes in v4:
> - Added feature description and test hardware
>   with test command
> - Fixed patch version numbering
> - Dropped dt-binding patch
> - Dropped device tree changes
> - Added BAM_SW_VERSION register read
> - Handled the error path for the api dma_map_resource()
>   in probe
> - updated the commit messages for better readability
> - Squash the change where the qce_bam_acquire_lock() and
>   qce_bam_release_lock() APIs were introduced into the change where
>   the lock/unlock flag was introduced
> - changed cover letter subject heading to
>   "dmaengine: qcom: bam_dma: add cmd descriptor support"
> - Added the very initial post for BAM lock/unlock patch link
>   as v1 to track this feature
> 
> Changes in v3:
> - https://lore.kernel.org/lkml/183d4f5e-e00a-8ef6-a589-f5704bc83d4a@quicinc.com/
> - Addressed all the comments from v2
> - Added the dt-binding
> - Fix alignment issue
> - Removed type casting from qce_write_reg_dma()
>   and qce_read_reg_dma()
> - Removed qce_bam_txn = dma->qce_bam_txn; line from
>   qce_alloc_bam_txn() api and directly returning
>   dma->qce_bam_txn
> 
> Changes in v2:
> - https://lore.kernel.org/lkml/20231214114239.2635325-1-quic_mdalam@quicinc.com/
> - Initial set of patches for cmd descriptor support
> - Add client driver to use BAM lock/unlock feature
> - Added register read/write via BAM in QCE Crypto driver
>   to use BAM lock/unlock feature
> 
> ---
> Bartosz Golaszewski (12):
>       dmaengine: constify struct dma_descriptor_metadata_ops
>       dmaengine: qcom: bam_dma: convert tasklet to a BH workqueue
>       dmaengine: qcom: bam_dma: Extend the driver's device match data
>       dmaengine: qcom: bam_dma: Add pipe_lock_supported flag support
>       dmaengine: qcom: bam_dma: add support for BAM locking
>       crypto: qce - Include algapi.h in the core.h header
>       crypto: qce - Remove unused ignore_buf
>       crypto: qce - Simplify arguments of devm_qce_dma_request()
>       crypto: qce - Use existing devres APIs in devm_qce_dma_request()
>       crypto: qce - Map crypto memory for DMA
>       crypto: qce - Add BAM DMA support for crypto register I/O
>       crypto: qce - Communicate the base physical address to the dmaengine
> 
>  drivers/crypto/qce/aead.c        |   8 +-
>  drivers/crypto/qce/common.c      |  20 ++--
>  drivers/crypto/qce/core.c        |  28 ++++-
>  drivers/crypto/qce/core.h        |  11 ++
>  drivers/crypto/qce/dma.c         | 163 +++++++++++++++++++++++------
>  drivers/crypto/qce/dma.h         |  11 +-
>  drivers/crypto/qce/sha.c         |   8 +-
>  drivers/crypto/qce/skcipher.c    |   8 +-
>  drivers/dma/qcom/bam_dma.c       | 217 ++++++++++++++++++++++++++++++++++-----
>  drivers/dma/ti/k3-udma.c         |   2 +-
>  drivers/dma/xilinx/xilinx_dma.c  |   2 +-
>  include/linux/dma/qcom_bam_dma.h |  14 +++
>  include/linux/dmaengine.h        |   2 +-
>  13 files changed, 404 insertions(+), 90 deletions(-)
> ---
> base-commit: 06ae5ec2a5f35da6b24d404d16310ee3553dba6f
> change-id: 20251103-qcom-qce-cmd-descr-c5e9b11fe609
> 
> Best regards,
> -- 
> Bartosz Golaszewski <bartosz.golaszewski at oss.qualcomm.com>
> 

-- 
மணிவண்ணன் சதாசிவம்


