[RFC PATCH 10/30] iommu/arm-smmu-v3: Add task contexts
Jean-Philippe Brucker
jean-philippe.brucker at arm.com
Mon Feb 27 11:54:21 PST 2017
Now that we support substreams, add the required infrastructure to use
them. Each device can be attached to multiple address spaces. The default
address space is the domain, and additional ones are tasks. Each task can
be attached to multiple devices as well.
          +------------+       +---------------+
          |   group    |       |    domain     |
          |            +-------+               |
          |            |       |               |         (c)
     (a) ,-----[master]-------------[context 1]---------[task]--.   (d)
        / |          \ |  (b)  |               |                 \
       /  |           `------------[context 2]----.               [SMMU]
      /   |            |       |               |    \             /
[SMMU]    +------------+       +---------------+     \           /
  \                                                   `-[task]--'
   \      +------------+       +---------------+      /
    \     |            |       |               |     /
     \----.            |       |               |    /
          `----[master]-------------[context 1]----'
          |            |       |               |
          |            +-------+               |
          |   group    |  (e)  |    domain     |
          +------------+       +---------------+
(a) Add an rbtree of streams in each SMMU instance, indexed by stream ID.
Each stream points to a single master, and masters can be identified by
multiple stream IDs. PCIe endpoints are, for the moment, expected to have
a unique stream ID (RID), but see the discussion about groups below.
Platform devices may issue multiple stream IDs.
(b) Add an rbtree of contexts in each SVM-capable master, indexed by
substream ID. A context is the link between a device and a task. It is
bidirectional since we need to look up all devices attached to a task when
invalidating a mapping, and all tasks attached to a device when handling a
device fault (PRI).
Both these rbtrees will allow fast lookup of an address space when
handling a fault.
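
As an illustration of the two-level lookup described in (a) and (b), here
is a minimal userspace sketch. It is not the kernel code: a plain unbalanced
binary search tree stands in for the rbtree, and every sketch_* name is
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace sketch only. An unbalanced binary search tree stands in for
 * the kernel rbtree, and all sketch_* names are hypothetical.
 */
struct sketch_node {
	uint32_t key;			/* stream ID or substream ID */
	void *payload;			/* master or context */
	struct sketch_node *left, *right;
};

struct sketch_master {
	struct sketch_node *contexts;	/* indexed by SSID */
};

struct sketch_smmu {
	struct sketch_node *streams;	/* indexed by SID */
};

/* Insert a node; returns -1 if the key is already in the tree. */
static int sketch_insert(struct sketch_node **root,
			 struct sketch_node *new_node)
{
	struct sketch_node **link = root;

	while (*link) {
		if ((*link)->key > new_node->key)
			link = &(*link)->left;
		else if ((*link)->key < new_node->key)
			link = &(*link)->right;
		else
			return -1;
	}

	new_node->left = new_node->right = NULL;
	*link = new_node;
	return 0;
}

static void *sketch_find(struct sketch_node *node, uint32_t key)
{
	while (node) {
		if (node->key > key)
			node = node->left;
		else if (node->key < key)
			node = node->right;
		else
			return node->payload;
	}
	return NULL;
}

/* Fault path: the SID selects the master, the SSID selects the context. */
static void *sketch_fault_lookup(struct sketch_smmu *smmu,
				 uint32_t sid, uint32_t ssid)
{
	struct sketch_master *master = sketch_find(smmu->streams, sid);

	return master ? sketch_find(master->contexts, ssid) : NULL;
}
```

Two stream IDs may point at the same master, so a fault reported on either
one reaches the same context tree.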
(c) Add a task in each context. We need tasks and contexts to be separate
entities with their own refcounting for at least two reasons. To reduce
the number of mmu notifiers and simplify ASID tracking, tasks are allowed
to be attached to multiple masters. It is also necessary for sane PASID
invalidation but we won't go into that now.
(d) Add a list of tasks in each SMMU instance. This allows finding
existing tasks when binding to a device. We could make it an rbtree, but
it's not clear whether we need this complication yet.
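
This patch does not contain the lookup itself. A minimal userspace sketch
of what it could look like, assuming tasks are matched by PID (an
assumption; the list_* names are hypothetical):

```c
#include <stddef.h>

/* Userspace sketch only; the list_* names are hypothetical. */
struct list_task {
	int pid;			/* assumed lookup key */
	struct list_task *next;
};

struct list_smmu {
	struct list_task *head;		/* per-SMMU task list */
};

static void list_add_task(struct list_smmu *smmu, struct list_task *task)
{
	task->next = smmu->head;
	smmu->head = task;
}

/* Linear walk: cost grows with the number of bound tasks. */
static struct list_task *list_find_task(struct list_smmu *smmu, int pid)
{
	struct list_task *task;

	for (task = smmu->head; task; task = task->next)
		if (task->pid == pid)
			return task;

	return NULL;
}
```

The walk is linear in the number of tasks, which is the cost an rbtree
would avoid if the list ever grows large.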
Lifetime of contexts and tasks is the following:
* arm_smmu_alloc_task creates smmu_task and adds it to the list.
* arm_smmu_attach_task allocates an SSID, creates smmu_context and
connects task to master. Context descriptor is written, so the SSID is
active and can be used in transactions.
* users may get krefs to task and contexts temporarily, and release them
with put_task/put_context.
* arm_smmu_detach_task severs the link between context and task, and drops
the task's refcount. If the task was only attached to one master, it is
freed and removed from the list. arm_smmu_detach_task also clears the
context descriptor, after which any transaction with this SSID would
result in a fault.
* arm_smmu_put_context is called later, freeing the dangling context.
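
The sequence above can be modelled in userspace as follows. This is only a
sketch: plain counters stand in for krefs, flags stand in for freeing
memory, locking is left out, and the life_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; counters replace krefs, flags replace kfree(). */
struct life_task {
	int refs;			/* models the task kref */
	bool on_list;			/* linked in the SMMU task list */
};

struct life_context {
	int refs;			/* models the context kref */
	struct life_task *task;		/* NULL once detached */
	bool cd_valid;			/* context descriptor written */
	bool live;			/* false once the context is freed */
};

static void life_alloc_task(struct life_task *task)
{
	task->refs = 1;
	task->on_list = true;
}

static void life_put_task(struct life_task *task)
{
	if (--task->refs == 0)
		task->on_list = false;	/* freed and removed from the list */
}

static void life_attach(struct life_context *ctx, struct life_task *task)
{
	ctx->refs = 1;
	ctx->task = task;
	ctx->cd_valid = true;		/* the SSID is now usable */
	ctx->live = true;
}

static void life_detach(struct life_context *ctx)
{
	struct life_task *task = ctx->task;

	ctx->task = NULL;		/* sever the link */
	ctx->cd_valid = false;		/* transactions now fault */
	life_put_task(task);
}

static void life_put_context(struct life_context *ctx)
{
	if (--ctx->refs == 0)
		ctx->live = false;	/* dangling context finally freed */
}
```

After life_detach() the context is still allocated but no longer has a
task or a valid descriptor; it only goes away when the last reference is
dropped with life_put_context().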
All these structures are protected by a single contexts_lock spinlock for
each SMMU instance. Therefore contexts_lock also serializes modifications
of the context descriptors. Any update of the krefs must be done while
holding the lock, and so must reads of the lists and of context->task. As
we expect a lot more reads than writes on the structures introduced by this
patch, especially for handling page requests, we hope to make use of RCU in
the future.
(e) A group still has a default DMA domain, represented by context 0 (no
SSID). When attaching to a different domain, for example a VFIO domain, we
also want to destroy all active contexts. If VFIO is handing control of a
device to a guest for instance, we do not want the guest to have access to
host tasks through bound contexts. So contexts are conceptually contained
in the domain, although we do not need to maintain any structure for this.
When detaching a domain from a group, all contexts attached to this group
are also detached.
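
Tearing down every context of a master means walking a structure while
removing entries from it, so the next entry has to be read before the
current one is released. A minimal userspace sketch of that pattern, with a
singly linked list standing in for the rbtree and hypothetical walk_* names:

```c
#include <stdlib.h>

/* Userspace sketch only; a linked list stands in for the context rbtree. */
struct walk_context {
	unsigned int ssid;
	int has_task;
	struct walk_context *next;
};

static struct walk_context *walk_add(struct walk_context **head,
				     unsigned int ssid)
{
	struct walk_context *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;

	ctx->ssid = ssid;
	ctx->has_task = 1;
	ctx->next = *head;
	*head = ctx;
	return ctx;
}

/* Detach and release every context; returns how many were released. */
static int walk_detach_all(struct walk_context **head)
{
	struct walk_context *ctx, *next;
	int released = 0;

	for (ctx = *head; ctx; ctx = next) {
		next = ctx->next;	/* read before ctx is freed */
		ctx->has_task = 0;	/* stands in for detaching the task */
		free(ctx);		/* stands in for the final put */
		released++;
	}

	*head = NULL;
	return released;
}
```

In the sketch every context is freed immediately; with real refcounting a
context that is still referenced elsewhere would outlive the walk.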
An IOMMU group contains all devices that can't be distinguished from one
another by their IOMMU. For PCIe this may be due to buggy devices, lack of
ACS isolation, legacy or non-transparent bridges. Since PCIe SVM platforms
on ARM have yet to be seen in the wild, legacy issues don't matter and SVM
through NTB is pretty far off.
So for prototyping, we currently bypass the concept of IOMMU groups, since
we expect SVM-capable devices to be properly integrated. This might change
as soon as someone starts taping out devices. There will then be some
complication in attaching a task to a group instead of an individual
device (for instance, finding a consensus for PASID sizes of all devices
in the group, which may be altered by hotplugging a device in a group),
but the current design doesn't prevent this change.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker at arm.com>
---
drivers/iommu/arm-smmu-v3.c | 316 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 314 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 2724788157a5..5b4d1f265194 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -701,6 +701,16 @@ struct arm_smmu_device {
/* IOMMU core code handle */
struct iommu_device iommu;
+
+ spinlock_t contexts_lock;
+ struct rb_root streams;
+ struct list_head tasks;
+};
+
+struct arm_smmu_stream {
+ u32 id;
+ struct arm_smmu_master_data *master;
+ struct rb_node node;
};
/* SMMU private data for each master */
@@ -710,6 +720,9 @@ struct arm_smmu_master_data {
struct device *dev;
struct list_head group_head;
+
+ struct arm_smmu_stream *streams;
+ struct rb_root contexts;
};
/* SMMU private data for an IOMMU domain */
@@ -738,6 +751,31 @@ struct arm_smmu_domain {
spinlock_t groups_lock;
};
+struct arm_smmu_task {
+ struct pid *pid;
+
+ struct arm_smmu_device *smmu;
+ struct list_head smmu_head;
+
+ struct list_head contexts;
+
+ struct arm_smmu_s1_cfg s1_cfg;
+
+ struct kref kref;
+};
+
+struct arm_smmu_context {
+ u32 ssid;
+
+ struct arm_smmu_task *task;
+ struct arm_smmu_master_data *master;
+
+ struct list_head task_head;
+ struct rb_node master_node;
+
+ struct kref kref;
+};
+
struct arm_smmu_group {
struct arm_smmu_domain *domain;
struct list_head domain_head;
@@ -1363,7 +1401,6 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master_data *master)
cfg->num_entries = 0;
}
-__maybe_unused
static int arm_smmu_alloc_cd(struct arm_smmu_master_data *master)
{
int ssid;
@@ -1401,7 +1438,6 @@ static int arm_smmu_alloc_cd(struct arm_smmu_master_data *master)
return -ENOSPC;
}
-__maybe_unused
static void arm_smmu_free_cd(struct arm_smmu_master_data *master, u32 ssid)
{
unsigned long l1_idx, idx;
@@ -1961,6 +1997,195 @@ static bool arm_smmu_capable(enum iommu_cap cap)
}
}
+__maybe_unused
+static struct arm_smmu_context *
+arm_smmu_attach_task(struct arm_smmu_task *smmu_task,
+ struct arm_smmu_master_data *master)
+{
+ int ssid;
+ int ret = 0;
+ struct arm_smmu_context *smmu_context, *ctx;
+ struct arm_smmu_device *smmu = master->smmu;
+ struct rb_node **new_node, *parent_node = NULL;
+
+ smmu_context = kzalloc(sizeof(*smmu_context), GFP_KERNEL);
+ if (!smmu_context)
+ return ERR_PTR(-ENOMEM);
+
+ smmu_context->task = smmu_task;
+ smmu_context->master = master;
+ kref_init(&smmu_context->kref);
+
+ spin_lock(&smmu->contexts_lock);
+
+ /* Allocate a context descriptor and SSID */
+ ssid = arm_smmu_alloc_cd(master);
+ if (ssid <= 0) {
+ if (WARN_ON_ONCE(ssid == 0))
+ ret = -EEXIST;
+ else
+ ret = ssid;
+ goto err_free_context;
+ }
+
+ smmu_context->ssid = ssid;
+
+ arm_smmu_write_ctx_desc(master, ssid, &smmu_task->s1_cfg);
+
+ list_add(&smmu_context->task_head, &smmu_task->contexts);
+
+ /* Insert into master context list */
+ new_node = &(master->contexts.rb_node);
+ while (*new_node) {
+ ctx = rb_entry(*new_node, struct arm_smmu_context,
+ master_node);
+ parent_node = *new_node;
+ if (ctx->ssid > ssid) {
+ new_node = &((*new_node)->rb_left);
+ } else if (ctx->ssid < ssid) {
+ new_node = &((*new_node)->rb_right);
+ } else {
+ dev_warn(master->dev, "context %u already exists\n",
+ ctx->ssid);
+ ret = -EEXIST;
+ goto err_remove_context;
+ }
+ }
+
+ rb_link_node(&smmu_context->master_node, parent_node, new_node);
+ rb_insert_color(&smmu_context->master_node, &master->contexts);
+
+ spin_unlock(&smmu->contexts_lock);
+
+ return smmu_context;
+
+err_remove_context:
+ list_del(&smmu_context->task_head);
+ arm_smmu_write_ctx_desc(master, ssid, NULL);
+ arm_smmu_free_cd(master, ssid);
+
+err_free_context:
+ spin_unlock(&smmu->contexts_lock);
+
+ kfree(smmu_context);
+
+ return ERR_PTR(ret);
+}
+
+/* Caller must hold contexts_lock */
+static void arm_smmu_free_context(struct kref *kref)
+{
+ struct arm_smmu_master_data *master;
+ struct arm_smmu_context *smmu_context;
+
+ smmu_context = container_of(kref, struct arm_smmu_context, kref);
+
+ WARN_ON_ONCE(smmu_context->task);
+
+ master = smmu_context->master;
+
+ arm_smmu_free_cd(master, smmu_context->ssid);
+
+ rb_erase(&smmu_context->master_node, &master->contexts);
+
+ kfree(smmu_context);
+}
+
+static void _arm_smmu_put_context(struct arm_smmu_context *smmu_context)
+{
+ kref_put(&smmu_context->kref, arm_smmu_free_context);
+}
+
+__maybe_unused
+static void arm_smmu_put_context(struct arm_smmu_device *smmu,
+ struct arm_smmu_context *smmu_context)
+{
+ spin_lock(&smmu->contexts_lock);
+ _arm_smmu_put_context(smmu_context);
+ spin_unlock(&smmu->contexts_lock);
+}
+
+__maybe_unused
+static struct arm_smmu_task *arm_smmu_alloc_task(struct arm_smmu_device *smmu,
+ struct task_struct *task)
+{
+ struct arm_smmu_task *smmu_task;
+
+ smmu_task = kzalloc(sizeof(*smmu_task), GFP_KERNEL);
+ if (!smmu_task)
+ return ERR_PTR(-ENOMEM);
+
+ smmu_task->smmu = smmu;
+ smmu_task->pid = get_task_pid(task, PIDTYPE_PID);
+ INIT_LIST_HEAD(&smmu_task->contexts);
+ kref_init(&smmu_task->kref);
+
+ spin_lock(&smmu->contexts_lock);
+ list_add(&smmu_task->smmu_head, &smmu->tasks);
+ spin_unlock(&smmu->contexts_lock);
+
+ return smmu_task;
+}
+
+/* Caller must hold contexts_lock */
+static void arm_smmu_free_task(struct kref *kref)
+{
+ struct arm_smmu_device *smmu;
+ struct arm_smmu_task *smmu_task;
+ struct arm_smmu_master_data *master;
+ struct arm_smmu_context *smmu_context, *next;
+
+ smmu_task = container_of(kref, struct arm_smmu_task, kref);
+ smmu = smmu_task->smmu;
+
+ if (WARN_ON_ONCE(!list_empty(&smmu_task->contexts))) {
+ list_for_each_entry_safe(smmu_context, next,
+ &smmu_task->contexts, task_head) {
+ master = smmu_context->master;
+
+ arm_smmu_write_ctx_desc(master, smmu_context->ssid, NULL);
+ smmu_context->task = NULL;
+ list_del(&smmu_context->task_head);
+ }
+ }
+
+ list_del(&smmu_task->smmu_head);
+
+ put_pid(smmu_task->pid);
+ kfree(smmu_task);
+}
+
+static void _arm_smmu_put_task(struct arm_smmu_task *smmu_task)
+{
+ kref_put(&smmu_task->kref, arm_smmu_free_task);
+}
+
+/* Caller must hold contexts_lock */
+static void arm_smmu_detach_task(struct arm_smmu_context *smmu_context)
+{
+ struct arm_smmu_task *smmu_task = smmu_context->task;
+
+ smmu_context->task = NULL;
+ list_del(&smmu_context->task_head);
+ _arm_smmu_put_task(smmu_task);
+
+ arm_smmu_write_ctx_desc(smmu_context->master, smmu_context->ssid, NULL);
+}
+
+__maybe_unused
+static void arm_smmu_put_task(struct arm_smmu_device *smmu,
+ struct arm_smmu_task *smmu_task)
+{
+ spin_lock(&smmu->contexts_lock);
+ _arm_smmu_put_task(smmu_task);
+ spin_unlock(&smmu->contexts_lock);
+}
+
+static bool arm_smmu_master_supports_svm(struct arm_smmu_master_data *master)
+{
+ return false;
+}
+
static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
{
struct arm_smmu_domain *smmu_domain;
@@ -2173,12 +2398,31 @@ static struct arm_smmu_group *arm_smmu_group_alloc(struct iommu_group *group)
static void arm_smmu_detach_dev(struct device *dev)
{
struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
+ struct arm_smmu_device *smmu = master->smmu;
+ struct arm_smmu_context *smmu_context;
+ struct rb_node *node, *next;
master->ste.bypass = true;
if (arm_smmu_install_ste_for_dev(dev->iommu_fwspec) < 0)
dev_warn(dev, "failed to install bypass STE\n");
arm_smmu_write_ctx_desc(master, 0, NULL);
+
+ if (!master->ste.valid)
+ return;
+
+ spin_lock(&smmu->contexts_lock);
+ for (node = rb_first(&master->contexts); node; node = next) {
+ smmu_context = rb_entry(node, struct arm_smmu_context,
+ master_node);
+ next = rb_next(node);
+
+ if (smmu_context->task)
+ arm_smmu_detach_task(smmu_context);
+
+ _arm_smmu_put_context(smmu_context);
+ }
+ spin_unlock(&smmu->contexts_lock);
}
static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
@@ -2418,6 +2662,54 @@ static void arm_smmu_disable_ats(struct arm_smmu_master_data *master)
pci_disable_ats(pdev);
}
+static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
+ struct arm_smmu_master_data *master)
+{
+ int i;
+ int ret = 0;
+ struct arm_smmu_stream *new_stream, *cur_stream;
+ struct rb_node **new_node, *parent_node = NULL;
+ struct iommu_fwspec *fwspec = master->dev->iommu_fwspec;
+
+ master->streams = kcalloc(fwspec->num_ids,
+ sizeof(struct arm_smmu_stream), GFP_KERNEL);
+ if (!master->streams)
+ return -ENOMEM;
+
+ spin_lock(&smmu->contexts_lock);
+ for (i = 0; i < fwspec->num_ids && !ret; i++) {
+ new_stream = &master->streams[i];
+ new_stream->id = fwspec->ids[i];
+ new_stream->master = master;
+
+ new_node = &(smmu->streams.rb_node);
+ while (*new_node) {
+ cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
+ node);
+ parent_node = *new_node;
+ if (cur_stream->id > new_stream->id) {
+ new_node = &((*new_node)->rb_left);
+ } else if (cur_stream->id < new_stream->id) {
+ new_node = &((*new_node)->rb_right);
+ } else {
+ dev_warn(master->dev,
+ "stream %u already in tree\n",
+ cur_stream->id);
+ ret = -EINVAL;
+ break;
+ }
+ }
+
+ if (!ret) {
+ rb_link_node(&new_stream->node, parent_node, new_node);
+ rb_insert_color(&new_stream->node, &smmu->streams);
+ }
+ }
+ spin_unlock(&smmu->contexts_lock);
+
+ return ret;
+}
+
static struct iommu_ops arm_smmu_ops;
static int arm_smmu_add_device(struct device *dev)
@@ -2452,6 +2744,8 @@ static int arm_smmu_add_device(struct device *dev)
master->smmu = smmu;
master->dev = dev;
fwspec->iommu_priv = master;
+
+ master->contexts = RB_ROOT;
}
/* Check the SIDs are in range of the SMMU and our stream table */
@@ -2475,6 +2769,9 @@ static int arm_smmu_add_device(struct device *dev)
ats_enabled = !arm_smmu_enable_ats(master);
+ if (arm_smmu_master_supports_svm(master))
+ arm_smmu_insert_master(smmu, master);
+
group = iommu_group_get_for_dev(dev);
if (IS_ERR(group)) {
ret = PTR_ERR(group);
@@ -2510,6 +2807,7 @@ static void arm_smmu_remove_device(struct device *dev)
struct arm_smmu_device *smmu;
struct iommu_group *group;
unsigned long flags;
+ int i;
if (!fwspec || fwspec->ops != &arm_smmu_ops)
return;
@@ -2520,6 +2818,16 @@ static void arm_smmu_remove_device(struct device *dev)
arm_smmu_detach_dev(dev);
if (master) {
+ if (master->streams) {
+ spin_lock(&smmu->contexts_lock);
+ for (i = 0; i < fwspec->num_ids; i++)
+ rb_erase(&master->streams[i].node,
+ &smmu->streams);
+ spin_unlock(&smmu->contexts_lock);
+
+ kfree(master->streams);
+ }
+
group = iommu_group_get(dev);
smmu_group = to_smmu_group(group);
@@ -2820,6 +3128,10 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
{
int ret;
+ spin_lock_init(&smmu->contexts_lock);
+ smmu->streams = RB_ROOT;
+ INIT_LIST_HEAD(&smmu->tasks);
+
ret = arm_smmu_init_queues(smmu);
if (ret)
return ret;
--
2.11.0