[PATCH 12/41] mm: add per-VMA lock and helper functions to control it

Wed Jan 18 07:11:29 PST 2023

On Wed 18-01-23 14:23:32, Jann Horn wrote:
> On Wed, Jan 18, 2023 at 1:28 PM Michal Hocko <mhocko at suse.com> wrote:
> > On Tue 17-01-23 19:02:55, Jann Horn wrote:
> > > +locking maintainers
> > >
> > > On Mon, Jan 9, 2023 at 9:54 PM Suren Baghdasaryan <surenb at google.com> wrote:
> > > > Introduce a per-VMA rw_semaphore to be used during page fault handling
> > > > instead of mmap_lock. Because there are cases when multiple VMAs need
> > > > to be exclusively locked during VMA tree modifications, instead of the
> > > > usual lock/unlock pattern we mark a VMA as locked by taking per-VMA lock
> > > > exclusively and setting vma->lock_seq to the current mm->lock_seq. When
> > > > mmap_write_lock holder is done with all modifications and drops mmap_lock,
> > > > it will increment mm->lock_seq, effectively unlocking all VMAs marked as
> > > > locked.
> > > [...]
> > > > +static inline void vma_read_unlock(struct vm_area_struct *vma)
> > > > +{
> > > > +       up_read(&vma->lock);
> > > > +}
> > >
> > > One thing that might be gnarly here is that I think you might not be
> > > allowed to use up_read() to fully release ownership of an object -
> > > from what I remember, I think that up_read() (unlike something like
> > > spin_unlock()) can access the lock object after it's already been
> > > acquired by someone else.
> >
> > Yes, I think you are right. From a look into the code it seems that
> > the UAF is quite unlikely as there is a ton of work to be done between
> > vma_write_lock used to prepare vma for removal and actual removal.
> > That doesn't make it less of a problem though.
> >
> > > So if you want to protect against concurrent
> > > deletion, this might have to be something like:
> > >
> > > rcu_read_lock(); /* keeps vma alive */
> > > up_read(&vma->lock);
> > > rcu_read_unlock();
> > >
> > > But I'm not entirely sure about that, the locking folks might know better.
> >
> > I am not a locking expert but to me it looks like this should work
> > because the final cleanup would have to happen after rcu_read_unlock.
> >
> > Thanks, I have completely missed this aspect of the locking when looking
> > into the code.
> >
> > Btw. looking at this again I have fully realized how hard it is actually
> > to see that vm_area_free is guaranteed to sync up with ongoing readers.
> > vma manipulation functions like __vma_adjust make my head spin. Would it
> > make more sense to have a rcu style synchronization point in
> > vm_area_free directly before call_rcu? This would add an overhead of
> > uncontended down_write of course.
> 
> Something along those lines might be a good idea, but I think that
> rather than synchronizing the removal, it should maybe be something
> that splats (and bails out?) if it detects pending readers. If we get
> to vm_area_free() on a VMA that has pending readers, we might already
> be in a lot of trouble because the concurrent readers might have been
> traversing page tables while we were tearing them down or fun stuff
> like that.
> 
> I think maybe Suren was already talking about something like that in
> another part of this patch series but I don't remember...

This http://lkml.kernel.org/r/20230109205336.3665937-27-surenb@google.com?
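
For anyone following along, the lock_seq marking scheme from the quoted
patch description can be modeled in plain userspace C roughly as below.
This is only a sketch of the sequence-number idea, not the code from the
series: struct mm_model, struct vma_model and all three helpers are
hypothetical names, the rw_semaphore itself is left out, and the initial
values (0 for the mm, -1 for a fresh VMA) are an assumption chosen so
that a new VMA starts out unmarked.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for mm_struct and vm_area_struct. */
struct mm_model {
	int lock_seq;		/* bumped when the mmap_lock writer is done */
};

struct vma_model {
	struct mm_model *mm;
	int lock_seq;		/* equals mm->lock_seq while marked locked */
};

/* Mark a VMA as locked by copying the current mm sequence number. */
static void vma_mark_locked(struct vma_model *vma)
{
	vma->lock_seq = vma->mm->lock_seq;
}

/* A VMA counts as locked only while its sequence matches the mm's. */
static bool vma_is_marked_locked(const struct vma_model *vma)
{
	return vma->lock_seq == vma->mm->lock_seq;
}

/*
 * Model of the writer dropping mmap_lock: one increment unmarks every
 * VMA that was marked during this write-locked section.
 */
static void mm_write_unlock_model(struct mm_model *mm)
{
	mm->lock_seq++;
}
```

The point of the design, as I read the description, is that the writer
never has to walk the VMAs it touched to unlock them one by one; a single
increment invalidates all the marks at once.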

-- 
Michal Hocko
SUSE Labs