[SPAM]Re: [PATCH] ring-buffer: Do not read at &event->array[0] if it across the page

Fri Sep 8 00:51:36 PDT 2023

On Thu, 2023-09-07 at 08:14 -0400, Steven Rostedt wrote:
> On Thu, 7 Sep 2023 10:10:57 +0000
> Tze-nan Wu (吳澤南) <Tze-nan.Wu at mediatek.com> wrote:
> 
> > > Same as my thought,
> > > since every time the reader try to access the address in next
> > > page,
> > > the below condition hold in rb_iter_head_event function:
> > > 
> > >         if (iter->page_stamp != iter_head_page->page->time_stamp
> > > ||
> > >                 commit > rb_page_commit(iter_head_page))
> > >                         goto reset;
> > > 
> > > I observe the result by the debug patch provided below:
> > > 
> > >         @@ -2378,6 +2378,19 @@ rb_iter_head_event(struct
> > > ring_buffer_iter *iter)
> > >                 commit = rb_page_commit(iter_head_page);
> > >                 smp_rmb();
> > >         +       if ((iter->head >= 0xFECUL) && commit == 0xFF0UL)
> > > {
> > >         +               pr_info("rbdbg: cpu = %d, event = 0x%lx,
> > > iter-  
> > > > head = 0x%lx,\  
> > > 
> > >         +                       commit = 0xFF0, type = 0x%x, ts =
> > > 0x%x,
> > > array addr = 0x%lx\n",\
> > >         +                       iter->cpu_buffer->cpu, (unsigned
> > > long)event, iter->head,\
> > >         +                       event->type_len, event-
> > > >time_delta,
> > > (unsigned long)(&(event->array[0])));
> > >         +               mdelay(500);
> > >         +               pr_info("rbdbg2: cpu = %d, event = 0x%lx,
> > > iter-  
> > > > head = 0x%lx,\  
> > > 
> > >         +                       commit = 0xFF0, type = 0x%x, ts =
> > > 0x%x,
> > > array addr = 0x%lx\n",\
> > >         +                       iter->cpu_buffer->cpu, (unsigned
> > > long)event, iter->head,\
> > >         +                       event->type_len, event-
> > > >time_delta,
> > > (unsigned long)(&(event->array[0])));
> > >         +               if (iter->page_stamp != iter_head_page-
> > > >page-  
> > > > time_stamp || commit > rb_page_commit(iter_head_page))  
> > > 
> > >         +                       pr_info("rbdbg: writer corrupt
> > > reader\n");
> > >         +       }
> > >                 event = __rb_page_index(iter_head_page, iter-
> > > >head);
> > >                 length = rb_event_length(event);
> > > 
> > > Note that the mdelay(500) in the debug patch can reproduce the
> > > issue
> > > easier with same test in my environmnet,
> > > I am now able to reproduce the issue within 15 minutes if the
> > > debug
> > > patch on.
> > >   
> > 
> > The debug patch may give something similar to the below log just
> > before
> > the invalid access happened, for me it looks like the padding event
> > had
> > just corrupted by the writer before the reader invoke
> > rb_event_length
> > function on it.
> > 
> > [  338.156772] cat: [name:ring_buffer&]rbdbg: cpu = 0, event =
> > 0x????????????dffc, iter->head = 0xfec, commit = 0xFF0, type =
> > 0x1d, ts
> > = 0x0, array addr = 0x????????????e000
> > [  338.656796] cat: [name:ring_buffer&]rbdbg2: cpu = 0, event =
> > 0x????????????dffc, iter->head = 0xfec, commit = 0xFF0, type = 0x0,
> > ts
> > = 0x0, array addr = 0x????????????e000
> > [  338.656803] cat: [name:ring_buffer&]rbdbg: writer corrupt reader
> > [  338.656810] cat:
> > [name:report&]=====================================================
> > ====
> > =========
> > [  338.656819] cat: [name:report&]BUG: KASAN: invalid-access in
> > rb_event_length
> > 
> > 
> > > > > Since the content data start at address 0x71FFFF8111A17FFC
> > > > > are  
> > > > 
> > > > 0xFFFFFFC0.  
> > > > > event->type will be interpret as 0x0, than the reader will
> > > > > try to  
> > > > 
> > > > get the  
> > > > > length by accessing event->array[0], which is an invalid
> > > > > address:
> > > > >     &event->array[0] = 0x71FFFF8111A18000
> > > > > 
> > > > > Signed-off-by: Tze-nan Wu <Tze-nan.Wu at mediatek.com>
> > > > > ---
> > > > > Following patch may not become a solution, it merely checks
> > > > > if
> > > > > the  
> > > > 
> > > > address  
> > > > > to be accessed is valid or not within the rb_event_length
> > > > > before  
> > > > 
> > > > access.  
> > > > > And not sure if there is any side-effect it can lead to.
> > > > > 
> > > > > I am curious about what a better solution for this issue
> > > > > would
> > > > > look  
> > > > 
> > > > like.  
> > > > > Should we address the problem from the writer or the reader?
> > > > > 
> > > > > Also I wonder if the cause of the issue is exactly as I
> > > > > suspected.
> > > > > Any Suggestion will be appreciated.  
> > > > 
> > > > I guess this depends on if you have the fixes or not?
> > > >   
> > > 
> > > yes, I could try to pick the patches that only included in
> > > mainline
> > > but
> > > not in kernel-6.1.52 for ring_buffer.c file,
> > > and do the same test to see if the issue is still reproducible.
> > >   
> > > > > 
> > > > > Test below can reproduce the issue in 2 hours on kernel-
> > > > > 6.1.24:
> > > > >     $cd /sys/kernel/tracing/
> > > > >     # make the reader and writer race more through resize
> > > > > the  
> > > > 
> > > > buffer to 8kb  
> > > > >     $echo 8 > buffer_size_kn
> > > > >     # enable all events
> > > > >     $echo 1 > event/enable
> > > > >     # enable trace
> > > > >     $echo 1 > tracing_on
> > > > >  
> > > > >     # write and run a script that keep reading trace
> > > > >     $./read_trace.sh
> > > > > 
> > > > >     ``` read_trace.sh
> > > > >        while :
> > > > >        do
> > > > >            cat /sys/kernel/tracing/trace > /dev/null
> > > > >        done
> > > > > 
> > > > >     ```  
> > > > 
> > > > Thanks, I'll look at that when I finish debugging the eventfs
> > > > bug.
> > > > 
> > > > -- Steve  
> > > 
> > > Also thank for your reply,
> > >   
> > 
> > And the temporary fix patch in my first mail should have some
> > modified
> > as shown below.
> > +               if (((unsigned long)event & 0xfffUL) >= PAGE_SIZE -
> > 4)
> >                                              ^^^^^^
> >                                             PAGE_SIZE-1
> > +                       return -1;
> 
> 
> Your email is getting really hard to read due to the line wrap
> formatting.
> 
> Anyway, can you try this patch?
> 
> -- Steve
> 

Sorry for the inconvenience, will be carefull next time.

The patch has passed my test in my environment, no more KASAN report
during the same test.
Thanks, and I've learned a lot.

-- Tze-nan

> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 72ccf75defd0..5b0653378089 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -2390,6 +2390,10 @@ rb_iter_head_event(struct ring_buffer_iter
> *iter)
>  	 */
>  	commit = rb_page_commit(iter_head_page);
>  	smp_rmb();
> +	/* An event needs to be at least 8 bytes in size */
> +	if (iter->head > commit - 8)
> +		goto reset;
> +
>  	event = __rb_page_index(iter_head_page, iter->head);
>  	length = rb_event_length(event);
>