[PATCH v4 4/9] libperf: Add libperf_evsel__mmap()

Tue Oct 20 10:38:13 EDT 2020

On Mon, Oct 19, 2020 at 3:15 PM Jiri Olsa <jolsa at redhat.com> wrote:
>
> On Fri, Oct 16, 2020 at 04:39:15PM -0500, Rob Herring wrote:
> > On Wed, Oct 14, 2020 at 6:05 AM Jiri Olsa <jolsa at redhat.com> wrote:
> > >
> > > On Thu, Oct 01, 2020 at 09:01:11AM -0500, Rob Herring wrote:
> > >
> > > SNIP
> > >
> > > >
> > > > +void *perf_evsel__mmap(struct perf_evsel *evsel, int pages)
> > > > +{
> > > > +     int ret;
> > > > +     struct perf_mmap *map;
> > > > +     struct perf_mmap_param mp = {
> > > > +             .prot = PROT_READ | PROT_WRITE,
> > > > +     };
> > > > +
> > > > +     if (FD(evsel, 0, 0) < 0)
> > > > +             return NULL;
> > > > +
> > > > +     mp.mask = (pages * page_size) - 1;
> > > > +
> > > > +     map = zalloc(sizeof(*map));
> > > > +     if (!map)
> > > > +             return NULL;
> > > > +
> > > > +     perf_mmap__init(map, NULL, false, NULL);
> > > > +
> > > > +     ret = perf_mmap__mmap(map, &mp, FD(evsel, 0, 0), 0);
> > >
> > > hum, so you map event for FD(0,0) but later in perf_evsel__read
> > > you allow to read any cpu/thread combination ending up reading
> > > data from FD(0,0) map:
> > >
> > >         int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > >                              struct perf_counts_values *count)
> > >         {
> > >                 size_t size = perf_evsel__read_size(evsel);
> > >
> > >                 memset(count, 0, sizeof(*count));
> > >
> > >                 if (FD(evsel, cpu, thread) < 0)
> > >                         return -EINVAL;
> > >
> > >                 if (evsel->mmap && !perf_mmap__read_self(evsel->mmap, count))
> > >                         return 0;
> > >
> > >
> > > I think we should either check cpu == 0, thread == 0, or make it
> > > general and store perf_evsel::mmap in xyarray as we do for fds
> >
> > The mmapped read will actually fail and then we fall back here. My main
> > concern though is adding more overhead on a feature that's meant to be
> > low overhead (granted, it's not much). Maybe we could add checks on
> > the mmap that we've opened the event with pid == 0 and cpu == -1 (so
> > only 1 FD)?
>
> but then you limit this just for single fd.. having mmap as xyarray
> would not be that bad and perf_evsel__mmap will call perf_mmap__mmap
> for each defined cpu/thread .. so it depends on the user how fast this
> will be - how many maps need to be created/mmapped

Given that userspace access fails for any event other than one opened on
the calling thread for all cpus (pid == 0, cpu == -1), how would more
than 1 mmap be useful here?

If we did want multiple mmaps, wouldn't we just use the evlist API in
that case? It already does all that.
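
To make the single-FD check I suggested earlier concrete, here is a
rough sketch. It is not libperf code: struct sketch_event and both
helpers are hypothetical names used only for illustration, and it
assumes the pid/cpu values passed at open time are still available
when deciding whether the mmapped read path applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the open-time state of one event.
 * This is NOT the real struct perf_evsel.
 */
struct sketch_event {
	int pid;	/* pid given at open time: 0 means the calling thread */
	int cpu;	/* cpu given at open time: -1 means any cpu */
	void *mmap;	/* non-NULL once the event's ring page is mapped */
};

/*
 * True only for an event opened on the calling thread for all cpus,
 * i.e. the one configuration with a single FD where a userspace
 * counter read can succeed.
 */
static bool sketch_event_is_self_monitoring(const struct sketch_event *ev)
{
	assert(ev != NULL);
	return ev->pid == 0 && ev->cpu == -1;
}

/*
 * Decide whether a read should try the mmapped path first.
 * Anything else goes straight to the read() fallback.
 */
static bool sketch_event_use_mmap_read(const struct sketch_event *ev)
{
	return ev->mmap != NULL && sketch_event_is_self_monitoring(ev);
}
```

With a check along these lines done once at mmap time, the read path
would not need any per-cpu/thread lookup to find the map.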

Rob