About upstreaming ArmChina NPU driver

Dejia Shang Dejia.Shang at armchina.com
Wed Apr 3 03:09:53 PDT 2024


> -----Original Message-----
> From: Oded Gabbay <oded.gabbay at gmail.com>
> Sent: April 3, 2024 14:26
> To: Dejia Shang <Dejia.Shang at armchina.com>
> Cc: ogabbay at kernel.org; airlied at redhat.com; daniel at ffwll.ch;
> linux-kernel at vger.kernel.org; dri-devel at lists.freedesktop.org;
> linux-arm-kernel at lists.infradead.org
> Subject: Re: About upstreaming ArmChina NPU driver
> 
> On Thu, Mar 28, 2024 at 10:01 AM Dejia Shang <Dejia.Shang at armchina.com>
> wrote:
> >
> > Dear Kernel Maintainers,
> >
> > I am a driver developer and would like to upstream the ArmChina Zhouyi
> > NPU driver ("Zhouyi" is the brand) to the accel subsystem.
> >
> > The driver is already open source (both UMD and KMD), and anyone can
> > find the code at https://github.com/Arm-China/Compass_NPU_Driver.git.
> >
> > This driver is responsible for scheduling AI inference tasks to the NPU cores
> > (V1/V2/V3). Specifically, a simplified end-to-end flow is:
> >
> >         1. A TFLite/ONNX model is transformed to an executable binary
> >            file in ELF format by the NN graph compiler (designed by ArmChina).
> >         2. An application loads the executable binary file to UMD and
> >            provides the input data.
> >         3. UMD parses the binary and sends ioctls to KMD (open device,
> >            do memory allocation/mmap/free, submit the job descriptor).
> >         4. KMD dispatches the job to NPU h/w, handles interrupts and
> >            updates the execution status.
> >         5. UMD polls the status of the pre-scheduled job.
> >         6. The application gets the output results.
> >
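For illustration only, steps 3 to 5 above (submit a job descriptor, dispatch it to the hardware, handle the completion interrupt, poll the status) can be modelled as a small user-space simulation in Python. Every name here (FakeKmd, Job, JobStatus, run_inference) is hypothetical and is not taken from the Compass_NPU_Driver code or its ioctl ABI; the "inference" is a placeholder that simply doubles each input value.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto


class JobStatus(Enum):
    PENDING = auto()
    RUNNING = auto()
    DONE = auto()


@dataclass
class Job:
    job_id: int
    input_data: list
    output_data: list = field(default_factory=list)
    status: JobStatus = JobStatus.PENDING


class FakeKmd:
    """Hypothetical stand-in for the KMD; not the real driver interface."""

    def __init__(self):
        self._jobs = {}
        self._queue = deque()
        self._next_id = 0

    def submit(self, input_data):
        """Step 3: accept a job descriptor and queue it."""
        job = Job(self._next_id, list(input_data))
        self._jobs[job.job_id] = job
        self._queue.append(job)
        self._next_id += 1
        return job.job_id

    def dispatch(self):
        """Step 4 (first half): hand the next queued job to the simulated h/w."""
        if not self._queue:
            return None
        job = self._queue.popleft()
        job.status = JobStatus.RUNNING
        return job

    def on_interrupt(self, job):
        """Step 4 (second half): completion interrupt updates the status."""
        job.output_data = [x * 2 for x in job.input_data]  # placeholder result
        job.status = JobStatus.DONE

    def get_status(self, job_id):
        return self._jobs[job_id].status

    def read_output(self, job_id):
        return self._jobs[job_id].output_data


def run_inference(kmd, input_data, max_polls=10):
    """Steps 3 to 6 from the UMD side: submit, poll, then fetch the output."""
    job_id = kmd.submit(input_data)
    for _ in range(max_polls):
        if kmd.get_status(job_id) is JobStatus.DONE:
            return kmd.read_output(job_id)
        # Nothing runs concurrently in this simulation, so the polling loop
        # also drives the simulated hardware and its completion interrupt.
        job = kmd.dispatch()
        if job is not None:
            kmd.on_interrupt(job)
    raise TimeoutError(f"job {job_id} did not complete in {max_polls} polls")
```

Calling run_inference(FakeKmd(), [1, 2, 3]) walks one job through PENDING, RUNNING and DONE and returns the placeholder output. In a real driver the dispatch and interrupt handling would run asynchronously in the kernel rather than inside the polling loop.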
> > So...for the upstreaming,
> >
> > Q1: do you think our NPU driver is suitable for accel? If the answer is yes,
> > which tree & branch should the patches be based on?
> Hi Dejia,
> Yes, it definitely sounds like a good fit for the accel subsystem.
> Please base your patches on the "drm-misc-next" branch in the drm-misc repo:
> https://anongit.freedesktop.org/git/drm/drm-misc.git
> 

Hi Oded,
Got it.

> >
> > Q2: in thread
> > https://lore.kernel.org/lkml/ec547d33-214f-4952-aa33-c271e9edad63@kernel.org/
> > showing a similar case, Oded mentioned that:
> >
> >         "If we would have upstreamed a new driver, the expectation
> >         would have been that we would use some drm mechanisms.", and
> >         "the minimal requirement is to use GEM/BOs for memory
> >         management operations".
> >
> > I guess those requirements also apply to the Zhouyi NPU KMD?
> > Currently, the memory management (MM) in KMD is based on dma-mapping
> > APIs; it handles both reserved CMA region(s) and SMMU-mapped buffers,
> > and supports the dma-buf framework. Maybe I should replace the
> > implementation with DRM APIs.
> Yes, those requirements definitely apply here.
> >
> > Q3: if you have looked at the KMD code, do you think I should make any
> > other major changes before submitting the first patch series? Thank you!
> I took a quick glance. In general, it seems to be ok, but I noticed two things
> related to the integration with drm/accel:
> 
> 1. You use a scheduler for the job submission, which provides the ability to
> defer jobs. In that case, I suggest checking whether you can use drm_sched
> instead of your own implementation. No point in re-inventing the wheel.
> 2. You provide several memory zones for allocation of memory. I would
> suggest here to look at using ttm as the memory manager instead of
> re-implementing your own.

Thanks for your time! I will try to refactor the code as suggested and then send the first patch series.

> 
> And please remove the IMPORTANT NOTICE at the end of your emails. I
> would have to refrain from answering further emails if that notice remains.

Now fixed. I had not noticed it because the mail server appends the notice automatically. Sorry for the inconvenience.

Best Regards,
Dejia

> 
> Thanks,
> Oded
> 
> >
> > Thanks for your time and look forward to your reply~ 😊
> >
> > Best Regards,
> > Dejia

