[EXTERNAL] Re: [EXTERNAL] Re: [PATCH v2 0/6] Add support for Texas Instruments MCRC64 engine
Kamlesh Gurudasani
kamlesh at ti.com
Tue Sep 19 23:53:25 PDT 2023
Kamlesh Gurudasani <kamlesh at ti.com> writes:
...
> Hi Eric, thanks for your detailed and valuable inputs.
>
> As per your suggestion, we did some profiling.
>
> The use case is to calculate crc32/crc64 over a file supplied from user space.
>
> Instead of directly implementing a PMULL-based CRC64, we first compared the following cases:
> Case 1.
> CRC32 (splice() + kernel space SW driver)
> https://gist.github.com/ti-kamlesh/5be75dbde292e122135ddf795fad9f21
>
> Case 2.
> CRC32 (mmap() + userspace armv8 crc32 instruction implementation)
> (We also tried read() to get the file contents, but it was slower than mmap(), so those numbers are omitted here.)
> https://gist.github.com/ti-kamlesh/002df094dd522422c6cb62069e15c40d
>
> Case 3.
> CRC64 (splice() + MCRC64 HW)
> https://gist.github.com/ti-kamlesh/98b1fc36c9a7c3defcc2dced4136b8a0
>
>
> Overall, the overhead of userspace + af_alg + driver in Case 1 and
> Case 3 is ~0.025s, which is roughly constant for any file size.
> It is calculated as:
> overhead = real time to calculate the crc - driver time (time spent inside init() + update() + final()) = ~0.025s
>
>
>
> +-------------------+-----------------------------+-----------------------+------------------------+------------------------+
> | File size         | 120 MB (ideal size for us)  | 20 MB                 | 15 MB                  | 5 MB                   |
> +===================+=============================+=======================+========================+========================+
> | CRC32 (Case 1)    | Driver time 0.155s          | Driver time 0.0325s   | Driver time 0.019s     | Driver time 0.0062s    |
> |                   | Real time 0.18s             | Real time 0.06s       | Real time 0.04s        | Real time 0.03s        |
> |                   | Overhead 0.025s             | Overhead 0.025s       | Overhead 0.021s        | Overhead ~0.023s       |
> +-------------------+-----------------------------+-----------------------+------------------------+------------------------+
> | CRC32 (Case 2)    | Real time 0.30s             | Real time 0.05s       | Real time 0.04s        | Real time 0.02s        |
> +-------------------+-----------------------------+-----------------------+------------------------+------------------------+
> | CRC64 (Case 3)    | Driver time 0.385s          | Driver time 0.0665s   | Driver time 0.0515s    | Driver time 0.019s     |
> |                   | Real time 0.41s             | Real time 0.09s       | Real time 0.08s        | Real time 0.04s        |
> |                   | Overhead 0.025s             | Overhead 0.025s       | Overhead ~0.025s       | Overhead ~0.021s       |
> +-------------------+-----------------------------+-----------------------+------------------------+------------------------+
>
> If we assume that a PMULL-based crc64 implementation would perform
> similarly to crc32 (Case 2), then using mcrc64 saves a good number of CPU cycles
> for files larger than 5-10 MB, since most of the time is spent in the HW offload.
>
> Regards,
> Kamlesh
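
For anyone who wants to re-check the overhead figures quoted above, here is a minimal sketch of the subtraction (real time - driver time). The timings are copied from the quoted table; the names TIMINGS, overhead() and overhead_table() are made up for this illustration and do not come from the gists. The real times are only given to two decimals, so the recomputed values can differ from the listed overhead by a few milliseconds.

```python
# Timings in seconds, copied from the quoted table.
# Keys are file sizes in megabytes; values are (driver time, real time).
TIMINGS = {
    "CRC32 (Case 1)": {120: (0.155, 0.18), 20: (0.0325, 0.06),
                       15: (0.019, 0.04), 5: (0.0062, 0.03)},
    "CRC64 (Case 3)": {120: (0.385, 0.41), 20: (0.0665, 0.09),
                       15: (0.0515, 0.08), 5: (0.019, 0.04)},
}


def overhead(driver_time, real_time):
    """Time spent outside init() + update() + final(), in seconds."""
    # Round to 0.1 ms so float noise does not show up in the result.
    return round(real_time - driver_time, 4)


def overhead_table(timings=TIMINGS):
    """Recompute the overhead for every case and file size."""
    return {case: {size: overhead(d, r) for size, (d, r) in sizes.items()}
            for case, sizes in timings.items()}


if __name__ == "__main__":
    for case, sizes in overhead_table().items():
        for size, value in sizes.items():
            print(f"{case} {size} MB: overhead {value:.4f}s")
```

Case 2 is left out because it has no separate driver time to subtract.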
Hi Eric,
Please let me know if the above numbers make sense to you and whether I
should send the next revision.
Regards,
Kamlesh
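
For readers without access to the gists, the general shape of Case 2 (mmap() the file, then compute the CRC in userspace) can be sketched as below. This is only an illustration under stated assumptions: it uses Python's zlib.crc32 as a stand-in for the armv8 crc32 instruction loop, and crc32_of_file() is a hypothetical name, not a function from the gist. Whether it produces the same value as the gist depends on the initial value and final XOR used there, which is not checked here.

```python
import mmap
import os
import zlib


def crc32_of_file(path):
    """Return the zlib CRC-32 of a file, read through a read-only mmap()."""
    if os.path.getsize(path) == 0:
        # An empty file cannot be mapped; the CRC-32 of no data is 0.
        return 0
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # zlib.crc32 accepts any bytes-like object, so the mapping is
            # passed directly instead of copying it with read().
            return zlib.crc32(mapped)


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(f"{crc32_of_file(name):08x}  {name}")
```

Run it with one or more file names as arguments to print one checksum per file.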