remoteproc: imx_rproc race

Fri Apr 8 01:18:56 PDT 2022

Hello Peng,

On 08.04.2022 at 09:30, Peng Fan wrote:
>> Subject: remoteproc: imx_rproc race
>>
>> Hi all,
>>
>> to support remote processors started before Linux booted on i.MX8MP SoCs,
>> the rsc-table device tree property was introduced.
>> This entry specifies a location in a shared memory area where the remote
>> processor will place the resource table.
>> In the MCUXpresso SDK samples for i.MX8MP, this is achieved by copying the
>> resource table to this location during startup.
>> While this use case works fine, I don't see how this is supposed to work when
>> booting the same remote processor firmware from user space.
>>
>> Note that I expected "[PATCH V2] remoteproc: imx_rproc: use imx specific
>> hook for find_loaded_rsc_table" [1] to be applied at some point.
>> Without this patch there seem to be some other issues as well.
>>
>> After starting the remote processor here:
>> remoteproc_core.c: rproc_start(): 'ret = rproc->ops->start(rproc);'
>> both cores are running in parallel and both cores are trying to write to the
>> resource table.
>> While Linux starts setting status flags like VIRTIO_CONFIG_S_DRIVER, the
>> remote processor copies the entire resource table to the same place during
>> its startup sequence.
>> If the copy runs after Linux has set the status flags, those flags will be reset
>> and the remote processor will wait for them forever. Since this is racy, this will
>> most likely fail.
> 
> Technically your concern is valid, but NXP ATF has locked down the TCM area,
> which means Linux cannot write to the TCM area to update the ELF resource table.
> 
> So I moved to using rsc_table.

This is understandable to me and not a problem per se.

> 
> As I understand it, the NXP MCU SDK does not check the resource table. It just
> copies the resource table to the place rsc_table points to.

That's the point. While one core is copying the table to the specified 
area, the other core is modifying the same area. That does not work.

For this to work, there has to be some sort of synchronization between
the cores. Either Linux has to wait for the remote processor to finish
its table copy, or the remote processor has to check whether the table
has already been prepared by Linux and skip the copy in that case, or
something along those lines.
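
To illustrate the second option, here is a minimal sketch of what the
remote side could do. It is not taken from the MCUXpresso SDK. The names
rsc_table_already_published() and rsc_table_publish_once() are
hypothetical, and the sketch assumes that Linux has already placed a
table with the same header and offsets at the rsc-table location before
starting the core. A real implementation would also need to handle stale
data from a previous boot, cache maintenance and memory barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of a remoteproc resource table; num offsets follow it. */
struct rsc_table_hdr {
    uint32_t ver;
    uint32_t num;
    uint32_t reserved[2];
};

/*
 * Hypothetical helper: true if the shared area already starts with the
 * same header and offset array as the firmware's built-in template.
 * Only this static part is compared, because Linux may already have
 * modified dynamic fields such as the vdev status.
 */
static bool rsc_table_already_published(const void *shared,
                                        const void *tmpl, size_t tmpl_len)
{
    struct rsc_table_hdr hdr;
    size_t fixed_len;

    assert(shared != NULL && tmpl != NULL);
    if (tmpl_len < sizeof(hdr))
        return false;

    memcpy(&hdr, tmpl, sizeof(hdr));
    if (hdr.num > (tmpl_len - sizeof(hdr)) / sizeof(uint32_t))
        return false;

    fixed_len = sizeof(hdr) + hdr.num * sizeof(uint32_t);
    return memcmp(shared, tmpl, fixed_len) == 0;
}

/*
 * Hypothetical helper: copy the template only if no matching table is
 * present yet. Returns true if the copy was performed.
 */
static bool rsc_table_publish_once(void *shared,
                                   const void *tmpl, size_t tmpl_len)
{
    if (rsc_table_already_published(shared, tmpl, tmpl_len))
        return false; /* keep whatever Linux has written so far */

    memcpy(shared, tmpl, tmpl_len);
    return true;
}
```

With such a check, a firmware image started by Linux would leave the
status flags alone, while the same image started by the bootloader would
still find an empty area and copy its table as before.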

So I was wondering how this problem was intended to be addressed.

Thanks
  ~Matthias

> 
> Regards,
> Peng.
> 
>>
>> Can anyone tell me how it is intended to cover both cases (other than using
>> different firmware images for the remote processor)?
>>
>> Thanks
>>    ~Matthias
>>
>> [1]
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2022-January/709047.html
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel