[PATCH v4 0/5] liveupdate: tighten handover cleanup and session lifetime

oskar at gerlicz.space oskar at gerlicz.space
Wed Mar 25 23:47:03 PDT 2026


On 2026-03-25 14:38, Pasha Tatashin wrote:
> On Tue, Mar 24, 2026 at 5:29 PM Oskar Gerlicz Kowalczuk
> <oskar at gerlicz.space> wrote:
>> 
>> Hi Pasha,
>> 
>> this v4 keeps the simpler direction from your mail: outgoing handover 
>> is
>> still driven by a boolean rebooting gate, refcount pinning of outgoing
>> sessions during serialization, and session->mutex as the serialization
>> point for in-flight mutations. There is no return to the earlier 
>> closing
>> counter or a larger session state machine.
>> 
>> The main changes in this respin are:
>> 
>> - reshape the series into five commits, each building and standing on 
>> its
>>   own
>> - keep incoming session origin immutable and use retrieved only as the
>>   checked-out bit
>> - make FINISH and implicit close consume incoming sessions without
>>   reopening races through retrieve-by-name
>> - route deserialize failures through explicit rollback paths for
>>   sessions, files, and serialized memfd state
>> - validate KHO-preserved extents before walking serialized metadata
>> - harden incoming FLB lifetime and remaining teardown paths
>> 
>> Patches 1-4 keep the core session, kexec, deserialize and validation 
>> work
>> separate. Patch 5 carries the remaining FLB and teardown fixes needed 
>> to
>> match the final tree.
>> 
>> Oskar Gerlicz Kowalczuk (5):
>>   liveupdate: block outgoing session updates during reboot
>>   kexec: abort liveupdate handover on kernel_kexec() unwind
>>   liveupdate: fail session restore on file deserialization errors
>>   liveupdate: validate handover metadata before using it
>>   liveupdate: harden FLB lifetime and remaining teardown paths
>> 
>>  include/linux/kexec_handover.h     |  13 +
>>  include/linux/liveupdate.h         |  17 +-
>>  kernel/kexec_core.c                |   4 +
>>  kernel/liveupdate/kexec_handover.c |  22 ++
>>  kernel/liveupdate/luo_core.c       |  16 +-
>>  kernel/liveupdate/luo_file.c       | 237 ++++++++++++--
>>  kernel/liveupdate/luo_flb.c        | 116 +++++--
>>  kernel/liveupdate/luo_internal.h   |  14 +-
>>  kernel/liveupdate/luo_session.c    | 500 
>> ++++++++++++++++++++++++-----
>>  lib/tests/liveupdate.c             |   2 +
>>  mm/memfd_luo.c                     | 160 +++++++--
>>  11 files changed, 934 insertions(+), 167 deletions(-)
> 
> Hi Oskar,
> 
> This is a NAK.
> 
> This patch series is still significantly over-engineering solutions to
> straightforward problems, and the patches read as if they were
> mechanically generated rather than thoughtfully designed. The sheer
> volume of changes (nearly 1,000 lines) is unnecessary for what we are
> trying to achieve.
> 
> As I pointed out previously, we need to keep this simple. For example,
> the problem of preventing a return to userspace after a successful
> liveupdate_reboot() does not require inventing a new
> liveupdate_reboot_abort() function. It just requires placing the call
> where it's the last function allowed to return, with a simple
> exception for context-preserved kexec jumps. That is a trivial change:
> 
> -       error = liveupdate_reboot();
> -       if (error)
> -               goto Unlock;
> +       if (!kexec_image->preserve_context) {
> +               error = liveupdate_reboot();
> +               if (error)
> +                       goto Unlock;
> +       }
> 
> The same principle applies to the other issues. You are doing rewrites
> when targeted fixes should be applied:
> - Sessions mutating after entering the reboot() syscall?: This is a
> 14-line fix [1].
> - Destroyed serialization race during liveupdate_reboot()? This is a
> 70-line fix [2].
> 
> I was hoping that v4 (BTW, why are we at v4 in just a week?) would
> include the patches I already provided, along with similarly scoped,
> minimal fixes for the deserialization path if needed. Instead, I am
> seeing another over-complicated rewrite.
> 
> Looking at deserialization, I am seeing, you are deleting a comment
> that explicitly explains our error-handling design:
> -       /*
> -        * Note on error handling:
> -        *
> -        * If deserialization fails (e.g., allocation failure or 
> corrupt data),
> -        * we intentionally skip cleanup of files that were already 
> restored.
> -        *
> -        * A partial failure leaves the preserved state inconsistent.
> -        * Implementing a safe "undo" to unwind complex dependencies 
> (sessions,
> -        * files, hardware state) is error-prone and provides little 
> value, as
> -        * the system is effectively in a broken state.
> -        *
> -        * We treat these resources as leaked. The expected recovery 
> path is for
> -        * userspace to detect the failure and trigger a reboot, which 
> will
> -        * reliably reset devices and reclaim memory.
> -        */
> 
> You are adding a bunch of new functions to solve something we
> intentionally chose not to solve. Attempting to cleanly unwind and
> release incorrectly deserialized resources back to userspace is
> dangerous. It risks security violations and, in the worst-case
> scenario, customer data leaks. A failed deserialization means the
> system is broken; we leak the resources (make them unavailable) and
> rely on a reboot to reliably sanitize the state.
> 
> Please drop the over-engineered approaches. Use the patches provided,
> keep the fixes minimal, and do not alter established security
> boundaries like the deserialization error paths. Take your time to
> manually self-review your work, do not rely on AI to do everything for
> you.
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/tatashin/linux.git/commit/?h=luo-reboot-sync/rfc/1&id=d47b76707c4f352c3f70384d3d6a818e2875439a
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/tatashin/linux.git/commit/?h=luo-reboot-sync/rfc/1&id=f27a6a9364cd0a19067734eeb24ea4d290b72139
> 
> Pasha
> 
>> 
>> --
>> 2.53.0
>> 
Hi Pasha, thank you for the detailed feedback and for sharing your 
patches. I understand that my previous version was overcomplicated. I 
will rework the series to follow your approach and keep the changes 
minimal. I plan to send an updated version later today.

Thanks,
Oskar Gerlicz-Kowalczuk



More information about the kexec mailing list