[RFC] IMA Log Snapshotting Design Proposal

Thu Aug 10 07:12:42 PDT 2023

On 8/9/23 21:15, Tushar Sugandhi wrote:
> Thanks a lot Stefan for looking into this proposal,
> and providing your feedback. We really appreciate it.
> 
> On 8/7/23 15:49, Stefan Berger wrote:
>>
>>
>> On 8/1/23 17:21, James Bottomley wrote:
>>> On Tue, 2023-08-01 at 12:12 -0700, Sush Shringarputale wrote:
>>> [...]
>>>> Truncating IMA log to reclaim memory is not feasible, since it makes
>>>> the log go out of sync with the TPM PCR quote making remote
>>>> attestation fail.
>>>
>>> This assumption isn't entirely true.  It's perfectly possible to shard
>>> an IMA log using two TPM2_Quote's for the beginning and end PCR values
>>> to validate the shard.  The IMA log could be truncated in the same way
>>> (replace the removed part of the log with a TPM2_Quote and AK, so the
>>> log still validates from the beginning quote to the end).
>>>
>>> If you use a TPM2_Quote mechanism to save the log, all you need to do
>>> is have the kernel generate the quote with an internal AK.  You can
>>> keep a record of the quote and the AK at the beginning of the truncated
>>> kernel log.  If the truncated entries are saved in a file shard it
>>
>> The truncation seems dangerous to me. Maybe not all the scenarios with an attestation
>> client (client = reading logs and quoting) are possible then anymore, such as starting
>> an attestation client only after truncation but a verifier must have witnessed the
>> system's PCRs and log state before the truncation occurred.
> You are correct that truncation on its own is dangerous. It needs to be
> accompanied by (a) saving the IMA log data to disk as snapshots, (b) adding the
> necessary TPM PCR quotes to the current IMA log (as James mentioned above),
> (c) attestation clients having an ability to send the past snapshots to the
> remote-attestation-service (verifiers), (d) and verifiers having an ability
> to use the snapshots along with current IMA logs for the purpose of attestation.
> All these points are explained in the original RFC email in sections B.1 through B.5 [1].

I read it.

Maybe you have dismissed the PCR update counter already...
I am not sure what the PCR update counter is supposed to help with. It won't allow you to detect
missing log events. Instead it will confuse anyone looking at it when, for example, my application
extends PCR 12, which also affects the update counter. It's a global counter that increases with
every PCR extension (except PCR 16, 21, 22, 23) and, if used as proposed, would prevent any
application from extending PCRs.

https://github.com/stefanberger/libtpms/blob/master/src/tpm2/PCR.c#L667
https://github.com/stefanberger/libtpms/blob/master/src/tpm2/PCR.c#L629
https://github.com/stefanberger/libtpms/blob/master/src/tpm2/PCR.c#L161
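
To illustrate the concern, here is a toy model in Python (not libtpms code; the class and
names are made up for this sketch, and the exception list is simply the one I gave above).
It only shows that one global counter cannot tell IMA's extensions apart from anyone else's:

```python
import hashlib

class ToyTPM:
    """Hypothetical toy model of a PCR bank with one global update counter."""

    # PCRs whose extension does not bump the counter (list taken from the text above).
    NO_INCREMENT = {16, 21, 22, 23}

    def __init__(self):
        self.pcrs = {i: bytes(32) for i in range(24)}
        self.update_counter = 0

    def extend(self, index, digest):
        # TPM-style extend: new value = H(old value || digest)
        self.pcrs[index] = hashlib.sha256(self.pcrs[index] + digest).digest()
        if index not in self.NO_INCREMENT:
            self.update_counter += 1


tpm = ToyTPM()

# IMA measures three files into PCR 10.
ima_events = [hashlib.sha256(name).digest() for name in (b"a", b"b", b"c")]
for d in ima_events:
    tpm.extend(10, d)

# An unrelated application extends PCR 12 twice, and PCR 16 once.
tpm.extend(12, hashlib.sha256(b"app-1").digest())
tpm.extend(12, hashlib.sha256(b"app-2").digest())
tpm.extend(16, hashlib.sha256(b"debug").digest())

# The counter no longer equals the number of IMA log entries.
counter_matches_ima_log = (tpm.update_counter == len(ima_events))
```

In this model a verifier comparing the counter against the number of IMA log entries would
see a mismatch even though no IMA event is missing.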

The shards will need to be written into some sort of standard location, or a config file needs to
be defined, so that everyone knows where to find them and how they are named.

>>
>> I think an ima-buf (or similar) log entry in IMA log would have to appear at the beginning of the
>> truncated log stating the value of all PCRs that IMA touched (typically only PCR 10
>> but it can be others). This needs to be done since the quote itself doesn't
>> provide the state of the individual PCRs. This would at least allow an attestation
>> client to re-read the log from the beginning (when it is restarted or started for the
>> first time after the truncation).
>   Agreed. See the description of snapshot_aggregate in Section B.5 in the
> original RFC email [1].
>> However, this alone (without the
>> internal AK quoting the old state) could lead to abuse where I could create totally
>> fake IMA logs stating the state of the PCRs at the beginning (so the verifier
>> syncs its internal PCR state to this state). 
> Yes, the PCR quotes sent to the verifier must be signed by the AK that
> is trusted by the verifier. That assumption is true regardless of IMA log
> snapshotting feature.
>> Further, even with the AK-quote that
>> you propose I may be able to create fake logs and trick a verifier into
>> trusting the machine IFF it doesn't know what kernel this system was booted with
>> that I may have hacked to provide a fake AK-quote that just happens to match the
>> PCR state presented at the beginning of the log.
>>
> If the Kernel is compromised, then all-bets are off.
> (Regardless of IMA log snapshotting feature.)
>> => Can a truncated log be made safe for attestation when the attestation starts
>> only after the truncation occurred?
>>
> Yes. If the “PCR quotes in the snapshot_aggregate event in IMA log”

PCR quote or 'quotes'? Why multiple?

From your proposal (though you may have changed your opinion, judging by what I see in other messages):
"- The Kernel will get the current TPM PCR values and PCR update counter [2]
    and store them as template data in a new IMA event "snapshot_aggregate"."

Afaik TPM quotes don't give you the state of the individual PCR values, therefore
I would expect to at least find the 'PCR values' of all the PCRs that IMA touched to
be in the snapshot_aggregate so I can replay all the following events on top of these
PCR values and come up with the values that were used in the "final PCR quote". This
is unless you expect the server to take an automatic snapshot of the values of the
PCRs that it computed while evaluating the log in case it ever needs to go back.
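
A minimal Python sketch of what I mean, under simplifying assumptions: a single SHA-256
PCR, made-up event digests, and the snapshot_aggregate event itself is not extended into
the PCR here. The function names are hypothetical. The quote only carries a digest over
the selected PCR values, so the individual starting value has to come from the log:

```python
import hashlib

def extend(pcr, digest):
    """TPM-style extend: new value = H(old value || digest)."""
    return hashlib.new("sha256", pcr + digest).digest()

def replay(start_pcr, event_digests):
    """Replay log entries on top of a known starting PCR value."""
    pcr = start_pcr
    for d in event_digests:
        pcr = extend(pcr, d)
    return pcr

def pcr_digest(pcr_values):
    """Simplified stand-in for the digest over selected PCRs found in a quote."""
    return hashlib.sha256(b"".join(pcr_values)).digest()

zero = bytes(32)
events = [hashlib.sha256(p).digest() for p in (b"/bin/a", b"/bin/b", b"/bin/c", b"/bin/d")]

# PCR value the TPM ends up with after all four events.
full = replay(zero, events)

# Log truncated after two events; the PCR value at that point is recorded
# (this is the value I would expect to find in snapshot_aggregate).
snapshot_value = replay(zero, events[:2])

# A verifier starting from the recorded value reaches the same final value...
resumed = replay(snapshot_value, events[2:])

# ...while one that only has the remaining events and no starting value does not.
naive = replay(zero, events[2:])

quote_ok = pcr_digest([resumed]) == pcr_digest([full])
```

Here `resumed == full` holds, while `naive` does not match, which is why the PCR values
(and not just a quote) need to be available at the start of the truncated log.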

> + "replay of rest of the events in IMA log" results in the “final PCR quotes”
> that matches with the “AK signed PCR quotes” sent by the client, then the truncated
> IMA log can be trusted. The verifier can either ‘trust’ the “PCR quotes in the
> snapshot_aggregate event in IMA log” or it can ask for the (n-1)th snapshot shard
> to check the past events.

For anything regarding determining the 'trustworthiness of a system' one would have to
be able to go back to the very beginning of the log *or* remember in what state a
system was when the latest snapshot was taken so that if a restart happens it can resume
with that assumption about state of trustworthiness and know what the values of the PCRs
were at that time so it can resume replaying the log (or the server would get these
values from the log).

The AK quotes by the kernel (which adds a 2nd AK key) that James is proposing
could be useful if the entire log, consisting of multiple shards, is very large and
cannot be transferred from the client to the server in one go so that the server could
evaluate the 'final PCR quote' immediately. However, if a client can indicate 'I will
send more the next time and I have this much more to transfer' and the server allows
this multiple times (until all the 1MB shards of the 20MB log are transferred) then that
kernel AK key would not be necessary since presumably the "final PCR quote", created
by a user space client, would resolve whether the entire log is trustworthy.
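
As a rough Python sketch of that multi-round transfer (everything here is hypothetical:
the `VerifierSession` class, the `remaining` field, and comparing against a bare PCR value
instead of verifying a signed quote), the server would simply keep extending its running
PCR value across rounds and only decide once the client reports nothing is left:

```python
import hashlib
from dataclasses import dataclass

def extend(pcr, digest):
    # TPM-style extend: new value = H(old value || digest)
    return hashlib.sha256(pcr + digest).digest()

@dataclass
class VerifierSession:
    """Hypothetical per-client state kept by the server between rounds."""
    pcr: bytes = bytes(32)
    complete: bool = False

    def receive(self, shard_digests, remaining):
        # Replay this shard on top of whatever was replayed in earlier rounds.
        for d in shard_digests:
            self.pcr = extend(self.pcr, d)
        # 'remaining' is the client's claim of how many shards are still to come.
        self.complete = (remaining == 0)

    def matches(self, quoted_pcr):
        # Only decide once the whole log has been transferred.
        return self.complete and self.pcr == quoted_pcr


# Client side: a log of six made-up events split into three shards.
log = [hashlib.sha256(bytes([i])).digest() for i in range(6)]
shards = [log[0:2], log[2:4], log[4:6]]

# Value the final user space quote would attest to (simplified to the raw PCR value).
quoted = bytes(32)
for d in log:
    quoted = extend(quoted, d)

session = VerifierSession()
session.receive(shards[0], remaining=2)
early_verdict = session.matches(quoted)   # not complete yet
session.receive(shards[1], remaining=1)
session.receive(shards[2], remaining=0)
final_verdict = session.matches(quoted)
```

No kernel-side AK is involved in this flow; the only signature the server would check is
the one on the final quote.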

> 
>> => Even if attestation was occurring 'what' state does an attestation server
>> need to carry around for an attested-to system so that the truncation is 'safe'
>> and I cannot create fake AK-quotes and fake IMA logs with initial PCR states?
> Assuming most of the client devices take a snapshot at specific checkpoints,
> the “PCR quotes in the snapshot_aggregate event in IMA log” will be the same for them.
> The remote attestation server will have to remember these golden PCR quotes.

I thought maybe 'golden PCR values'... because those let me replay PCR extensions from
a previous point.

> It doesn't have to remember the state of each client device.

Can you give a reason for this? You mean the state doesn't need to be remembered for client
devices whose log hasn't been truncated?

>> Can I ever restart the client and have it read the truncated log from the
>> beginning and what type of verification needs to happen on the server then?
>>
> Yes, restarting the client should be possible.

Yes, this must be possible.

>> It seems like the server would have to remember the state of the IMA PCRs upon
>> last truncation to detect a possible attack. This would make starting to monitor
>> a system after truncation impossible -- would be good to know these details.
>>
> The server is not forced to remember the state of IMA PCRs. It can
> always ask for the last n snapshot files (shards) and replay the events. Even
> though the data is truncated from the IMA log, it is not totally lost. It is
> simply being transferred to the disk. It is saved by UM as snapshot files/shards.
> The goal of IMA snapshotting is to reduce the Kernel memory pressure on the
> client devices - to save them from out-of-memory errors which are harder to manage
> on long running clients. It comes with a cost of additional work on the server
> side to attest those clients.

Agreed.
> 
> 
> That being said, in the current proposal, taking a snapshot is totally optional
> and controlled by UM attestation clients. If the attestation-clients/services are
> not-ready/don’t-want to take advantage of IMA log snapshotting, they don’t have to.

Agreed.

> 
> No snapshot will be taken, and the client-service can process the monolithic IMA
> log just like they do today.
> 

Agreed.

> [1] https://lore.kernel.org/all/c5737141-7827-1c83-ab38-0119dcfea485@linux.microsoft.com/#t
> 
>>
>>
>>
>>> should have a beginning and end quote and a record of the AK used.
>>> Since verifiers like Keylime are already using this beginning and end
>>> quote for sharded logs, it's the most natural format to feed to
>>> something externally for verification and it means you don't have to
>>> invent a new format to do the same thing.
>>>
>>> Regards,
>>>
>>> James
>>>
>>>
>>> _______________________________________________
>>> kexec mailing list
>>> kexec at lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/kexec
>