[RFC PATCH v2] ptp: Add vDSO-style vmclock support

Peter Hilber peter.hilber at opensynergy.com
Fri Jun 28 09:38:15 PDT 2024


On 28.06.24 14:15, David Woodhouse wrote:
> On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote:
>> On 27.06.24 16:52, David Woodhouse wrote:
>>> I already added a flags field, so this might look something like:
>>>
>>>         /*
>>>          * Smearing flags. The UTC clock exposed through this structure
>>>          * is only ever true UTC, but a guest operating system may
>>>          * choose to offer a monotonic smeared clock to its users. This
>>>          * merely offers a hint about what kind of smearing to perform,
>>>          * for consistency with systems in the nearby environment.
>>>          */
>>> #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt */
>>>
>>> (UTC-SLS is probably a bad example but are there formal definitions for
>>> anything else?)
>>
>> I think it could also be more generic, like flags for linear smearing,
>> cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative
>> to the leap second start). That could also represent UTC-SLS, and
>> noon-to-noon, and it would be well-defined.
>>
>> This should reduce the likelihood that the guest doesn't know the smearing
>> variant.
> 
> I'm wary of making it too generic. That would seem to encourage a
> *proliferation* of false "UTC-like" clocks.
> 
> It's bad enough that we do smearing at all, let alone that we don't
> have a single definition of how to do it.
> 
> I made the smearing hint a full uint8_t instead of using bits in flags,
> in the end. That gives us a full 255 ways of lying to users about what
> the time is, so we're unlikely to run out. And it's easy enough to add
> a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods
> that get invented.
> 
> 

My concern is that the registry update may come after a driver has already
been implemented, so that it may be hard to ensure that the smearing which
has been chosen is actually implemented.

>>>>> +       /*
>>>>> +        * This field changes to another non-repeating value when the CPU
>>>>> +        * counter is disrupted, for example on live migration.
>>>>> +        */
>>>>> +       uint64_t disruption_marker;
>>>>
>>>> The field could also change when the clock is stepped (leap seconds
>>>> excepted), or when the clock frequency is slewed.
>>>
>>> I'm not sure. The concept of the disruption marker is that it tells the
>>> guest to throw away any calibration of the counter that the guest has
>>> done for *itself* (with NTP, other PTP devices, etc.).
>>>
>>> One mode for this device would be not to populate the clock fields at
>>> all, but *only* to signal disruption when it occurs. So the guest can
>>> abort transactions until it's resynced its clocks (to avoid incurring
>>> fines if breaking databases, etc.).
>>>
>>> Exposing the host timekeeping through the structure means that the
>>> migrated guest can keep working because it can trust the timekeeping
>>> performed by the (new) host and exposed to it.
>>>
>>> If the counter is actually varying in frequency over time, and the host
>>> is slewing the clock frequency that it reports, that *isn't* a step
>>> change and doesn't mean that the guest should throw away any
>>> calibration that it's been doing for itself. One hopes that the guest
>>> would have detected the *same* frequency change, and be adapting for
>>> itself. So I don't think that should indicate a disruption.
>>>
>>> I think the same is even true if the clock is stepped by the host. The
>>> actual *counter* hasn't changed, so the guest is better off ignoring
>>> the vacillating host and continuing to derive its idea of time from the
>>> hardware counter itself, as calibrated against some external NTP/PTP
>>> sources. Surely we actively *don't* to tell the guest to throw its own
>>> calibrations away, in this case?
>>
>> In case the guest is also considering other time sources, it might indeed
>> not be a good idea to mix host clock changes into the hardware counter
>> disruption marker.
>>
>> But if the vmclock is the authoritative source of time, it can still be
>> helpful to know about such changes, maybe through another marker.
> 
> Could that be the existing seq_count field?
> 
> Skewing the counter_period_frac_sec as the underlying oscillator speeds
> up and slows down is perfectly normal and expected, and we already
> expect the seq_count to change when that happens.
> 
> Maybe step changes are different, but arguably if the time advertised
> by the host steps *outside* the error bounds previously advertised,
> that's just broken?

But the error bounds could be large or missing. I am trying to address use
cases where the host steps or slews the clock as well.

> 
> Depending on how the clock information is fed, a change in seq_count
> may even result in non-monotonicity. If the underlying oscillator has
> sped up and the structure is updated accordingly, the time calculated
> the moment *before* that update may appear later than the time
> calculated immediately after it.
> 
> It's up to the guest operating system to feed that information into its
> own timekeeping system and skew towards correctness instead of stepping
> the time it reports to its users.
> 

The guest can anyway infer from the other information that the clock
changed, so my proposal might not be that useful. Maybe it can be added in
a future version if there is any need.



More information about the linux-arm-kernel mailing list