[PATCH v18 00/18] KVM RISC-V Support

Fri May 21 18:13:47 BST 2021

On Wed, 19 May 2021 06:58:05 PDT (-0700), Greg KH wrote:
> On Wed, May 19, 2021 at 03:29:24PM +0200, Paolo Bonzini wrote:
>> On 19/05/21 14:23, Greg Kroah-Hartman wrote:
>> > > - the code could be removed if there's no progress on either changing the
>> > > RISC-V acceptance policy or ratifying the spec
>> >
>> > I really do not understand the issue here, why can this just not be
>> > merged normally?
>>
>> Because the RISC-V people only want to merge code for "frozen" or "ratified"
>> processor extensions, and the RISC-V foundation is dragging their feet in
>> ratifying the hypervisor extension.
>>
>> It's totally a self-inflicted pain on part of the RISC-V maintainers; see
>> Documentation/riscv/patch-acceptance.rst:
>>
>>   We'll only accept patches for new modules or extensions if the
>>   specifications for those modules or extensions are listed as being
>>   "Frozen" or "Ratified" by the RISC-V Foundation.  (Developers may, of
>>   course, maintain their own Linux kernel trees that contain code for
>>   any draft extensions that they wish.)
>>
>> (Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/riscv/patch-acceptance.rst)
>
> Lovely, and how is that going to work for code that lives outside of the
> riscv "arch" layer?  Like all drivers?
>
> And what exactly is "not ratified" that these patches take advantage of?
> If there is hardware out there with these features, well, Linux needs to
> run on it, so we need to support that.  No external committee rules
> should be relevant here.
>
> Now if this is for hardware that is not "real", then that's a different
> story.  In that case, who cares, no one can use it, so why not take it?
>
> So what exactly is this trying to "protect" Linux from?
>
>> > All staging drivers need a TODO list that shows what needs to be done in
>> > order to get it out of staging.  All I can tell so far is that the riscv
>> > maintainers do not want to take this for "unknown reasons" so let's dump
>> > it over here for now where we don't have to see it.
>> >
>> > And that's not good for developers or users, so perhaps the riscv rules
>> > are not very good?
>>
>> I agree wholeheartedly.
>>
>> I have heard contrasting opinions on conflict of interest where the
>> employers of the maintainers benefit from slowing down the integration of
>> code in Linus's tree.  I find these allegations believable, but even if that
>> weren't the case, the policy is (to put it kindly) showing its limits.
>
> Slowing down code merges is horrible, again, if there's hardware out
> there, and someone sends code to support it, and wants to maintain it,
> then we should not be rejecting it.
>
> Otherwise we are not doing our job as an operating system kernel, our
> role is to make hardware work.  We don't get to just ignore code because
> we don't like the hardware (oh if only we could!), if a user wants to
> use it, our role is to handle that.
>
>> > > Of course there should have been a TODO file explaining the situation. But
>> > > if you think this is not the right place, I totally understand; if my
>> > > opinion had any weight in this, I would just place it in arch/riscv/kvm.
>> > >
>> > > The RISC-V acceptance policy as is just doesn't work, and the fact that
>> > > people are trying to work around it is proving it.  There are many ways to
>> > > improve it:
>> >
>> > What is this magical acceptance policy that is preventing working code
>> > from being merged?  And why is it suddenly the rest of the kernel
>> > developer's problems because of this?
>>
>> It is my problem because I am trying to help Anup merging some perfectly
>> good KVM code; when a new KVM port comes up, I coordinate merging the first
>> arch/*/kvm bits with the arch/ maintainers and from that point on that
>> directory becomes "mine" (or my submaintainers').
>
> Agreed, but the riscv maintainers should not be forcing this "problem"
> onto all of us, like it seems is starting to happen :(
>
> Ok, so, Paul, Palmer, and Albert, what do we do here?  Why can't we take
> working code like this into the kernel if someone is willing to support
> and maintain it over time?

I don't view this code as being in a state where it can be maintained, 
at least to the standards we generally set within the kernel.  The ISA 
extension in question is still subject to change, it says so right at 
the top of the H extension 
<https://github.com/riscv/riscv-isa-manual/blob/master/src/hypervisor.tex#L4>

    {\bf Warning! This draft specification may change before being
    accepted as standard by the RISC-V Foundation.}

That means we really can't rely on any of this to be compatible with 
what is eventually ratified and (hopefully, because this is really 
important stuff) widely implemented in hardware.  We've already had 
isuses with other specifications where drafts were propossed as being 
ready for implemnetation, software was ported, and the future drafts 
were later incompatible -- we had this years ago with the debug support, 
which was a huge headache to deal with, and we're running into it again 
with these v-0.7.1 chips coming out.  I don't want to get stuck in a 
spot where we're forced to either deal with some old draft extension 
forever or end up breaking users.

Ultimately the whole RISC-V thing is only going to work out if we can 
get to the point where vendors can agree on a shared ISA.  I understand 
that there's been a lot of frustration WRT the timelines on the H 
extension, it's been frustrating for me as well.  There are clearly 
issues with how the ISA development process is being run and while those 
are coming to a head in other areas (the V extension and non-coherent 
devices, for example) I really don't think that's the case here because 
as far as I know we don't actually have any real hardware that 
implements the H extension.

All I really care about is getting to the point where we have real 
RISC-V systems running software that's as close to upstream as is 
reasonable.  As it currently stands, I don't know of anything this is 
blocking: there's some RTL implementation floating around, but that's a 
very long way from being real hardware.  Something of this complexity 
isn't suitable for a soft core, and RTL alone doesn't fix the 
fundamental problem of having a stable platform to run on (it needs a 
complex FPGA environment, and even then it's very limited in 
functionality).  I'm not sure where exactly the line for real hardware 
is, but for something like this it would at least involve some chip that 
is widely availiable and needs the H extension to be useful.  Such a 
system existing without a ratified extension would obviously be a major 
failing on the specification side, and while I think that's happening 
now for some systems (some of these V-0.7.1 chips, and the non-coherent 
systems) I just don't see that as the case for the H extension.  We've 
got to get to the point where the ISA extensions can be ratified in a 
timely fashion, but circumventing that process by merging code early 
doesn't fix the problem.  This really needs to be fixed at the RISC-V 
foundation, not papered over in software.

We have lots of real RISC-V hardware right now that's going to require a 
huge amount of work to support, trying to chase around a draft extension 
that may not even end up in hardware is just going to make headaches we 
don't have the time for.