[Xen-devel] [PATCHv10 0/9] Xen: extend kexec hypercall for use with pv-ops kernels
Daniel Kiper
daniel.kiper at oracle.com
Fri Nov 8 11:28:02 EST 2013
On Fri, Nov 08, 2013 at 10:42:51AM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 08, 2013 at 07:15:00AM -0800, Daniel Kiper wrote:
> > On Fri, Nov 08, 2013 at 02:01:28PM +0000, Andrew Cooper wrote:
> > > On 08/11/13 13:19, Jan Beulich wrote:
> > > >>>> On 08.11.13 at 14:13, David Vrabel <david.vrabel at citrix.com> wrote:
> > > >> Keir,
> > > >>
> > > >> Sorry, forgot to CC you on this series.
> > > >>
> > > >> Can we have your opinion on whether this kexec series can be merged?
> > > >> And if not, what further work and/or testing is required?
> > > > Just to clarify - unless I missed something, there was still no
> > > > review of this from Daniel or someone else known to be
> > > > familiar with the subject. If Keir gave his ack, formally this
> > > > could go in, but I wouldn't feel too well with that (the more
> > > > that apart from not having reviewed it, Daniel seems to also
> > > > continue to have problems with it).
> > > >
> > > > Jan
> > >
> > > Can I have myself deemed to be familiar with the subject as far as this
> > > is concerned?
> > >
> > > A noticeable quantity of my contributions to Xen have been in the kexec
> > > / crash areas, and I am the author of the xen-crashdump-analyser.
> > >
> > > I do realise that I am certainly not impartial as far as this series is
> > > concerned, being a co-developer.
> > >
> > > David's statement of "the current implementation is so broken[1] and
> > > useless[2] that..." is completely accurate. It is frankly a miracle
> > > that the current code ever worked at all (and from XenServer's point of
> > > view, failed far more often than it worked).
> > >
> > >
> > > For reference, XenServer 6.2 shipped with approximately v7 of this
> > > series, and an appropriate kexec-tools and xen-crashdump-analyser.
> > > Since we put the code in, we have not had a single failure-to-kexec in
> > > automated testing (both specific crash tests, and from unexpected host
> > > crashes), whereas we were seeing reliable failures to crash on most of
> > > our test infrastructure.
> > >
> > > In stark contrast to previous versions of XenServer, we have not had a
> > > single customer reported host crash where the kexec path has failed.
> > > There was one systematic failure where the HPSA driver was unhappy with
> > > the state of the hardware, resulting in no root filesystem to write logs
> > > to, and a repeated panic and Xen deadlock in the queued invalidation
> > > codepath.
> >
> > Andrew, if it runs on all your hardware it does not mean that it runs
> > everywhere. I have discovered the problem (I hope the last one) and it
> > should be taken into consideration. Another question is what is the
> > source of this problem. Maybe QEMU but it should be checked and not
> > ignored.
>
> I think the question is that the feature freeze is the 18th - and whether
> this single bug should halt the integration of this whole patchset.
>
> Or that it is OK to put in the patchset in and deal with the bugs
> and not stall this initial patchset.
I have never stated that I would like to block this patch series
indefinitely due to this one bug (I am still not sure that this
is a bug; currently, I feel that I am the only person who is trying
to verify that). We have more than one week and I think that we
are able to discover what is going on. If not, I think that we
can work out a reasonable solution for this issue (as we did in other
cases). Last but not least, I would like to underline that I too would
like this patch series to be included in Xen 4.4. However,
it must be done in a sensible way.
Daniel