From: Herbert Poetzl (herbert_at_13thfloor.at)
Date: Wed 05 Feb 2003 - 19:53:50 GMT
On Wed, Feb 05, 2003 at 01:26:47PM -0600, John Goerzen wrote:
> Paul Sladen <vserver_at_paul.sladen.org> writes:
>
> > Sorry I didn't express a `bite', I [still] haven't had time to look through
> > the call-path.
>
> That's OK; knowing that somebody knowledgeable is going to look at it
> at least is comforting.
>
> > If you can regularly reproduce these, then dumping them (in full) via
> > serial/netconsole/lights-out would be useful.
>
> This is the first time it's happened. I was so flustered at seeing
> our server go down that I didn't take the time to ready myself for the
> next one. Sigh. Anyway, next time it goes down, I'll transcribe the
> full trace using my laptop, and at the same time drop in an extra
> serial card (our serial ports are full) so we can do the serial
> console thing.
>
beware, do not draw wrong conclusions from insufficient data ...
although I tend to agree that your assumptions point in
the right direction, I would be careful ...
> Some data points:
>
> 1. I recently moved our vservers from a 2.4.19 uniprocessor machine to
> a 2.4.20 SMP one, with appropriate upgrades in the ctx patch. So,
> three variables there: kernel version, SMP, and ctx version.
... and hardware (ram, I/O, etc) and maybe network ...
maybe even the distribution? some upgrade/update?
> 2. I had never experienced this sort of thing on the 2.4.19 machine.
... for how long, and under what load/conditions?
have they changed too?
> 3. This machine is a brand-new Linux-supported Dell server, so
> hardware is unlikely to be an issue.
... brand new "Linux-supported" big company server
never caused any problems (like compaq/hp/ibm ???),
please wake up ...
don't get me wrong, but I have two SMP 2.4.20-p8c13e
machines running smoothly for about 126 days ...
best,
Herbert
PS: I value your input, and I'm sure the bug will be
found, sooner or later ...
> -- John