From: Herbert Pötzl (herbert_at_13thfloor.at)
Date: Wed 20 Aug 2003 - 00:26:10 BST
On Tue, Aug 19, 2003 at 02:35:03PM -0700, Roderick A. Anderson wrote:
> As a user of vservers I do not get into the code and barely follow some of
> the threads. I find it difficult to mention when I have kernel crashes, as I
> can not _really_ explain what was going on when they happened.
about 90% of kernel 'crashes' need no explanation of
what the user was doing or what was going on; the
crash report itself is enough ...
> I have turned on kernel logging into /var/log/kernel but do not know what
> else I need to or should do to get better information for when I do
> have a crash. Case in point this morning or rather last night. Suddenly
> the system froze up. Would not respond to the keyboard and I had to press
freezes and lockups are actually not kernel crashes,
but if you want to get something useful in such a case
you have to make some preparations, namely
- set up nmi_watchdog (this will cause a kernel oops
  when the kernel is not responding)
- configure magic SysRq (you'll be able to trigger
  some kernel task/process/memory info dumps in such
  a case)
- use lkcd, or a serial line, to capture the kernel
  oops (handwritten oopses are as much fun as
  screenshots done with your webcam :( )
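
a rough way to check whether these preparations are in place is sketched below. it is only an illustration, not an official tool: it assumes the usual boot parameters (nmi_watchdog=N, console=ttyS...) and the /proc/sys/kernel/sysrq switch, the function names are made up for this example, and lkcd is not covered at all.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict of option -> list of values."""
    opts = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # options may repeat (e.g. several console= entries), so keep a list
        opts.setdefault(key, []).append(value if sep else None)
    return opts


def check_preparations(cmdline, sysrq_value):
    """Report which of the suggested preparations appear to be configured."""
    opts = parse_cmdline(cmdline)
    nmi = opts.get("nmi_watchdog", [])
    consoles = opts.get("console", [])
    return {
        # any non-zero nmi_watchdog= value requests the watchdog
        "nmi_watchdog": any(v not in (None, "", "0") for v in nmi),
        # a non-zero value in /proc/sys/kernel/sysrq means SysRq is enabled
        "magic_sysrq": sysrq_value.strip() not in ("", "0"),
        # a console on ttyS* sends kernel messages to a serial line
        "serial_console": any(v and v.startswith("ttyS") for v in consoles),
    }


def read_text(path):
    """Return the file contents, or an empty string if it cannot be read."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    report = check_preparations(read_text("/proc/cmdline"),
                                read_text("/proc/sys/kernel/sysrq"))
    for name, ok in sorted(report.items()):
        print("%-15s %s" % (name, "ok" if ok else "missing"))
```

a missing /proc/sys/kernel/sysrq file is reported as "missing" here, which usually means the kernel was built without magic SysRq support. the check only shows what was requested at boot; whether the NMI watchdog actually fires is better judged from the NMI count in /proc/interrupts.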
> the reset/power switch to get it to come back up. The last messages in
> /var/log/messages before the hang/crash were of the form
>
> kernel: smb_retry: successful ...
>
> then the reboot messages. Nothing identifiable as a cause.
>
> I'm running 2.4.21ctx-17a because I thought the problem was with the
> eepro100 NIC driver and a thread on this list indicated the e100/e1000
> would be a better choice. Hardware seems solid and the crashes are _too_
huh? what does e100/e1000 have to do with 2.4.21?
as far as I remember those were in 2.4.14 and for
sure are in 2.4.22-rc2 ...
> random for my liking to make me think it is a hardware issue. I did run
> memtest86 on the system before I put it online.
for how long?
> Where I am trying to take this is: what information is needed to help
> determine the cause of the crashes so I (we) can point a finger in the
> right direction - hardware, software, wetware, or the ctx kernel and
> friends. Not to point fingers as much as to lend a hand to the
> developers.
first step for any further investigation will be
some kind of kernel oops, parsed by ksymoops with
the correct kernel System.map ...
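
to illustrate why the System.map has to match the running kernel, here is a small sketch of the address-to-symbol lookup that an oops decoder depends on. this is not ksymoops itself, just the basic idea; the symbol names and addresses in the usage example are invented.

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not a plain System.map entry
        addr, _type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoffset' for the closest symbol at or below address."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    base, name = symbols[i]
    return "%s+0x%x" % (name, address - base)


# hypothetical maps from two different kernel builds
good_map = load_system_map(["c0100000 T _stext",
                            "c0105000 T do_foo",
                            "c0106000 T do_bar"])
stale_map = load_system_map(["c0100000 T _stext",
                             "c0105800 T do_foo",
                             "c0106800 T do_bar"])

eip = 0xc0106010
print(resolve(good_map, eip))   # do_bar+0x10
print(resolve(stale_map, eip))  # do_foo+0x810
```

the same address resolves to a different function once the map comes from another build, so an oops decoded against the wrong System.map points at the wrong code.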
further useful information (after a captured oops)
will be a detailed system description, and some
hardware tests ...
HTH,
Herbert
> Cheers,
> Rod
> --
> "Open Source Software - Sometimes you get more than you paid for..."