From: Herbert Poetzl (herbert_at_13thfloor.at)
Date: Sat 08 Feb 2003 - 13:07:10 GMT
On Sat, Feb 08, 2003 at 01:08:44AM -0500, Jacques Gelinas wrote:
> I am trying to figure out that bug. I have witnessed the bug twice on
> a server, although I am using ctx-15 and ctx-16 on many servers.
>
> The big difference between ctx-15 and the previous versions is the way
> the struct iproot_info is used. In previous kernels, only struct task
> was referencing struct iproot pointers. A reference count was maintained
> when a new process was created and when a process was ending. Easy.
>
> In ctx-15, sockets also reference those pointers, so they have to handle
> the reference count. The big issue when debugging ctx-15 was to
> realise that sockets (struct sock) were copied to other structs such
> as tcp_tw_bucket and that some common routines were used
> to handle both struct sock and struct tcp_tw_bucket. Anyway, the
> reference count stuff, while trivial (one line of code here and there), took
> some time to get right (one line of code here and there, but where :-) ).
>
> Now, I realise that ipv6 is sharing much of the code of ipv4. It does
> not share the socket initialisation code, but it uses the same
> cleanup function: inet_sock_destruct. This function does the reference
> count stuff on iproot_info using an uninitialised pointer. Oops.
>
> So I did the following in net/ipv6/af_inet6.c
>
> *** af_inet6.bak 2001-10-31 15:32:46.000000000 -0500
> --- af_inet6.c 2003-02-08 00:50:28.000000000 -0500
> ***************
> *** 183,188 ****
> --- 183,191 ----
>       sk->protinfo.af_inet.mc_index = 0;
>       sk->protinfo.af_inet.mc_list = NULL;
>
> +     sk->s_context = current->s_context;
> +     sk->ip_info = NULL;
> +
>       if (ipv4_config.no_pmtu_disc)
>               sk->protinfo.af_inet.pmtudisc = IP_PMTUDISC_DONT;
>       else
> ----------------------------------------------------------------------
>
> Now, I suspect this is not all. But just to make sure:
>
> Are there some people out there having crashes with ctx-15 or 16,
> not using ipv6 at all?
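
The failure mode described in the quoted text (a shared destructor dropping a reference through a pointer that one of the init paths never set) can be sketched in plain user-space C. This is only an illustration under stated assumptions: struct ip_info_model, struct sock_model and every function name below are hypothetical stand-ins, not the ctx-15 code, and the sketch assumes the destructor treats a NULL pointer as "nothing attached", which is what the patch above appears to rely on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct iproot_info: only the counter. */
struct ip_info_model {
    int refcount;
};

/* Hypothetical stand-in for struct sock: only the field of interest. */
struct sock_model {
    struct ip_info_model *ip_info;
};

/* Take a reference; a NULL pointer means "no ip_info attached". */
static struct ip_info_model *ip_info_get(struct ip_info_model *info)
{
    if (info != NULL)
        info->refcount++;
    return info;
}

/* Drop a reference. Returns 1 if the object was freed, 0 otherwise. */
static int ip_info_put(struct ip_info_model *info)
{
    if (info == NULL)
        return 0;                   /* safe no-op for NULL */
    assert(info->refcount > 0);     /* catches an unbalanced put */
    if (--info->refcount == 0) {
        free(info);
        return 1;
    }
    return 0;
}

/* IPv4-style init: the socket takes a reference on the creator's info. */
static void sock_init_v4(struct sock_model *sk, struct ip_info_model *creator)
{
    sk->ip_info = ip_info_get(creator);
}

/* IPv6-style init after the patch: the pointer is explicitly NULL.
 * Without this line the field would hold garbage and the shared
 * destructor below would decrement a counter at a random address. */
static void sock_init_v6(struct sock_model *sk)
{
    sk->ip_info = NULL;
}

/* Shared destructor, playing the role inet_sock_destruct plays above. */
static int sock_destruct(struct sock_model *sk)
{
    int freed = ip_info_put(sk->ip_info);
    sk->ip_info = NULL;
    return freed;
}
```

With this model, destroying a socket set up by sock_init_v6() leaves every counter untouched, while a socket set up by sock_init_v4() gives back exactly the one reference it took.
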
Jacques,

the panic usually happens at sched.c line 570, and this is
a "bug" panic, have a look ...

asmlinkage void schedule(void)
{
        ...
        if (unlikely(in_interrupt())) {
                printk("Scheduling in interrupt\n");
570             BUG();
        }

.. so my educated guess would be that there are situations
where a schedule request happens while the system is
handling an interrupt ...

the cause for this might be some badly initialized
pointer or struct, but this does not seem very likely,
so I would search for a race condition under heavy
interrupt load ...
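
To make the check in schedule() concrete, here is a rough user-space model of it. It is a sketch only: irq_depth, bug_hits and all the *_model functions are made-up names for illustration, the real in_interrupt() is a kernel macro and not this counter, and the model returns an error code where the kernel would call BUG().

```c
#include <assert.h>

/* Hypothetical model state: how many interrupt handlers are active,
 * and how often the guard in schedule_model() has fired. */
static int irq_depth = 0;
static int bug_hits = 0;

static int in_interrupt_model(void)
{
    return irq_depth > 0;
}

static void irq_enter_model(void)
{
    irq_depth++;
}

static void irq_exit_model(void)
{
    assert(irq_depth > 0);
    irq_depth--;
}

/* Returns 0 when scheduling is allowed, -1 where the kernel would
 * print "Scheduling in interrupt" and hit BUG(). */
static int schedule_model(void)
{
    if (in_interrupt_model()) {
        bug_hits++;
        return -1;
    }
    return 0;
}

/* A handler that wrongly ends up in the scheduler, for example by
 * calling something that may sleep. */
static int bad_handler_model(void)
{
    int rc;

    irq_enter_model();
    rc = schedule_model();   /* not allowed in interrupt context */
    irq_exit_model();
    return rc;
}
```

In this model the guard never fires for calls made from ordinary process context; it fires only when some path reaches the scheduler between irq_enter_model() and irq_exit_model(), which is the kind of path worth hunting for under heavy interrupt load.
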
best,
Herbert
> Those using ipv6, can you try this patch ?
>
> (for sure it does not make the kernel vserver/ipv6 aware).
>
> ---------------------------------------------------------
> Jacques Gelinas <jack_at_solucorp.qc.ca>
> vserver: run general purpose virtual servers on one box, full speed!
> http://www.solucorp.qc.ca/miscprj/s_context.hc