Re: [vserver] Again: [vserver] Linux vServer: general protection fault with apache2 and kernel 2.6.38.6

From: Herbert Poetzl <herbert_at_13thfloor.at>
Date: Sat 06 Aug 2011 - 13:01:48 BST
Message-ID: <20110806120148.GB12671@MAIL.13thfloor.at>

On Sat, Aug 06, 2011 at 01:07:47PM +0200, Herbert Poetzl wrote:
> On Fri, Aug 05, 2011 at 05:03:33PM +0200, Urban Loesch wrote:
>> Hi Herbert,

>>>> I'd suggest to update to 2.6.38.8-vs2.3.0.37-rc17 and
>>>> see if the issue remains ...

>>> thanks for the response, I installed 2.6.38.8-vs2.3.0.37-rc17.

>>> # uname -r
>>> 2.6.38.8-vs2.3.0.37-rc17-rol-em64t

>> The problem does not seem to be solved.
>> Today the same thing happened with 2.6.38.8-vs2.3.0.37-rc17
>> after about 16 days of uptime without problems.

> hmm, I've googled for the interesting parts and found at
> least two other cases which look like the same or at least
> a very similar issue (without any Linux-VServer patch)

> http://ubuntuforums.org/showthread.php?t=1815522
> http://serverfault.com/questions/296213/apache-hangs-unkillable-kernel-error
> http://d.hatena.ne.jp/syuu1228/mobile?word=*%5BCeph%5D

> so it might be a mainline issue (possibly present since
> 2.6.32 or earlier) which seems to be triggered by apache
> (for whatever reason) but actually occurs within the
> scheduler code (which the Linux-VServer patch does not
> modify in kernels after 2.6.22)

>> Here comes the log:

>> [1462582.761420] general protection fault: 0000 [#1] SMP
>> [1462582.771684] last sysfs file:
>> /sys/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/scsi_host/host2/proc_name
>> [1462582.791973] CPU 5
>> [1462582.795965] Modules linked in: ufs qnx4 hfsplus hfs
>> minix ntfs vfat msdos fat jfs xfs exportfs netconsole drbd

> quite a number of filesystems you have :)

>> lru_cache sch_hfsc ip6_queue act_police cls_flow cls_fw cls_u32
>> sch_htb sch_ingress sch_sfq xt_realm iptable_raw ip6t_LOG
>> xt_connlimit ip6table_raw ipt_ULOG ipt_REJECT ipt_REDIRECT
>> ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP
>> ipt_ah xt_comment ipt_addrtype ip6t_REJECT nf_nat_tftp
>> nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre
>> nf_nat_irc xt_recent nf_nat_h323 nf_nat_ftp nf_nat_amanda
>> ip6table_mangle nf_conntrack_ipv6 nf_conntrack_sane
>> nf_conntrack_tftp nf_conntrack_proto_udplite nf_conntrack_sip
>> nf_conntrack_proto_sctp ts_kmp nf_conntrack_pptp
>> nf_conntrack_proto_gre nf_conntrack_amanda nf_conntrack_netlink
>> xt_time nf_conntrack_netbios_ns xt_TCPMSS nf_conntrack_irc
>> xt_sctp nf_conntrack_h323 xt_policy nf_conntrack_ftp xt_TPROXY
>> nf_tproxy_core nf_defrag_ipv6 xt_tcpmss xt_pkttype xt_physdev
>> xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport
>> xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper
>> xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_connmark xt_CLASSIFY
>> ipt_LOG xt_tcpudp xt_conntrack xt_state iptable_nat nf_nat
>> nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle
>> nfnetlink iptable_filter ip_tables ip6table_filter ip6_tables
>> x_tables ipmi_devintf ipmi_si ipmi_msghandler configfs tpm_tis
>> tpm tpm_bios psmouse i7core_edac edac_core ghes serio_raw
>> power_meter hed pcspkr dcdbas ses enclosure igb megaraid_sas
>> dca bnx2 [last unloaded: scsi_wait_scan]
>> [1462583.060009]
>> [1462583.063325] Pid: 4338, comm: apache2 Not tainted
>> 2.6.38.8-vs2.3.0.37-rc17-rol-em64t #1 Dell Inc. PowerEdge R610/086HF8
>> [1462583.085055] RIP: 0010:[<ffffffff8104ecaa>] [<ffffffff8104ecaa>]
>> task_rq_lock+0x4a/0xa0

I just tried to pinpoint the location based on my
2.6.38.8-vs2.3.0.37-rc17 kernel and I suspect that
task_rq(p) is causing this (for certain p), but
I was wondering why your task_rq_lock() is 0xa0
bytes in size, where mine is just 0x65 bytes ...

especially as the task_rq_lock function is quite
compact ...

could you upload the output of the following commands
for me (executed in the build directory of your
kernel or with the vmlinux object file)

# objdump -t vmlinux | grep task_rq_lock
# objdump -d vmlinux --start-address=0x`objdump -t vmlinux | sed -n '/task_rq_lock/ {s/ .*//; p}'` | sed '/task>:/ Q'

thanks in advance,
Herbert
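
The size comparison above (0xa0 in the oops versus 0x65 in a local build) can be made directly against the symbol table. What follows is only a minimal sketch, not something posted in this thread: sym_size is a hypothetical helper, and it assumes the usual ELF layout of `objdump -t` output (address first, size second to last, symbol name last). Unlike a plain grep, it matches the symbol name exactly, so a similarly named symbol such as __task_rq_lock is not picked up.

```shell
# Print the size (hex) of one exactly named symbol.
# Input: `objdump -t` output on stdin. Argument: the symbol name.
# Assumption: ELF-style columns, i.e. size is the second-to-last field
# and the symbol name is the last field.
sym_size() {
    awk -v sym="$1" '
        NF >= 4 && $NF == sym {
            size = $(NF - 1)
            sub(/^0+/, "", size)         # strip the zero padding
            if (size == "") size = "0"   # an all-zero size field
            print "0x" size
            exit
        }'
}

# Usage (needs the vmlinux of the kernel that produced the oops):
#   objdump -t vmlinux | sym_size task_rq_lock
```

If the printed value matches the size after the slash in "task_rq_lock+0x4a/0xa0", the vmlinux at hand is at least consistent with the running kernel for that function.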

>> [1462583.101376] RSP: 0018:ffff88041e259dc8 EFLAGS: 00010082
>> [1462583.112307] RAX: 9066669066666605 RBX: 0000000000013c40 RCX:
>> ffffffff8160ae20
>> [1462583.126883] RDX: 0000000000000282 RSI: ffff88041e259e20 RDI:
>> 00007f51ec7ef410
>> [1462583.141460] RBP: ffff88041e259de8 R08: 0000000000989680 R09:
>> 0000000000000164
>> [1462583.156038] R10: 0000000000000001 R11: 0000000000000000 R12:
>> 00007f51ec7ef410
>> [1462583.170616] R13: ffff88041e259e20 R14: 0000000000013c40 R15:
>> 0000000000000005
>> [1462583.185193] FS: 00007f51eda1d6d0(0000) GS:ffff8800bf440000(0000)
>> knlGS:0000000000000000
>> [1462583.201674] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [1462583.213468] CR2: 00000000060a1d80 CR3: 000000041f120000 CR4:
>> 00000000000006e0
>> [1462583.228045] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [1462583.242627] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [1462583.257205] Process apache2 (pid: 4338, threadinfo ffff88041e258000,
>> task ffff88041ea0c500)
>> [1462583.274206] Stack:
>> [1462583.278556] 00007f51ec7ef410 ffff8802f6aa9eb8 000000000000000f
>> 0000000000000000
>> [1462583.293696] ffff88041e259e58 ffffffff8105cfbc ffff88041e259e48
>> 00000007812aa047
>> [1462583.308835] 0000000000000003 000000001e3d2030 ffff88041e259e28
>> 0000000000000282
>> [1462583.323969] Call Trace:
>> [1462583.329191] [<ffffffff8105cfbc>] try_to_wake_up+0x3c/0x410
>> [1462583.340642] [<ffffffff8105d3e5>] wake_up_process+0x15/0x20
>> [1462583.352091] [<ffffffff812710e0>] freeary+0x1e0/0x260
>> [1462583.362503] [<ffffffff812721b1>] T.623+0x71/0xf0
>> [1462583.372223] [<ffffffff81169685>] ? vfs_write+0x125/0x190
>> [1462583.383326] [<ffffffff81272299>] sys_semctl+0x69/0xa0
>> [1462583.393911] [<ffffffff8100bf82>] system_call_fastpath+0x16/0x1b

> please try to feed those addresses and the one below (RIP)
> through addr2line -e vmlinux (using the build tree of that
> kernel)
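
One way to do that in bulk is sketched below, under stated assumptions: oops_addrs is a hypothetical helper, oops.txt is a hypothetical file holding the log from this mail, and grep is assumed to support -o (GNU and BSD grep do). addr2line reads addresses from standard input when none are given on the command line, and it only reports file and line if vmlinux was built with debug info.

```shell
# Extract the unique 16-digit addresses that appear as [<...>] in an
# oops log read from stdin, one per line, in order of first appearance.
oops_addrs() {
    grep -o '\[<[0-9a-f]\{16\}>]' |     # keep only the [<addr>] tokens
        sed 's/^\[<//; s/>]$//' |       # strip the surrounding brackets
        awk '!seen[$0]++'               # drop repeats, keep first-seen order
}

# Usage (run in the build tree of the crashing kernel):
#   oops_addrs < oops.txt | addr2line -e vmlinux -f -i -a
```

The -f, -i and -a options make addr2line print the function name, any inlined callers and the address itself next to each file:line result, which helps when the RIP lands in inlined code.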

>> [1462583.406222] Code: 00 48 c7 c3 40 3c 01 00 49 89 fc 49 89 f5 9c 58 0f
>> 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 49 89 55 00 49 8b 44 24 08 49 89
>> de <8b> 40 18 4c 03 34 c5 00 d4 ab 81 4c 89 f7 e8 43 14 54 00 49 8b
>> [1462583.444912] RIP [<ffffffff8104ecaa>] task_rq_lock+0x4a/0xa0
>> [1462583.456545] RSP <ffff88041e259dc8>
>> [1462583.464160] ---[ end trace e26d734810b28493 ]---
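
A side note on reading this dump, offered as a sketch rather than a conclusion from the thread: the Code: line marks the faulting instruction as <8b> 40 18, which decodes to a 32-bit load from 0x18(%rax), and RAX holds 9066669066666605. On x86-64 with 48-bit virtual addresses (an assumption about this CPU), bits 63..47 of a pointer must all be equal, and a memory access through a pointer that breaks this rule raises a general protection fault instead of a page fault. The hypothetical helper is_canonical below checks that rule on a register value written as 16 lowercase hex digits, as printed in the dump.

```shell
# Succeeds (exit status 0) if the 16-digit lowercase hex value is a
# canonical x86-64 address, assuming 48-bit virtual addresses.
# Bits 63..47 are all 0 when the value starts with 0000 and the next
# digit is 0-7, and all 1 when it starts with ffff and the next digit
# is 8-f. Only the shape is checked, not that every digit is valid hex.
is_canonical() {
    case "$1" in
        0000[01234567]???????????) return 0 ;;
        ffff[89abcdef]???????????) return 0 ;;
        *) return 1 ;;
    esac
}

# Check a few values copied from the register dump above.
for reg in 9066669066666605 ffff88041e259dc8 00007f51ec7ef410; do
    if is_canonical "$reg"; then
        echo "$reg canonical"
    else
        echo "$reg non-canonical"
    fi
done
```

Of the three values, only the RAX value is reported as non-canonical, which would fit a general protection fault on that load. It does not by itself say where the bad pointer came from.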

>> Some more info:
>> root@hosting05:~ # uptime
>> 16:51:54 up 16 days, 20:11, 1 user, load average: 9.27, 4.24, 1.85

>> "vserver enter" hangs with:
>> root@vhost01:~ # w
>> 16:52:52 up 16 days, 23:22, 4 users, load average: 12.40, 6.03, 2.61
>> USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
>> root pts/1 reserved-225136. 16:51 1:35 0.05s 0.05s /bin/bash
>> /usr/sbin/vserver ----nonamespace hosting05 enter
>> root pts/0 reserved-225136. 16:50 39.00s 0.06s 0.00s /bin/grep
>> -q ^ns[[:space:]] /proc/cgroups
>> root pts/3 reserved-225136. 16:52 28.00s 0.00s 0.00s -bash
>> root pts/4 reserved-225136. 16:52 0.00s 0.00s 0.00s w

>> Also "vserver stop" hangs.

>> Now I have switched back to 2.6.32.41-vs2.3.0.36.29.7, which has
>> been running stable on other servers for about 37 days.

> might be fixed in .41 or might just be less likely to
> happen, after all, 16 days is not something you consider
> easily reproducible ...

> in any case, it doesn't really look Linux-VServer specific,
> but let's see what the addr2line gives ...

> thanks,
> Herbert

>> Thanks and regards
>> Urban Loesch
Received on Sat Aug 6 13:01:58 2011
