Hi Herbert,
Herbert Poetzl wrote:
> On Sat, Aug 06, 2011 at 03:01:27PM +0200, Urban Loesch wrote:
>
>
>> Am 06.08.2011 14:38, schrieb Herbert Poetzl:
>>> On Sat, Aug 06, 2011 at 02:13:37PM +0200, Urban Loesch wrote:
>
>>>> Hi Herbert,
>>>>> I just tried to pinpoint the location based on my
>>>>> 2.6.38.8-vs2.3.0.37-rc17 kernel and I suspect that
>>>>> task_rq(p) is causing this (for certain p), but
>>>>> I was wondering why your task_rq_lock() is 0xa0
>>>>> bytes in size, where mine is just 0x65 bytes ...
>>>>> especially as the task_rq_lock function is quite
>>>>> compact ...
>>>>> could you upload the output of the folling commands
>>>>> for me (executed in the build directory of your
>>>>> kernel or with the vmlinux object file)
>>>>> # objdump -t vmlinux | grep task_rq_lock
>>>> gives me only on line:
>>> that's fine, nothing more was expected :)
>
>>>> /usr/src/linux-2.6.38.8 # objdump -t vmlinux | grep task_rq_lock
>>>> ffffffff8104ec60 l F .text 000000000000009c task_rq_lock
>>>>> # objdump -d vmlinux --start-address=0x`objdump -t vmlinux | sed -n
>>>>> '/task_rq_lock/ {s/ .*//; p}'` | sed '/task>:/ Q'
>>>> more lines :-)
>>>> You can download it at http://www.enas.net/objdump.txt
>>> I copied the interesting part here:
>
>>> ffffffff8104ec60<task_rq_lock>:
>>> ffffffff8104ec60: 55 push %rbp
>>> ffffffff8104ec61: 48 89 e5 mov %rsp,%rbp
>>> ffffffff8104ec64: 48 83 ec 20 sub $0x20,%rsp
>>> ffffffff8104ec68: 48 89 1c 24 mov %rbx,(%rsp)
>>> ffffffff8104ec6c: 4c 89 64 24 08 mov %r12,0x8(%rsp)
>>> ffffffff8104ec71: 4c 89 6c 24 10 mov %r13,0x10(%rsp)
>>> ffffffff8104ec76: 4c 89 74 24 18 mov %r14,0x18(%rsp)
>>> ffffffff8104ec7b: e8 40 d0 fb ff callq
>>> ffffffff8100bcc0<mcount>
>
>>> this suggests that you have FTRACE enabled in your
>>> kernel and probably a bunch of related TRACERS
>
>>> ffffffff8104ec80: 48 c7 c3 40 3c 01 00 mov $0x13c40,%rbx
>>> ffffffff8104ec87: 49 89 fc mov %rdi,%r12
>>> ffffffff8104ec8a: 49 89 f5 mov %rsi,%r13
>>> ffffffff8104ec8d: ff 14 25 00 34 a1 81 callq *0xffffffff81a13400
>>> ffffffff8104ec94: 48 89 c2 mov %rax,%rdx
>>> ffffffff8104ec97: ff 14 25 10 34 a1 81 callq *0xffffffff81a13410
>>> ffffffff8104ec9e: 49 89 55 00 mov %rdx,0x0(%r13)
>>> ffffffff8104eca2: 49 8b 44 24 08 mov 0x8(%r12),%rax
>>> ffffffff8104eca7: 49 89 de mov %rbx,%r14
>>> ffffffff8104ecaa: 8b 40 18 mov 0x18(%rax),%eax
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>>> this is where your problem happens, and together with
>>> the information from the original post (which you
>>> removed) this looks like task_rq looking for a task at
>>> 0x9066669066666605 which is rather unlikely to be a
>>> valid address, so to me this looks like some kind of
>>> memory corruption, maybe caused by a completely
>>> different subsystem ...
>
>>> ffffffff8104ecad: 4c 03 34 c5 00 d4 ab add
>>> -0x7e542c00(,%rax,8),%r14
>>> ffffffff8104ecb4: 81
>>> ffffffff8104ecb5: 4c 89 f7 mov %r14,%rdi
>>> ffffffff8104ecb8: e8 43 14 54 00 callq
>>> ffffffff81590100<_raw_spin_lock>
>>> ffffffff8104ecbd: 49 8b 44 24 08 mov 0x8(%r12),%rax
>>> ffffffff8104ecc2: 8b 40 18 mov 0x18(%rax),%eax
>>> ffffffff8104ecc5: 48 8b 14 c5 00 d4 ab mov
>>> -0x7e542c00(,%rax,8),%rdx
>>> ffffffff8104eccc: 81
>>> ffffffff8104eccd: 48 8d 04 13 lea (%rbx,%rdx,1),%rax
>>> ffffffff8104ecd1: 49 39 c6 cmp %rax,%r14
>>> ffffffff8104ecd4: 75 18 jne
>>> ffffffff8104ecee<task_rq_lock+0x8e>
>>> ffffffff8104ecd6: 4c 89 f0 mov %r14,%rax
>>> ffffffff8104ecd9: 48 8b 1c 24 mov (%rsp),%rbx
>>> ffffffff8104ecdd: 4c 8b 64 24 08 mov 0x8(%rsp),%r12
>>> ffffffff8104ece2: 4c 8b 6c 24 10 mov 0x10(%rsp),%r13
>>> ffffffff8104ece7: 4c 8b 74 24 18 mov 0x18(%rsp),%r14
>>> ffffffff8104ecec: c9 leaveq
>>> ffffffff8104eced: c3 retq
>>> ffffffff8104ecee: 49 8b 75 00 mov 0x0(%r13),%rsi
>>> ffffffff8104ecf2: 4c 89 f7 mov %r14,%rdi
>>> ffffffff8104ecf5: e8 86 14 54 00 callq
>>> ffffffff81590180<_raw_spin_unlock_irqrestore>
>>> ffffffff8104ecfa: eb 91 jmp
>>> ffffffff8104ec8d<task_rq_lock+0x2d>
>>> ffffffff8104ecfc: 0f 1f 40 00 nopl 0x0(%rax)
>
>>> but that's just a wild guess and some hand waving, to
>>> get better information and more debug info we would
>>> first need to find a way to trigger this issue within
>>> a short amount of time, let's say a few minutes up to
>>> an hour or so ...
>
>> I have no idea how can I do this.
>
> setup a test system, where you can hit one of your guests
> (or several of them at once) with artificial stress tests
> (like a page impression performance test or so) and
> monitor the kernel output (best via serial console)
I installed a testsystem and installed a fresh new etch vserver with
apache2.
>
> compile the kernel with DEBUG_INFO enabled so that we
> can more precise pinpointing in case of a kernel trace
Kernel I compiled with DEBUG_INFO. Also VSERVER_DEBUG=y is included.
Now I have to find a way to test the performance...
>
>>> btw, does it happen on more than one system or is it
>>> always the same system?
>
>> This happend on two of my servers with heavy apache load.
>> DELL PE R610 and PE R410. Both with 32GB of ram.
>
>> The first time was in June on my R410 with kernel 2.6.38.6-vs2.3.0.37-rc15
>> It gets me the following error in syslog:
>
>> BUG: unable to handle kernel paging request at 00000000801f0f1d
>> [2472685.774427] IP: [<ffffffff8103695a>] task_rq_lock+0x4a/0xa0
>> [2472685.774505] PGD 3164b1067 PUD 0
>> [2472685.774574] Oops: 0000 [#1] SMP
>> [2472685.774647] last sysfs file: /sys/module/drbd/parameters/cn_idx
>> [2472685.774716] CPU 9
>> [2472685.774723] Modules linked in: dca netconsole configfs drbd
>> lru_cache sch_hfsc ip6_queue act_police cls_flow cls_fw cls_u32 sch_htb
>> sch_ingress sch_sfq xt_realm iptable_raw ip6t_LOG xt_connlimit ipt_ULOG
>> ipt_REJECT ip6table_raw ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN
>> ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic
>> nf_nat_sip xt_comment nf_nat_pptp ip6t_REJECT nf_nat_proto_gre
>> nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda xt_recent
>> ip6table_mangle nf_conntrack_ipv6 nf_conntrack_tftp nf_conntrack_sip
>> nf_conntrack_sane nf_conntrack_proto_udplite nf_conntrack_proto_sctp
>> nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink
>> nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323
>> nf_conntrack_ftp ts_kmp nf_conntrack_amanda xt_TPROXY nf_tproxy_core
>> nf_defrag_ipv6 xt_time xt_TCPMSS xt_tcpmss xt_sctp xt_NFLOG
>> nfnetlink_log xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE
>> xt_multiport ipt_LOG xt_mark xt_mac xt_limit iptable_nat xt_length
>> xt_iprange nf_nat xt_helper nf_conntrack_ipv4 xt_hashlimit
>> nf_defrag_ipv4 xt_DSCP xt_dscp iptable_mangle xt_dccp xt_connmark
>> xt_CLASSIFY xt_tcpudp xt_state xt_conntrack nf_conntrack nfnetlink
>> iptable_filter ip_tables ip6table_filter ip6_tables x_tables
>> ipmi_devintf ipmi_si ipmi_msghandler mptctl psmouse processor joydev
>> serio_raw tpm_tis tpm i7core_edac pcspkr tpm_bios edac_core dcdbas
>> power_meter button mptsas mptscsih mptbase usbhid bnx2
>> scsi_transport_sas hid e1000e [last unloaded: igb]
>> [2472685.776398]
>> [2472685.776459] Pid: 19778, comm: apache2 Not tainted
>> 2.6.38.6-vs2.3.0.37-rc15-rol-em64t #7 Dell Inc. PowerEdge R410/0N051F
>> [2472685.776608] RIP: 0010:[<ffffffff8103695a>] [<ffffffff8103695a>]
>> task_rq_lock+0x4a/0xa0
>> [2472685.776743] RSP: 0018:ffff880121b75dc8 EFLAGS: 00010082
>> [2472685.776811] RAX: 00000000801f0f05 RBX: 0000000000011d00 RCX:
>> 0000000000000040
>> [2472685.776936] RDX: 0000000000000282 RSI: ffff880121b75e20 RDI:
>> 00007f4cfa5fd9a0
>> [2472685.777057] RBP: ffff880121b75de8 R08: dead000000200200 R09:
>> dead000000100100
>> [2472685.777178] R10: dead000000200200 R11: dead000000100100 R12:
>> 00007f4cfa5fd9a0
>> [2472685.777299] R13: ffff880121b75e20 R14: 0000000000011d00 R15:
>> 0000000000000000
>> [2472685.777420] FS: 00007f4cfb0a2730(0000) GS:ffff8800bf480000(0000)
>> knlGS:0000000000000000
>> [2472685.777542] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [2472685.777608] CR2: 00000000801f0f1d CR3: 00000002884d7000 CR4:
>> 00000000000006e0
>> [2472685.777728] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [2472685.777848] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [2472685.777968] Process apache2 (pid: 19778, threadinfo
>> ffff880121b74000, task ffff8801167114d0)
>> [2472685.778091] Stack:
>> [2472685.778150] 00007f4cfa5fd9a0 ffff88044db61eb8 000000000000000f
>> 0000000000000000
>> [2472685.778280] ffff880121b75e58 ffffffff81040bd6 ffff880121b75e48
>> ffffffff81269f47
>> [2472685.778410] 0000000000000000 ffff88081eaafe30 ffff880121b75e28
>> 0000000000000282
>> [2472685.778540] Call Trace:
>> [2472685.778604] [<ffffffff81040bd6>] try_to_wake_up+0x36/0x310
>> [2472685.778671] [<ffffffff81269f47>] ? idr_remove+0x187/0x1f0
>> [2472685.778739] [<ffffffff81040f05>] wake_up_process+0x15/0x20
>> [2472685.778806] [<ffffffff81231580>] freeary+0x1e0/0x260
>> [2472685.778872] [<ffffffff81232631>] T.616+0x71/0xf0
>> [2472685.778939] [<ffffffff8113376b>] ? fput+0x1cb/0x280
>> [2472685.779005] [<ffffffff81232719>] sys_semctl+0x69/0xa0
>> [2472685.779074] [<ffffffff81002882>] system_call_fastpath+0x16/0x1b
>> [2472685.779142] Code: 00 48 c7 c3 00 1d 01 00 49 89 fc 49 89 f5 9c 58
>> 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 49 89 55 00 49 8b 44 24 08
>> 49 89 de <8b> 40 18 4c 03 34 c5 e0 b2 70 81 4c 89 f7 e8 73 93 4a 00 49 8b
>> [2472685.779515] RIP [<ffffffff8103695a>] task_rq_lock+0x4a/0xa0
>> [2472685.779584] RSP <ffff880121b75dc8>
>> [2472685.779647] CR2: 00000000801f0f1d
>> [2472685.780031] ---[ end trace 63c260cc5b2dc0a5 ]---
>
>> On this server are running 4 vServers (all with debian lenny and apache
>> installed).
>> On my PER610 is running only one vServer with debian etch (apache,
>> php5-fcgi, mysql 5.0).
>
> interestingly both kernel traces include sys_semctl, so
> it might be related to the code handling the semaphores
> maybe just try to bang on that a little (from inside the
> guest) to see if something triggers ...
Sorry but I have no idea how I can do that.
Thanks
Urban
>
> best,
> Herbert
>
>> thanks,
>> Urban
>
>>> best,
>>> Herbert
>
>>>> thanks,
>>>> Urban
>
Received on Wed Aug 10 10:16:45 2011