# 威联通NAS夜半死机:未操作时停止响应,只能强行关机 ## 症状 最近碰到了比较折腾人的问题,每天夜间一点到四点,NAS会在某一时间停止响应,并且不会自动恢复,直到强行关机重启才能恢复。 根据阿里云自动监控nas上个人网站的记录,大约是在一点到四点之间停止响应,但阿里云有时候延迟很多,实际停止响应的时间应该比这早。 昨天我打开网页控制台并持续录屏,发现失去响应前后,Qboost和资源监控器的显示没有明显变化(或者是死机太快来不及反映在网页界面上),就是突然多出一个CPU占用%30的系统内核进程,然后网页就断开了。 失去响应期间,ssh无法登录,用ssh -vvv尝试登录发现卡在“we sent a key, wait for response”步骤。但是看硬盘灯的话,`QVR Pro`似乎还能工作。 第一次失去响应的时候,因为以前遇到过确实是有比较重的任务,就等了一天,结果还是不能用。第二天等了半天也是不行。于是第三天第四天就直接强关了,记得重新开机后检查文件系统。 ## 一些尝试 首先被怀疑的是自动缩略图生成服务,这东西耗CPU我是知道的。这两天刚好下载了几个蓝光,会不会是缩略图生成累死了机器?尝试了禁止为视频文件生成缩略图,甚至关掉这个功能,但第二天起来还是死机。 然后怀疑`hybrid backup`的计划备份,因为第一天出问题是四点,而我依稀记得设置了三点左右自动备份,而且第二天强关后的错误记录里面有一条“因为异常断电,备份没有完成”。于是把这个时间调整到一点,于是下次死机的时间变成了两点多(看阿里云报错),我以为找对了,是以前创建的备份任务与新版不兼容。但重新设置了备份后,第三天还是死机。并且我手动启动了一次备份,并没有死机。 查相关故障,很难搜,只是有一个去年的“360发现威联通平台挖矿病毒”的消息,查了一下不像。毕竟如果一个作者注意到要在白天挺直挖矿,却在晚上全速运行到死机,是有点精神分裂的。 找了一圈找设置在晚上的定时任务,没找到时间完全对得上的。 后面就没辙了,第三天我让电脑打开网页界面,整晚上开着,然后开了个录屏,要看看到底是何方神圣,结果啥也没看到。就是有一瞬间屏幕中出现了一个占30% CPU的“内核进程”,然后网页界面停止刷新,一分钟后报错显示断开,没了。 {{:hardware:nas:录屏截图-正常运行.png?200|}} {{:hardware:nas:录屏截图-出现了占用较高的内核进程_随后网页停止刷新.png?200|}} {{:hardware:nas:录屏截图-网页界面停止响应并最终提示断开.png?200|}} ## 日志分析 于是我觉得这事儿不能忍了,打算报威联通技术支持,报支持过程中发现需要详细日志,可以用“支持平台”APP生成,这个我之前就卸载了,现在只好装回来,然后生成一份日志(zip包),填信息报支持,然后自己分析。 通过时间对应,发现确实有四次彻底死机,分布在凌晨一点到四点之间,特征是在这之前一直陆续有`kernel message`刷新,但某一瞬间过后完全没有日志,下一条就是第二天上午或下午的开机信息。这说明情况起码严重到了日志都写不下去的地步。 于是分析司机前一瞬间的共同特征,倒是挺明显的,类似以下的一端日志在每次死机之前都出现了,在其他时候一次都没出现: ``` 2022-01-11 03:00:59 +08:00 <4> [58878.505271] EXT4-fs (dm-114): Mount option "noacl" will be removed by 3.5 2022-01-11 03:00:59 +08:00 <4> [58878.505271] Contact linux-ext4@vger.kernel.org if you think we should keep it. 2022-01-11 03:00:59 +08:00 <4> [58878.505271] 2022-01-11 03:00:59 +08:00 <6> [58878.676229] ext4_init_reserve_inode_table0: dm-114, 24576 2022-01-11 03:00:59 +08:00 <6> [58878.681715] ext4_init_reserve_inode_table2: dm-114, 24576, 0, 0, 4096 2022-01-11 03:00:59 +08:00 <6> [58878.725057] EXT4-fs (dm-114): orphan cleanup on readonly fs 2022-01-11 03:01:00 +08:00 <6> [58879.015642] EXT4-fs (dm-114): 16 orphan inodes deleted 2022-01-11 03:01:00 +08:00 <6> [58879.183505] EXT4-fs (dm-114): recovery complete 2022-01-11 03:01:00 +08:00 <6> [58879.320703] EXT4-fs (dm-114): mounted filesystem with ordered data mode. Opts: usrjquota=aquota.user,jqfmt=vfsv1,user_xattr,data=ordered,data_err=abort,delalloc,nopriv,nodiscard,noacl 2022-01-11 03:01:03 +08:00 <6> [58882.542275] ext4_init_reserve_inode_table0: dm-114, 24576 2022-01-11 03:01:03 +08:00 <6> [58882.547749] ext4_init_reserve_inode_table2: dm-114, 24576, 0, 0, 4096 2022-01-11 03:01:03 +08:00 <6> [58882.617280] EXT4-fs (dm-114): mounted filesystem with ordered data mode. Opts: usrjquota=aquota.user,jqfmt=vfsv1,user_xattr,data=ordered,data_err=abort,delalloc,nopriv,nodiscard,acl 2022-01-11 03:01:13 +08:00 <4> [58892.638787] ext2 filesystem being mounted at /tmp/nasconfig_tmp supports timestamps until 2038 (0x7fffffff) 2022-01-11 03:01:20 +08:00 <4> [58898.975028] ext2 filesystem being mounted at /tmp/nasconfig_tmp supports timestamps until 2038 (0x7fffffff) 2022-01-11 03:01:30 +08:00 <4> [58908.944119] ext2 filesystem being mounted at /tmp/nasconfig_tmp supports timestamps until 2038 (0x7fffffff) 2022-01-11 03:01:33 +08:00 <4> [58912.249198] ext2 filesystem being mounted at /tmp/config supports timestamps until 2038 (0x7fffffff) 2022-01-11 03:01:33 +08:00 <4> [58912.299864] ext2 filesystem being mounted at /tmp/config supports timestamps until 2038 (0x7fffffff) 2022-01-11 03:01:34 +08:00 <4> [58912.948026] ext2 filesystem being mounted at /tmp/config supports timestamps until 2038 (0x7fffffff) 2022-01-11 03:01:37 +08:00 <4> [58916.328700] ext2 filesystem being mounted at /tmp/config supports timestamps until 2038 (0x7fffffff) 2022-01-11 03:01:37 +08:00 <4> [58916.363656] ext2 filesystem being mounted at /tmp/config supports timestamps until 2038 (0x7fffffff) 2022-01-11 03:02:23 +08:00 <4> [58962.421353] ext2 filesystem being mounted at /tmp/config supports timestamps until 2038 (0x7fffffff) 2022-01-11 03:02:43 +08:00 <4> [58981.940206] ext2 filesystem being mounted at /tmp/config supports timestamps until 2038 (0x7fffffff) 2022-01-11 03:02:43 +08:00 <4> [58982.059447] ext2 filesystem being mounted at /tmp/config supports timestamps until 2038 (0x7fffffff) 2022-01-11 03:14:11 +08:00 <4> [59670.847551] EXT4-fs (dm-53): Mount option "noacl" will be removed by 3.5 2022-01-11 03:14:11 +08:00 <4> [59670.847551] Contact linux-ext4@vger.kernel.org if you think we should keep it. 2022-01-11 03:14:11 +08:00 <4> [59670.847551] 2022-01-11 03:14:12 +08:00 <6> [59671.021426] ext4_init_reserve_inode_table0: dm-53, 24576 2022-01-11 03:14:12 +08:00 <6> [59671.026806] ext4_init_reserve_inode_table2: dm-53, 24576, 0, 0, 4096 2022-01-11 03:14:12 +08:00 <6> [59671.036967] EXT4-fs (dm-53): orphan cleanup on readonly fs 2022-01-11 03:14:12 +08:00 <6> [59671.346769] EXT4-fs (dm-53): 16 orphan inodes deleted 2022-01-11 03:14:12 +08:00 <6> [59671.642693] EXT4-fs (dm-53): recovery complete 2022-01-11 03:14:12 +08:00 <6> [59671.820782] EXT4-fs (dm-53): mounted filesystem with ordered data mode. Opts: usrjquota=aquota.user,jqfmt=vfsv1,user_xattr,data=ordered,data_err=abort,delalloc,nopriv,nodiscard,noacl 2022-01-11 03:17:06 +08:00 <3> [59845.231117] INFO: task python:2884 blocked for more than 122 seconds. 2022-01-11 03:17:06 +08:00 <3> [59845.237600] Tainted: P U O 5.10.60-qnap #1 2022-01-11 03:17:06 +08:00 <3> [59845.243367] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2022-01-11 03:17:06 +08:00 <6> [59845.251231] task:python state:D stack: 0 pid: 2884 ppid: 1 flags:0x00000000 2022-01-11 03:17:06 +08:00 <6> [59845.259602] Call Trace: 2022-01-11 03:17:06 +08:00 <6> [59845.262079] __schedule+0x1dd/0x610 2022-01-11 03:17:06 +08:00 <6> [59845.265596] schedule+0x41/0xa0 2022-01-11 03:17:06 +08:00 <6> [59845.268764] percpu_rwsem_wait+0xd3/0x180 2022-01-11 03:17:06 +08:00 <6> [59845.272797] ? percpu_up_write+0x30/0x30 2022-01-11 03:17:06 +08:00 <6> [59845.276743] __percpu_down_read+0x53/0x60 2022-01-11 03:17:06 +08:00 <6> [59845.280777] vfs_write+0x2ad/0x2e0 2022-01-11 03:17:06 +08:00 <6> [59845.284199] ksys_write+0x50/0xc0 2022-01-11 03:17:06 +08:00 <6> [59845.287530] do_syscall_64+0x2d/0x40 2022-01-11 03:17:06 +08:00 <6> [59845.291127] entry_SYSCALL_64_after_hwframe+0x44/0xa9 2022-01-11 03:17:06 +08:00 <6> [59845.296200] RIP: 0033:0x7f49325415d0 2022-01-11 03:17:06 +08:00 <6> [59845.299793] RSP: 002b:00007ffdaf30bb78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 2022-01-11 03:17:06 +08:00 <6> [59845.307380] RAX: ffffffffffffffda RBX: 00000000000000a1 RCX: 00007f49325415d0 2022-01-11 03:17:06 +08:00 <6> [59845.314536] RDX: 00000000000000a1 RSI: 00007f493334c000 RDI: 0000000000000005 2022-01-11 03:17:06 +08:00 <6> [59845.321689] RBP: 00007f493334c000 R08: 0000000001a4bc60 R09: 00007f4933343700 2022-01-11 03:17:06 +08:00 <6> [59845.328843] R10: 00007f4933343700 R11: 0000000000000246 R12: 0000000001a4bb80 2022-01-11 03:17:06 +08:00 <6> [59845.336002] R13: 00000000000000a1 R14: 00007f492d11e820 R15: 00007f492d127b90 2022-01-11 03:17:06 +08:00 <3> [59845.343338] INFO: task dockerd:5018 blocked for more than 122 seconds. 2022-01-11 03:17:06 +08:00 <3> [59845.349891] Tainted: P U O 5.10.60-qnap #1 2022-01-11 03:17:06 +08:00 <3> [59845.355666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2022-01-11 03:17:06 +08:00 <6> [59845.363520] task:dockerd state:D stack: 0 pid: 5018 ppid: 2884 flags:0x00004000 2022-01-11 03:17:06 +08:00 <6> [59845.371895] Call Trace: 2022-01-11 03:17:06 +08:00 <6> [59845.374362] __schedule+0x1dd/0x610 2022-01-11 03:17:06 +08:00 <6> [59845.377876] schedule+0x41/0xa0 2022-01-11 03:17:06 +08:00 <6> [59845.381083] io_schedule+0xd/0x30 2022-01-11 03:17:06 +08:00 <6> [59845.384423] filemap_fault+0x3cd/0xa70 2022-01-11 03:17:06 +08:00 <6> [59845.388201] ? generic_perform_write+0x1e0/0x1e0 2022-01-11 03:17:06 +08:00 <6> [59845.392839] ext4_filemap_fault+0x27/0x40 2022-01-11 03:17:06 +08:00 <6> [59845.396882] __do_fault+0x2f/0xf0 2022-01-11 03:17:06 +08:00 <6> [59845.400225] handle_mm_fault+0xa7b/0xf00 2022-01-11 03:17:06 +08:00 <6> [59845.404168] exc_page_fault+0x23d/0x420 2022-01-11 03:17:06 +08:00 <6> [59845.408028] ? asm_exc_page_fault+0x8/0x30 2022-01-11 03:17:06 +08:00 <6> [59845.412150] asm_exc_page_fault+0x1e/0x30 2022-01-11 03:17:06 +08:00 <6> [59845.416179] RIP: 0033:0x4550ae 2022-01-11 03:17:06 +08:00 <6> [59845.419255] RSP: 002b:00007ffe6f2415c8 EFLAGS: 00010293 2022-01-11 03:17:06 +08:00 <6> [59845.424503] RAX: 000000007fffffff RBX: 0000000000000000 RCX: 000000c000f0be00 2022-01-11 03:17:06 +08:00 <6> [59845.431686] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000 2022-01-11 03:17:06 +08:00 <6> [59845.438849] RBP: 00007ffe6f241890 R08: 0000000000000000 R09: 0000000000000000 2022-01-11 03:17:06 +08:00 <6> [59845.446000] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000033bc4f0 2022-01-11 03:17:06 +08:00 <6> [59845.453158] R13: 0000000004557a20 R14: 000000c000f0be00 R15: 00000000033bc4f0 2022-01-11 03:17:06 +08:00 <3> [59845.460641] INFO: task dockerd:5047 blocked for more than 123 seconds. 2022-01-11 03:17:06 +08:00 <3> [59845.467208] Tainted: P U O 5.10.60-qnap #1 2022-01-11 03:17:06 +08:00 <3> [59845.472973] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2022-01-11 03:17:06 +08:00 <6> [59845.480821] task:dockerd state:D stack: 0 pid: 5047 ppid: 2884 flags:0x00000000 2022-01-11 03:17:06 +08:00 <6> [59845.489189] Call Trace: 2022-01-11 03:17:06 +08:00 <6> [59845.491660] __schedule+0x1dd/0x610 2022-01-11 03:17:06 +08:00 <6> [59845.495170] schedule+0x41/0xa0 2022-01-11 03:17:06 +08:00 <6> [59845.498332] percpu_rwsem_wait+0xd3/0x180 2022-01-11 03:17:06 +08:00 <6> [59845.502356] ? percpu_up_write+0x30/0x30 2022-01-11 03:17:06 +08:00 <6> [59845.506297] __percpu_down_read+0x53/0x60 2022-01-11 03:17:06 +08:00 <6> [59845.510330] mnt_want_write+0x60/0x80 2022-01-11 03:17:06 +08:00 <6> [59845.514012] filename_create+0x5f/0x160 2022-01-11 03:17:06 +08:00 <6> [59845.517869] do_mknodat+0xd4/0x200 2022-01-11 03:17:06 +08:00 <6> [59845.521291] do_syscall_64+0x2d/0x40 2022-01-11 03:17:06 +08:00 <6> [59845.524890] entry_SYSCALL_64_after_hwframe+0x44/0xa9 2022-01-11 03:17:06 +08:00 <6> [59845.529966] RIP: 0033:0x4bf25a 2022-01-11 03:17:06 +08:00 <6> [59845.533039] RSP: 002b:000000c0008e1170 EFLAGS: 00000202 ORIG_RAX: 0000000000000103 2022-01-11 03:17:06 +08:00 <6> [59845.540627] RAX: ffffffffffffffda RBX: 000000c000060000 RCX: 00000000004bf25a 2022-01-11 03:17:06 +08:00 <6> [59845.547776] RDX: 00000000000011c0 RSI: 000000c000157a20 RDI: ffffffffffffff9c 2022-01-11 03:17:06 +08:00 <6> [59845.554924] RBP: 000000c0008e11e0 R08: 0000000000000000 R09: 0000000000000000 2022-01-11 03:17:06 +08:00 <6> [59845.562076] R10: 0000000000000000 R11: 0000000000000202 R12: ffffffffffffffff 2022-01-11 03:17:06 +08:00 <6> [59845.569252] R13: 0000000000000027 R14: 0000000000000026 R15: 000000000000002e 2022-01-11 03:17:06 +08:00 <3> [59845.576633] INFO: task dockerd:5116 blocked for more than 123 seconds. 2022-01-11 03:17:06 +08:00 <3> [59845.583189] Tainted: P U O 5.10.60-qnap #1 2022-01-11 03:17:06 +08:00 <3> [59845.588948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2022-01-11 03:17:06 +08:00 <6> [59845.596795] task:dockerd state:D stack: 0 pid: 5116 ppid: 2884 flags:0x00000000 2022-01-11 03:17:06 +08:00 <6> [59845.605170] Call Trace: 2022-01-11 03:17:06 +08:00 <6> [59845.607641] __schedule+0x1dd/0x610 2022-01-11 03:17:06 +08:00 <6> [59845.611156] schedule+0x41/0xa0 2022-01-11 03:17:06 +08:00 <6> [59845.614321] io_schedule+0xd/0x30 2022-01-11 03:17:06 +08:00 <6> [59845.617652] filemap_fault+0x3cd/0xa70 2022-01-11 03:17:06 +08:00 <6> [59845.621430] ? generic_perform_write+0x1e0/0x1e0 2022-01-11 03:17:06 +08:00 <6> [59845.626068] ext4_filemap_fault+0x27/0x40 2022-01-11 03:17:06 +08:00 <6> [59845.630093] __do_fault+0x2f/0xf0 2022-01-11 03:17:06 +08:00 <6> [59845.633422] handle_mm_fault+0xa7b/0xf00 2022-01-11 03:17:06 +08:00 <6> [59845.637361] exc_page_fault+0x23d/0x420 2022-01-11 03:17:06 +08:00 <6> [59845.641215] ? asm_exc_page_fault+0x8/0x30 2022-01-11 03:17:06 +08:00 <6> [59845.645332] asm_exc_page_fault+0x1e/0x30 2022-01-11 03:17:06 +08:00 <6> [59845.649361] RIP: 0033:0x4550ae 2022-01-11 03:17:06 +08:00 <6> [59845.652436] RSP: 002b:00007f844d6be720 EFLAGS: 00010293 2022-01-11 03:17:06 +08:00 <6> [59845.657688] RAX: 000000007fffffff RBX: 0000000000000000 RCX: 000000c000f3ed80 2022-01-11 03:17:06 +08:00 <6> [59845.664848] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 2022-01-11 03:17:06 +08:00 <6> [59845.672001] RBP: 00007f844d6be9e8 R08: 0000000000000000 R09: 0000000000000000 2022-01-11 03:17:06 +08:00 <6> [59845.679150] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000334eff0 2022-01-11 03:17:06 +08:00 <6> [59845.686302] R13: 0000000004557a20 R14: 00000000024ed724 R15: 0000000000000000 2022-01-11 03:17:06 +08:00 <3> [59845.693677] INFO: task VM Periodic Tas:8389 blocked for more than 123 seconds. 2022-01-11 03:17:06 +08:00 <3> [59845.700919] Tainted: P U O 5.10.60-qnap #1 2022-01-11 03:17:06 +08:00 <3> [59845.706682] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2022-01-11 03:17:06 +08:00 <6> [59845.714523] task:VM Periodic Tas state:D stack: 0 pid: 8389 ppid: 8314 flags:0x00000320 2022-01-11 03:17:06 +08:00 <6> [59845.722892] Call Trace: 2022-01-11 03:17:06 +08:00 <6> [59845.725365] __schedule+0x1dd/0x610 2022-01-11 03:17:06 +08:00 <6> [59845.728874] schedule+0x41/0xa0 2022-01-11 03:17:06 +08:00 <6> [59845.732037] percpu_rwsem_wait+0xd3/0x180 2022-01-11 03:17:06 +08:00 <6> [59845.736066] ? percpu_up_write+0x30/0x30 2022-01-11 03:17:06 +08:00 <6> [59845.740004] __percpu_down_read+0x53/0x60 2022-01-11 03:17:06 +08:00 <6> [59845.744040] ext4_page_mkwrite+0x3bb/0x540 2022-01-11 03:17:06 +08:00 <6> [59845.748158] do_page_mkwrite+0x4f/0xb0 2022-01-11 03:17:06 +08:00 <6> [59845.751927] do_wp_page+0x20e/0x360 2022-01-11 03:17:06 +08:00 <6> [59845.755431] handle_mm_fault+0x300/0xf00 2022-01-11 03:17:06 +08:00 <6> [59845.759379] ? tick_sched_handle+0x20/0x50 2022-01-11 03:17:06 +08:00 <6> [59845.763494] exc_page_fault+0x23d/0x420 2022-01-11 03:17:06 +08:00 <6> [59845.767359] ? asm_exc_page_fault+0x8/0x30 2022-01-11 03:17:06 +08:00 <6> [59845.771473] asm_exc_page_fault+0x1e/0x30 2022-01-11 03:17:06 +08:00 <6> [59845.775497] RIP: 0033:0x7fbba7c237be 2022-01-11 03:17:06 +08:00 <6> [59845.779090] RSP: 002b:00007fbb77cfbd60 EFLAGS: 00010206 2022-01-11 03:17:06 +08:00 <6> [59845.784328] RAX: 000000001c2f81d8 RBX: 00007fbba8ffa420 RCX: 0000000000000002 2022-01-11 03:17:06 +08:00 <6> [59845.791474] RDX: 00007fbba7501080 RSI: 00007fbba828492c RDI: 00007fbba0019a60 2022-01-11 03:17:06 +08:00 <6> [59845.798620] RBP: 00007fbb77cfbd70 R08: 00000881df9ac29c R09: 0000000000000032 2022-01-11 03:17:06 +08:00 <6> [59845.805767] R10: 0000000006d3e5c2 R11: 0018e920e8f95290 R12: 0000000000000008 2022-01-11 03:17:06 +08:00 <6> [59845.812918] R13: 00007fbba0150d60 R14: 0000000000000002 R15: 0000000000000001 2022-01-11 03:17:06 +08:00 <3> [59845.820315] INFO: task cc3-fastcgi:19981 blocked for more than 123 seconds. 2022-01-11 03:17:06 +08:00 <3> [59845.827311] Tainted: P U O 5.10.60-qnap #1 2022-01-11 03:17:06 +08:00 <3> [59845.833074] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2022-01-11 03:17:06 +08:00 <6> [59845.840922] task:cc3-fastcgi state:D stack: 0 pid:19981 ppid: 1 flags:0x00000000 2022-01-11 03:17:06 +08:00 <6> [59845.849287] Call Trace: 2022-01-11 03:17:06 +08:00 <6> [59845.851754] __schedule+0x1dd/0x610 2022-01-11 03:17:06 +08:00 <6> [59845.855260] schedule+0x41/0xa0 2022-01-11 03:17:06 +08:00 <6> [59845.858430] percpu_rwsem_wait+0xd3/0x180 2022-01-11 03:17:06 +08:00 <6> [59845.862459] ? percpu_up_write+0x30/0x30 2022-01-11 03:17:06 +08:00 <6> [59845.866394] __percpu_down_read+0x53/0x60 2022-01-11 03:17:06 +08:00 <6> [59845.870426] vfs_write+0x2ad/0x2e0 2022-01-11 03:17:06 +08:00 <6> [59845.873845] ksys_write+0x50/0xc0 2022-01-11 03:17:06 +08:00 <6> [59845.877180] do_syscall_64+0x2d/0x40 2022-01-11 03:17:06 +08:00 <6> [59845.880779] entry_SYSCALL_64_after_hwframe+0x44/0xa9 2022-01-11 03:17:06 +08:00 <6> [59845.885854] RIP: 0033:0x7fd1c17ac84d 2022-01-11 03:17:06 +08:00 <6> [59845.889441] RSP: 002b:00007fd1827f9000 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 2022-01-11 03:17:06 +08:00 <6> [59845.897024] RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007fd1c17ac84d 2022-01-11 03:17:06 +08:00 <6> [59845.904172] RDX: 0000000000001000 RSI: 000000000184e5c0 RDI: 0000000000000009 2022-01-11 03:17:06 +08:00 <6> [59845.911316] RBP: 00000000017e36c8 R08: 0000000000000016 R09: 00007fd1827f9090 2022-01-11 03:17:06 +08:00 <6> [59845.918471] R10: 0000000000000002 R11: 0000000000000293 R12: 0000000000000000 2022-01-11 03:17:06 +08:00 <6> [59845.925626] R13: 000000000184e5c0 R14: 0000000000001000 R15: 0000000000001000 2022-01-11 03:17:07 +08:00 <3> [59845.933248] INFO: task cc3_watchdog.sh:13462 blocked for more than 123 seconds. 2022-01-11 03:17:07 +08:00 <3> [59845.940591] Tainted: P U O 5.10.60-qnap #1 2022-01-11 03:17:07 +08:00 <3> [59845.946356] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2022-01-11 03:17:07 +08:00 <6> [59845.954202] task:cc3_watchdog.sh state:D stack: 0 pid:13462 ppid: 1 flags:0x00000000 2022-01-11 03:17:07 +08:00 <6> [59845.962572] Call Trace: 2022-01-11 03:17:07 +08:00 <6> [59845.965036] __schedule+0x1dd/0x610 2022-01-11 03:17:07 +08:00 <6> [59845.968544] schedule+0x41/0xa0 2022-01-11 03:17:07 +08:00 <6> [59845.971704] percpu_rwsem_wait+0xd3/0x180 2022-01-11 03:17:07 +08:00 <6> [59845.975734] ? percpu_up_write+0x30/0x30 2022-01-11 03:17:07 +08:00 <6> [59845.979679] __percpu_down_read+0x53/0x60 2022-01-11 03:17:07 +08:00 <6> [59845.983704] mnt_want_write+0x60/0x80 2022-01-11 03:17:07 +08:00 <6> [59845.987386] filename_create+0x5f/0x160 2022-01-11 03:17:07 +08:00 <6> [59845.991238] do_mkdirat+0x49/0x170 2022-01-11 03:17:07 +08:00 <6> [59845.994652] do_syscall_64+0x2d/0x40 2022-01-11 03:17:07 +08:00 <6> [59845.998248] entry_SYSCALL_64_after_hwframe+0x44/0xa9 2022-01-11 03:17:07 +08:00 <6> [59846.003311] RIP: 0033:0x7f80307d52f7 2022-01-11 03:17:07 +08:00 <6> [59846.006898] RSP: 002b:00007fff1c4458c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000053 2022-01-11 03:17:07 +08:00 <6> [59846.014480] RAX: ffffffffffffffda RBX: 0000000001135008 RCX: 00007f80307d52f7 2022-01-11 03:17:07 +08:00 <6> [59846.021636] RDX: 00007fff1c4458f6 RSI: 00000000000001fd RDI: 00007fff1c4458e0 2022-01-11 03:17:07 +08:00 <6> [59846.028791] RBP: 00007fff1c445a00 R08: 00000000004adf48 R09: 0000000000000004 2022-01-11 03:17:07 +08:00 <6> [59846.035941] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000418e40 2022-01-11 03:17:07 +08:00 <6> [59846.043091] R13: 00007fff1c446830 R14: 0000000000000000 R15: 0000000000000000 2022-01-11 03:17:07 +08:00 <3> [59846.050452] INFO: task manager_svc:12583 blocked for more than 123 seconds. 2022-01-11 03:17:07 +08:00 <3> [59846.057444] Tainted: P U O 5.10.60-qnap #1 2022-01-11 03:17:07 +08:00 <3> [59846.063225] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2022-01-11 03:17:07 +08:00 <6> [59846.071064] task:manager_svc state:D stack: 0 pid:12583 ppid: 11331 flags:0x00000000 2022-01-11 03:17:07 +08:00 <6> [59846.079433] Call Trace: 2022-01-11 03:17:07 +08:00 <6> [59846.081935] __schedule+0x1dd/0x610 2022-01-11 03:17:07 +08:00 <6> [59846.085439] schedule+0x41/0xa0 2022-01-11 03:17:07 +08:00 <6> [59846.088600] percpu_rwsem_wait+0xd3/0x180 2022-01-11 03:17:07 +08:00 <6> [59846.092625] ? percpu_up_write+0x30/0x30 2022-01-11 03:17:07 +08:00 <6> [59846.096565] __percpu_down_read+0x53/0x60 2022-01-11 03:17:07 +08:00 <6> [59846.100586] mnt_want_write+0x60/0x80 2022-01-11 03:17:07 +08:00 <6> [59846.104269] path_openat+0x42e/0xeb0 2022-01-11 03:17:07 +08:00 <6> [59846.107860] ? path_lookupat+0x77/0x120 2022-01-11 03:17:07 +08:00 <6> [59846.111710] ? filename_lookup+0x108/0x1a0 2022-01-11 03:17:07 +08:00 <6> [59846.115820] ? rcu_sched_clock_irq+0x2b0/0x770 2022-01-11 03:17:07 +08:00 <6> [59846.120286] do_filp_open+0x94/0x110 2022-01-11 03:17:07 +08:00 <6> [59846.123882] ? _copy_to_user+0x1c/0x30 2022-01-11 03:17:07 +08:00 <6> [59846.127645] ? cp_new_stat+0x154/0x190 2022-01-11 03:17:07 +08:00 <6> [59846.131411] ? _cond_resched+0x10/0x20 2022-01-11 03:17:07 +08:00 <6> [59846.135173] ? do_sys_openat2+0x1f3/0x390 2022-01-11 03:17:07 +08:00 <6> [59846.139203] do_sys_openat2+0x1f3/0x390 2022-01-11 03:17:07 +08:00 <6> [59846.143057] do_sys_open+0x55/0x70 2022-01-11 03:17:07 +08:00 <6> [59846.146479] do_syscall_64+0x2d/0x40 2022-01-11 03:17:07 +08:00 <6> [59846.150083] entry_SYSCALL_64_after_hwframe+0x44/0xa9 2022-01-11 03:17:07 +08:00 <6> [59846.155166] RIP: 0033:0x7f681e660d19 2022-01-11 03:17:07 +08:00 <6> [59846.158756] RSP: 002b:00007f681618d600 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 2022-01-11 03:17:07 +08:00 <6> [59846.166340] RAX: ffffffffffffffda RBX: 00007f67ec01b7e0 RCX: 00007f681e660d19 2022-01-11 03:17:07 +08:00 <6> [59846.173492] RDX: 0000000000000442 RSI: 000055abeb6160a0 RDI: 00000000ffffff9c 2022-01-11 03:17:07 +08:00 <6> [59846.180653] RBP: 000055abeb4122a3 R08: 0000000000000000 R09: 000055abeb4122a4 2022-01-11 03:17:07 +08:00 <6> [59846.187803] R10: 00000000000001b6 R11: 0000000000000293 R12: 0000000000001000 2022-01-11 03:17:07 +08:00 <6> [59846.194956] R13: 000055abeb4122a2 R14: 0000000000000000 R15: 0000000000000194 2022-01-11 03:17:07 +08:00 <3> [59846.202347] INFO: task WTJourn.Flusher:14439 blocked for more than 123 seconds. 2022-01-11 03:17:07 +08:00 <3> [59846.209695] Tainted: P U O 5.10.60-qnap #1 2022-01-11 03:17:07 +08:00 <3> [59846.215460] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2022-01-11 03:17:07 +08:00 <6> [59846.223303] task:WTJourn.Flusher state:D stack: 0 pid:14439 ppid: 11331 flags:0x00000000 2022-01-11 03:17:07 +08:00 <6> [59846.231674] Call Trace: 2022-01-11 03:17:07 +08:00 <6> [59846.234147] __schedule+0x1dd/0x610 2022-01-11 03:17:07 +08:00 <6> [59846.237654] schedule+0x41/0xa0 2022-01-11 03:17:07 +08:00 <6> [59846.240844] percpu_rwsem_wait+0xd3/0x180 2022-01-11 03:17:07 +08:00 <6> [59846.244874] ? percpu_up_write+0x30/0x30 2022-01-11 03:17:07 +08:00 <6> [59846.248811] __percpu_down_read+0x53/0x60 2022-01-11 03:17:07 +08:00 <6> [59846.252839] vfs_write+0x2ad/0x2e0 2022-01-11 03:17:07 +08:00 <6> [59846.256255] ksys_pwrite64+0x61/0x90 2022-01-11 03:17:07 +08:00 <6> [59846.259850] do_syscall_64+0x2d/0x40 2022-01-11 03:17:07 +08:00 <6> [59846.263442] entry_SYSCALL_64_after_hwframe+0x44/0xa9 2022-01-11 03:17:07 +08:00 <6> [59846.268511] RIP: 0033:0x7f1b2aa8b04f 2022-01-11 03:17:07 +08:00 <6> [59846.272096] RSP: 002b:00007f1b1f75e410 EFLAGS: 00000293 ORIG_RAX: 0000000000000012 2022-01-11 03:17:07 +08:00 <6> [59846.279679] RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007f1b2aa8b04f 2022-01-11 03:17:07 +08:00 <6> [59846.286829] RDX: 0000000000000080 RSI: 000055edd75b8000 RDI: 0000000000000012 2022-01-11 03:17:07 +08:00 <6> [59846.293977] RBP: 000055edd75b8000 R08: 0000000000000000 R09: 0000000000000000 2022-01-11 03:17:07 +08:00 <6> [59846.301120] R10: 00000000000f6100 R11: 0000000000000293 R12: 0000000000000080 2022-01-11 03:17:07 +08:00 <6> [59846.308265] R13: 00000000000f6100 R14: 000055edd4d50b80 R15: 00000000000f6100 2022-01-11 03:17:07 +08:00 <3> [59846.315562] INFO: task ftdc:14484 blocked for more than 123 seconds. 2022-01-11 03:17:07 +08:00 <3> [59846.321938] Tainted: P U O 5.10.60-qnap #1 2022-01-11 03:17:07 +08:00 <3> [59846.327697] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 2022-01-11 03:17:07 +08:00 <6> [59846.335541] task:ftdc state:D stack: 0 pid:14484 ppid: 11331 flags:0x00000000 2022-01-11 03:17:07 +08:00 <6> [59846.343906] Call Trace: 2022-01-11 03:17:07 +08:00 <6> [59846.346372] __schedule+0x1dd/0x610 2022-01-11 03:17:07 +08:00 <6> [59846.349910] schedule+0x41/0xa0 2022-01-11 03:17:07 +08:00 <6> [59846.353076] percpu_rwsem_wait+0xd3/0x180 2022-01-11 03:17:07 +08:00 <6> [59846.357103] ? percpu_up_write+0x30/0x30 2022-01-11 03:17:07 +08:00 <6> [59846.361039] __percpu_down_read+0x53/0x60 2022-01-11 03:17:07 +08:00 <6> [59846.365064] mnt_want_write+0x60/0x80 2022-01-11 03:17:07 +08:00 <6> [59846.368753] path_openat+0x42e/0xeb0 2022-01-11 03:17:07 +08:00 <6> [59846.372346] ? release_pages+0x23e/0x300 2022-01-11 03:17:07 +08:00 <6> [59846.376284] do_filp_open+0x94/0x110 2022-01-11 03:17:07 +08:00 <6> [59846.379879] ? do_sys_openat2+0x1f3/0x390 2022-01-11 03:17:07 +08:00 <6> [59846.383914] do_sys_openat2+0x1f3/0x390 2022-01-11 03:17:07 +08:00 <6> [59846.387769] ? hrtimer_interrupt+0x14d/0x260 2022-01-11 03:17:07 +08:00 <6> [59846.392054] do_sys_open+0x55/0x70 2022-01-11 03:17:07 +08:00 <6> [59846.395487] do_syscall_64+0x2d/0x40 2022-01-11 03:17:07 +08:00 <6> [59846.399081] entry_SYSCALL_64_after_hwframe+0x44/0xa9 2022-01-11 03:17:07 +08:00 <6> [59846.404144] RIP: 0033:0x7f1b2a797d19 2022-01-11 03:17:07 +08:00 <6> [59846.407730] RSP: 002b:00007f1b1c7584b0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 2022-01-11 03:17:07 +08:00 <6> [59846.415311] RAX: ffffffffffffffda RBX: 000055edd9aadb40 RCX: 00007f1b2a797d19 2022-01-11 03:17:07 +08:00 <6> [59846.422460] RDX: 0000000000000241 RSI: 000055edd4b189e0 RDI: 00000000ffffff9c 2022-01-11 03:17:07 +08:00 <6> [59846.429603] RBP: 000055edd0d967dc R08: 0000000000000000 R09: 000055edd0d967dd 2022-01-11 03:17:07 +08:00 <6> [59846.436748] R10: 00000000000001b6 R11: 0000000000000293 R12: 0000000000000004 2022-01-11 03:17:07 +08:00 <6> [59846.443892] R13: 000055edd0d967db R14: 000055edd1b76048 R15: 000055edd1b75a10 ``` 首先是几条连续的`ext2 filesystem being mounted at /tmp/nasconfig_tmp supports timestamps until 2038`信息,表示连续进行了一些`mount`,然后是若干程序严重卡住的信息(blocked for more than 123 seconds),以及一些相关的堆栈、十六进制信息等。 ``` INFO: task manager_svc:12583 blocked for more than 123 seconds. Tainted: P U O 5.10.60-qnap #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message ``` 容易查出来`blocked for more than 123 seconds`是内存脏数据太多,来不及写到磁盘而撑爆缓存的消息,说明此时突然出现了大量的IO或者内存占用,但`123 seconds`说明这已经是死机两分钟之后了。而这个条消息与上面的大量`mount`之间似乎没有有用信息了。 **其实还有,但我当时没有想到。** 当时我想的是,书到用时方恨少,也许可以通过一些调试办法找原因,但是太难了。`crontab`也没发现与这个时间相关联的东西。 ## 外援 结果威联通的回复比我想的快,午饭过后就看到了邮件,很直白的让我: ``` > 把区域设置修改为“全球” > 禁用再启用一下“Malware Remover” > 再把区域设置改回去 ``` 这个自信的语气听起来很靠谱,我赶紧去找被遗忘了的“Malware Remover”,先打开看看,发现打开后居然是个错误页面,这就更有谱了,照邮件里面说的做一遍,发现“Malware Remover”能打开了,显示我设置了一个每晚一点钟的的病毒扫描。 试着手动执行了一次错误扫描,没出问题,没有十分卡顿。再去生成一份详细日志,发现最新日志里面出现了一系列的`ext2 filesystem being mounted at /tmp/nasconfig_tmp supports timestamps until 2038 (0x7fffffff)`,但没有后续的IO问题报告。 并且,不知道是不是心理作用,总感觉响应比前几天灵敏了一些。 就这样过了24小时,没有再次死机,功能都正常,这个问题算是解决了。 复盘的话,大概是`Malware Remover`在最近的某次更新里面出了什么问题,同时,设定在一点的扫描在遇到其他定时任务时会自动让路,或者由于更新的问题导致`Malware Remover`在夜里三四点更新后会再扫描一次,最终导致了这一系列似乎随机又似乎有规律的死机事件。 ## 失误总结 以下失误,都是从当时的信息出发应该能够做到的,但没有做到,拖延了排错进程。 1. 寻找定时任务的时候,没有检查`Malware Remover`,否则应该可以更早发现这个APP有问题——不过试了一下,从那个错误信息也搜不到特别有用的东西; 2. 发现“威联通挖矿病毒”的消息的时候,本应该手动拿`Malware Remover`查一下,但实际上只是按消息内容确认了没有可疑文件;当然这里有威联通的锅,在应用商店卖`McAfee`,大家当然本能觉得`Malware Remover`不是什么好货。 3. 发现死机与连串`ext2 filesystem being mounted`的相关性后,在日志中搜`"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message`发现特异性很强,只出现了四组,而`ext2 filesystem being mounted`则散布的比较多,就没有深入挖掘。实际上,`ext2 filesystem being mounted`虽然很多,但连串的则只有在`Malware Remover`运行时才能发现,大概每天一两次。如果进一步按连串消息的出现时间去对应系统事件日志,应该可以发现这种日志代表`Malware Remover`的后台运行。 4. 但是,即使发现了问题与`Malware Remover`相关,网上搜不到也想不到改区域-禁用-启用-改回来这种骚操作。下次一有问题就立即叫技术支持,劳资付过钱的。 ## 淦! 刚才写完文章,去看了一眼威联通的官网,结果居然在页面顶部看到一个红色弹窗,没有标日期,但肯定是最近的(否则不会在其他不相干页面弹窗): ``` 如何解決 Malware Remover 服務更新所造成的 NAS 連線問題 近日 Malware Remover 服務更新導致部分 QNAP NAS 出現連線異常問題。使用者如於 Malware Remover 的設定選項中啟用「自動更新」將受此問題影響。我們已經再次調整 Malware Remover 服務並修正此問題。 如您的 NAS 完成 Malware Remover 服務更新後發生停止回應或是無法存取的問題,請依照以下步驟手動重啟您的 NAS: 持續按壓電源鍵直到 NAS 完成斷電關機 再次按壓電源鍵重新啟動 NAS 受影響的 NAS 經過手動重啟後即可恢復正常運作。如果仍有異常情況或疑問,請聯繫 QNAP 技術支援尋求協助。 ``` 威联通你二大爷…… 不过即使我早看到了这个通知,强关再打开这种操作仍然解决不了我的问题,还是需要改区域-禁用再启用-再把区域改回来这种骚操作。之于为啥不把这个操作放到通知里面,大概是因为太丢人吧……