A Complete Record of Troubleshooting a Sudden Surge in Linux Disk Usage
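1. One afternoon, users reported that they could not print files to the network printer from one particular server. Other machines printed normally.
Symptom: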
lpr: error - unable to print file: client-error-request-value-too-long
2. Possible causes of this error:
1. Are you trying to print a file >2GB? If so, that doesn't
work in CUPS 1.1.x and earlier.
2. Does the RequestRoot directory (/var/spool/cups by default)
exist? If not, "mkdir /var/spool/cups"
3. Does the TempDir directory (/var/spool/cups/tmp by default)
exist? If not, "mkdir /var/spool/cups/tmp"
4. Is the disk full? "df -k /var/spool/cups" will show if
this is the case. If the disk is full, delete files to
free up disk space.
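The four checks above can be scripted. The following is a minimal bash sketch, assuming GNU coreutils (stat -c, df -P) and the default CUPS paths listed above; the function name check_cups_print is hypothetical and not part of CUPS.

```shell
# Hypothetical helper: run the four checks above for one file (bash, GNU coreutils).
check_cups_print() {
    local file=$1
    local request_root=${2:-/var/spool/cups}   # default RequestRoot
    local temp_dir=${3:-$request_root/tmp}     # default TempDir
    local status=0 size use dir

    # 1. CUPS 1.1.x and earlier cannot print files larger than 2 GB.
    size=$(stat -c %s -- "$file") || return 2
    if [ "$size" -gt $((2 * 1024 * 1024 * 1024)) ]; then
        echo "WARN: $file is larger than 2 GB"
        status=1
    fi

    # 2./3. RequestRoot and TempDir must exist.
    for dir in "$request_root" "$temp_dir"; do
        if [ ! -d "$dir" ]; then
            echo "WARN: missing directory $dir"
            status=1
        fi
    done

    # 4. The filesystem holding RequestRoot must not be full.
    #    df -P keeps each entry on one line; field 5 is "Use%".
    if [ -d "$request_root" ]; then
        use=$(df -Pk -- "$request_root" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
        if [[ $use =~ ^[0-9]+$ ]] && [ "$use" -ge 100 ]; then
            echo "WARN: filesystem holding $request_root is ${use}% full"
            status=1
        fi
    fi

    [ "$status" -eq 0 ] && echo "OK"
    return "$status"
}
```

For example, check_cups_print /path/to/file prints OK, or one WARN line per failed check, and returns non-zero if any check fails.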
3. Concluded that cause 4 was to blame:
df -h
showed /var at 100% used.
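Reading df -h by eye is enough here. As a sketch, the same check can be scripted to list only the filesystems at or above a usage threshold; the function name full_filesystems and its default threshold of 90% are assumptions for illustration.

```shell
# Hypothetical helper: print "use% mountpoint" for each filesystem at or
# above a usage threshold (default 90).
full_filesystems() {
    local threshold=${1:-90}
    # -P keeps each filesystem on one line; skip the header and any row whose
    # Use% column is not numeric (some pseudo-filesystems show "-").
    df -P | awk -v t="$threshold" 'NR > 1 && $5 ~ /^[0-9]+%$/ {
        use = $5; sub(/%/, "", use)
        if (use + 0 >= t) print use "%", $6
    }'
}
```

On the server in this story, full_filesystems 95 would have printed a line for /var.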
4. Locating the space hog
cd /var
find . -maxdepth 1 -type d -print0 | xargs -0 du -sk | sort -rn
This turned up the culprit:
/var/log/bandwidth was taking up 19 GB.
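The du pipeline above ranks directories. To go straight to individual large files such as /var/log/bandwidth, something along these lines could be used; it is a sketch assuming GNU find (for -printf), and largest_files is a hypothetical name.

```shell
# Hypothetical helper: list the N largest regular files under a directory,
# without crossing into other mounted filesystems.
largest_files() {
    local dir=${1:-/var} n=${2:-10}
    # -xdev: stay on the filesystem that holds $dir.
    # %s = size in bytes, %p = path (GNU find).
    find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null |
        sort -rn | head -n "$n"
}
```

For example, largest_files /var 5 lists the five biggest files on the /var filesystem, with sizes in bytes.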
5. Deleted it
rm -rf /var/log/bandwidth
6. The space still was not freed.
7. Checked CPU usage: a rotate process was using a lot of CPU, so I killed it:
kill -9 <PID>
8. Everything returned to normal and printing worked again.
9. Kept looking for the root cause.
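Finding the process that is burning CPU, as in step 7, can be done with ps. A minimal sketch, assuming the procps version of ps (its --sort option is not available in every ps implementation); top_cpu is a hypothetical name. The PID in the first column is what kill -9 needs.

```shell
# Hypothetical helper: show the header plus the N processes using the most CPU.
top_cpu() {
    local n=${1:-5}
    # procps ps: pid, %CPU, elapsed time, command name, sorted by %CPU descending.
    # awk (rather than head) reads all input, so ps never gets SIGPIPE.
    ps -eo pid,pcpu,etime,comm --sort=-pcpu | awk -v n="$n" 'NR <= n + 1'
}
```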
find /home -type f -ls | perl -e 'while(<>){$s+=(split)[6];};print "$s\n";'
This sums the sizes of all files, including hidden ones and those in lost+found.
The total should be close to how much space is used on disk.
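The perl one-liner adds up field 7 of find -ls, which is the file size in bytes. An equivalent sketch without perl, assuming GNU find's -printf; sum_file_bytes is a hypothetical name. Like the original, it measures apparent size, so sparse files can make it differ from the block counts that du and df report.

```shell
# Hypothetical helper: total apparent size (bytes) of all regular files under
# a directory, hidden files included. %s is the size in bytes (GNU find).
sum_file_bytes() {
    find "$1" -type f -printf '%s\n' 2>/dev/null |
        awk '{ s += $1 } END { printf "%.0f\n", s }'   # %.0f avoids 1e+10-style output
}
```

For example, sum_file_bytes /home prints a single byte count for the whole tree.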
If df shows something very different, you may want to run something like:
lsof | grep home
and look for suspicious applications.
The idea is that if an application opens a file and the file is removed
while the application keeps it open, the data is not actually removed from
disk until the program closes the file or exits.
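On Linux the kernel exposes each open descriptor as a symlink under /proc/PID/fd, and the link target ends in " (deleted)" once the file has been removed. The sketch below uses that to list deleted-but-open files; lsof +L1 gives a similar list where lsof is installed. It needs root to see other users' processes, and deleted_open_files is a hypothetical name.

```shell
# Hypothetical helper: list descriptors that still point at deleted files (Linux /proc).
deleted_open_files() {
    local fd target
    for fd in /proc/[0-9]*/fd/*; do
        # Processes may exit or be unreadable; skip anything readlink can't resolve.
        target=$(readlink -- "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}
```

Run as root during the incident, this would have shown the descriptor still holding /var/log/bandwidth.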
My advice:
1. Boot into single-user mode, preferably from a rescue disk, so you have a
'clean' kernel.
2. Compare the du/find output against df.
3. If they differ, run fsck with -f and, if you can afford to wait, -c.
4. du/find and df should now show similar numbers. (If they don't and
you booted a 'clean' kernel, I'd be interested in the details.)
5. Reboot with the normal kernel.
6. If df still differs from du, you have most likely been hacked:
replace the OS with a clean copy and try to find which programs differ
from the originals.
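Step 2 of the advice, comparing du/find against df, can be sketched as a small function. It assumes GNU du and df; compare_du_df is a hypothetical name, and the argument should be the top of the filesystem (for example /var if it is a separate mount) for the two numbers to be comparable. A small difference is expected from filesystem metadata; a difference of many gigabytes points to space held by deleted-but-open files.

```shell
# Hypothetical helper: compare space reachable by name (du) with space in use
# according to the filesystem (df), both in KiB.
compare_du_df() {
    local mnt=$1 du_kb df_kb
    # du -x: stay on this filesystem; -s: summary only; -k: KiB.
    du_kb=$(du -xsk -- "$mnt" 2>/dev/null | awk '{ print $1 }')
    # df -P: one line per filesystem; field 3 is "Used".
    df_kb=$(df -Pk -- "$mnt" | awk 'NR == 2 { print $3 }')
    echo "du: ${du_kb} KiB  df: ${df_kb} KiB  diff: $(( df_kb - du_kb )) KiB"
}
```

For example, compare_du_df /var prints both totals and their difference on one line.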
In other words: if a process still has a deleted file open, the space is not freed until that process closes it or exits.
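Killing the process, as in step 7, is one way to release the space. A gentler alternative, sketched here under the assumption of Linux /proc, is to truncate the deleted file through the owning process's descriptor so the process keeps running. truncate_deleted_fd is a hypothetical name; the PID and descriptor number would come from a listing such as lsof +L1.

```shell
# Hypothetical helper: reclaim the space of a deleted-but-open file without
# killing its owner, by truncating it through /proc/PID/fd/N.
truncate_deleted_fd() {
    local pid=$1 fd=$2
    local link=/proc/$pid/fd/$fd
    case $(readlink -- "$link") in
        # Opening the /proc link with ">" truncates the underlying file to 0 bytes;
        # the process keeps its descriptor open.
        *' (deleted)') : > "$link" ;;
        *) echo "not a deleted file: $link" >&2; return 1 ;;
    esac
}
```

If the process writes without O_APPEND it continues at its old offset, leaving a sparse file, so this buys time rather than replacing a proper restart of the process.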
10. Checking /etc/syslog.conf turned up this line:
kern.=debug -/var/log/bandwidth
It had been added automatically when Webmin was installed.
Commenting it out solved the problem for good.
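The fix in step 10 can be scripted. A minimal sketch, assuming GNU sed (for -i and \+ in a basic regular expression); disable_bandwidth_log is a hypothetical name, and it keeps a .bak copy before editing.

```shell
# Hypothetical helper: comment out the kern.=debug rule that sends kernel
# debug messages to /var/log/bandwidth.
disable_bandwidth_log() {
    local conf=${1:-/etc/syslog.conf}
    cp -p -- "$conf" "$conf.bak" || return 1
    # Only an uncommented line starting with the rule matches, so running this
    # twice does not add a second '#'.
    sed -i 's|^\(kern\.=debug[[:space:]]\+-/var/log/bandwidth\)|#\1|' "$conf"
}
```

The running syslog daemon still has the old file open, so it has to be restarted or told to reload (the exact command depends on the distribution) before it stops writing.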