Can this code be optimized?
head test
221.122.57.34 - - [03/Aug/2007:14:37:51 +0800] "GET / HTTP/1.1" 200 1062
221.122.57.34 - - [03/Aug/2007:14:37:54 +0800] "GET /apache_pb.gif HTTP/1.1" 200 2326
221.122.57.34 - - [03/Aug/2007:14:37:59 +0800] "GET /fds HTTP/1.1" 404 285
221.122.57.34 - - [03/Aug/2007:14:38:00 +0800] "GET / HTTP/1.1" 200 1062
221.122.57.34 - - [03/Aug/2007:15:06:27 +0800] "GET /htdocs/ HTTP/1.1" 200 666
221.122.57.34 - - [03/Aug/2007:15:06:27 +0800] "GET /icons/blank.gif HTTP/1.1" 200 148
221.122.57.34 - - [03/Aug/2007:15:06:27 +0800] "GET /icons/back.gif HTTP/1.1" 200 216
221.122.57.34 - - [03/Aug/2007:15:06:27 +0800] "GET /icons/folder.gif HTTP/1.1" 200 225
221.122.57.34 - - [03/Aug/2007:15:06:28 +0800] "GET /htdocs/View/ HTTP/1.1" 200 930
221.122.57.34 - - [03/Aug/2007:15:06:28 +0800] "GET /icons/compressed.gif HTTP/1.1" 200 1038
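The task is to compute statistics from an Apache log. Requirement: for each IP, report the time of its first access, the time of its last access, and the number of connections from that IP.
For the log excerpt above, the result should be: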
221.122.57.34 2007-08-03 14:37:51 2007-08-03 15:06:28 10
I am thinking of using a structure like %hash=("221.122.57.34"=>["2007-08-03 14:37:51","2007-08-03 15:06:28",10],"221.122.39.23"=>["2007-08-13 18:37:51","2007-09-03 16:06:48",67],),
and wrote the following script:
#!/usr/bin/perl -w
use strict;
my %hash;
my $fh;
my $file=shift;
my @line;
my ($date,$date_seconds,$seconds);
open( $fh,$file);
while(<$fh>)
{
if(@line=$_=~ /(\d+\.\d+\.\d+\.\d+).*\[(\d+)\/(\w+)\/(\d+):(\d+:\d+:\d+).*/)
{
my ($ipadd,$day,$mon,$year,$time)=@line;
$date=qx/date -d "$day$mon$year" +%F/;
chomp $date;
$date_seconds=qx/date -d "$date $time" +%s/;
chomp $date_seconds;
if(exists $hash{$ipadd})
{
$hash{$ipadd}[2]++;
if($hash{$ipadd}[1] eq "NULL")
{
$seconds=qx/date -d "$hash{$ipadd}[0]" +%s/;
chomp $seconds;
if(($date_seconds - $seconds) > 0)
{
$hash{$ipadd}[1]="$date $time";
}
}
else
{
$seconds=qx/date -d "$hash{$ipadd}[1]" +%s/;
chomp $seconds;
if(($date_seconds -$seconds) > 0)
{
$hash{$ipadd}[1]="$date $time";
}
}
}
else
{
$hash{$ipadd}[0]="$date $time";
$hash{$ipadd}[1]="NULL";
$hash{$ipadd}[2]=1;
}
}
}
close($fh);
foreach (keys %hash)
{
print "\n$_\t$hash{$_}[0]\t$hash{$_}[1]\t$hash{$_}[2]\n";
}
When I run it, it is far too slow!
Apache log files can get very large. How should this be optimized for that case?
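
One likely cause of the slowness is that the script starts an external `date` process two or three times for every log line (the `qx/date .../` calls). Below is a minimal sketch of one way around that, not a definitive implementation: it converts the month name with a lookup table and compares timestamps as plain "YYYY-MM-DD HH:MM:SS" strings, so no subprocess is started. It assumes every line in the log uses the same timezone offset (as in the +0800 sample above). `summarize_log` is a hypothetical helper name introduced here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Month abbreviation -> two-digit month number, replacing the external `date` call.
my %MONTH = (
    Jan => '01', Feb => '02', Mar => '03', Apr => '04',
    May => '05', Jun => '06', Jul => '07', Aug => '08',
    Sep => '09', Oct => '10', Nov => '11', Dec => '12',
);

# summarize_log is a hypothetical helper, not part of the original script.
# It reads log lines from a filehandle and returns a hash reference of the form
# { ip => [first_seen, last_seen, hit_count] }.
sub summarize_log {
    my ($fh) = @_;
    my %hash;
    while (my $line = <$fh>) {
        # Captures: ip, day, month name, year, hh:mm:ss
        next unless $line =~ m{^(\d+\.\d+\.\d+\.\d+) .*?\[(\d+)/(\w+)/(\d+):(\d+:\d+:\d+)};
        my ($ip, $day, $mon, $year, $time) = ($1, $2, $3, $4, $5);
        my $mm = $MONTH{$mon} or next;    # skip lines with an unknown month name
        # "YYYY-MM-DD HH:MM:SS" sorts chronologically as a plain string
        # (assumption: all entries share one timezone offset).
        my $stamp = sprintf '%04d-%s-%02d %s', $year, $mm, $day, $time;
        my $rec = $hash{$ip} ||= [$stamp, $stamp, 0];
        $rec->[0] = $stamp if $stamp lt $rec->[0];    # earliest access
        $rec->[1] = $stamp if $stamp gt $rec->[1];    # latest access
        $rec->[2]++;                                  # number of requests
    }
    return \%hash;
}

# When run directly with a log file argument, print one line per IP.
# With no argument, nothing is printed.
if (!caller && @ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    my $stats = summarize_log($fh);
    close $fh;
    for my $ip (sort keys %$stats) {
        print join("\t", $ip, @{ $stats->{$ip} }), "\n";
    }
}

1;
```

Run against the ten sample lines above, this should report 2007-08-03 14:37:51 as the first access, 2007-08-03 15:06:28 as the last, and a count of 10 for 221.122.57.34. Memory use grows with the number of distinct IPs rather than with the size of the log, since the file is read one line at a time. If the log can mix timezone offsets, the string comparison would need to be replaced by a conversion to epoch seconds, for example with the core Time::Local module.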