A faster way to read files

beckheng

UID: 3408
Posts: 87
Points: 200
Online time: 10 hours

#1 beckheng posted on 2006-11-26 14:30

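A faster way to read files.

I use this method to analyze web logs: the file is read in 8192-byte blocks with sysread, and the log regex is run over each block of complete lines instead of once per line.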

[quote]
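#!/usr/bin/perl

use strict;
use warnings;

my $r_count   = 0;
my $file      = shift or die "usage: $0 access_log\n";
my %page_view = ();

process_file();

# Print the per-day page view counts collected by process_buffer.
for my $day (sort keys %page_view) {
    print "$day\t$page_view{$day}\n";
}

sub process_file {
    my $blksize = 8192;
    my $pre_buf = '';    # partial last line carried over from the previous block

    open my $fh, '<', $file or die "open file for reading error: $!\n";
    while (1) {
        my $buf;
        my $n = sysread $fh, $buf, $blksize;
        die "system read error: $!\n" unless defined $n;    # undef = read error
        last if $n == 0;                                     # 0 = end of file

        $buf = "$pre_buf$buf";

        # Cut after the last newline: complete lines are processed now,
        # the trailing fragment is kept for the next block.
        my $cut = rindex($buf, "\n") + 1;    # 0 if the block has no newline
        $pre_buf = substr($buf, $cut);
        $buf     = substr($buf, 0, $cut);

        process_buffer(\$buf) if length $buf;
    }
    # The last line of the file may not end with a newline.
    process_buffer(\$pre_buf) if length $pre_buf;

    close $fh;
}

sub process_buffer {
    my $buf_ref = shift;

    # Per-line processing: /m makes ^ match at every line start and /g walks
    # through all status-200 GET/POST requests in the block.
    while ($$buf_ref =~ m!^(\d+\.\d+\.\d+\.\d+).*?\[(\d{2}\/\w{3}\/\d{4}).*?"(?:GET|POST) (.*) HTTP/1\.[01]" 200 .*? "(.*?)"!gm) {
        print STDERR "$r_count " if ($r_count++ % 200000 == 0);    # progress
        my $t_ip    = $1;
        my $t_d     = $2;    # e.g. 26/Nov/2006
        my $t_url   = $3;
        my $t_refer = $4;
        if ($t_url =~ /(bj|sz)\.html$/i) {
            my $ad = $1;
            # Needs a %mon_no table mapping month names to numbers:
            #$t_d =~ s/(\d+)\/(\w+)\/(\d+)/"$3\/$mon_no{$2}\/$1"/e;
            $t_url =~ s/^\/+//;
            $t_url =~ s/\?.*$//;
            $page_view{$t_d}++;
        }
    }
}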
[/quote]
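The heart of process_file is reading fixed-size blocks with sysread and carrying the incomplete last line over to the next block. Below is a minimal standalone sketch of just that step, assuming a hypothetical helper named for_each_line_block (not from the original post) and a temporary demo file; a deliberately tiny block size of 4 bytes makes lines straddle block boundaries so the carry-over logic is actually exercised.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the original post): read $path in blocks of
# $blksize bytes with sysread and pass $callback a reference to a string that
# holds only complete lines. The partial line at the end of each block is
# carried over and prepended to the next block.
sub for_each_line_block {
    my ($path, $blksize, $callback) = @_;
    open my $fh, '<', $path or die "open $path: $!\n";
    my $carry = '';
    while (1) {
        my $buf;
        my $n = sysread $fh, $buf, $blksize;
        die "sysread $path: $!\n" unless defined $n;
        last if $n == 0;                        # end of file
        $buf = $carry . $buf;
        my $cut = rindex($buf, "\n") + 1;       # 0 when no newline seen yet
        $carry = substr($buf, $cut);
        $buf   = substr($buf, 0, $cut);
        $callback->(\$buf) if length $buf;
    }
    $callback->(\$carry) if length $carry;      # last line without "\n"
    close $fh;
}

# Demo: a tiny block size forces lines to straddle block boundaries.
my $content = "first line\nsecond\n\nthird line is longer\nlast without newline";
my ($tfh, $tpath) = tempfile(UNLINK => 1);
print {$tfh} $content;
close $tfh;

my @blocks;
for_each_line_block($tpath, 4, sub { push @blocks, ${ $_[0] } });
print scalar(@blocks), " blocks, rejoined text matches: ",
      (join('', @blocks) eq $content ? "yes" : "no"), "\n";
```

Every block handed to the callback ends with a newline except possibly the last one, so a regex anchored with ^ under /m, as in process_buffer, never sees half a line.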


#2 beckheng posted on 2006-11-26 15:32

anthony, please share your test code; I'd like to try it and see whether it is any faster.


#3 beckheng posted on 2006-12-04 01:12

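I tested it with the code below, which benchmarks two ways of prepending $pre_buf to a string: interpolation (test1) and assignment to substr (test2).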
[quote]
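#!/usr/bin/perl

use strict;
use warnings;
use Benchmark qw(:all);

$| = 1;

# Usage: perl script REPEAT COUNT
#   REPEAT - length of $pre_buf, in copies of "aaaaa"
#   COUNT  - iterations per test for cmpthese (negative = CPU seconds)
die "usage: $0 REPEAT COUNT\n" unless @ARGV == 2;
my $pre_buf = "aaaaa" x shift;
my $count   = shift;

cmpthese($count, {test1 => \&test1, test2 => \&test2});

# test1: prepend by building a new string through interpolation.
sub test1{
    my $buf1 = "haha";
    $buf1 = "$pre_buf$buf1";
}

# test2: prepend in place by assigning to substr($buf2, 0, 0).
sub test2{
    my $buf2 = "haha";
    substr($buf2, 0, 0) = $pre_buf;
}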
[/quote]
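The two subs in this test differ only in how $pre_buf is prepended. A sketch that adds two more common variants (plain . concatenation and 4-argument substr, neither part of the original test) and first checks that every variant builds the same string before timing them; the fixed prefix of 1000 copies of "aaaaa" is an assumption. It prints the cmpthese table and makes no claim about which variant wins on a given machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Fixed prefix length instead of reading it from @ARGV (assumption for this sketch).
my $pre_buf = "aaaaa" x 1000;

# Each variant prepends $pre_buf to "haha" and returns the result.
my %variants = (
    interpolate   => sub { my $s = "haha"; $s = "$pre_buf$s"; $s },
    concat        => sub { my $s = "haha"; $s = $pre_buf . $s; $s },
    lvalue_substr => sub { my $s = "haha"; substr($s, 0, 0) = $pre_buf; $s },
    substr_4arg   => sub { my $s = "haha"; substr($s, 0, 0, $pre_buf); $s },
);

# Only compare timings once every variant is known to build the same string.
my $expected = $pre_buf . "haha";
for my $name (sort keys %variants) {
    die "$name built a different string\n" unless $variants{$name}->() eq $expected;
}

cmpthese(-1, \%variants);    # run each variant for at least 1 CPU second
```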