Why is IO::Socket::INET so much slower than LWP::UserAgent?


Test script: test.pl

#!/usr/bin/perl


use strict;
use IO::Socket;
use LWP::UserAgent;

my $host = "bbs.chinaunix.net";
my $html = "";
my $start = "";
my $end = "";

my $ua = LWP::UserAgent -> new;
$ua -> timeout( 30 );
$ua -> agent( 'Mozilla/4.76 [en] (Win98; U)' );
$ua -> default_header( 'Pragma' => 'no-cache', 'Accept' => '*/*');

$start = time;
print "LWP strat " . $start . "\n";
my $resp = $ua -> get( "http://$host/" );
if ( $resp -> is_success )
    {
    $html = $resp->status_line ."\n\n";
    $html .= $resp->headers_as_string ."\n\n";
    $html .= $resp -> content;
    }
$end = time;
print "LWP end " . $end . "\n";
print "LWP time = " . ($end - $start) . "\n\n";

$html = "";
my $geturl=qq~GET / HTTP/1.1
Accept: */*
User-Agent: Mozilla/4.76 [en] (Win98; U)
Host: $host
Connection: Keep-Alive

~;

$start = time;
print "IO::Socket::INET strat " . $start . "\n";

my $socket = IO::Socket::INET->new(Proto=>"tcp", PeerAddr=>"$host", PeerPort=>80, Timeout => 30);
#$socket->autoflush(1);

my @html;
if($socket){
print $socket "$geturl";
print "IO::Socket::INET time1 " . time . "\n";
@html=<$socket>; # Note this line: why does it take so long?

print "IO::Socket::INET time2 " . time . "\n";
close($socket);
}else{
}
$html = join("",@html);
$end = time;
print "IO::Socket::INET end " . $end . "\n";
print "IO::Socket::INET time = " . ($end - $start) . "\n\n";


Output:

C:\>perl test.pl
LWP strat 1192833431
LWP end 1192833432
LWP time = 1

IO::Socket::INET strat 1192833432
IO::Socket::INET time1 1192833432
IO::Socket::INET time2 1192833448
IO::Socket::INET end 1192833448
IO::Socket::INET time = 16

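Why is the difference so large?

As the timestamps show, almost all of the time is spent on this one line:
@html=<$socket>; # Note this line: why does it take so long?


Can this be optimized?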






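I have improved it, but there is one big problem: the last part of the data is missing. Over http (port 80) it works, but over https (port 443) it does not. I have confirmed that the Content-Length value is correct. Can anyone see what the cause is?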

Improved version:
#!/usr/bin/perl


use strict;
use IO::Socket;

my $html = "";
my $geturl=qq~GET / HTTP/1.1
Accept: */*
User-Agent: Mozilla/4.76 [en] (Win98; U)
Host: www.paycenter.com.cn
Connection: Keep-Alive

~;

use IO::Socket::SSL;
my $socket = IO::Socket::SSL->new(Proto=>"tcp", PeerAddr=>"www.paycenter.com.cn", PeerPort=>443, Timeout => 30);
$socket->autoflush(1);

if($socket){
print $socket "$geturl";
my $headers = "";
my $length = 0;
# Read the header block line by line and pick out Content-Length
while (<$socket>) {
    last if /^\015?\012/;
    $headers .= $_;
    if ($_ =~ /Content\-Length\:\s([0-9]+)/) { $length = $1; }
}
$headers =~ s/\015?\012[ \t]+/ /g;
print "length = $length\n";
#$socket->read ($html, $length, 0);       # neither read nor sysread returns the complete body

$socket->sysread ($html, $length, 0);
$socket->close ();
}else{
}





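Because when you read the socket yourself, you block until the server closes the connection, and with Keep-Alive the server does not close it right away.

LWP, on the other hand, reads exactly Content-Length bytes and returns when the response headers include that field. When they don't, it has its own timeout mechanism: if no more data arrives within a few seconds, it treats the response as complete.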
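
The Content-Length approach described above can be sketched as follows. This is only an illustration, not how LWP is actually implemented: `read_response` is a hypothetical helper name, the in-memory filehandle stands in for the socket, and chunked transfer encoding is not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: read one HTTP response from $fh.
# Returns the raw header block and the body.
sub read_response {
    my ($fh) = @_;
    my ($headers, $length) = ('', undef);

    # The header block ends at the first empty line.
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^\015?\012$/;
        $headers .= $line;
        $length = $1 if $line =~ /^Content-Length:\s*(\d+)/i;
    }

    my $body = '';
    if (defined $length) {
        # read() may return fewer bytes than requested, so loop
        # until the whole body has arrived.
        while (length($body) < $length) {
            my $n = read($fh, $body, $length - length($body), length($body));
            last unless $n;    # undef on error, 0 on EOF
        }
    } else {
        # No Content-Length: fall back to reading until the peer closes.
        local $/;
        my $rest = <$fh>;
        $body = $rest if defined $rest;
    }
    return ($headers, $body);
}

# Usage: an in-memory handle plays the role of the socket (assumption
# for the demo). The trailing "EXTRA" must not end up in the body.
my $raw = "HTTP/1.1 200 OK\015\012Content-Length: 5\015\012\015\012helloEXTRA";
open(my $fh, '<', \$raw) or die "open: $!";
my ($headers, $body) = read_response($fh);
print "body = $body\n";
```

A simpler alternative, assuming the server honors it, is to send `Connection: close` instead of `Connection: Keep-Alive`, so that the server closes the socket after the response and `<$socket>` returns promptly.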
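After I changed the code to get Content-Length first and then read that many bytes, it worked. But with IO::Socket::SSL (accessing https), the data read back according to Content-Length is always missing part of the end. Why?
Following!
Does SSL encryption affect this? Is Content-Length the original length of the page on the server, or some other value?
Perhaps it is as the poster above says; I haven't tried it.

I suggest you stop reading when you reach </html>.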


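QUOTE:
Originally posted by formalin14 on 2007-10-23 10:08
Perhaps it is as the poster above says; I haven't tried it.

I suggest you stop reading when you reach </html>.

That's not a good idea. It's easy to read the wrong data that way, and what if the response is a binary file?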
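
A likely cause of the truncated HTTPS body (an assumption; nobody in the thread confirms it) is that a single sysread call returns only the data available at that moment, which over SSL is at most one record, rather than the full number of bytes requested. Content-Length counts the decrypted body bytes, so encryption itself does not change it. A minimal sketch of reading in a loop until the requested length is reached, where `read_exactly` is a hypothetical helper name and a pipe stands in for the SSL socket:

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: keep calling sysread until $want bytes have
# arrived or the peer closes the connection.
sub read_exactly {
    my ($fh, $want) = @_;
    my $buf = '';
    while (length($buf) < $want) {
        # Append at the current end of $buf, asking only for what is missing.
        my $n = $fh->sysread($buf, $want - length($buf), length($buf));
        die "read error: $!" unless defined $n;
        last if $n == 0;    # EOF before the full length arrived
    }
    return $buf;
}

# Demo: a pipe plays the role of the socket (assumption for the demo).
pipe(my $reader, my $writer) or die "pipe: $!";
syswrite($writer, 'abc');
syswrite($writer, 'defgh');
close($writer);

my $data = read_exactly($reader, 8);
print length($data), " bytes read\n";
```

In the improved script, the single `$socket->sysread ($html, $length, 0);` call would become something like `$html = read_exactly($socket, $length);`. Unlike scanning for `</html>`, this also works for binary responses.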