关于下载二进制文件的问题?大侠们帮帮忙啊!
关于下载二进制文件的问题?大侠们帮帮忙啊!
1 前几天没事去pop看到一个帖子'http://pop.pcpop.com/t050418/1543497-1.html'
很多人回,想下载那个帖子里的动态图片,所以自己搞了一个小脚本,顺便熟悉一下LWP::UserAgent,
但是$response = $browser->get($1, ':content_cb' => \&callback );
:content_cb怎么用啊?感觉是下到内存里了,但是怎么做才能让他至少下到根目录呢?!我已经要崩溃了。
ps:我不想用WWW::Mechanize,TreeBuilder之类的。
大侠们指点一下吧,感激不禁
[quote]#!/usr/bin/perl -w
use strict; $|++;
use File::Basename;
use LWP 5.64;
open (URLS, ">urls.txt");
my %urls;
my @urls;
my $final_data; # our downloaded data.
my $total_size; # total size of the URL.
my $counter; my @animation = qw( \ | / - );
my $browser = LWP::UserAgent->new;
my $url = 'http://pop.pcpop.com/t050418/1543497-1.html';
my $response = $browser->get($url);die "Can't get $url -- ", $response->status_line
unless $response->is_success;
my $urls = $response->content;
while( $urls =~ m@<a href=\'(/t050418/1543497-\d+.html)\'>@gi ) {
print "$1\n";
#print URLS URI->new_abs( $1, $response->base ) ,"\n";
$urls{URI->new_abs( $1, $response->base )}++;
}
while (my ($url, $value) = each %urls) {
print "$url->$value\n";
my $response = $browser->get($url);
die "Can't get $url -- ", $response->status_line
unless $response->is_success;
my $html = $response->content;
while( $html =~ m@<IMG SRC=\'(http://\S+?\.gif)\' border=0>@gi ) {
print "$1\n";
print "Downloading URL at ", substr($1, 0, 25), "\nLoading... ";
my $filename = basename( $1 );
my $remote_headers = $response->headers;
$total_size = $remote_headers->content_length;
$response = $browser->get($1, ':content_cb' => \&callback );
}
}
close URLS;
sub callback {
my ($data, $response, $protocol) = @_;
$final_data .= $data;
print ".";
print "$animation[$counter++]\b";
$counter = 0 if $counter == scalar(@animation);
}[/quote]