Help needed: converting ASP code to Perl

imxae

UID: 2790
Posts: 197
Points: 453
Time online: 2 days 5 hours

1^# imxae posted on 2005-07-03 09:04

The ASP code below is meant to collect the * part of <a href="*"> and <img src="*"> in an HTML page into an array without duplicates. Could someone show me how to write this with a Perl regular expression?

Dim re, RemoteFile, RemoteFileurl, SaveFileName, SaveFileType
Set re = new RegExp
re.IgnoreCase = True
re.Global = True
re.Pattern = "((http|https|ftp|rtsp|mms):(\/\/|\\\\){1}(([A-Za-z0-9_-])+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3})(\S*\/)((\S)+[.]{1}(" & sExt & ")))"

Set RemoteFile = re.Execute(s_Content)
Dim a_RemoteUrl(), n, i, bRepeat
n = 0
' Copy the matches into the array, skipping duplicates
For Each RemoteFileurl in RemoteFile
    If n = 0 Then
        n = n + 1
        Redim a_RemoteUrl(n)
        a_RemoteUrl(n) = RemoteFileurl
    Else
        bRepeat = False
        For i = 1 To UBound(a_RemoteUrl)
            If UCase(RemoteFileurl) = UCase(a_RemoteUrl(i)) Then
                bRepeat = True
                Exit For
            End If
        Next
        If bRepeat = False Then
            n = n + 1
            Redim Preserve a_RemoteUrl(n)
            a_RemoteUrl(n) = RemoteFileurl
        End If
    End If
Next
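
For comparison, here is a minimal Perl sketch of the stated goal (href values of <a> tags and src values of <img> tags, no duplicates). It is not a line-by-line port of the ASP pattern above, which matches absolute URLs by an extension list in sExt. The function name extract_links and the sample HTML are made up for illustration, and a regex like this only copes with simple, quoted attributes.

```perl
use strict;
use warnings;

# Return the href values of <a> tags and the src values of <img> tags,
# in order of first appearance. Duplicates are dropped using a
# case-insensitive comparison, like the UCase() test in the ASP loop.
sub extract_links {
    my ($html) = @_;
    my (%seen, @urls);
    # $1 is the opening quote (single or double), $2 is the attribute value.
    while ($html =~ /<(?:a\b[^>]*?\bhref|img\b[^>]*?\bsrc)\s*=\s*(["'])(.*?)\1/gis) {
        my $url = $2;
        push @urls, $url unless $seen{ lc $url }++;
    }
    return @urls;
}

# Hypothetical sample input.
my $html = <<'HTML';
<a href="http://example.com/a.html">A</a>
<img src="http://example.com/pic.gif">
<A HREF="HTTP://EXAMPLE.COM/A.HTML">again</A>
<img alt="x" src='http://example.com/other.gif'>
HTML

print "$_\n" for extract_links($html);
```

The %seen hash replaces the inner For loop of the ASP version: each URL is looked up once instead of being compared against every element already stored.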

qiang

UID: 17966
Posts: 1
Points: 2
Time online: 10 minutes

2^# qiang posted on 2005-07-03 10:21

To extract HTTP links, use HTML::LinkExtor. I think it can pick up img links as well. Give it a try yourself.

http://search.cpan.org/~podmaster/HTML-LinkExtractor-0.13/LinkExtractor.pm
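
A minimal sketch of the HTML::LinkExtor approach, assuming the module (part of the HTML-Parser distribution on CPAN) is installed. The function name links_via_linkextor and the sample HTML are hypothetical. The parser is used without a callback, so the links are collected internally and read back with links().

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Collect <a href> and <img src> values from an HTML string, without
# duplicates (compared case-insensitively), in order of first appearance.
sub links_via_linkextor {
    my ($html) = @_;
    my $parser = HTML::LinkExtor->new;   # no callback: links are stored in the parser
    $parser->parse($html);
    $parser->eof;                        # signal end of document
    my (%seen, @urls);
    for my $link ($parser->links) {
        # Each entry is [ $tag, $attr_name => $url, ... ], names in lowercase.
        my ($tag, %attr) = @$link;
        my $url = $tag eq 'a'   ? $attr{href}
                : $tag eq 'img' ? $attr{src}
                :                 undef;
        next unless defined $url;
        push @urls, $url unless $seen{ lc $url }++;
    }
    return @urls;
}

# Hypothetical sample input.
my $html = '<a href="http://example.com/a.html">A</a>'
         . '<img src="http://example.com/pic.gif">'
         . '<a href="http://example.com/a.html">again</a>';

print "$_\n" for links_via_linkextor($html);
```

Because the module parses the markup rather than pattern-matching it, unquoted attributes and unusual spacing are handled without any changes to the code above.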

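3^# imxae posted on 2005-07-03 18:37

I had a look at the module. It is powerful but rather complicated. Could anyone show how to filter the links into an array with a regex instead?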

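4^# imxae posted on 2005-07-03 20:19

A question
http://search.cpan.org/~gaas/HTML-Parser-3.45/lib/HTML/LinkExtor.pm
I used this module, but when fetching the content of www.imx365.com the img URLs came out wrong. The output is cut off, as shown below.

This is the output; the last line is odd...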
http://www.imx365.com/images/rand_176/article/ssmx/skill/manimags.gif
http://www.imx365.com/images/rand_176/article/ssmx/skill/drawps.gif
http://www.im

5^# qiang posted on 2005-07-03 22:14

Post your code so that others can help you see where it goes wrong.

nsnake

UID: 41482
Posts: 154
Points: 354
Time online: 1 day 8 hours

6^# nsnake posted on 2005-07-04 01:25

Did you add the line print "Content-type: text/html\n\n"; ?

7^# imxae posted on 2005-07-08 10:21

I used the example from LinkExtor.pm
and added the line print "Content-type: text/html\n\n";