A small script for making vocabulary lists
cobrawgl
Posted by cobrawgl on 2008-12-06 12:35
This is a small script for building HTML vocabulary lists. Put the generated files on a PSP and they are very comfortable to read :)
Because of the post length limit I cut out a lot of the word data; you can find the full list online yourself at http://web.iciba.com/gxch/index.html. The get_imgs sub was originally meant to download the phonetic symbols, but the results were not good, so it is not used... :(

-------- gre_words.pl ---------------
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;
use LWP::Simple;
use HTML::Template;

my $WORDS_PER_LIST = 106;    # each list<N>.html holds up to 106 words

my @words      = get_words();
my @lists      = ();
my $file_count = 0;

for my $word (@words) {
    my $content = get_content($word);
    push @lists, { word => $content };
    #get_imgs($content);    # --> useless ... :(
    if (@lists == $WORDS_PER_LIST) {
        write_list(++$file_count, \@lists);
        @lists = ();
    }
}
write_list(++$file_count, \@lists) if @lists;    # last, partially filled list
create_index($file_count);

# File name of the N-th word list, e.g. list1.html
sub list_file { return 'list' . $_[0] . '.html' }

sub write_file {
    my ($name, $text) = @_;
    my $fh = IO::File->new($name, 'w') or die "Cannot open $name: $!";
    $fh->print($text) or die "Cannot write $name: $!";
    $fh->close or die "Cannot close $name: $!";
}

sub write_list {
    my ($n, $lists) = @_;
    my $template = HTML::Template->new(filename => 'words.tpl');
    $template->param(file  => list_file($n));
    $template->param(words => $lists);
    write_file(list_file($n), $template->output);
}

# Unique words from __DATA__, sorted (capitalized words sort first)
sub get_words {
    my %words;
    while (<DATA>) {
        $words{$_} = 1 for split;
    }
    return sort keys %words;
}

# index.html links to every list file that was actually written
sub create_index {
    my $n = shift;
    my $template = HTML::Template->new(filename => 'index.tpl');
    my @lists = map { +{ name => "Word List $_", file => list_file($_) } } 1 .. $n;
    $template->param(lists => \@lists);
    write_file('index.html', $template->output);
}

# Look the word up on Baidu and keep only the dictionary block (<OL>...</OL>)
sub get_content {
    my $word = shift;
    my $content = get("http://www.baidu.com/baidu?ie=gb2312&cl=3&ct=1048576&word=$word")
        or die "Cannot fetch '$word'";
    my ($t) = $content =~ m{(<OL>.*</OL>)}sim
        or do { warn "No dictionary entry for '$word'\n"; return '' };
    $t =~ s{<SPAN.*?</SPAN>}{}simg;            # remove <SPAN> elements
    $t =~ s{<BR>KK.*?</DIV>}{</DIV>}simg;      # remove the KK phonetic line
    $t =~ s{<BR>以上结果.*</OL>}{</OL>}sim;     # remove the trailing "以上结果..." ("results above...") note
    return $t;
}

# Unused: download the images referenced in $t (meant for the phonetic symbols)
sub get_imgs {
    my $t = shift;
    my @imgs = $t =~ m{src="(.*?)"}mg;
    my $base = 'http://www.baidu.com/';
    for (@imgs) {
        #next if -e;
        mkdir $1 if m{(IMAGES/.*?)/} and not -d $1;    # first directory level
        mkdir $1 if m{(IMAGES/.*)/}  and not -d $1;    # full directory path
        my $content = get($base . $_) or die "Cannot fetch $base$_";
        my $fh = IO::File->new($_, 'w') or die "Cannot open $_: $!";
        binmode $fh;    # image data is binary
        $fh->print($content) or die "Cannot write $_: $!";
        $fh->close;
    }
}

__DATA__
Ford Sue
abate abbreviation abhor abominable abound abridge abrupt absolve abstain abstruse
accede acceleration accentuate acclaim accomplice accountant acquiesce activate adamant adaptable
adept adjacent adjoin admonish adolescence adore adorn adroit advent adversity
aerial aesthetic affected affectionate affiliate affinity affirm afflict affluent aggravate
aggregate agitate agitation ailment aisle alienate allegiance allergic alleviate allude
allure aloof alteration aluminium amalgamate amass ambiguity ambition ambivalent amenable
amend amenity amiable amicable amplify analogous anarchy anathema ancillary anecdote
anguish animosity annex annihilate annul anomalous anomaly antagonism antagonist anthem
anthropology antibiotic apathetic apathy apparel applaud appraisal appreciable apprehension apprehensive
apprentice apt aptitude aquatic arable arc archaeology archives ardent ardor
arduous arena arid aristocrat aromatic arrogant articulate artillery as yet
ascend ascertain ascetic aspiration aspire aspirin assail asset assiduous assimilate
assuage astute asylum atlas attenuate attest attic auction audacious audible
audit auditorium augment auspicious austerity authoritative authorize autocrat autonomous autonomy
auxiliary avail avalanche avant-garde avarice avenge averse avert avid avow
awe awry azure
bacteria bacterium badge badger badminton baffle bait bald ballad ballet
balmy banal banish bankruptcy barbarous barbecue barge barometer barren barricade
barter batch beacon beguile bellicose benefactor benevolent benign bent berth
beset besiege bestow beverage bibliography bilingual bizarre blackmail blanch bland
blaze bleach bleak blink blithe blot blunder blunt blur boisterous
boost booth botanical botany bouquet boycott brawl breach breakdown bribe
bridle brink brisk brittle brochure broil brood brook browse brusque
buckle bulge bully buoyant bureaucracy burgeon burial burrow bust bustle
buzz bypass

------------- header.tpl --------------
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<meta http-equiv="content-type" content="text/html;charset=gb2312">
<TITLE> GRE Vocabulary List </TITLE>
<style type="text/css">
.ptitle {margin-top:3px; font-weight:bold}
.pcixin {margin-top:3px; color:#FF0000}
.pexplain {margin-top:3px; margin-left:20px}
.pnewword {margin-top:3px; margin-right:10px; font-size:12px; color:#9d0006; font-weight:bold}
.peng {margin-top:3px; margin-left:40px; font-size:16px; color:#0000A0; font-weight:bold}
.pchi {margin-top:3px; margin-left:40px; font-size:16px; color:#800040}
font.engi {color:#FF00FF; font-style:italic}
font.chinese {font-weight:normal; font-size:24px; color:#008080}
#ft{clear:both;line-height:20px;background:#E6E6E6;text-align:center}
#ft,#ft *{color:#77C;font-size:12px;font-family:Arial}
#ft span{color:#666}
body{margin-bottom:0}
</style>
</HEAD>
<BODY>

------------- footer.tpl -----------------
</BODY>
</HTML>

----------- index.tpl ----------------
<tmpl_include name="header.tpl">
<h2>GRE Vocabulary List</h2>
<hr />
<tmpl_loop name="lists">
<a href="<tmpl_var name=file>"><tmpl_var name=name></a><br />
</tmpl_loop>
<hr />
<h2>GRE Vocabulary List</h2>
<tmpl_include name="footer.tpl">

----------- words.tpl -----------------
<tmpl_include name="header.tpl">
<h2>GRE Vocabulary List</h2>
<hr />
<a href="index.html">Index</a><center><tmpl_var name="file"></center>
<hr />
<tmpl_loop name="words">
<tmpl_var name="word">
<hr />
</tmpl_loop>
<a href="index.html">Index</a><center><tmpl_var name="file"></center>
<hr />
<h2>GRE Vocabulary List</h2>
<tmpl_include name="footer.tpl">
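The grouping step of gre_words.pl (unique words from __DATA__, sorted, then cut into lists of 106) can be tried offline, without the Baidu lookup. Below is a minimal sketch under that assumption; `unique_sorted_words` and `paginate` are hypothetical helper names that are not part of the original script, and the page size is passed in as a parameter instead of being fixed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect unique whitespace-separated words and sort
# them in plain string order (so capitalized words such as "Ford" come first).
sub unique_sorted_words {
    my %seen;
    for my $line (@_) {
        $seen{$_} = 1 for split ' ', $line;
    }
    return sort keys %seen;
}

# Hypothetical helper: cut a word list into pages of at most $size words,
# one page per list<N>.html file. A short last page is kept.
sub paginate {
    my ($size, @words) = @_;
    die "page size must be positive\n" unless $size > 0;
    my @pages;
    while (my @page = splice @words, 0, $size) {
        push @pages, [@page];
    }
    return @pages;
}

my @words = unique_sorted_words("abate abhor", "abate zeal", "Ford");
my @pages = paginate(2, @words);
printf "list%d.html: %s\n", $_ + 1, join(' ', @{ $pages[$_] }) for 0 .. $#pages;
# prints:
# list1.html: Ford abate
# list2.html: abhor zeal
```

With a page size of 106 and 250 words this gives pages of 106, 106 and 38 words, so the final partial list still gets its own file.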