Could someone take a look at my report program? Waiting online, it's urgent!

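I have a file like the one below. It contains many repeated <sequence...>....</sequence> sections, and I need to pull the sequence id, cloneid and a few other fields out of each one and produce a report: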
[quote]<?xml version="1.0"?>
<!DOCTYPE maxml-sequences SYSTEM "http://fantom.gsc.riken.go.jp/maxml/maxml.dtd">
<maxml-sequences>
<sequence id="G530106A19">
<altid type="cloneid">G530106A19</altid>
<altid type="seqid">106195</altid>
<altid type="rearrayid">PS00034I09</altid>
<altid type="accession">AK149923</altid>
<altid type="estaccession">BY484353</altid>
<seqid>106195</seqid>
<cloneid>G530106A19</cloneid>
<accession>AK149923</accession>
<modified_time>Jan 31 2005</modified_time>
<annotations>
<annotation>
<qualifier>cds_location</qualifier>
<anntext>No CDS</anntext>
<evidence>FANTOM3-Unconfirmed</evidence>
</annotation>
<annotation>
<qualifier>transcript_desc_name</qualifier>
<anntext>unclassifiable</anntext>
<evidence></evidence>
</annotation>
</annotations>
</sequence>
<sequence id="D030025E18">
<altid type="cloneid">D030025E18</altid>
<altid type="seqid">56468</altid>
<altid type="rearrayid">PX00180G19</altid>
<altid type="accession">AK083478</altid>
<altid type="estaccession">BB441051 BB655048 BB441051</altid>
<altid type="f2seqid">56468</altid>
<altid type="mgiclone">MGI:2418721</altid>
<altid type="mgimarker">MGI:1345279</altid>
<seqid>56468</seqid>
<cloneid>D030025E18</cloneid>
<accession>AK083478</accession>
<modified_time>Jan 31 2005</modified_time>
<annotations>
<annotation>
<qualifier>cds_location</qualifier>
<anntext>79..1764</anntext>
<evidence>FANTOM2</evidence>
</annotation>
<annotation>
<qualifier>cds_gap</qualifier>
<anntext>M1686</anntext>
<evidence>FANTOM2</evidence>
</annotation>
<annotation>
<qualifier>transcript_desc_name</qualifier>
<anntext>solute carrier family 11 (proton-coupled divalent metal ion transporters), member 2
</anntext>
<datasrc xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:1345279">MGD|MGI:1345279</datasrc>
<datasrc xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Nucleotide&term=NM_008732&doptcmdl=GenBank">GB|NM_008732</datasrc>
<evidence>BLASTN, 99%, match=1689</evidence>
</annotation>
<annotation>
<qualifier>transcript_desc_symbol</qualifier>
<anntext>Slc11a2</anntext>
<datasrc xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:1345279">MGD|MGI:1345279</datasrc>
<datasrc xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Nucleotide&term=NM_008732&doptcmdl=GenBank">GB|NM_008732</datasrc>
<evidence>BLASTN, 99%, match=1689</evidence>
</annotation>
<annotation>
<qualifier>transcript_desc_synonym</qualifier>
<anntext>mk van Nramp2</anntext>
<datasrc xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:1345279">MGD|MGI:1345279</datasrc>
<datasrc xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Nucleotide&term=NM_008732&doptcmdl=GenBank">GB|NM_008732</datasrc>
<evidence>BLASTN, 99%, match=1689</evidence>
</annotation>
<annotation>
<qualifier>gene_ontology</qualifier>
<anntext>GO:0016021</anntext>
<datasrc xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:1345279">MGD|MGI:1345279</datasrc>
<evidence>IEA|BLASTN/MGD</evidence>
</annotation>
<annotation>
<qualifier>gene_ontology</qualifier>
<anntext>GO:0016020</anntext>
<datasrc xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:1345279">MGD|MGI:1345279</datasrc>
<evidence>IEA|BLASTN/MGD</evidence>
</annotation>
<annotation>
<qualifier>gene_ontology</qualifier>
<anntext>GO:0005381</anntext>
<datasrc xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:1345279">MGD|MGI:1345279</datasrc>
<evidence>IEA|BLASTN/MGD</evidence>
</annotation>
<annotation>
<qualifier>gene_ontology</qualifier>
<anntext>GO:0005215</anntext>
<datasrc xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:1345279">MGD|MGI:1345279</datasrc>
<evidence>IEA|BLASTN/MGD</evidence>
</annotation>
<annotation>
<qualifier>gene_ontology</qualifier>
<anntext>GO:0006810</anntext>
<datasrc xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:1345279">MGD|MGI:1345279</datasrc>
<evidence>IEA|BLASTN/MGD</evidence>
</annotation>
</annotations>
................................................[/quote]
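My program is below. I want report-style output, but I get a lot of repeated rows (in every column). Where did I go wrong? I wanted one row per sequence, holding that sequence's fields.[code]#!/usr/bin/perl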
if(!@ARGV){print "Usage:$0 input_file.\n";exit;}
format STDOUT=
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
@<<<<<<<<<< @<<<<<<<<<< @<<<<<<<<<@<<<<<<<<<<< @<<<<<<<<<<<< @<<<<<<<<<<<<<<
$sequenceid, $cloneid, $seqid, $rearrayid, $accession, $estaccession
.

open(IN,"$ARGV[0]") or die "$!";
while($line=<IN>){
if($line=~/<sequence/){
($sequenceid)=$line=~/"(\S+)"/;
write;
}elsif($line=~/type="cloneid"/){
($cloneid)=$line=~/>(\S+)</;
write;
}elsif($line=~/type="seqid"/){
($seqid)=$line=~/>(\S+)</;
write;
}elsif($line=~/type="rearrayid"/){
($rearrayid)=$line=~/>(\S+)</;
write;
}elsif($line=~/type="accession"/){
($accession)=$line=~/>(\S+)</;
write;
}elsif($line=~/type="estaccession"/){
($estaccession)=$line=~/>(\S+)</;
write;
}else{;
}
}
close IN or die "$!";
[/code]Thanks a lot!
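Think carefully about all those write calls inside the while loop. The logic is bound to produce repeats: every matched line prints a full row. If every sequence element is guaranteed to contain sequence id, cloneid, seqid, rearrayid, accession and estaccession, you can simply change the program to: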
[quote]
if(!@ARGV){print "Usage:$0 input_file.\n";exit;}
format STDOUT=
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
@<<<<<<<<<< @<<<<<<<<<< @<<<<<<<<<@<<<<<<<<<<< @<<<<<<<<<<<< @<<<<<<<<<<<<<<
$sequenceid, $cloneid, $seqid, $rearrayid, $accession, $estaccession
.

open(IN,"$ARGV[0]") or die "$!";
while($line=<IN>){
if($line=~/<sequence/){
($sequenceid)=$line=~/"(\S+)"/;
}elsif($line=~/type="cloneid"/){
($cloneid)=$line=~/>(\S+)</;
}elsif($line=~/type="seqid"/){
($seqid)=$line=~/>(\S+)</;
}elsif($line=~/type="rearrayid"/){
($rearrayid)=$line=~/>(\S+)</;
}elsif($line=~/type="accession"/){
($accession)=$line=~/>(\S+)</;
}elsif($line=~/type="estaccession"/){
($estaccession)=$line=~/>(\S+)</;
[color=red]write;[/color]
}else{;
}
}
close IN or die "$!";[/quote]
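If that guarantee does not hold, the version above misbehaves in three ways: a value containing spaces (the second record's estaccession in the sample) is not matched by `\S+` and comes out blank, a record with no estaccession line is never printed, and a missing field shows whatever the previous record left in the variable. A rough sketch of one way around this, assuming every record ends with a `</sequence>` line: print the row at the closing tag and clear the fields afterwards. `read_sequences`, `@fields` and the inline sample are names made up for illustration. The header goes into `STDOUT_TOP`, so it prints once per page instead of once per row.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column names taken from the altid types in the sample above.
my @fields = qw(cloneid seqid rearrayid accession estaccession);

# Hypothetical helper (not from the thread): returns one hash per record.
sub read_sequences {
    my ($fh) = @_;
    my (@records, %rec);
    while (my $line = <$fh>) {
        if ($line =~ /<sequence\s+id="([^"]+)"/) {
            %rec = (id => $1);              # new record: drop stale values
        }
        elsif ($line =~ /<altid\s+type="(\w+)">([^<]*)</) {
            $rec{$1} = $2;                  # [^<]* also accepts spaces
        }
        elsif ($line =~ m{</sequence>}) {
            push @records, {%rec};          # record is complete here
            %rec = ();
        }
    }
    return @records;
}

# Package variables, because the formats below refer to them by name.
our ($id, $cloneid, $seqid, $rearrayid, $accession, $estaccession);

format STDOUT_TOP =
sequence id  cloneid      seqid    rearrayid    accession  estaccession
------------------------------------------------------------------------------
.

format STDOUT =
@<<<<<<<<<<< @<<<<<<<<<<< @<<<<<<< @<<<<<<<<<<< @<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<
$id, $cloneid, $seqid, $rearrayid, $accession, $estaccession
.

# Small inline sample so the sketch runs on its own.
my $sample = <<'XML';
<sequence id="G530106A19">
<altid type="cloneid">G530106A19</altid>
<altid type="seqid">106195</altid>
<altid type="rearrayid">PS00034I09</altid>
<altid type="accession">AK149923</altid>
<altid type="estaccession">BY484353</altid>
</sequence>
<sequence id="D030025E18">
<altid type="cloneid">D030025E18</altid>
<altid type="seqid">56468</altid>
<altid type="rearrayid">PX00180G19</altid>
<altid type="accession">AK083478</altid>
<altid type="estaccession">BB441051 BB655048 BB441051</altid>
</sequence>
XML

open my $in, '<', \$sample or die "Cannot open sample: $!";
for my $rec (read_sequences($in)) {
    # Missing fields become empty strings instead of undef.
    ($id, $cloneid, $seqid, $rearrayid, $accession, $estaccession)
        = map { $_ // '' } @{$rec}{'id', @fields};
    write;
}
close $in;
```

To run it on a real file, open `$ARGV[0]` instead of the in-memory sample.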
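That said, a better approach is to read each <sequence>...</sequence> section into a single variable on every pass through the while loop and then pick the fields out of it with regular expressions.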
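The block-at-a-time idea could look roughly like this, assuming `</sequence>` only ever appears as the closing tag of a record. Setting `$/` to that string makes each read return one whole record. `read_blocks`, `@fields` and the inline sample are illustrative names, not part of the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column names taken from the altid types in the sample above.
my @fields = qw(cloneid seqid rearrayid accession estaccession);

# Hypothetical helper: reads one <sequence>...</sequence> chunk per iteration.
sub read_blocks {
    my ($fh) = @_;
    local $/ = "</sequence>";       # a "line" now ends at the closing tag
    my @records;
    while (my $block = <$fh>) {
        # Skip whatever follows the last record (e.g. </maxml-sequences>).
        next unless $block =~ /<sequence\s+id="([^"]+)"/;
        my %rec = (id => $1);
        # Collect every <altid type="...">value</altid> pair in the block.
        while ($block =~ /<altid\s+type="(\w+)">([^<]*)<\/altid>/g) {
            $rec{$1} = $2;
        }
        push @records, \%rec;
    }
    return @records;
}

# Small inline sample so the sketch runs on its own.
my $sample = <<'XML';
<maxml-sequences>
<sequence id="G530106A19">
<altid type="cloneid">G530106A19</altid>
<altid type="seqid">106195</altid>
<altid type="rearrayid">PS00034I09</altid>
<altid type="accession">AK149923</altid>
<altid type="estaccession">BY484353</altid>
</sequence>
<sequence id="D030025E18">
<altid type="cloneid">D030025E18</altid>
<altid type="seqid">56468</altid>
<altid type="rearrayid">PX00180G19</altid>
<altid type="accession">AK083478</altid>
<altid type="estaccession">BB441051 BB655048 BB441051</altid>
</sequence>
</maxml-sequences>
XML

open my $in, '<', \$sample or die "Cannot open sample: $!";
my $row = "%-12s %-12s %-8s %-12s %-10s %s\n";
printf $row, 'sequence id', @fields;
for my $rec (read_blocks($in)) {
    # Missing fields print as empty strings.
    printf $row, map { $_ // '' } @{$rec}{'id', @fields};
}
close $in;
```

Because `$/` can be any fixed string, the same loop shape works for other record-oriented text as long as each record ends with a recognizable terminator.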
Jobs like this are really easiest when you hand the XML to a module built for it, for example XML::Simple.
You can find the module's documentation on search.cpan.org.
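For reference, a minimal sketch of the XML::Simple route. It assumes the module is installed from CPAN and that the file is well-formed XML. Note that a bare `&` inside an attribute, as in the URLs pasted above, would be rejected by a real parser unless the file actually contains `&amp;`. `sequences_from_xml`, `@fields` and the inline sample are illustrative names. `ForceArray` and `KeyAttr` are real `XMLin` options, used here so that `sequence` and `altid` always come back as plain arrays.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple qw(XMLin);      # CPAN module, not part of core Perl

# Column names taken from the altid types in the sample above.
my @fields = qw(cloneid seqid rearrayid accession estaccession);

# Hypothetical helper: flattens the parsed tree into one hash per <sequence>.
sub sequences_from_xml {
    my ($xml) = @_;             # XML text or a file name
    my $doc = XMLin(
        $xml,
        ForceArray => [ 'sequence', 'altid' ],   # always array refs
        KeyAttr    => [],                         # no folding on id/name
    );
    my @records;
    for my $seq (@{ $doc->{sequence} || [] }) {
        my %rec = (id => $seq->{id});
        # Each altid arrives as { type => ..., content => ... }.
        $rec{ $_->{type} } = $_->{content} for @{ $seq->{altid} || [] };
        push @records, \%rec;
    }
    return @records;
}

# Small inline sample so the sketch runs on its own.
my $sample = <<'XML';
<maxml-sequences>
<sequence id="G530106A19">
<altid type="cloneid">G530106A19</altid>
<altid type="seqid">106195</altid>
<altid type="rearrayid">PS00034I09</altid>
<altid type="accession">AK149923</altid>
<altid type="estaccession">BY484353</altid>
</sequence>
</maxml-sequences>
XML

my $row = "%-12s %-12s %-8s %-12s %-10s %s\n";
printf $row, 'sequence id', @fields;
for my $rec (sequences_from_xml($sample)) {
    # Missing fields print as empty strings.
    printf $row, map { $_ // '' } @{$rec}{'id', @fields};
}
```

Passing a file name instead of `$sample` to `sequences_from_xml` parses the file directly.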

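Yes, every <sequence> pair has the same layout. With that change it works now. Is there a better way to make the program more concise?
The output is as follows:[quote]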
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
G530106A19 G530106A19 106195 PS00034I09 AK149923 BY484353
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
D030025E18 D030025E18 56468 PX00180G19 AK083478
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
3110050P16 3110050P16 12921 ZX00072K12 AK083478
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
9130012A08 9130012A08 89261 PX00026G02 AK033581
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
6530403F14 6530403F14 26364 PX00048P22 AK032655
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
7120469O16 7120469O16 107095 PX00707A11 AK148977 BB695647
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
D030046N05 D030046N05 95663 PX00698H21 AK141731 BB444619
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
2810047I04 2810047I04 12005 ZX00065L21 AK012917
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
I830029K16 I830029K16 84214 PL00055J06 AK151418
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
G530135C07 G530135C07 94343 PS00009P08 AK150129 BY486226
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
B830039K22 B830039K22 50063 PX00073M17 AK140457 BB335572
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
I530027E06 I530027E06 82138 PL00049I11 AK167654
------------------------------------------------------------------------------
sequence id | cloneid | seqid | rearrayid | accession | estaccession
F830004P11 F830004P11 99695 PL00004M12 AK156032
[/quote]Is there really no way to do this without XML::Simple? Please advise. I often have to process data like this, and some of it is not in XML format.