Help: how do I remove the description lines from a FASTA file?

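Could any of you experts help me write a small Perl program that removes the description text preceding the gene sequence in a FASTA file, producing a file that contains only the gene sequence (A, T, C, G, etc.)?
For example, given a file named NC_002745.fna: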
>gi|72282330|gb|DT053425.1|DT053425 COT_FQ_B08 Fiber and Ovule of Xu-142 Lambda Zap Express Library Gossypium hirsutum cDNA 3' similar to (AF286647) cinnamate-4-hydroxylase [Gossypium arboreum], mRNA sequence
GGAAGGGAAAATATTAATCTTATTGCAAATCTTGATTGATTCATGGATTGCGGTTGTCACACCTCACAAATATTTCTCAACTGAATAATGTTTCATACTGTCTTTCAAGAGTTCACTAAAAGTGGATAGAAGAGAACAAGTTAAAATTGCCTTGGCTTAGCAACAATGGTGGAATGCTTCAAAATATGAAGACTGAACTGTCCCTTTCTCCGTGGTATCAATTTGAGATTGCCCAGGGGGAGGCAAGAGCTCAAAATTCTGTACCAAACGACCCAAAGTAATACCAAGGATGGGCAATGCAAGAATAATTCCTGGGCAACTTCTTCTCCCCACGCCAAAGGGGAGGTAGCGGAAATCATTGCCGTTGGCCTCAACCTTGGCTTCCTCTTCAAAGAACCTTTCAGGCCTAAATTCTTCGGGATTTTTCCAGTTAGCAGGGGTTGTTGGCAAGCCACACATGCATTTACCAAGATTTTGCCTCTCAGCAGGGATATCATAGCCACCCAATTTCGGCATCATGCAGGTTCATGTGGGGCACGAGTAGAGGAAT

it should be converted into:
GGAAGGGAAAATATTAATCTTATTGCAAATCTTGATTGATTCATGGATTGCGGTTGTCACACCTCACAAATATTTCTCAACTGAATAATGTTTCATACTGTCTTTCAAGAGTTCACTAAAAGTGGATAGAAGAGAACAAGTTAAAATTGCCTTGGCTTAGCAACAATGGTGGAATGCTTCAAAATATGAAGACTGAACTGTCCCTTTCTCCGTGGTATCAATTTGAGATTGCCCAGGGGGAGGCAAGAGCTCAAAATTCTGTACCAAACGACCCAAAGTAATACCAAGGATGGGCAATGCAAGAATAATTCCTGGGCAACTTCTTCTCCCCACGCCAAAGGGGAGGTAGCGGAAATCATTGCCGTTGGCCTCAACCTTGGCTTCCTCTTCAAAGAACCTTTCAGGCCTAAATTCTTCGGGATTTTTCCAGTTAGCAGGGGTTGTTGGCAAGCCACACATGCATTTACCAAGATTTTGCCTCTCAGCAGGGATATCATAGCCACCCAATTTCGGCATCATGCAGGTTCATGTGGGGCACGAGTAGAGGAAT
saved as a file.
I'm a beginner, so thanks in advance to everyone for the help.




   

perl -ne 'print if /^[^>]/;' NC_002745.fna
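The problem has been solved.
luoviolet, your method doesn't seem to work for me,
but thanks to the kind help from you and others, I got it solved in the end.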

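use strict;
use warnings;

open INPUT, '<', 'NC_002745.fna' or die $!;
open STDOUT, '>', 'result.fna' or die $!;   # redirect print output to result.fna
while (<INPUT>) {
    chomp;
    s/^\s+|\s+$//g;            # trim leading/trailing whitespace (including a stray \r)
    # Keep only lines made up entirely of A, C, T, G. The '>' header line is
    # skipped, and so is any line containing other symbols (N, lowercase, ...).
    next unless /^[ACTG]+$/;
    print $_, "\n";
}
close INPUT;
close STDOUT;

Hope this helps others too!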