A problem with processing sequences

I have a question.
################################################################

>sacCer1.chr1 684 82 + 230208
CAUAUACUUACCACUCCAUUUAUAUACACUUAUGUCAAUAUUACAGAAAAAUCCCCACAAAAAU
.............................((((((....................................))))))..... ( -0.32)
>sacMik.contig_210 1483 81 - 1979
CGUAUACUAACCACUCAAUUUAUAUACACUUAUGUUACUAUUUCAGAAAAAUCACCACCAAAAU
.(((((..((........))..))))).....(((((.(((((.................)))))...)))))........ ( -1.63)
>sacPar.contig_38 7 82 - 73522
CAAAUAUCCACUACCCAUUUUAUAUGUACUAAUAUAAAUACCACCAAAUAAUCACCACUAAAAU
..................(((((((......)))))))............................................ ( -2.90)
>sacKud.Contig1996 601 81 + 21481
AAAUAUCCACCACCUAUUUUAUAUAUGCUAAUAUAUACACCACCAAAUAAUCACCACUAAAAUU
...................(((((((....)))))))............................................ (

How do I extract the fourth line, that is, this one? CAUAUACUUACCACUCCAUUUAUAUACACUUAUGUCAAUAUUACAGAAAAAUCCCCACAAAAAU

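Regular expressions.
My program below cannot get the fourth line. Could an expert help? (How do I write the code that fetches the line after the header?)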
use strict;
use warnings;

open FILENAME, "1.txt" or die "cannot open the file:$!\n ";
open OUTPUT, ">RNAz.xls" or die "cannot write the file:$!\n ";

my $line;
my @array;
my @arrayA;

while ($line = <FILENAME>) {
    chomp $line;    # a bare "chomp;" acts on $_, not on $line

    # Header line, e.g. ">sacCer1.chr1 684 82 + 230208"
    if ($line =~ />sacCer/) {
        @array  = split /\s+/, $line;
        @arrayA = @array[0 .. 4];    # the five whitespace-separated header fields

        foreach my $element (@arrayA) {
            print OUTPUT "$element\t";
        }
        print OUTPUT "\n";
    }
}
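
One way to get "the next line" while still reading line by line: once the header matches, read one more line from the same filehandle inside the loop. This is a minimal sketch, assuming the sequence always sits on the line directly after its header. The helper name next_line_after and the in-memory sample (with shortened sequences) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the line that follows the first line matching $pattern.
sub next_line_after {
    my ($fh, $pattern) = @_;
    while (my $line = <$fh>) {
        next unless $line =~ $pattern;
        my $next = <$fh>;            # a second read advances to the following line
        return unless defined $next; # header was the last line of the file
        chomp $next;
        return $next;
    }
    return;                          # no header matched
}

# Two records in the same layout as the sample; sequences shortened for illustration.
my $sample = <<'END';
>sacCer1.chr1 684 82 + 230208
CAUAUACUUACC
.............
>sacMik.contig_210 1483 81 - 1979
CGUAUACUAACC
.............
END

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!\n";
my $sequence = next_line_after($fh, qr/^>sacCer1\.chr1/);
print "$sequence\n";
```

With a real file, the in-memory open would be replaced by an ordinary open on the file name.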

Try this:
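#!/usr/bin/perl
use strict;
use warnings;

# "or" binds more loosely than "||", so die really fires when open fails
open my $fh, '<', $ARGV[0] or die "cannot open $ARGV[0]: $!\n";
$/ = ">";    # read one ">"-terminated record at a time instead of one line
while (my $line = <$fh>) {
    # Limit 3: header line, sequence line, then the rest of the record unsplit
    my ($note, $sequence, $structure) = split /\n/, $line, 3;
    next unless defined $sequence;    # skip chunks that have no second line
    if ($note =~ /sacCer1\.chr1/) {
        print "$sequence\n";
    }
}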
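
Why this works: setting $/ to ">" makes each read return everything up to the next ">", so a header, its sequence and its structure arrive together in one chunk, and the third argument of split caps the number of pieces. Below is a small sketch on a made-up two-record string; the ids, the shortened sequences and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Made-up sample in the same three-line layout as the RNAz records.
my $sample = ">id1 1 5 + 10\nACGUA\n..... ( -0.10)\n"
           . ">id2 2 5 - 10\nGGCCA\n..... ( -0.20)\n";

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!\n";

my @records;
{
    local $/ = ">";                  # records now end at ">" instead of "\n"
    while (my $record = <$fh>) {
        chomp $record;               # chomp strips the current $/, i.e. a trailing ">"
        next if $record eq '';       # nothing precedes the first ">" in this sample
        # Limit 3: at most three pieces; the third keeps the rest of the record unsplit.
        my ($note, $sequence, $rest) = split /\n/, $record, 3;
        push @records, [$note, $sequence];
    }
}

print "$_->[0] => $_->[1]\n" for @records;
```

Note that the leading ">" is consumed as the record terminator, so the header text in each chunk starts at the sequence name.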
The sequences to be processed:
########################### RNAz 1.0pre #############################

Sequences: 2
Columns: 497
Reading direction: forward
Mean pairwise identity: 49.70
Mean single sequence MFE: -24.16
Consensus MFE: -0.88
Energy contribution: 1.87
Covariance contribution: -2.75
Combinations/Pair: 1.64
Mean z-score: 3.62
Structure conservation index: 0.04
SVM decision value: -1.13
SVM RNA-class probability: 0.102343
Prediction: OTHER
WARNING: Mean pairwise identity too low.
WARNING: Mean z-score out of range.
WARNING: Sequence 1 too long.
WARNING: Sequence 2 too long.

######################################################################

>sacCer1.chr1 6 456 + 230208
CACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACACAUCCUAACACUACCCUAACACAGCCCUAAUCUAACCCUGGCCAACCUGUCUCUCAACUUACCCUCCAUUACCCUGCCUCCACUCGUUACCCUGUCCCAUUCAACCAUACCACUCCGAACCACCAUCCAUCCCUCUACUUACUACCACUCACCCACCGUUACCCUCCAAUUACCCAUAUCCAACCCACUGCCACUUACCCUACCAUUACCCUACCAUCCACCAUGACCUACUCACCAUACUGUUCUUCUACCCACCAUAUUGAAACGCUAACAAAUGAUCGUAAAUAACACACACGUGCUUACCCUACCACUUUAUACCACCACCACAUGCCAUACUCACCCUCACUUGUAUACUGAUUUUACGUACGCACACGGAUGCUACAGUAUAUACCAUCUCAAACU
......................................................................((((...(((.............))).....((.....)).............................))))..((((..(((((.......................................................((((.....................................................(((.....)))...((((((.......((..(((.............)))..)).......)))..)))..))))......((((.(((.......(..(((((..............................)))))..).......))).))))..)))))..)))).................. ( -20.33)
>sacKud.Contig180 396 424 + 1084
CACACCCAGCCGUUCAUACCUAAUGUCAUUACACCAAACUCAUACUAUCUCACAUCUAACACUUCAUACCCCUUAUCUGCAUUCUAAUCAACAUAUUCAUCCACUCGUCAUUACCACACCUCCUUUAUAUCCUUUAAACACUACACUCUAUUCACCUACUAUAUACAAUAAUCGUAAACUAUAAACAUACACCACCCAUCUUACUUAGACCAUCACCCUAUAUCCAUCAUGUAUCCCGCCUCAUUAAACCGACUCUUCCCCAUUGAAUCAUAAUAUCAUAUCCAGCACAUAGCCACACGGACACUAUAGAAUACUGCAAUACCACACCUCAUCAUUAUCAUCACUGCUAUUAACAUCACUACCAUAUCACCAUGAUACUACCACUCACCUUGCUAUUACGCUAUCCACCAUUUGAACGGGCUGU
......((((((((((.......((((((................................................(((((((((............................................................................(((((......(((...........)))........(((.....)))....................))))).(((............((.((.........)).))...............((.....))....)))......)))))...))))..........................((.......))................))))))..............((......))...........))))).))))). ( -28.00)
>consensus
CACACCCACACACCCACACACAACACCAC____CACACCAAACCCACACA__CACACAUC_UAACACUAC______ACACCCCUAAUCUA_ACCCUAACCAACAUAUCCAUCAACU____CGCCAUUACCACACCUCC_________CCUAUACCAUUCAAACACAACACUCCA________________UCCACCUACUAC_ACACAACAACCGUAAACCACAAA_____CAUACACAACCCA_UCCCACUUA___GACCAUCACCCUAC_AUCCACCAUG___ACCGACUCACCAAACCG_CUCUUCCAC_______AUUGAA______________UCAUAA____AUAACAAACACAUACCCACACGAACACUAUAGAACACC_____ACCACACCCCAUAAUCACCAUCACUGCUAU__________________AC__UGAUACUAC_CACGCACACGGAUACUACACUAUACACCAUCUCAAAC_____U
....................................................................................((((........(((......)))............................................................................................................................................................................................................................................................((((......................................................)))).......................))))................................................ ( -0.88 = 1.87 + -2.75)

########################### RNAz 1.0pre #############################

Sequences: 2
Columns: 208
Reading direction: forward
Mean pairwise identity: 67.31
Mean single sequence MFE: -19.94
Consensus MFE: -8.36
Energy contribution: -8.36
Covariance contribution: 0.00
Combinations/Pair: 1.16
Mean z-score: 0.07
Structure conservation index: 0.42
SVM decision value: -5.18
SVM RNA-class probability: 0.000000
Prediction: OTHER

######################################################################

>sacCer1.chr1 462 205 + 230208
UACCCUACUCUCAGAUUCCACUUCACUCCAUGGCCCAUCUCUCACUGAAUCAGUACCAAAUGCACUCACAUCAUUAUGCACGGCACUUGCCUCAGCGGUCUAUACCCUGUGCCAUUUACCCAUAACGCCCAUCAUUAUCCACAUUUUGAUAUCUAUAUCUCAUUCGGCGGUCCCAAAUAUUGUAUAACUGCCCUUAAUACAUACG
............((((...............(((.........((((...))))...((((((((............((..(((....)))...))(((....)))..))).))))).........))).((((..((....))..))))))))...........((((((...............))))))............. ( -26.86)
>sacKud.Contig1996 396 188 + 21481
UACCCCACGUUUUCUCACCCCACCACAUGCUUUCCAUCUCUCAUUCAUCACAUCACUAAAUACGGCCCUUACCUCAGCGGUUUAUACCCUGUGCCAUUUGCCCAUAACACUCAUGAUUAUCCACUUUUUAAUAUCUAUAAUUAGUUAAACACUCCCAAAUAUCAUAUAAAUACUCUUAACUCCACGCU
............................((.................................(((........(((.(((....))))))........)))..........(((((..............((.(((....))).)).............)))))....................)). ( -13.02)
>consensus
UACCCCACGCUC___UCCCACCCCACCACAUG__GCCCAUCUCUCACUCA_________________UCACAUCACUAAACACGGCACUUACCUCAGCGGUCUAUACCCUGUGCCAUUUACCCAUAACACCCAUCAUUAUCCACAUUUUAAUAUCUAUAACUAAUUAAACACUCCCAAAUAUCAUAUAAAUACCCUUAA_UACACACG
............................................................................(((((..((((.......(((.(((....))))))))))....................((((.........))))...............................))))).................... ( -8.36 = -8.36 + 0.00)

########################### RNAz 1.0pre #############################

Sequences: 4
Columns: 82
Reading direction: forward
Mean pairwise identity: 75.61
Mean single sequence MFE: -2.01
Consensus MFE: 0.00
Energy contribution: 0.00
Covariance contribution: 0.00
Combinations/Pair: -1.#J
Mean z-score: 8.56
Structure conservation index: 0.00
SVM decision value: -0.53
SVM RNA-class probability: 0.279943
Prediction: OTHER
WARNING: Mean z-score out of range.

######################################################################

>sacCer1.chr1 684 82 + 230208
CAUAUACUUACCACUCCAUUUAUAUACACUUAUGUCAAUAUUACAGAAAAAUCCCCACAAAAAUCACCUAAACAUAAAAAUA
.............................((((((....................................))))))..... ( -0.32)
>sacMik.contig_210 1483 81 - 1979
CGUAUACUAACCACUCAAUUUAUAUACACUUAUGUUACUAUUUCAGAAAAAUCACCACCAAAAUAACCUAACACAGAAAUA
.(((((..((........))..))))).....(((((.(((((.................)))))...)))))........ ( -1.63)
>sacPar.contig_38 7 82 - 73522
CAAAUAUCCACUACCCAUUUUAUAUGUACUAAUAUAAAUACCACCAAAUAAUCACCACUAAAAUCACCUAAACAUAAAAAUA
..................(((((((......)))))))............................................ ( -2.90)
>sacKud.Contig1996 601 81 + 21481
AAAUAUCCACCACCUAUUUUAUAUAUGCUAAUAUAUACACCACCAAAUAAUCACCACUAAAAUUACCCAAACAUAAAAAUA
...................(((((((....)))))))............................................ ( -3.20)
>consensus
CAAAUACCCACCACCCAAUUUAUAUACACUAAUAUAAAUACCACAAAAAAAUCACCACUAAAAUCACCUAAACAUAAAAAUA
.................................................................................. ( 0.00 = 0.00 + 0.00)

########################### RNAz 1.0pre #############################

Sequences: 2
Columns: 139
Reading direction: forward
Mean pairwise identity: 68.35
Mean single sequence MFE: -23.15
Consensus MFE: -16.05
Energy contribution: -14.05
Covariance contribution: -2.00
Combinations/Pair: 1.37
Mean z-score: -0.25
Structure conservation index: 0.69
SVM decision value: -2.81
SVM RNA-class probability: 0.003716
Prediction: OTHER
WARNING: Sequence 2: Base composition out of range.

######################################################################

>sacCer1.chr1 1060 138 + 230208
AUGCAUCUUUAAUCUUGUAUGUGACACUACUCAUACGAAGGGACUAUAUCUAGUCAAGACGAUACUGUGAUAGGUACGUUAUUUAAUAGGAUCUAUAACGAAAUGUCAAAUAAUUUUACGGUAAUAUAACUUAUCAGC
..((.((((...(((((((((.(......).))))).))))(((((....)))))))))........((((((((.((((((............))))))...(((.(((....))))))........)))))))))) ( -23.00)
>sacMik.contig_1217 15872 136 + 22758
AUUUAUUUUCAUACACUUGUGUGAUGCUACAUAUAAGAAUAGAUUGUAUUUUGUUAAGAGGAUACAGUGAUAUAUACGUUACUUUUAUAGACUUUAUAAGGAAAUAUCUAAUCAUCUUAUUAUAUAAUGUAUUAAC
........((((((....))))))...(((((((((((...(((((((((((.(((((((..((.((((((......))))))....))..)))).))).)))))))..)))).))))))......)))))..... ( -23.30)
>consensus
AUGCAUCUUCAAACACGUAUGUGACACUACACAUAAGAAGAGACUAUAUCUAGUCAAGACGAUACAGUGAUAGAUACGUUA_UUUAAUAGAAUCUAUAACGAAAUAUCAAAUAAUCUUA___UAAUAUAACGUAUCAAC
....((((((.....((((((((......))))))))....(((((....)))))..))))))....((((((((.........((((((...)))))).((....)).....................)))))))).. (-16.05 = -14.05 + -2.00)

########################### RNAz 1.0pre #############################

Sequences: 3
Columns: 95
Reading direction: forward
Mean pairwise identity: 54.04
Mean single sequence MFE: -16.84
Consensus MFE: 0.00
Energy contribution: 0.00
Covariance contribution: 0.00
Combinations/Pair: -1.#J
Mean z-score: -0.39
Structure conservation index: 0.00
SVM decision value: -3.31
SVM RNA-class probability: 0.000005
Prediction: OTHER

######################################################################

>sacCer1.chr1 1342 88 + 230208
UUACGUGUCAAAAAAUGAGGGUCUCUAAAUGAGAGUUUGGUACCAUGACUUGUAACUCGCACUGCCCUGAUCUGCAAUCUUGUUCUUA
(((((.((((....((.(((.((((.....)))).))).))....)))).)))))...(((...........)))............. ( -15.40)
>sacPar.contig_450 7908 81 - 68823
CAUGUAGCAGGAGACGAAAAGUCUAUAUGUUUACUAUGAGUUGUGGCUCACAUAGUCCCUGAUCUACAACCUUUAAUCUUA
..(((((((((((((.....))))........(((((((((....)))))..)))).))))..)))))............. ( -19.40)
>sacBay.contig_426 5839 81 - 7472
UCACGGGACAAAACAUGAGGGUUCUAUUAUUGUGGCGGGUAGGCAAUUCGAGUAAACCCUGAUCUACAAACCUUAAUUGUA
............(((..((((((.......(((((.((((..((.......))..))))....)))))))))))...))). ( -15.71)
>consensus
U_ACGUGACAAAAAAUGAGG_GUCUAUA__U__________UACUAUGACUUGUAG____CUCGCAUAG_CCCUGAUCUACAAACUUUAAUCUUA
............................................................................................... ( 0.00 = 0.00 + 0.00)
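I rewrote my program based on the one you gave, but it failed, mainly because I don't understand how yours works. Could you help once more?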
use strict;

open FILENAME, "RNAzchr1.txt" or die "cannot open the file:$!\n ";
open OUTPUT, ">test.xls" or die "cannot write the file:$!\n ";

my $line;
my @array;
my @arrayA;
my $seq;
my $i;
my $j;

print OUTPUT "Name Several Columns ReadingDirection MeanIdentity MeanMFE ConsensusMFE EnergyContribution CovarianceContribution Combinations/Pair MeanZ-score StructureConservation SVMValue RNAprobability Prediction Start Length GeneDirection SpeciesNum Sequence\n";

while($line=<FILENAME>)
{chomp $line; # a bare "chomp;" acts on $_, not on $line

if($line=~ /^(.*Sequences:.*)$/)
{@array=(split /:\s+/,$1);$arrayA[1]=$array[1];}

if($line=~ /^(.*Columns:.*)$/)
{@array=(split /:\s+/,$1);$arrayA[2]=$array[1];}

if($line=~ /^(.*Reading direction:.*)$/)
{@array=(split /:\s+/,$1);$arrayA[3]=$array[1];}

if($line=~ /^(.*Mean pairwise identity:.*)$/)
{@array=(split /:\s+/,$1);$arrayA[4]=$array[1];}

if($line=~ /^(.*Mean single sequence MFE:.*)$/)
{@array=(split /:\s+/,$1);$arrayA[5]=$array[1];}

if($line=~ /^(.*Consensus MFE:.*)$/)
{@array=(split /:\s+/,$1);$arrayA[6]=$array[1];}

if($line=~ /^(.*Energy contribution:.*)$/)
{@array=(split /:\s+/,$1);$arrayA[7]=$array[1];}

if($line=~ /^(.*Covariance contribution:.*)$/)
{@array=(split /:\s+/,$1);$arrayA[8]=$array[1];}

if($line=~ /^(.*Combinations.*)$/)
{@array=(split /:\s+/,$1);$arrayA[9]=$array[1];}

if($line=~ /^(.*Mean z-score:.*)$/)
{@array=(split /:\s+/,$1);$arrayA[10]=$array[1];}

if($line=~ /^(.*Structure conservation.*)$/)
{@array=(split /:\s+/,$1);$arrayA[11]=$array[1];}

if($line=~ /^(.*SVM decision value:.*)$/)
{@array=(split /:\s+/,$1);$arrayA[12]=$array[1];}

if($line=~ /^(.*SVM RNA-class probability:.*)$/)
{@array=(split /:\s+/,$1);$arrayA[13]=$array[1];}

if($line=~ /^(.*Prediction: .*)$/)
{@array=(split /:\s+/,$1);$arrayA[14]=$array[1];}


#$/ = ">";
#while (my $line = <FILENAME>)
if($line=~ />/) # modified this part
{my ($note, $sequence, $structure) = split /\n/, $line, 3; # I don't understand the "3" here
if ($note =~ /sacCer1\.chr1/){
@array=(split /\s+/,$line);
$arrayA[0]=$array[0];
$arrayA[15]=$array[1];
$arrayA[16]=$array[2];
$arrayA[17]=$array[3];
$arrayA[18]=$array[4];
$arrayA[19]=$sequence;

foreach my $element (@arrayA){
print OUTPUT "$element\t";
}

print OUTPUT "\n";
}

}
}
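
A likely reason this version fails: with the default line-by-line reading, $line never contains a "\n" in the middle, so split /\n/ returns only the header and $sequence stays undefined. The sequence is on the following line and has to be read separately. Below is a hedged sketch of one way to combine the statistics and the sequence, assuming every "Key: value" line belongs to the block opened by the preceding RNAz banner. The helper name parse_rnaz, the hash keys and the shortened sample values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: one hash ref per block that contains the wanted species.
sub parse_rnaz {
    my ($fh, $species) = @_;
    my (@rows, %stats);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^#+ RNAz/) {
            %stats = ();                          # banner: a new block begins
        }
        elsif ($line =~ /^\s*WARNING:/) {
            next;                                 # warnings are not key/value pairs
        }
        elsif ($line =~ /^\s*([^:>]+):\s+(\S.*)$/) {
            $stats{$1} = $2;                      # e.g. "Columns" => "12"
        }
        elsif ($line =~ /^>(\Q$species\E\S*)\s+(\d+)\s+(\d+)\s+([+-])\s+(\d+)/) {
            my %row = (%stats, name => $1, start => $2, length => $3,
                       strand => $4, fifth => $5);   # fifth header field kept as-is
            my $sequence = <$fh>;                 # the sequence is the NEXT line
            last unless defined $sequence;
            chomp $sequence;
            $row{sequence} = $sequence;
            push @rows, \%row;
        }
    }
    return @rows;
}

# Shortened, made-up block in the same layout as the RNAz output.
my $sample = <<'END';
########################### RNAz 1.0pre #############################

Sequences: 2
Columns: 12
Prediction: OTHER
WARNING: Mean z-score out of range.

######################################################################

>sacCer1.chr1 684 12 + 230208
CAUAUACUUACC
............ ( -0.32)
>sacMik.contig_210 1483 12 - 1979
CGUAUACUAACC
............ ( -1.63)
END

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!\n";
my @rows = parse_rnaz($fh, 'sacCer1');

my @columns = qw(name Sequences Columns Prediction start length strand sequence);
for my $row (@rows) {
    print join("\t", map { $row->{$_} } @columns), "\n";
}
```

Keeping each block in a hash avoids the fixed array indices of the original program, and the column list can be extended with any other key that appears before the colon in the RNAz output.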
Learning from this!
I'm not quite sure what you are asking for. How did the program end up that long...

As I understand it, after reading the file in line by line,

while (<IN>){
print if /^[UACG]/;    # "unless" would print everything except the sequence lines
}

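should be enough to print every sequence line. The regex above is not airtight, though: header lines such as "Columns:" and "Consensus MFE:" also begin with C, and the consensus lines match too, so check the result by hand afterwards.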

With a regex in the shell it is even simpler. If I remember correctly:
sed -n '/^[UACG]/p' yourfile
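
To see where a first-character test leaks, it can be compared with a pattern anchored at both ends. This is a small sketch: the sample lines are copied (and shortened) from the RNAz output earlier in the thread, and treating "a sequence line" as "only the letters A, C, G and U" is an assumption that deliberately excludes consensus lines containing "_".

```perl
use strict;
use warnings;

my @lines = (
    'Columns: 82',
    'Consensus MFE: 0.00',
    '>sacCer1.chr1 684 82 + 230208',
    'CAUAUACUUACCACUCCAUUUAUAUACACUUAUGUC',
    '.............................((((((.',
    'U_ACGUGACAAAAAAUGAGG_GUCUAUA',
);

my @loose  = grep { /^[UACG]/ }   @lines;   # tests the first character only
my @strict = grep { /^[UACG]+$/ } @lines;   # the whole line must consist of bases

printf "loose: %d lines, strict: %d lines\n", scalar @loose, scalar @strict;
print "$_\n" for @strict;
```

The loose pattern also keeps the two header lines and the consensus line, while the anchored one keeps only the plain sequence.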

The $/ = ">" script is nicely written and taught me a lot, thank you.
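#!/usr/bin/perl -w
use strict;
open IN1, '<', "rna.txt" or die "cannot open rna.txt: $!\n";
my @whole = <IN1>;    # read the whole file into memory
chomp @whole;
my @seq = ();
foreach my $line (@whole) {
    # Anchor both ends so that "Columns: ..." or "Consensus MFE: ..." cannot match
    if ($line =~ /^[AUGC]+$/) {
        push(@seq, $line);
    }
}
print "$seq[0]\n";    # the first sequence line in the file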