Help optimizing a program

1^# hitsubunnu posted on 2008-03-05 17:55

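I am processing a file of about 12,000 lines, many of which are duplicates.

I wrote a script that removes the duplicate lines (when a line appears two or more times, only one copy is kept). It works, but it is very inefficient and takes a long time to produce a result.

Could someone help me optimize it?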

CODE:

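#!/usr/bin/perl
my @array;
my @newarray;
my $str;

# Collect the first tab-separated field of every line.
open(FF,"data.dat") or die "Cannot open data.dat: $!";
while(<FF>){
    chomp;
    my ($t1,$t2) = split(/\t/);
    push @array,"$t1\n";
}
close(FF);

# Compare each entry with every later one and blank out the copies.
# This is quadratic in the number of lines, which is why it is slow.
while(@array){
    $str = shift @array;
    foreach(0..$#array){
        if($str eq $array[$_]){
            $array[$_] = "";
        }
    }
    push (@newarray,$str);
}

open(FF,">newdata.dat") or die "Cannot write newdata.dat: $!";
print FF @newarray;
close(FF);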

2^# scuhkr posted on 2008-03-05 18:55

Is this for learning or for real use?
If it is for real use, try this:

CODE:

sort data.dat | uniq > newdata.dat

3^# Lonki posted on 2008-03-05 18:57

12,000 lines is not much. Use a hash.
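
A minimal sketch of the hash idea, assuming the goal matches the original script (keep only the first tab-separated field and drop later repeats). The sub name `dedup_first_column` and the sample rows are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first tab-separated field of each line, keeping only the
# first occurrence of each value and preserving the input order.
sub dedup_first_column {
    my @lines = @_;
    my (%seen, @unique);
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($key) = split /\t/, $copy;
        next unless defined $key;   # skip empty lines
        next if $seen{$key}++;      # this value was already emitted
        push @unique, $key;
    }
    return @unique;
}

# In-memory demonstration with made-up rows (not the poster's data).
my @sample = ("a\t1\n", "b\t2\n", "a\t3\n", "c\t4\n", "b\t5\n");
print "$_\n" for dedup_first_column(@sample);   # prints a, b, c on separate lines
```

Each line costs one hash lookup, so the work grows linearly with the file size instead of quadratically.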

4^# Lonki posted on 2008-03-05 19:00

QUOTE:

Originally posted by scuhkr on 2008-3-5 18:55
Is this for learning or for real use?
If it is for real use, try this:

sort data.dat | uniq > newdata.dat

Does that preserve the original order?
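
To make the ordering question concrete, here is a small sketch comparing the two behaviours on the sample lines that appear later in this thread. The sub names `sorted_unique` and `first_seen_unique` are hypothetical. Note also that `sort | uniq` compares whole lines, while the original script only keeps the first tab-separated field.

```perl
use strict;
use warnings;

# Unique lines in sorted order, analogous to `sort | uniq`.
sub sorted_unique {
    my %seen;
    return grep { !$seen{$_}++ } sort @_;
}

# Unique lines in order of first appearance.
sub first_seen_unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @lines = ("one two", "two one", "this is", "that is", "two one");
print "sorted:     ", join(" | ", sorted_unique(@lines)), "\n";
print "first seen: ", join(" | ", first_seen_unique(@lines)), "\n";
# sorted:     one two | that is | this is | two one
# first seen: one two | two one | this is | that is
```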

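5^# churchmice posted on 2008-03-05 19:41

QUOTE:

Originally posted by hitsubunnu on 2008-3-5 17:55
I am processing a file of about 12,000 lines, many of which are duplicates.

I wrote a script that removes the duplicate lines (when a line appears two or more times, only one copy is kept). It works, but it is very inefficient and takes a long time to produce a result.

Could someone help me optimize it?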

#!/usr/bin/ ...

Try the following code:

CODE:

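#!/usr/bin/perl
use strict;
use warnings;
my %unique;
$\ = "\n";       # print appends a newline to every line
$^I = ".bak";    # edit in place and back up the original file as <name>.bak
while (<>) {
    chomp;
    next if $unique{$_};   # skip lines that were already printed
    $unique{$_} = 1;
    print;
}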

Result:

QUOTE:

<lig@other-server:~/chinaunix>$ cat data
one    two
two    one
this is
that is
two    one
<lig@other-server:~/chinaunix>$ ./optimize data
<lig@other-server:~/chinaunix>$ cat data
one    two
two    one
this is
that is
<lig@other-server:~/chinaunix>$ cat data.bak
one    two
two    one
this is
that is
two    one

The original file is backed up as data.bak.
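
For readers unfamiliar with `$^I`, the following is a rough sketch of the same in-place behaviour written with explicit filehandles. It is an approximation, not a description of Perl's internals; the sub name `dedup_file_in_place` is hypothetical, and the `.bak` suffix is taken from the script above.

```perl
use strict;
use warnings;

# Remove duplicate lines from $path, keeping the first occurrence of each.
# The original content is preserved as "$path.bak".
sub dedup_file_in_place {
    my ($path) = @_;
    my $backup = "$path.bak";
    rename $path, $backup or die "Cannot rename $path: $!";
    open my $in,  '<', $backup or die "Cannot read $backup: $!";
    open my $out, '>', $path   or die "Cannot write $path: $!";
    my %seen;
    while (my $line = <$in>) {
        chomp $line;
        next if $seen{$line}++;     # line was already written
        print {$out} "$line\n";
    }
    close $in;
    close $out or die "Cannot close $path: $!";
    return;
}

# Process every file named on the command line, e.g. `perl dedup.pl data`.
dedup_file_in_place($_) for @ARGV;
```

Like the script above, this compares whole lines and keeps the original order.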

6^# hitsubunnu posted on 2008-03-06 08:58

Thanks, everyone!