A very strange problem...

#1 xkingkiller, posted 2008-09-08 10:13

Here is the code. It reads each line of a file and does some processing on it:
#!/usr/bin/perl
use strict;
use warnings;

my $pName;
my $filename;
my $newFile;
my $fileListName = $ARGV[0];
if (!defined $fileListName || $fileListName eq "") {
    print "Input a srcListFileName. e.g. demodata.scr\n";
    exit(1);
}
open(FILELIST, '<', $fileListName) || die "Failed to load $fileListName: $!";
my $icount = 0;
my $command;
while (<FILELIST>) {
    $filename = $_;
    chomp($filename);                   # removes only the trailing "\n" ($/)
    $filename =~ s|\\|/|g;              # backslashes -> forward slashes
    $newFile = $filename;
    $pName   = $filename;
    $pName   =~ s|bmp\\|head_|;         # never matches: backslashes were already converted above
    $pName   =~ s|\\|_|;                # never matches, for the same reason
    $pName   =~ s|\.bmp||;              # drop the .bmp extension
    $newFile =~ s|\.bmp|_split\.bmp|;   # foo.bmp -> foo_split.bmp
    $command = join(" ", $newFile, $pName);
    print "$command\n";
}
close(FILELIST);
The input file contains:
bmp\Head\sc_02_03.bmp
bmp\rrrrrrrrr\OnOff.bmp
wav\xxx.wav
But the output is very strange:
bmp/Head/sc_02_03split.bmp
bmp/rrrrrrrrr/OnOffsplit.bmp
wav/xxx.wav wav/xxx.wav
Only the last line seems to be correct. In the first two lines it looks as if $pName is being inserted at the beginning of the string.

#2 beckheng, posted 2008-09-11 14:02

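Nothing strange about it. chomp does not remove the \r character, so lines from a file with Windows (CRLF) line endings can still end in \r after chomp.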
So I now always use this instead:
s/\r?\n$//;
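A minimal sketch that reproduces the symptom and the fix above, assuming the list file has Windows (CRLF) line endings and is read without a :crlf layer (for example on Linux or Cygwin). The helper names build_command, with_chomp and with_strip are hypothetical; build_command copies the substitutions from the question unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same substitutions as the loop body in the question (copied as-is).
sub build_command {
    my ($filename) = @_;
    $filename =~ s|\\|/|g;
    my $newFile = $filename;
    my $pName   = $filename;
    $pName   =~ s|bmp\\|head_|;
    $pName   =~ s|\\|_|;
    $pName   =~ s|\.bmp||;
    $newFile =~ s|\.bmp|_split\.bmp|;
    return join(" ", $newFile, $pName);
}

# Hypothetical helpers: clean the raw line in two different ways first.
sub with_chomp { my ($line) = @_; chomp($line);        return build_command($line) }
sub with_strip { my ($line) = @_; $line =~ s/\r?\n$//; return build_command($line) }

# Assumption: one line as read from a CRLF file with no :crlf layer active.
my $line = "bmp\\Head\\sc_02_03.bmp\r\n";

my $bad  = with_chomp($line);   # still contains two "\r" characters
my $good = with_strip($line);   # clean

# Make the hidden "\r" characters visible instead of letting the terminal act on them.
(my $shown = $bad) =~ s/\r/\\r/g;
print "chomp: $shown\n";
print "strip: $good\n";
```

Printed raw to a terminal, the first \r sends the cursor back to column 0, so " bmp/Head/sc_02_03" overwrites the first 18 characters of "bmp/Head/sc_02_03_split.bmp" and only "split.bmp" survives, which gives exactly the "sc_02_03split.bmp" seen in the question. The wav line is unaffected because it has no .bmp to substitute and, being the last line, presumably has no line terminator at all.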

#3 bravesoul, posted 2008-09-12 23:55

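chomp only removes a trailing \n (strictly speaking, whatever $/ is set to), but I don't think that is where the problem really lies.

Perl handles text in different encodings in a simple and effective way: internally it uses one uniform, general-purpose and efficient encoding (UTF-8), and at the I/O level it inserts PerlIO layers that perform encoding conversions or other transformations; see the relevant documentation (perldoc PerlIO) for details. The benefit is obvious: whatever encoding the source or target text uses, and whatever platform it is on, the internal text format is always the same. On the one hand this avoids frequent conversions and improves efficiency; on the other hand many APIs and regular expressions can work independently of the text encoding.

As everyone knows, line terminators differ from platform to platform for various reasons, and the PerlIO layer ':crlf' exists to deal with this: on Windows, for example, it turns "\r\n" into "\n" on input and does the opposite on output. Unfortunately it is a bit troublesome to use in the Windows build of Perl, mainly because of the order in which the PerlIO layers are stacked. The default order is ':unix:crlf'.":encoding($enc)", where $enc is the encoding of the text, so the line-ending translation is applied to the already-encoded bytes rather than to characters, which breaks multi-byte encodings such as UTF-16. The correct order is ':unix'.":encoding($enc)".':crlf', and I prefer ':perlio:raw'.":encoding($enc)".':crlf:utf8'. Here is some code for reference: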

use strict;
use warnings;
use Encode;
use Encode::Guess;

my %opt = (
  encode => 'UTF-16',    # default encoding used by open_file()
  buf_sz => 1024,        # number of bytes read when guessing a file's encoding
);

sub guess_file_encoding($);
sub open_file($$;$);

# main routine

Encode::Guess->add_suspects( qw/cp936/ );    # extra candidate encodings to consider

# guess the file encoding first
my $src      = shift or die "please specify a source file\n";
my $decoding = guess_file_encoding( $src );
my $FH       = open_file( $src, '<', $decoding );
# do whatever you want with $FH here
close $FH;

# sub definitions

sub guess_file_encoding($) {
  my $file = shift or die "no input file given\n";

  open my $FH, '<', $file or die "Cannot open $file: $!";
  # read raw bytes: drop a :crlf layer if present, then switch to binary mode
  my @default_layers = PerlIO::get_layers($FH);
  binmode($FH, ':pop') if $default_layers[-1] eq 'crlf';
  binmode($FH, ':raw');
  # read at most buf_sz bytes (fewer if the file is smaller)
  my ( $buf, $buf_sz );
  $buf_sz = $opt{buf_sz} < -s $file ? $opt{buf_sz} : -s _;
  read( $FH, $buf, $buf_sz );
  my $decoder = Encode::Guess->guess( $buf );
  die $decoder unless ref($decoder);    # on failure guess() returns an error message
  close $FH;

  return $decoder->name;
}

sub open_file ($$;$) {
  my ( $file, $mode, $encoding ) = @_;

  $encoding ||= $opt{encode};    # default
  unless ( Encode::perlio_ok($encoding) ) {
    die "the target encoding $encoding is not supported by PerlIO\n";
  }

  open my $FH, $mode, $file or die "$!";

  my @default_layers = PerlIO::get_layers($FH);
  # Caution: the :crlf layer must be popped. Pushing :raw only disables an
  # existing :crlf layer, and a later attempt to push :crlf on top of the
  # stack just re-enables that old (lower) layer instead of pushing a new
  # one, which is not what we want.
  binmode($FH, ':pop') if $default_layers[-1] eq 'crlf';

  my $add_bom_flag = 0;
  if ( $^O eq 'MSWin32'
     && ( $mode eq '>' || $mode eq 'w' ) # the trick is only needed when writing
     && $encoding eq 'UTF-16' ) {
    # Encoding to 'UTF-16' writes a big-endian BOM (and big-endian data),
    # which is inconvenient on Windows, so write UTF-16LE and add the BOM by hand
    $encoding = 'UTF-16LE';
    $add_bom_flag = 1;
  }

  my $new_layers = ':perlio:raw' . ":encoding($encoding)" . ':crlf:utf8';

  binmode($FH, $new_layers);
#  print "$_\t" foreach PerlIO::get_layers($FH);

  print $FH "\x{feff}" if $add_bom_flag;    # BOM

  return $FH;
}
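The layer order recommended above can be checked on any platform. A minimal sketch, assuming Perl 5.8 or later and the core File::Temp module for a scratch file: it writes a UTF-16LE file with a hand-written BOM and CRLF line endings, mirroring the Win32 branch of open_file, then reads it back through the same stack. It uses ':raw:encoding(UTF-16LE):crlf:utf8', a variant of the post's ':perlio:raw'.":encoding($enc)".':crlf:utf8' without the explicit :perlio; write_utf16_crlf, read_utf16_crlf and hex_bytes are hypothetical helpers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# :encoding below :crlf, as recommended; ':utf8' marks the top layer as character-based.
my $layers = ':raw:encoding(UTF-16LE):crlf:utf8';

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Hypothetical helper: write lines with a manual BOM, like open_file() does on Win32.
sub write_utf16_crlf {
    my ($file, @lines) = @_;
    open my $fh, ">$layers", $file or die "open $file: $!";
    print {$fh} "\x{feff}";          # BOM, encoded as ff fe by the UTF-16LE layer
    print {$fh} "$_\n" for @lines;   # :crlf turns "\n" into "\r\n" before encoding
    close $fh or die "close: $!";
}

# Hypothetical helper: read the file back through the same layer stack.
sub read_utf16_crlf {
    my ($file) = @_;
    open my $fh, "<$layers", $file or die "open $file: $!";
    my @lines = <$fh>;
    close $fh;
    $lines[0] =~ s/^\x{feff}// if @lines;   # UTF-16LE decoding keeps the BOM as a character
    chomp @lines;
    return @lines;
}

# Hypothetical helper: the file's raw bytes as lowercase hex pairs.
sub hex_bytes {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "open $file: $!";
    local $/;
    my $bytes = <$fh>;
    close $fh;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

write_utf16_crlf($path, 'a', 'b');
my @back = read_utf16_crlf($path);

print hex_bytes($path), "\n";
print join(',', @back), "\n";
```

Because :crlf sits above :encoding here, the "\n" to "\r\n" translation happens on characters, so each line ending is encoded as the four bytes 0d 00 0a 00, and on input the decoded "\r\n" is folded back to "\n" so chomp leaves no stray \r. With :crlf below :encoding, as in the default Windows stack described above, the translation would act on the already-encoded bytes instead.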