Can Perl handle UCS-2 encoded files?

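I have a UCS-2 encoded file whose contents I need to replace using regular expressions. Can Perl do this? Do I need to install any special components? Thanks!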
You shouldn't need any.

Take a look at the perluniintro (Perl Unicode introduction) documentation.
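Yes, it can.
Internally Perl stores strings in a UTF-8-like encoding, and PerlIO layers convert transparently between the external encoding and that internal one. Regular expressions operate on the internal representation, so UCS-2 is fully supported. The one catch is that on Win32 the generic UTF-16 encoding is somewhat problematic; only the explicit UTF-16BE and UTF-16LE work properly there. You also need to watch the CRLF translation. Setting up the file handle as follows avoids the problem: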
my $new_layers = ':perlio:raw' . ":encoding($encoding)" . ':crlf:utf8';
binmode($FH, $new_layers);
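To connect this back to the original question, here is a minimal sketch of a regex replacement over a UTF-16LE file using the same kind of layer stack. The sub name replace_in_utf16le and its arguments are made up for illustration, and the sketch assumes the file is little-endian and may start with a BOM; adjust the encoding name if yours is UTF-16BE.

```perl
use strict;
use warnings;

# Hypothetical helper (not from this thread): apply a regex substitution to
# every line of a UTF-16LE file and write the result to a second file.
sub replace_in_utf16le {
    my ( $in_path, $out_path, $pattern, $replacement ) = @_;

    # :raw             start from plain bytes (no platform CRLF translation)
    # :encoding(...)   decode/encode UTF-16LE
    # :crlf            translate CRLF <-> LF on the decoded characters
    # :utf8            mark the top layer as carrying Perl character data
    my $layers = ':raw:encoding(UTF-16LE):crlf:utf8';

    open my $in,  "<$layers", $in_path  or die "cannot read $in_path: $!";
    open my $out, ">$layers", $out_path or die "cannot write $out_path: $!";

    # UTF-16LE does not add a BOM by itself, so write one explicitly.
    print {$out} "\x{FEFF}";

    while ( my $line = <$in> ) {
        # The explicit LE decoder keeps an input BOM as U+FEFF; drop it.
        $line =~ s/\A\x{FEFF}// if $. == 1;
        $line =~ s/$pattern/$replacement/g;
        print {$out} $line;
    }

    close $in;
    close $out or die "cannot finish writing $out_path: $!";
    return;
}

# Example call (file names are placeholders):
# replace_in_utf16le( 'in.txt', 'out.txt', qr/foo/, 'bar' );
```

Inside the loop the regex sees ordinary Perl characters with LF line endings, so the pattern does not need to know anything about UTF-16 or CRLF.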
Here is some code I wrote a while back; you can use it as a reference.
use strict;
use warnings;
use Encode;
use Encode::Guess;

my %opt = (
    encode => 'UTF-16',
    buf_sz => 1024,    # number of bytes read when guessing the file encoding
);

sub guess_file_encoding($);
sub convert($$;$);
sub open_file($$;$);

sub convert ($$;$) {
    my $src_file = shift or die "input the file to convert encoding\n";
    my $encoding = shift or die "input the target encoding\n";
    my $dest_file = shift;

    # default output name: insert "_<encoding>" before the first dot of the source name
    unless ( defined $dest_file ) {
        ( $dest_file = $src_file ) =~ s{\.}{_$encoding\.}xms;
    }

    # guess the source file's encoding first
    my $decoding = guess_file_encoding( $src_file );

    my $FH_IN  = open_file( $src_file,  '<', $decoding );
    my $FH_OUT = open_file( $dest_file, '>', $encoding );

    while ( <$FH_IN> ) {
        print $FH_OUT $_;
    }

    close $FH_OUT;
    close $FH_IN;
}

sub guess_file_encoding($) {
    my $file = shift or die "input file first\n";

    open my $FH, '<', $file or die "cannot open $file: $!";

    # read raw bytes: drop the Win32 crlf layer if present, then force binary mode
    my @default_layers = PerlIO::get_layers($FH);
    binmode($FH, ':pop') if $default_layers[-1] eq 'crlf';
    binmode($FH, ':raw');

    # sample at most $opt{buf_sz} bytes from the start of the file
    my ( $buf, $buf_sz );
    $buf_sz = $opt{buf_sz} < -s $file ? $opt{buf_sz} : -s _;
    read( $FH, $buf, $buf_sz );
    close $FH;

    # guess() returns an encoding object on success, or an error string
    my $decoder = Encode::Guess->guess( $buf );
    die $decoder unless ref($decoder);

    return $decoder->name;
}

sub open_file ($$;$) {
    my ( $file, $mode, $encoding ) = @_;

    $encoding ||= $opt{encode};    # default
    unless ( Encode::perlio_ok($encoding) ) {
        die "the target encoding: $encoding is not supported by PerlIO\n";
    }

    open my $FH, $mode, $file or die "cannot open $file: $!";

    # drop the Win32 crlf layer so it does not sit below the encoding layer
    my @default_layers = PerlIO::get_layers($FH);
    binmode($FH, ':pop') if $default_layers[-1] eq 'crlf';

    my $add_bom_flag = 0;
    if ( $^O eq 'MSWin32'
        && ( $mode eq '>' || $mode eq 'w' )    # only play this trick when writing
        && $encoding eq 'UTF-16' ) {
        # Perl's generic UTF-16 encoder writes a big-endian BOM and
        # big-endian data, which is inconvenient on Windows, so switch
        # to UTF-16LE and write the BOM ourselves
        $encoding = 'UTF-16LE';
        $add_bom_flag = 1;
    }

    my $new_layers = ':perlio:raw' . ":encoding($encoding)" . ':crlf:utf8';

    binmode($FH, $new_layers);
    # print "$_\t" foreach PerlIO::get_layers($FH);

    print $FH "\x{feff}" if $add_bom_flag;    # BOM

    return $FH;
}
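For anyone who wants to see what the guessing step does on its own, here is a small sketch that runs Encode::Guess on an in-memory buffer instead of a file. The helper name guess_buffer_encoding and the sample text are made up for illustration. It relies on the default suspect list, which as far as I know recognizes UTF-16 only when the data starts with a BOM (or contains NUL bytes), so a BOM-less file may need explicit suspects.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use Encode::Guess;

# Hypothetical helper: return the name of the guessed encoding for a byte
# buffer, or die with the error string that guess() returns on failure.
sub guess_buffer_encoding {
    my ($bytes) = @_;
    my $decoder = Encode::Guess->guess($bytes);
    die "cannot guess encoding: $decoder\n" unless ref $decoder;
    return $decoder->name;
}

# Sample data built in memory: a BOM followed by text, encoded as UTF-16LE.
my $sample = encode( 'UTF-16LE', "\x{FEFF}hello \x{4E2D}\x{6587}\n" );

my $name = guess_buffer_encoding($sample);

# Decoding with the guessed name consumes the BOM and yields characters.
my $text = decode( $name, $sample );
```

The name returned here is what convert() above passes to open_file() as the source encoding.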
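Many thanks to bravesoul for the help!
I did indeed run into the crlf problem; after switching to the approach you provided, reading and writing work correctly.
There is just one thing I don't understand, namely how the parameters of the subroutines are written:
what do "$" and "$$;$" mean?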
It's a function prototype.
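It is a prototype declaration. It lets Perl check the arguments of a call at compile time, so that a wrongly called function is caught as early as possible. Note that this is a check on the number and context of the arguments rather than on their types: "$" means the argument is evaluated in scalar context, and "$$;$" means the first two arguments are required scalars while the third, after the semicolon, is an optional scalar. Prototypes only apply to calls written without a leading "&" and are ignored for method calls. See perlsub for details.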
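A small sketch of what the "$$;$" prototype does in practice. The sub describe and the file names are made up for illustration; the point is only how the calls are parsed.

```perl
use strict;
use warnings;

# Hypothetical sub: two required scalar arguments, one optional.
sub describe ($$;$) {
    my ( $src, $enc, $dest ) = @_;
    return defined $dest ? "$src -> $dest ($enc)" : "$src ($enc)";
}

my @parts = ( 'a.txt', 'b.txt', 'c.txt' );

# '$' imposes scalar context, so the array is passed as its element count.
my $r1 = describe( @parts, 'UTF-8' );

# The optional third argument may be given or left out.
my $r2 = describe( 'in.txt', 'UTF-16LE', 'out.txt' );

# Calling with a leading & bypasses the prototype: the array is flattened
# into three separate arguments.
my $r3 = &describe(@parts);

# With the prototype in effect, this line would not even compile:
# describe('only_one');    # "Not enough arguments for main::describe"
```

The first call is the usual surprise with prototypes: the array is not rejected, it is silently evaluated in scalar context.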
Found it, thanks!