Counting word frequencies in a file


#1 kgo_yoi, posted 2007-09-14 00:47

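I ran into this problem on the forum at http://www2.matrix.org.cn/ (a Java forum). By my rough estimate, solving it in Java would take more than 50 lines of code and 3 custom classes. Solving it in Ruby took only 18 lines and 0 custom classes! In truth, I have not fully solved the problem in either Java or Ruby; a complete solution is beyond my ability.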

URL: http://www2.matrix.org.cn/thread.shtml?topicId=318049fc-5ea8-11dc-b06d-09b637715141&forumId=19&fid=19

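Problem:

Count the number of distinct words in the input file and the frequency of each word, then write the words to a file in dictionary order. For example:
Sample input file (word.in)
This is a book.Its name is “C Programming”.
Output:
The result should be stored in a file. The first line of that file is the number of distinct words. From the second line on, each line holds one word and its frequency, separated by a space, with the words in dictionary order.
Sample output file (word.out)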
8
a 1
book 1
c 1
is 2
its 1
name 1
programming 1
this 1
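Notes:
1. Words are separated by: whitespace, carriage return, TAB, &
2. A word containing a hyphen counts as a single word, e.g. double-track
3. Take care with contractions formed with an apostrophe, e.g. he's (this is not implemented yet, because without context and semantics there is no way to tell whether he's means he is, he was, or he has)
4. Words are case-insensitive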
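
As a rough sketch, notes 1, 2, and 4 could be turned into a tokenizer in Ruby along these lines. Instead of splitting on the listed separators, it scans for runs of letters joined by hyphens, so punctuation such as the period in `book.Its` also acts as a separator (an assumption, since the notes do not list it). The names `WORD` and `tokenize` are hypothetical. Note 3 is not handled, so `he's` comes out as `he` and `s`.

```ruby
# Hypothetical pattern: a run of ASCII letters, optionally joined by
# single hyphens, so "double-track" stays one word (note 2).
WORD = /[a-z]+(?:-[a-z]+)*/

# Hypothetical helper: lowercase the text (note 4), then collect every
# match of WORD; anything that is not part of a match is a separator.
def tokenize(text)
  text.downcase.scan(WORD)
end

p tokenize("This is a book.Its name is \u201CC Programming\u201D.")
# => ["this", "is", "a", "book", "its", "name", "is", "c", "programming"]
p tokenize("Ruby-on-Rails Tom&Jerry")
# => ["ruby-on-rails", "tom", "jerry"]
```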

Input file:
A class can inherit or derive characteristics from another class. That means that a
child class or subclass can inherit the methods and data from a parent class. This
parent is also referred to as the superclass. This parent-child chain forms a
hierarchy of classes, with the base class at the root, top, or base of the hierarchy.
For Ruby, the base class of the hierarchy is Object.
Ruby-on-Rails
Ruby
Tom&Jerry

Ruby implementation

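
The poster's 18-line listing is not reproduced here. What follows is a minimal sketch of one way to meet the requirements in Ruby, not the original code. The method names `word_counts` and `report` are made up for illustration, punctuation is assumed to separate words, and apostrophe contractions are left unhandled, as in the post.

```ruby
# Sketch only: method names are illustrative, not taken from the thread.

# Count each word case-insensitively. Hyphenated words such as
# "parent-child" count as one word; any other non-letter separates words.
def word_counts(text)
  words = text.downcase.scan(/[a-z]+(?:-[a-z]+)*/)
  words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

# Build the report text: the number of distinct words on the first line,
# then one "word frequency" line per word in dictionary order.
def report(text)
  counts = word_counts(text)
  lines = counts.sort.map { |word, n| "#{word} #{n}" }
  ([counts.size.to_s] + lines).join("\n") + "\n"
end

# Read word.in and write word.out, but only if the input file is present.
if File.exist?("word.in")
  File.write("word.out", report(File.read("word.in")))
end
```

No custom classes are needed here: a `Hash` with a default value of 0 holds the counts, and sorting its key/value pairs gives the dictionary order.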


#2 xavier, posted 2007-09-14 20:11

Haha, nice. Can it support Chinese? That would be a bit tricky, I guess.


#3 admin, posted 2007-09-14 22:49

This is an example from the Ruby Cookbook.