Help needed (waiting online)

zzllabcd

UID: 25972
Posts: 1
Points: 2
Time online: 10 minutes

1^# zzllabcd posted on 2008-06-30 13:49

Help needed (waiting online)

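I've just started learning Python and wrote a small program to practice regular expressions, but something is going wrong. Could an expert please take a look and tell me where the problem is?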
import urllib, re

url = 'http://www.24-hotel.com.cn'
page = urllib.urlopen(url, proxies=None)  # Python 2 urllib

source = page.read()
# Keep only the part from the first <option> up to the closing </select>
source = source[source.find('<option value=012330>'):source.find('</select>')]
out = open('b.txt', 'w')
out.write(source)
out.close()
f = open('b.txt', 'r')
s = f.read()
f.close()
numlist = re.findall(r'\d{6}', s)         # six-digit option values
citylist = re.findall(r'[^<]/(.*?)<', s)  # intended to capture the city names
print citylist
print source

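When I print the city list, the Chinese characters captured from source all turn into escaped sequences like "\xd7\xcd\xb2\xa9". Is it because I didn't pass a flag to findall, or is it a character set problem? My system's character set is cp1252, the web page's is gb2312, and the saved b.txt file is also gb2312.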
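As a point of reference for the encoding question, here is a minimal sketch of what those escaped bytes are. It is written for Python 3 (the thread's own code is Python 2) and assumes the bytes are GB2312-encoded, as the page is said to be. The variable names are illustrative only.

```python
# Python 3 sketch; the thread itself uses Python 2.
raw = b"\xd7\xcd\xb2\xa9"   # the escaped bytes quoted in the question

# Assumption: the bytes are GB2312-encoded, like the page they came from.
text = raw.decode("gb2312")

print(repr(raw))   # the escaped form, similar to what the list printout showed
print(len(raw))    # 4 bytes
print(len(text))   # 2 characters once decoded
```

The data itself is intact: four bytes decode to two Chinese characters. Calling print(text) displays them on a console that is able to show Chinese.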

satoru

UID: 26720
Posts: 172
Points: 395
Time online: 1 day 16 hours

2^# satoru posted on 2008-06-30 14:28

>>> print ['我',u'我']
['\xce\xd2', u'\u6211']
>>> print repr('我')
'\xce\xd2'
>>> print repr(u'我')
u'\u6211'
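For an operand that isn't a string, the print statement first calls str() on it, and a list's str() in turn calls repr() on each of its elements. So wherever str(aListWithStrings) is used, you will run into this.
That is why you are seeing what you described.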
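The same mechanism can be shown without any Chinese text. The sketch below is Python 3 and uses a hypothetical City class (not from the thread) whose str() and repr() deliberately differ, so it is visible which one a list uses. The city names are placeholders.

```python
class City:
    """Hypothetical class, used only to tell str() and repr() apart."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __repr__(self):
        return "City(%r)" % self.name


cities = [City("Zibo"), City("Jinan")]

as_list = str(cities)                          # list: repr() of each element
as_items = ", ".join(str(c) for c in cities)   # str() of each element

print(as_list)    # [City('Zibo'), City('Jinan')]
print(as_items)   # Zibo, Jinan
```

Printing the list goes through repr() for every element, while printing the elements one at a time goes through str().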
To print them, you can use this instead:
for city in citylist:
    print city,
else:
    print
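For readers on Python 3, a rough equivalent of that loop might look like the sketch below. The helper name print_items and the sample strings are assumptions made for illustration. The trailing comma of the Python 2 print statement becomes end=" ", and the else branch of a for loop runs once the loop finishes without a break.

```python
import io


def print_items(items, out):
    """Write the items on one line, space-separated, then end the line."""
    for item in items:
        print(item, end=" ", file=out)   # Python 2: print item,
    else:
        print(file=out)                  # runs after the loop ends without break


# Hypothetical sample data, already decoded to text.
buf = io.StringIO()
print_items(["\u6211", "\u4f60"], buf)
line = buf.getvalue()
```

Unlike the Python 2 version, this leaves a space before the final newline. Writing print(*items) avoids that if it matters.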
By the way, your regex is wrong.

zzllabcd

UID: 25972
Posts: 1
Points: 2
Time online: 10 minutes

3^# zzllabcd posted on 2008-06-30 14:47

Yes, that's it, thanks!
Haha, that regex of mine isn't polished; it only needs to pull out the Chinese characters.
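If extracting the Chinese characters is the only goal, one possible approach is to decode the page first and then match on a Unicode range. The sketch below is Python 3. The option fragment is hypothetical, since the real page markup is not shown in the thread, and the range used covers only the CJK Unified Ideographs block.

```python
import re

# Hypothetical fragment; the real page's markup is not shown in the thread.
raw = b"<option value=012330>\xd7\xcd\xb2\xa9</option>"

# Assumption: the page is GB2312-encoded, as stated in the question.
html = raw.decode("gb2312")

# Runs of characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF).
cities = re.findall(r"[\u4e00-\u9fff]+", html)
codes = re.findall(r"\d{6}", html)

print(len(cities), codes)   # 1 ['012330']
```

Matching on decoded text means the pattern works on characters rather than on individual bytes of a multi-byte encoding.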