How should this be translated?

cclong

UID: 17886
Posts: 1
Points: 2
Time online: 10 minutes

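1^# cclong posted on 2007-12-05 20:28

How should this be translated?

We encourage discussion and teamwork, but sometimes I feel that posting an article like this and then saying that I can't translate certain parts isn't really a discussion; it feels more like asking a teacher for help with homework. I wonder whether anyone else feels the same way. We really do need more of this teamwork.
I'd appreciate any suggestions. The parts in bold below are the ones I don't know how to translate :-(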

Processing HTML with Hpricot
Translation: Using Hpricot to process HTML

In this world of Web2.0 mashups and easy API access, it is quite refreshing how easy it is to pull data from third party sites and re-mash it into something new. Unfortunately, not everyone has been bitten by this bug, so we as developers sometimes have to do a little more leg work to get the information we need. A common technique is called a screen scrape where your application acts like a browser and parses the HTML returned from the third party server.

Although this should be simple enough, anyone who has ever tried to do this knows the pain of dancing with regular expressions in an attempt to find the tags that you need. Luckily, us rubyists have the Hpricot library which takes the hard work out of parsing HTML. Hpricot allows developers to access html elements via CSS-selectors and X-Path, so you can target specific tags really easily. And because it is written in C, it is pretty fast too.

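Translation: In the Web 2.0 world, mashups and simple API access make it so easy to pull data from third-party sites and re-mash it into new projects. Unfortunately, not everyone has been harmed by its bug. We developers have to do a bit more work to get the information we need. This technique is usually called "screen scraping": your application serves as a browser and interprets the HTML returned from the third-party site's server.
Translation: Although this should be fairly simple, those who have tried to use regular expressions to grab specific tags from a web page will find that it is a lot of trouble. Fortunately, we Ruby programmers have a library that takes the trouble out of interpreting HTML: the Hpricot library. Hpricot lets developers access HTML elements via CSS selectors and XPath, so you can pinpoint the target tags very easily; it is also written in C, so it is quite fast.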
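
To make the "pain of dancing with regular expressions" concrete, here is a minimal sketch in plain Ruby. The HTML fragment, its ids, and the variable names are assumptions made up for illustration; the article's own snippet is not shown in this thread.

```ruby
# Hypothetical page fragment (not the article's original snippet).
html = <<~HTML
  <div id="content">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
    <div id="sub-content">
      <p>Nested paragraph.</p>
    </div>
  </div>
HTML

# A naive pattern: grab whatever sits between <p> and </p>.
naive = html.scan(%r{<p>(.*?)</p>}m).flatten

# The pattern has no idea which div a paragraph belongs to,
# so the nested paragraph is captured along with the other two.
puts naive.inspect

# It also silently misses any paragraph that carries an attribute.
with_attr = '<p class="intro">Hello</p>'.scan(%r{<p>(.*?)</p>}m).flatten
puts with_attr.inspect
```

The regex neither respects nesting nor tolerates small markup changes, which is the kind of trouble a real parser with selectors is meant to remove.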

Hpricot is a gem, so installation is as easy as:
Translation: Hpricot is a gem, so installation is simple.

Then just require the library at the top of the ruby file:
Translation: When you need to use this library, just add the following code at the top of the Ruby file:
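
The install command and the require line did not survive in this post. For a gem published under the name hpricot, the shell command would normally be `gem install hpricot`. What follows is only a sketch of a guarded require; the `hpricot_available` flag is a hypothetical name used for illustration and is not from the article.

```ruby
# Older Ruby versions needed RubyGems loaded explicitly;
# on current Ruby it is already loaded and this line does nothing.
require 'rubygems'

begin
  require 'hpricot'
  hpricot_available = true
rescue LoadError
  # The gem is missing: report it instead of crashing.
  hpricot_available = false
  warn "hpricot is not installed; try `gem install hpricot`"
end
```

In a script that cannot work without the library, a plain `require 'hpricot'` at the top of the file is enough; the rescue only makes the failure message friendlier.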

Let's take this HTML snippet:
Translation: Let's look at the following HTML snippet:

We can easily pull out the content of the paragraphs by doing this (Let’s assume the HTML is already stored in the variable @html)
Translation: We can easily grab the data from the paragraphs (assuming the HTML page data is already stored in the @html variable)
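
The article's HTML snippet and its Hpricot call are missing from this post. With Hpricot itself the step would presumably be something like `Hpricot(@html)` followed by a `search` call with a selector, but that is not tested here. As a stand-in, the sketch below shows the same idea (select only the direct child paragraphs of one div) using REXML, which ships with Ruby. The HTML, the ids, and the variable names are all assumptions.

```ruby
require 'rexml/document'

# Hypothetical, well-formed fragment; REXML is an XML parser,
# so unlike Hpricot it will not accept sloppy real-world HTML.
html = <<~HTML
  <div id="content">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
    <div id="sub-content">
      <p>Nested paragraph.</p>
    </div>
  </div>
HTML

doc = REXML::Document.new(html)

# "/p" selects only direct children of the content div,
# so the paragraph inside sub-content is left out.
paragraphs = REXML::XPath.match(doc, "//div[@id='content']/p").map(&:text)

puts paragraphs.inspect
```

The result is an array with two strings, matching the article's description of "an array with two elements" that skips the p tag in the sub-content div.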

Yep - that’s it. You now have an array with two elements that are the same as the copy in the two p tags. Notice that the p tag in the sub-content div isn’t pulled in?

It doesn’t end there though, you can also manipulate the HTML - which can come in handy if you wanted to, say, create a quick and dirty mobile version. Let’s say we wanted to remove the sub-content div from the mobile version, we could do this:
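Translation: Yes, that solves the problem above. Your array now has 2 elements, as if copied from the two p tags. Notice that the p tag in the sub-content layer wasn't grabbed?
Translation: Not only that, you can also manipulate the HTML, which comes in handy if you need it. For example, creating a quick and dirty mobile version. If we want to remove the sub-content layer from the mobile version, we can do this: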
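
The removal code is also missing from this post. The sketch below uses REXML from Ruby's standard distribution as a stand-in for Hpricot to show the general idea: find the sub-content div, detach it, and serialize what is left. The HTML and the name `mobile_html` are assumptions for illustration.

```ruby
require 'rexml/document'

# Hypothetical, well-formed fragment (not the article's snippet).
html = <<~HTML
  <div id="content">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
    <div id="sub-content">
      <p>Nested paragraph.</p>
    </div>
  </div>
HTML

doc = REXML::Document.new(html)

# Detach every div whose id is "sub-content" from its parent.
REXML::XPath.match(doc, "//div[@id='sub-content']").each(&:remove)

# Serialize the trimmed document for the mobile version.
mobile_html = doc.to_s
puts mobile_html
```

The two top-level paragraphs remain, while the sub-content div and everything inside it are gone from the output.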

This is just the tip of the iceberg - the library is really powerful and simple to use. Go and check out the official page for more (less trivial) examples.
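Disclaimer: You should make sure you have permission from the website owner before screen-scraping their site.
Translation: This is just the tip of the iceberg; the library is really powerful and easy to use. The official page has more (meaningful) examples.
Translation: Disclaimer: before scraping data from someone else's web pages, make sure you have permission from the person responsible.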

drive2me

UID: 29989
Posts: 30
Points: 69
Time online: 1 hour

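2^# drive2me posted on 2007-12-05 21:22

Good suggestion. If we communicate and encourage each other more, criticize less, and respect one another, everyone will stay motivated and not get discouraged, right?

Have I told you? You are actually improving quickly, and you have come a long way since you used to ask me questions. Know your own strengths and have confidence. Ha!

By the way, your free translation of the first bold English sentence, "Unfortunately, not everyone has been bitten by this bug," is spot on. See, you are making progress! Hehe!

Let's have everyone take a look first.

Thanks for your hard work, cclong. Keep it up.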

maninred

UID: 35369
Posts: 2
Points: 4
Time online: 10 minutes

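3^# maninred posted on 2007-12-06 10:01

Haha, everyone should really be a bit bolder about raising questions for discussion.

cclong, haven't you just raised this for everyone to discuss? That way it no longer feels like asking a teacher about homework, does it?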

drive2me

UID: 29989
Posts: 30
Points: 69
Time online: 1 hour

4^# drive2me posted on 2007-12-08 09:49

Is nobody going to join the discussion?

playing5460

UID: 698
Posts: 49
Points: 112
Time online: 3 hours

5^# playing5460 posted on 2007-12-08 22:17

In this world of Web2.0 mashups and easy API access, it is quite refreshing how easy it is to pull data from third party sites and re-mash it into something new. Unfortunately, not everyone has been bitten by this bug, so we as developers sometimes have to do a little more leg work to get the information we need. A common technique is called a screen scrape where your application acts like a browser and parses the HTML returned from the third party server.

Although this should be simple enough, anyone who has ever tried to do this knows the pain of dancing with regular expressions in an attempt to find the tags that you need. Luckily, us rubyists have the Hpricot library which takes the hard work out of parsing HTML. Hpricot allows developers to access html elements via CSS-selectors and X-Path, so you can target specific tags really easily. And because it is written in C, it is pretty fast too.

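Translation: In the Web 2.0 world, mashups and simple API access make it so easy to pull data from third-party sites and re-mash it into new projects. Unfortunately, not everyone has been harmed by its bug.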
----
Not everyone will
----

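Translation: We developers have to do a bit more work to get the information we need. This technique is usually called "screen scraping": your application serves as a browser and interprets the HTML returned from the third-party site's server.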
-------
Your program plays the role of a browser, accesses the third-party server, and converts the HTML it receives
-------

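Translation: Although this should be fairly simple, those who have tried to use regular expressions to grab specific tags from a web page will find that it is a lot of trouble.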

---
In principle this should be very simple, but only those who have tried to find tags with regular expressions understand how painful it is
----

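Translation: Fortunately, we Ruby programmers have a library that takes the trouble out of interpreting HTML: the Hpricot library. Hpricot lets developers access HTML elements via CSS selectors and XPath, so you can pinpoint the target tags very easily; it is also written in C, so it is quite fast.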