Writing a small Proxy yourself, Part I (1)
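Background: I often watch videos on bbsuubird, but its ads are far too long.
Add a pile of very long pinned topics on top, and I have to scroll for ages before I reach the topics I actually care about.
So I wanted to write a server-side program that post-processes the HTML pages it fetches.
The architecture: write a server in rails that proxies the client's requests and returns the responses to the client.
Where processing is needed, plug in some handlers, and only then return the page to the client.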
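The "fetch, run handlers, return" idea can be sketched without Rails at all. This is only a minimal illustration: the two handlers and the sample markup are hypothetical and do not come from the real site, and a regex is used purely to keep the example short.

```ruby
# A handler is anything that responds to call(html) and returns html.
# Both handlers below are hypothetical examples, not part of the original app.
STRIP_ADS      = lambda { |html| html.gsub(/<div class="ad">.*?<\/div>/m, "") }
STRIP_COMMENTS = lambda { |html| html.gsub(/<!--.*?-->/m, "") }

# Run the fetched page through each handler in order.
def apply_handlers(html, handlers)
  handlers.inject(html) { |page, handler| handler.call(page) }
end

page    = '<p>topic</p><div class="ad">buy now</div><!-- tracker -->'
cleaned = apply_handlers(page, [STRIP_ADS, STRIP_COMMENTS])
```

Keeping each handler as a small callable makes it easy to add or remove one processing step without touching the fetching code.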
OK, let's start. First install rails: gem install rails (I am using rails 2.2.2)
Next, generate an application with rails:
rails proxying
Create a controller: ruby script\generate controller proxying
Then, inside ProxyingController:
def index
  # ... our main processing logic goes here
end
Change the routing, since we do not need a separate action:
map.connect 'proxying/:id', :controller=>"proxying", :action=>"index"
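Because this route carries the whole target URL in the single :id segment, the URL has to be escaped so that it contains no "/" and, presumably because Rails 2.x routing also treats "." as a separator, no "." either. A minimal sketch of that escaping, assuming the proxy runs at the hypothetical address http://localhost:3000:

```ruby
require 'erb'
require 'uri'

# Hypothetical proxy prefix, matching the route 'proxying/:id'.
PROXY_PREFIX = "http://localhost:3000/proxying/"

# Escape a target URL so that it fits into a single path segment.
# url_encode leaves "." alone, so it is replaced by hand.
def proxied_url(target)
  PROXY_PREFIX + ERB::Util.url_encode(target).gsub(".", "%2E")
end

encoded = proxied_url("http://example.com/a.html")
# Decoding reverses the escaping, including the hand-made %2E.
decoded = URI.decode_www_form_component(encoded.sub(PROXY_PREFIX, ""))
```

Note the use of gsub rather than gsub!: the bang version returns nil when nothing was replaced, which would break the string concatenation for a URL without a dot.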
require 'open-uri'
require 'hpricot'

def index
  # the target URL arrives either as the :id path segment or as ?q=
  url = params[:id] || params[:q]
  file = open(url)
  doc = Hpricot(file.read)
end
With this, we get the HTML page back.
To process that HTML, the most natural first step
is to rewrite the links:
def rewrite_link_for_doc(doc)
  rewrite_link(doc, "//img[@src]", "src", false)
  #css link
  rewrite_link(doc, "//link[@href]", "href", false)
  rewrite_link(doc, "//script[@src]", "src", false)
  rewrite_link(doc, '//*[@background]', "background", false)
  #or background:url?
  rewrite_link(doc, '*[@style*=background]', "style.background-url", false)
  #replace every link with relative link to base_url
  rewrite_link(doc, '//a[@href]', "href", true)
  rewrite_link(doc, '//form[@action]', "action", true)
end
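The "style.background-url" entry targets inline styles such as background:url(...). As a standalone illustration of that substitution, here is a small sketch; the helper name rewrite_style_urls is hypothetical, and it uses a non-greedy pattern so that a later ")" in the same style string is not swallowed.

```ruby
# Rewrite every url(...) reference inside an inline style string.
# The block receives the old URL and returns the new one.
def rewrite_style_urls(style)
  style.gsub(/url\(\s*['"]?(.*?)['"]?\s*\)/) { "url(#{yield($1)})" }
end

style     = "color:red; background:url(img/bg.gif) no-repeat"
rewritten = rewrite_style_urls(style) { |u| "http://example.com/" + u }
```

Quoted forms such as url('img/bg.gif') are handled too; the quotes are dropped in the rewritten value.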
def rewrite_link(doc, selector, attribute, prefixing_proxy)
  doc.search(selector).to_a.each do |link|
    if attribute.index "."
      # "style.background-url" means: the url(...) inside the style attribute
      attr, attr2 = attribute.split(".")
      attr2.gsub!("-", ":")
      url = link.attributes[attr].scan(/#{attr2}\((.*)\)/)[0]
      #puts "wa:#{url.inspect},#{link}"
      next if url.nil?
      url = url[0]
    else
      url = link.attributes[attribute]
    end
    href = URI(url) rescue URI("#") #we met URI("###"),weird
    if !href.host
      #relative url
      doc_url = URI(@page.uri.to_s) #already URI::###
      if url[0] == ?/
        to_url = doc_url.scheme + "://" + doc_url.host + url #todo
      else
        to_url = doc_url.scheme + "://" + doc_url.host
        to_url += "/" if doc_url.path == ""
        str = "doc_url.path:#{doc_url.path},url:#{url}"
        if doc_url.path == ""
          to_url += url
        else
          # replace the last path segment with the relative url
          to_url += doc_url.path.sub(/\/[^\/]*$/, "/#{url}")
        end
        logger.info "#{str}, to_url:#{to_url}"
      end
    else
      # absolute url: keep it as is (using url also covers the style case,
      # where link.attributes[attribute] would be nil)
      to_url = url
    end
    if prefixing_proxy
      # gsub, not gsub!: gsub! returns nil when the url contains no "."
      to_url = ERB::Util.url_encode(to_url).gsub(".", "%2E")
    end
    if attribute.index "."
      attr, attr2 = attribute.split(".")
      attr2.gsub!("-", ":")
      if prefixing_proxy
        #puts "before link:#{link}"
        to_url = "http://localhost:3000/proxying/"+to_url
        link.set_attribute(attr, link.attributes[attr].gsub(/(#{attr2})\((.*)\)/, "\\1(#{to_url})"))
        #puts "after link:#{link}"
      else
        link.set_attribute(attr, link.attributes[attr].gsub(/(#{attr2})\((.*)\)/, "\\1(#{to_url})"))
      end
    else
      if prefixing_proxy
        link.set_attribute(attribute, "http://localhost:3000/proxying/"+to_url)
      else
        link.set_attribute(attribute, to_url)
      end
    end
  end
end
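The relative-URL branch above is assembled by hand from scheme, host and path. Ruby's standard library can do the same resolution with URI.join. A minimal sketch, where the helper name absolutize and the example.com URLs are made up for illustration:

```ruby
require 'uri'

# Resolve a link found in a page against the URL of that page.
# Absolute links come back unchanged; relative ones are merged with the base.
def absolutize(page_url, link)
  URI.join(page_url, link).to_s
end

base          = "http://example.com/forum/index.php"
sibling       = absolutize(base, "images/logo.gif")
root_relative = absolutize(base, "/css/site.css")
absolute      = absolutize(base, "http://other.example.org/x.js")
```

Unlike the hand-written version, this also keeps a non-default port from the page URL and resolves "../" segments.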
Finally, render the result back to the client.
Add this at the end of the index action:
  render :text=>doc.to_html,
    :content_type => file.content_type
Actually, content_type still needs a little extra handling here.
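The post does not say what that handling is. One plausible piece, offered only as an assumption, is to run the HTML rewriting solely for HTML responses and to pass images, CSS and scripts through untouched. A minimal sketch of such a check, with the hypothetical helper name html_content_type?:

```ruby
# Decide from a Content-Type value whether the body should go through
# the HTML rewriting step. Parameters such as "; charset=gbk" are ignored.
# This check is an assumption, not something the original code does.
def html_content_type?(content_type)
  content_type.to_s.split(";").first.to_s.strip.downcase == "text/html"
end

rewrite_page  = html_content_type?("text/html; charset=gbk")
rewrite_image = html_content_type?("image/gif")
```

In the index action this could guard the Hpricot parsing, with non-HTML bodies rendered as they were fetched.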