Google Desktop for Linux With Apache2 On LAN
i770880 posted on 2008-03-26 21:42
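Preface:
Two years ago I made my first attempt at combining Google Desktop with Apache for file search on a LAN. The original write-up is here: 《第一次原创:使用Google桌面搜索打造企业搜索服务器》 ("My first original article: building an enterprise search server with Google Desktop Search"), http://blog.chinaunix.net/u/13472/showart.php?id=73880

At that time Google Desktop for Linux did not exist yet, so my Samba file server had no integrated search service, and I waited for one eagerly. When the Linux version finally appeared, it turned out not to search Microsoft's proprietary document formats, and I was disappointed again for quite a while. At long last Google Desktop for Linux v1.1.1.0075 arrived with indexing support for DOC, XLS and PPT, so I set about putting it on my Samba server, to offer a simple search server alongside the Samba service.

Main text:
The principle is the same as in the earlier article: Apache acts as a proxy for Google Desktop. The earlier article said a port mapper was needed. Later searching showed that the real cause was a missing reverse-proxy setting: adding a ProxyPassReverse line after ProxyPass avoids the problem. So Google Desktop behind Apache no longer needs any client-side setup. A browser is enough, and a file browser can fill that role.

If this Apache instance has no other job, then, as in the earlier article, give the server a second IP address dedicated to the Google Desktop proxy. A simple configuration file follows: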
CODE:
NameVirtualHost 192.168.1.120:80
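<VirtualHost 192.168.1.120:80>
    ServerAdmin webmaster@localhost
    ServerName 192.168.1.120
    # Note: port 30043 is different for every Linux user. Record the address of
    # the Google Desktop start page on the desktop beforehand.
    ProxyPass / http://127.0.0.1:30043/
    ProxyPassReverse / http://127.0.0.1:30043/
    <Proxy http://127.0.0.1:30043>
        Allow from all
    </Proxy>
    <Directory />
        Options FollowSymLinks
        AllowOverride None
        Allow from all
    </Directory>
    <Location /redir>
        Deny from all
    </Location>
    <Location /openfolder>
        Deny from all
    </Location>
</VirtualHost>

Before restarting Apache, you also need to change the user Apache runs as to the user that runs Google Desktop. Google Desktop's index files are readable only by that one Linux user and by nobody else, so an Apache started as a different user cannot read Google Desktop's data and therefore cannot proxy it.

With all of that in place and Apache restarted, Google Desktop is reachable on the LAN at http://192.168.1.120/XXXXXXXX (the omitted part is the Google Desktop start address, which is different for every Linux desktop user).

Next, I tried to integrate this LAN-facing Google Desktop into the file server. Remembering that address suffix is hard, so it makes sense to store the homepage file on the file server. Once users reach the file through the file server, they can click it to open the search proxy server.

One thing to watch: if you simply save the homepage, the saved file contains relative addresses, so a homepage opened through the file server cannot run searches. I therefore made the following changes: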
CODE:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8"> <meta http-equiv="cache-control" content="no-cache"> <meta http-equiv="pragma" content="no-cache"> <meta http-equiv="expires" content="-1"> <title>Google 桌面</title> <style> body,p,td{font-family:arial,sans-serif;color:#000}body{background-color:#fff;margin:4px}img{border:0}table,td{border:0;margin:0;padding:0}.nowrap{white-space:nowrap}.none{display:none}.inline{display:inline}.float_left{float:left}.logo3{margin-top:9px;padding-bottom:10px}a:visited{color:#551a8b}a:link{color:#00c}a:active{color:#00c}a:hover{color:#00c} .q{color:#00c;padding:4px 0 4px 4px;margin:0;white-space:nowrap}.q a:visited{color:#00c}.q a:link{color:#00c}.q a:hover{color:#00c}.q a:active{color:#00c}span{margin:0px}div{border:0;margin:0;padding:0}div#basic{margin:7px}div#advanced{margin:7px}div#search_box{padding-top:30px;padding-bottom:30px}div#line{background-color:#39c;height:1px}div#bottomquery{background-color:#e8f4f7} div#querybuttons{padding-top:20px;padding-bottom:20px;text-align:center}div#bottom_links{text-align:center;font-size:small;padding-bottom:80px;white-space:nowrap}p#copyright{padding-top:3px;font-size:x-small;white-space:nowrap}div#home_bottom span#homelink{display:none}div#pref_bottom div#bottom_links{padding-bottom:10px}h1{color:#335cec;font-size:large;font-weight:bold} div.centerwarning{text-align:center}h4#fixmsg,h4#lowdisk{color:#f60} input#q { margin-bottom:1px } div#idxprogress { text-align: center; color: #f60; } h4#idxongoing { padding-top: 5px; padding-bottom: 5px; } </style> <script> <!-- function sf() { document.f.q.focus(); } function sw() { window.location = "http://www.google.com/search?sourceid=GGXD&rlz=1L1GGXD&hl=zh-CN&oe=UTF-8&sa=N&tab=xw&q=" + encodeURIComponent(document.f.q.value); } --> </script> </head> <body onLoad=sf()> <center> <br> <img src="image/hp-logo.gif?hl=zh_CN" width=276 height=110 alt="Google 桌面"> <br><br> <form name=f 
action="http://192.168.1.120/search" method=get> <input type="hidden" name="hl" value="zh_CN"> <input type="hidden" name="s" value="IKfIRNbuy8oqOJPMZBNzffceB6c"> <div class="q"> <style>TD.q {white-space: nowrap}</style><style>#lgpd{display:none}</style><script defer><!-- function qs(el){if(window.RegExp&&window.encodeURIComponent){var ue=el.href,qe=encodeURIComponent(document.f.q.value);if(ue.indexOf("q=")!=-1){el.href=ue.replace(new RegExp("q=[^&$]*"),"q="+qe);}else{el.href=ue+"&q="+qe;}}return 1;} //--> </script><table border=0 cellspacing=0 cellpadding=4><tr><td nowrap><font size=-1><a class=q href="http://www.google.com/webhp?source_id=GGXD&rlz=1L1GGXD&hl=zh-CN&oe=UTF-8&q=GOOOOG&tab=xw" onclick="return qs(this)">网页</a> <a class=q href="http://images.google.com/imghp?source_id=GGXD&rlz=1L1GGXD&hl=zh-CN&oe=UTF-8&q=GOOOOG&tab=xi" onclick="return qs(this)">图片</a> <a class=q href="http://groups.google.com/grphp?source_id=GGXD&rlz=1L1GGXD&hl=zh-CN&oe=UTF-8&q=GOOOOG&tab=xg" onclick="return qs(this)">论坛</a> <a class=q href="http://news.google.com/nwshp?source_id=GGXD&rlz=1L1GGXD&hl=zh-CN&oe=UTF-8&q=GOOOOG&tab=xn" onclick="return qs(this)">资讯</a> <a class=q href="http://ditu.google.com/maps?source_id=GGXD&rlz=1L1GGXD&hl=zh-CN&oe=UTF-8&q=GOOOOG&tab=xl" onclick="return qs(this)">地图</a> <b>桌面</b> <!--ENTERPRISE--><b><a href="http://www.google.com/intl/zh-CN/options/" class=q>更多 »</a></b></font></td></tr></table></div> <table cellspacing=0 cellpadding=0><tr><td width=25%> </td> <td align=center> <input id="q" maxlength=512 size=55 name=q value="" title="Google 桌面"><br> <input type=submit value="搜索桌面"> <input type=button value="搜索网络" onclick=sw()> </td> <td valign=top nowrap width=25%><font size=-2> <a href="http://192.168.1.120/options?hl=zh_CN&s=KfaPHSQRTXpTQAiCn0SSk8QuG2U">桌面使用偏好</a><br> <a href="http://192.168.1.120/adv?hl=zh_CN&s=RInjEvQpp_ec0xuHJfV2RMld_nM">高级搜索</a><br> </font></td> </tr></table> </form> <br> <div class="centerwarning"> <br> </div> <br> <div 
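id="home_bottom"> <div id="bottom_links"> <span id="homelink"> Google 桌面主页 - </span> <a href="http://192.168.1.120/status?hl=zh_CN&s=1lSl07UkINdDG1B8XKs5981ICpM">索引状态</a> - <a href="http://192.168.1.120/privacy?hl=zh_CN&s=7MTauqWsrjfyih2nmolpmXA1mfc">隐私权</a> - <a href="http://192.168.1.120/about?hl=zh_CN&s=9H5yl44biBXVBAjJUC77vy7RfPY">关于</a> <p id="copyright">©2007 Google</p> </div> </div> </center> </body> </html>

Every "http://192.168.1.120" string above was added by me. A homepage file opened this way can reach the search proxy server.

That met my basic requirement but still fell short of what I wanted. The reason my server runs Apache in the first place is to give web access to the Samba file server across network segments. The homepage file above can therefore also be reached through that original Apache, but it cannot provide search there (I have a limited number of IP addresses and cannot map every internal address to the outside). So the next step is to adapt the setup above to make it suitable for Internet use.

Obviously the proxy can no longer sit at the root, because the root has to serve as the file server's homepage. I proxied it to /googlesearch instead, so the proxy section became: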
CODE:
ProxyPass /googlesearch/ http://127.0.0.1:30043/
ProxyPassReverse /googlesearch/ http://127.0.0.1:30043/
<Proxy http://127.0.0.1:30043>
    Allow from all
</Proxy>
<Location /googlesearch/redir>
    Deny from all
</Location>
<Location /googlesearch/openfolder>
    Deny from all
</Location>

This proxy can open the homepage but cannot run a search at all. The reason is that the URLs Google Desktop uses when it starts a search all begin at the root, at /search. The URL therefore has to be rewritten, as follows:
CODE:
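RewriteEngine On
<Directory /Fileserver>
    # /Fileserver is the DocumentRoot directory
    Options Indexes FollowSymLinks MultiViews
    AllowOverride None
    Order allow,deny
    allow from all
    # Send root-level /search requests to the proxied path
    RedirectMatch ^/search /googlesearch/search
</Directory>

At last Google Desktop is integrated into Apache. The final step is to edit the homepage file and save it as index.html under /Fileserver/文件搜索/, so that Apache opens the homepage directly when that directory is visited.

Editing the homepage file is simple: replace every http://192.168.1.120 above with "/googlesearch".

Closing notes:
The remaining problem is opening the files that a search returns. The configuration above simply blocks those requests. Getting an effect like DNKA's requires output rewriting, so I will just paste the mod_sar documentation here.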
QUOTE:
NAME

mod_sar - apache2 module which works as an output filter. Its purpose is to Search And Replace strings found in web content before it is sent to the client.

COMPILE

mod_sar can be compiled with apxs(8) or manually by hand.

1. Using apxs for compilation:

    apxs -c mod_sar.c

If everything goes fine, you will find mod_sar.so under .libs in your current directory.

2. Compiling mod_sar manually:

    gcc -pthread -I/usr/include/httpd -c mod_sar.c
    gcc -shared mod_sar.o -Wl,-soname -Wl,mod_sar.so -o mod_sar.so

If needed, modify the path to your httpd include directory. If everything goes fine, you will find mod_sar.so in your current directory.

INSTALL

mod_sar can be installed with apxs(8) or manually by hand.

1. Using apxs for installation:

This command will compile and install your mod_sar module.

    apxs -i -a -c mod_sar.c

Restart apache by first stopping it and then starting it:

    apachectl stop
    apachectl start

2. Installing mod_sar manually:

    cp mod_sar.so /usr/lib/httpd/modules
    chown root: /usr/lib/httpd/modules/mod_sar.so
    chmod 755 /usr/lib/httpd/modules/mod_sar.so

If needed, modify the path to your httpd modules directory. Now you have to modify your httpd.conf file. Find the bunch of LoadModule directives and append your own line under them:

    LoadModule sar_module modules/mod_sar.so

Restart apache by first stopping it and then starting it:

    apachectl stop
    apachectl start

DESCRIPTION

mod_sar ("sar" stands for Search And Replace) is an apache2 module which works as an output filter. Its purpose is to search and replace strings found in web content before it is sent to the client. The search can be case sensitive or case insensitive, depending on configuration.

A perfect example of common usage of this module is a reverse proxy. A reverse proxy is a proxy in front of the local server, which can be accessed from the Internet only through that proxy. In some cases such a configuration can be used effectively to prevent worms and other unwanted guests, but most commonly it just presents a false layer of security for those who do not understand server - client communication. Whatever reason you have, for a usable reverse proxy you will have to solve two problems: modification of headers and modification of content before it is sent to the client.

1. Header modification

Header modification is not a problem at all. It can be achieved in two ways. You can use mod_proxy_http:

    <IfModule mod_proxy.c>
        <Proxy *>
            Order deny,allow
            Allow from all
        </Proxy>
        ProxyRequests On
        ProxyPass / http://some-domain.local/
        ProxyPassReverse / http://some-domain.local/
        ProxyErrorOverride On
    </IfModule>

Or, you can use mod_rewrite:

    <IfModule mod_rewrite.c>
        RewriteEngine on
        RewriteRule ^/(.*) http://some-domain.local/$1 [P]
        RewriteOptions inherit
    </IfModule>

2. Content modification

Header modification will make all relative links look like they are coming from the external domain some-domain.com instead of the real, local domain some-domain.local. But if the server behind the reverse proxy serves pages with absolute links, we will have to modify the content of those pages on the fly, using the apache2 output filter mechanism. There are three choices: mod_proxy_html, mod_ext_filter and mod_sar.

The first uses libxml2 and because of that it is not good for a purpose such as a reverse proxy. For example, libxml2 will seriously corrupt HTML code in case of minor errors in the HTML, such as a missing quote. mod_proxy_html inherits that nasty habit from libxml2, but if you want to try it on your own, you can find that module at http://apache.webthing.com/mod_proxy_html/

The second one is not a third-party module. It comes with apache2 and it can suit the needs of a reverse proxy, but it is not good for heavily loaded sites because an external command is executed for every request. Here is an example of mod_ext_filter usage:

    <IfModule mod_ext_filter.c>
        ExtFilterDefine fixtext mode=output intype=text/html \
            cmd="/bin/sed s/some-domain\.local/some-domain\.com/g"
        <Location />
            SetOutputFilter fixtext
        </Location>
    </IfModule>

And the third one is the one you are just looking at: mod_sar. See the DIRECTIVES and EXAMPLES sections for usage information. mod_sar will do one simple thing. It will replace one string with another, depending on configuration. It can perform a case insensitive search if needed. It has been tested under heavy load without performance impact.

DIRECTIVES

SarStrings <search_string> <replace_string>
This directive requires two parameters, the search string and the replace string, enclosed in double quotes. It can be used in server config and virtual host context.

SarCaseInsensitive <On|Off>
If set to On, a case insensitive search will be performed instead of an exact string match. Default is Off. It can be used in server config and virtual host context.

SarVerbose <On|Off>
If set to On, every time mod_sar is used as a filter, a message is printed into the apache error logs. Default is Off. It can be used in server config and virtual host context.

EXAMPLES

    <IfModule mod_sar.c>
        AddOutputFilterByType sar_filter text/html
        SarStrings "http://some-domain.local" "http://some-domain.com"
        SarCaseInsensitive Off
        SarVerbose Off
    </IfModule>

REQUIREMENTS

Apache-2.0.

COMPATIBILITY

It has been tested on Linux but there is no obvious reason why it wouldn't work on other unix platforms supported by apache2.

    OS: Linux
    compiler: gcc-2.9x, gcc-3.x
    apache: apache-2.0.x

BUGS

Current version of mod_sar does not contain known bugs.

SEE ALSO

apxs(8), http://www.apache.org/

AUTHOR

Josip Deanovic <djosip@linuxpages.org>

The output URL rules of the new Google Desktop version are fairly complex, which makes rewriting hard. On top of that, the Linux file system has so many permission restrictions that Apache is not allowed into many directories. So I could not be bothered to keep tinkering. After all, the search output already shows the full location of each file, and tracking it down through the file server is easy enough.

Finally, I hope an expert who reads this post can help write the mod_sar output rules and help me finish this Google Desktop For Linux with Apache2 On LAN. Thanks.
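For anyone repeating the homepage edit from the main text (replacing every http://192.168.1.120 with "/googlesearch"), here is a minimal Python sketch of that one step. It is not part of my original setup. The function names and the file paths in the usage note are made up for illustration, and it assumes the saved page is UTF-8, as its meta tag declares.

```python
from pathlib import Path

# Prefix that was added by hand to the saved homepage (from the post).
OLD_PREFIX = "http://192.168.1.120"
# Path that Apache proxies to Google Desktop (from the ProxyPass lines).
NEW_PREFIX = "/googlesearch"


def rewrite_homepage(html: str, old: str = OLD_PREFIX, new: str = NEW_PREFIX) -> str:
    """Return html with every occurrence of the old prefix replaced by the new one."""
    return html.replace(old, new)


def rewrite_file(src: Path, dst: Path) -> int:
    """Rewrite src into dst and return how many prefixes were replaced.

    Both paths are hypothetical; point them at your own saved homepage
    and at the index.html that Apache should serve.
    """
    text = src.read_text(encoding="utf-8")  # assumes the page is UTF-8
    count = text.count(OLD_PREFIX)
    dst.write_text(rewrite_homepage(text), encoding="utf-8")
    return count


if __name__ == "__main__":
    sample = '<form name=f action="http://192.168.1.120/search" method=get>'
    print(rewrite_homepage(sample))
```

Calling rewrite_file(Path("saved_home.html"), Path("index.html")) would write the rewritten page, where both file names are placeholders. Only the hand-added absolute prefix is touched. Relative references such as the logo image are left as they are.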