urldecode for CGI programs written in shell

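When writing a CGI program in shell, the string taken from QUERY_STRING, whether it came from a URL or from a submitted form, is already urlencoded, so it has to be urldecoded before it can be processed.

PHP has urlencode and urldecode for encoding and decoding strings, but the shell has no equivalent.

I happened to need this today, so I searched and found a script that does urldecode with awk, see http://www.chinaunix.net/bbsjh/11/617.html . Copied as-is it would not run; after a few adjustments it works. The original also had the conversion between space " " and "+" backwards.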

urldecode.awk
[code:1]
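#!/usr/bin/awk -f

BEGIN {
       hextab = "0123456789ABCDEF"
       # chr[i] is the single character whose byte value is i
       for ( i=1; i<=255; ++i ) chr[i] = sprintf("%c", i)
}
{
       decoded = ""
       n = length($0)
       for ( i=1; i<=n; ++i ) {
           c = substr($0, i, 1)
           if ( c == "+" ) {
               decoded = decoded " "           # "+" encodes a space
           } else if ( c == "%" && i+2 <= n ) {
               # toupper() lets lowercase hex digits such as %2b decode too
               hi  = index(hextab, toupper(substr($0, i+1, 1)))
               low = index(hextab, toupper(substr($0, i+2, 1)))
               if ( hi > 0 && low > 0 ) {
                   decoded = decoded chr[(hi-1)*16 + low-1]
                   i += 2                      # skip the two hex digits
               } else {
                   decoded = decoded c         # malformed escape: keep the "%"
               }
           } else {
               decoded = decoded c             # any other character passes through
           }
       }
       print decoded                           # one decoded line per input line
}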
[/code:1]

Note: if awk lives at a different path on your system, adjust the first line of the script accordingly.

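Example (the second input is GBK-encoded, so its output reads correctly on a GBK terminal; under gawk in a UTF-8 locale you may need LC_ALL=C to get the raw bytes):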
[code:1]
$> echo "abc%2Babc+abc" | urldecode.awk
abc+abc abc
$> echo "%D6%D0%B9%FA%D6%D0%B9%FA" | urldecode.awk
中国中国
[/code:1]
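
As a rough sketch of how the decoder might be used inside a CGI script: split QUERY_STRING on "&", split each pair on the first "=", then decode both halves. The function names urldecode and parse_query are hypothetical, and the decoder is wrapped in a shell function around awk only so that the snippet runs on its own.

```shell
#!/bin/sh
# Sketch only: urldecode and parse_query are illustrative names.

# Decode one urlencoded string passed as $1 (same logic as urldecode.awk).
urldecode() {
    printf '%s\n' "$1" | awk '
    BEGIN {
        hextab = "0123456789ABCDEF"
        for (i = 1; i <= 255; ++i) chr[i] = sprintf("%c", i)
    }
    {
        out = ""; n = length($0)
        for (i = 1; i <= n; ++i) {
            c = substr($0, i, 1)
            if (c == "+") {
                out = out " "
            } else if (c == "%" && i + 2 <= n) {
                hi  = index(hextab, toupper(substr($0, i + 1, 1)))
                low = index(hextab, toupper(substr($0, i + 2, 1)))
                if (hi > 0 && low > 0) {
                    out = out chr[(hi - 1) * 16 + low - 1]
                    i += 2
                } else {
                    out = out c
                }
            } else {
                out = out c
            }
        }
        print out
    }'
}

# Print every key=value pair of the query string $1, decoded, one per line.
parse_query() {
    old_ifs=$IFS
    IFS='&'          # split the query string on "&"
    set -f           # no filename expansion while splitting
    for pair in $1; do
        key=${pair%%=*}
        case $pair in
            *=*) val=${pair#*=} ;;   # everything after the first "="
            *)   val= ;;             # key without a value
        esac
        printf '%s=%s\n' "$(urldecode "$key")" "$(urldecode "$val")"
    done
    set +f
    IFS=$old_ifs
}

# In a CGI script this would be called as: parse_query "$QUERY_STRING"
```

Splitting before decoding matters: a decoded value may itself contain "&" or "=", which would break the split if the whole string were decoded first.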

I'll post some more material on writing CGI in shell shortly. Stay tuned.
http://hoohoo.ncsa.uiuc.edu/cgi/interface.html

The CGI Specification
This is the specification for CGI version 1.1, or CGI/1.1. Further revisions of this protocol are guaranteed to be backward compatible.

The server and the CGI script communicate in four major ways:

    * Environment variables
    * The command line
    * Standard input
    * Standard output
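
A minimal sketch of a shell CGI responder that touches three of these channels (environment variables, the command line, and standard output); standard input is left out here. The helper name cgi_reply is hypothetical. The Content-type header followed by a blank line is what CGI/1.1 expects at the start of the output, although that detail is not spelled out in the text quoted here.

```shell
#!/bin/sh
# Sketch only: cgi_reply is a hypothetical helper name.

# Write a complete CGI response to standard output: a Content-type
# header, a blank line that ends the headers, then a plain-text body
# built from the environment and the command-line arguments.
cgi_reply() {
    printf 'Content-type: text/plain\n\n'
    printf 'method=%s\n' "${REQUEST_METHOD:-unset}"   # environment variable
    printf 'query=%s\n'  "${QUERY_STRING:-}"          # environment variable
    printf 'args=%s\n'   "$*"                         # command line
}

# In a real script the last line would be: cgi_reply "$@"
```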
<H1>CGI Environment Variables</H1>
<HR>

<P>

In order to pass data about the information request from the server to
the script, the server uses command line arguments as well as
environment variables. These environment variables are set when the
server executes the gateway program. <P>

<HR>
<H2>Specification</H2>

<P>

The following environment variables are not request-specific and are
set for all requests: <P>

<UL>
<LI> <CODE>SERVER_SOFTWARE</CODE> <P>

    The name and version of the information server software answering
    the request (and running the gateway). Format: name/version <P>

<LI> <CODE>SERVER_NAME</CODE> <P>
    The server's hostname, DNS alias, or IP address as it would appear
    in self-referencing URLs. <P>

<LI> <CODE>GATEWAY_INTERFACE</CODE> <P>
    The revision of the CGI specification to which this server
    complies. Format: CGI/revision<P>

</UL>

<HR>

The following environment variables are specific to the request being
fulfilled by the gateway program: <P>

<UL>
<LI> <A NAME="protocol"><CODE>SERVER_PROTOCOL</CODE></A> <P>

    The name and revision of the information protocol this request came
    in with. Format: protocol/revision <P>

<LI> <CODE>SERVER_PORT</CODE>  <P>
    The port number to which the request was sent. <P>

<LI> <CODE>REQUEST_METHOD</CODE> <P>
    The method with which the request was made. For HTTP, this is
    "GET", "HEAD", "POST", etc. <P>

<LI> <CODE>PATH_INFO</CODE> <P>
    The extra path information, as given by the client. In other
    words, scripts can be accessed by their virtual pathname, followed
    by extra information at the end of this path. The extra
    information is sent as PATH_INFO. This information should be
    decoded by the server if it comes from a URL before it is passed
    to the CGI script.<P>

<LI> <CODE>PATH_TRANSLATED</CODE> <P>
    The server provides a translated version of PATH_INFO, which takes
    the path and does any virtual-to-physical mapping to it. <P>

<LI> <CODE>SCRIPT_NAME</CODE> <P>

    A virtual path to the script being executed, used for
    self-referencing URLs. <P>

<LI> <A NAME="query"><CODE>QUERY_STRING</CODE></A> <P>
    The information which follows the ? in the <A
    HREF="http://www.ncsa.uiuc.edu/demoweb/url-primer.html">URL</A>
    which referenced this script. This is the query information. It
    should not be decoded in any fashion. This variable should always
    be set when there is query information, regardless of <A
    HREF="cl.html">command line decoding</A>. <P>

<LI> <CODE>REMOTE_HOST</CODE> <P>
    The hostname making the request. If the server does not have this
    information, it should set REMOTE_ADDR and leave this unset.<P>

<LI> <CODE>REMOTE_ADDR</CODE> <P>
    The IP address of the remote host making the request. <P>

<LI> <CODE>AUTH_TYPE</CODE> <P>

    If the server supports user authentication, and the script is
    protected, this is the protocol-specific authentication method used
    to validate the user. <P>

<LI> <CODE>REMOTE_USER</CODE> <P>
    If the server supports user authentication, and the script is
    protected, this is the username they have authenticated as. <P>
<LI> <CODE>REMOTE_IDENT</CODE> <P>
    If the HTTP server supports RFC 931 identification, then this
    variable will be set to the remote user name retrieved from the
    server. Usage of this variable should be limited to logging only.
    <P>

<LI> <A NAME="ct"><CODE>CONTENT_TYPE</CODE></A> <P>
    For queries which have attached information, such as HTTP POST and
    PUT, this is the content type of the data. <P>

<LI> <A NAME="cl"><CODE>CONTENT_LENGTH</CODE></A> <P>
    The length of the said content as given by the client. <P>

</UL>
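
A sketch of how a shell script might pick up the request data using REQUEST_METHOD, CONTENT_LENGTH and QUERY_STRING. The helper name read_request_data is hypothetical. It assumes that for POST and PUT the data arrives on standard input and that exactly CONTENT_LENGTH bytes should be read, rather than waiting for end-of-file.

```shell
#!/bin/sh
# Sketch only: read_request_data is a hypothetical helper name.

# Print the still-encoded request data:
#   POST/PUT -> CONTENT_LENGTH bytes from standard input
#   others   -> the value of QUERY_STRING
read_request_data() {
    case ${REQUEST_METHOD:-GET} in
        POST|PUT)
            # Read exactly CONTENT_LENGTH bytes, one byte per block.
            dd bs=1 count="${CONTENT_LENGTH:-0}" 2>/dev/null
            ;;
        *)
            printf '%s' "${QUERY_STRING:-}"
            ;;
    esac
}
```

The result is still urlencoded in both cases, so it needs the same decoding step as QUERY_STRING.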


<a name="headers"><hr></a>

In addition to these, the header lines received from the client, if
any, are placed into the environment with the prefix HTTP_ followed by
the header name. Any - characters in the header name are changed to _
characters. The server may exclude any headers which it has already
processed, such as Authorization, Content-type, and Content-length. If
necessary, the server may choose to exclude any or all of these
headers if including them would exceed any system environment
limits. <p>

An example of this is the HTTP_ACCEPT variable which was defined in
CGI/1.0. Another example is the header User-Agent.<p>

<ul>
<LI> <CODE>HTTP_ACCEPT</CODE> <P>

    The MIME types which the client will accept, as given by HTTP
    headers. Other protocols may need to get this information from
    elsewhere. Each item in this list should be separated by commas as
    per the HTTP spec. <P>

    Format: type/subtype, type/subtype <P>


<li> <code>HTTP_USER_AGENT</code><p>

    The browser the client is using to send the request. General
format: <code>software/version library/version</code>.<p>

</ul>
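
The naming rule for header variables can be sketched as a small shell function. The name header_to_env is hypothetical; the uppercasing is inferred from the examples HTTP_ACCEPT and HTTP_USER_AGENT rather than stated in the rule itself.

```shell
#!/bin/sh
# Sketch only: header_to_env is a hypothetical helper name.

# Map a client header name to its CGI environment variable name:
# prefix with HTTP_, change "-" to "_", and uppercase the letters.
header_to_env() {
    printf 'HTTP_%s\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# A script can then read the value directly, e.g. "$HTTP_USER_AGENT".
```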

<HR>
<H2>Examples</H2>

Examples of the setting of environment variables are really much better
<A HREF="examples.html">demonstrated</A> than explained. <P>

<HR>

CGI - Common Gateway Interface
<ADDRESS><A HREF="mailtocgi.html">cgi@ncsa.uiuc.edu</A></ADDRESS>