ISAPI_rewrite中文手册附多站点配置方法第3/4页

更新时间：2007年07月19日 00:00:00 作者：

在NT 2000 XP和2003平台上，在系统帐户下应该INETINFO程序应该与IIS5以共存模式过滤器运行。所以系统帐户应该给予对所有的ISAPI-REWIRITE DLLS 和所有的HTTPD

alnum Any alpha numeric character.
alpha Any alphabetical character a-z and A-Z. Other characters may also be included depending upon the locale.
blank Any blank character, either a space or a tab.
cntrl Any control character.
digit Any digit 0-9.
graph Any graphical character.
lower Any lower case character a-z. Other characters may also be included depending upon the locale.
print Any printable character.
punct Any punctuation character.
space Any whitespace character.
upper Any upper case character A-Z. Other characters may also be included depending upon the locale.
xdigit Any hexadecimal digit character, 0-9, a-f and A-F.
word Any word character - all alphanumeric characters plus the underscore.
unicode Any character whose code is greater than 255, this applies to the wide character traits classes only.

There are some shortcuts that can be used in place of the character classes:

\w in place of [:word:]
\s in place of [:space:]
\d in place of [:digit:]
\l in place of [:lower:]
\u in place of [:upper:]
Collating elements
Collating elements take the general form [.tagname.] inside a set declaration, where tagname is either a single character, or a name of a collating element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is equivalent to [,]. ISAPI_Rewrite supports all the standard POSIX collating element names, and in addition the following digraphs: "ae", "ch", "ll", "ss", "nj", "dz", "lj", each in lower, upper and title case variations. Multi-character collating elements can result in the set matching more than one character, for example [[.ae.]] would match two characters, but note that [^[.ae.]] would only match one character.

Equivalence classes
Equivalenceclassestakethegeneralform[=tagname=] inside a set declaration, where tagname is either a single character, or a name of a collating element, and matches any character that is a member of the same primary equivalence class as the collating element [.tagname.]. An equivalence class is a set of characters that collate the same, a primary equivalence class is a set of characters whose primary sort key are all the same (for example strings are typically collated by character, then by accent, and then by case; the primary sort key then relates to the character, the secondary to the accentation, and the tertiary to the case). If there is no equivalence class corresponding to tagname, then [=tagname=] is exactly the same as [.tagname.].

To include a literal "-" in a set declaration then: make it the first character after the opening "[" or "[^", the endpoint of a range, a collating element, or precede it with an escape character as in "[\-]". To include a literal "[" or "]" or "^" in a set then make them the endpoint of a range, a collating element, or precede with an escape character.

Line anchors
An anchor is something that matches the null string at the start or end of a line: "^" matches the null string at the start of a line, "$" matches the null string at the end of a line.

Back references
A back reference is a reference to a previous sub-expression that has already been matched, the reference is to what the sub-expression matched, not to the expression itself. A back reference consists of the escape character "\" followed by a digit "1" to "9", "\1" refers to the first sub-expression, "\2" to the second etc. For example the expression "(.*)\1" matches any string that is repeated about its mid-point for example "abcabc" or "xyzxyz". A back reference to a sub-expression that did not participate in any match, matches the null string. In ISAPI_Rewrite all back references are global for entire RewriteRule and corresponding RewriteCond directives. Sub matches are numbered up to down and left to right beginning from the first RewriteCond directive of the corresponding RewriteRule directive, if there is one.

Forward Lookahead Asserts
There are two forms of these; one for positive forward lookahead asserts, and one for negative lookahead asserts:

"(?=abc)" matches zero characters only if they are followed by the expression "abc".
"(?!abc)" matches zero characters only if they are not followed by the expression "abc".

Word operators
The following operators are provided for compatibility with the GNU regular expression library.

"\w" matches any single character that is a member of the "word" character class, this is identical to the expression "[[:word:]]".
"\W" matches any single character that is not a member of the "word" character class, this is identical to the expression "[^[:word:]]".
"\<" matches the null string at the start of a word.
"\>" matches the null string at the end of the word.
"\b" matches the null string at either the start or the end of a word.
"\B" matches a null string within a word.
Escape operator
The escape character "\" has several meanings.

The escape operator may introduce an operator for example: back references, or a word operator.
The escape operator may make the following character normal, for example "\*" represents a literal "*" rather than the repeat operator.
Single character escape sequences:
The following escape sequences are aliases for single characters:

Escape sequence Character code Meaning
\a 0x07 Bell character.
\t 0x09 Tab character.
\v 0x0B Vertical tab.
\e 0x1B ASCII Escape character.
\0dd 0dd An octal character code, where dd is one or more octal digits.
\xXX 0xXX A hexadecimal character code, where XX is one or more hexadecimal digits.
\x{XX} 0xXX A hexadecimal character code, where XX is one or more hexadecimal digits, optionally a unicode character.
\cZ z-@ An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for '@'.

Miscellaneous escape sequences:
The following are provided mostly for perl compatibility, but note that there are some differences in the meanings of \l \L \u and \U:

Escape sequence Meaning
\w Equivalent to [[:word:]].
\W Equivalent to [^[:word:]].
\s Equivalent to [[:space:]].
\S Equivalent to [^[:space:]].
\d Equivalent to [[:digit:]].
\D Equivalent to [^[:digit:]].
\l Equivalent to [[:lower:]].
\L Equivalent to [^[:lower:]].
\u Equivalent to [[:upper:]].
\U Equivalent to [^[:upper:]].
\C Any single character, equivalent to '.'.
\X Match any Unicode combining character sequence, for example "a\x 0301" (a letter a with an acute).
\Q The begin quote operator, everything that follows is treated as a literal character until a \E end quote operator is found.
\E The end quote operator, terminates a sequence begun with \Q.
What gets matched?
The regular expression will match the first possible matching string, if more than one string starting at a given location can match then it matches the longest possible string. In cases where their are multiple possible matches all starting at the same location, and all of the same length, then the match chosen is the one with the longest first sub-expression, if that is the same for two or more matches, then the second sub-expression will be examined and so on. Note that ISAPI_Rewrite uses MATCH algorithm. The result is matched only if the expression matches the whole input sequence. For example:

RewriteCond URL ^/somedir/.* #will match any request to somedir directory and subdirectories, while
RewriteCond URL ^/somedir/ #will match only request to the root of the somedir.
Special note about "pathological" regular expressions
ISAPI_Rewrite uses a very powerful regular expressions engine Regex++ written by Dr. John Maddock. But as any real thing it's not ideal: There exists some "pathological" expressions which may require exponential time for matching; these all involve nested repetition operators, for example attempting to match the expression "(a*a)*b" against N letter a's requires time proportional to 2N. These expressions can (almost) always be rewritten in such a way as to avoid the problem, for example "(a*a)*b" could be rewritten as "a*b" which requires only time linearly proportional to N to solve. In the general case, non-nested repeat expressions require time proportional to N2, however if the clauses are mutually exclusive then they can be matched in linear time - this is the case with "a*b", for each character the matcher will either match an "a" or a "b" or fail, where as with "a*a" the matcher can't tell which branch to take (the first "a" or the second) and so has to try both.

Boost 1.29.0 Regex++ could detect "pathological" regular expressions and terminate theirs matching. When a rule fails ISAPI_Rewrite sends "500 Internal Server error - Rule Failed" status to a client to indicate configuration error. Also the failed rule is disabled to prevent performance losses
Format string syntax
In format strings, all characters are treated as literals except: "(", ")", "$", "\", "?", ":".

To use any of these as literals you must prefix them with the escape character
The following special sequences are recognized:

Grouping:
Use the parenthesis characters ( and ) to group sub-expressions within the format string, use $ and $ to represent literal '(' and ')'.

Sub-expression expansions:
The following perl like expressions expand to a particular matched sub-expression:

$` Expands to all the text from the end of the previous match to the start of the current match, if there was no previous match in the current operation, then everything from the start of the input string to the start of the match.
$' Expands to all the text from the end of the match to the end of the input string.
$& Expands to all of the current match.
$0 Expands to all of the current match.
$N Expands to the text that matched sub-expression N.

Conditional expressions:
Conditional expressions allow two different format strings to be selected dependent upon whether a sub-expression participated in the match or not:

?Ntrue_expression:false_expression

Executes true_expression if sub-expression N participated in the match, otherwise executes false_expression.

Example: suppose we search for "(while)|(for)" then the format string "?1WHILE:FOR" would output what matched, but in upper case.

Escape sequences:
The following escape sequences are also allowed:

\a The bell character.
\f The form feed character.
\n The newline character.
\r The carriage return character.
\t The tab character.
\v A vertical tab character.
\x A hexadecimal character - for example \x0D.
\x{} A possible unicode hexadecimal character - for example \x{1A0}
\cx The ASCII escape character x, for example \c@ is equivalent to escape-@.
\e The ASCII escape character.
\dd An octal character constant, for example \10

Examples例子

Emulating host-header-based virtual sites on a single site
例如你在两个域名注册www.site1.com 和 www.site2.com，现在你可以创建两个不同的站点而使用单一的物理站点。把以下规则加入到你的httpd.ini 文件

Code:
[ISAPI_Rewrite]

#Fix missing slash char on folders
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,R]

#Emulate site1
RewriteCond Host: (?:www\.)?site1\.com
RewriteRule (.*) /site1$1 [I,L]

#Emulate site2
RewriteCond Host: (?:www\.)?site2\.com
RewriteRule (.*) /site2$1 [I,L]

现在你可以把你的站点放在/site1 和 /site2 目录中.

或者你可以应用更多的类规则：

Code:
[ISAPI_Rewrite]

#Fix missing slash char on folders
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,R]

RewriteCond Host: (www\.)?(.+)
RewriteRule (.*) /$2$3

为站点应该命名目录为 /somesite1.com, /somesite2.info, etc.
Using loops (Next flag) to convert request parameters
假如你希望有物理URL如 http://www.myhost.com/foo.asp?a=A&b=B&c=C 使用请求如 http://www.myhost.com/foo.asp/a/A/b/B/c/C 参数数量可以从两个请求之间变化

至少有两个解决办法。你可以简单的为每一可能的参数数量添加一个分隔规则或者你可以使用一个技术说明如下面的例子

Code:
ISAPI_Rewrite]
RewriteRule (.*?\.asp)(\?[^/]*)?/([^/]*)/([^/]*)(.*) $1(?2$2&:\?)$3=$4$5 [NS,I]

这个规则将从请求的URL中抽取一个参数追加在请求字符的末尾并且从头重启规则进程。所以它将循环直到所有参数被移动到适当的位置，或者直到超过RepeatLimit
也存在许多这个规则的变种。但使用不同的分隔字符，例如。使用URLS如http://www.myhost.com/foo.asp~a~A~b~B~c~C 可以应中下面的规则：