Bug #911
closedNeed for URL encoding in mod_redirect and possibly mod_rewrite
Description
As reported earlier this year on the mailing list, opening this ticket now for tracking purposes. I'll try to provide a patch later.
The issue: when $n and %n references are used within mod_redirect, the resulting value should be properly encoded to be used in a (Location) header. I believe this is not being done by mod_redirect (and possibly mod_rewrite) currently.
Example:
$HTTP["host"] =~ ""^www\.(.*)" { $HTTP["referer"] =~ "(.*)" { url.redirect = ( "^/(.*)" => "http://www2.mydomain.com/myscript.php?path=$1&ref=%1" ) } }
Testing the example:
~ $ curl -e 'http://myreferer/' -I 'http://www.domain.com/' HTTP/1.1 301 Moved Permanently Location: http://www2.mydomain.com/myscript.php?path=domain.com/&ref=http://myreferer/ Content-Length: 0 Date: Tue, 21 Mar 2006 10:08:46 GMT Server: lighttpd/1.4.11
As you can see, the 'Location' URL is not properly encoded. It should look something like:
.../myscript.php?path=domain.com%2F&ref=http%3A%2F%2Fmyreferer%2F
So I think the decoding pass is made by Lighttpd, but mod_redirect should make sure to make the encoding pass when placing parameters in a URL.
Files
Updated by spillgroup almost 17 years ago
Added patch allows $n and %n references to be actively urlencoded by using the # character.
Example:
$HTTP["referer"] =~ "(.+)" { url.redirect = ( "(.*)" => "http://www.mydomain.com/myscript.php?&ref=%#1" ) }
So $n and %n are original values, while $#n and %#n are urlencoded values.
This affects both mod_redirect and mod_rewrite.
Updated by gstrauss over 8 years ago
- Description updated (diff)
- Assignee deleted (
jan)
The suggestion is sound and the issue is real. However, the patch needs further discussion.
Apache mod_rewrite (http://httpd.apache.org/docs/current/mod/mod_rewrite.html) provides an extensive number of flags, including:
B Escape non-alphanumeric characters in backreferences before applying the transformation. backrefnoplus|BNP If backreferences are being escaped, spaces should be escaped to %20 instead of +. Useful when the backreference will be used in the path component rather than the query string. noescape|NE Prevent mod_rewrite from applying hexcode escaping of special characters in the result of the rewrite.For each back-reference, should we provide the admin the ability to:
- replace with literal match (no encoding)
- replace with url-encoded match, but without url-encoding '/' char
- replace with url-encoded match, but with url-encoding '/' char
- replace with literal match, except also replace '+' with %20 (e.g. if the replacement was from query string condition match)
(query string is not url-decoded by lighttpd before regex matches, so taking from query string and replacing into path might need this) - ...other?
What syntax could we use to allow specification of some or all of the above, while remaining backwards compatible with existing syntax?
Updated by gstrauss over 8 years ago
- Related to Bug #1720: Rewrite/redirect rules and URL encoding added
Updated by AlxT over 8 years ago
Doesn't work in follwing config:
$SERVER["socket"] == ":3000" { $HTTP["host"] != "192.168.1.1:3000" { $HTTP["host"] =~ "(.*)" { url.redirect = ( "^/(.*)" => "http://192.168.1.1:3000/login?original_url=http://%1/$#1" ) url.redirect-code = 302 debug.log-request-handling = "enable" debug.log-request-header = "enable" debug.log-response-header = "enable" debug.log-condition-handling = "enable" accesslog.filename = "/usr/local/www/rewrite_access.log" server.errorlog = "/usr/local/www/rewrite_error.log" } } else $HTTP["host"] == "192.168.1.1:3000" { server.document-root = "/usr/local/www/script" index-file.names = ("cgi.pl") debug.log-request-handling = "enable" debug.log-request-header = "enable" debug.log-response-header = "enable" debug.log-condition-handling = "enable" accesslog.filename = "/usr/local/www/access.log" server.errorlog = "/usr/local/www/error.log" fastcgi.server = ( "" => ( "" => ( "socket" => "/usr/local/www/app/fastcgi.socket", "check-local" => "disable", "idle-timeout" => 100, ) ) ) } }
for version 1.4.39 i was using the folwing patch, I am not sure that it is correct:
*** mod_redirect.c.orig 2015-11-22 22:03:10.000000000 +0300 --- mod_redirect.c 2016-05-20 14:14:09.000000000 +0300 *************** *** 16,21 **** --- 16,37 ---- unsigned short redirect_code; } plugin_config; + /* Helper function */ + int config_append_cond_match_buffer_encoded(connection *con, data_config *dc, buffer *buf, int n) + { + cond_cache_t *cache = &con->cond_cache[dc->context_ndx]; + if (n > cache->patterncount) { + return 0; + } + + n <<= 1; /* n *= 2 */ + buffer_append_string_encoded(buf, + cache->comp_value->ptr + cache->matches[n], + cache->matches[n + 1] - cache->matches[n], + ENCODING_REL_URI_PART); + return 1; + } + typedef struct { PLUGIN_DATA; buffer *match_buf; *************** *** 221,230 **** start = 0; for (k = 0; k + 1 < pattern_len; k++) { ! if (pattern[k] == '$' || pattern[k] == '%') { ! /* got one */ ! size_t num = pattern[k + 1] - '0'; buffer_append_string_len(p->location, pattern + start, k - start); --- 237,254 ---- start = 0; for (k = 0; k + 1 < pattern_len; k++) { ! if ((pattern[k] == '$' || pattern[k] == '%') && ! (isdigit((unsigned char)pattern[k + 1]) || ! (pattern[k + 1] == '#' && pattern_len > k+2 && isdigit((unsigned char)pattern[k + 2])))) { ! /* got one */ ! ! size_t num = 0; ! if(pattern[k + 1] == '#') { ! num = pattern[k + 2] - '0'; ! } else { ! num = pattern[k + 1] - '0'; ! } buffer_append_string_len(p->location, pattern + start, k - start); *************** *** 234,240 **** } else if (pattern[k] == '$') { /* n is always > 0 */ if (num < (size_t)n) { ! buffer_append_string(p->location, list[num]); } } else if (p->conf.context == NULL) { /* we have no context, we are global */ --- 258,268 ---- } else if (pattern[k] == '$') { /* n is always > 0 */ if (num < (size_t)n) { ! if(pattern[k + 1] == '#') { ! buffer_append_string_encoded(p->location, list[num], strlen(list[num]), ENCODING_REL_URI_PART); ! } else { ! buffer_append_string(p->location, list[num]); // HERE ! } } } else if (p->conf.context == NULL) { /* we have no context, we are global */ *************** *** 242,251 **** "used a rewrite containing a %[0-9]+ in the global scope, ignored:", kv->value); } else { ! config_append_cond_match_buffer(con, p->conf.context, p->location, num); } ! k++; start = k + 1; } } --- 270,288 ---- "used a rewrite containing a %[0-9]+ in the global scope, ignored:", kv->value); } else { ! if(pattern[k + 1] == '#') { ! config_append_cond_match_buffer_encoded(con, p->conf.context, p->location, num); ! } else { ! config_append_cond_match_buffer(con, p->conf.context, p->location, num); ! } } ! if(pattern[k + 1] == '#') { ! k += 2; ! } else { ! k++; ! } ! start = k + 1; } }
Updated by gstrauss over 8 years ago
Thank you for reporting that the user-submitted patch is not completely working for you.
Separately, I posted two weeks ago to discuss an alternative syntax.
The suggestion is sound and the issue is real. However, the patch needs further discussion.
Apache mod_rewrite (http://httpd.apache.org/docs/current/mod/mod_rewrite.html) provides an extensive number of flags, including:
For each back-reference, should we provide the admin the ability to:
[...snip...]
- replace with literal match (no encoding)
- replace with url-encoded match, but without url-encoding '/' char
- replace with url-encoded match, but with url-encoding '/' char
- replace with literal match, except also replace '+' with %20 (e.g. if the replacement was from query string condition match)
(query string is not url-decoded by lighttpd before regex matches, so taking from query string and replacing into path might need this)- ...other?
What syntax could we use to allow specification of some or all of the above, while remaining backwards compatible with existing syntax?
Aside: the query string could be matched with a $HTTP["querystring"] condition wrapping the rewrite directive, and the matching captures could be used in the rewrite directive.
I am currently leaning towards a ${...} and %{...} syntax to be able to extensible support more than 9 captures, as well as to invent some new syntax inside the curly-braces. The new syntax would instruct various encodings. No new syntax would be the same as current behavior, which is literal replacement without encoding. While the new syntax could be non-numeric characters, I am partial to having a delimeter to allow the parser to validate the user intention. I like the extended pcre syntax (?...:) where ... is replaced with regex modifiers, though I am wondering whether or not reusing ?: will be confusing, and if different chars should be used, since the modifiers within ?...: would have different meaning from when (?...:...) is used within capturing parens. ...I'll ruminate further on this. Other opinions welcome.
Updated by gstrauss over 8 years ago
FYI: Apache supports syntax like:
RewriteCond %{QUERY_STRING} .* RewriteMap unescape int:unescape RewriteRule ^(.*)$ $1?${unescape:%{QUERY_STRING}}
Updated by AlxT over 8 years ago
I think that adding a new optional url.redirect-encoding option with different behaver depend on the option value, instead of adding new special character for different types of encoding is a good idea (like apache rewrite flags modifiers http://httpd.apache.org/docs/current/rewrite/flags.html ). Unfortunately I am not a good C-coder and haven't enough studied lighttpd mod_rewrite.c source code to provide a proper path.
One more question: about possibility to get full url with protocol like http://mysite.local/help/info.html I had to make "http://%1/$#1" and it is not a good idea because the protocol could be also https and slashes are not encoded and my sсript would work incorrect, is it possible to place all request including protocol (http://mysite.local/help/info.html) in $0 or %0 or any other special variable?
Updated by gstrauss over 8 years ago
Please don't pig-pile feature requests or other support requests into an existing bug. That just makes the discussion messy. Support forums (https://redmine.lighttpd.net/projects/lighttpd/boards/2) are the place for tangential "one more question".
Regarding your suggestion of using a "new optional url.redirect-encoding option ... (like apache rewrite flags modifiers ...)", those are two different things. A standalone directive is not the same as an extra parameter to an existing directive. While have a coarse mechanism to apply a single encoding to all string substitutions in a given regex is something, I expressed above that there can be quite a bit of complexity in required encodings for different parts of a URL, especially when the sources against which the regex (or parent conditional regex) were matched might be encoded (query string) or unencoded (uri path) depending on what it is. Therefore, I do not think that an extra parameter to the regex directives to apply a single encoding is sufficient.
Updated by AlxT over 8 years ago
Thanks! Sorry for pig-pileing, anyway, imho, the current patch is not working in my case as expected, thanks for comments and suggestion, I agree that your idea about apache-like condition rule syntax is much more flexible :) If you could provide any patch I'll test it in my case and report back.
Sorry for my English.
Updated by gstrauss over 8 years ago
- Related to Feature #443: [PATCH] mod_redirect: Add support for url-encoding backreferences, map %%n->%n, $$n->$n added
Updated by gstrauss over 6 years ago
- Status changed from Need Feedback to Patch Pending
- Target version set to 1.4.50
Updated by gstrauss over 6 years ago
- Status changed from Patch Pending to Fixed
- % Done changed from 0 to 100
Applied in changeset 255269d799185186bda64c04bf7a2efeeffbb8d6.
Updated by gstrauss over 6 years ago
Documentation for url-encoding backreferences can be found on mod_rewrite in section Extended Replacement Patterns
Also available in: Atom