Project

General

Profile

Bug #911

Need for URL encoding in mod_redirect and possibly mod_rewrite

Added by spillgroup over 10 years ago. Updated about 1 year ago.

Status:
Need Feedback
Priority:
Normal
Assignee:
-
Category:
mod_redirect
Target version:
-
Start date:
Due date:
% Done:

0%

Estimated time:
Missing in 1.5.x:

Description

As reported earlier this year on the mailing list, opening this ticket now for tracking purposes. I'll try to provide a patch later.

The issue: when $n and %n references are used within mod_redirect, the resulting value should be properly encoded to be used in a (Location) header. I believe this is not being done by mod_redirect (and possibly mod_rewrite) currently.

Example:


$HTTP["host"] =~ ""^www\.(.*)" {
  $HTTP["referer"] =~ "(.*)" {
    url.redirect = (
      "^/(.*)" => "http://www2.mydomain.com/myscript.php?path=$1&ref=%1" 
    )
  }
}

Testing the example:


~ $ curl -e 'http://myreferer/' -I 'http://www.domain.com/'
HTTP/1.1 301 Moved Permanently
Location:
http://www2.mydomain.com/myscript.php?path=domain.com/&ref=http://myreferer/
Content-Length: 0
Date: Tue, 21 Mar 2006 10:08:46 GMT
Server: lighttpd/1.4.11

As you can see, the 'Location' URL is not properly encoded. It should look something like:


.../myscript.php?path=domain.com%2F&ref=http%3A%2F%2Fmyreferer%2F

So I think the decoding pass is made by Lighttpd, but mod_redirect should make sure to make the encoding pass when placing parameters in a URL.


Related issues

Related to Bug #1720: Rewrite/redirect rules and URL encodingNeed Feedback

Related to Feature #443: [PATCH] mod_redirect: Add support for url-encoding backreferences, map %%n->%n, $$n->$nAssigned

History

#1 Updated by spillgroup over 9 years ago

Added patch allows $n and %n references to be actively urlencoded by using the # character.

Example:


$HTTP["referer"] =~ "(.+)" {
  url.redirect = (
    "(.*)" => "http://www.mydomain.com/myscript.php?&ref=%#1" 
  )
}

So $n and %n are original values, while $#n and %#n are urlencoded values.
This affects both mod_redirect and mod_rewrite.

#2 Updated by Anonymous about 9 years ago

Test thius

#3 Updated by gstrauss about 1 year ago

  • Description updated (diff)
  • Assignee deleted (jan)

The suggestion is sound and the issue is real. However, the patch needs further discussion.

Apache mod_rewrite (http://httpd.apache.org/docs/current/mod/mod_rewrite.html) provides an extensive number of flags, including:

B                     Escape non-alphanumeric characters in backreferences before applying the transformation.
backrefnoplus|BNP     If backreferences are being escaped, spaces should be escaped to %20 instead of +. Useful when the backreference will be used in the path component rather than the query string.
noescape|NE           Prevent mod_rewrite from applying hexcode escaping of special characters in the result of the rewrite.

For each back-reference, should we provide the admin the ability to:
  • replace with literal match (no encoding)
  • replace with url-encoded match, but without url-encoding '/' char
  • replace with url-encoded match, but with url-encoding '/' char
  • replace with literal match, except also replace '+' with %20 (e.g. if the replacement was from query string condition match)
    (query string is not url-decoded by lighttpd before regex matches, so taking from query string and replacing into path might need this)
  • ...other?

What syntax could we use to allow specification of some or all of the above, while remaining backwards compatible with existing syntax?

#4 Updated by gstrauss about 1 year ago

  • Status changed from New to Need Feedback

#5 Updated by gstrauss about 1 year ago

  • Related to Bug #1720: Rewrite/redirect rules and URL encoding added

#6 Updated by AlxT about 1 year ago

Doesn't work in follwing config:

$SERVER["socket"] == ":3000" {
    $HTTP["host"] != "192.168.1.1:3000" {
       $HTTP["host"] =~ "(.*)" {
           url.redirect = ( "^/(.*)" => "http://192.168.1.1:3000/login?original_url=http://%1/$#1" )
           url.redirect-code = 302
           debug.log-request-handling = "enable" 
           debug.log-request-header = "enable" 
           debug.log-response-header = "enable" 
           debug.log-condition-handling = "enable" 
           accesslog.filename = "/usr/local/www/rewrite_access.log" 
           server.errorlog = "/usr/local/www/rewrite_error.log" 

        }
    }

    else $HTTP["host"] == "192.168.1.1:3000" {

        server.document-root = "/usr/local/www/script" 
        index-file.names = ("cgi.pl")
        debug.log-request-handling = "enable" 
        debug.log-request-header = "enable" 
        debug.log-response-header = "enable" 
        debug.log-condition-handling = "enable" 
        accesslog.filename = "/usr/local/www/access.log" 
        server.errorlog = "/usr/local/www/error.log" 
        fastcgi.server = ( "" =>
            ( "" =>
                (
                    "socket" => "/usr/local/www/app/fastcgi.socket",
                    "check-local" => "disable",
                    "idle-timeout" => 100,
                )
            )
        )
    }
}

for version 1.4.39 i was using the folwing patch, I am not sure that it is correct:


*** mod_redirect.c.orig 2015-11-22 22:03:10.000000000 +0300
--- mod_redirect.c      2016-05-20 14:14:09.000000000 +0300
***************
*** 16,21 ****
--- 16,37 ----
        unsigned short redirect_code;
  } plugin_config;

+ /* Helper function */
+ int config_append_cond_match_buffer_encoded(connection *con, data_config *dc, buffer *buf, int n)
+ {
+       cond_cache_t *cache = &con->cond_cache[dc->context_ndx];
+       if (n > cache->patterncount) {
+               return 0;
+       }
+
+       n <<= 1; /* n *= 2 */
+       buffer_append_string_encoded(buf,
+                       cache->comp_value->ptr + cache->matches[n],
+                       cache->matches[n + 1] - cache->matches[n],
+                       ENCODING_REL_URI_PART);
+       return 1;
+ }
+
  typedef struct {
        PLUGIN_DATA;
        buffer *match_buf;
***************
*** 221,230 ****

                        start = 0;
                        for (k = 0; k + 1 < pattern_len; k++) {
!                               if (pattern[k] == '$' || pattern[k] == '%') {
!                                       /* got one */

!                                       size_t num = pattern[k + 1] - '0';

                                        buffer_append_string_len(p->location, pattern + start, k - start);

--- 237,254 ----

                        start = 0;
                        for (k = 0; k + 1 < pattern_len; k++) {
!                               if ((pattern[k] == '$' || pattern[k] == '%') &&
!                                    (isdigit((unsigned char)pattern[k + 1]) ||
!                                    (pattern[k + 1] == '#' && pattern_len > k+2 && isdigit((unsigned char)pattern[k + 2])))) {

!                                       /* got one */
!
!                                        size_t num = 0;
!                                        if(pattern[k + 1] == '#') {
!                                                num = pattern[k + 2] - '0';
!                                        } else {
!                                                num = pattern[k + 1] - '0';
!                                        }

                                        buffer_append_string_len(p->location, pattern + start, k - start);

***************
*** 234,240 ****
                                        } else if (pattern[k] == '$') {
                                                /* n is always > 0 */
                                                if (num < (size_t)n) {
!                                                       buffer_append_string(p->location, list[num]);
                                                }
                                        } else if (p->conf.context == NULL) {
                                                /* we have no context, we are global */
--- 258,268 ----
                                        } else if (pattern[k] == '$') {
                                                /* n is always > 0 */
                                                if (num < (size_t)n) {
!                                                       if(pattern[k + 1] == '#') {
!                                                               buffer_append_string_encoded(p->location, list[num], strlen(list[num]), ENCODING_REL_URI_PART);
!                                                       } else {
!                                                               buffer_append_string(p->location, list[num]); // HERE
!                                                       }
                                                }
                                        } else if (p->conf.context == NULL) {
                                                /* we have no context, we are global */
***************
*** 242,251 ****
                                                                "used a rewrite containing a %[0-9]+ in the global scope, ignored:",
                                                                kv->value);
                                        } else {
!                                               config_append_cond_match_buffer(con, p->conf.context, p->location, num);
                                        }

!                                       k++;
                                        start = k + 1;
                                }
                        }
--- 270,288 ----
                                                                "used a rewrite containing a %[0-9]+ in the global scope, ignored:",
                                                                kv->value);
                                        } else {
!                                               if(pattern[k + 1] == '#') {
!                                                       config_append_cond_match_buffer_encoded(con, p->conf.context, p->location, num);
!                                               } else {
!                                                       config_append_cond_match_buffer(con, p->conf.context, p->location, num);
!                                               }
                                        }

!                                       if(pattern[k + 1] == '#') {
!                                               k += 2;
!                                       } else {
!                                               k++;
!                                       }
!
                                        start = k + 1;
                                }
                        }

#7 Updated by gstrauss about 1 year ago

Thank you for reporting that the user-submitted patch is not completely working for you.

Separately, I posted two weeks ago to discuss an alternative syntax.

The suggestion is sound and the issue is real. However, the patch needs further discussion.

Apache mod_rewrite (http://httpd.apache.org/docs/current/mod/mod_rewrite.html) provides an extensive number of flags, including:
[...snip...]

For each back-reference, should we provide the admin the ability to:
  • replace with literal match (no encoding)
  • replace with url-encoded match, but without url-encoding '/' char
  • replace with url-encoded match, but with url-encoding '/' char
  • replace with literal match, except also replace '+' with %20 (e.g. if the replacement was from query string condition match)
    (query string is not url-decoded by lighttpd before regex matches, so taking from query string and replacing into path might need this)
  • ...other?

What syntax could we use to allow specification of some or all of the above, while remaining backwards compatible with existing syntax?

Aside: the query string could be matched with a $HTTP["querystring"] condition wrapping the rewrite directive, and the matching captures could be used in the rewrite directive.

I am currently leaning towards a ${...} and %{...} syntax to be able to extensible support more than 9 captures, as well as to invent some new syntax inside the curly-braces. The new syntax would instruct various encodings. No new syntax would be the same as current behavior, which is literal replacement without encoding. While the new syntax could be non-numeric characters, I am partial to having a delimeter to allow the parser to validate the user intention. I like the extended pcre syntax (?...:) where ... is replaced with regex modifiers, though I am wondering whether or not reusing ?: will be confusing, and if different chars should be used, since the modifiers within ?...: would have different meaning from when (?...:...) is used within capturing parens. ...I'll ruminate further on this. Other opinions welcome.

#8 Updated by gstrauss about 1 year ago

FYI: Apache supports syntax like:

RewriteCond %{QUERY_STRING} .*
RewriteMap unescape int:unescape  
RewriteRule ^(.*)$          $1?${unescape:%{QUERY_STRING}}

#9 Updated by AlxT about 1 year ago

I think that adding a new optional url.redirect-encoding option with different behaver depend on the option value, instead of adding new special character for different types of encoding is a good idea (like apache rewrite flags modifiers http://httpd.apache.org/docs/current/rewrite/flags.html ). Unfortunately I am not a good C-coder and haven't enough studied lighttpd mod_rewrite.c source code to provide a proper path.

One more question: about possibility to get full url with protocol like http://mysite.local/help/info.html I had to make "http://%1/$#1" and it is not a good idea because the protocol could be also https and slashes are not encoded and my sсript would work incorrect, is it possible to place all request including protocol (http://mysite.local/help/info.html) in $0 or %0 or any other special variable?

#10 Updated by gstrauss about 1 year ago

Please don't pig-pile feature requests or other support requests into an existing bug. That just makes the discussion messy. Support forums (https://redmine.lighttpd.net/projects/lighttpd/boards/2) are the place for tangential "one more question".

Regarding your suggestion of using a "new optional url.redirect-encoding option ... (like apache rewrite flags modifiers ...)", those are two different things. A standalone directive is not the same as an extra parameter to an existing directive. While have a coarse mechanism to apply a single encoding to all string substitutions in a given regex is something, I expressed above that there can be quite a bit of complexity in required encodings for different parts of a URL, especially when the sources against which the regex (or parent conditional regex) were matched might be encoded (query string) or unencoded (uri path) depending on what it is. Therefore, I do not think that an extra parameter to the regex directives to apply a single encoding is sufficient.

#11 Updated by AlxT about 1 year ago

Thanks! Sorry for pig-pileing, anyway, imho, the current patch is not working in my case as expected, thanks for comments and suggestion, I agree that your idea about apache-like condition rule syntax is much more flexible :) If you could provide any patch I'll test it in my case and report back.

Sorry for my English.

#12 Updated by gstrauss 12 months ago

  • Related to Feature #443: [PATCH] mod_redirect: Add support for url-encoding backreferences, map %%n->%n, $$n->$n added

Also available in: Atom