Feature #2340

closed

decoding %2F in fastcgi PATH_INFO

Added by eddie over 12 years ago. Updated about 8 years ago.

Status:
Fixed
Priority:
Normal
Category:
mod_fastcgi
Target version:
1.4.40

Description

Currently lighttpd, like Apache, decodes %2F to '/' in PATH_INFO. This breaks a lot of functionality in fastcgi applications, because they can no longer distinguish between '/' and %2F. Say I have the path /one/two%2Fthree. If a fastcgi app tokenizes the PATH_INFO string on '/' before url-decoding it, the "directory" names can themselves contain '/' characters (here "one" and "two/three").
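
A minimal C sketch of the tokenize-then-decode order described above follows. The function names and the limits `MAX_SEGMENTS` and `MAX_SEGMENT_LEN` are hypothetical, chosen for illustration; this is not the fastcgi++ API. It assumes the application receives the path still url-encoded, which is exactly what PATH_INFO does not provide today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEGMENTS    16   /* hypothetical limit for this sketch */
#define MAX_SEGMENT_LEN 256  /* hypothetical limit for this sketch */

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Percent-decode src[0..len) into dst and NUL-terminate it.
 * dst needs room for len + 1 bytes; malformed escapes are copied unchanged.
 * (A decoded %00 would end the string early; real code should reject it.) */
static void percent_decode(char *dst, const char *src, size_t len)
{
    size_t i = 0;
    while (i < len) {
        int hi, lo;
        if (src[i] == '%' && i + 2 < len
            && (hi = hex_value((unsigned char)src[i + 1])) >= 0
            && (lo = hex_value((unsigned char)src[i + 2])) >= 0) {
            *dst++ = (char)(hi * 16 + lo);
            i += 3;
        } else {
            *dst++ = src[i++];
        }
    }
    *dst = '\0';
}

/* Split the still-encoded path on '/' FIRST, then decode each segment, so an
 * encoded "%2F" ends up as a literal '/' inside a single segment.
 * Returns the number of segments, or -1 if a size limit is exceeded. */
static int split_then_decode(const char *path,
                             char segs[][MAX_SEGMENT_LEN], int max_segs)
{
    assert(path != NULL);
    int n = 0;
    while (*path) {
        const char *slash = strchr(path, '/');
        size_t len = slash ? (size_t)(slash - path) : strlen(path);
        if (len > 0) {  /* empty segments (e.g. the leading '/') are skipped */
            if (n == max_segs || len >= MAX_SEGMENT_LEN) return -1;
            percent_decode(segs[n++], path, len);
        }
        path += len;
        if (*path == '/') path++;
    }
    return n;
}
```

For `/one/two%2Fthree` this yields two segments, `one` and `two/three`; decoding before splitting would instead yield three (`one`, `two`, `three`).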

From what I understand, the purpose of url-encoding a character is to prevent it from being parsed as part of the URL syntax. I understand that many applications out there may depend on this broken behavior, but Apache allows this "brokenness" to be turned off with the "AllowEncodedSlashes NoDecode" directive.

I develop the fastcgi++ library and am trying to move toward storing the path info in a tokenized container, as explained above. Unfortunately, this approach does not work with lighttpd because of this issue.

Is there any chance this will ever be changed? Does the architecture make such a change difficult or would it be simple? Any advice on pursuing such a modification?

#1

Updated by eddie over 12 years ago

I should note that the following is a good little blurb on the issue:

http://bulknews.typepad.com/blog/2009/09/path_info-decoding-horrors.html

#2

Updated by tmds over 9 years ago

With this patch, lighttpd (v1.4.35) no longer decodes %2F to '/':

--- a/src/buffer.c      2014-08-12 11:57:10.170456141 +0200
+++ b/src/buffer.c      2014-08-12 11:58:35.783672399 +0200
@@ -883,8 +883,13 @@ static int buffer_urldecode_internal(buf
                                        /* map control-characters out */
                                        if (high < 32 || high == 127) high = '_';

-                                       *dst = high;
-                                       src += 2;
+                                       if (high == '/') {
+                                               *++dst = *++src;
+                                               *++dst = *++src;
+                                       } else {
+                                               *dst = high;
+                                               src += 2;
+                                       }
                                }
                        }
                } else {
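
The effect of the patch can be reproduced outside the lighttpd tree with a standalone C function. This is a sketch, not lighttpd's `buffer_urldecode_internal`: the name `urldecode_keep_slash` is hypothetical, and only URL-path decoding is modelled (no query-string rules).

```c
#include <assert.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a URL path in place, but leave "%2F"/"%2f" encoded, mirroring the
 * patch above.  As in the patched code, decoded control characters are
 * replaced by '_'. */
static void urldecode_keep_slash(char *s)
{
    assert(s != NULL);
    char *dst = s;
    for (const char *src = s; *src; src++, dst++) {
        int hi, lo;
        if (src[0] == '%'
            && (hi = hex_value((unsigned char)src[1])) >= 0
            && (lo = hex_value((unsigned char)src[2])) >= 0) {
            int c = hi * 16 + lo;
            if (c < 32 || c == 127) c = '_';  /* map control characters out */
            if (c == '/') {
                /* keep the three-byte escape verbatim; dst never passes src */
                dst[0] = src[0];
                dst[1] = src[1];
                dst[2] = src[2];
                dst += 2;
            } else {
                *dst = (char)c;
            }
            src += 2;
        } else {
            *dst = *src;
        }
    }
    *dst = '\0';
}
```

For example, `/a%20b%2fc%41` becomes `/a b%2fcA`: the space and the letter are decoded, while the encoded slash stays encoded.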

#3

Updated by gstrauss about 8 years ago

  • Tracker changed from Bug to Feature

One possibility: lighttpd sets the non-standard variable REQUEST_URI to the raw request URI, which has not been url-decoded. You can re-split that into path and path_info in your application and then apply a modified url-decode that leaves '%2F' encoded.

#4

Updated by gstrauss about 8 years ago

See related tickets https://redmine.lighttpd.net/issues/2702 and https://redmine.lighttpd.net/issues/1828 and pull request https://github.com/lighttpd/lighttpd1.4/pull/36 where the non-standard REQUEST_URI is discussed.

#5

Updated by gstrauss about 8 years ago

  • Status changed from New to Fixed
  • Target version set to 1.4.40

The CGI specification defines PATH_INFO as not url-encoded.

As an alternative to get the information you seek, lighttpd provides REQUEST_URI (non-standard) along with standard CGI variables. The REQUEST_URI contains the raw request URI, before any url-decoding has been done, and before any information has been discarded, such as the distinction between '/' and '%2F' in the request.

To determine PATH_INFO, lighttpd url-decodes the URI and then normalizes the path: "." and ".." URL path segments are resolved in the virtual path, and multiple consecutive slashes (e.g. "////") are reduced to a single slash. This normalized virtual URL path is used in config conditional matching so that conditions are applied consistently. The path is then tested against the filesystem; the longest existing path is used as the request target, and the remainder of the path is treated as PATH_INFO. Since lighttpd has no way to know in advance which part of the path is PATH_INFO, the entire path is normalized.

The patch you suggested above is not recommended: a malicious URL crafted with %2F earlier in the URL path might cause a config conditional to fail to match a condition when it should match (a false negative).
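
The normalization steps can be sketched in C as follows. This is a simplified illustration, not lighttpd's own implementation: `normalize_path` is a hypothetical name, and the input is assumed to be an already url-decoded path that starts with '/'.

```c
#include <assert.h>
#include <string.h>

/* Normalize an already url-decoded absolute URL path in place:
 * runs of '/' collapse to one, "." segments are dropped, and ".." removes
 * the previous segment (it never climbs above "/").  A trailing '/' in the
 * input is kept.  Simplified sketch for illustration only. */
static void normalize_path(char *path)
{
    assert(path != NULL && path[0] == '/');
    size_t n = strlen(path);
    int trailing_slash = n > 1 && path[n - 1] == '/';
    char *out = path;        /* output never overtakes input, so in-place is safe */
    const char *in = path;

    while (*in) {
        while (*in == '/') in++;              /* collapse "////" */
        if (*in == '\0') break;
        const char *end = strchr(in, '/');
        size_t len = end ? (size_t)(end - in) : strlen(in);

        if (len == 1 && in[0] == '.') {
            /* "." refers to the current directory: drop it */
        } else if (len == 2 && in[0] == '.' && in[1] == '.') {
            /* "..": remove the last segment written so far, if any */
            while (out > path) {
                if (*--out == '/') break;
            }
        } else {
            *out++ = '/';
            memmove(out, in, len);            /* regions may overlap */
            out += len;
        }
        in += len;
    }
    if (out == path || trailing_slash) *out++ = '/';
    *out = '\0';
}
```

For example, `/a/./b//../c` normalizes to `/a/c`. The same function illustrates the false-negative concern: `/public/../admin/x` normalizes to `/admin/x` and would satisfy a config condition testing for the `/admin/` prefix, whereas `/public/..%2Fadmin/x`, left undecoded as with the patch above, passes through unchanged and would not. Whether such a request could then reach `/admin/` depends on how the backend later decodes the path.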

If you need to preserve the distinction between '/' and '%2F' in the request, you can obtain it from REQUEST_URI, which is still url-encoded. It is recommended that you put a tag at the beginning of the path_info in the URL to make it easy to find, e.g. http://example.com/physical/path/to/file/mytag/rest/of/path_info; you can then url-decode the path following "mytag", skipping '%2F' if you wish.
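
Following that recommendation, locating the still-encoded path_info after the tag can be sketched in C as below. `path_info_after_tag` is a hypothetical helper; it assumes the tag appears as a whole path segment that does not also occur earlier in the physical path, and it discards the query string.

```c
#include <assert.h>
#include <string.h>

/* Copy the still-encoded part of request_uri that follows "/<tag>" into out,
 * stopping before any query string.  Returns 0 on success, or -1 if the tag
 * is not found as a whole path segment or out is too small.
 * Hypothetical helper for illustration. */
static int path_info_after_tag(const char *request_uri, const char *tag,
                               char *out, size_t outsz)
{
    assert(request_uri && tag && out && outsz > 0);
    size_t tlen = strlen(tag);
    size_t plen = strcspn(request_uri, "?");  /* length of the path part */

    for (size_t i = 0; i + 1 + tlen <= plen; i++) {
        const char *p = request_uri + i;
        if (p[0] != '/' || strncmp(p + 1, tag, tlen) != 0) continue;
        size_t after = i + 1 + tlen;          /* index just past the tag */
        if (after != plen && request_uri[after] != '/')
            continue;                         /* e.g. "mytagx" is not the tag */
        size_t len = plen - after;
        if (len >= outsz) return -1;
        memcpy(out, request_uri + after, len);
        out[len] = '\0';
        return 0;
    }
    return -1;
}
```

For `/physical/path/to/file/mytag/rest/of%2Fpath_info?x=1` it returns `/rest/of%2Fpath_info`, with '%2F' still intact, so the result can be split on '/' and each segment url-decoded afterwards.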
