Google reports 404 Error when using the percent-hex codes in the url links - Lighttpd bug??

Added by robertrade about 13 years ago

Hi all,

I would like to know whether I simply didn't look hard enough for the right rewrite rules, or whether this is a lighttpd bug related to percent-encoded hex codes that should be decoded into the corresponding characters. Google's webmaster tools reported these URLs as 404 errors.

Here is the code snippet I used to reproduce the 404 error:

parsetest.php:

<?php
// Compare how parse_url() splits a normal URL and a percent-encoded one.
$url1 = 'http://rademacher.org/parsetest.php?name=robert&street=main_street&zip=12324';
$url2 = 'http://rademacher.org/parsetest.php%3Fname%3Drobert&street%3Dmain_street&zip%3D12324';

// Use '.' for string concatenation; '+' is numeric addition in PHP.
echo 'URL1 Test ' . $url1 . "\n";
print_r(parse_url($url1));

echo 'URL2 Test ' . $url2 . "\n";
print_r(parse_url($url2));

echo parse_url($url1, PHP_URL_PATH) . "\n";
echo parse_url($url2, PHP_URL_PATH) . "\n";

echo "DumpRequest:\n";
var_dump($_REQUEST);
?>

Here are the two URLs I used to compare the output:

(Works fine)
http://rademacher.org/parsetest.php?name=robert&street=main_street&zip=12324

(Got 404 Error)
http://rademacher.org/parsetest.php%3Fname%3Drobert&street%3Dmain_street&zip%3D12324

I have been unable to find a rewrite rule anywhere on the Internet (Google, Yahoo, Bing) that turns the %3F hex code back into '?', so that the URL is split properly by parse_url() and similar functions.

Please tell me whether it's my fault for not finding the rewrite-rule needle in the haystack, or whether lighttpd really has a bug in how rewrite rules handle percent-hex codes.
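If you do want to accept such URLs, one option is to normalize them in application code rather than with a rewrite rule. The sketch below is hypothetical: `split_encoded_query` is an illustrative name, not a lighttpd or PHP feature, and it is written in Python only so it runs standalone. When a URL has no real query string but its path contains %3F, it decodes the path and splits it again at the first '?'. Where such code would run (for example a script configured as lighttpd's server.error-handler-404 that redirects to the normalized URL) is an assumption, not something confirmed in this thread.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def split_encoded_query(url: str) -> str:
    """Hypothetical normalization: if the URL has no real query string but its
    path contains an encoded '?' (%3F), decode the path and split it again at
    the first '?' so the remainder becomes the query string."""
    parts = urlsplit(url)
    if parts.query or "%3f" not in parts.path.lower():
        return url  # already has a query string, or nothing encoded to fix
    decoded = unquote(parts.path)  # %3F -> '?', %3D -> '=', and every other escape
    path, _, query = decoded.partition("?")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


bad = "http://rademacher.org/parsetest.php%3Fname%3Drobert&street%3Dmain_street&zip%3D12324"
print(split_encoded_query(bad))
# http://rademacher.org/parsetest.php?name=robert&street=main_street&zip=12324
```

Note that unquote() decodes every escape in the path, so an encoded '&' or '=' inside a value, or a file whose name really contains '?', can no longer be expressed once this normalization is in place.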

Robert


Replies (4)

RE: Google reports 404 Error when using the percent-hex codes in the url links - Lighttpd bug?? - Added by nitrox about 13 years ago

See issue #1720; it might be the cause of your problem.

Alternatively, enable the debug options (such as debug.log-request-handling) to trace how lighty handles the request.

RE: Google reports 404 Error when using the percent-hex codes in the url links - Lighttpd bug?? - Added by robertrade about 13 years ago

Hi,

Issue #1720 doesn't look like it resolves the URL-decoding problem. I'm referring to the "root" directory, not subdirectories.

Several unresolved issues:
http://redmine.lighttpd.net/projects/lighttpd/repository/revisions/2362

And this document:
http://redmine.lighttpd.net/projects/lighttpd/repository/revisions/2362/entry/branches/lighttpd-1.4.x/doc/rewrite.txt
suggests that rewrite rules should not be used to deal with %-hex codes, which confused me a lot.

I believe there must be a way to get around that problem.

RE: Google reports 404 Error when using the percent-hex codes in the url links - Lighttpd bug?? - Added by nitrox about 13 years ago

I think #1720 (which is related to, or maybe even the root of, your problem) has been open for quite a while because we have no solution that fits everyone's needs. We tried a fix, but had to revert it shortly afterwards.

Discussion might be easier on IRC (#lighttpd at irc.freenode.net), but this thread should be updated afterwards.

First we should see if #1720 comes into play or if your problem is a different one.

RE: Google reports 404 Error when using the percent-hex codes in the url links - Lighttpd bug?? - Added by icy about 13 years ago

I don't see where the problem is. If you encode the '?' as %3F, it is no longer the query-string separator.

  • /parsetest.php?name=robert&street=main_street&zip=12324
    • filename is "parsetest.php" and query string is "name=robert&street=main_street&zip=12324"
  • /parsetest.php%3Fname%3Drobert&street%3Dmain_street&zip%3D12324
    • filename is "parsetest.php?name=robert&street=main_street&zip=12324" (taken literally; there is no query string here!)
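To make the distinction above concrete, here is a small Python sketch that approximates the request parsing: split the URL on the first literal '?', then percent-decode only the path. `server_view` is a hypothetical helper and a simplification, not lighttpd's actual code.

```python
from urllib.parse import unquote, urlsplit


def server_view(url: str) -> tuple[str, str]:
    """Simplified model of request parsing: split on the first literal '?',
    then percent-decode only the path. Returns (decoded path, raw query)."""
    parts = urlsplit(url)
    return unquote(parts.path), parts.query


plain = "http://rademacher.org/parsetest.php?name=robert&street=main_street&zip=12324"
encoded = "http://rademacher.org/parsetest.php%3Fname%3Drobert&street%3Dmain_street&zip%3D12324"

print(server_view(plain))
# ('/parsetest.php', 'name=robert&street=main_street&zip=12324')
print(server_view(encoded))
# ('/parsetest.php?name=robert&street=main_street&zip=12324', '')
```

The decoded path of the second URL names a file that does not exist on disk, which is why a 404 is returned for it.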