Feature #1982

[PATCH] mod_redirect improvement: simple (non-RE) redirection

Added by mkoloberdin over 8 years ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: mod_redirect
Target version: -
Start date: 2009-05-11
Due date:
% Done: 100%
Estimated time:
Missing in 1.5.x:

Description

This patch adds the "url.redirect-simple" configuration option. It works like "url.redirect", but it takes plain strings as arguments instead of regular expressions (exact matching). It is very fast because it uses a glib hash table for lookups.
Example:

url.redirect-simple = (
    "/source_dir/source_file.html" => "http://destination.com/destination_file.html",
)

IMPORTANT: This patch depends on my earlier patch, http://redmine.lighttpd.net/issues/1981. Apply that one first.

History

#2 Updated by icy over 8 years ago

Thanks for contributing.
After a quick glance at it, I think you could precalculate the hash at startup to gain even more speed.
Feature-wise: have you thought about supporting %n (from conditionals) in the redirect destination?

#3 Updated by mkoloberdin over 8 years ago

Thanks for the feedback, but I'm not sure what you mean.
The hash table is already populated at startup (in mod_redirect_set_defaults).
And by %n, do you mean backreferences like in regexps?

#4 Updated by icy over 8 years ago

If you pass your hashing function as the hash func to the glib hash table, it will compute the hash every time you perform an operation such as a lookup. Instead, you can pass NULL as the hash func (which defaults to g_direct_hash) and use precalculated hashes of the strings in lookup/delete/insert.
If you don't do this, I think a regex could be faster, because it only needs to traverse the string once whereas your implementation traverses it twice. (OK, probably not true for these small strings :))

And with %n I mean backreferences to previous regex matches, yes.

#5 Updated by mkoloberdin over 8 years ago

icy wrote:

> If you pass your hashing function as the hash func to the glib hash table, it will compute the hash every time you perform an operation such as a lookup. Instead, you can pass NULL as the hash func (which defaults to g_direct_hash) and use precalculated hashes of the strings in lookup/delete/insert.

You must be confusing something. How can I precalculate hashes for requested URIs? They are obviously unpredictable. The only hash calculation that happens during a request (I checked/traced it) is the one for the requested URI, which can't be avoided. And g_direct_hash is useless in this case, because I'm using strings as keys, not pointers.

> If you don't do this, I think a regex could be faster, because it only needs to traverse the string once whereas your implementation traverses it twice. (OK, probably not true for these small strings :))

Let's say you have thousands of redirection rules (for example, a large website whose complicated URL scheme was migrated to a new complicated URL scheme, and you want to 301 all old URLs to the new ones). How can a single hash calculation for one string plus a single hash table lookup possibly be slower than looping over thousands of regexps until one of them matches? (That's what the config_exec_pcre_keyvalue_buffer function does on each request.)

> And with %n I mean backreferences to previous regex matches, yes.

Implementing this feature would require looping over every element of the hash table.

#6 Updated by icy over 8 years ago

mkoloberdin wrote:

> You must be confusing something. How can I precalculate hashes for requested URIs? They are obviously unpredictable. The only hash calculation that happens during a request (I checked/traced it) is the one for the requested URI, which can't be avoided.

You are right, I totally confused something here.

> Let's say you have thousands of redirection rules (for example, a large website whose complicated URL scheme was migrated to a new complicated URL scheme, and you want to 301 all old URLs to the new ones). How can a single hash calculation for one string plus a single hash table lookup possibly be slower than looping over thousands of regexps until one of them matches? (That's what the config_exec_pcre_keyvalue_buffer function does on each request.)

Again, you are right.

> > And with %n I mean backreferences to previous regex matches, yes.

> Implementing this feature would require looping over every element of the hash table.

At the risk of exposing my stupidity one more time today: Why would you have to do that?
You would parse the redirect target string at startup and then, when a redirect is issued, assemble the string.

Sorry for the noise.

#7 Updated by mkoloberdin over 8 years ago

icy wrote:

> > > And with %n I mean backreferences to previous regex matches, yes.

> > Implementing this feature would require looping over every element of the hash table.

> At the risk of exposing my stupidity one more time today: Why would you have to do that?
> You would parse the redirect target string at startup and then, when a redirect is issued, assemble the string.

Perhaps I exposed my stupidity with this one ;) I kind of assumed (or shall I say "got used to"?) that backreferences involve some kind of wildcard matching.
Indeed, backreferencing parts of strings can easily be done, but IMHO it does not make much sense without wildcard matching. The only "feature" it would introduce is saving a bit of typing in the config file (and it is unlikely that anyone will write a ruleset of several thousand redirects by hand anyway). Consider the equivalent of the above example with backreferencing:

"/source_dir/source_(file.html)" => "http://destination.com/destination_%1",

And adding wildcards would break the whole approach of finding redirect destinations quickly (one lookup in the hash). If you need wildcard matching, you are better off with regexps anyway.

#8 Updated by icy over 8 years ago

No, I didn't mean the $n ones for the "current regex", as there is none with static strings. Let me illustrate it with an example:

$HTTP["host"] =~ "^(.*)$" {
    url.redirect-simple = ("/...." => "http://%1/")
}

#9 Updated by gstrauss over 1 year ago

  • Target version deleted (1.5.0)
