Feature #1982

open

RFE: mod_redirect exact-match map: simple (non-RE) redirection

Added by mkoloberdin over 11 years ago. Updated 2 months ago.

Status:
New
Priority:
Low
Category:
mod_redirect
Target version:
-
ASK QUESTIONS IN Forums:
No

Description

[original subject: "mod_redirect improvement: simple (non-RE) redirection"] [see comments below from Nov 2020]

This patch adds a "url.redirect-simple" configuration option which works like "url.redirect", but instead of regular expressions it takes plain strings as arguments (exact matching). It is very fast because it uses a glib hash table for lookups.
Example:

url.redirect-simple = (
    "/source_dir/source_file.html" => "http://destination.com/destination_file.html",
)

IMPORTANT: This patch relies on my earlier patch: http://redmine.lighttpd.net/issues/1981 <- Apply this one first.


#2

Updated by icy over 11 years ago

Thanks for contributing.
After a quick glance at it I think you could precalculate the hash on startup to gain even more speed.
Featurewise: have you thought about supporting %n (from conditionals) in the redirect destination?

#3

Updated by mkoloberdin over 11 years ago

Thanks for the feedback, but I'm not sure what you mean.
The hash table is already populated on start up (in mod_redirect_set_defaults).
And by %n you mean backreferences like in regexps?

#4

Updated by icy over 11 years ago

If you pass your hashing function as hash func to the glib hashtable, it will compute the hash each time you use any operation like lookup. Instead you can pass NULL as hash func (which will default to g_direct_hash) and use precalculated hashes of the strings in lookup/delete/insert.
If you don't do this, I think regex could be faster because it only needs to traverse the string once but your implementation twice. (OK, probably not true for these small strings :))

And with %n I mean backreferences to previous regex matches, yes.

#5

Updated by mkoloberdin over 11 years ago

icy wrote:

If you pass your hashing function as hash func to the glib hashtable, it will compute the hash each time you use any operation like lookup. Instead you can pass NULL as hash func (which will default to g_direct_hash) and use precalculated hashes of the strings in lookup/delete/insert.

You must be confusing something. How can I precalculate hashes for requested URIs? They are obviously unpredictable. The only (I checked/traced it) hash calculation that happens during a request is the one for the requested URI which can't be avoided. And g_direct_hash is useless in this case as I'm using strings as keys, not pointers.

If you don't do this, I think regex could be faster because it only needs to traverse the string once but your implementation twice. (OK, probably not true for these small strings :))

Let's say you have thousands of redirection rules (for example, a large website with a complicated URL scheme has migrated to a new complicated URL scheme and you want to 301 all old URLs to new ones). How can a single hash calculation for one string plus a single hash table lookup possibly be slower than looping over thousands of regexps until one of them matches? (That's what the config_exec_pcre_keyvalue_buffer function does on each request.)

And with %n I mean backreferences to previous regex matches, yes.

Implementing this feature would require looping over every element of the hash table.
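To illustrate the cost argument above, here is a minimal sketch in Python (not the patch's C/glib code). The rule set, the second path, and the function names are made up for illustration; the point is only that the exact-match lookup does one hash computation and one table probe, while the regex path tries the patterns one after another.

```python
import re

# Hypothetical rule set; paths and targets are made up for illustration.
EXACT_RULES = {
    "/source_dir/source_file.html": "http://destination.com/destination_file.html",
    "/old/about.html": "http://destination.com/about",
}

# The same rules expressed as anchored regexes, evaluated in order.
REGEX_RULES = [
    (re.compile("^" + re.escape(src) + "$"), dst)
    for src, dst in EXACT_RULES.items()
]


def redirect_exact(uri):
    """One hash of the request URI and one lookup, whatever the rule count."""
    return EXACT_RULES.get(uri)


def redirect_regex(uri):
    """Try each pattern until one matches; cost grows with the rule count."""
    for pattern, dst in REGEX_RULES:
        if pattern.match(uri):
            return dst
    return None
```

Both functions return the same answer for the same input; they differ only in how much work is done per request as the rule list grows.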

#6

Updated by icy over 11 years ago

mkoloberdin wrote:

You must be confusing something. How can I precalculate hashes for requested URIs? They are obviously unpredictable. The only (I checked/traced it) hash calculation that happens during a request is the one for the requested URI which can't be avoided.

You are right, I totally confused something here.

Let's say you have thousands of redirection rules (for example, a large website with a complicated URL scheme has migrated to a new complicated URL scheme and you want to 301 all old URLs to new ones). How can a single hash calculation for one string plus a single hash table lookup possibly be slower than looping over thousands of regexps until one of them matches? (That's what the config_exec_pcre_keyvalue_buffer function does on each request.)

Again, you are right.

And with %n I mean backreferences to previous regex matches, yes.

Implementing this feature would require looping over every element of the hash table.

At the risk of exposing my stupidity one more time today: Why would you have to do that?
You would parse the redirect target string at start up and then when a redirect is issued, you assemble the string.

Sorry for the noise.

#7

Updated by mkoloberdin over 11 years ago

icy wrote:

And with %n I mean backreferences to previous regex matches, yes.

Implementing this feature would require looping over every element of the hash table.

At the risk of exposing my stupidity one more time today: Why would you have to do that?
You would parse the redirect target string at start up and then when a redirect is issued, you assemble the string.

Perhaps I exposed my stupidity with this one ;) I kind of assumed (or shall I say "got used to"?) that backreferences involve some kind of wildcard matching.
Indeed backreferencing of parts of strings can easily be done, but IMHO it does not make much sense without wildcard matching. The only "feature" it would introduce is saving a bit of typing in config file (and it is unlikely that one will write a several-thousand-redirect ruleset by hand anyway). Consider the equivalent of the above example with backreferencing:

"/source_dir/source_(file.html)" => "http://destination.com/destination_%1",

And adding wildcards would break the whole approach of finding redirect destinations quickly (one lookup in the hash). If you need wildcard matching, you are better off with regexps anyway.

#8

Updated by icy over 11 years ago

No, I didn't mean the $n ones for the "current regex" as there is none with static strings. Let me illustrate it with an example:

$HTTP["host"] =~ "^(.*)$" {
    url.redirect-simple = ("/...." => "http://%1/")
}
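The idea of parsing the target once at startup and assembling it per request, with %n taken from the enclosing conditional rather than from the (non-regex) rule itself, could look roughly like the Python sketch below. The function names, the single-digit placeholder syntax, and the host "example.org" are assumptions for illustration, not lighttpd code.

```python
import re

# Assumption: placeholders are "%" followed by a single digit.
_PLACEHOLDER = re.compile(r"%(\d)")


def parse_target(template):
    """Split a target such as "http://%1/" once, at startup.

    Returns a list whose items are either str (literal text) or int
    (index of a capture from the enclosing conditional's regex).
    """
    parts = []
    pos = 0
    for m in _PLACEHOLDER.finditer(template):
        if m.start() > pos:
            parts.append(template[pos:m.start()])
        parts.append(int(m.group(1)))
        pos = m.end()
    if pos < len(template):
        parts.append(template[pos:])
    return parts


def assemble(parts, captures):
    """Build the redirect URL at request time from the pre-parsed parts."""
    return "".join(captures[p] if isinstance(p, int) else p for p in parts)


# Captures from the conditional: index 0 is the whole match, 1 the first group.
host_match = re.match(r"^(.*)$", "example.org")
parts = parse_target("http://%1/")
url = assemble(parts, host_match.group(0, 1))
```

No iteration over the hash table is involved: the exact-match lookup finds the pre-parsed target, and only that one target is assembled.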
#9

Updated by gstrauss over 4 years ago

  • Target version deleted (1.5.0)

#10

Updated by gstrauss over 2 years ago

  • Priority changed from Normal to Low

For a large number of strings to match, the input is probably generated, as mkoloberdin noted. I'd add that, based on the examples given by mkoloberdin, the strings are exact matches, not prefix matches. As exact matches, the directive would probably be better named url.redirect-map rather than url.redirect-simple. Also, given that the tables will likely be generated, the replacements will likely be generated, too. (If not, then mod_redirect with regexes is probably a better solution.)

mod_redirect operates on the url-path and query-string together, so the combined string would need to match a map entry exactly. Should the redirect-map match only the url-path? Should it omit or add back the original query-string, if present?

#11

Updated by gstrauss over 1 year ago

  • Subject changed from [PATCH] mod_redirect improvement: simple (non-RE) redirection to mod_redirect improvement: simple (non-RE) redirection

#11

Updated by gstrauss 11 months ago

It should be noted that what mkoloberdin is requesting can be implemented with mod_magnet and some custom Lua code. See also the AbsoLUAtion wiki page.

#13

Updated by gstrauss 2 months ago

  • Subject changed from mod_redirect improvement: simple (non-RE) redirection to RFE: mod_redirect exact-match map: simple (non-RE) redirection
  • Description updated (diff)
  • ASK QUESTIONS IN Forums set to No

[Title (subject) renamed to provide some scope.]

Should anyone find this and wish to try extending mod_redirect or creating a new special-purpose module, please post in the lighttpd Development forum.

An exact (non-regex) prefix match can already be done in mod_redirect with a plain regex, though the ability to do an exact prefix match on a long list would be more efficient than evaluating a regex for each item in the list. Using mod_magnet is a reasonable option to achieve this.

For a generated list of mappings read from a file and which are exact matches, a new directive would be useful, and could operate on a sorted map. The map would contain only the url-path, not including the query-string, and the result of the mapping would then append the query-string, if query-string were present in the client request. Using mod_magnet is also a reasonable option to achieve this, though extending mod_redirect with a new directive might still be desirable for large sites managing or transitioning from a history of site redesigns.
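A minimal sketch of that behaviour in Python, assuming a sorted in-memory list searched by bisection: the map is keyed on the url-path only, and the request's query-string (if non-empty) is appended to the result. The paths, targets, and the name `lookup` are hypothetical, and the sketch assumes targets carry no query-string of their own.

```python
from bisect import bisect_left

# Hypothetical generated mappings, kept sorted by url-path.
PATHS = ["/a/old.html", "/b/old.html", "/source_dir/source_file.html"]
TARGETS = [
    "http://destination.com/a",
    "http://destination.com/b",
    "http://destination.com/destination_file.html",
]


def lookup(request_uri):
    """Exact-match the url-path, then re-attach the request's query-string."""
    path, _, query = request_uri.partition("?")
    i = bisect_left(PATHS, path)
    if i == len(PATHS) or PATHS[i] != path:
        return None  # no exact match for this url-path
    target = TARGETS[i]
    if query:
        # Assumption: the mapped target has no query-string of its own.
        target += "?" + query
    return target
```

A hash table would serve equally well for the lookup itself; a sorted list is used here only because the comment above mentions a sorted map.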

If extending mod_redirect, directives which perform exact matches should be checked prior to directives which perform regexes. Also, if there is a match, then that should end redirect evaluations; a request should not continue to be matched against other mappings or regexes after there has been a match.
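The evaluation order described above can be sketched as follows in Python. The rules and the name `resolve` are hypothetical; the sketch only shows exact matches being consulted before regexes, and evaluation stopping at the first match.

```python
import re

# Hypothetical rules: one exact-match entry and two regex rules.
EXACT = {"/old/page.html": "http://destination.com/new-page"}
REGEXES = [
    (re.compile(r"^/old/(.*)$"), r"http://destination.com/archive/\1"),
    (re.compile(r"^/(.*)\.html$"), r"http://destination.com/\1"),
]


def resolve(path):
    # 1. Exact matches are checked first; a hit ends evaluation.
    target = EXACT.get(path)
    if target is not None:
        return target
    # 2. Regexes are tried in order; the first match ends evaluation.
    for pattern, template in REGEXES:
        m = pattern.match(path)
        if m:
            return m.expand(template)
    return None
```

Here "/old/page.html" would also match both regexes, but the exact entry wins and the regexes are never evaluated for it.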
