jan, 2005-07-18 09:05
delaying rechecks


= CML aka Cache Meta Language =

What Is It

CML moves the cache-hit/cache-miss decision for a dynamic website out of the
dynamic application, so that on a cache-hit neither the application nor its
language interpreter has to be started at all.

PHP in particular is known to have a large overhead before the script starts executing.

How To Install

The language used by CML is Lua, which you can find at http://www.lua.org/

To get some background on how to write Lua, check out:

Benefits

The main benefit of CML is its performance.

A very simple benchmark showed:

  • about 1000 req/s for the static 'output.html' which is generated
  • about 600 req/s if index.cml is called (cache-hit)
  • about 50 req/s if index.php is called (cache-miss)

Using CML improves the performance for the tested page by a factor of 12 (600 req/s
instead of 50 req/s), reaching about 60% of the possible maximum set by the static file transfer.
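The factor of 12 follows directly from the numbers above (600 / 50). As a rough illustration of what this means for a mixed workload, the following Python sketch estimates the combined throughput for a given cache-hit ratio. The serial-cost model and the name `effective_rps` are assumptions made for illustration; they are not part of the benchmark.

```python
# Rates measured in the benchmark above (requests per second).
STATIC_RPS = 1000.0   # static output.html
HIT_RPS = 600.0       # index.cml, cache-hit
MISS_RPS = 50.0       # index.php, cache-miss


def effective_rps(hit_ratio):
    """Estimated req/s when a fraction `hit_ratio` of requests are cache-hits.

    Assumption: requests are served one after another, so the average
    time per request is the weighted mean of the hit and miss times.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    seconds_per_request = hit_ratio / HIT_RPS + (1.0 - hit_ratio) / MISS_RPS
    return 1.0 / seconds_per_request


if __name__ == "__main__":
    print("speed-up on a hit:", HIT_RPS / MISS_RPS)
    print("share of static maximum:", HIT_RPS / STATIC_RPS)
    for ratio in (0.0, 0.5, 0.9, 1.0):
        print(ratio, round(effective_rps(ratio), 1))
```

Under this model the misses dominate quickly, so the gain depends heavily on how rarely the page has to be regenerated.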

Usage Patterns

http://www.lighttpd.net/ uses CML to reduce its load (even though that load is minimal).

The layout of the front page depends on a few files:

  • content-1
  • content-6
  • the template /main.tmpl

If any of these files changes, the cached version of the page has to be regenerated too.

{{{
output_contenttype = "text/html"

trigger_handler = "index.php"

-- this file is updated by the trigger
output_include = { "output.html" }

docroot = request["DOCUMENT_ROOT"]
cwd = request["CWD"]

-- the dependencies
files = { cwd .. "content-1", cwd .. "content-6", docroot .. "main.tmpl" }

cached_mtime = file_mtime(cwd .. "output.html")

-- if one of the source files is newer than the generated files
-- call the trigger
for i, v in ipairs(files) do
  if file_mtime(v) > cached_mtime then return 1 end
end

return 0
}}}
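The same decision can be modelled outside of lighttpd. The following Python sketch mirrors the modification-time comparison of the script above so that it can be tried on any machine. The function name `needs_rebuild` and the handling of a missing cache file are assumptions for illustration; the original script does not say what `file_mtime` returns for a missing file.

```python
import os


def needs_rebuild(cached_file, dependencies):
    """Return True (cache-miss) if any dependency is newer than the cached file."""
    try:
        cached_mtime = os.path.getmtime(cached_file)
    except OSError:
        # Assumption: a missing cache file is treated as a cache-miss.
        return True
    # Same test as the loop in the CML script: one newer source file is enough.
    return any(os.path.getmtime(dep) > cached_mtime for dep in dependencies)
```

A return value of `True` corresponds to `return 1` in the CML script (call the trigger), and `False` to `return 0` (serve the cached file).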

Delaying recheck

If you are building a news aggregator, it is useful to delay rebuilding the content for some time, since you can assume that the news does not change with every request. Instead of revalidating on each request, you skip the check until the delay has passed.

{{{
-- same as above

-- check again in 5 minutes (300 seconds)
delay_recheck = 300

if cached_mtime + delay_recheck > os.time() then return 0 end

-- we are behind the delayed recheck, check the cache as usual

for i, v in ipairs(files) do
  if file_mtime(v) > cached_mtime then return 1 end
end

return 0
}}}
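To make the timing behaviour of the delayed recheck easier to follow, here is a minimal Python model of the script above. The function name `should_regenerate` and the injectable `now` parameter are assumptions added so that the logic can be exercised without waiting; they do not exist in CML.

```python
import time


def should_regenerate(cached_mtime, dependency_mtimes, delay_recheck=300, now=None):
    """Return True if the trigger should run, False if the cache can be served."""
    if now is None:
        now = time.time()
    # Still inside the delay window: serve the cached file without checking.
    if cached_mtime + delay_recheck > now:
        return False
    # Past the delay: fall back to the usual modification-time comparison.
    return any(mtime > cached_mtime for mtime in dependency_mtimes)
```

Note that the window is measured from the modification time of the cached file. Once it has expired, every request performs the full check again until the file is regenerated.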

To tell the proxies in between not to check again for 5 minutes after they have received this content, use the setenv module to add Cache-Control or Expires headers.
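As an illustration of what such headers could contain, the following Python sketch computes matching `Cache-Control` and `Expires` values for a given delay. The helper name `cache_headers` is hypothetical, and how the values reach the response (for example through the setenv module) is outside the scope of this sketch.

```python
from email.utils import formatdate


def cache_headers(delay_recheck, now):
    """Build header values that let proxies keep the page for `delay_recheck` seconds.

    `now` is a Unix timestamp; `Expires` is formatted as an HTTP date in GMT.
    """
    return {
        "Cache-Control": "max-age=%d" % delay_recheck,
        "Expires": formatdate(now + delay_recheck, usegmt=True),
    }
```

Keeping the header lifetime equal to `delay_recheck` means proxies and the CML script agree on how long a generated page is considered fresh.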

CML and Databases

CML doesn't provide direct access to databases like MySQL or PostgreSQL. And to pre-empt the feature request: it never will.

There is a better and faster way to interface CML with databases: memcached.

All you have to do is keep the information needed to decide whether a page has to be regenerated in a memcached store. Let's say that whenever you store an entry in the database you associate a Version-ID with it. The Version-ID is incremented as soon as you make a change to the resource.

This Version-ID is now stored in the database and in memcached at the same time. CML can fetch the Version-ID and check whether content has already been generated for it; if not, the trigger generates it.

{{{
output_contenttype = "text/html"

key = md5(request["PATH_INFO"])
version = memcache_get_long(key)
cwd = request["CWD"]

trigger_handler = "generate.php"

if version >= 0 then
  output_include = { cwd .. key .. "-" .. version .. ".html" }
  return 0
else
  return 1
end
}}}
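The lookup performed by this script can be sketched in Python with a plain dictionary standing in for memcached. The function name `cml_decision` and the use of `-1` for a missing key are assumptions; the latter is inferred from the `version >= 0` test in the script above, not from documented behaviour of `memcache_get_long`.

```python
import hashlib


def cml_decision(path_info, cwd, memcache):
    """Return (status, filename): (0, file to include) on a hit, (1, None) on a miss.

    `memcache` is a dict standing in for the memcached server.
    """
    key = hashlib.md5(path_info.encode("utf-8")).hexdigest()
    # Assumption: a missing key yields a negative version.
    version = memcache.get(key, -1)
    if version >= 0:
        return 0, cwd + key + "-" + str(version) + ".html"
    return 1, None
```

Because the version is part of the file name, a bumped Version-ID automatically points the script at a different cached file.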

generate.php will have to:

  • get PATH_INFO
  • fetch information from database about it
  • generate content for the page and write it to disk
  • deliver it to the client
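The steps above can be sketched as follows. This is a Python stand-in for what generate.php has to do, not the actual script: the name `generate_page`, the dictionary used as the database, and the HTML wrapper are all assumptions for illustration. The file name follows the pattern expected by the CML script (md5 of PATH_INFO, a dash, the version, `.html`).

```python
import hashlib
import os


def generate_page(path_info, cwd, database):
    """Render the page for `path_info`, write it to disk, and return the HTML.

    `database` is a dict mapping PATH_INFO to a (version, body) tuple.
    """
    # Fetch the information for this path from the (stand-in) database.
    version, body = database[path_info]
    html = "<html><body>%s</body></html>" % body

    # Write the file under the name the CML script will look for.
    key = hashlib.md5(path_info.encode("utf-8")).hexdigest()
    filename = os.path.join(cwd, "%s-%d.html" % (key, version))
    with open(filename, "w", encoding="utf-8") as handle:
        handle.write(html)

    # The caller delivers the returned HTML to the client.
    return html
```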

To interface the database with memcached you can use a UDF:

With MySQL and the UDF installed, an update looks like this:

{{{
BEGIN;
-- increment the version and remember the new value in @v
UPDATE content SET version = (@v := version + 1) WHERE id = <id>;
SELECT memcache_set("127.0.0.1:11211", <id>, @v);
COMMIT;
}}}
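The pattern of bumping the version and mirroring it into the cache can be tried without MySQL or the UDF. The sketch below uses an in-memory SQLite database and a dictionary as the cache. The function name `bump_version` and the table layout are assumptions, and unlike the SQL above the cache is updated after the commit rather than inside the transaction.

```python
import sqlite3


def bump_version(conn, cache, content_id):
    """Increment the version of one row and mirror the new value into `cache`."""
    with conn:  # commits on success, rolls back on an exception
        cursor = conn.execute(
            "UPDATE content SET version = version + 1 WHERE id = ?",
            (content_id,),
        )
        if cursor.rowcount != 1:
            raise KeyError(content_id)
        row = conn.execute(
            "SELECT version FROM content WHERE id = ?", (content_id,)
        ).fetchone()
    # Only publish the version once the database change is committed.
    cache[content_id] = row[0]
    return row[0]
```

Publishing after the commit avoids advertising a version that is later rolled back, at the cost of a short window in which the cache still holds the old version.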

To check which version is currently used by the cache:

{{{
SELECT memcache_get("127.0.0.1:11211", <id>);
}}}

Updated by jan almost 19 years ago · 6 revisions