Revision 7 (conny, 2006-02-07 10:38) → Revision 8/14 (moo, 2006-05-21 07:54)

= CML aka Cache Meta Language = 

 == What Is It == 

 CML moves the decision about a cache hit or cache miss for a dynamic website 
 out of the dynamic application, so that on a cache hit the application or its 
 language interpreter does not need to be started at all. 

 PHP in particular is known to have a large start-up overhead before a script begins to execute. 

 == How To Install == 

 CML scripts are written in Lua, which you can find at http://www.lua.org/ 

 For background on writing Lua code, check out: 

  * http://lua-users.org/wiki/LuaAddons 
  * http://luaforge.net/ 

 == Benefits == 

 The main benefit of CML is its performance.  

 A very simple benchmark showed: 

  * about 1000 req/s for the static 'output.html', which is the output generated by the PHP script 
  * about 600 req/s if index.cml is called (cache hit) 
  * about 50 req/s if index.php is called (cache miss) 

 On a cache hit, CML improves the throughput of the tested page by a factor of 12 (600 vs. 50 req/s), 
 reaching about 60% of the maximum possible with a static file transfer (1000 req/s). 

 == Usage Patterns == 

 http://www.lighttpd.net/ uses CML to reduce the server load (even though the load is minimal). 

 The layout of the front page depends on a few files: 

  * content-1 
  * content-6 
  * the template /main.tmpl 

 If any of these files is modified, the cached version of the page must be regenerated. 

 {{{ 
 output_contenttype = "text/html" 

 trigger_handler = "index.php" 

 -- this file is updated by the trigger 
 output_include = { "output.html" } 

 docroot = request["DOCUMENT_ROOT"] 
 cwd = request["CWD"] 

 -- the dependencies 
 files = { cwd .. "content-1", cwd .. "content-6", docroot .. "main.tmpl" } 

 cached_mtime = file_mtime(cwd .. "output.html") 

 -- no generated output yet (no mtime): call the trigger 
 if cached_mtime == nil then return 1 end 

 -- if one of the source files is newer than the generated file, 
 -- call the trigger 
 for i,v in ipairs(files) do 
   if file_mtime(v) > cached_mtime then return 1 end 
 end 

 return 0 
 }}} 
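The hit/miss decision above can be tried outside lighttpd by pulling it into a plain Lua function. `file_mtime` only exists inside mod_cml, so this sketch takes a lookup function as a parameter and feeds it a table of made-up modification times; `check_cache`, `make_mtime` and the timestamps are hypothetical illustrations, not part of CML.

```lua
-- Hypothetical stand-in for mod_cml's file_mtime(): looks paths up in a
-- table and returns nil for unknown files.
local function make_mtime(mtimes)
  return function(path) return mtimes[path] end
end

-- Returns 1 (cache miss: run the trigger) if the cached output is missing
-- or any dependency is newer than it, otherwise 0 (cache hit).
-- Dependencies without an mtime are skipped in this sketch.
local function check_cache(file_mtime, output, deps)
  local cached_mtime = file_mtime(output)
  if cached_mtime == nil then return 1 end
  for _, v in ipairs(deps) do
    local m = file_mtime(v)
    if m ~= nil and m > cached_mtime then return 1 end
  end
  return 0
end

-- Made-up timestamps: every dependency is older than output.html.
local mtime = make_mtime({
  ["output.html"] = 1000,
  ["content-1"]   = 900,
  ["content-6"]   = 950,
  ["main.tmpl"]   = 800,
})
local deps = { "content-1", "content-6", "main.tmpl" }
print(check_cache(mtime, "output.html", deps)) -- prints 0 (cache hit)
```

Skipping dependencies that have no mtime is a judgment call; treating them as a miss instead would force a rebuild whenever a source file disappears.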

 == Delaying recheck == 

 If you are building a news aggregator, it is useful to delay rebuilding the cached content for a period of time, since you can assume that the news does not change with every request. Instead of revalidating on each request, you delay the validation check. 

 {{{ 
 -- same as above 

 -- check again in 5 minutes (300 seconds) 
 delay_recheck = 300 

 -- the cached output is younger than delay_recheck: serve it without checking 
 if cached_mtime + delay_recheck > os.time() then return 0 end 

 -- the delay has expired, check the dependencies as usual 

 for i,v in ipairs(files) do 
   if file_mtime(v) > cached_mtime then return 1 end 
 end 

 return 0 
 }}} 
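The delayed recheck can also be sketched as a plain Lua function so the timing logic is easy to try outside lighttpd. In this hypothetical `check_with_delay`, the current time is passed in as `now` instead of calling `os.time()`, and the dependency modification times are passed as a list of numbers; all names and values are illustrative.

```lua
-- Hypothetical helper: 0 = serve the cache, 1 = run the trigger.
-- While the cached output is younger than `delay` seconds the
-- dependencies are not checked at all.
local function check_with_delay(cached_mtime, dep_mtimes, now, delay)
  if cached_mtime == nil then return 1 end
  if cached_mtime + delay > now then return 0 end
  -- grace period over: compare against every dependency as usual
  for _, m in ipairs(dep_mtimes) do
    if m > cached_mtime then return 1 end
  end
  return 0
end

-- A source changed at t=1200, but at t=1100 the 300 s delay still applies.
print(check_with_delay(1000, { 1200 }, 1100, 300)) -- prints 0
-- At t=1400 the delay has expired and the newer source forces a rebuild.
print(check_with_delay(1000, { 1200 }, 1400, 300)) -- prints 1
```

Inside a CML script `now` would be `os.time()`. The trade-off is that a page can be served for up to `delay` seconds after one of its sources has changed.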

 To tell proxies in between not to revalidate for 5 minutes after they have received this content, use the setenv module to add Cache-Control or Expires headers. 


 == CML and Databases == 

 CML does not provide direct access to databases like MySQL or PostgreSQL, and probably never will. 

 There is a better and faster way to connect CML to databases: memcached. 

 All you have to do is keep the information needed to decide whether a page must be regenerated in memcached itself. Say that whenever you store an entry in the database, you associate a Version-ID with it, and increment the Version-ID as soon as the resource changes. 

 The Version-ID is stored in the database and in memcached at the same time. CML can then fetch the Version-ID from memcached, check whether content has already been generated for it, and trigger the generation if necessary. 

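 {{{ 
 output_contenttype = "text/html" 

 content_key = md5(request["PATH_INFO"]) 
 version = memcache_get_long(content_key) 
 cwd = request["CWD"] 

 trigger_handler = "generate.php" 

 -- a known version means the page was generated for it: include that file 
 if version ~= nil and version >= 0 then 
   output_include = { cwd .. content_key .. "-" .. version .. ".html" } 
   return 0 
 else 
   -- no usable version in memcached: let generate.php build the page 
   return 1 
 end 
 }}} 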
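To see how versioning drives the hit/miss decision without a running memcached, here is a hypothetical simulation in plain Lua. Two tables stand in for the database and for memcached, `bump_version` plays the role of the database update plus the memcache_set() UDF call, and `lookup` mirrors the CML script above; the key "pagekey" is a placeholder for md5(request["PATH_INFO"]).

```lua
-- Hypothetical stand-ins: the database and memcached both map a
-- page key to its current Version-ID.
local db = {}
local fake_memcache = {}

-- Database side: increment the version and copy it to memcached,
-- like the memcache_set() UDF call does.
local function bump_version(key)
  db[key] = (db[key] or 0) + 1
  fake_memcache[key] = db[key]
end

-- CML side: a known version maps to a pre-generated file (0, path);
-- an unknown key is a cache miss (1, nil).
local function lookup(key, cwd)
  local version = fake_memcache[key]
  if version ~= nil and version >= 0 then
    return 0, cwd .. key .. "-" .. version .. ".html"
  end
  return 1, nil
end

local key = "pagekey" -- placeholder for md5(request["PATH_INFO"])
print(lookup(key, "/var/www/"))  -- cache miss: 1, nil
bump_version(key)
print(lookup(key, "/var/www/"))  -- cache hit: 0, /var/www/pagekey-1.html
```

Because every change produces a new file name, an outdated page is never included once memcached holds the new version; the old files are simply no longer referenced and can be cleaned up separately.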

 generate.php will have to: 

  * get PATH_INFO 
  * fetch information from database about it 
  * generate the content for the page and write it to disk under the name the CML script includes (content_key-version.html in the same directory) 
  * deliver it to the client 
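generate.php itself is PHP, but its "write it to disk" step can be sketched in Lua to match the rest of this page. The hypothetical `write_page` below builds the same file name the CML script includes and writes to a temporary file first, then renames it, so a half-written page is never included. This assumes a POSIX system, where rename within one filesystem replaces the target atomically.

```lua
-- Hypothetical sketch of generate.php's "write it to disk" step.
-- Builds cwd .. key .. "-" .. version .. ".html", writes to a .tmp file,
-- then renames it into place so readers never see partial content.
local function write_page(cwd, key, version, html)
  local path = cwd .. key .. "-" .. version .. ".html"
  local tmp = path .. ".tmp"
  local f = assert(io.open(tmp, "wb"))
  f:write(html)
  f:close()
  assert(os.rename(tmp, path))
  return path
end

-- e.g. write_page("/var/www/", "pagekey", 1, "<html>...</html>")
```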

 To connect the database to memcached, you can use a UDF (user-defined function): 

  * for [http://www.mysql.com/ MySQL], you can get the MySQL UDF from [http://jan.kneschke.de/projects/mysql/udf/ Jan's MySQL page] 
  * for [http://www.postgresql.org/ PostgreSQL], Sean Chittenden has written [http://people.freebsd.org/~seanc/pgmemcache/ pgmemcache] 

 With MySQL and the UDF, incrementing the version and updating memcached looks like this: 
 {{{ 
 #!sql 
 BEGIN; 
 UPDATE content SET version = (@v := version + 1) WHERE id = <id>; 
 SELECT memcache_set("127.0.0.1:11211", <id>, @v); 
 COMMIT; 
 }}} 

 To check which version is currently used by the cache: 
 {{{ 
 #!sql 
 SELECT memcache_get("127.0.0.1:11211", <id>); 
 }}}