Revision 5 (jan, 2005-07-18 08:58) → Revision 6/14 (jan, 2005-07-18 09:05)

= CML aka Cache Meta Language = 

 == What Is It == 

 CML moves the decision whether a request to a dynamic website is a cache-hit or a cache-miss
 out of the dynamic application. On a cache-hit, neither the application nor the interpreter
 of its language has to be started at all.

 PHP in particular is known to have a large start-up overhead before the script itself begins to execute.
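
 To make the split concrete, here is a toy model in plain Lua of what happens for a request to a .cml file. It is a sketch under assumptions, not lighttpd's code: the CML script is modelled as a function that fills in output_include and trigger_handler and returns 0 (cache-hit) or 1 (cache-miss), as the examples on this page do. The helpers handle_request, read_file and run_handler are hypothetical, and sending the included files one after another is an assumption of the model.

 {{{
 -- Toy model of the CML request flow (hypothetical, not lighttpd's code).
 -- cml_script fills in output_include / trigger_handler and returns
 -- 0 for a cache-hit or 1 for a cache-miss.
 local function handle_request(cml_script, read_file, run_handler)
   local vars = {}
   local status = cml_script(vars)
   if status == 0 then
     -- cache-hit: send the cached files, the application is never started
     local parts = {}
     for _, name in ipairs(vars.output_include) do
       parts[#parts + 1] = read_file(name)
     end
     return table.concat(parts)
   end
   -- cache-miss: only now is the dynamic application started
   return run_handler(vars.trigger_handler)
 end

 -- usage with stubbed files and a counter for backend starts
 local files = { ["output.html"] = "<html>cached</html>" }
 local backend_starts = 0
 local function read_file(name) return files[name] end
 local function run_handler(name)
   backend_starts = backend_starts + 1
   return "generated by " .. name
 end
 local function fresh(v)
   v.output_include = { "output.html" }
   v.trigger_handler = "index.php"
   return 0
 end
 local function stale(v)
   v.trigger_handler = "index.php"
   return 1
 end

 print(handle_request(fresh, read_file, run_handler))  -- <html>cached</html>
 print(handle_request(stale, read_file, run_handler))  -- generated by index.php
 print(backend_starts)                                 -- 1
 }}}

 Only the cache-miss path pays the start-up cost of the backend, which is exactly the overhead CML is meant to avoid.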

 == How To Install == 

 CML scripts are written in Lua, which you can find at http://www.lua.org/ 

 To get some background on how to write Lua, check out: 

  * http://lua-users.org/wiki/LuaAddons 
  * http://luaforge.net/ 

 == Benefits == 

 The main benefit of CML is its performance.  

 A very simple benchmark showed: 

  * about 1000 req/s for the generated static 'output.html' 
  * about    600 req/s if index.cml is called (cache-hit) 
  * about     50 req/s if index.php is called (cache-miss) 

 Using CML improves the performance of the tested page by a factor of 12 (600 vs. 50 req/s) 
 and gets reasonably close to the maximum of a static file transfer (about 60% of it). 
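
 The two ratios behind that statement, worked out in Lua with the numbers from the benchmark list (nothing beyond those three figures is assumed):

 {{{
 -- requests per second from the benchmark above
 local static_rps = 1000  -- static output.html
 local hit_rps    = 600   -- index.cml, cache-hit
 local miss_rps   = 50    -- index.php, cache-miss

 local speedup  = hit_rps / miss_rps    -- 600 / 50   = 12
 local fraction = hit_rps / static_rps  -- 600 / 1000 = 0.6

 print(string.format("cache-hit vs. PHP: %.0fx", speedup))            -- 12x
 print(string.format("cache-hit vs. static: %.0f%%", fraction * 100)) -- 60%
 }}}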

 == Usage Patterns == 

 http://www.lighttpd.net/ uses CML to reduce its load (even though that load is minimal). 

 The layout of the front page depends on a few files: 

  * content-1 
  * content-6 
  * the template /main.tmpl 

 If one of these files changes, the cached version of the page has to be regenerated too. 

 {{{ 
 output_contenttype = "text/html" 

 -- script that is run on a cache-miss to regenerate output.html
 trigger_handler = "index.php" 

 -- the file sent on a cache-hit; it is regenerated by the trigger
 output_include = { "output.html" } 

 docroot = request["DOCUMENT_ROOT"] 
 cwd = request["CWD"] 

 -- the dependencies 
 files = { cwd .. "content-1", cwd .. "content-6", docroot .. "main.tmpl" } 

 cached_mtime = file_mtime(cwd .. "output.html") 

 -- if one of the source files is newer than the generated file,
 -- call the trigger (return 1 = cache-miss)
 for i,v in ipairs(files) do 
   if file_mtime(v) > cached_mtime then return 1 end 
 end 

 -- everything is up to date (return 0 = cache-hit)
 return 0 
 }}} 
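
 To try a script like this outside the web server, the CML builtins can be stubbed. The following harness is a sketch under assumptions: it needs Lua 5.2 or newer (for load() with an environment table), run_cml is a hypothetical helper, and the file_mtime stub looks timestamps up in a table instead of asking the file system. It also assumes that CWD and DOCUMENT_ROOT end in a slash, as the string concatenation in the script implies.

 {{{
 -- Hypothetical test harness; the stubs only imitate the CML builtins
 -- used on this page, they are not lighttpd's implementation.
 local script = [==[
   output_contenttype = "text/html"
   trigger_handler = "index.php"
   output_include = { "output.html" }
   docroot = request["DOCUMENT_ROOT"]
   cwd = request["CWD"]
   files = { cwd .. "content-1", cwd .. "content-6", docroot .. "main.tmpl" }
   cached_mtime = file_mtime(cwd .. "output.html")
   for i, v in ipairs(files) do
     if file_mtime(v) > cached_mtime then return 1 end
   end
   return 0
 ]==]

 local function run_cml(source, request, mtimes)
   local env = {
     request = request,
     ipairs = ipairs,
     -- stub: look the mtime up in a table instead of calling stat()
     file_mtime = function(path)
       local t = mtimes[path]
       assert(t, "no mtime for " .. path)
       return t
     end,
   }
   local chunk = assert(load(source, "index.cml", "t", env))
   return chunk()
 end

 local request = { DOCUMENT_ROOT = "/www/", CWD = "/www/" }
 local mtimes = {
   ["/www/output.html"] = 200,
   ["/www/content-1"]   = 100,
   ["/www/content-6"]   = 150,
   ["/www/main.tmpl"]   = 50,
 }
 print(run_cml(script, request, mtimes))  -- 0: all sources older, cache-hit
 mtimes["/www/content-6"] = 250
 print(run_cml(script, request, mtimes))  -- 1: content-6 changed, cache-miss
 }}}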

 == Delaying recheck == 

 A simple way to reduce the load, especially if you can't add fine-grained caching to your application, is to keep serving the cached output for a longer time even if the backend already has new changes to display. If you are building a news aggregator, for example, you can assume that the news does not change with every request, so it is useful to delay the rebuild of the content for a while. Instead of revalidating on each request, you just delay the check: 

 {{{ 
 -- same as above 

 -- check again in 5 minutes (300 seconds)
 delay_recheck = 300 

 -- the cached copy is younger than the delay: serve it without checking
 if cached_mtime + delay_recheck > os.time() then return 0 end 

 -- the delay has passed: check the dependencies as in the first example
 for i,v in ipairs(files) do 
   if file_mtime(v) > cached_mtime then return 1 end 
 end 

 return 0 
 }}} 
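
 The timing of the delayed recheck is easier to see as a plain function of the current time. In this sketch delayed_check is a hypothetical name and times are plain numbers of seconds: the cached copy is served while it is younger than delay_recheck, and after that the dependencies are compared against cached_mtime as in the first example.

 {{{
 -- Hypothetical helper: 1 = cache-miss (rebuild), 0 = cache-hit.
 local function delayed_check(now, cached_mtime, delay_recheck, source_mtimes)
   -- cached copy younger than the delay: serve it without looking further
   if cached_mtime + delay_recheck > now then return 0 end
   -- delay has passed: compare the dependencies as usual
   for _, mtime in ipairs(source_mtimes) do
     if mtime > cached_mtime then return 1 end
   end
   return 0
 end

 local delay   = 300           -- 5 minutes
 local cached  = 1000          -- output.html written at t = 1000
 local sources = { 900, 1100 } -- one dependency changed at t = 1100

 print(delayed_check(1200, cached, delay, sources))  -- 0: stale but still served
 print(delayed_check(1400, cached, delay, sources))  -- 1: delay over, rebuild
 }}}

 The first call shows the trade-off: a change made at t = 1100 only becomes visible once the delay has run out.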

 To tell the proxies in between not to check again within 5 minutes after they have received this content, use the setenv module to add Cache-Control or Expires headers. 


 == CML and Databases == 

 CML doesn't provide direct access to databases like MySQL or PostgreSQL, and to save you the feature request: it never will. 

 There is a better and faster way to interface CML with databases: memcached. 

 All you have to do is keep the information needed to decide whether a page has to be regenerated in memcached. Let's say that whenever you store an entry in the database, you associate a Version-ID with it. The Version-ID is incremented as soon as you make a change to the resource. 

 This Version-ID is stored in the database and in memcached at the same time. CML can now fetch the Version-ID, check whether content has been generated for it, and trigger the generation if necessary. 

 {{{ 
 output_contenttype = "text/html" 

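 -- the memcached key is derived from the requested path
 key = md5(request["PATH_INFO"]) 
 version = memcache_get_long(key) 
 cwd = request["CWD"] 

 -- run on a cache-miss to generate the page
 trigger_handler = "generate.php" 

 if version >= 0 then 
   -- serve the file generated for this Version-ID (cache-hit)
   output_include = { cwd .. key .. "-" .. version .. ".html" } 
   return 0 
 else 
   -- no valid Version-ID: let generate.php build the page (cache-miss)
   return 1 
 end 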
 }}} 
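
 The same decision as a plain Lua function, with toy stand-ins for the builtins md5() and memcache_get_long(). The stand-ins are assumptions for this sketch: the fake md5 is not a hash at all, and returning -1 for an unknown key is an assumed convention chosen so that the version >= 0 test works. lookup is a hypothetical name.

 {{{
 -- Toy stand-ins (assumptions): a table instead of memcached and a
 -- path-mangling function instead of a real md5 hash.
 local fake_memcache = {}
 local function md5(s) return (s:gsub("%W", "_")) end  -- NOT a hash
 local function memcache_get_long(key)
   return fake_memcache[key] or -1  -- assumed "unknown key" value
 end

 -- Returns 0 (cache-hit) plus the file to include, or 1 (cache-miss).
 local function lookup(path_info, cwd)
   local key = md5(path_info)
   local version = memcache_get_long(key)
   if version >= 0 then
     return 0, cwd .. key .. "-" .. version .. ".html"
   end
   return 1
 end

 print(lookup("/news/42", "/www/"))  -- 1 (no Version-ID yet)
 fake_memcache[md5("/news/42")] = 3
 print(lookup("/news/42", "/www/"))  -- 0   /www/_news_42-3.html
 }}}

 Because the Version-ID is part of the file name, a new version never picks up an old file; the file for it has to be generated first.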

 generate.php will have to: 

  * get PATH_INFO 
  * fetch information from database about it 
  * generate the content for the page and write it to disk under the name the CML script includes (key-version.html in CWD) 
  * deliver it to the client 

 To connect the database with memcached you can use a UDF (user-defined function): 

  * for [http://www.mysql.com/ MySQL] you can get the MySQL UDF from [http://jan.kneschke.de/projects/mysql/udf/ Jan's MySQL page] 
  * for [http://www.postgresql.org/ PostgreSQL] Sean Chittenden has written [http://people.freebsd.org/~seanc/pgmemcache/ pgmemcache] 

 With MySQL and the UDF you just do: 
 {{{ 
 BEGIN; 
 UPDATE content SET version = (@v := version + 1) WHERE id = <id>; 
 SELECT memcache_set("127.0.0.1:11211", <id>, @v); 
 COMMIT; 
 }}} 

 To check which version is currently used by the cache: 
 {{{ 
 SELECT memcache_get("127.0.0.1:11211", <id>); 
 }}}
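
 Why bumping the Version-ID is enough to invalidate the cache can be shown with a toy model in Lua. Everything here is hypothetical: the tables db, cache and disk stand in for MySQL, memcached and the generated files, update_content and cached_file are illustrative names, and for simplicity the row id doubles as the cache key (the CML example above uses md5(PATH_INFO) instead).

 {{{
 -- Toy model (hypothetical): tables instead of MySQL, memcached and disk.
 local db    = { [42] = { body = "first draft", version = 1 } }
 local cache = { [42] = 1 }
 local disk  = { ["42-1.html"] = "<p>first draft</p>" }

 -- like the SQL transaction: bump the version and publish it
 local function update_content(id, body)
   local row = db[id]
   row.body = body
   row.version = row.version + 1
   cache[id] = row.version
 end

 -- the file name the cache side derives from the published version
 local function cached_file(id)
   local version = cache[id]
   if version == nil then return nil end
   return id .. "-" .. version .. ".html"
 end

 print(cached_file(42), disk[cached_file(42)] ~= nil)  -- 42-1.html  true
 update_content(42, "second draft")
 print(cached_file(42), disk[cached_file(42)] ~= nil)  -- 42-2.html  false
 }}}

 After the update the cache side asks for 42-2.html, which does not exist yet, so the page for the new version still has to be generated, and the old 42-1.html is simply never referenced again.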