CacheMetaLanguage » History » Revision 7

Revision 6 (jan, 2005-07-18 09:05) → Revision 7/14 (conny, 2006-02-07 10:38)

= CML aka Cache Meta Language = 

 == What Is It == 

 CML moves the cache-hit/cache-miss decision for a dynamic website 
 out of the dynamic application, so that on a cache-hit neither the application nor its 
 language runtime has to be started at all. 

 PHP in particular is known to have a large startup overhead before the script begins executing. 

 == How To Install == 

 The language used by CML is Lua, which you can find at http://www.lua.org/ 

 For some background on writing Lua code, check out: 

  * http://lua-users.org/wiki/LuaAddons 
  * http://luaforge.net/ 

 == Benefits == 

 The main benefit of CML is its performance.  

 A very simple benchmark showed: 

  * about 1000 req/s for the static 'output.html' which is generated output from the PHP script 
  * about    600 req/s if index.cml is called (cache-hit) 
  * about     50 req/s if index.php is called (cache-miss) 

 Using CML improves the performance for the tested page by a factor of 12 (600 req/s instead of 50 req/s),  
 reaching 60% of the possible maximum set by the static file transfer. 
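
The factor quoted above follows directly from the benchmark figures. A minimal sketch of the arithmetic, using only the three numbers from the list (the variable names are just for illustration):

```lua
-- Throughput figures from the benchmark above (requests per second).
local static_rps = 1000  -- static output.html
local hit_rps    = 600   -- index.cml, cache-hit
local miss_rps   = 50    -- index.php, cache-miss

-- Speed-up of a cache-hit over a cache-miss.
local speedup = hit_rps / miss_rps

-- Share of the static-file maximum that a cache-hit reaches, in percent.
local share_percent = hit_rps * 100 / static_rps

print(string.format("speed-up: %.0fx", speedup))                       -- speed-up: 12x
print(string.format("share of static maximum: %.0f%%", share_percent)) -- share of static maximum: 60%
```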

 == Usage Patterns == 

 http://www.lighttpd.net/ uses CML to reduce its load (even though the load is minimal). 

 The layout of the front page depends on a few files: 

  * content-1 
  * content-6 
  * the template /main.tmpl 

 If any one of these files is modified, the cached version of the page must change as well. 

 {{{ 
 output_contenttype = "text/html" 

 trigger_handler = "index.php" 

 -- this file is updated by the trigger  
 output_include = { "output.html" } 

 docroot = request["DOCUMENT_ROOT"] 
 cwd = request["CWD"] 

 -- the dependencies 
 files = { cwd .. "content-1", cwd .. "content-6", docroot .. "main.tmpl" } 

 cached_mtime = file_mtime(cwd .. "output.html") 

 -- if one of the source files is newer than the generated files 
 -- call the trigger 
 for i,v in ipairs(files) do 
   if file_mtime(v) > cached_mtime then return 1 end 
 end 

 return 0 
 }}} 
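
The decision logic above can be tried outside of lighttpd by replacing `file_mtime` with a small stand-in. This is only a sketch under stated assumptions: `needs_rebuild`, the `mtimes` table and its timestamps are made up for illustration, and `file_mtime` is assumed to return a number of seconds. In a real CML script `file_mtime` is provided by the module.

```lua
-- Hypothetical modification times (seconds), standing in for the file system.
local mtimes = {
  ["content-1"]   = 100,
  ["content-6"]   = 100,
  ["main.tmpl"]   = 100,
  ["output.html"] = 200,
}

-- Stand-in for CML's file_mtime(); assumed to return seconds as a number.
local function file_mtime(path)
  return mtimes[path]
end

-- Returns 1 (cache-miss, call the trigger) if any dependency is newer than
-- the cached file, otherwise 0 (cache-hit), mirroring the script above.
local function needs_rebuild(files, cached)
  local cached_mtime = file_mtime(cached)
  for _, v in ipairs(files) do
    if file_mtime(v) > cached_mtime then return 1 end
  end
  return 0
end

local files = { "content-1", "content-6", "main.tmpl" }

print(needs_rebuild(files, "output.html"))  -- 0: every source is older than the cache

mtimes["content-6"] = 300                   -- simulate an edit to one source file
print(needs_rebuild(files, "output.html"))  -- 1: the cache is stale
```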

 == Delaying recheck == 

 If you are building a news aggregator, it is useful to be able to delay rebuilding the cached content for a period of time, since you can assume that the news does not change with each request. So instead of revalidating on each request, you just delay the validation check. 

 {{{ 
 -- same as above 

 -- check again in 5 minutes (300 seconds) 
 delay_recheck = 300 

 if cached_mtime + delay_recheck > os.time() then return 0 end 

 -- the delay has expired, check the cache as usual 

 for i,v in ipairs(files) do 
   if file_mtime(v) > cached_mtime then return 1 end 
 end 

 return 0 
 }}} 
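
The delay test is a single comparison, so its boundary behaviour is easy to check in isolation. The following sketch assumes a 5 minute delay expressed in seconds. `within_delay` and the sample timestamps are hypothetical, and the current time is passed in as a parameter instead of calling `os.time()` so that the result is reproducible.

```lua
local DELAY_RECHECK = 300  -- 5 minutes, in seconds

-- Returns true while the cached file is still inside the delay window,
-- i.e. while the dependency check can be skipped and the cache served as is.
local function within_delay(cached_mtime, now)
  return cached_mtime + DELAY_RECHECK > now
end

local cached_mtime = 1000  -- made-up timestamp of the cached file

print(within_delay(cached_mtime, 1000 + 299))  -- true:  still inside the window
print(within_delay(cached_mtime, 1000 + 300))  -- false: window expired, recheck
```

In a CML script, `now` would be `os.time()` and `cached_mtime` would be the value returned by `file_mtime` for the cached file.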

 To tell the proxies in between not to check again within 5 minutes of receiving this content, use the setenv module to add Cache-Control or Expires headers. 


 == CML and Databases == 

 CML does not provide direct access to databases like MySQL or PostgreSQL, and probably never will. 

 There is a better and faster way to interface CML with databases: memcached. 

 All you have to do is keep the information needed to decide whether a page has to be regenerated in memcached itself. Let's say that whenever you store an entry in the database, you associate a Version-ID with it. The Version-ID is incremented as soon as you make a change to the resource. 

 This Version-ID is now stored in the database and in memcached at the same time. CML can now fetch the Version-ID, check whether content has already been generated for it, and generate it if necessary. 

 {{{ 
 output_contenttype = "text/html" 

 content_key = md5(request["PATH_INFO"]) 
 version = memcache_get_long(content_key) 
 cwd = request["CWD"] 

 trigger_handler = "generate.php" 

 if version >= 0 then 
   output_include = { cwd .. content_key .. "-" .. version .. ".html" } 
   return 0 
 else 
   return 1 
 end 
 }}} 
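
To see how the Version-ID ends up in the name of the cached file, the lookup can be run with stand-ins for the CML helpers. Everything here except the file naming scheme is an assumption made for illustration: `store` plays the role of memcached, the fake `md5` only strips non-alphanumeric characters, and `memcache_get_long` is assumed to return a negative number for a missing key, which is what the `version >= 0` test in the script implies.

```lua
-- Hypothetical stand-in for memcached: key -> Version-ID.
local store = { ["k-news"] = 3 }

-- Stand-in for CML's memcache_get_long(); ASSUMPTION: a missing key
-- yields a negative value.
local function memcache_get_long(key)
  return store[key] or -1
end

-- Stand-in for CML's md5(); a real script would hash the path instead.
local function md5(s)
  return "k-" .. (s:gsub("[^%w]", ""))
end

-- Returns the cached file to serve, or nil on a cache-miss.
local function cached_file(cwd, path_info)
  local content_key = md5(path_info)
  local version = memcache_get_long(content_key)
  if version >= 0 then
    return cwd .. content_key .. "-" .. version .. ".html"
  end
  return nil
end

print(cached_file("/var/www/", "/news"))     -- /var/www/k-news-3.html
print(cached_file("/var/www/", "/unknown"))  -- nil
```

Because the version is part of the file name, bumping the Version-ID makes the script point at a new file. Note that neither this sketch nor the script above checks whether that file already exists on disk.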

 generate.php will have to: 

  * get PATH_INFO 
  * fetch information from database about it 
  * generate content for the page and write it to disk 
  * deliver it to the client 

 To interface the database with memcached you can use a UDF (user-defined function): 

  * for [http://www.mysql.com/ MySQL] you can get the MySQL UDF at [http://jan.kneschke.de/projects/mysql/udf/ Jan's MySQL page] 
  * for [http://www.postgresql.org/ PostgreSQL] Sean Chittenden has written [http://people.freebsd.org/~seanc/pgmemcache/ pgmemcache] 

 With MySQL and the UDF you just do: 
 {{{ 
 BEGIN; 
 UPDATE content SET version = (@v := version + 1) WHERE id = <id>; 
 SELECT memcache_set("127.0.0.1:11211", <id>, @v); 
 COMMIT; 
 }}} 

 To check which version is currently used by the cache: 
 {{{ 
 SELECT memcache_get("127.0.0.1:11211", <id>); 
 }}}