Revision 11 (davojan, 2006-06-27 08:51) → Revision 12/14 (davojan, 2012-08-11 10:42)

h1. CML aka Cache Meta Language 



 h2. What Is It 


 CML tries to move the decision about a cache-hit and cache-miss for a dynamic website 
 out of the dynamic application, removing the need to start the application or dynamic 
 language at all. 

 PHP in particular is known to have a large startup overhead before a script begins executing. 


 h2. How To Install 


 The language used by CML is Lua, which you can find at http://www.lua.org/ 

 For background on writing Lua code, see: 

 * http://lua-users.org/wiki/LuaAddons 
 * http://luaforge.net/ 


 h2. Benefits 


 The main benefit of CML is its performance.  

 A very simple benchmark showed: 

 * about 1000 req/s for the static 'output.html' which is generated output from the PHP script 
 * about    600 req/s if index.cml is called (cache-hit) 
 * about     50 req/s if index.php is called (cache-miss) 

 Using CML improves the performance for the tested page by a factor of 12 (600 req/s versus 50 req/s), 
 reaching about 60% of the possible maximum set by the static file transfer. 
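The figures can be checked with a few lines of Lua. This is only the arithmetic behind the benchmark numbers above; the variable names are illustrative.

```lua
-- Throughput figures from the benchmark above, in requests per second.
local static_rps = 1000  -- static output.html
local hit_rps    = 600   -- index.cml, cache-hit
local miss_rps   = 50    -- index.php, cache-miss

-- Speed-up of a cache-hit over running the PHP script.
local speedup = hit_rps / miss_rps

-- Percentage of the static-file throughput that a cache-hit reaches.
local share_of_static = hit_rps * 100 / static_rps

print(string.format("speed-up: %.0fx, share of static throughput: %.0f%%",
                    speedup, share_of_static))
```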


 h2. Usage Patterns 


 http://www.lighttpd.net/ uses CML to reduce the load, even though the load is minimal. 

 The layout of the front page depends on a few files: 

 * content-1 
 * content-6 
 * the template /main.tmpl 

 If any of these files is modified, the cached version of the page must be regenerated as well. 


 <pre> 

 output_contenttype = "text/html" 

 trigger_handler = "index.php" 

 -- this file is updated by the trigger 
 output_include = { "output.html" } 

 docroot = request["DOCUMENT_ROOT"] 
 cwd = request["CWD"] 

 -- the dependencies 
 files = { cwd .. "content-1", cwd .. "content-6", docroot .. "main.tmpl" } 

 cached_mtime = file_mtime(cwd .. "output.html") 

 -- if one of the source files is newer than the generated file, 
 -- call the trigger 
 for i,v in ipairs(files) do 
   if file_mtime(v) > cached_mtime then return 1 end 
 end 

 return 0 
 </pre> 



 h2. Delaying recheck 


 If you are building a news aggregator, it is useful to delay the rebuild of the cached content for a period of time, since you can assume that the news does not change with each request. Instead of revalidating on every request, you delay the validation check. 


 <pre> 

 -- same as above 

 -- check again in 5 minutes (300 seconds) 
 delay_recheck = 300 

 if cached_mtime + delay_recheck > os.time() then return 0 end 

 -- the recheck delay has passed, check the cache as usual 

 for i,v in ipairs(files) do 
   if file_mtime(v) > cached_mtime then return 1 end 
 end 

 return 0 
 </pre> 
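The decision logic above can be tried outside lighttpd by wrapping it in a plain Lua function. The sketch below is an illustration only: `needs_rebuild` and its parameters are hypothetical names, the modification times are made up, and a lookup function stands in for CML's `file_mtime`.

```lua
-- Hypothetical helper: returns 1 (cache-miss, call the trigger) or
-- 0 (cache-hit), mirroring the return values of the CML script above.
--   files         list of dependency paths
--   cached_mtime  mtime of the generated output
--   now           current time in seconds (os.time() in a real script)
--   delay         seconds during which the dependencies are not rechecked
--   mtime_of      function(path) -> mtime; stands in for file_mtime
local function needs_rebuild(files, cached_mtime, now, delay, mtime_of)
  -- still inside the delay window: serve the cached copy unchecked
  if cached_mtime + delay > now then return 0 end

  -- past the window: any dependency newer than the cache forces a rebuild
  for _, path in ipairs(files) do
    if mtime_of(path) > cached_mtime then return 1 end
  end
  return 0
end

-- Made-up modification times for a dry run.
local mtimes = { ["content-1"] = 1000, ["content-6"] = 1500, ["main.tmpl"] = 900 }
local function lookup(path) return mtimes[path] end
local deps = { "content-1", "content-6", "main.tmpl" }

print(needs_rebuild(deps, 1200, 1300, 300, lookup)) -- inside the delay window
print(needs_rebuild(deps, 1200, 1600, 300, lookup)) -- window passed, content-6 is newer
print(needs_rebuild(deps, 1600, 2000, 300, lookup)) -- window passed, nothing is newer
```

Passing the current time and the mtime lookup as arguments keeps the function free of side effects, so the two branches can be exercised without touching real files.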


 To tell the proxies in between not to check again for 5 minutes after they have received this content, use the setenv module to add Cache-Control or Expires headers. 
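As a rough sketch of that last step, assuming mod_setenv is available and that 300 seconds is the intended lifetime, a response header could be added in the lighttpd configuration along these lines. The URL condition is only an example and would need to match your own setup.

```
server.modules += ( "mod_setenv" )

# example condition: only requests that end in .cml
$HTTP["url"] =~ "\.cml$" {
  setenv.add-response-header = ( "Cache-Control" => "max-age=300" )
}
```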



 h2. CML and Databases 


 CML does not provide direct access to databases like MySQL or PostgreSQL, and probably never will. 

 There is a better and faster way to interface CML with databases: memcached. 

 All you have to do is keep the information needed to decide whether a page has to be regenerated in memcached. Say that whenever you store an entry in the database, you associate a Version-ID with it. The Version-ID is incremented each time you change the resource. 

 The Version-ID is then stored in the database and in memcached at the same time. CML can fetch the Version-ID, check whether content has already been generated for it, and have it generated if necessary. 


 <pre> 

 output_contenttype = "text/html" 

 content_key = md5(request["PATH_INFO"]) 
 version = memcache_get_long(content_key) 
 cwd = request["CWD"] 

 trigger_handler = "generate.php" 

 if version >= 0 then 
   output_include = { cwd .. content_key .. "-" .. version .. ".html" } 
   return 0 
 else 
   return 1 
 end 
 </pre> 


 generate.php will have to: 

 * get PATH_INFO 
 * fetch information from database about it 
 * generate content for the page and write it to disk 
 * deliver it to the client 
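generate.php itself is a PHP script; the sketch below only walks through the same four steps in Lua to stay consistent with the CML examples. Everything in it is hypothetical: the `database` table stands in for a real database, `generate` and `write_to_disk` are made-up names, and the content key is passed in because plain Lua has no `md5()`.

```lua
-- Hypothetical stand-in for the database: PATH_INFO -> record.
local database = {
  ["/news/42"] = { version = 3, body = "Hello from the database" },
}

-- Default writer: stores the generated page on disk.
local function write_to_disk(filename, data)
  local f, err = io.open(filename, "w")
  if not f then return nil, err end
  f:write(data)
  f:close()
  return true
end

-- Sketch of the generator. write_file is optional and can be replaced
-- for dry runs.
local function generate(path_info, content_key, cache_dir, write_file)
  write_file = write_file or write_to_disk

  -- steps 1 and 2: look up the record for the requested PATH_INFO
  local row = database[path_info]
  if not row then return nil, "no entry for " .. path_info end

  -- step 3: render the page and store it under the name the CML script
  -- expects: <cache_dir><content_key>-<version>.html
  local html = "<html><body>" .. row.body .. "</body></html>"
  local filename = cache_dir .. content_key .. "-" .. row.version .. ".html"
  local ok, err = write_file(filename, html)
  if not ok then return nil, err end

  -- step 4: hand the page back so it can be delivered to the client
  return html, filename
end

-- Dry run with an in-memory writer so nothing touches the disk.
local written = {}
local html, filename = generate("/news/42", "examplekey", "/tmp/cache/",
  function(name, data) written[name] = data; return true end)
print(filename)
```

The file name has to follow the same scheme as `output_include` in the CML script, otherwise the next request would include a file that does not exist.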

 To interface the database with memcached you can use a UDF: 

 * for "MySQL":http://www.mysql.com/ you can get the MySQL UDF at "Jan's MySQL page":http://jan.kneschke.de/projects/mysql/udf/ 
 * for "PostgreSQL":http://www.postgresql.org/ Sean Chittenden has written "pgmemcache":http://people.freebsd.org/~seanc/pgmemcache/ 

 With MySQL and the UDF you just do: 

 <pre> 

 #!sql 
 BEGIN; 
 UPDATE content SET version = (@v := version + 1) WHERE id = <id>; 
 SELECT memcache_set("127.0.0.1:11211", <id>, @v); 
 COMMIT; 
 </pre> 


 To check which version is currently used by the cache: 

 <pre> 

 #!sql 
 SELECT memcache_get("127.0.0.1:11211", <id>); 
 </pre>