W3TC Cache Preload for Version 0.9.2.5

Back in March 2012 I posted a set of feature enhancements and repairs to cache priming for the WordPress W3TC Total Cache plugin version 0.9.2.4. See my W3TC Cache Preload or Cache Prime posting.

A W3TC security fix was recently released. Unfortunately, the fix introduced a fault in basic mode disk caching. Each cached page file is now prefixed with a nine-byte ‘<?php /* ’ text string, but the string is written before the page expiry time and the compressed page data. As a result, the page expiry time is never read correctly when the cached page is referenced: the page is always treated as expired, and cached disk pages are never returned. There is also a problem decoding the returned data: get() strips the nine-byte prefix but never unserializes what remains.

w3-total-cache/lib/W3/Cache/File.php

/**
  * Sets data
  *
  * @param string $key
  * @param mixed $var
  * @param integer $expire
  * @return boolean
  */
 function set($key, &$var, $expire = 0) {
.
.
.
    if ($fp) {
      if ($this->_locking) {
         @flock($fp, LOCK_EX);
      }
//    @fputs($fp, '<?php /* ');   // wrong place ...
      @fputs($fp, pack('L', $expire));
      @fputs($fp, '<?php /* ');   // it needs to go here ...
      @fputs($fp, @serialize($var));
      @fclose($fp);
.
.
.
 /**
  * Returns data
  *
  * @param string $key
  * @return mixed
  */
 function get($key) {
.
.
.
    if ($ftime > time() - $expire) {
      $data = '';
.
      while (!@feof($fp)) {
        $data .= @fread($fp, 4096);
      }
.                            
//    $var = substr($data, 9);   // and this is wrong ...
      $data = substr($data, 9);  // it needs to be this ...
      $var = @unserialize($data);
   }
.
.
.
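To see why the write order matters, here is a small standalone sketch of the record layout the corrected set() produces. The helpers encode_record(), decode_record() and faulty_expire() are hypothetical, not W3TC functions; they only mimic the byte order described above, assuming a 4-byte pack('L') expiry followed by the 9-byte marker and the serialized data.

```php
<?php
// Illustrative sketch (not W3TC code) of the corrected cache record layout:
// [4-byte packed expiry][9-byte marker][serialized payload]
const MARKER = '<?php /* ';

function encode_record($expire, $var) {
    // Expiry first, so a reader can unpack it from the first 4 bytes
    return pack('L', $expire) . MARKER . serialize($var);
}

function decode_record($record) {
    $unpacked = unpack('L', substr($record, 0, 4));
    $expire = $unpacked[1];
    // Skip the 9-byte marker, then unserialize (the step the broken get() omits)
    $var = unserialize(substr($record, 4 + strlen(MARKER)));
    return array($expire, $var);
}

// The faulty writer put the marker first, so the value read back as the
// expiry is really the first four marker bytes interpreted as an integer.
function faulty_expire($expire) {
    $record = MARKER . pack('L', $expire) . serialize('page');
    $unpacked = unpack('L', substr($record, 0, 4));
    return $unpacked[1];
}
```

With the corrected order the expiry and page data round-trip intact; with the faulty order the stored expiry is never the value that is read back.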

 

A corrected version of W3TC 0.9.2.5 that also integrates my previous cache prime feature enhancements can be downloaded here.

Updated Elements

  • w3-total-cache/lib/W3/Config.php
  • w3-total-cache/lib/W3/PgCache.php
  • w3-total-cache/lib/W3/Plugin/PgCacheAdmin.php
  • w3-total-cache/lib/W3/Plugin/TotalCacheAdmin.php
  • w3-total-cache/lib/W3/Plugin/PgCache.php
  • w3-total-cache/lib/W3/Cache/Base.php
  • w3-total-cache/lib/W3/Cache/File.php
  • w3-total-cache/lib/W3/Cache/File/Generic.php
  • w3-total-cache/inc/functions/http.php
  • w3-total-cache/inc/options/pgcache.php

 

Update, January 6, 2013

Apparently, folks are actually using this feature?  This has prompted me to complete some outstanding items from my cache prime feature enhancement of last March (2012).

  1. The cache prime trace name is now configurable.  The trace resource is publicly accessible from your site URL, so you need a way to turn it on or off.
  2. Cache priming now generates a new cached page before the previously cached page expires.  Previously, pages were not refreshed until after they had expired.  This reduces the chance that a visitor requests an uncached page. A page may now be regenerated up to two update intervals before its expected expiry.
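The early-refresh rule in item 2 can be expressed as a simple check. This is a hypothetical sketch, not the code in the download: needs_prime() and its parameters are illustrative names, and it assumes the primer knows each cached page's expiry timestamp.

```php
<?php
// Hypothetical predicate: should the primer regenerate this page now?
// $expires_at: cached page expiry timestamp, or null if the page is not cached
// $now:        current timestamp
// $interval:   prime update interval in seconds
function needs_prime($expires_at, $now, $interval) {
    if ($expires_at === null) {
        return true;  // never cached: prime it
    }
    // Refresh early if the page expires within two update intervals,
    // so a fresh copy exists before the old one lapses.
    return $expires_at - $now <= 2 * $interval;
}
```

A plausible reason for allowing two intervals rather than one is the WordPress cron behaviour described in the Introduction: a scheduled run fires only when the site is next visited, so it may start later than one interval after the previous run.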

The download file referenced above now contains these changes.  The recommended steps to install this feature enhancement on a working W3TC 0.9.2.4 or W3TC 0.9.2.5 implementation are as follows:

  1. Turn off cache priming.  In the Page Cache configuration page clear the checkbox to automatically prime the page cache.  Save all settings.
  2. Empty all cache.
  3. Deactivate the existing W3TC plugin (0.9.2.4 or 0.9.2.5) in WordPress.
  4. Update the plugin code.  Extract the zip file.  Replace the w3-total-cache plugin folder in WordPress with the w3-total-cache folder from the zip file.
  5. Activate the updated W3TC 0.9.2.5 plugin in WordPress.
  6. Empty all cache again.
  7. Review the Page Cache configuration.  Activate cache priming if desired.


The default TTL (Time to Live) of page cache files is set via the “Expires header lifetime” field in the “HTML” section of the Browser Cache Settings tab.  The default is 3600 seconds, or one hour.

Specific page expiry times can be set in the Advanced section of the Page Cache configuration. W3TC limits the maximum disk cache expiry time to 30 days (2592000 seconds).

The example below specifies that any page URL ending with “html” is cached for 30 days; any page URL containing a two-digit number between slashes is cached for 7 days; and any page URL beginning with “/v/”, “/tag/” or “/category/” is cached for 24 hours.  The rules are checked in the listed order and the first match is used.  If no rule matches, the default TTL applies.

[Screenshot: Page Cache Advanced settings showing the example expiry rules]
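The first-match rule selection can be sketched as follows. The regular expressions are my own illustrative translations of the three example rules; the exact pattern syntax accepted by the Advanced settings field may differ, and page_ttl() is a hypothetical helper, not part of W3TC.

```php
<?php
// Hypothetical first-match TTL lookup mirroring the example rules.
const DEFAULT_TTL = 3600;     // "Expires header lifetime" default (1 hour)
const MAX_TTL = 2592000;      // W3TC disk cache limit: 30 days

function page_ttl($uri, array $rules) {
    foreach ($rules as $pattern => $ttl) {
        if (preg_match($pattern, $uri)) {
            return min($ttl, MAX_TTL);  // first match wins, capped at 30 days
        }
    }
    return DEFAULT_TTL;                 // no match: default TTL
}

// Rules are tried in array order.
$rules = array(
    '~html$~i'              => 30 * 86400,  // URL ends with "html"
    '~/\d{2}/~'             => 7 * 86400,   // two-digit number between slashes
    '~^/(v|tag|category)/~' => 86400,       // starts with /v/, /tag/ or /category/
);
```

Because order matters, a URL such as “/tag/12/” gets the 7-day TTL from the second rule before the third rule is ever considered.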

W3TC Cache Preload or Cache Prime

W3TC Total Cache is a WordPress plugin for page caching of WordPress sites.  Release version 0.9.2.4 has four faults in the cache preload function that prevent successful operation of this feature.  This post provides code corrections for these faults. One additional fault that can result in failure to process URL redirection for page requests is also fixed.

This post also introduces new features: pages are primed only when required, cache preload activity can be visibly monitored and traced, and different cache retention times can be set depending on the page URL.

This post applies to both basic mode and enhanced mode disk caching. Cache priming has been tested primarily with basic mode disk caching.

 

Index

This post has 4 pages.

Page 1.  Describes the known faults or bugs in the preload function of W3TC Total Cache version 0.9.2.4.

Page 2.  Describes a feature enhancement to enable setting of specific page cache expiry times.

Page 3.  Describes a feature enhancement to allow administrator control and monitoring of prime activities.

Page 4.  Describes a feature enhancement to prime only uncached and valid pages.

 

 Download

A full download of W3TC Total Cache version 0.9.2.4 with corrected source code as shown is available here.

 

Introduction

Cache preload, also known as cache priming, is a caching feature to ensure that a page is always ready and stored in cache.  Cached pages improve site performance.  Preloading of cached pages ensures that any initial access to a page will respond as quickly as repeat access to a page.

W3TC cache preload operates as a background activity scheduled by the WordPress cron function.  WordPress cron is triggered by site traffic rather than by a system clock, so the priming activity runs only after the cron time has expired and the site is next accessed.

Each time the prime activity runs, it preloads a specified number of pages.  Before preloading any pages, it schedules the next instance of the preload activity on the WordPress cron timer.  Because the next run is already scheduled while the current one is still executing, the cron wait time must be longer than the execution time of the prime activity.  Typically, the W3TC update interval (the cron wait time) should be on the order of 5 or 10 minutes, and the pages processed per interval should be around 10 or 20, so that the total page access time for a batch is less than a minute or two.
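The sizing guidance above is simple arithmetic. The sketch below checks that a batch fits inside the update interval and estimates how long one full pass over the sitemap takes. The function names and the per-page fetch time are assumptions for illustration.

```php
<?php
// Back-of-envelope checks for prime settings (illustrative only).
// $interval: cron update interval in seconds
// $limit:    pages primed per interval
// $per_page: assumed worst-case seconds to fetch one page
function prime_batch_fits($interval, $limit, $per_page) {
    return $limit * $per_page < $interval;
}

// Minimum time for one full pass over a sitemap of $total URLs, assuming
// every cron run fires on time (in practice it waits for a site visit).
function full_pass_seconds($total, $interval, $limit) {
    return (int) ceil($total / $limit) * $interval;
}
```

For example, with a 200-URL sitemap primed 20 pages every 10 minutes, full_pass_seconds(200, 600, 20) gives 6000 seconds, so a full pass takes at least 100 minutes.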

 

Fault 1

In PgCacheAdmin.php the function ‘w3_url_format’ is called before the HTTP request that loads the page. The existing code never includes w3-total-cache/inc/functions/url.php, the file that defines this function, so the call fails with an ‘undefined function’ fatal error. Because this happens inside a background cron request, after the prime function has already scheduled its next run, the failure is silent. Page priming will appear to be operating and events will be queued, but no cache files will ever be loaded. The missing statement is the require_once for url.php.

 

w3-total-cache/lib/W3/Plugin/PgCacheAdmin.php

.
.
.
    /**
     * Prime cache
     *
     * @param integer $start
     * @return void
     */
    function prime($start = 0) {
.
.
.
        /**
         * Make HTTP requests and prime cache
         */
        require_once W3TC_INC_DIR . '/functions/http.php';
        require_once W3TC_INC_DIR . '/functions/url.php';   // added: defines w3_url_format()

        foreach ($queue as $url) {
            $url = w3_url_format($url, array('w3tc_preload' => 1));

            w3_http_get($url);
        }
.
.
.
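The failure mode in Fault 1 is easy to reproduce outside WordPress. In this sketch a temporary file stands in for inc/functions/url.php and demo_url_format() stands in for w3_url_format(); both names are hypothetical. It assumes PHP 7 or later, where calling an undefined function throws a catchable Error (on PHP 5 it is an unrecoverable fatal error).

```php
<?php
// Write a stand-in "library" file that defines the function.
$lib = sys_get_temp_dir() . '/demo_url_functions.php';
file_put_contents($lib, '<?php function demo_url_format($url) { return $url; }');

// Calling the function before its file is loaded fails.
try {
    demo_url_format('/');
    $failed = false;
} catch (Error $e) {     // PHP 7+: "Call to undefined function"
    $failed = true;
}

// After require_once, the same call succeeds.
require_once $lib;
$formatted = demo_url_format('/x');
unlink($lib);
```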

 

Fault 2

The second problem with the released implementation is that the ‘w3_url_format’ function adds the query string ‘w3tc_preload=1’ to the page URL. W3TC is often configured not to cache URLs with query strings, in which case the requested page will never be cached.

One solution to this fault is to remove the query string from the URL wherever it may cause incorrect behaviour. Page caching is determined in PgCache.php, in two places: in function ‘process()’, where the page request is examined for caching, and in function ‘_can_cache()’, where the URL is checked for a query string. The required statements to eliminate the query string are the two preg_replace() calls below.

w3-total-cache/lib/W3/PgCache.php

.
.
.
    /**
     * Do cache logic
     */
    function process() {
.
.
.
        if ($this->_caching && !$this->_enhanced_mode) {
            $cache = & $this->_get_cache();

            /**
             * Remove preload query string on URL to cache
             */
            $this->_request_uri = preg_replace('~[?&]w3tc_preload.*~i', '', $this->_request_uri);

            $mobile_group = $this->_get_mobile_group();
            $referrer_group = $this->_get_referrer_group();
            $encryption = $this->_get_encryption();
            $compression = $this->_get_compression();
            $raw = !$compression;
            $this->_page_key = $this->_get_page_key($this->_request_uri, $mobile_group, $referrer_group, $encryption, $compression);
.
.
.
    /**
     * Checks if can we do cache logic
     *
     * @return boolean
     */
    function _can_cache() {
.
.
.
        /**
         * Skip if there is query in the request uri
         */
        // Ignore the preload marker when checking for a query string
        $uri = preg_replace('~[?&]w3tc_preload.*~i', '', $this->_request_uri);
        if (!$this->_config->get_boolean('pgcache.cache.query') && strstr($uri, '?') !== false) {
            $this->cache_reject_reason = 'Requested URI contains query';

            return false;
        }
.
.
.
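The preload-stripping expression used in both places can be exercised on its own. strip_preload() is a hypothetical wrapper around the same preg_replace() call.

```php
<?php
// Same pattern as in process() and _can_cache(): remove "?w3tc_preload..."
// or "&w3tc_preload..." and everything after it.
function strip_preload($uri) {
    return preg_replace('~[?&]w3tc_preload.*~i', '', $uri);
}

$plain   = strip_preload('/2012/03/my-post/?w3tc_preload=1');  // "/2012/03/my-post/"
$mixed   = strip_preload('/search/?s=cache&w3tc_preload=1');   // "/search/?s=cache"
$leading = strip_preload('/page/?w3tc_preload=1&utm=x');       // "/page/"
```

Because the pattern removes everything from the marker to the end of the URI, it relies on w3tc_preload being the last query parameter; any parameters after it are discarded as well.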

 

Fault 3

The third fault with the released code relates to the ‘w3_http_get($url)’ function call shown above. This function is defined in w3-total-cache/inc/functions/http.php and it calls a more general ‘w3_http_request’ function that calls a WordPress function to get the requested page. However, the ‘w3_http_request’ function sets a ‘W3TC_POWERED_BY’ user agent which is subsequently recognized in ‘PgCache.php’ as a rejected user agent for caching.

As a result, every request to cache a page returns an unprocessed, non-minified page. This reduces the effect of page caching, and the unprocessed text is what gets stored for the cached page: it is not minified, does not include W3TC information, and differs from what is cached under normal operating conditions.

A solution to this problem is to ensure that the call to ‘w3_http_request’ overrides the ‘W3TC_POWERED_BY’ user agent. The corrected code is shown below.

 

w3-total-cache/lib/W3/Plugin/PgCacheAdmin.php

.
.
.
    /**
     * Prime cache
     *
     * @param integer $start
     * @return void
     */
    function prime($start = 0) {
.
.
.
        /**
         * Make HTTP requests and prime cache
         */
        require_once W3TC_INC_DIR . '/functions/http.php';
        require_once W3TC_INC_DIR . '/functions/url.php';

        foreach ($queue as $url) {
            $url = w3_url_format($url, array('w3tc_preload' => 1));

            // Override the W3TC_POWERED_BY user agent so the page can be cached
            $result = w3_http_get($url, array('user-agent' => ''));
        }
.
.
.
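The override works only if the caller's arguments are merged after the defaults. This sketch shows the array_merge() rule involved: for string keys, values from later arrays replace earlier ones. The defaults array is a stand-in, 'W3TC-placeholder-agent' is a placeholder for the W3TC_POWERED_BY constant, and the merge order inside w3_http_request is assumed here, not quoted.

```php
<?php
// Stand-in defaults; the real W3TC_POWERED_BY value is not shown here.
$defaults = array('user-agent' => 'W3TC-placeholder-agent', 'timeout' => 5);
$caller   = array('user-agent' => '');

$args  = array_merge($defaults, $caller);  // caller merged last: user-agent is ''
$wrong = array_merge($caller, $defaults);  // defaults merged last: override lost
```

The same rule is why w3_http_get() can force 'method' => 'GET': it merges its fixed array last.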

 

Fault 4

The fourth fault relates to the ‘function prime($start = 0)’ parameter in the prime function shown above. The W3TC cache preload feature is designed to load sets of pages from URLs in a prioritized Google sitemap. The start parameter is a starting index into the sitemap list of URLs to prime. It is intended to identify the start of the next group of pages to be preloaded.

Because ‘w3-total-cache/lib/W3/Plugin/PgCache.php’ omits the parameter, the start value is never passed to the prime() function, so it always defaults to zero. The prime function can therefore never process all the required pages in the sitemap; even if it ran as intended, it would always reprocess only the first set of pages. The missing parameter is $start in the call to prime().

 

w3-total-cache/lib/W3/Plugin/PgCache.php

.
.
.
    /**
     * Prime cache
     *
     * @param integer $start
     * @return void
     */
    function prime($start = 0) {
        $this->get_admin()->prime($start);   // $start was not passed in 0.9.2.4
    }
.
.
.
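The intended batching can be sketched as a window that advances through the sitemap list. next_batch() is a hypothetical helper assuming the sitemap has already been parsed into an ordered array of URLs; the wrap-around to zero after the last batch is my assumption, not confirmed W3TC behaviour.

```php
<?php
// Returns the URLs to prime in this run and the start index for the next run.
function next_batch(array $urls, $start, $limit) {
    $total = count($urls);
    if ($start >= $total) {
        $start = 0;                      // past the end: start over
    }
    $queue = array_slice($urls, $start, $limit);
    $next = $start + $limit;
    if ($next >= $total) {
        $next = 0;                       // last batch done: wrap around
    }
    return array($queue, $next);
}
```

With $start stuck at zero, as in the unpatched code, every call returns the same first batch.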

 

CURL Redirection Fault

Sites hosted on servers where PHP uses cURL with safe_mode or open_basedir enabled can experience URL redirection failures when trying to prime a page, because cURL cannot follow redirects automatically in that configuration. Fortunately, a workaround is known. It is added to the W3TC ‘w3_http_get()’ function, which detects the WordPress redirection failure and retries the request with cURL, following redirects manually. Without it, sitemap URLs that redirect may not load, and WordPress may report a ‘Too many redirects’ error.

 

w3-total-cache/inc/functions/http.php

.
.
.
    /**
     * Sends HTTP GET request
     *
     * @param string $url
     * @param array $args
     * @return array|WP_Error
     */
    function w3_http_get($url, $args = array()) {
        $args = array_merge($args, array(
            'method' => 'GET'
        ));

        $result = w3_http_request($url, $args);

        // If the server uses cURL with safe_mode or open_basedir set, WordPress
        // cannot follow redirects and reports "Too many redirects"
        if (is_wp_error($result) && stripos($result->get_error_message(), 'Too many redirects') !== false) {
            if (!ini_get('safe_mode') && !ini_get('open_basedir')) {
                return $result;   // a genuine redirect loop: report it
            }
            if (function_exists('curl_init') && function_exists('curl_exec')) {
                $result = w3_curl($url, $args);   // retry, following redirects by hand
            }
        }
        return $result;
    }

    /**
     * Sends HTTP GET request with cURL
     *
     * @param string $url
     * @param array $args
     * @return array|WP_Error
     */
    function w3_curl($url, $args = array()) {
        $args = array_merge(array('redirection' => 5), $args);
        $ch = curl_init($url);

        // Follow Location headers manually (CURLOPT_FOLLOWLOCATION is unavailable)
        $result = w3_curl_redir_exec($ch, $args);
        curl_close($ch);
        return $result;
    }

    /**
     * Follows HTTP redirects with cURL when safe_mode or open_basedir is enabled
     *
     * @param resource $ch cURL handle with the target URL already set
     * @param array $args
     * @return array|WP_Error
     *
     * courtesy of http://au.php.net/manual/ro/function.curl-setopt.php#71313
     */
    function w3_curl_redir_exec($ch, $args) {
        $defaults = array(
            'method' => 'GET', 'timeout' => 5,
            'redirection' => 5, 'httpversion' => '1.0'
        );

        $r = wp_parse_args($args, $defaults);
        if ($r['redirection']-- <= 0) {
            return new WP_Error('http_request_failed', __('W3 Too many redirects.'));
        }

        $timeout = (int) ceil($r['timeout']);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in output
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $data = curl_exec($ch);
        if ($data === false) {
            return new WP_Error('http_request_failed', curl_error($ch));
        }

        // Headers and body are separated by a blank line (CRLF CRLF)
        list($header, $body) = array_pad(explode("\r\n\r\n", $data, 2), 2, '');
        $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);

        if ($http_code == 301 || $http_code == 302) {
            $matches = array();
            if (!preg_match('/^Location:(.*)$/mi', $header, $matches)) {
                return new WP_Error('http_request_failed', __('W3 Redirect malformed URL'));
            }
            $url = @parse_url(trim($matches[1]));
            if (!$url) {
                // couldn't process the url to redirect to
                return new WP_Error('http_request_failed', __('W3 Redirect malformed URL'));
            }

            // Fill in parts missing from a relative Location header
            $last_url = parse_url(curl_getinfo($ch, CURLINFO_EFFECTIVE_URL));
            foreach (array('scheme', 'host', 'path') as $part) {
                if (empty($url[$part]) && isset($last_url[$part])) {
                    $url[$part] = $last_url[$part];
                }
            }
            $new_url = $url['scheme'] . '://' . $url['host']
                . (isset($url['path']) ? $url['path'] : '/')
                . (empty($url['query']) ? '' : '?' . $url['query']);

            curl_setopt($ch, CURLOPT_URL, $new_url);
            return w3_curl_redir_exec($ch, $r);
        }

        $response = array();
        $response['code'] = $http_code;
        $response['message'] = get_status_header_desc($response['code']);
        return array('headers' => array(), 'body' => $body, 'response' => $response, 'cookies' => array());
    }
.
.
.
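The redirect-following logic can be tested without a network connection by separating it from cURL. redirect_target() is a hypothetical helper that mirrors the steps in w3_curl_redir_exec(): split the raw response at the blank line, read the Location header, and fill in any scheme, host or path missing from a relative target using the URL just requested. Like the original, it does not resolve path-relative targets such as “page2”.

```php
<?php
// Returns the absolute redirect target, or null if none can be determined.
function redirect_target($raw_response, $current_url) {
    $parts = explode("\r\n\r\n", $raw_response, 2);
    if (!preg_match('/^Location:(.*)$/mi', $parts[0], $m)) {
        return null;                      // no Location header
    }
    $url  = parse_url(trim($m[1]));
    $last = parse_url($current_url);
    if ($url === false || $last === false) {
        return null;                      // malformed URL
    }
    // Borrow missing parts from the URL that was just requested
    foreach (array('scheme', 'host', 'path') as $part) {
        if (empty($url[$part]) && isset($last[$part])) {
            $url[$part] = $last[$part];
        }
    }
    return $url['scheme'] . '://' . $url['host']
        . (isset($url['path']) ? $url['path'] : '/')
        . (empty($url['query']) ? '' : '?' . $url['query']);
}

$raw = "HTTP/1.1 301 Moved Permanently\r\nLocation: /new-page/?p=2\r\n\r\n";
$target = redirect_target($raw, 'http://example.com/old-page/');
// $target is "http://example.com/new-page/?p=2"
```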