W3TC Cache Preload or Cache Prime

W3 Total Cache (W3TC) is a WordPress plugin for page caching of WordPress sites. Release version 0.9.2.4 has four faults in the cache preload function that prevent this feature from working. This post provides code corrections for these faults. It also fixes one additional fault that can cause URL redirection to fail for page requests.

This post also introduces new features: it ensures that pages are primed only when required, it adds visible monitoring and tracing of cache preload activities, and it enables different cache retention times depending on the page URL.

This post applies to both basic mode and enhanced mode disk caching. Cache priming has been primarily tested for basic mode disk caching.

 

Index

This post has 4 pages.  Page navigation is at the bottom of each page.

Page 1.  Describes the known faults or bugs in the W3 Total Cache version 0.9.2.4 preload function.

Page 2.  Describes a feature enhancement to enable setting of specific page cache expiry times.

Page 3.  Describes a feature enhancement to allow administrator control and monitoring of prime activities.

Page 4.  Describes a feature enhancement to prime only uncached and valid pages.

 

 Download

A full download of W3 Total Cache version 0.9.2.4 with corrected source code as shown is available here.

 

Introduction

Cache preload, also known as cache priming, is a caching feature to ensure that a page is always ready and stored in cache.  Cached pages improve site performance.  Preloading of cached pages ensures that any initial access to a page will respond as quickly as repeat access to a page.

W3TC cache preload operates as a background activity scheduled by the WordPress cron function. WordPress cron is triggered by site visits, so the priming activity does not run at the moment the cron time expires. It runs on the first access to the site after the cron time has expired.

When the prime activity runs it preloads a specified number of pages. Before preloading any pages it schedules a new instance of the preload activity on the WordPress cron timer. Because the next activity is already waiting while the current one executes, the cron wait time must be longer than the execution time of the prime activity. Typically, the W3TC update interval (the cron wait time) should be on the order of 5 or 10 minutes, and the pages processed per interval should be around 10 or 20, so that the total access time for all pages in one batch is less than a minute or two.
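The sizing rule above can be checked with simple arithmetic. The sketch below is illustrative only: the helper names and the sample figures (seconds per page, sitemap size) are assumptions made for this example, not W3TC settings or measurements.

```php
<?php
// Hypothetical helpers for sizing the preload settings; not part of W3TC.

/**
 * Estimated seconds needed to prime one batch of pages.
 */
function estimate_batch_seconds($pages_per_interval, $avg_seconds_per_page) {
    return $pages_per_interval * $avg_seconds_per_page;
}

/**
 * Number of cron intervals needed to walk the whole sitemap once.
 */
function intervals_for_sitemap($sitemap_urls, $pages_per_interval) {
    return (int) ceil($sitemap_urls / $pages_per_interval);
}

// Assumed figures: 20 pages per interval, 3 seconds per page, 10 minute interval.
$batch_seconds    = estimate_batch_seconds(20, 3);
$interval_seconds = 10 * 60;

if ($batch_seconds >= $interval_seconds) {
    echo "Batch may overlap the next scheduled run; reduce pages per interval.\n";
} else {
    echo "Batch of {$batch_seconds}s fits inside the {$interval_seconds}s interval.\n";
}

// Assumed sitemap of 500 URLs: how many intervals for one full pass?
echo intervals_for_sitemap(500, 20) . " intervals for one full pass\n";
```

With these assumed numbers a full pass over the sitemap takes 25 intervals, which at 10 minutes each is a little over four hours. This is worth comparing against the cache expiry time when choosing the settings.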

 

Fault 1

In PgCacheAdmin.php the function ‘w3_url_format’ is called before the HTTP request that loads the page is made. The existing code does not include the file that defines this function. As a result, execution fails silently after the next periodic scheduling of the prime function. Page priming will appear to be operating and events will be queued, but no cache files will ever be created. The missing statement is the ‘require_once’ of ‘/functions/url.php’ in the listing below.

 

w3-total-cache/lib/W3/Plugin/PgCacheAdmin.php

.
.
.
    /**
     * Prime cache
     *
     * @param integer $start
     * @return void
     */
    function prime($start = 0) {
.
.
.
        /**
         * Make HTTP requests and prime cache
         */
        require_once W3TC_INC_DIR . '/functions/http.php';
        require_once W3TC_INC_DIR . '/functions/url.php';

        foreach ($queue as $url) {
            $url = w3_url_format($url, array('w3tc_preload' => 1));

            w3_http_get($url);
        }
.
.
.
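The failure mode can be reproduced outside W3TC. The sketch below is a minimal illustration, assuming PHP 7 or later; ‘example_missing_formatter’ is a hypothetical stand-in for a function whose defining file was never included. Because the prime activity runs in a background cron request, an error like this typically shows up only in the server error log.

```php
<?php
// Illustrative only: 'example_missing_formatter' is a hypothetical name that is
// deliberately never defined, to mimic a function whose file was not included.

function prime_without_include($url) {
    // Fails at call time because nothing has defined the function.
    return example_missing_formatter($url);
}

function prime_with_guard($url) {
    if (!function_exists('example_missing_formatter')) {
        // In W3TC the actual fix is the added require_once; this sketch only
        // reports the problem instead of aborting.
        return false;
    }
    return example_missing_formatter($url);
}

try {
    prime_without_include('http://example.com/');
} catch (Error $e) {
    // PHP 7+ throws an Error here; PHP 5 aborts the request with a fatal error.
    echo 'Caught: ' . $e->getMessage() . "\n";
}

var_dump(prime_with_guard('http://example.com/'));
```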

 

Fault 2

The second problem with the released implementation is that the ‘w3_url_format’ function adds the query string ‘w3tc_preload=1’ to the page URL. W3TC is often configured not to cache URLs with query strings, in which case the requested page will never be cached.

One solution to this fault is to remove the query string from the URL wherever it may cause incorrect behaviour. Whether a page is cached is decided in PgCache.php, in two places. One is in function ‘process()’, where the page request is examined for caching, and the other is in function ‘_can_cache()’, where the URL is examined for a query string. The required statements that remove the query string are the two ‘preg_replace’ calls in the listing below.

w3-total-cache/lib/W3/PgCache.php

.
.
.
    /**
     * Do cache logic
     */
    function process() {
.
.
.
        if ($this->_caching && !$this->_enhanced_mode) {
            $cache = & $this->_get_cache();

            /**
             * Remove preload query string on URL to cache
             */
            $this->_request_uri = preg_replace('~[?&]w3tc_preload.*~i', '', $this->_request_uri);

            $mobile_group = $this->_get_mobile_group();
            $referrer_group = $this->_get_referrer_group();
            $encryption = $this->_get_encryption();
            $compression = $this->_get_compression();
            $raw = !$compression;
            $this->_page_key = $this->_get_page_key($this->_request_uri, $mobile_group, $referrer_group, $encryption, $compression);
.
.
.
    /**
     * Checks if can we do cache logic
     *
     * @return boolean
     */
    function _can_cache() {
.
.
.
        /**
         * Skip if there is query in the request uri
         */
        $uri = preg_replace('~[?&]w3tc_preload.*~i', '', $this->_request_uri);
        if (!$this->_config->get_boolean('pgcache.cache.query') && strstr($uri, '?') !== false) {
            $this->cache_reject_reason = 'Requested URI contains query';

            return false;
        }
.
.
.
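The effect of the pattern used above can be tried in isolation. In this sketch the two helper functions are hypothetical wrappers written for the example; only the regular expression is taken from the listing.

```php
<?php
// Same pattern as in the listing above, wrapped in a hypothetical helper.
function strip_preload_query($request_uri) {
    return preg_replace('~[?&]w3tc_preload.*~i', '', $request_uri);
}

// Hypothetical stand-in for the "skip if there is a query" test.
function has_query_string($request_uri) {
    return strpos($request_uri, '?') !== false;
}

$primed = '/about/?w3tc_preload=1';
$clean  = strip_preload_query($primed);

echo $clean . "\n";                    // /about/
var_dump(has_query_string($primed));   // bool(true): would be rejected
var_dump(has_query_string($clean));    // bool(false): can be cached

// Parameters before w3tc_preload are kept.
echo strip_preload_query('/shop/?page=2&w3tc_preload=1') . "\n";   // /shop/?page=2
```

Note that the pattern removes everything from ‘w3tc_preload’ to the end of the URI, so any parameter that follows it is dropped as well. This is harmless as long as the preload parameter is the last one in the query string.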

 

Fault 3

The third fault with the released code relates to the ‘w3_http_get($url)’ function call shown above. This function is defined in w3-total-cache/inc/functions/http.php and it calls a more general ‘w3_http_request’ function that calls a WordPress function to get the requested page. However, the ‘w3_http_request’ function sets a ‘W3TC_POWERED_BY’ user agent which is subsequently recognized in ‘PgCache.php’ as a rejected user agent for caching.

This means that any request made to prime a page returns an unprocessed, non-minified page. The unprocessed text is what gets stored for the cached page: it is not minified, it does not include W3TC information, and it differs from what is cached under normal operating conditions. This reduces the benefit of page caching.

A solution to this problem is to ensure that the call to ‘w3_http_request’ overrides the ‘W3TC_POWERED_BY’ user agent. The corrected code is shown below.

 

w3-total-cache/lib/W3/Plugin/PgCacheAdmin.php

.
.
.
    /**
     * Prime cache
     *
     * @param integer $start
     * @return void
     */
    function prime($start = 0) {
.
.
.
        /**
         * Make HTTP requests and prime cache
         */
        require_once W3TC_INC_DIR . '/functions/http.php';
        require_once W3TC_INC_DIR . '/functions/url.php';

        foreach ($queue as $url) {
            $url = w3_url_format($url, array('w3tc_preload' => 1));

            $result = w3_http_get($url, array('user-agent' => ''));

        }
.
.
.
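Why an empty user agent avoids the rejection can be sketched as follows. This is not the W3TC implementation: the function is hypothetical, and the rejected entry ‘W3 Total Cache’ and the sample agent string are assumptions about what ‘W3TC_POWERED_BY’ contains.

```php
<?php
// Hypothetical check modelled on a "rejected user agents" list. The real W3TC
// logic and the exact value of W3TC_POWERED_BY are assumptions here.
function is_rejected_user_agent($user_agent, array $rejected) {
    foreach ($rejected as $needle) {
        // Case-insensitive substring match; empty entries are ignored.
        if ($needle !== '' && stripos($user_agent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$rejected = array('W3 Total Cache');   // assumed list entry

// Default agent sent by the preload request (assumed format): rejected.
var_dump(is_rejected_user_agent('W3 Total Cache/0.9.2.4', $rejected));

// Overridden with an empty agent, as in the corrected call: not rejected.
var_dump(is_rejected_user_agent('', $rejected));
```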

 

Fault 4

The fourth fault relates to the ‘function prime($start = 0)’ parameter in the prime function shown above. The W3TC cache preload feature is designed to load sets of pages from URLs in a prioritized Google sitemap. The start parameter is a starting index into the sitemap list of URLs to prime. It is intended to identify the start of the next group of pages to be preloaded.

Because the parameter is omitted in ‘w3-total-cache/lib/W3/Plugin/PgCache.php’, the start value is never passed to the admin prime() function and always defaults to zero. This means that the prime function can never process all the required pages in the sitemap. Even if it ran as intended, it would reprocess only the first selected set of pages every time. The missing parameter is the ‘$start’ argument in the call to ‘prime()’ shown below.

 

w3-total-cache/lib/W3/Plugin/PgCache.php

.
.
.
    /**
     * Prime cache
     *
     * @param integer $start
     * @return void
     */
    function prime($start = 0) {
        $this->get_admin()->prime($start);
    }
.
.
.
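The role of the start index can be illustrated with a small batching sketch. The helper below is hypothetical and is not W3TC code; wrapping back to zero at the end of the list is an assumption made for the example.

```php
<?php
// Hypothetical batching helper: returns the URLs for one run and the offset
// at which the next run should start.
function next_batch(array $sitemap_urls, $start, $limit) {
    $batch = array_slice($sitemap_urls, $start, $limit);
    $next  = $start + count($batch);
    if ($next >= count($sitemap_urls)) {
        $next = 0;   // assumption: start again from the top of the sitemap
    }
    return array($batch, $next);
}

$urls = array('/a/', '/b/', '/c/', '/d/', '/e/');

// With the start index passed along, successive runs cover the whole list.
$start = 0;
for ($run = 1; $run <= 3; $run++) {
    list($batch, $start) = next_batch($urls, $start, 2);
    echo "Run $run: " . implode(' ', $batch) . "\n";
}

// With the start index lost (always 0), every run primes the same pages.
for ($run = 1; $run <= 3; $run++) {
    $result = next_batch($urls, 0, 2);
    echo "Faulty run $run: " . implode(' ', $result[0]) . "\n";
}
```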

 

cURL Redirection Fault

Sites hosted on servers that use cURL with safe mode or open_basedir enabled can experience URL redirection failure when trying to prime a page. Fortunately, a workaround for the cURL problem is known. This workaround must be included in the W3TC ‘w3_http_get()’ function so that a WordPress redirection failure is caught and the request is retried. If this is not done, sitemap URLs that redirect may fail to load during priming, and a WordPress ‘Too many redirects’ error may occur.

 

w3-total-cache/inc/functions/http.php

.
.
.
    /**
     * Sends HTTP GET request
     *
     * @param string $url
     * @param array $args
     * @return array|WP_Error
     */
    function w3_http_get($url, $args = array()) {
        $args = array_merge($args, array(
            'method' => 'GET'
        ));

        $result = w3_http_request($url, $args);

        // If server uses cURL and has open_basedir set then redirection may not work 
        if ( is_wp_error($result) && stripos($result->get_error_message(),'Too many redirects') !== false) { 
            if (!ini_get('safe_mode') && !ini_get('open_basedir')) { 
                return $result; 
            } 
            if ( function_exists( 'curl_init' ) && function_exists( 'curl_exec' ) ) { 
                $result = w3_curl($url, $args); 
            } 
        } 
        return $result; 
    } 

    /** 
     * Sends HTTP GET request with cURL 
     * 
     * @param string $url 
     * @param $args array 
     * @return array|WP_Error 
     */ 
    function w3_curl($url, $args = array()) { 
        $args = array_merge(array( 'redirection' => 5 ), $args); 
        $ch = curl_init($url); 
        curl_setopt ($ch, CURLOPT_URL, $url); 

        //follow on location problems 
        $syn = w3_curl_redir_exec($ch, $args); 
        curl_close($ch); 
        return $syn; 
    } 

    /** 
     * Redirects HTTP GET request with cURL when safe_mode or open_basedir is enabled 
     * 
     * @param resource $ch cURL handle with the URL already set
     * @param array $args
     * @return array|WP_Error 
     * 
     * courtesy of http://au.php.net/manual/ro/function.curl-setopt.php#71313 
     */ 
    function w3_curl_redir_exec($ch, $args) {
	$defaults = array(
		'method' => 'GET', 'timeout' => 5,
		'redirection' => 5, 'httpversion' => '1.0'
	);

	$r = wp_parse_args( $args, $defaults );
	if ( $r['redirection']-- <= 0 ) {
		return new WP_Error('http_request_failed', __('W3 Too many redirects.'));
	}
	
	$timeout = (int) ceil( $r['timeout'] );
	curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout );
	curl_setopt($ch, CURLOPT_TIMEOUT, $timeout );
	curl_setopt($ch, CURLOPT_HEADER, true);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
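	$data = curl_exec($ch);
	if ($data === false) {
		return new WP_Error('http_request_failed', curl_error($ch));
	}
	
	// Headers and body are separated by a blank line (CRLF CRLF)
	list($header, $body) = array_pad(explode("\r\n\r\n", $data, 2), 2, '');
	$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
	
	if ($http_code == 301 || $http_code == 302) {
		$matches = array();
		preg_match('/Location:(.*?)\n/', $header, $matches);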
		$url = @parse_url(trim(array_pop($matches)));
		
		if (!$url) {
			//couldn't process the url to redirect to
			return new WP_Error('http_request_failed', __('W3 Redirect malformed URL'));
		}
		
		$last_url = parse_url(curl_getinfo($ch, CURLINFO_EFFECTIVE_URL));
		// Fill in any parts missing from a relative Location using the last URL
		if (empty($url['scheme']))
			$url['scheme'] = $last_url['scheme'];
		if (empty($url['host']))
			$url['host'] = $last_url['host'];
		if (empty($url['path']))
			$url['path'] = $last_url['path'];
		$new_url = $url['scheme'] . '://' . $url['host'] . $url['path'] . (!empty($url['query']) ? '?' . $url['query'] : '');
		
		curl_setopt($ch, CURLOPT_URL, $new_url);
		return w3_curl_redir_exec($ch,$r);
	}
	
	$response = array();
	$response['code'] = $http_code;
	$response['message'] = get_status_header_desc($response['code']);
	return array( 'headers' => array(), 'body' => $body, 'response' => $response, 'cookies' => array() );
    }
.
.
.
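The URL rebuilding step in the workaround can be exercised on its own. The sketch below is a simplified, hypothetical helper written for this example (it is not a W3TC or WordPress function), and the header text and URLs are made-up sample data.

```php
<?php
// Hypothetical helper mirroring the URL rebuilding step of the workaround:
// parts missing from the Location value are taken from the last requested URL.
function resolve_redirect($location, $last_url) {
    $url  = parse_url(trim($location));
    $last = parse_url($last_url);
    if ($url === false || $last === false) {
        return false;
    }
    $scheme = !empty($url['scheme']) ? $url['scheme'] : $last['scheme'];
    $host   = !empty($url['host']) ? $url['host'] : $last['host'];
    $path   = !empty($url['path']) ? $url['path'] : (isset($last['path']) ? $last['path'] : '/');
    $query  = !empty($url['query']) ? '?' . $url['query'] : '';
    return $scheme . '://' . $host . $path . $query;
}

// Sample response headers for a redirect (made-up data).
$header = "HTTP/1.1 301 Moved Permanently\r\nLocation: /new-page/?x=1\r\nContent-Length: 0\r\n";

$matches = array();
if (preg_match('/Location:(.*?)\n/i', $header, $matches)) {
    echo resolve_redirect($matches[1], 'http://example.com/old-page') . "\n";
}
```

This sketch, like the workaround itself, assumes that a relative Location value starts with a slash. A path without a leading slash would be appended directly to the host name and would need extra handling.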

2 thoughts on “W3TC Cache Preload or Cache Prime”

  1. Excellent, thanks for the hard but good work !

    I was wondering when they will ever fix Fault 1 especially, as it generates an error in my server error logs and filling it up …

  2. Thank you so much, this really improved the website, page speed went from 52 to 65 on desktop, so the improvement was greatly needed.

    I had to turn it off though, the lightbox effect didn’t function on the single portfolio pages (using the Angular theme and their premium support is no help, just said to use the W3TC plugin)

    W3TC doesn’t work correctly in WP 3.2.2, selecting the minify and page cache options breaks it the same way your version does, but with no speed improvement. also using better WP minify, google libraries and lazy loading, but not much more improvement. this theme doesn’t work with super cache and other plugins haven’t been updated, like smush.it, things would be helpful at this time.

    Yours is the only one that really helped, its a real shame. Would you have any idea why it does that? Any help would be greatly appreciated on that. But I wanted to let you know that your efforts made a major improvement and I’m sure would benefit others.
    Great work! Thank you
