W3TC Cache Preload or Cache Prime

Feature Enhancement – Preload Only Uncached Pages

The current implementation of the W3TC preload function attempts to load every page in the sitemap, whether or not the page is currently cached. For a large site with thousands of pages, a full pass can take hours or days. That may be acceptable for an initial prime of all pages, but it is clearly not satisfactory for continuously refreshing expired pages that have a short TTL.

A better solution is to prime only pages that are invalid or not in the cache. We also do not want to prime pages that cannot be cached, either because they match a rejected name condition or because they contain a query string when query strings are not cacheable.
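
As a rough illustration of that rule, the sketch below decides whether a single sitemap URL should be primed. It is a standalone simplification, not the plugin code: the function name should_prime(), the $rejectPatterns and $cacheQueryStrings parameters and the $isCached callback are hypothetical stand-ins for the plugin's reject list, its query string setting and its cache lookup.

```php
<?php
/**
 * Decide whether a sitemap URL is worth priming.
 * All names here are hypothetical; they only mirror the rule described above.
 */
function should_prime($url, array $rejectPatterns, $cacheQueryStrings, callable $isCached) {
    // Skip URLs that match a rejected pattern.
    foreach ($rejectPatterns as $pattern) {
        $pattern = trim($pattern);
        if ($pattern !== '' && preg_match('~' . $pattern . '~i', $url)) {
            return false;
        }
    }

    // Skip URLs with a query string when query strings are not cached.
    if (!$cacheQueryStrings && strpos($url, '?') !== false) {
        return false;
    }

    // Prime only when there is no valid cache entry for the URL.
    return !$isCached($url);
}
```

The cache lookup is passed in as a callback so the rule can be exercised without a real cache engine.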

The solution shown here redevelops the parse_sitemap() function and prime() function in TotalCacheAdmin.php. Each sitemap page is tested for caching, and an HTTP request to fetch the page is issued only if the page requires caching. The maximum number of pages per interval is now the maximum number of HTTP requests to issue. When a site contains many cached pages, many sitemap pages may be tested before this limit is reached. Therefore, this solution manages its execution profile by limiting the testing to a finite number of pages on each iteration event. The maximum count is fixed at 100 times the configured maximum number of pages per interval.
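
The two limits can be pictured with the simplified sketch below. It is not the listing that follows, only an illustration under assumptions: scan_for_uncached() and the $isCached callback are hypothetical names, and the boundary handling is simplified compared with the real parse_sitemap().

```php
<?php
/**
 * Scan sitemap URLs from $start, collecting at most $limit uncached URLs
 * while probing at most 100 * $limit entries.
 * Returns array(urls to prime, next start offset); an offset of 0 means
 * the scan reached the end of the sitemap.
 */
function scan_for_uncached(array $urls, $start, $limit, callable $isCached) {
    $prime = array();
    $maxProbes = 100 * $limit;
    $probes = 0;
    $count = count($urls);

    for ($i = $start; $i < $count; $i++) {
        // Stop before probing entry $i once either limit has been reached.
        if ($probes >= $maxProbes || count($prime) >= $limit) {
            return array($prime, $i); // resume here on the next event
        }
        $probes++;
        if (!$isCached($urls[$i])) {
            $prime[] = $urls[$i];
        }
    }

    return array($prime, 0);
}
```

A caller would issue HTTP requests for the returned URLs and, when the offset is non-zero, schedule another event that resumes from that offset.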

This solution also maintains a special cached page ‘w3tc_primetrace’ where event history and trace information is written. A fixed limit of one day of history information is retained. This history can be viewed from the Cache Preload section of the W3TC administrator menus or by accessing this page URL from your site. For example, see http://www.wmiles.com/w3tc_primetrace to view the trace file of this site.
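
One way to keep a rolling day of history is sketched below. This is an alternative illustration rather than the trimming logic used in the listing: append_trace() is a hypothetical helper, and it assumes every trace line begins with a 'Y-m-d H:i:s' timestamp and that lines are separated by "\r\n".

```php
<?php
/**
 * Append a trace line and drop entries older than $maxAge seconds.
 * Hypothetical helper; assumes each line starts with a 'Y-m-d H:i:s' stamp.
 */
function append_trace($history, $message, $now, $maxAge = 86400) {
    $lines = ($history === '') ? array() : explode("\r\n", $history);
    $lines[] = date('Y-m-d H:i:s', $now) . ' ' . $message;

    $kept = array();
    foreach ($lines as $line) {
        // The first 19 characters hold the timestamp of the entry.
        $stamp = strtotime(substr($line, 0, 19));
        if ($stamp !== false && $stamp > $now - $maxAge) {
            $kept[] = $line;
        }
    }

    return implode("\r\n", $kept);
}
```

The returned string would then be written back to the trace entry with the same one day lifetime.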

Kisekae World has a one-hour global TTL for all cache files. The home page and all other pages are refreshed on an hourly basis. WordPress category lists and tag lists have a one-day cache expiry time. All Gallery directories are set to a one-day cache expiry time. All WordPress posts have a one-week cache expiry time. All Gallery photo pages have a one-month cache expiry time.

 

w3-total-cache/lib/W3/Plugin/PgCacheAdmin.php

A new prime() function and parse_sitemap() function.

.
.
.
    /** 
     * Prime cache 
     * 
     * @param integer $start 
     * @return void 
     */ 
    function prime($start = 0) { 
        $start = (int) $start; 

        /** 
         * Don't start cache prime if queues are still scheduled 
         */ 
        if ($start == 0) { 
            $crons = _get_cron_array(); 
            foreach ($crons as $timestamp => $hooks) { 
                foreach ($hooks as $hook => $keys) { 
                    foreach ($keys as $key => $data) { 
                        if ($hook == 'w3_pgcache_prime' && count($data['args'])) { 
                            return; 
                        } 
                    } 
                } 
            } 
        } 

        $interval = $this->_config->get_integer('pgcache.prime.interval'); 
        $limit = $this->_config->get_integer('pgcache.prime.limit'); 
        $sitemap = $this->_config->get_string('pgcache.prime.sitemap'); 

        /** 
         * Parse XML sitemap 
         */ 
        $urls = $this->parse_sitemap($sitemap,$start); 
        if ($start > 0) { 
            wp_schedule_single_event(time() + $interval, 'w3_pgcache_prime', array( $start )); 
        } 

        /** 
         * Make HTTP requests and prime cache 
         */ 
        if (count($urls) > 0) { 
            require_once W3TC_INC_DIR . '/functions/http.php'; 
            require_once W3TC_INC_DIR . '/functions/url.php'; 

            $data = ''; 
            $queue = array_slice($urls, 0, $limit); 
            foreach ($queue as $url) { 
                $url = w3_url_format($url, array('w3tc_preload' => 1)); 
                $result = w3_http_get($url, array('user-agent' => '')); 
                $msg = (is_wp_error($result)) ? $result->get_error_message() : $result['response']['code']; 
                $parse_url = @parse_url($url); 
                $page_key = ($parse_url && isset($parse_url['path'])) ? $this->_get_page_key($parse_url['path']) : 'unknown'; 
                $data .= sprintf("\r\n%s", date('Y-m-d H:i:s') . ' Prime: ' . $url . ' Key: ' . $page_key . ' Result: ' . $msg); 
            } 

            /** 
             * Trace prime cache 
             */ 
            $cache = & $this->_get_cache(); 
            $page_key = $this->_get_page_key('w3tc_primetrace'); 
            $tracedata = $cache->get($page_key); 
            if ($tracedata) { 
                $data = $tracedata . $data; 
            } 
            $cache->set($page_key, $data, 86400); 
        } 
    } 

    /** 
     * Parses sitemap 
     * 
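     * @param string $url 
     * @param integer $start Sitemap offset to resume from; set to the next offset, or 0 when the scan is complete 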
     * @return array 
     */ 
    function parse_sitemap($url,&$start) { 
        require_once W3TC_INC_DIR . '/functions/http.php'; 

        $urls = array(); 
        $response = w3_http_get($url); 
        $msg = (is_wp_error($response)) ? $response->get_error_message() : $response['response']['code']; 
        if (!is_wp_error($response) && $response['response']['code'] == 200) { 
            $url_matches = null; 
            $sitemap_matches = null; 
            if (preg_match_all('~<sitemap>(.*?)</sitemap>~is', $response['body'], $sitemap_matches)) {
                $loc_matches = null; 
                foreach ($sitemap_matches[1] as $sitemap_match) { 
                    if (preg_match('~<loc>(.*?)</loc>~is', $sitemap_match, $loc_matches)) { 
                        $loc = trim($loc_matches[1]); 
                        if ($loc) { 
                            // parse_sitemap() takes $start by reference, so pass a variable 
                            $sub_start = 0; 
                            $urls = array_merge($urls, $this->parse_sitemap($loc, $sub_start)); 
                        } 
                    } 
                } 
            } elseif (preg_match_all('~<url>(.*?)</url>~is', $response['body'], $url_matches)) { 
                $locs = array(); 
                $loc_matches = null; 
                $priority_matches = null; 
                foreach ($url_matches[1] as $url_match) { 
                    $loc = ''; 
                    $priority = 0; 
                    if (preg_match('~<loc>(.*?)</loc>~is', $url_match, $loc_matches)) { 
                        $loc = trim($loc_matches[1]); 
                    } 
                    if (preg_match('~<priority>(.*?)</priority>~is', $url_match, $priority_matches)) {
                        $priority = (double) trim($priority_matches[1]); 
                    } 
                    if ($loc && $priority) { 
                        $locs[$loc] = $priority; 
                    } 
                } 

                arsort($locs); 
                $urls = array_keys($locs); 
            } 
        } 

        /** 
         * Return only uncached URLs that are not rejected from caching 
         */ 
        $data = ''; 
        if ($start == 0) { 
            $data .= sprintf("\r\n%s", date('Y-m-d H:i:s') . ' BEGIN Sitemap ' . $url . ' Result: ' . $msg); 
        } 

        $prime = array(); 
        $cache = & $this->_get_cache(); 
        $queue = array_slice($urls, $start); 
        $queuestart = $start; 
        $start = $probecount = 0 ; 
        $reject_uri = $this->_config->get_array('pgcache.reject.uri'); 
        $reject_uri = array_map('w3_parse_path', $reject_uri); 

        foreach ($queue as $key => $url) { 
            if ($this->_check_request_uri($reject_uri, $url)) { 
                $parse_url = @parse_url($url); 
                if ($parse_url && isset($parse_url['path'])) { 
                    $urlprime = rtrim($parse_url['path'],'/'); 
                    $urlprime = '/' . ltrim($urlprime,'/'); 
                    $page_key = $this->_get_page_key($urlprime); 
                    $valid = $cache->is_valid($page_key); 

                    if (!$valid) { 
                        $prime[] = $url; 
                        if (count($prime) > $this->_config->get_integer('pgcache.prime.limit')) { 
                            $start = $queuestart + $key; 
                            break; 
                        } 
                    } 
                } 
            } 

            /** 
             * Probe a maximum 100 times the prime limit 
             */ 
            $probecount++; 
            if ($probecount >= 1 + (100 * $this->_config->get_integer('pgcache.prime.limit'))) { 
                $start = $queuestart + $key; 
                break; 
            }
        } 

        if (!is_wp_error($response) && $response['response']['code'] == 200) { 
            if ($start == 0) { 
                $data .= sprintf("\r\n%s", date('Y-m-d H:i:s') . ' END Sitemap ' . $queuestart . ' to ' . count($urls) . ' entries processed'); 
            } else { 
                $data .= sprintf("\r\n%s", date('Y-m-d H:i:s') . ' SCAN Sitemap ' . $queuestart . ' to ' . $start); 
            } 
        } else { 
            if ($queuestart != 0) { 
                $data .= sprintf("\r\n%s", date('Y-m-d H:i:s') . ' SCAN Sitemap ' . $queuestart . ' Result: ' . $msg); 
            } 
        } 

        /** 
         * Trace sitemap probe event sequence 
         */ 
        $cache = & $this->_get_cache(); 
        $page_key = $this->_get_page_key('w3tc_primetrace'); 
        $tracedata = $cache->get($page_key); 
        if ($tracedata) { 
            $priorday = "\r\n" . date('Y-m-d',time()); 
            $history = (strpos($tracedata,$priorday) !== false) ? strstr($tracedata,$priorday) : $tracedata; 
            $data = $history . $data; 
        } 
        $cache->set($page_key, $data, 86400); 

        return $prime; 
    } 

    /** 
     * Checks request URI 
     * 
     * @param string $request_uri 
     * @return boolean 
     */ 
    function _check_request_uri($reject_uri, $request_uri) { 
        $auto_reject_uri = array( 'wp-login', 'wp-register' ); 
        foreach ($auto_reject_uri as $uri) { 
            if (strstr($request_uri, $uri) !== false) { 
                return false; 
            } 
        } 

        foreach ($reject_uri as $expr) { 
            $expr = trim($expr); 
            if ($expr != '' && preg_match('~' . $expr . '~i', $request_uri)) { 
                return false; 
            } 
        } 

        if (!$this->_config->get_boolean('pgcache.cache.query') && strstr($request_uri, '?') !== false) { 
            return false; 
        } 

        return true; 
    }
.
.
.
    /**
     * Check if WPSC rules exists
     *
     * @return boolean
     */
    function check_rules_wpsc() {
        $path = w3_get_pgcache_rules_core_path();

        return (($data = @file_get_contents($path)) && w3_has_rules(w3_clean_rules($data), W3TC_MARKER_BEGIN_PGCACHE_WPSC, W3TC_MARKER_END_PGCACHE_WPSC));
    }

    /** 
     * Returns cache object 
     * 
     * @return W3_Cache_Base 
     */ 
    function &_get_cache() { 
        static $cache = array(); 
        if (!isset($cache[0])) { 
            $engine = $this->_config->get_string('pgcache.engine'); 
            switch ($engine) { 
                case 'memcached': 
                    $engineConfig = array( 'servers' => $this->_config->get_array('pgcache.memcached.servers'),
                        'persistant' => $this->_config->get_boolean('pgcache.memcached.persistant') ); 
                    break; 
                case 'file': 
                    $engineConfig = array( 'cache_dir' => W3TC_CACHE_FILE_PGCACHE_DIR, 
                        'locking' => $this->_config->get_boolean('pgcache.file.locking'), 
                        'flush_timelimit' => $this->_config->get_integer('timelimit.cache_flush') ); 
                    break; 
                case 'file_generic': 
                    $engineConfig = array( 'exclude' => array( '.htaccess' ), 
                        'expire' => $this->_lifetime, 
                        'cache_dir' => W3TC_CACHE_FILE_PGCACHE_DIR, 
                        'locking' => $this->_config->get_boolean('pgcache.file.locking'), 
                        'flush_timelimit' => $this->_config->get_integer('timelimit.cache_flush'),
                        'expire_uri' => $this->_config->get_array('pgcache.expire.uri') ); 
                    break; 
                default: 
                    $engineConfig = array(); 
            } 

            require_once W3TC_LIB_W3_DIR . '/Cache.php'; 
            @$cache[0] = & W3_Cache::instance($engine, $engineConfig); 
        } 

        return $cache[0]; 
    } 

    /** 
     * Returns page key 
     * 
     * @param string $request_uri 
     * @return string 
     */ 
    function _get_page_key($request_uri) { 
        // replace fragment 
        $key = preg_replace('~#.*$~', '', $request_uri); 

        $enhanced_mode = ($this->_config->get_string('pgcache.engine') == 'file_generic');
        if ($enhanced_mode) { 
            // URL decode 
            $key = urldecode($key); 
            // replace double slashes 
            $key = preg_replace('~[/\\\\]+~', '/', $key); 
            // replace query string 
            $key = preg_replace('~\?.*$~', '', $key); 
            // replace index.php 
            $key = str_replace('/index.php', '/', $key); 
            // trim slash 
            $key = ltrim($key, '/'); 

            if ($key && substr($key, -1) != '/') { 
                $key .= '/'; 
            } 

            $key .= '_index'; 
        } else { 
            $key = sprintf('w3tc_%s_page_%s', w3_get_host_id(), md5($key)); 
        } 

        if ($enhanced_mode) { 
            /** 
             * Append HTML extension 
             */ 
            $key .= '.html'; 
        } 

        /** 
         * Allow to modify page key by W3TC plugins 
         */ 
        $key = w3tc_do_action('w3tc_pgcache_cache_key', $key); 
        return $key; 
    }

 

w3-total-cache/lib/W3/Cache/Base.php

To define a new cache validity function in the base class

.
.
.
    /**
     * Returns data
     *
     * @abstract
     * @param string $key
     * @return mixed
     */
    function get($key) {
        return false;
    }

    /**
     * Returns cache validity
     *
     * @abstract
     * @param string $key
     * @return boolean
     */
    function is_valid($key) {
        return false;
    }
.
.
.

 

w3-total-cache/lib/W3/Cache/File.php

The implementation of the is_valid() function for basic mode disk caching

.
.
.
    /**
     * Returns data
     *
     * @param string $key
     * @return mixed
     */
    function get($key) {
        $var = false;
        $path = $this->_cache_dir . DIRECTORY_SEPARATOR . $this->_get_path($key);

        if (is_readable($path)) {
            $ftime = @filemtime($path);

            if ($ftime) {
                $fp = @fopen($path, 'rb');

                if ($fp) {
                    if ($this->_locking) {
                        @flock($fp, LOCK_SH);
                    }

                    $expires = @fread($fp, 4);

                    if ($expires !== false) {
                        list(, $expire) = @unpack('L', $expires);
                        $expire = ($expire && $expire <= W3TC_CACHE_FILE_EXPIRE_MAX ? $expire : W3TC_CACHE_FILE_EXPIRE_MAX);

                        if ($ftime > time() - $expire) {
                            $data = '';

                            while (!@feof($fp)) {
                                $data .= @fread($fp, 4096);
                            }

                            $var = @unserialize($data);
                        }
                    }

                    if ($this->_locking) {
                        @flock($fp, LOCK_UN);
                    }

                    @fclose($fp);
                }
            }
        }

        return $var;
    }

    /**
     * Returns cache validity
     *
     * @param string $key
     * @return boolean
     */
    function is_valid($key) {
        $var = false;
        $path = $this->_cache_dir . DIRECTORY_SEPARATOR . $this->_get_path($key);

        if (is_readable($path)) {
            $ftime = @filemtime($path);

            if ($ftime) {
                $fp = @fopen($path, 'rb');

                if ($fp) {
                    $expires = @fread($fp, 4);

                    if ($expires !== false) {
                        list(, $expire) = @unpack('L', $expires);
                        $expire = ($expire && $expire <= W3TC_CACHE_FILE_EXPIRE_MAX ? $expire : W3TC_CACHE_FILE_EXPIRE_MAX);

                        if ($ftime > time() - $expire) {
                            $var = true;
                        }
                    }

                    @fclose($fp);
                }
            }
        }

        return $var;
    }
.
.
.
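
The basic mode check above depends on the cache file layout: a 4-byte expiry value packed with 'L', followed by the serialized data. The sketch below shows that layout in isolation so the validity test can be tried outside the plugin. The function names write_entry() and entry_is_valid() are hypothetical, and locking and the W3TC_CACHE_FILE_EXPIRE_MAX cap are left out for brevity.

```php
<?php
/** Write a cache file: 4-byte expiry header followed by serialized data. */
function write_entry($path, $value, $expire) {
    return file_put_contents($path, pack('L', $expire) . serialize($value)) !== false;
}

/** Return true if the file exists and its age is within the stored expiry. */
function entry_is_valid($path, $now) {
    clearstatcache(true, $path);
    if (!is_readable($path)) {
        return false;
    }

    // Read only the header; the cached body is never loaded.
    $header = file_get_contents($path, false, null, 0, 4);
    if ($header === false || strlen($header) < 4) {
        return false;
    }

    $fields = unpack('Lexpire', $header);
    $mtime = filemtime($path);

    return $mtime !== false && $mtime > $now - $fields['expire'];
}
```

Because only four bytes are read, testing thousands of sitemap entries stays cheap compared with calling get() on each one.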

 

w3-total-cache/lib/W3/Cache/File/Generic.php

The implementation of the is_valid() function for enhanced mode disk caching

.
.
.
    /**
     * Returns data
     *
     * @param string $key
     * @return string
     */
    function get($key) {
        $var = false;
        $path = $this->_cache_dir . '/' . $this->_get_path($key);

        if (is_readable($path)) {
            $ftime = @filemtime($path);

            if ($ftime && $ftime > (time() - $this->_get_expiry($key))) {
                $fp = @fopen($path, 'r');

                if ($fp) {
                    if ($this->_locking) {
                        @flock($fp, LOCK_SH);
                    }

                    $var = '';

                    while (!@feof($fp)) {
                        $var .= @fread($fp, 4096);
                    }

                    // release the lock before closing the file handle
                    if ($this->_locking) {
                        @flock($fp, LOCK_UN);
                    }

                    @fclose($fp);
                }
            }
        }

        return $var;
    }

    /**
     * Returns cache validity
     *
     * @param string $key
     * @return boolean
     */
    function is_valid($key) {
        $var = false;
        $path = $this->_cache_dir . '/' . $this->_get_path($key);

        if (is_readable($path)) {
            $ftime = @filemtime($path);

            if ($ftime && $ftime > (time() - $this->_get_expiry($key))) {
                $var = true;
            }
        }

        return $var;
    }

    /**
     * Returns cache file path by key
     *
     * @param string $key
     * @return string
     */
    function _get_path($key) {
        return $key;
    }

    /**
     * Returns cache file expiry time by key
     *
     * @param string $key
     * @return int
     */
    function _get_expiry($key) {
        $expires = $this->_expire;
        $expire_uri = $this->_expire_uri;

        foreach ($expire_uri as $expr) {
            $args = preg_split("/:/", trim($expr), 2);
            $expr = (count($args) > 1) ? $args[1] : '';
            if ($expr != '' && preg_match('~' . $expr . '~i', $key)) {
                $expires = (int) $args[0];
                break;
            }
        }

        return $expires;
    }
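
Judging from _get_expiry() above, each expire rule is a string of the form seconds:pattern, and the first pattern that matches the page key wins. The standalone sketch below exercises that lookup. The function name expiry_for_key() and the three sample rules are hypothetical and only illustrate the format; they are not the rules configured on this site.

```php
<?php
/**
 * Return the expiry in seconds for a page key.
 * Each rule is "seconds:regex"; the first matching rule wins.
 */
function expiry_for_key($key, array $rules, $default) {
    foreach ($rules as $rule) {
        $parts = explode(':', trim($rule), 2);
        if (count($parts) === 2 && $parts[1] !== ''
            && preg_match('~' . $parts[1] . '~i', $key)) {
            return (int) $parts[0];
        }
    }

    return $default;
}

// Hypothetical rules: one day for category lists, one week for dated posts,
// one month for gallery photo pages.
$rules = array(
    '86400:^category/',
    '604800:^\d{4}/',
    '2592000:^gallery/.*photo',
);
```

Since the first match wins, more specific patterns should be listed before more general ones.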

2 thoughts on “W3TC Cache Preload or Cache Prime”

  1. Excellent, thanks for the hard but good work !

    I was wondering when they will ever fix Fault 1 especially, as it generates an error in my server error logs and filling it up …

  2. Thank you so much, this really improved the website, page speed went from 52 to 65 on desktop, so the improvement was greatly needed.

    I had to turn it off though, the lightbox effect didn’t function on the single portfolio pages (using the Angular theme and their premium support is no help, just said to use the W3TC plugin)

    W3TC doesn’t work correctly in WP 3.2.2; selecting the minify and page cache options breaks it the same way your version does, but with no speed improvement. I am also using Better WP Minify, Google Libraries and lazy loading, but without much more improvement. This theme doesn’t work with Super Cache, and other plugins that would be helpful at this time, like Smush.it, haven’t been updated.

    Yours is the only one that really helped; it’s a real shame. Would you have any idea why it does that? Any help would be greatly appreciated. But I wanted to let you know that your efforts made a major improvement and I’m sure would benefit others.
    Great work! Thank you
