WordPress

Q Are transients garbage collected?

This question got me thinking Transient RSS feeds in wp_options not removed automatically?

Transients are supposed to expire and be deleted. However the only way I see this handled is when transient is expired and requested, then it is deleted during request.

What if transient is expired but never requested after that? From description in Codex I thought that some kind of garbage collection is implied. Now I am not so sure and can't find any code that performs such.

So will it just be stuck in database forever?

A

If someone can't show me otherwise it seems that transients are not garbage collected after all. What makes it worse is that unlike options they are not guaranteed to be stored in database. So there is no reliable way to fetch list of all transients to check them for expiration.

Some makeshift code to do garbage collection if database is used for storage:

add_action( 'wp_scheduled_delete', 'delete_expired_db_transients' );

function delete_expired_db_transients() {

    global $wpdb, $_wp_using_ext_object_cache;

    if( $_wp_using_ext_object_cache )
        return;

    $time = isset ( $_SERVER['REQUEST_TIME'] ) ? (int)$_SERVER['REQUEST_TIME'] : time() ;
    $expired = $wpdb->get_col( "SELECT option_name FROM {$wpdb->options} WHERE option_name LIKE '_transient_timeout%' AND option_value < {$time};" );

    foreach( $expired as $transient ) {

        $key = str_replace('_transient_timeout_', '', $transient);
        delete_transient($key);
    }
}
A

Moving some of the comments from the discussion into an answer, with re-wording and re-formatting..

Basically, what it comes down to is that unless you have a super extreme case, they don't really need to be "garbage collected". If you never fetch them, then it doesn't matter if they're there or not.

See, transients are stored in the options table by default. In a base install, the options table will have maybe 100 entries in it. Each transient adds two more entries, but even if you have thousands, they don't affect the site speed, since they're not autoloaded.

On startup, WordPress loads the options into memory, but it only loads options that have their autoload flag turned on. Transients don't get this, and so don't get loaded into memory. Only transients that get actually used later will incur a cost.

From the database's perspective, the options table has indexes on both the option Id and the option name. Transients are always loaded based on the name (key), and so the lookups for them are always simple selects on a single unique key value. Thus the lookup is O(log(n)) and is super fast. With a Big-O of log(n), you'd have to get into the millions and millions of rows before it became noticable. Frankly, the overhead in the setup and teardown of the query, along with the actual data transfer, is way longer. The query itself runs in essentially zero-time by comparison. So simply having extra unused rows doesn't affect anything but using extra disk space.

Indexing in databases is one of those deep-read kind of ideas that doesn't make sense to people who haven't actually understood what's going on behind the scenes. Databases are designed for fast data retrieval, from the ground up, and can handle this sort of thing without issues. This is a pretty good read: http://en.wikipedia.org/wiki/Index_(database)

Now, cleanup in the most obvious way (calling SQL DELETE on them) doesn't actually delete them from the database. It just removes them from the index and marks the row as "deleted". Again, this is just how databases work. To actually clear up the disk space, you have to then continue on and do an OPTIMIZE TABLE afterwards, and this is not a fast operation. It takes time. Probably more time than it's worth. It's probably not enough to give you a savings in CPU time, in total.

If you have some case that is causing a continual insertion of new transients that are not being used, then you need to find the underlying problem instead. What is inserting these transients? Are they using a changing or mutating key? If so, then the plugin or code causing this should be fixed to, basically, not do that. That will be more helpful, because it's likely that the code that isn't creating them properly also isn't retrieving them, and thus doing more work than it has to do.

On the other hand, there may be a case where transients are being created for something like every post. This may indeed be perfectly acceptable. I do this myself in SFC, to store incoming comments from Facebook. Each post has a potential transient associated with it, which means two extra rows per post. If you have 10k posts, you'll have 20k rows in the options table (eventually). This isn't bad or slow, because again, there's very little difference between 100 rows and 20,000 rows as far as databases really care. It's all indexed. It's fast as heck. Sub-sub-milliseconds.

When you start getting into millions of rows, then I'd be worried. When the options table size increases above hundreds of megabytes, then I'd be concerned enough to take a closer look. But generally speaking, this isn't an issue except for extreme cases. It's certainly not an issue for anything smaller than something like a large news site, with hundreds of thousands of posts. And for any site large enough for it to be a problem, you should be using an external object cache of some sort, and in that case, the transients get automagically stored there instead of in the database.

A

Otto - I couldn't disagree with you more. The issue is that eventually with all those transients, the size of the table becomes ridiculous. It doesn't take millions of rows to bog down. I'm currently dealing with an options table that has over 130k rows, and hangs regularly. Because the value field is a large text type, even looking for only the "autoload" rows becomes a nightmare of performance. Those value fields are stored separately from the rest of the row data. Even though it's logically part of the same table, joins must happen in order to pull up the rows you want. Joins that now take forever because the data you need is spread all over the place on disk. Profiling (using jet profiler for mysql) has proven this.

Adding auto-load to the clustered key might help solve this problem. Clustering on Autoload Desc, ID ASC for example, would allow all the autoload rows to bunch together first on disk. Even still I think you're looking at a huge strain from a DB perspective.

Personally I think the design of this system is wack. The options table seems to have turned into a general catch-all for a lot of things. That's fine if the value field is small enough to be included on the same page as the rest of the rowdata, and can be indexed effectively. Unfortunately that's not the case. Whoever designed this needs to go back to DB101 class.