Counting associated records

Published on and tagged with cakephp  model

If you have a hasMany association, a common use case is that you want to list records of one table plus the number of associated records. For example you want to list the ten latest posts together with the number of comments each post has.

There are different ways how you can implement this. Probably the most common approaches are to use a counter cache or to use a sub query.

A counter cache is simply an additional counter field in the posts table (if we stay with the example of Post hasMany Comments). If a comment is added to a post, the counter is incremented. And it is decremented, if a comment is removed. A simple concept. You can find tutorials about using this concept with CakePHP on the blog of Robert Beekman (using the counter_cache behavior from the bakery) and on the blog of Christian Winther (using the built-in counter cache functionality from CakePHP 1.2beta).

The second approach is to use a sub query. It is quite straight-forward, the only “difficulty” is to bring the result into the format used by CakePHP. Here the code:

// in the Post model
function findAllAndCountComments($conditions) {
    $this->recursive = 0;
    $posts = $this->findAll($conditions, array('Post.*', "(select count(c.id) from comments as c where c.post_id = Post.id) as 'comment_count'"));
    $posts = $this->toCakeArrayFormat($posts);
			
    return $posts;
}

private function toCakeArrayFormat($posts) {
    $result = array();
			
    for ($i = 0; $i < sizeof($posts); $i++) {
        $result[$i]['Post'] = $posts[$i]['Post'];
        $result[$i]['Post']['comment_count'] = $posts[$i][0]['comment_count'];
    }
			
     return $result;
}

From a performance point of view this approach is not as good as the counter cache approach. It requires n+1 selects on the database side (maybe it gets internally optimized, I don’t know) whereas with a counter cache only one select is needed. On the other hand it doesn’t depend on some “magic” functionality. But in the end it depends on your preferences which approach you will use.

12 comments baked

  • Simon Brüchner

    Small performance improvement:

    instead of:
    for ($i = 0; $i < sizeof($posts); $i++) {

    perhaps this:
    $nr = sizeof($posts);
    for ($i = 0; $i < $nr; $i++) {

  • Ivo

    The subquery you use in your example will probably not get optimized internally, because the Post.id differs for every record. Instead you should use a group by clause. This is a big performance upgrade when you compare it to using subqueries.

    $fields = array(‘Post.*’, ‘COUNT(Comment.id) as comment_count’);
    $conditions = ‘GROUP BY Post.id’;
    $this->findAll($conditions, $fields);

    Don’t forget though to do this recursively (otherwise the table Comment will not be recognized) OR recursive = 0 and bind the Comment model on the fly with a hasmany association.

  • Michał Bachowski

    Hi,

    I agree with Ivo – approach with GROUP BY is much faster.
    I usually write more complex queries by myself – I don`t try to fit the query to CakePHP manner.

    BTW: if you use PostgreSQL, you can write:
    ‘COUNT(Comment.id) as Post__comment_count’

    And CakePHP will parse the column to own array format automaticaly.

    Regards

  • GreyCells

    To use GROUP BY in a $conditions array:

    $conditions = array(‘Post.id’ => 42, ‘1=1 GROUP BY Post.id);

    I think the same applies to HAVING (i.e. ‘1=1 HAVING count(Comment.id) > 5)

  • cakebaker

    @all: Thanks for your comments!

    @Simon: Yes, your code snippet is more efficient than what I wrote.

    @Ivo, Michal, GreyCells: To make it work with GROUP BY I have to use a manual query as CakePHP no longer uses a join:

    $this->execute('SELECT Post.*, COUNT(Comment.id) AS comment_count FROM posts AS Post LEFT OUTER JOIN comments AS Comment ON Comment.post_id = Post.id GROUP BY Post.id');
    
  • Brandon P

    @cakebaker:
    What is your opinion on using this kind of functionality at the database level (stored procedures, callbacks, etc)? Seems to me that is the most appropriate place.

  • Brandon P

    Sorry, callbacks = triggers :) Got CakePHP on the brain

  • cakebaker

    @Brandon: Good question. From a logical point of view it is the appropriate place, and in a company I worked some years ago we used stored procedures for almost everything (with MS SQL server). But in the meantime I don’t use stored procedures anymore, as it is much easier to implement this logic on the model level instead of the database level. You don’t have to learn a new language, and you can use the tools you are used to (at least with SQL server it was a pain to write stored procedures *g*).

  • Will

    or,

    echo count($post[‘ApprovedComment’])

    ?

  • cakebaker

    @Will: Hm, it seems like your comment got eaten by WordPress :|

  • neudor

    Hey there. There is another very useful way to retrieve the number of associated records.
    I did this for Posts and Comments, but in my case their names are Threads and Messages

    $this->Thread->recursive = -1;
    $threads = $this->paginate('Thread');
    $this->set('threads', $threads);
        
    $thread_ids = array();
    for($i=0; $i<count($threads); $i++) {
        $thread_ids[] = $threads[$i]['Thread']['id'];
    }
        
    $this->Message->recursive = -1;
    $messages = $this->Message->find('all', array(
        'conditions' => array(
            'Message.created <=' => date("Y-m-d H:i:s"),
            'thread_id' => $thread_ids
        ),
        'group' => 'Message.thread_id',
        'fields' => array('thread_id', 'COUNT(*) as `count`')
    ));
    $count = array();
    for($i=0; $i<count($messages); $i++) {
        $count[$messages[$i]['Message']['thread_id']] = $messages[$i]['0']['count'];
    }
        
    for($i=0; $i<count($threads); $i++)  {
        $id = $threads[$i]['Thread']['id'];
        $threads[$i]['Thread']['messages_count'] = $count[$id];
    }
        
    debug($threads);

    This approach requires only two queries in any case.

  • cakebaker

    @neudor: Yes, this is a more efficient approach than what I described in the article. Thanks for mentioning it!

Bake a comment




(for code please use <code>...</code> [no escaping necessary])

© daniel hofstetter. Licensed under a Creative Commons License