If you have a hasMany association, a common use case is that you want to list records of one table plus the number of associated records. For example you want to list the ten latest posts together with the number of comments each post has.
There are different ways how you can implement this. Probably the most common approaches are to use a counter cache or to use a sub query.
A counter cache is simply an additional counter field in the posts table (if we stay with the example of Post hasMany Comments). If a comment is added to a post, the counter is incremented. And it is decremented, if a comment is removed. A simple concept. You can find tutorials about using this concept with CakePHP on the blog of Robert Beekman (using the counter_cache behavior from the bakery) and on the blog of Christian Winther (using the built-in counter cache functionality from CakePHP 1.2beta).
The second approach is to use a sub query. It is quite straight-forward, the only “difficulty” is to bring the result into the format used by CakePHP. Here the code:
// in the Post model
function findAllAndCountComments($conditions) {
$this->recursive = 0;
$posts = $this->findAll($conditions, array('Post.*', "(select count(c.id) from comments as c where c.post_id = Post.id) as 'comment_count'"));
$posts = $this->toCakeArrayFormat($posts);
return $posts;
}
private function toCakeArrayFormat($posts) {
$result = array();
for ($i = 0; $i < sizeof($posts); $i++) {
$result[$i]['Post'] = $posts[$i]['Post'];
$result[$i]['Post']['comment_count'] = $posts[$i][0]['comment_count'];
}
return $result;
}
From a performance point of view this approach is not as good as the counter cache approach. It requires n+1 selects on the database side (maybe it gets internally optimized, I don’t know) whereas with a counter cache only one select is needed. On the other hand it doesn’t depend on some “magic” functionality. But in the end it depends on your preferences which approach you will use.

Small performance improvement:
instead of:
for ($i = 0; $i < sizeof($posts); $i++) {
perhaps this:
$nr = sizeof($posts);
for ($i = 0; $i < $nr; $i++) {
The subquery you use in your example will probably not get optimized internally, because the Post.id differs for every record. Instead you should use a group by clause. This is a big performance upgrade when you compare it to using subqueries.
$fields = array(’Post.*’, ‘COUNT(Comment.id) as comment_count’);
$conditions = ‘GROUP BY Post.id’;
$this->findAll($conditions, $fields);
Don’t forget though to do this recursively (otherwise the table Comment will not be recognized) OR recursive = 0 and bind the Comment model on the fly with a hasmany association.
Hi,
I agree with Ivo – approach with GROUP BY is much faster.
I usually write more complex queries by myself – I don`t try to fit the query to CakePHP manner.
BTW: if you use PostgreSQL, you can write:
‘COUNT(Comment.id) as Post__comment_count’
And CakePHP will parse the column to own array format automaticaly.
Regards
To use GROUP BY in a $conditions array:
$conditions = array(’Post.id’ => 42, ‘1=1 GROUP BY Post.id);
I think the same applies to HAVING (i.e. ‘1=1 HAVING count(Comment.id) > 5)
@all: Thanks for your comments!
@Simon: Yes, your code snippet is more efficient than what I wrote.
@Ivo, Michal, GreyCells: To make it work with GROUP BY I have to use a manual query as CakePHP no longer uses a join:
$this->execute('SELECT Post.*, COUNT(Comment.id) AS comment_count FROM posts AS Post LEFT OUTER JOIN comments AS Comment ON Comment.post_id = Post.id GROUP BY Post.id');@cakebaker:
What is your opinion on using this kind of functionality at the database level (stored procedures, callbacks, etc)? Seems to me that is the most appropriate place.
Sorry, callbacks = triggers :) Got CakePHP on the brain
@Brandon: Good question. From a logical point of view it is the appropriate place, and in a company I worked some years ago we used stored procedures for almost everything (with MS SQL server). But in the meantime I don’t use stored procedures anymore, as it is much easier to implement this logic on the model level instead of the database level. You don’t have to learn a new language, and you can use the tools you are used to (at least with SQL server it was a pain to write stored procedures *g*).