YAML in a cake? No, thank you!

Published on and tagged with cakephp, testing

While implementing fixture support in the test suite, the question arose: which format should be used for the fixtures? The first answer was: YAML, of course. It is used in Ruby on Rails, so it cannot be bad ;-) Hm. Let’s have a look at a simple YAML example:

# urls.yml
cakephp:
  id: 1
  name: CakePHP website
  url: http://www.cakephp.org

manual:
  id: 2
  name: CakePHP manual
  url: http://manual.cakephp.org

It looks nice. But there is one “problem”. It violates the DRY (Don’t repeat yourself) principle: the column names are repeated in each record.

So I decided to use a different approach: plain PHP. It is simple: each fixture is a class, and each record is a method. Rewritten as a class, the YAML example from above looks like this:

class Urls {
    // Each method returns one record: id, name, url
    function cakephp() {
        return array(1, 'CakePHP website', 'http://www.cakephp.org');
    }

    function manual() {
        return array(2, 'CakePHP manual', 'http://manual.cakephp.org');
    }
}

Simple, isn’t it?
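To use such a fixture, the test suite would have to discover the record methods. A minimal sketch, assuming a hypothetical helper `fixture_records()` (not part of CakePHP), could rely on PHP's built-in `get_class_methods()` and call each method in turn:

```php
<?php
// The fixture class from above: one method per record.
class Urls {
    function cakephp() {
        return array(1, 'CakePHP website', 'http://www.cakephp.org');
    }

    function manual() {
        return array(2, 'CakePHP manual', 'http://manual.cakephp.org');
    }
}

// Hypothetical helper: call every public method of the fixture class
// and collect the returned records, keyed by method name.
function fixture_records($className) {
    $fixture = new $className();
    $records = array();
    foreach (get_class_methods($className) as $method) {
        $records[$method] = $fixture->$method();
    }
    return $records;
}

$records = fixture_records('Urls');
// $records['manual'] is array(2, 'CakePHP manual', 'http://manual.cakephp.org')
```

One consequence of this design is that every public method counts as a record, so helper methods would have to be kept out of the fixture class (or filtered out by the loader).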

9 comments baked

  • Patrice

    And where do you define the order of the columns? Does it take the order in the database?

    Not sure I like your approach because there is NO inline documentation. And a change in the table layout will probably require a rewrite of every single fixture for that table. Unless you always add columns at the end – which I don’t.

  • Evan Broder

    Agreed with Patrice. I know that apparently the Cake Overlords decided they didn’t want YAML in Cake, but I think that repeating column names is much better than having no column names at all.

  • scott lewis

    “No Magic Numbers” is a far, far more important rule than DRY. Your Cake implementation has the magic numbers 0, 1 and 2. To eliminate them, we need to use an associative array and define meaningful names for the keys. Which would eliminate the one ‘advantage’ of the PHP method and leave us with a more verbose syntax than the YAML option.

    Premature optimization is the root of all evil. — Donald Knuth
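For comparison, the associative-array variant suggested in this comment might look like the following sketch (the class name `UrlsAssoc` is hypothetical). No value depends on its position any more, but every record repeats the column names, just as the YAML version does:

```php
<?php
// Hypothetical variant of the fixture: each record is an associative
// array, so the meaning of a value no longer depends on its index.
class UrlsAssoc {
    function cakephp() {
        return array(
            'id'   => 1,
            'name' => 'CakePHP website',
            'url'  => 'http://www.cakephp.org',
        );
    }

    function manual() {
        return array(
            'id'   => 2,
            'name' => 'CakePHP manual',
            'url'  => 'http://manual.cakephp.org',
        );
    }
}

$fixture = new UrlsAssoc();
echo $fixture->manual()['url'], "\n"; // prints http://manual.cakephp.org
```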

  • Bret Kuhns

    I also agree with the comments above. The PHP solution uses more syntax and it seems less flexible. Plus the repetition of the PHP syntax itself seems less DRY than simply repeating column names.

  • cakebaker

    @all: Thanks for your feedback. You are right, column names are useful, so I added an array to the class:

    var $columns = array('id', 'name', 'url');

    DRY was not the only reason I chose plain PHP over YAML:
    – a PHP solution is more consistent with Cake
    – there is IDE support for PHP, but not for YAML (thanks to the templates in my IDE, I am faster at writing a method than a YAML record)
    – YAML looks ugly if you mix it with code
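With a `$columns` array in place, the positional values can be turned back into named fields when the fixture is loaded. A minimal sketch, assuming a hypothetical `named_records()` helper built on PHP's `array_combine()`:

```php
<?php
class Urls {
    // Column names, declared once for the whole fixture.
    var $columns = array('id', 'name', 'url');

    function cakephp() {
        return array(1, 'CakePHP website', 'http://www.cakephp.org');
    }

    function manual() {
        return array(2, 'CakePHP manual', 'http://manual.cakephp.org');
    }
}

// Hypothetical helper: pair each record's values with the column
// names, producing one associative array per record.
function named_records($fixture, array $recordMethods) {
    $rows = array();
    foreach ($recordMethods as $method) {
        $rows[$method] = array_combine($fixture->columns, $fixture->$method());
    }
    return $rows;
}

$rows = named_records(new Urls(), array('cakephp', 'manual'));
// $rows['cakephp']['url'] is 'http://www.cakephp.org'
```

`array_combine()` rejects a record whose number of values differs from the number of columns (PHP 8 throws a `ValueError`), so a missing or extra value fails loudly. Two values that were swapped, however, would still go unnoticed.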

  • scott lewis

    The PHP solution is more consistent at the cost of being more verbose:




    (excluding whitespace). Also note that the PHP version requires the class wrapper and the columns array, neither of which is required for the YAML version.

    Lack of IDE support just means you need a better IDE. :)

    And I think code tends to look ugly next to YAML, but that is just an aesthetic choice.

    Trade-offs, trade-offs, trade-offs. Everything is trade-offs. :)

  • cakebaker

    @scott lewis: Yeah, you are right, in software engineering there are always trade-offs you have to make.

  • Felix Geisendörfer

    I like YAML a lot; I’m using it for my unit testing data right now and I’m very pleased with how efficient it is.

    Regarding your problem with its DRYness, you could do something like this:

    ---
    - id
    - name
    - url
    ---
    - 1
    - CakePHP website
    ---
    - 2
    - CakePHP manual

    But I don’t even see why you’d want to do that. When you need to change table fields, you can run a quick search & replace, which your IDE should handle even for YAML, and you’re done. The entire point of YAML is not DRYness but readability for humans, which is exactly what you want when dealing with test data.

    Regarding your other issues with it:
    – Yes, PHP is more consistent with Cake, but only because Cake goes with Convention over Configuration; otherwise PHP might never have become the language used for configuration files. But we are not talking about configuration files here, we are talking about domain data. And data is usually stored in the database when working with CakePHP. Since a database isn’t the first choice for test fixtures, I think YAML wins over PHP as a form of data storage.

    – Another principle of YAML is that you don’t *need* an IDE to be able to edit it reasonably well. But you do need one for PHP, so again, in my opinion YAML wins.

    – “YAML looks ugly if you mix it with code”: Does your SQL database look ugly when you mix it with code too? Or do you mean putting business logic in your YAML files? I don’t see why that would be needed. Different fixtures, different files.

  • cakebaker

    @Felix: The point with DRYness is not the updating of the field names, but the creation of the records. I am lazy, so I don’t like to repeat the column names again and again (which is not necessary with your posted example). Readability is IMHO not that important, as the audience for the test data is developers. So the test data must be readable for developers, but not necessarily for everyone else.

    The “problem” with fixtures is that they sometimes do not store plain data. Fixtures (as in Railsland) can contain loops or function calls, so in my opinion it is wrong to store fixtures in a data format. Maybe the format shown in the post is not perfect and can be improved, but I think it is the direction to go.
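As an illustration of that point, a PHP fixture can compute its records instead of listing them. The following sketch is hypothetical: neither the `Users` class nor the `generatedRecords()` method comes from the post, and unlike the post's one-method-per-record format, this method returns several records at once. It builds the records in a loop and hashes each password with `md5()`, something a plain data format cannot express on its own:

```php
<?php
// Hypothetical fixture whose records are generated rather than typed out.
class Users {
    var $columns = array('id', 'username', 'password');

    // Returns five records, user1 to user5, each with a hashed password.
    function generatedRecords() {
        $records = array();
        for ($i = 1; $i <= 5; $i++) {
            $records[] = array($i, "user$i", md5("secret$i"));
        }
        return $records;
    }
}

$users = new Users();
$records = $users->generatedRecords();
// $records[0] is array(1, 'user1', md5('secret1'))
```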

    I think the concept of YAML that you do not need an IDE to reasonably edit the files is not really important for developers. I don’t know how you work, but if I write tests, I am always in my IDE.

    Well, in Rails you can mix your YAML files with Ruby snippets (ERB) to allow for dynamic fixtures, so you can do something like this (the example is in PHP):

    password: <?php echo md5('thepassword'); ?>
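A line like this only works if the fixture file is run through PHP before it is parsed as YAML, much as Rails runs fixture files through ERB. A minimal sketch of that preprocessing step, assuming a hypothetical `render_fixture()` helper that wraps `include` in output buffering. Parsing the resulting text would still need a YAML parser, which is not part of core PHP:

```php
<?php
// Hypothetical helper: execute the PHP tags inside a fixture file and
// return the generated text (which could then be handed to a YAML parser).
function render_fixture($path) {
    ob_start();
    include $path;
    return ob_get_clean();
}

// Write a small fixture template to a temporary file for the demo.
$path = tempnam(sys_get_temp_dir(), 'fixture');
file_put_contents($path, "password: <?php echo md5('thepassword'); ?>\n");

$yaml = render_fixture($path);
unlink($path);
// $yaml now holds "password: " followed by the 32-character MD5 hash
```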

© daniel hofstetter. Licensed under a Creative Commons License