Learning how to fail (a Drupal CI odyssey)

by nick.schuch / 6 December 2013

Over the past year we have gone through many iterations of our CI system: 3 major versions to be exact (not counting minor updates). During this time we have focused on stability, speed and documentation. These are the major changes that we have found make a huge difference.

Fail fast

When we started running our CI system we had 2 questions around failing fast:

1) Do we want a list of all the passed and failed tests?
2) Do we want automation to fail as quickly as possible so we know the build is broken?

For the past year we went with the first option, until our test suite grew so large that a build took 1 hour to run. We have recently switched to failing the build fast:

1) Coding standards have failed - FAIL THE BUILD NOW!
2) Behat BDD tests have failed - FAIL THE BUILD NOW!
3) Simpletest tests have failed - FAIL THE BUILD NOW!

This has resulted in quicker feedback, especially for coding standards failures. Who wants to run their whole test suite knowing the code isn't compliant?
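To make the ordering concrete, here is a minimal Python sketch of a fail-fast runner. It is not our actual build script: the stage commands, the module path and the function name run_fail_fast are assumptions for illustration, so swap in whatever your project really runs.

```python
import subprocess

# Hypothetical stage commands, cheapest check first. The paths and
# arguments are placeholders, not taken from a real project.
STAGES = [
    ("coding standards", ["phpcs", "--standard=Drupal", "sites/all/modules/custom"]),
    ("behat", ["behat"]),
    ("simpletest", ["php", "scripts/run-tests.sh", "--all"]),
]


def run_fail_fast(stages, runner=subprocess.call):
    """Run stages in order; return the name of the first failing stage, or None."""
    for name, command in stages:
        # A non-zero exit code means the stage failed.
        if runner(command) != 0:
            return name  # stop here: later stages are never started
    return None
```

A CI job could call run_fail_fast(STAGES) and exit non-zero as soon as it returns a stage name. The runner argument exists only so the ordering logic can be exercised without the real tools installed.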

Run smaller subset for PR

As long as you are running tests on HEAD (master branch), you can afford to run a smaller subset of tests at the time of a pull request (or when it's time to merge). We generally do this by:

* Limiting Behat BDD tests to a specific tag.
* Using a separate group in Simpletest.
* Still running phpcs to ensure coding standards integrity. This is very important.
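As a rough illustration of the split, the Python sketch below builds a different command list for PR builds and HEAD builds. The tag name @pr, the Simpletest group name "PR", the module path and the function name build_commands are all hypothetical; the point is only that phpcs stays in both lists while the Behat and Simpletest runs are narrowed.

```python
def build_commands(is_pull_request):
    """Return the list of commands for a build; PRs get the reduced subset."""
    # Coding standards always run, for PR and HEAD alike.
    commands = [["phpcs", "--standard=Drupal", "sites/all/modules/custom"]]
    if is_pull_request:
        # Hypothetical tag and group names marking the fast subset.
        commands.append(["behat", "--tags=@pr"])
        commands.append(["php", "scripts/run-tests.sh", "PR"])
    else:
        # HEAD runs everything.
        commands.append(["behat"])
        commands.append(["php", "scripts/run-tests.sh", "--all"])
    return commands
```

Keeping the choice in one function means the PR and HEAD builds cannot drift apart silently: anything added to the shared part runs everywhere.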

SSD All the way

We were originally running on 10000rpm hard drives, and to test the waters of SSD we spun up a test host on an SSD provider. The results were as follows:

10000rpm host
* HEAD - 54min
* PR - 28min

SSD host
* HEAD - 19min
* PR - 9min

While there are most certainly other factors, the speed difference is undeniable. Our build times went down by roughly two thirds! There are some good SSD hosting platforms available. Here are 2 that I like:

Standardise commands

Phing is an awesome build tool! It has helped us solve the problem of "Well, I don't know what the CI suite is running". With Phing the developer has full control over the build process and the tasks it runs. This is how we are able to implement the "Run smaller subset for PR" tip. We have standardised everything from clearing the Drupal cache to syncing the DEV environment and setting up development modules. For more information, see Boris Gordon's recorded talk from DrupalCon Prague 2013.

Don't run too many concurrent processes

This one is a juggling act and comes down to trial and error. On one side we want our tests to run right away; on the other we don't want the host to hit max utilization and kill all the tests. My recommendation: start with 4 concurrent builds and adjust as required.
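Most CI servers expose this cap as a setting, but the idea is easy to sketch in Python. The example below assumes a hypothetical run_one callable that performs a single build; the names run_builds and MAX_CONCURRENT_BUILDS are also made up for illustration. Extra builds wait in a queue instead of piling onto the host.

```python
from concurrent.futures import ThreadPoolExecutor

# Starting point suggested above; tune it by trial and error for your host.
MAX_CONCURRENT_BUILDS = 4


def run_builds(builds, run_one, limit=MAX_CONCURRENT_BUILDS):
    """Run every build, never more than `limit` at once; results keep input order."""
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # Builds beyond the limit queue up until a worker is free.
        return list(pool.map(run_one, builds))
```

Raising or lowering the limit is then a one-line change, which suits the trial and error approach.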

Conclusion

Most of these changes revolve around workflow, so I strongly recommend sitting down with your team to workshop ideas and discuss the ones above. If you have opinions on this topic, please leave a comment below about your successes and/or failures.

Nick Schuch, Operations Lead