
DevOps for the Agency

Managing the Process and Tools of Multivariant Software Development

The Problem

DevOps at an agency or consultancy presents challenges not normally encountered at a software product company. Typical product and platform companies work with a narrow band of operating systems and technologies, whereas agencies are often developing for every conceivable platform, language, and device. This can make the usual explanations of DevOps seem too server- or ‘IT’-focused, without much help on how to bring order to the chaos matrix. The goal of this document is to explain some of the best practices available to your team(s), give examples of how to apply them in real life, and introduce some tools that can help.

A normal week in an agency’s development team can include deployments in multiple languages to Windows Server and Linux (of various flavors) running on hosted virtual machines, clustered in the cloud (typically Amazon Web Services, Azure or Rackspace), or even on bare metal. Without any process this is utter chaos. With the bare minimum of process you can mostly avoid egg-on-your-face public failures, but are your servers secure? Can you deploy your projects to production reliably if your rock star engineer hits it big with the next Angry Birds and decides to retire to his own private island? Probably not, yet.

Applying a good process to your current chaos doesn’t have to be hard or painful. Resources such as Visible Ops and the Information Technology Infrastructure Library’s (ITIL) Change Management guidance apply broadly to change management at the organizational level, including (but not limited to) configuration and hardware changes for servers. They don’t cover the intricacies of juggling different project types with vastly different technology stacks, clients, and management methodologies. Process that starts with the Software Development and Quality Assurance teams can radiate outwards and eventually encompass every team involved in software development projects at your company, even if it is only as a heightened awareness of the process.

Crawl, Walk, Run

A word of caution: bite off meaningful chunks of change and institute each one before moving on to the next. Changes can be pushed from the software development team outward; depending on the organizational structure, this would grow to include technical/business analysts, QA and project management. Start with the processes your team can affect and widen the scope as buy-in grows. Don’t try to go from nothing to everything in a month. Your team will likely wind up creating process documentation and sample artifacts (like deployment checklists), and finally your team members and other teams will need to be trained and informed.

Version Control

If your team is not using a version control system (like Subversion or Git), then stop reading this right now and begin the process of integrating a version control system immediately. Without a version control system, no other steps in regulating code and configuration change are possible or meaningful.

The Environmental Landscape

For the purposes of this conversation we’ll assume some basic good practices are already being followed, like having development, QA and production environments. Because of the kind of projects I’ve been working on for the last half decade, we’ve typically had one more environment: staging. This is because production is often (at least a slight majority of the time) controlled not by our agency but by the client’s IT or development team. Staging becomes our in-house production and client acceptance environment.

Separation of Environments

It is exceedingly important that each environment (Development, QA, etc.) has its own resources. Environments should not share database tables, but could reasonably share database servers. For instance, Development and QA could use the same server but different databases or schemas, so that their data and content stay separate. The same goes for file systems, load balancers, application servers and the like.
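One way to keep this separation honest is to check the environment configuration automatically. The following Python sketch assumes each environment's database settings live in a simple mapping; the `DatabaseConfig` class, the host and database names, and `find_shared_databases` are all hypothetical. It allows environments to share a server but flags any two that point at the same database.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseConfig:
    server: str    # host name; environments MAY share this
    database: str  # database or schema name; environments must NOT share this


# Hypothetical configuration: Development and QA share a server, not a database.
ENVIRONMENTS = {
    "development": DatabaseConfig(server="db-internal-01", database="acme_dev"),
    "qa": DatabaseConfig(server="db-internal-01", database="acme_qa"),
    "staging": DatabaseConfig(server="db-staging-01", database="acme_staging"),
}


def find_shared_databases(envs: dict[str, DatabaseConfig]) -> list[tuple[str, str]]:
    """Return (first, second) pairs of environments using the same server AND database."""
    seen: dict[tuple[str, str], str] = {}
    conflicts: list[tuple[str, str]] = []
    for name, cfg in envs.items():
        # Compare case-insensitively so "DB-01" and "db-01" count as the same host.
        key = (cfg.server.lower(), cfg.database.lower())
        if key in seen:
            conflicts.append((seen[key], name))
        else:
            seen[key] = name
    return conflicts


if __name__ == "__main__":
    print(find_shared_databases(ENVIRONMENTS))  # prints []
```

The same pattern extends to file system paths or load balancer pools: decide which resources may be shared and flag everything else.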

The Development Environment as a Test Bed

The development environment should be the first environment deployed to. A continuous integration tool like Jenkins or Travis is fantastic for automatically building the project, running unit tests, and even deploying to the development environment at regular intervals or upon every version control check-in. Development servers are for testing integration and server configuration and for fleshing out the deployment process. In practice, you should assume that development servers are always running the absolute latest code and are full of bugs in progress.

Every single change – code, configuration or otherwise – should be tested in this environment before ever moving up the chain to another environment.
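The rule that every change starts in development and moves up one environment at a time can be enforced mechanically. A minimal Python sketch, assuming the four environments described in this post and a record of which environments a given build has already reached (`PROMOTION_ORDER` and `can_promote` are illustrative names, not part of any CI tool):

```python
from __future__ import annotations

# Environments in the order a build must pass through them.
# Assumes a staging environment exists; drop it from the list if yours doesn't.
PROMOTION_ORDER = ["development", "qa", "staging", "production"]


def can_promote(deployed_to: set[str], target: str) -> bool:
    """True if the build has already been deployed to every environment below `target`."""
    position = PROMOTION_ORDER.index(target)  # raises ValueError for an unknown environment
    return all(env in deployed_to for env in PROMOTION_ORDER[:position])


if __name__ == "__main__":
    history = {"development"}
    print(can_promote(history, "qa"))       # True
    print(can_promote(history, "staging"))  # False: QA was skipped
```

A CI server could run a check like this before a deployment job and refuse any build that tries to skip a step.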

Quality Assurance and Testing Environments

QA is where completed bug fixes are tested by the quality assurance team and either passed or sent back for further development. Deployments to QA should happen at a much slower pace than deployments to development, but could still be as frequent as several times a day during periods of rapid development and testing (such as the night before a project is due).

Deployments to QA should come with a build number, version control revision number and a deployment log of all the bug tickets that are being addressed as well as notes on any other material changes.
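A deployment log entry doesn't need to be elaborate. As a sketch, assuming the build number, revision and ticket IDs come from your CI server and ticket tracker (the `QADeployment` class and the `PROJ-` ticket IDs are hypothetical), plain-text release notes could be generated like this:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QADeployment:
    build_number: int
    revision: str       # Subversion revision or Git commit hash
    tickets: list[str]  # bug tickets addressed by this build
    material_changes: list[str] = field(default_factory=list)

    def release_notes(self) -> str:
        """Render a plain-text deployment log entry."""
        lines = [f"Build {self.build_number} (revision {self.revision})", "Tickets addressed:"]
        # Fall back to "(none)" so an empty section is explicit rather than missing.
        lines += [f"  - {ticket}" for ticket in self.tickets] or ["  (none)"]
        lines.append("Material changes:")
        lines += [f"  - {change}" for change in self.material_changes] or ["  (none)"]
        return "\n".join(lines)


if __name__ == "__main__":
    deployment = QADeployment(
        build_number=57,
        revision="4f2c9e1",
        tickets=["PROJ-101", "PROJ-107"],
        material_changes=["Added index on orders.customer_id"],
    )
    print(deployment.release_notes())
```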

Material changes always include server configuration changes, database schema modifications and even changes to the deployment process itself. Technical leads, project managers and the QA team should all be made aware of them and sign off before deployments to this environment. In the absence of a staging environment, the QA environment is also where client review and approval should happen.

Once all approvals are in, this is when your team will want to tag the revision in version control, assign a version number to the build artifact and save it in ‘escrow’ somewhere. Any additional documentation, such as staging deployment checklists or other process documentation, should be filled out before deploying to staging.

Staging as a Mirror (and a Safety Net)

In the agency world, production is often not entirely under your team’s control, which can make it awfully difficult to troubleshoot. When production is a load-balanced, highly scalable multi-tier environment and the development environment is a single virtual machine hosting both the application and the database, troubleshooting in development does not necessarily fix production. Even if production isn’t locked behind the client’s IT and your team has unfettered access, a bad deployment in production is as bad as it gets. The whole world (or at the very least, the client) will know that you’ve botched the job at the last minute. It’s a terrible feeling, if you haven’t had the privilege yet.

Staging should be as close an approximation of production as is feasible. For example, instead of a 10-node cluster like production, maybe just two VMs will suffice. If production will be in Amazon Web Services, staging should be there as well. If the persistence layer will be on a separate device, separate it in staging as well. Certain things like authentication and third-party services may require staging to be publicly visible, which is a decision your team will have to make. We’re looking for as close as feasible, not necessarily a direct copy. Some configuration may be needed to have staging use mock services instead of the real thing; these are all things to plan with the team while defining the feature that consumes the services.

Deploying to staging is the final practice before public failure and should be treated just like production, including restricted access and planned deployments and upgrades. It should require approval from the technical lead, QA and the project manager to verify that all the bugs that were to be addressed in the current build have been addressed and no regression was detected in the QA environment. This process should be tightly gated and auditable. Who deployed it, what was deployed and when it was deployed should all be recorded for posterity.
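The gate and the audit trail can be as simple as refusing to proceed without every required approval and appending a timestamped entry for each deployment. The sketch below assumes the three approver roles named above; `record_deployment`, `AuditEntry` and the in-memory list standing in for a real audit store are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable

# Roles that must approve a staging (or production) deployment.
REQUIRED_APPROVERS = {"technical_lead", "qa", "project_manager"}


@dataclass(frozen=True)
class AuditEntry:
    deployed_by: str       # who
    build: str             # what
    environment: str
    deployed_at: datetime  # when (UTC)


def record_deployment(
    audit_log: list[AuditEntry],
    approvals: Iterable[str],
    deployed_by: str,
    build: str,
    environment: str = "staging",
) -> AuditEntry:
    """Refuse to proceed without every required approval; otherwise log who/what/when."""
    missing = REQUIRED_APPROVERS - set(approvals)
    if missing:
        raise PermissionError(f"missing approvals: {', '.join(sorted(missing))}")
    entry = AuditEntry(deployed_by, build, environment, datetime.now(timezone.utc))
    audit_log.append(entry)
    return entry
```

In practice the log would live somewhere durable (the CI server's build history, a database, or a shared document) rather than in a Python list.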

This will be the final pre-production deployment and is where client approval and sign-off will happen if this environment exists in your process. Upon final sign-off in this environment, your team will generate the production deployment checklist and any other documentation needed beyond the staging documentation created for this particular deployment.

Production – The Final Frontier

This is where client relationships can be destroyed in an otherwise healthy project. Nothing will shake your client’s confidence more than a lack of communication or bad production deploys. Bad production deploys and troubled project delivery will always boil down to poor change management. On some projects the poor change management and communication are the fault of the client’s own IT department; other times they will be the fault of your team. Either way, it will always be your team’s problem. As much as 80%[1] of the time spent fixing production delivery problems is spent finding the root cause. That means that if proper change management procedures are being followed, so that every change is documented and the cause can be traced quickly, your team can resolve production problems up to five times faster than it otherwise could.
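The arithmetic behind the five-times figure, assuming the 80% estimate: let T be the total time spent fixing a production delivery problem and f the fraction of it spent finding the root cause. If good change management shrinks the root-cause search to nearly nothing, only the remaining fraction of T is left:

```latex
\begin{aligned}
T_{\text{remaining}} &= (1 - f)\,T = (1 - 0.8)\,T = 0.2\,T \\
\text{speedup} &\le \frac{T}{T_{\text{remaining}}} = \frac{1}{1 - f} = \frac{1}{0.2} = 5
\end{aligned}
```

The factor of five is therefore a ceiling, reached only when finding the root cause takes no time at all.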

When it comes time for production delivery, the team should have a well laid out battle plan that will include scheduled maintenance times, documentation and notes for every server configuration change, database scripts and code changes being deployed as well as the issue ticket references (the QA release notes will often suffice). If the environment is complicated with load balancers and multiple nodes, an order of operations should be established and documented and followed to the letter.
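An order of operations for a load-balanced environment can be generated rather than written out by hand for every release. The following Python sketch assumes a rolling deployment, one node at a time, with any one-off steps (such as database scripts) run first; the node names, artifact name and `rolling_deployment_plan` function are hypothetical, and your own environment may need a different order.

```python
from __future__ import annotations


def rolling_deployment_plan(nodes: list[str], artifact: str,
                            pre_steps: list[str] | None = None) -> list[str]:
    """Build an ordered step list: one-off steps first, then one node at a time."""
    steps = list(pre_steps or [])
    for node in nodes:
        # Take the node out of rotation so users never hit a half-deployed server.
        steps.append(f"Remove {node} from the load balancer pool")
        steps.append(f"Deploy {artifact} to {node}")
        steps.append(f"Run health check on {node}")
        steps.append(f"Return {node} to the load balancer pool")
    return steps


if __name__ == "__main__":
    plan = rolling_deployment_plan(
        ["web01", "web02"],
        "acme-1.4.0.war",
        pre_steps=["Run 042_add_order_index.sql against the production database"],
    )
    for number, step in enumerate(plan, start=1):
        print(f"{number}. {step}")
```

Running schema scripts before a rolling deployment only works if the old code, still running on nodes not yet upgraded, tolerates the new schema; if it doesn't, the plan needs a full maintenance window instead.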

Process Flow

An Auditable Series of Steps

The appropriate amount of undocumented change in any environment is zero. As dials and knobs get twisted in development environments, it is important to record what changes are being made and why. Application server and database configuration changes can have fantastic or devastating effects and are just as important as the new code and content being pushed. The most important part of each deployment is that exactly what is being deployed is documented and that someone signs off on it.
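Recording changes and the reasons for them can be enforced by making the reason mandatory. A minimal sketch, assuming configuration changes are logged through a small helper rather than applied by hand (`ChangeLog` and `ConfigChange` are hypothetical names):

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConfigChange:
    environment: str
    setting: str
    old_value: str
    new_value: str
    reason: str
    changed_by: str
    changed_at: datetime


class ChangeLog:
    """Append-only record of configuration changes; a change without a reason is rejected."""

    def __init__(self) -> None:
        self.entries: list[ConfigChange] = []

    def record(self, environment: str, setting: str, old_value: str,
               new_value: str, reason: str, changed_by: str) -> ConfigChange:
        if not reason.strip():
            raise ValueError(f"undocumented change to {setting!r}: a reason is required")
        change = ConfigChange(environment, setting, old_value, new_value,
                              reason, changed_by, datetime.now(timezone.utc))
        self.entries.append(change)
        return change
```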

Deploying to QA

Going into QA should be a gated step that requires an okay from a technical lead and a QA lead (they may still be testing the last build!) along with some generic release notes. If your team uses a tool like Jira, Trac or other ticket tracking systems, the release notes for this step may simply be a matter of moving all the fixed tickets over to an “In QA” status or something similar.
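Moving tickets can be part of the same gated step. Real trackers such as Jira expose their own APIs for status transitions; this Python sketch deliberately avoids them and uses an in-memory mapping of ticket IDs to status names instead. The status names, the role names and `move_fixed_tickets_to_qa` are assumptions.

```python
from __future__ import annotations

REQUIRED_FOR_QA = {"technical_lead", "qa_lead"}


def move_fixed_tickets_to_qa(tickets: dict[str, str], approvals: set[str]) -> list[str]:
    """Move every 'Fixed' ticket to 'In QA' once both leads approve; return the moved IDs."""
    missing = REQUIRED_FOR_QA - approvals
    if missing:
        raise PermissionError(f"QA deployment not approved by: {', '.join(sorted(missing))}")
    moved = []
    for ticket_id, status in tickets.items():
        if status == "Fixed":
            tickets[ticket_id] = "In QA"  # updating values (not keys) while iterating is safe
            moved.append(ticket_id)
    return moved
```

The returned list of ticket IDs doubles as the skeleton of the release notes for the QA build.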

Deploying to Staging

After QA has passed all the tickets and verified that no regression has taken place, the technical lead and project manager should schedule a staging deployment. This should run just like a production deployment: put things in maintenance mode, restart services, recycle application pools, the whole nine yards. This is your team’s final dress rehearsal and the last chance to validate the deployment process privately. If things don’t go well, document every issue and find the root cause. This will often mean going back to the development environments to test proposed changes, then proceeding through QA and back into staging. With an automated build and deployment system this is an almost painless experience, and which environment you deploy to becomes nearly immaterial.

After the deployment to staging, QA will want to do a final regression pass in that environment, and any client sign-off will need to happen. Before client review, your team will be looking for obvious issues like content and data differences between environments (development and QA may be full of lorem ipsum and fake content, after all), as well as testing that the proposed changes work in staging as they did in QA. This is where each technology and team has its own flavor of doing things. Given the QA effort already spent, it would seem redundant to spend the same amount of human time in staging; otherwise, why have both staging AND QA environments? This is where remote unit and integration tests (like Arquillian for Java) can give a quick ‘green light’ to functionality in each new environment.
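Some of those obvious issues can be caught before a human looks at staging. As a sketch, assuming the page bodies have already been fetched (for example with `urllib.request`), a quick scan for leftover placeholder copy might look like this; the pattern list and function name are hypothetical and should be adapted to whatever fake content your team uses.

```python
from __future__ import annotations

import re

# Placeholder markers to look for; extend with your team's own fake content.
PLACEHOLDER = re.compile(r"lorem\s+ipsum|dolor\s+sit\s+amet", re.IGNORECASE)


def pages_with_placeholder_content(pages: dict[str, str]) -> list[str]:
    """Return the URLs (keys) whose page body still contains placeholder copy."""
    return [url for url, body in pages.items() if PLACEHOLDER.search(body)]


if __name__ == "__main__":
    fetched = {
        "/": "<h1>Welcome to Acme</h1>",
        "/about": "<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>",
    }
    print(pages_with_placeholder_content(fetched))  # ['/about']
```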

After that, version numbers need to be generated, version control source needs to be tagged or branched and, if possible, the actual binary deployment should be compressed and held as an artifact somewhere. Build/continuous integration systems often have functionality that generates build numbers, tags source and saves artifacts, all from the user interface. The process doesn’t need to be elaborate, just well documented and strictly enforced.
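A sketch of the versioning and escrow step in Python, assuming the build output sits in a single directory and the version string and build number come from your CI server. The tag naming scheme and `package_release` are hypothetical; the function compresses the build into a `.tar.gz` archive and records a SHA-256 checksum so the artifact held in escrow can later be verified byte for byte.

```python
from __future__ import annotations

import hashlib
import tarfile
from pathlib import Path


def package_release(build_dir: Path, output_dir: Path, version: str,
                    build_number: int) -> tuple[str, Path, str]:
    """Compress a build into a .tar.gz for escrow; return (tag name, archive path, SHA-256)."""
    tag = f"release-{version}-build{build_number}"  # hypothetical naming scheme
    archive = output_dir / f"{tag}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(build_dir, arcname=build_dir.name)  # adds the directory recursively
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    return tag, archive, digest
```

The returned tag name can then be applied with `git tag -a` (or an `svn copy` into the repository's `tags` directory), and the checksum stored alongside the release notes.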

Deploying to Production

Deploying to production environments should follow the same (documented) steps as deploying to staging environments. If there are differences in configuration or environments they should be explicitly listed in a production deployment checklist that has been prepared with all the changes from the last production release.
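Listing differences explicitly is easier when configuration is kept as data. This sketch assumes flat key/value configuration for each environment (`config_differences` is a hypothetical helper) and produces one line per setting that differs or exists in only one environment, ready to drop into the production deployment checklist.

```python
from __future__ import annotations


def config_differences(staging: dict[str, str], production: dict[str, str]) -> list[str]:
    """List every setting that differs between the two environments, or exists in only one."""
    lines = []
    for key in sorted(staging.keys() | production.keys()):
        in_staging, in_production = key in staging, key in production
        if in_staging and in_production and staging[key] == production[key]:
            continue  # identical in both environments: nothing to call out
        s = repr(staging[key]) if in_staging else "(not set)"
        p = repr(production[key]) if in_production else "(not set)"
        lines.append(f"{key}: staging={s}, production={p}")
    return lines
```

For example, a staging `cache_nodes` of `"1"` against a production value of `"3"` yields the line `cache_nodes: staging='1', production='3'`.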

Production Deployment Checklist

The production (and staging) deployment checklist should be a combination of two things: a standard deployment checklist that reflects all the steps that should be necessary for every production deployment (like where and how to copy the deployment artifacts), and any additional steps that are needed for this particular deployment based on the documentation from the QA and Staging environment deployments and release notes.

For example, database schema changes won’t occur with every deployment, but when they do, they should be completely scripted and ready to run. The deployment checklist would say when to run the scripts and which scripts to run, and detail the expected results.
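Combining the standard steps with the deployment-specific ones can be sketched as follows. The standard steps, the placement of database scripts between copying artifacts and restarting services, and the names `DatabaseScript` and `build_checklist` are all assumptions for illustration, not a prescribed checklist.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical standard steps that apply to every production deployment.
STANDARD_BEFORE = [
    "Confirm the maintenance window with the client's IT team",
    "Copy the release artifact from escrow to each target server",
]
STANDARD_AFTER = [
    "Restart application services",
    "Run smoke tests and get QA confirmation",
]


@dataclass
class DatabaseScript:
    filename: str
    expected_result: str


def build_checklist(db_scripts: list[DatabaseScript], extra_steps: list[str]) -> list[str]:
    """Merge standard steps with deployment-specific ones into a numbered checklist."""
    steps = list(STANDARD_BEFORE)
    # Assumption: schema scripts run after artifacts are copied, before restarts.
    steps += [f"Run {s.filename} (expected: {s.expected_result})" for s in db_scripts]
    steps += STANDARD_AFTER
    steps += extra_steps
    return [f"{number}. {step}" for number, step in enumerate(steps, start=1)]
```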

Creating this document forces technical leads to stop and think about what differences may exist between the staging deployment and this one, as well as about any other teams (like the client’s IT team) that need to be brought into the plan. It also allows the team to plan timing and order of operations, minimizing the chance of exceeding the scheduled maintenance window.

Useful Tools

Tools will vary based upon the core technologies of your company, but if your team is producing projects in various languages, there are many tools that can be strung together to build and deploy everything from .NET and Java projects to Python and PHP. In a follow-up post I will discuss how I’ve been using these tools in real life to automatically spin up local environments for developers as well as development, QA, staging and production server environments, with one-click build and deploy, artifact management, source tagging and a whole host of other automated tasks that keep the noise to a minimum.

You will want to check back regularly for edits, updates and links to follow-ups!

Jenkins

Jenkins is an application that monitors executions of repeated jobs. It can continuously build and test software projects as well as store build artifacts, generate build numbers and store test results. It also tracks project health and can notify team members when the build breaks.

Apache Ant

Ant is a Java library and command-line tool used for driving build processes that are defined in build files and targets. It is a very versatile tool that can compile software projects, run unit tests, copy (and move) files and do just about anything you would manually do when building a project.

Node.js

Node has a variety of cross-platform modules (like SCP and SSH clients) that can be very useful for scripting automatic remote deployments, increasing repeatability and decreasing maintenance time.

Vagrant

Vagrant aids in creating and configuring lightweight, reproducible, and portable development environments. Get new developers spun up into a project quickly.

Chef and Puppet

Chef and Puppet are automation platforms for server/virtual environments. They allow your team to define the requirements of an environment and automatically set up and configure new machines with little or no user intervention. They can be used to document the target environments and manage any changes that need to be made. Both tools occupy the same space and are outstanding in their own right; your team will very likely choose either Chef or Puppet, not both.
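The core idea both tools share is declaring a desired state and letting the tool work out what to change. The Python sketch below illustrates only that idea and is not Chef or Puppet syntax (Chef recipes are written in a Ruby DSL, Puppet manifests in Puppet's own language); the package names, versions and `plan_changes` function are hypothetical.

```python
from __future__ import annotations


def plan_changes(desired: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Compute the actions needed to move a machine from `actual` to `desired`.

    Once those actions are applied, running this again returns no actions,
    which is the idempotence that makes repeated runs safe.
    """
    actions = []
    for package, version in sorted(desired.items()):
        current = actual.get(package)
        if current is None:
            actions.append(f"install {package} {version}")
        elif current != version:
            actions.append(f"upgrade {package} {current} -> {version}")
    return actions


if __name__ == "__main__":
    desired = {"nginx": "1.24", "openjdk": "17"}
    print(plan_changes(desired, {"nginx": "1.18"}))
```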

Docker

Docker is a newer (as of 2013) entrant into the DevOps space that attempts to treat the environment, application code and configuration as a single artifact (a container) to be deployed. It is being used for everything from automating packaging and deployment to creating PaaS environments.

Footnotes

1. Behr, K., Kim, G., and Spafford, G. The Visible Ops Handbook: Implementing ITIL in 4 Practical and Auditable Steps. Information Technology Process Institute.