Why We’re Migrating From Chef to Octopus
A couple of months ago I started leading a team that was using Chef to manage our deployment infrastructure. We are now migrating from Chef to Octopus Deploy. I thought it might be interesting to explain why. What follows is my opinion and my best understanding of the facts. As always, do your own thinking.
Chef
Chef takes the DevOps mantra “infrastructure as code” to heart. Everything created for consumption by Chef is code. This is a strength in many ways, but sometimes a weakness.
Architecture
Chef consists of a server which contains metadata about your infrastructure, “cookbooks” which contain instructions for configuring machines and applications, and a client application which does the work of truing up a machine to its representation in the Chef Server.
In Chef, the target of work is a node. When `chef_client` runs, its goal is to make the node it's running on look the way Chef Server expects it to. It has little awareness of applications except insofar as you layer that into your cookbook design. `chef_client` pulls the data it needs from the Chef Server in order to do its work. This means that in order to update a machine you must have access to that machine as well as rights to run `chef_client`. It also means that it takes more effort to redeploy a single application to the node when something goes wrong. If you are running single-purpose nodes, that's probably not an issue for you.
It bears mentioning that Chef was designed for Linux first. Windows is an afterthought and it shows. There was no built-in support for common Windows constructs such as IIS and Windows Services at the time of our implementation. (My understanding from a coworker is that Chef now has built-in support for Windows services.)
Chef has an API, but it's nearly impossible to access using standard tools due to a convoluted authentication model. Chef has a CLI called `knife` which wraps up most API behavior. It should be noted that you cannot initiate activities through the API; that is all done by running `chef_client` on the target machines. `knife` can, however, query the API for the machines you want and initiate `chef_client` runs for you.
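For the curious, the day-to-day `knife` flow looks roughly like this (a hedged sketch; the exact queries and credentials depend on your setup, and on Windows you would typically reach for `knife winrm` instead of `knife ssh`):

```powershell
# List the nodes Chef Server knows about
knife node list

# Find the nodes you care about with a search query
knife search node "role:web AND chef_environment:staging"

# Kick off a chef_client run on the matching nodes (Linux-style; knife ssh
# needs credentials for the target machines, e.g. -x <user> plus a key or password)
knife ssh "role:web AND chef_environment:staging" "sudo chef-client"
```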
Developer Experience
The developer experience with Chef is all about editing cookbooks. A "cookbook" is a collection of "recipes" that can be executed by `chef_client` on the target node. Cookbook development involves a Ruby DSL that transpiles to raw Ruby on the target node during a process called "convergence." You edit the cookbook, push the modified cookbook to the Chef Server, then run `chef_client` on the node. On the surface this seems simple. It's not.
If you are staying "in the rails" (rim shot for the Ruby joke) with Chef and using their built-in functions (called "resources") to do your work, you will mostly be okay. If you are interested in testing your work you will start to feel pain. Chef provides the ability to do "local" testing, by which they mean spinning up VMs using Vagrant over VirtualBox. "Local" in this context doesn't mean you get to see the transpiled Ruby before it executes, or step into a debugger. It just means that the VM you're testing on happens to be running on your developer machine.
There are tools for automated testing of Chef cookbooks. We never really learned to use them well, so I can't personally comment on their quality. My coworker had this to say:
There is a commonly used unit testing framework for Chef, called ChefSpec (which is built upon RSpec). I have issues with ChefSpec, so most of my Chef unit tests are done in pure RSpec. However, ChefSpec is not officially supported by Chef.
There is chef-zero, which allows you to spin up an in memory mock Chef Server, which will allow you to start/pause/stop a mock node convergence, and will allow you to examine attributes at different steps of resolution. Test-Kitchen is the integration testing framework, which spins up actual VMs and uses Vagrant as a driver (it may use Docker now? Not sure), then fully converges that VM as a node. So the Chef ecosystem has the ability to do lots of testing at various levels. I feel it is inaccurate to state that testing is poor in Chef. What is accurate to state is that Chef is Linux first, Windows afterthought. So all the testing goodness in the Chef ecosystem was either unavailable to us as a Windows shop, or it was extremely difficult to use. … we knew that lack of testing was a weakness in our Chef code. We really tried to make it happen, but the hurdles were insurmountable with the resources available.
You can customize Chef behavior by writing "resources" and "providers." This process is not straightforward or easy to understand, and frankly I still find it confusing. When writing new code for Chef you can use straight Ruby, but consuming standard Ruby gems is problematic because of the convergence process. It's not clear when the Ruby code will run. Sometimes it runs during convergence. Sometimes it runs when a recipe is called. Keeping it all straight and making it work as expected requires some mental and technical gymnastics. We had numerous errors caused by Ruby code not executing when we thought it would.
Guidance
There is very little guidance about the correct way to use Chef. In fact, Opscode seems to pride itself on this. They exclaim on Twitter and at conferences, "You can use Chef any way you want!" Great, but how should I use Chef? This is not a trivial question, as there are at least three kinds of cookbooks in Chef development: cookbooks that define data, cookbooks that define utilities or "resources," and cookbooks that deploy applications. There are six levels of overrides for variables. Which data should you put in application cookbooks vs. environment cookbooks? When should you use overrides, and how?
Community
Chef has elected to rely on the community to provide cookbooks for most things. Some vendors have created officially supported cookbooks for their products, but most have not. Who can blame them? There's no clear winner in the DevOps space yet, so why spend the time and money to support a tool that may fizzle out? The result is that most community cookbooks are of dubious quality. They are generally designed to solve the particular problem the author was facing when they were trying to install something. They may or may not set things up in a recommended fashion. They make assumptions about the OS, version, security, and logging strategies that are by no means universal. This left us in the position of having to read and fully understand a cookbook before deciding whether or not to use it. More often, we just ended up writing our own customized version of the same cookbook.
Conclusion
It's clear to us that Chef is not a good fit for our organization. We deploy 50+ applications to 40+ environments. Because Chef provisions a node rather than an application, a problem in any one application means the entire deployment is considered a failure. We were never able to get above 80% successful deployments across our environments. Many of the reasons for this were non-technical and have nothing to do with the tool, but the complexity of the tool did not help us. Finding expert-level help with Chef on Windows was equally problematic. Even at the conferences we attended, the Windows Chef implementors were like the un-cool kids sitting at a table by themselves in the corner, not quite excited by the "rah rah rah" of the pep rally.
If I were going to use Chef I would use it in an ecosystem that had fewer applications and more servers. Let’s say I had a web service that I needed to horizontally scale across 1000 Linux nodes. Chef would be a great fit for this scenario.
My co-worker says:
Chef would be a good choice if
a. You use Linux
b. You have a simple deployment topology, which can be supported by Systems Engineers who can do a little scripting (i.e., use only the built in resources).
c. Or you have a large DevOps engineering team that has enough throughput to dissect the bowels of Chef.

Chef is designed for a setup where there are thousands of nodes, each node is responsible for doing one thing, and it is cheap to kill and provision nodes. Because we use Windows, we have hundreds of nodes [and] each node is responsible for many things. It is expensive to kill and provision nodes. Windows is moving towards the cattle-not-pets paradigm, but it's not there yet. Chef is designed to work with setups that are already at the cattle stage.
Octopus
Octopus kind of ignores the “infrastructure as code” idea and tries to give you a button-click experience for deployments.
Architecture
Octopus Deploy is composed of a web server and a Windows service. The web server stores metadata about your hardware and software ecosystem. The Windows service (called Tentacle) is installed on each machine that Octopus will manage and performs the work locally. Octopus was developed API-first, which means that all of the functionality you invoke through the web UI is available through the API. The API is secured using standard models, which means it is easy to consume using your favorite tools.
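As a taste of what API-first buys you, here is a minimal sketch of hitting the Octopus REST API from PowerShell. The server URL and API key are placeholders, and I'm only showing the basics; treat it as illustrative rather than a reference:

```powershell
# Illustrative only: replace the URL and API key with your own values.
$octopusUrl = "https://octopus.example.com"
$apiKey     = "API-XXXXXXXXXXXXXXXX"            # generated from your user profile
$headers    = @{ "X-Octopus-ApiKey" = $apiKey } # standard Octopus auth header

# List the projects known to the Octopus Server
$projects = Invoke-RestMethod -Uri "$octopusUrl/api/projects/all" -Headers $headers
$projects | Select-Object Name, Id
```

Everything the web UI can do flows through the same endpoints, which is what makes it practical to fill gaps with your own glue code.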
Instructions are packaged up by the server and pushed to the Tentacle agents where they are executed. As long as the Tentacle service account has the requisite permissions, anyone with permission to deploy through Octopus Server can deploy to the target machines even if they don’t otherwise have access to those machines.
Whereas Chef is node-centric, Octopus is application-centric. This is a strength in that it’s easy to deploy or redeploy a single application to a node that may have many applications on it. This is a weakness in that it’s harder to deploy all of the applications required for a particular machine (more button clicks). This weakness will be mitigated by the elastic environment features of the soon-to-be-released 3.4 version.
Octopus separates artifact management from release process management. You can modify your release process without ever editing your deployment artifacts, and vice versa. Deployment artifacts are a first-class citizen in Octopus. With Chef, you have to layer in your own artifact management.
Developer Experience
Octopus is written in C# and runs on Windows. The advantage over Chef here is that we get native Active Directory integration for security purposes. In the Octopus world, developers publish their applications as NuGet packages, which are already understood by most .NET developers. The NuGet package definition lives with the application source code, so no external mapping of deployment code to the application is required. Further, it's obvious which bits go in the NuGet package and which bits will live on the Octopus Server.
Octopus uses one of several variable substitution techniques to manage config files. Since Octopus was written with Windows deployments in mind, it contains built-in processes for deploying IIS applications and Windows Services. If further customization is needed, developers can write custom PowerShell scripts as either a Step Template or a script included in the NuGet package.
Unlike Chef, you can test the scripts you write locally, with the caveat that you will have to provide any variables they require to execute. Since it's just PowerShell, you can use any of the existing tools and debuggers written for PowerShell. The lack of a transpilation step means that the code you wrote is the code that will run.
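To make that concrete, here is a minimal sketch of a deployment script that reads its inputs from Octopus's `$OctopusParameters` dictionary when running under a Tentacle, and falls back to locally supplied values so you can run and debug it on your own machine. The variable names are hypothetical:

```powershell
param(
    # Local testing: supply the values Octopus would normally provide.
    [string]$SiteName    = "heroes",
    [string]$AppPoolName = "heroes-pool"
)

# When running under Octopus, prefer the values from the variable sets.
if (Test-Path variable:OctopusParameters) {
    $SiteName    = $OctopusParameters["heroes-iis.site-name"]
    $AppPoolName = $OctopusParameters["heroes-iis.application-pool.name"]
}

Write-Host "Deploying site '$SiteName' using application pool '$AppPoolName'"
# ...the actual IIS work would go here...
```

Because it's plain PowerShell, the same file runs under the ISE, VS Code, or a Tentacle without translation.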
Guidance
Like Chef, Octopus has not “won” the DevOps space. My impression from the community is that the percentage of shops working in the DevOps space at all is fairly low so this isn’t a mark against either tool. However, low adoption means that there are fewer people sharing their lessons learned with the world. If you go looking for Octopus best practices, you won’t find much more than my blog series. Most “best practice” articles and videos about DevOps consist of little more than advice to automate all the things.
On the other hand, the way Octopus itself is structured steers you toward some best practices. Some of these ideas could be layered onto a Chef implementation just as easily, but Octopus is opinionated in such a way that it steers you in the right direction. It’s nice to have Octopus tell us “this is the blessed path to accomplish your goal.”
Octopus distinguishes variables from processes. A Release Process is a set of logic that results in the successful deployment of an application to a machine in an environment. It is informed by variables, which can be scoped to environments and roles (and channels as of version 3.3). Variables can be managed at a project level or in shared libraries called Variable Sets.
Custom logical steps can be defined in PowerShell, stored and versioned in Octopus, and used as part of the Release Process. These are called Step Templates. Step Templates are usually a lighter-weight maintenance effort than a cookbook since they're just PowerShell, and they are often targeted at very small units of functionality that can be chained together without much overhead.
In Octopus, a Release is created which binds a deployment artifact to a set of variables and instructions. This release is then promoted through a series of environments defined in a lifecycle. There is no binding of artifacts to cookbooks in Chef unless you write that yourself.
Community
My impression is that the Octopus community is smaller than the Chef community. There is only one sanctioned mechanism for the community to share code: a GitHub repo for which Octopus itself reviews pull requests before accepting them. The fact that Octopus reviews contributions gives me the impression that the quality of the community code is better than what we found with Chef. Having gone through the review process with them for a new step template we created, we know that they actually look at the code. On the other hand, since Octopus targets the Windows ecosystem and has built-in support for most things Windows, the need for community help isn't as great.
Conclusion
When we were first learning Octopus, the time from install to our first deploy was about 30 minutes. With Chef, there were days of training and hours spent tweaking cookbooks before we got our first deployment. Octopus is simply easier for us to understand. We have a small team of three dedicated developers to manage 40+ environments that must be deployed with 50+ applications on demand. In addition, this team manages Artifactory, TeamCity, and the glue code that ties these systems together. Our goal is to bring our successful deployment rate up to 95+% and not need to babysit deployments to production.
For our needs, Octopus is the clear winner. We are a .NET shop deploying C# applications to the Windows ecosystem. Octopus is a C# application that runs on Windows. Octopus' understanding of our problem space is far superior to Chef's. The tools they provide are high quality, easy to use, and work with very little configuration. There are gaps to fill, but the fact that Octopus is written API-first gives me confidence that we can easily fill them.
DevOps is a relatively new space in the software engineering world. There are a smattering of tools to aid in the automation of application deployments, but precious little guidance with respect to patterns and practices for using the new tools. As a guy who loves leaning on principles this lack of attention to best practices leaves me feeling a bit uncomfortable. Since I’m leading a migration to Octopus Deploy, I thought I would share some of the decisions we’ve made.
This series of posts is an attempt to start a conversation about best practices. I want to be clear: We have not been applying these ideas long enough to know what all of the ramifications are. Your mileage may vary.
Posts in this series
1. Environments
2. Roles
3. Variables & Variable Sets
Variables & Variable Sets
Octopus Deploy allows you to modify your application’s configuration through the use of variables. You can define variables at the project level, or share variable values between projects through variable sets. If you have relatively little sharing of variables between projects you will likely prefer to create variables at the project level. My team manages over 50 different applications. Many of them are web services designed to support SOA. The net impact is that we have a lot of shared variables and for this reason we define variables exclusively through variable sets. This saves us time hunting for where a given variable is defined.
We use two kinds of variable sets:
1. Global
2. Role based
Global variable sets define values that might be required across the company irrespective of any particular application, or that are more easily managed together. For example, we wish to capture metadata about environments. Octopus itself does not have a facility for tagging environments with arbitrary metadata. To satisfy this goal we created a variable set called "environment" in which we create variables such as "owner" and "abbr". We use these values to compose the values of other variables such as DNS addresses or email addresses.
We also have some environments for which we do not create DNS addresses for the sites. In these environments we need to install web applications with alternate ports. We keep a variable set to define the port numbers we use for web applications in these environments, since they must be unique across the web server.
The number of global variable sets should be as small as possible.
Role-based variable sets are variables defined for the specific roles they target. If we have a role called `heroes-iis`, we will also have a variable set called `heroes-iis`. Since we create roles on a per-deployed-application basis, this helps us keep roles, projects, and variable sets linked. If `heroes-iis` has web service endpoints, this variable set may be included in some other project that depends on those endpoints.
Naming Conventions
It is important to have naming conventions for your variables. I highly recommend prefixing all variables in a variable set with the name of the variable set to avoid potential naming collisions. For example, if I have a variable set called `heroes-iis`, it will have variables with names like:
- heroes-iis.application-pool.name
- heroes-iis.application-pool.password
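In a deployment script or Step Template, those prefixed names are exactly what you index into Octopus's `$OctopusParameters` dictionary with, so two variable sets can define similarly shaped values without colliding. A small sketch (the `villains-iis` set here is a hypothetical second variable set):

```powershell
# Each lookup is unambiguous because the variable set name is baked into the key.
$heroesPoolName   = $OctopusParameters["heroes-iis.application-pool.name"]
$heroesPoolUser   = $OctopusParameters["heroes-iis.application-pool.username"]
$villainsPoolName = $OctopusParameters["villains-iis.application-pool.name"]

Write-Host "Configuring app pools '$heroesPoolName' and '$villainsPoolName'"
```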
Define a Standard Structure for Similar Variable Sets
Once you get the rhythm of installing applications with Octopus, you will discover that similar kinds of applications have similar variable definition needs. You can save yourself a lot of time and Chrome tabs by establishing a variable set template that you use when creating a variable set for each kind of application you deploy. Here is our variable set template for web applications being deployed to IIS:
| Variable Set Name | Segment | Field | Variable Name | Notes |
|---|---|---|---|---|
| name-iis | application-pool | name | name-iis.application-pool.name | The name of the application pool |
| | | username | name-iis.application-pool.username | The username the application pool runs under |
| | | password | name-iis.application-pool.password | The password the application pool runs under |
| | | host | name.host | This corresponds to the site name as registered in IIS. It does not include the protocol (http://, https://). It should be blank if the site is being deployed into an environment without a DNS entry. |
| | | site-name | name.site-name | This will be just the name of the web application in environments that do not have a DNS entry. If the environment has a DNS entry, it should resolve the host property. |
| | | site-root | name.site-root | This is the URL root for the site. It should include the protocol (http://, https://) as well as the port, and any additional routing information. |
| | endpoints | endpoint-name | name.endpoints.endpoint-name | A web service may expose one or more endpoints. These should have unique names. Their values should be defined with reference to the host and port variables. |
| | connection-strings | cs-name | name.connection-strings.cs-name | The name of the connection string in the config file. |
Scope
Octopus Deploy allows you to scope variables by environment, role, or channel (as of 3.3). The scoping rules are as follows:
(environment1 OR environment2 OR ...) AND (role1 OR role2 OR ...) AND (channel1 OR channel2 OR ...)
I recommend that you scope variable values as broadly as possible. Use composed variable values where you can to minimize the number of variable values you have to maintain. For example:
heroes-iis.connection-strings.heroes-db => "Server=#{environment.sql-server.url}; Database=#{heroes-db.database-name};"
heroes-db.database-name => HEROES_#{environment.name}
environment.sql-server.url => http://sql-server.#{environment.name}.com
By using a composed variable value I don't need to scope the connection string variable itself. Instead, I can confine scoping to `environment.name` and satisfy the resolution of all of the descendant variables. This minimizes the number of variables I have to actively maintain as new environments are created.
Roles
When you add machines into Octopus, you must specify environments and roles for each machine. For our purposes, environments were pretty easy to define. Roles, however, took some work. Here are the kinds of roles we defined.
Operating Systems
Example: windows, linux
This is pretty easy. We started with `linux` and `windows` for this type of role. I can see a day when we may need to additionally specify `ubuntu-14` or `2k8-r2`. In the meantime, YAGNI.
Environment Types
Example: dev, uat, integration, staging, prod, support
Our environment naming convention for developer environments is `dev-{first initial}{last name}`. For UAT environments it's `uat-{team}`. There is only one each of the integration, staging, production, and support environments. There are certain variables that are defined consistently across all `dev` environments but may differ between `uat` environments. For this reason we apply the environment type as a role across all machines in the relevant environments.
Commands
Example: hero-db.migrator
This is a standalone role. There will only be one machine in each environment that has this role. Its purpose is to execute commands against some resource in the environment that should not be run multiple times or concurrently. A good example of this is an Entity Framework database migration. We choose one machine in each environment from which database migrations can be run.
Applications
Example: webapp-iis, topshelf-service
Each deployable application has its own role. Not every application gets installed on every machine in an environment. We use the `-iis` suffix for applications installed into IIS, regardless of whether they're sites or web services. We use the `-service` suffix for Windows Services. We do this because we sometimes have a family of applications that share the same name but are deployed as different kinds of applications.
Our Default Lifecycle
Before I begin, I should give you some background on our development ecosystem. Our Octopus Lifecycle looks like this:
dev => uat => integration => staging => prod => support
The Environments
Name | Convention | Purpose | Notes |
---|---|---|---|
dev | dev-{first initial}{last name} | The primary purpose of these environments is to test the deployment tooling itself. | We have 15 or so individual developer environments. Each developer gets their own environment with 2 servers (1 Linux, 1 Windows) and all of our 60 or so proprietary applications installed to it. |
uat | uat-{team} | These environments are used by teams to test their work. | We have 10 or so User Acceptance Testing environments. These are a little bit more fleshed out in terms of hardware. There are multiple web servers behind load balancers. The machines are beefier. These environments are usually owned by a single team, though they may sometimes be shared. |
integration | N/A | Dress rehearsal by Development for releases | The integration environment is much closer to production. When multiple teams are releasing their software during the same release window, integration gives us a rehearsal environment to make sure all of the work done by the various teams will work well together. |
staging | N/A | Dress rehearsal by Support for releases | Staging is exactly like integration except that it is not owned by the Development department. We have a team of people who are responsible for executing releases. This is their environment to verify that the steps development gave them will work. |
prod | N/A | Business Use | Prod is not managed by the deployment engineering team. We build the button that pushes to prod, but we do not push it. |
support | N/A | Rehearsal environment for support solutions | Support is a post-production environment that mirrors production. It allows support personnel to test and verify support tasks in a non-prod environment prior to running them in production. |
When I started following @OctopusDeploy on twitter in preparation for adopting it for Redacted Financial Services Inc., I saw that they were sending representatives to the Kansas City Developer’s Conference (@kc_dc, #kcdc16). I looked up the conference and saw that @jeffreypalermo was going to be there as well. Excited, I reached out to @darrencauthon to see if he would be there too. When I got the confirmation, I’d made up my mind to go.
Living in the Northwest, I'd never heard of this conference. I'm super glad I came. There were 1,600 people at the conference. The topics are a little .NET-centric, but they range toward the philosophical as well as the technical. In addition to .NET, there were presentations on R, Ruby, JavaScript, leadership, and more. The keynote encouraged attendees to learn more deeply about something they're already doing and also learn something completely new.
This is exactly what I was wishing for in a conference. I've enjoyed the opportunity to talk with the gentlemen at @OctopusDeploy. I was impressed by the quality of Jeffrey Palermo's talk about Continuous Delivery. I greatly enjoyed hearing about new technologies such as Semantic UI. I was fascinated by the differences between Angular, Ember, Knockout, and ReactJS. I'm now very interested in trying out React.
The highlight was meeting Darren Cauthon in person after 5 years of being twitter friends. Darren & I are kindred spirits in that we both believe in TDD and strive for quality, well-crafted code. We both have backgrounds in .NET and experience with Ruby (though his experience with Ruby is far superior to mine).
I’ll come to this conference again–perhaps as a speaker next time.
I'm in the midst of setting Redacted Financial Services Inc. up to use Octopus Deploy. An issue we've had with automated deployments in the past is that the tooling reports that everything is okay, but later we find out the software is misconfigured for the environment it's installed in. To resolve this problem we've started including a service health API in our deployed web services and sites. At the end of any deployment we can issue a `GET` to the API and have the site tell us whether or not it is at least configured in such a way that it can access its dependent databases and services. Anything other than a 200 and we fail the deployment. We are also handing this API call off to Nagios for monitoring.

The code sample below demonstrates the implementation of this API. We deliver it as a code-based add-in via our internal NuGet feed. It includes health checks for SQL Server, Couchbase, RabbitMQ, and standard REST services. This code is longish and we'll probably open source it after we've had a chance to bake it. If you try it out, let me know how it works for you.
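While the add-in itself is longer than fits here, a minimal sketch of the deployment-side check it enables looks like this. The `/api/health` route and the site URL are stand-ins for whatever your health add-in actually exposes:

```powershell
# Fail the deployment if the freshly deployed site can't reach its dependencies.
$healthUrl = "https://heroes.example.com/api/health"   # hypothetical endpoint

try {
    $response = Invoke-WebRequest -Uri $healthUrl -UseBasicParsing -TimeoutSec 30
    if ($response.StatusCode -ne 200) {
        throw "Health check returned $($response.StatusCode)"
    }
    Write-Host "Health check passed."
}
catch {
    # A non-200 response or a connection failure fails the deployment step.
    Write-Error "Health check failed: $_"
    exit 1
}
```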
In the time since I first wrote about my software engineering internship program I’ve gotten a bit more organized and had some new observations and insights. The philosophy behind the program is the same, but I can now better discuss the practical implementation, side-effects, and ROI. If you are interested in starting your own program, you should definitely read my first post on the topic as it covers the strategic elements in greater detail.
Return on Investment
Running the internship has costs. They are:
- ~15% of the mentor’s salary
- ~5% of the coach’s salary
- The intern’s salary
- The cost of equipment
- The cost of a Pluralsight License
- The cost of Clean Code and The Pragmatic Programmer
How do we justify this to the business? How is the program cost-effective?
After the bootcamp phase of the program the interns can usually write test-driven code with a level of quality approaching that of any other software engineer. We give them projects to work on that would not otherwise be approved. They “fill in the gaps” solving problems for our development organization.
Here are some of the projects they’ve worked on:
- GitBack – a console application used to backup all of your git repos to disk
- It’s open source. Feel free to use it and/or enhance it.
- A web application portal to host prototypes of tools and reports needed by the business that don’t neatly fit into any existing application.
- A web application to report on our hardware eco-system–which machines are used for what in which environments?
- A web application to scrub and provision sql server databases into non-production environments for testing purposes
The other major way we benefit as a department is that we get the opportunity to work with people while they’re starting out. We train them to build software our way. We get to know their warts, and they ours. When we have open positions, we prefer to offer them to people we know. We have had 2 direct hires of college students out of our internship program. When they come to work for us, they already know they like our environment. That may not sound like a lot but it can add up fast when you consider recruiter fees.
But wait! There’s more!
Alternative Career Path
After I'd been running the program for a year or so, something interesting happened: we had a member of another department request to go through our internship program. He wanted to effect a career change into Software Engineering. Since we already had the program in place, we were able to accommodate him. Before he had even finished the program, we had another member of our organization request the same thing. We are able to provide an alternative career path to our organization, and the development department benefits.
As of this moment we have hired:
- 2 interns out of college
- 2 career changes out of our internal organization
- 1 career change from outside our organization
That’s roughly 20% of our development organization.
Further, we have made the program available to people in other departments whose jobs require that they write code even though they’re not software engineers. It is my hope that this will result in higher quality code across the rest of our organization as well, though that remains to be seen.
Curriculum
The curriculum of the program has remained largely unchanged since I first wrote about it. However, I’ve found that I can make it largely self-serve for the first 4-6 weeks by organizing it as a sequence. This reduces the amount of time I need to spend mentoring the intern. Of course, once they get started on their real project my investment will be greater.
Here is the updated curriculum:
- Clean Code
- Read chapters 1-9 before the "refactor your homework" exercise.
- Setup your dev machine (we have powershell scripts for this).
- Learn about Github & Branching
- Refactor your own school homework exercises to improve the readability of your code
- you are practicing what you have learned in Clean Code
- Movies Tutorial – build an ASP .NET MVC web application
- watch ASP .NET MVC 4 Fundamentals
- Sections 1 & 2
- The purpose here is to learn in detail what you did in the Movies Tutorial
- watch MVC5 Fundamentals
- Identity & Security
- Bootstrap
- Web Api 2
- Entity Framework 6
- Music Store Tutorial – build another ASP .NET MVC web application
- You should see that you are much faster this time around.
- You should understand the “why” in the steps of the tutorial.
- Nerd Dinner Tutorial – here's another version updated for EF4
- These tutorials are out of date. To complete the project you will likely have to switch between them.
- Use EF Code first instead of what’s in the tutorial for data access.
- Stop when the tutorial tells you to import MicrosoftAjax.js.
- Chris will give you his speech on why this file is historically important and why we’re not going to use it.
- You'll use jQuery instead.
- Pair on the Bowling Game Kata
- Implement Fizz Buzz on your own using TDD.
- Learn about Dependency Injection
- Introduce Fakes, Stubs, and Mocks using this PluralSight video on Rhino Mocks
- Practice using Mocks and Stubs using this exercise
- Get started on your first real project!
The Context
I'm working on bringing Octopus Deploy into our company. We have a good number of Topshelf services. Octopus Deploy does not directly support installing Topshelf services, so I'm creating that functionality in a Step Template. While attempting to install my service I got the command-line arguments slightly wrong: instead of deploying a service called `MyService`, I deployed a service called `"MyService"`, quotes included. The services manager on the target server recognized the service, but neither the PowerShell cmdlet `Get-Service` nor `sc.exe` could find it. I was eventually able to get a reference to the service by querying `Get-WmiObject Win32_Service` and filtering on the name. I tried calling the `Delete` method on this service and it did not succeed.
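For reference, the WMI query and attempted delete looked roughly like this (a sketch; note the literal quote characters that ended up embedded in the service name):

```powershell
# The broken service name literally contains the quote characters.
$brokenName = '"MyService"'

# Get-Service and sc.exe can't see it, but WMI can.
$service = Get-WmiObject Win32_Service | Where-Object { $_.Name -eq $brokenName }

if ($service) {
    # Delete() returns an object whose ReturnValue is 0 on success.
    $result = $service.Delete()
    Write-Host "Delete() returned $($result.ReturnValue)"   # non-zero in our case
}
```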
I was stuck. I couldn't uninstall the service because none of the built-in tools recognized it.
I had no choice but to kill the server and rebuild it. It’s a good thing that we’re trying to treat servers like cattle and not pets.
Technical Debt is a metaphor to describe software rot. The idea is that each time software engineers take shortcuts in the code they incur “debt” in the code base. Each time future engineers must read and/or modify the indebted code, they pay “interest” on the debt in the form of longer project times and increased risk of defects due to unreadable code.
Some argue, and I’m one of them, that sometimes it is necessary to incur technical debt for the sake of speed. In a personal example I had a feature that was required to be implemented inside a week. My team estimated the work at 2 weeks. This was unacceptable due to an externally imposed deadline. The team offered a hacky solution to the problem that resulted in a quick turnaround. We offered the solution on the condition that we would immediately be given the 2 weeks to correct the design flaw. Agreement was reached and we released the feature in 1 week. We finished the feature in 3 weeks.
When you take on technical debt, you don’t reduce the cost of the feature–you increase it. You take on the work of the hacky solution and the work of reworking the hacky solution.
The reason we were able to reach this agreement is that our product owner and our team all understood that bad code is more expensive. We deliberately wrote bad code for the sake of instant gratification and then we immediately paid the full price for good, tested code.
We’ll fix it later
My colleague @jrolstad likes to say
Technical Debt is the lie we tell ourselves that we’ll come back and fix it later.
This is a phenomenon widely observed by many software engineers. We complain that the software is rotting and are promised an opportunity to “fix it later,” but “later” never seems to come.
What is always coming but never arrives? Tomorrow
— Children’s joke
Your code is your design
There’s an antipathy toward useless documentation in the Agile community, especially documentation about what code does. “The best documentation of the code is the code.”
When it comes to the design of their software, many developers make the mistake of thinking of the system in terms of their aspirations for the code base. The design of the system is always its current state. If you take a policy of accumulating technical debt in your code base then your actual design is a mess–not the gleaming structure of rationality you imagine it will be when you fix the technical debt… later.
What can we do?
If your organization has a legitimate need to take on technical debt, we can insist that the work to repair the debt be placed on the work calendar immediately. Most of the time the “business value” of getting a feature delivered fast is an attempt to pretend that a feature doesn’t cost as much as it does.
As software engineering professionals, we should not pretend that features do not cost what they do. We should not lie to the business, nor help them lie to themselves.
Estimation
These ideas have an implication with respect to estimation. If you owned a home and wanted to add a room on the second floor, how would you react if your contractor said “Well, these beams are rotted. We can probably build the room without replacing the beams, but your house may collapse in ten years. What do you want to do?” If you are anything like me, you would be appalled that the contractor even offered the option. The fact that the beams are rotted requires that they be replaced. This is not optional. The problem is not “solved” by building a room on rotted beams that will collapse in 10 years. It is not solved even if we know we are going to sell the house in 5 years (I’m looking at you startups).
The engineer should not offer to build the room on rotted beams. We should not offer or suggest alternatives to the business that result in the accumulation of technical debt. We should be honest with ourselves and with our employers about the real cost of the work they have requested.
But what if there’s a legitimate reason?
Is it always wrong to accrue technical debt? I don’t think so. As I stated earlier, I believe it can be acceptable to take some short-term shortcuts to get a solution out fast. However, this should be anomalous and the resultant mess should be cleaned up immediately following the achievement of the business goal. It needs to be understood by all parties that the shortcut costs more, not less.
How do I sell this to the business?
This is a complex topic and I don’t have all the answers. I have some answers and some promising leads.
First, stop presenting hacky alternatives in your estimate. Look at the code and honestly assess what it will take to alter it correctly. What is it going to take to add appropriate tests where they are missing? To clean the code so it will tolerate the change? To make the change? To repair an architectural deficiency? Estimate and present your estimate confidently and do not offer hacky options. If the business wants to negotiate, ask which features they would like to drop from the project. Do not offer to take engineering shortcuts.
Another tactic you can take is to measure estimates vs. technical debt in the code. You’ll have to start collecting some data for this approach and it will take some time. You’ll need:
- Project estimates
- How long the projects actually took
- Cyclomatic complexity for the affected code.
You'll want to relate the accuracy of estimates to complexity. You should see that estimation accuracy decreases as complexity increases. For extra points, you can look at actual project estimates for similar features in code bases with different complexities. You should see that more complex code bases are harder to maintain, even from an estimation perspective, than their simpler counterparts. You can use this information to put teeth into the claim that unclean code costs the company money.
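A hedged sketch of what that analysis might look like, assuming you've collected the data into a CSV with hypothetical Name, EstimateHours, ActualHours, and Complexity columns:

```powershell
# projects.csv is hypothetical: Name,EstimateHours,ActualHours,Complexity
$projects = Import-Csv "projects.csv"

$projects |
    ForEach-Object {
        [pscustomobject]@{
            Name          = $_.Name
            Complexity    = [int]$_.Complexity
            # How far off the estimate was, as a ratio (1.0 = perfect).
            AccuracyRatio = [double]$_.ActualHours / [double]$_.EstimateHours
        }
    } |
    Sort-Object Complexity |
    Format-Table -AutoSize
```

If the claim holds, the accuracy ratio should drift away from 1.0 as the complexity column climbs.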
I’d love to hear any other ideas you might have in the comments!
Sometimes we fall into the trap of thinking that since no one is complaining about our work then everyone must be happy with it. This is a dangerous mode of thought because our customers may not in fact be happy with our work, and because it inhibits us from improving ourselves. Getting accurate customer feedback about our performance is critical if we are to continually improve.
A Personal Example
At Redacted Financial I have some resource needs that are slow to be filled. There’s a process in place to make sure you get the right resources, but budgetary constraints mean that resource needs must be demonstrated before they are doled out. This makes sense to a degree but:
Budgets are there to keep you from being irresponsible. They should not keep you from being smart.
--Chris McKenzie
Yes, I’m quoting myself–but I really like that formulation :). The process to get more resources is just painful enough that my team members are averse to going through it. Our team has similar needs across the board so iterating through the approval process for each member of the team strikes me as wasteful. I’d rather identify a baseline for all of my team members and start each person with those resources. When I proposed this idea I was told:
We have resources allocated in wildly different ways and nobody is complaining. Clearly there is no standard baseline for your team.
--The Gatekeeper
I thought this attitude was interesting. In the mind of this engineer, “nobody is complaining” is equivalent to “everybody has what they need.” Sometimes as engineers we operate on the assumption that everyone complains if something is bothering them.
This is false. Marketers have known this for years. That's why they spend so much time and effort getting you to fill out customer satisfaction surveys. Sometimes the most valuable information you can mine for is "where am I failing?" Marketers know that for every person who is vocal about their complaints, there are hundreds they will never hear from.
How can you find out if your customers are happy?
Let me ask a different question first: “Who are your customers?” There are different people who are impacted by your work. They are all stakeholders, but not every stakeholder is your customer. If you can’t look at your stakeholders and clearly identify your primary customer, consider using a Responsibility Assignment Matrix.
I found it useful in a recent project to use the RASCI matrix which is defined as follows:
- Responsible – This is you in our example
- Accountable – This is your customer. This is who you are accountable to. This isn’t your boss (necessarily). This is the person who will use the end-product of your work. It’s to enable them in their goals that you are doing yours.
- Support – These are stakeholders who will support you in your efforts, but who do not directly consume the end-product of your work.
- Consulted – These are stakeholders who should be consulted about your work. They may need to approve some aspect of what you’re doing, or they may have important insight about how you should go about your job.
- Informed – These are stakeholders who should be informed about your work and/or your progress.
These matrices are often used to facilitate the repair of organizational dysfunction, but they are also useful simply to clarify the roles & responsibilities of all of the stakeholders in your personal work ecosystem.
Once you've identified your primary customer, the simplest way to find out if they are happy with your work is… ask them. Face-to-face conversations are nice where possible. If you are lucky enough to be able to have a face-to-face conversation with your customers, try to adopt an attitude that is open to criticism. Don't interrupt what they're saying with explanations or excuses, even if you disagree with what they're saying or think they're wrong about something. There will be time for responding later. For now, your task is to listen and gather as much information as you can about their assumptions and agendas.
Some customers will be confrontation-averse so a face-to-face conversation may not yield honest results. As mentioned before, surveys might be useful. An anonymous comment box (or anonymized email account) could work. It’s on you to figure out how to mine the information.
What do I do with customer feedback once I get it?
It may be hard to get the information. It may also be hard to hear it. You should take some time to reflect on the feedback before you respond, especially if it’s negative. Try to distance yourself from any initial emotional reaction so that you can consider more than just what the feedback says about you and your work. What is the underlying agenda your customer is trying to achieve? Are you helping or hindering that agenda? Are they lacking any key pieces of information? Are there other easy solutions to their problems?
When your customer gives you feedback, the worst thing you can do is not respond to it. If you fail to respond to feedback, positive or negative, you send the message that you do not value it. If you don't value the feedback, it shows you don't value your customer. You should respond to feedback even if you're not sure you can do anything to address their underlying complaint.
In our recent town hall, it was expressed that working on project teams instead of product teams isn't ideal from the perspective of collective code ownership. The response was "The Business doesn't want to work that way." We are not resourced to have product teams for the 50 or so applications we manage. Business priorities shift and sometimes require all of our resources to concentrate on a few applications at a time. This wasn't happy news for our development teams, but it is understandable. My point is that even if you can't do anything to address the critical feedback due to issues beyond your control, you can respond to the feedback by explaining the other constraints to your customer.
The response to the town hall was overwhelmingly positive. Sometimes, hearing a good reason why it's hard to accommodate their needs is enough to reduce your customer's frustration.