Why We’re Migrating From Chef to Octopus

A couple of months ago I started leading a team that was using Chef to manage our deployment infrastructure. We are now migrating from Chef to Octopus Deploy. I thought it might be interesting to explain why. What follows is my opinion and my best understanding of the facts. As always, do your own thinking. 🙂

Chef

Chef takes the DevOps mantra “infrastructure as code” to heart. Everything created for consumption by Chef is code. This is a strength in many ways, but sometimes a weakness.

Architecture

Chef consists of a server which contains metadata about your infrastructure, “cookbooks” which contain instructions for configuring machines and applications, and a client application which does the work of truing up a machine to its representation in the Chef Server.

In Chef, the target of work is a node. When chef_client runs, its goal is to make the node it’s running on look like Chef Server expects it to. It has little awareness of applications except insofar as you layer that into your cookbook design. chef_client pulls the data it needs from the Chef Server in order to do its work. This means that in order to update a machine you have to have access to that machine as well as rights to run chef_client. It also means that it takes extra effort to redeploy a single application to a node in the event something goes wrong. If you are running single-purpose nodes, that’s probably not an issue for you.

It bears mentioning that Chef was designed for Linux first. Windows is an afterthought and it shows. There was no built-in support for common Windows constructs such as IIS and Windows Services at the time of our implementation. (My understanding from a coworker is that Chef now has built-in support for Windows services.)

Chef has an api, but it’s nearly impossible to access using standard tools due to a convoluted authentication model. Chef has a CLI called knife which wraps up most api behavior. It should be stated that you cannot initiate activities through the api–that is all done by running chef_client on the target machines. knife can, however, query the api for the machines you want and initiate chef_client runs for you.
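
To make that concrete, here is roughly what that workflow looks like from a shell (a sketch only; the winrm subcommand comes from the knife-windows plugin, and the role name and credentials are hypothetical placeholders):

```powershell
# Ask the Chef Server which nodes it knows about
knife node list

# Find the machines you care about (hypothetical role name)
knife search node 'role:web-server'

# Kick off a chef_client run on those machines over WinRM
# (requires the knife-windows plugin; credentials are placeholders)
knife winrm 'role:web-server' 'chef-client' --winrm-user 'Administrator' --winrm-password '<password>'
```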

Developer Experience

The developer experience with Chef is all about editing cookbooks. A “cookbook” is a collection of “recipes” that can be executed by chef_client on the target node. Cookbook development involves using a ruby dsl that transpiles to raw ruby on the target node during a process called “convergence.” You edit the cookbook, push the modified cookbook to the chef server, then run chef_client on the node. On the surface this seems simple. It’s not.

If you are staying “in the rails” (rimshots for the ruby joke 🙂 ) with Chef and using its built-in functions (called “resources”) to do your work, you will mostly be okay. If you are interested in testing your work you will start to feel pain. Chef provides the ability to do “local” testing–by which they mean spinning up VMs using Vagrant over VirtualBox. “Local” in this context doesn’t mean you get to see the transpiled ruby before it executes, or step into a debugger. It just means that the VM you’re testing on happens to be running on your developer machine.

There are tools for automated testing of Chef cookbooks. We haven’t really learned to use them very well, so I can’t personally comment on their quality. My coworker had this to say:

There is a commonly used unit testing framework for Chef, called ChefSpec (which is built upon RSpec). I have issues with ChefSpec, so most of my Chef unit tests are done in pure RSpec. However, ChefSpec is not officially supported by Chef.

There is chef-zero, which allows you to spin up an in memory mock Chef Server, which will allow you to start/pause/stop a mock node convergence, and will allow you to examine attributes at different steps of resolution. Test-Kitchen is the integration testing framework, which spins up actual VMs and uses Vagrant as a driver (it may use Docker now? Not sure), then fully converges that VM as a node. So the Chef ecosystem has the ability to do lots of testing at various levels. I feel it is inaccurate to state that testing is poor in Chef. What is accurate to state is that Chef is Linux first, Windows afterthought. So all the testing goodness in the Chef ecosystem was either unavailable to us as a Windows shop, or it was extremely difficult to use. … we knew that lack of testing was a weakness in our Chef code. We really tried to make it happen, but the hurdles were insurmountable with the resources available.

You can customize Chef behavior by writing “resources” and “providers.” This process is not straightforward or easy to understand, and frankly I still find it confusing. When writing new code for Chef you can use straight ruby, but consuming standard ruby gems is problematic because of the convergence process. It’s not clear when the ruby code will run. Sometimes it runs during convergence, and sometimes earlier, when the recipe is first evaluated. Keeping it all straight and making it work as expected requires some mental and technical gymnastics. We had numerous errors caused by ruby code not executing when we thought it would.

Guidance

There is very little guidance about the correct way to use Chef. In fact, Opscode seems to pride itself on this fact. They exclaim on twitter and at conferences, “You can use Chef any way you want!” Great–but how should I use Chef? This is not a trivial question, as there are at least 3 kinds of cookbooks in Chef development: cookbooks that define data, cookbooks that define utilities or “resources”, and cookbooks that deploy applications. There are 6 levels of overrides for variables. Which data should you put in application cookbooks vs. environment cookbooks? When should you use overrides, and how?

Community

Chef has elected to rely on the community to provide cookbooks for most things. Some vendors have created officially supported cookbooks for their products, but most have not. Who can blame them? There’s no clear winner in the DevOps space yet, so why spend the time and money to support a tool that may fizzle out? This means that most community cookbooks are of dubious quality. They are generally designed to solve the particular problem the author was facing when they were trying to install something. They may or may not set things up in a recommended fashion. They make assumptions about the OS, version, security, and logging strategies that are by no means universal. This left us in the position of having to read and fully understand a cookbook before deciding whether or not to use it. More often, we just ended up writing our own customized version of the same cookbook.

Conclusion

It’s clear to us that Chef is not a good fit for our organization. We deploy 50+ applications to 40+ environments. Because Chef provisions a node and not an application, if there’s a problem in any one application the entire deployment is considered a failure. We were never able to get above 80% successful deployments across our environments. Many of the reasons for this failure were non-technical and had nothing to do with the tool, but the complexity of the tool did not help us. Finding expert-level help with Chef on Windows was equally problematic. Even at the conferences we attended, the Windows Chef implementors were like the un-cool kids sitting at a table by themselves in the corner, not quite excited by the “ra ra ra” of the pep rally.

If I were going to use Chef I would use it in an ecosystem that had fewer applications and more servers. Let’s say I had a web service that I needed to horizontally scale across 1000 Linux nodes. Chef would be a great fit for this scenario.

My co-worker says:

Chef would be a good choice if
a. You use Linux
b. You have a simple deployment topology, which can be supported by Systems Engineers who can do a little scripting (i.e., use only the built in resources).
c. Or you have a large DevOps engineering team that has enough throughput to dissect the bowels of Chef.

Chef is designed for a setup where there are thousands of nodes, each node is responsible for doing 1 thing, and it is cheap to kill and provision nodes. Because we use Windows, we have hundreds of nodes [and] each node is responsible for many things. It is expensive to kill and provision nodes. Windows is moving towards the cattle not pets paradigm, but it’s not there yet. Chef is designed to work with setups that are already at the cattle stage.

Octopus

Octopus kind of ignores the “infrastructure as code” idea and tries to give you a button-click experience for deployments.

Architecture

Octopus Deploy is composed of a web server and a Windows service. The web server stores metadata about your hardware and software ecosystem. The Windows service (called Tentacle) is installed on each machine that Octopus will manage and performs the work locally. Octopus was developed api-first, which means that all of the functionality you invoke through the Web UI is available through the api. The api is secured using standard models, which means it is easy to consume using your favorite tools.
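
As a quick illustration, here is a minimal sketch of hitting the api from PowerShell. The server URL and API key are placeholders, and I am assuming the /api/environments/all endpoint from the 3.x api; the X-Octopus-ApiKey header carries the credential:

```powershell
# Query the Octopus api for the environments it manages.
# URL and key below are placeholders, not real values.
$octopusUrl = "https://octopus.example.com"
$headers    = @{ "X-Octopus-ApiKey" = "API-XXXXXXXXXXXXXXXX" }

$environments = Invoke-RestMethod -Uri "$octopusUrl/api/environments/all" -Headers $headers
$environments | Select-Object Name, Id
```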

Instructions are packaged up by the server and pushed to the Tentacle agents where they are executed. As long as the Tentacle service account has the requisite permissions, anyone with permission to deploy through Octopus Server can deploy to the target machines even if they don’t otherwise have access to those machines.

Whereas Chef is node-centric, Octopus is application-centric. This is a strength in that it’s easy to deploy or redeploy a single application to a node that may have many applications on it. This is a weakness in that it’s harder to deploy all of the applications required for a particular machine (more button clicks). This weakness will be mitigated by the elastic environment features of the soon-to-be-released 3.4 version.

Octopus separates artifact management from release process management. You can modify your release process without ever editing your deployment artifacts, and vice versa. Deployment artifacts are a first-class citizen in Octopus. With Chef, you have to layer in your own artifact management.

Developer Experience

Octopus is written in C# and runs on Windows. The advantage over Chef here is that we get native Active Directory integration for security purposes. In the Octopus world developers publish their applications as NuGet packages, a format already understood by most .NET developers. The NuGet package definition lives with the application source code, so no external mapping of deployment code to the application is required. Further, it’s obvious which bits go in the NuGet package and which bits will live on the Octopus Server.

Octopus uses one of several variable substitution techniques to manage config files. Since Octopus was written with Windows deployments in mind, it contains built-in processes for deploying IIS applications and Windows Services. If further customization is needed, developers can write custom scripts in PowerShell, as either a Step Template or a script included in the NuGet package.
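
For illustration, a custom script (whether a Step Template body or a script shipped in the package) reads its inputs from the $OctopusParameters dictionary that Octopus injects into every deployment script. A minimal sketch, with hypothetical variable names:

```powershell
# Minimal sketch of a custom deployment script.
# $OctopusParameters is supplied by Octopus at deploy time;
# the variable names below are hypothetical.
$siteName = $OctopusParameters["Web.SiteName"]
$appPool  = $OctopusParameters["Web.AppPoolName"]

Import-Module WebAdministration

# Recycle the application pool once the new bits are in place
if (Test-Path "IIS:\AppPools\$appPool") {
    Restart-WebAppPool -Name $appPool
    Write-Host "Recycled app pool '$appPool' for site '$siteName'"
}
```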

Unlike Chef, you can test the scripts you write locally, with the caveat that you will have to provide any variables they require to execute. Since it’s just PowerShell, you can use any of the existing tools and debuggers written for PowerShell. The lack of a transpilation step means that the code you wrote is the code that will run.
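
For example, you can stub out the variables Octopus would normally inject and run the same script on your workstation (a sketch; the script name and variable names are hypothetical):

```powershell
# Stub the dictionary Octopus would provide during a real deployment
$OctopusParameters = @{
    "Web.SiteName"    = "MyApp-Local"
    "Web.AppPoolName" = "MyApp-LocalPool"
}

# ...then dot-source the deployment script exactly as Octopus would run it
. .\Deploy.ps1
```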

Guidance

Like Chef, Octopus has not “won” the DevOps space. My impression from the community is that the percentage of shops working in the DevOps space at all is fairly low so this isn’t a mark against either tool. However, low adoption means that there are fewer people sharing their lessons learned with the world. If you go looking for Octopus best practices, you won’t find much more than my blog series. Most “best practice” articles and videos about DevOps consist of little more than advice to automate all the things.

On the other hand, the way Octopus itself is structured steers you toward some best practices. Some of these ideas could be layered onto a Chef implementation just as easily, but Octopus is opinionated in a way that nudges you in the right direction by default. It’s nice to have Octopus tell us “this is the blessed path to accomplish your goal.”

Octopus distinguishes variables from processes. A Release Process is a set of logic that results in the successful deployment of an application to a machine in an environment. It is informed by variables, which can be scoped to environments and roles (and channels as of version 3.3). Variables can be managed at a project level or in shared libraries called Variable Sets.

Custom logical steps can be defined in PowerShell, stored and versioned in Octopus, and used as part of the Release Process. These are called Step Templates. Step Templates are usually a lighter-weight maintenance effort than a cookbook, since they’re just PowerShell and are often targeted at very small units of functionality that can be chained together without much overhead.

In Octopus, you create a Release, which binds a deployment artifact to a set of variables and instructions. The release is then promoted through a series of environments defined in a lifecycle. There is no binding of artifacts to cookbooks in Chef unless you write that yourself.
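
This is scriptable as well. Here is a sketch using octo.exe, the Octopus command-line tool; the server, key, project, version, and environment names are all placeholders:

```powershell
# Create a release for a project, then promote it to an environment.
# All values below are placeholders.
octo create-release --project "MyApp" --version "1.2.3" `
     --server "https://octopus.example.com" --apiKey "API-XXXXXXXXXXXXXXXX"

octo deploy-release --project "MyApp" --version "1.2.3" --deployto "Dev" `
     --server "https://octopus.example.com" --apiKey "API-XXXXXXXXXXXXXXXX"
```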

Community

My impression is that the Octopus community is smaller than the Chef community. There is only one sanctioned mechanism for the community to share code–a github repo which Octopus itself reviews before accepting pull requests. The fact that Octopus reviews contributions gives me the impression that the quality of the community code is better than what we found in Chef. Having gone through the review process with them for a new step template we created, we know that they are actually looking at the code. On the other hand, since Octopus targets the Windows ecosystem and has built-in support for most things Windows, the need for community help isn’t as great.

Conclusion

When we were first learning Octopus, the time from install to our first deploy was about 30 minutes. With Chef there were days of training and hours spent tweaking cookbooks before we got our first deployment. Octopus is simply easier for us to understand. We have a small team of 3 dedicated developers to manage 40+ environments that must be deployed with 50+ applications on demand. In addition, this team manages Artifactory, TeamCity, and the glue code that ties these systems together. Our goal is to bring our successful deployment rate up to 95+% without needing to babysit deployments to production.

For our needs Octopus is the clear winner. We are a .NET shop deploying C# applications to the Windows ecosystem. Octopus is a C# application that runs on Windows. Octopus’ understanding of our problem space is far superior to Chef’s. The tools they provide are high quality, easy to use, and work with very little configuration. There are gaps to fill, but the fact that Octopus is written api-first gives me confidence that we can easily fill them.

20 thoughts on “Why We’re Migrating From Chef to Octopus”

  1. Great post, I was especially interested in what you have experienced in how Chef is architected and the “developer experience”. Your post has made clear what I felt made Chef cumbersome and awkward when using it to implement machine configuration in a Windows Server environment.
    I would like to hear more about how it is used in your organisation and, if you use custom PS scripts, what they are used to configure on the servers or how they are used in general.

  2. Very interesting post. Thank you. If you’re willing, I’d love to hear more about how you use Artifactory within your organization. In my experience, it’s another tool that appears to be clashing with the C#/Windows stack, in that it’s clearly Java-first, with C#/.NET an afterthought. Are you using it as a repository for your Octopus/NuGet packages, and, if so, have you considered alternatives such as ProGet?

    1. Artifactory is our general package management solution. We’ve been using it for 2 years with no problems. I haven’t had any clashes with the Windows ecosystem. Nuget, Chocolatey, TeamCity, and Octopus all integrate well with it.

      We used to be MyGet users but we found the reliability and performance of MyGet to be problematic. Artifactory’s api performance is great. In addition to NuGet packages, we use it to store .zip’s, msi’s, ruby gems, and Chocolatey packages. Its versatility with package types, coupled with its ability to organize and annotate packages in arbitrary ways, makes it an excellent tool for us.

      I do have some complaints: 1) The UI performance can be very slow in large repos with thousands of packages. 2) Their implementation of the NuGet api does not exactly match the nuget.org public api.

      In general my measurement of tool quality is “How much time do I have to spend worrying about the tool?” vs. “How much time does the tool save me?” So far, Artifactory has been a low-maintenance solution for our package management needs.

      I haven’t looked at ProGet in a serious way. At the time we were making a decision, ruby gems support was a requirement. That’s no longer the case for us as we’ve decided ruby isn’t a good fit for our environment, but we’ve had no reason to consider changing tools.

  3. We use PowerShell in two ways. Inside of Octopus we use PowerShell to write custom step templates. Outside of Octopus we maintain a PowerShell module which we use to provision hardware to our private cloud, bootstrap machines into Chef, bootstrap machines into Octopus, provision databases to our SQL Servers, and initiate application deployments. We currently use TeamCity as a wrapper around our PowerShell module so that we can expose these behaviors to development teams without them having to configure a laundry list of tools on their local workstations. Our long-term plan is to wrap all of this behavior up in an internal web api.

    1. Thank you for the reply.
      I have seen similar usage of Octopus with Custom PS Templates and build engines (TeamCity, TFS, Bamboo) running PS scripts to configure servers.
      My thoughts were that people were creating great useful tasks but using the incorrect tools to execute the tasks. So we also started to build a Web Api and Web Site to manage these types of Tasks. The idea being that Octopus or TeamCity/TFS can call them (via the API) as part of the Continuous Delivery process therefore they become part of the deployment pipeline and application management.
      It talks to the Octopus API to get all the Environments and Node information and it can query Nodes and run AdHoc steps against any Node in a particular Octopus role.
      It would be great to hear more on the WebApi and where/how you see it being used once your Team has fleshed it out.
      Thanks again and please keep up the great posts really enjoying them.

  4. Hi, nice post, just stumbled across it and thought I’d weigh in. I’m not really sure Chef is the right tool for what you were trying to do i.e. application deployment.

    We’re a mixed Linux/Windows shop and we’ve found decent success in using Chef to setup infrastructure while using Octopus for deploying applications – so we use both, but for very different purposes.

    Chef is for anything not released by our engineering team – so installing and configuring software like Octopus Tentacles, IIS, RabbitMQ, .NET Framework versions, Java, the list goes on. We’ve definitely encountered some of the issues you mentioned, but tbh between that and manual configuration or Windows DSC we’re happy to stick with it!

    And then we use Octopus for releasing our product stacks – which it does extremely well and provides us with a great audit trail and visibility into what is placed where.

    P.S. Totally agree about testing with Chef on Windows, a bit of a nightmare. All our infrastructure DevOps work is done on Mac/Linux using Vagrant/Test Kitchen and running Windows VirtualBox images, and that’s worked quite well for us, especially if you start using other tooling like Packer.

    1. “I’m not really sure Chef is the right tool for what you were trying to do i.e. application deployment.”
      Agreed. Unfortunately the guys at OpsCode led us to believe it would be a good fit… on multiple conference calls and multiple trainings.

      “using Chef to setup infrastructure while using Octopus for deploying applications”
      I can see how Chef as a metadata server would work. In our case, we have interns writing an api to store our machine configuration data, and our cloud provider publishes a PowerShell module we use to manage our hardware infrastructure.

      Thanks for the great comment!

  5. Thanks for writing this. I’ve been curious about the Windows experience of tools such as Chef. In migrating to Octopus Deploy for these purposes do you see yourself using the polling tentacle setup more than the listening tentacle one for server configuration changes?

    Also, bit of a plug here, you might want to checkout Bluefin, a free Chrome extension I started over a year ago for Octopus Deploy. http://bluefin.teapotcoder.com/

    1. Hi David–I was introduced to you at KCDC by @DarrenCauthon. We are well aware of Bluefin over here. It is our intent after we complete the migration to Octopus to become contributors to Bluefin, FWIW.

      Regarding Listening vs. Polling Tentacle, it’s not something we’ve been focused on at all. I checked though, and the environment bootstrap code we’ve written in Powershell sets tentacle up in listening mode. Is there something important that I should be aware of here?

    2. The Bluefin chrome extension is extremely helpful. I find that there are a few letdowns with the Octopus UI that I run into every week, and the Bluefin extension solves one big pain for me.

  6. Wondering how you in fact do node configuration management in an Octopus-only environment, since as you mentioned, Octopus’s specialization is applications and not the OS (node). Probably the best-case scenario is Chef + Octopus, but you seem to be gravitating to an Octopus-only environment per the article, so how do you do node management with an Octopus-only setup?

    1. This is a great question. It deserves a full length post of its own. The short answer is that Chef relies on Powershell functions that are freely available. What it adds is node metadata. At $72 per node per year though, we can (and did) build our own node metadata store. Powershell + node metadata = an inexpensive alternative to Chef.

      1. Chef Delivery is the only version which requires pay; Chef configuration management is free. Why did you put a price per node? Not advocating Chef (I wish Microsoft built proper tooling around DSC), but node configuration management needs to be addressed in this article.

        1. We use hosted chef server which has a cost. We do not have internal resources with the expertise to be responsible for an internally hosted Chef server. The value that Chef offers at this point isn’t anything we can’t achieve with a simple document database topped with an api. I had two interns knock that out in 6 weeks. No intermediate layers of Ruby DSL required.

  7. Chris
    Do you have any info on the metadata you mention and how it is used in your workflow? I’m not after details or particulars, just would really like to know more about your thoughts and the process you came up with when building it out.

    Gregory
    Stephen Owen, a PowerShell MVP, is building one. It is new and in the early, early stages, but he has the chops to build it properly. He spoke at MS Ignite on PowerShell etc. https://foxdeploy.com/
    The other one is mentioned on this blog in one of the posts, sorry I cannot remember which one, but I looked it up before and it sounds ok.
    And I am building a self-hosted web-based one similar to what Stephen is building, but with a different idea as to the UI.
