Monthly Archives: July 2016
Asynchrony in Powershell

As part of our Octopus Deploy migration effort we are writing a powershell module that we use to automatically bootstrap the Tentacle installation into Octopus. This involves maintaining metadata about machines and environments outside of Octopus. The reason we need this capability is to adhere to the “cattle vs. pets” approach to hardware. We want to be able to destroy and recreate our machines at will and have them show up again in Octopus ready to receive deployments.

Our initial implementation cycled through one machine at a time, installing Tentacle, registering it with Octopus (with the same security certificate so that Octopus recognizes it as the same machine), then moving on to the next machine. This is fine for small environments with few machines, but not awesome for larger environments with many machines. If it takes 2m to install Tentacle and I have 30 machines, I’m waiting an hour to be able to use the environment. With this problem in mind I decided to figure out how we could parallelize the boostrapping of machines in our Powershell module.

Start-Job

Start-Job is one of a family of Powershell functions created to support asynchrony. Other related functions are Get-Job, Wait-Job, Receive-Job, and Remove-Job. In it’s most basic form, Start-Job accepts a script block as a parameter and executes it on a background thread.

# executes "dir" on a background thread.
$job = Start-Job -ScriptBlock { dir } 

The job object returned by Start-Job gives you useful information such as the job id, name, and current state. You can run Get-Job to get a list of running jobs, Wait-Job to wait on one or more jobs to complete, Receive-Job to get the output of each job, and Remove-Job to delist jobs in the current Powershell session.

Complexity

If that’s all there was to it, I wouldn’t be writing this blog post. I’d just tweet the link to the Start-Jobs msdn page and call it done. My scenario is that I need to bootstrap machines using code defined in my Powershell module, but run those commands in a background process. I also need to collate and log the output of those processes as well as report on the succes/failure of each job.

When you call Start-Job in Powershell it creates a new session in which currently loaded modules are not automatically loaded. If you have your powershell module in the $PsModulePath you’re probably okay. However, there is a difference between the version of the module I’m currently working on and testing vs. the one I have on my machine for normal use.

Start-Job has an additional parameter for a script block used to initialize the new Powershell session prior to executing your background process. The difficulty is that while you can pass arguments to the background process script block, you cannot pass arguments to the initialization script. Here’s how you make it all work.

Setup Code

<br /># Store the working module path in an environment variable so that the new powershell session can locate the correct version of the module.
# The environment variable will not persist beyond the current powershell session so we don't have to worry about poluting our machine state.
$env:OctobootModulePath = (get-module Octoboot).Path
$init =  {
    # When initializing the new session, use the -Force parameter in case a different version of the module is already loaded by a profile.
    import-module $env:OctobootModulePath -Force
}

# create a parameterized script block
$scriptBlock = {
    Param(
          $computerName,
          $environment,
          $roles,
          $userName,
          $password,
          $apiKey,
    )

    Install-Tentacle -computer $computerName `
        -environment $environment `
        -roles $roles `
        -userName $userName -password $password `
        -apiKey $apiKey
}

# I like to use an -Async switch on the controlling function. Debugging issues is easier in a synchronous context than in an async context. Making the async functionality optional is a win.
if ($async) {
    $job = Start-Job `
        -ScriptBlock $scriptBlock `
        -InitializationScript $init `
        -Name "Install Tentacle on $($computerName)" `
        -ArgumentList @(
             $computerName, $environment, $roles
             $userName, $password,
             $apiKey) -Debug:$debug
} else {
    Install-Tentacle -computer $computerName `
          -environment $environment `
          -roles $roles `
          -userName $userName -password $password `
          -apiKey $apiKey      
}

The above code is in a loop in the controlling powershell function. After I’ve kicked off all of the jobs I’m going to execute, I just need to wait on them to finish and collect their results.

Finalization Code

<br />if ($async) {
    $jobs = get-job
    $jobs | Wait-Job | Receive-Job
    $jobs | foreach {
        $job = $_
        write-host "$($job.Id) - $($job.Name) - $($job.State)"
    }
    $jobs | remove-job
}

Since each individual job is now running in parallel, bootstrapping large environments doesn’t take much longer than bootstrapping smaller ones. The end result is that hour is now reduced to a few minutes.

Why We’re Migrating From Chef to Octopus

Why We’re Migrating From Chef to Octopus

A couple of months ago I started leading a team that was using Chef to manage our deployment infrastructure. We are now migrating from Chef to Octopus Deploy. I thought it might be interesting to explain why. What follows is my opinion and my best understanding of the facts. As always, do your own thinking. πŸ™‚

Chef

Chef takes the DevOps mantra “infrastructure as code” to heart. Everything created for consumption by Chef is code. This is a strength in many ways, but sometimes a weakness.

Architecture

Chef consists of a server which contains metadata about your infrastructure, “cookbooks” which contain instructions for configuring machines and applications, and a client application which does the work of truing up a machine to its representation in the Chef Server.

In Chef, the target of work is a node. When chef_client runs it’s goal is to make the node it’s running on look like Chef Server expects it to. It has little awareness of applications except insofar as you layer that into your cookbook design. chef_client pulls the data it needs from the chef server in order to do its work. This means that in order to update a machine you have to have access to that machine as well as rights to run chef_client. This also means that it takes a bit more effort to be able to redeploy a single application to the node in the event something goes wrong. If you are running single-purpose nodes that’s probably not an issue for you.

It bears mentioning that Chef was designed for Linux first. Windows is an afterthought and it shows. There was no built-in support for common Windows constructs such as IIS and Windows Services at the time of our implementation. (My understanding from a coworker is that Chef now has built-in support for windows services.)

Chef has an api, but it’s nearly impossible to access using standard tools due to a convoluted authentication model. Chef as a CLI called knife which wraps up most api behavior. It should be stated that you cannot initiate activities through the api–that is all done by running chef_client on the target machines. knife can query the api for the machines you want and initiate chef_client runs for you however.

Developer Experience

The developer experience with Chef is all about editing cookbooks. A “cookbook” is a collection of “recipes” that can be executed by chef_client on the target node. Cookbook development involves using a ruby dsl that transpiles to raw ruby on the target node during a process called “convergence.” You edit the cookbook, push the modified cookbook to the chef server, then run chef_client on the node. On the surface this seems simple. It’s not.

If you are staying “in the rails” (rimshots for the ruby joke πŸ™‚ ) with chef and using their built in functions (called “resources”) to do your work, you will mostly be okay. If you are interested in testing your work you will start to feel pain. Chef provides the ability to do “local” testing–by which they mean spinning up VM’s using Vagrant over VirtualBox. “Local” in this context doesn’t mean you get to see the transpiled ruby before it executes, or step into a debugger. It just means that the VM you’re testing on happens to be running on your developer machine.

There are tools or automated testing with Chef cookbooks. We haven’t really learned to use them very well so I can’t personally comment on their quality. My coworker had this to say:

There is a commonly used unit testing framework for Chef, called ChefSpec (which is built upon RSpec). I have issues with ChefSpec, so most of my Chef unit tests are done in pure RSpec. However, ChefSpec is not officially supported by Chef.

There is chef-zero, which allows you to spin up an in memory mock Chef Server, which will allow you to start/pause/stop a mock node convergence, and will allow you to examine attributes at different steps of resolution. Test-Kitchen is the integration testing framework, which spins up actual VMs and uses Vagrant as a driver (it may use Docker now? Not sure), then fully converges that VM as a node. So the Chef ecosystem has the ability to do lots of testing at various levels. I feel it is inaccurate to state that testing is poor in Chef. What is accurate to state is that Chef is Linux first, Windows afterthought. So all the testing goodness in the Chef ecosystem was either unavailable to us as a Windows shop, or it was extremely difficult to use. … we knew that lack of testing was a weakness in our Chef code. We really tried to make it happen, but the hurdles were insurmountable with the resources available.

You can customize chef behavior by writing “resources” and “providers.” This process is not straightforward or easy to understand and frankly I still find it confusing. When writing new code for chef you can use straight ruby, but consuming standard ruby gems is problematic because of the convergence process. It’s not clear when the ruby code will run. Sometimes it runs during convergence. Sometimes it runs at when a recipe is called. Keeping it all straight and making it work as expected requires some mental and technical gymnastics. We had numerous errors caused by ruby code not executing when we thought it would.

Guidance

There is very little guidance about the correct way to use Chef. In fact, Opscode seems to pride itself on this fact. They exclaim on twitter and at conferences “You can use Chef anyway you want!” Great–but how should I use Chef? This is not a trivial question as there are at least 3 kinds of cookbooks in Chef development. There are cookbooks that define data, define utilities or “resources”, and deploy applications. There are 6 levels of overrides for variables. Which data should you put in application cookbooks vs. environment cookbooks? When should you use overrides and how?

Community

Chef has elected to rely on the community to provide cookbooks for most things. Some vendors have created officially supported cookbooks for their products, but most have not. Who can blame them? There’s no clear winner in the DevOps space yet, so why spend the time and money to support a tool that may fizz out? This means that most community cookbooks are of dubious quality. They are generally designed to solve the particular problem the author was facing when they were trying to install something. They may or may not set things up in a recommended fashion. They make assumptions about the OS, version, security, and logging strategies that are by no means universal. This left us in the position of having to read and fully understand the cookbook before deciding whether or not to use it. More often, we just ended up writing our own customized version of the same cookbook.

Conclusion

It’s clear to us that Chef is not a good fit for our organization. We deploy 50+ applications to 40+ environments. Because chef provisions a node and not an application if there’s a problem in any one application the entire deployment is considered a failure. We were never able to get above 80% successful deployments across our environments. Many of the reasons for this failure were non-technical and have nothing to do with the tool, but the complexity of the tool did not help us. Finding expert-level help with Chef on Windows was equally problematic. Even at the conferences we attended, the Windows Chef implementors were like the un-cool kids sitting at a table by themselves in the corner not quite excited by the “ra ra ra” of the pep rally.

If I were going to use Chef I would use it in an ecosystem that had fewer applications and more servers. Let’s say I had a web service that I needed to horizontally scale across 1000 Linux nodes. Chef would be a great fit for this scenario.

My co-worker says:

Chef would be a good choice if
a. You use Linux
b. You have a simple deployment topology, which can be supported by Systems Engineers who can do a little scripting (i.e., use only the built in resources).
c. Or you have a large DevOps engineering team that has enough throughput to dissect the bowels of Chef.

Chef is designed for a setup where there are thousands of nodes, each node is responsible for doing 1 thing, and it is cheap to kill and provision nodes. Because we use Windows, we have hundreds of nodes [and] each node is responsible for many things. It is expensive to kill and provision nodes. Windows is moving towards the cattle not pets paradigm, but it’s not there yet. Chef is designed to work with setups that are already at the cattle stage.

Octopus

Octopus kind of ignores the “infrastructure as code” idea and tries to give you a button-click experience for deployments.

Architecture

Octopus Deploy is composed of a web server and a windows service. The web server stores metadata about your hardware and software ecosystem. The windows service (called Tentacle) is installed on each machine that Octopus will manage and performs the work locally. Octopus was developed api-first which means that all of the functionality you invoke through the Web UI is available through the api. The api is secured using standard models which means it is easy to consume using your favorite tools.

Instructions are packaged up by the server and pushed to the Tentacle agents where they are executed. As long as the Tentacle service account has the requisite permissions, anyone with permission to deploy through Octopus Server can deploy to the target machines even if they don’t otherwise have access to those machines.

Whereas Chef is node-centric, Octopus is application-centric. This is a strength in that it’s easy to deploy or redeploy a single application to a node that may have many applications on it. This is a weakness in that it’s harder to deploy all of the applications required for a particular machine (more button clicks). This weakness will be mitigated by the elastic environment features of the soon-to-be-released 3.4 version.

Octopus separates artifact management from release process management. You can modify your release process without ever editing your deployment artifacts, and vice versa. Deployment artifacts are a first class citizen in Octopus. You will have to layer in your own artifact management into Chef.

Developer Experience

Octopus is written in C# and runs on Windows. The advantage over Chef here is that we get native Active Directory integration for security purposes. In the Octopus world developers publish their applications as NuGet packages which are already understood by most .NET developers. The nuget package defintion lives with the application source code so no external mapping of deployment code to the application is required. Further, it’s obvious which bits go in the NuGet package and which bits will live on the Octopus Server.

Octopus uses one of several variable substitution techniques to manage the config files. Since Octopus was written with Windows deployments in mind, it contains built in processes for deploying IIS applications and Windows Services. If further customization is needed, developers can write custom scripts in Powershell as either a Step Template or a script included in the nuget package.

Unlike Chef, you can test the scripts you write locally with the caveat that you will have to provide any variables they require to execute. Since it’s just Powershell you can use any of the existing tools and debuggers written for Powershell. The lack of a transpilation step means that the code you wrote is the code that will run.

Guidance

Like Chef, Octopus has not “won” the DevOps space. My impression from the community is that the percentage of shops working in the DevOps space at all is fairly low so this isn’t a mark against either tool. However, low adoption means that there are fewer people sharing their lessons learned with the world. If you go looking for Octopus best practices, you won’t find much more than my blog series. Most “best practice” articles and videos about DevOps consist of little more than advice to automate all the things.

On the other hand, the way Octopus itself is structured steers you toward some best practices. Some of these ideas could be layered onto a Chef implementation just as easily, but Octopus is opinionated in such a way that it steers you in the right direction. It’s nice to have Octopus tell us “this is the blessed path to accomplish your goal.”

Octopus distinguishes variables from processes. A Release Process is a set of logic that results in the successful deployment of an application to a machine in an environment. It is informed by variables, which can be scoped to environments and roles (and channels as of version 3.3). Variables can be managed at a project level or in shared libraries called Variable Sets.

Custom logical steps can be defined in powershell, stored and versioned in Octopus, and used as part of the Release Process. These are called Step Templates. Step Templates are usually a lighter weight maintenance effort than a cookbook since they’re just powershell and often targeted at very small units of functionality that can be chained together without much overhead.

In Octopus, a Release is created which binds a deployment artifact to a set of variables and instructions. This release is then promoted through a series of environments defined in a lifecycle. There is no binding of artifacts to cookbooks in Chef unless you write that yourself.

Community

My impression is that the Octopus community is smaller than the Chef community. There is only one sanctioned mechanism for the community to share code–a github repo which Octopus itself reviews before accepting pull requests. The fact that Octopus reviews contributions gives me the impression that the quality of the community code is better than what we found in Chef. Having gone through the review process with them for a new step template we created we know that they are actually looking at the code. On the other hand, since Octopus targets the Windows ecosystem and has built-in support for most things Windows the need for community help isn’t as great.

Conclusion

When we were first learning Octopus, our first deploy from time to install was about 30 minutes. With Chef there were days of training and hours spent tweaking cookbooks until we got our first deployment. Octopus is simply easier for us to understand. We have a small team of 3 dedicated developers to manage 40+ environments that must be deployed with 50+ applications on demand. In addition this team manages Artifactory, TeamCity, and glue-code that ties these systems together. Our goal is to bring our successful deployment rate up to 95+% and not need babysit deployments to production.

For our needs Octopus is the clear winner. We are a .NET shop deploying C# applications to the Windows ecosystem. Octopus is a C# application that runs on Windows. Octopus’ understanding of our problem space is far superior to Chef. The tools they provide are high quality, easy to use, and work with very little configuration. There are gaps to fill, but the fact that Octopus is written API-first gives me confidence that we can easily fill them.