Showing all posts by crmckenzie
Powershell Gems: Array Comparisons

There is a shorthand syntax that can be applied to arrays to apply filtering. Consider the following syntactically correct Powershell:

1,2,3,4,5 | ?{ $_ -gt 2 } # => 3,4,5

You can write the same thing in a much simpler fashion as follows:

1,2,3,4,5 -gt 2 => 3,4,5

In the second example, Powershell is applying the expression -gt 2 to the elements of array and returning the matching items.

Null Coalesce

Unfortnately, Powershell lacks a true null coalesce operator. Fortunately, we can simulate that behavior using array comparisons.

($null, $null, 5,6, $null, 7).Length # => 6
($null, $null, 5,6, $null, 7 -ne $null).Length # => 3
($null, $null, 5,6, $null, 7 -ne $null)[0] # => 5

Powershell Gems: Destructuring

Destructuring

What is destructuring?

Destructuring is a convenient way of extracting multiple values from data stored in (possibly nested) objects and Arrays. It can be used in locations that receive data (such as the left-hand side of an assignment).

source

Here is an example of destructuring in powershell.

$first, $second, $therest = 1,2,3,4,5
$first
1
$second
2
$therest
3
4
5

As you can see, Powershell assigns the first and second values in the array to the variables $first and $second. The remaining items are then assigned to the last variable in the assignment list.

Gotchas

If we look at the following Powershell code nothing seems out of the ordinary.

$arr = @(1)
$arr.GetType().FullName
System.Object[]

However, look at this code sample:

# When Function Returns No Elements
Function Get-Array() { 
    return @() 
} 
$arr = Get-Array
$arr.GetType()
You cannot call a method on a null-valued expression.
At line:1 char:1
+ $arr.GetType()
+ ~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

$arr -eq $null
True

# When Function Returns One Element
Function Get-Array() { 
    return @(1)  
}
$arr = Get-Array
$arr.GetType().FullName
System.Int32

# When Function Returns Multiple Elements
Function Get-Array() { 
    return @(1,2)
} 
$arr = Get-Array
$arr.GetType().FullName
System.Object[]

When returning arrays from functions, if the array contains only a single element, the default Powershell behavior is to destructure it. This can sometimes lead to confusing results.

You can override this behavior by prepending the resultant array with a ‘,’ which tells Powershell that the return type should not be destructured:

# When Function Returns No Elements
Function Get-Array() {
    return ,@() 
} 
$arr = Get-Array
$arr.GetType().FullName
System.Object[]

# When Function Returns One Element
Function Get-Array() {
    return ,@(1) 
} 
$arr = Get-Array
$arr.GetType().FullName
System.Object[]

# When Function Returns Multiple Elements
Function Get-Array() {
    return ,@(1,2)
}
$arr = Get-Array
$arr.GetType().FullName
System.Object[]
Proposed Quality Metrics for Engineering Managers

As a manager I’d like to have some data-driven metrics by which to decide where to focus my questions and priorities for making improvements on my engineering team.Β  Not all of these metrics will be measured using the same scale–some metrics are taken as pure work item counts, estimations in term of story points, and hours. As these metrics are an attention-guide for managers, it’s important that a manager is choosing the most important set to focus on at any one time. Trying to address every issue at once is doomed to failure.

The list is not exhaustive and some traditional metrics like code coverage are specifically excluded. These are simply the items that I as an engineering am currently interested in. Once I can measure them through our work tracking tool, I will choose 3-4 to focus on in the short- to medium-term.

The structure of each metric is:
Title: The name of the metric and a short description of what it means
Question: What is the question the metric is trying to answer?
Remediation: One or more common paths to improve the metric. It should be understood that this list is not exhaustive and that any given metric out of balance may require new ideas about how to remediate.
Note: Any special notes required to properly understand the metric.

Quality

Defect Rate

Description: Rate at which new defects are created.
Question: How often do our customers find defects with our products?
Remediation: Increase emphasis on automated tests and clean code.

Regression Density

Description: % of created defects that are regressive.
Question: How often are we breaking things that used to work?
Remediation: Increase emphasis on automated tests and clean code.

Defect Density

Description: % of work in the backlog categorized as defects.
Question: How much of our work is dedicated to bugs vs. features?
Remediation: Increase emphasis on automated tests and clean code.
Also, impose a bug cap – e.g., 5x engineer count. If the team reaches this cap they must stop any new work and instead address defects.
Note: This is normally thought of as defects per line of code (LOC), but if LOC can’t be used as a quality metric then defect density can’t really be measured using LOC either. I’m taking the liberty of redefining this term here.

Support Effort

Description: % of total capacity dedicated to support
Question: How much of our capacity is consumed by support efforts?
Remediation: Look for trends in support tickets and address them with engineering.
Note: It might be useful to subdivide this to track the SOX, GDPR, and other separate support efforts.

Predictability

Unplanned Work Density

Description: % of work that is unplanned; includes support tickets.
Question: How well do our plans match reality?
Remediation: identity and address bottlenecks
Note: Unplanned work is work that the team has to do unexpectedly. If the team takes in extra work due to found capacity, this is “bonus” work.

Estimation Delta

Description: The difference between the estimation and the actual time it took to complete work.
Question: How good are we at estimating?
Remediation: Estimation problems are usually the result of large batch sizes or low-quality code bases. Reduce batch size and implement quality-centric practices such as TDD. Sometimes estimation issues are caused by unclear requirements–though that lack of clarity should automatically result in a higher estimate.

Value Density

Description: % of work that is planned user stories.
Question: What percentage of our effort is value-add versus overhead? (e.g., planned stories vs. bugs, support, and distracting work)
Remediation: Reduce manual overhead through automation.

Turnaround

Lead Time

Description: Time between when a work is requested and when it is delivered.
Question: How long do customers have to wait for the features they ask for?
Remediation: improve cycle time and predictability

Cycle Time

Description: Time between when a work is started and when it is delivered.
Question: How quickly can the team deliver from the time they start work?
Remediation: Clear bottlenecks in the testing and deployment pipeline. Aggressively target unplanned work.

Throughput

Work Items Completed

Description: The number of stories and bugs closed.
Question: What is our throughput in absolute numbers?
Remediation: identity and address bottlenecks.

Estimated Work Completed

Description: The point value of the closed work items.
Question: What is our throughput from an estimation perspective?
Remediation: identity and address bottlenecks.

A New Chapter at Microsoft

I am leaving Redacted Financial Services* in November to manage an IT team at Microsoft. I am changing the focus of my career from the day-to-day tech toward management–strategy over tactics. I’ll be bringing what I know about software engineering into the IT space as well as learning an entirely new set of disciplines.

I’ve worked at Redacted for 6 years. In that time I’ve enjoyed working with a motivated, dedicated group of Software Craftsmen. Other than getting my start writing software, it’s the best time of my professional life. I grew professionally in that time in no small part due to a manager who made room for me to explore my interests and found ways to capitalize on them for the benefit of the company. It is my goal to match his example.

One of the things I accomplished there was founding an internship program which became a feeder program into our development organization for up-and-coming developers. It had the unintended side-effect of creating a mechanism people within the company who had shown an interest in writing software could use to explore a career-change. I’ve worked with close to 30 interns. Some have stayed and worked with us. Others have gone on to companies like Visa, Google, Nordstrom, and Tableau. I’m proud to have played a part in their career development.

I took control of the hiring process for interns which expanded to include running the hiring process for our entire development organization. I learned that the largest impact I could have on my organization is through who I choose to hire. My wife works as an agency recruiter for accounting and finance professionals and with her help I learned how to work with agency recruiters to find the candidates I needed quickly. Hiring is hard and people are seldom properly trained how to do it. The end-result was that we spent less time sorting through resumes and interviewing dud-candidates. Instead, nearly every candidate we talked to was brought on-site. For the most part we were able to hire quickly with only a few cycles through the process.

A couple of years ago our DevOps initiative was going sideways. Known to be a passionate advocate for Software Craftsmanship, I was asked to ride-along with the DevOps group and make recommendations that would get us back on track. I ended up leading that group for the last year and a half. The improvements we made include tracking work in one place, identifying and eradicating root causes of common problems, clearly identifying our customers, identifying standard practices for common work, establishing a customer-centric mindset for the team, and practicing what we preach with respect to quality software. It’s a DevOps team, but we write tests for our scripts and services. In that time the stability and reliability of our production deployments increased dramatically.

In addition to being a technical leader on my team, I began managing other people. I always thought of this responsibility in servant-leadership terms. My role was to collaborate with the employee to make sure s/he is feeling challenged and growing. I learned to be free with my praise and politely direct with my critical feedback. I learned never to give critical feedback without also giving concrete examples of different behavior. I was able to coach my reports through some challenging scenarios and save them the effort of learning everything the hard way.

While I’m the one who did the work to learn these things, I was enabled by a phenomenal manager who gave me room to grow and challenge myself. He listened to my interests and made room for me to explore them–ever confident that it would pay off for the team. It did.

I was also challenged by a group of quality-focused engineers who accepted my ideas when they thought they were good, and who had the courage to speak up when they thought I was off the deep end. Some of my favorite people are my worst critics–and good friends.

Finally, I was aided by a wonderful wife with the highest emotional intelligence of any person I’ve ever encountered. I learned from her how critically important successful communication is and endeavored to apply that learning to my career. I’ve learned that I need to adapt my communication style to my audience–although putting that into practice is still a challenge!

I feel a swell of pride for having these people in my life and at the work we’ve accomplished together. To all of these people I feel a great debt of gratitude.

Thank you All.

 

* One of the interesting “perks” of working for a finance company is that some of them don’t want you to name your employer on social media. The rationale is that if you were to broadcast a stock purchase or otherwise comment on the markets it may be construed by someone else as Financial Advice which would in turn make the company potentially liable for the quality of that advice.
Stay Focused on the Goal, Not the Metrics

The goal is the thing you are trying to do.
The metric is how you are measuring your progress toward the thing you are trying to do. Metrics are only as good as their ability to measure progress toward the goal.

Don’t confuse them.

An Example

Imagine a sales team for an organization selling widgets has a goal to increase sales of a particular product line by 10%. The Sales Manager decides that the best way to achieve the goal is for the sales staff to make a certain number of cold-calls over the following months. After a few weeks, one of her sales staff is falling way behind in cold-calls. The wrinkle is that this salesman is the top biller in the department. Should the sales manager berate her top performer for not doing enough cold-calls?

Absolutely not. No manager should ever punish their reports for doing well (provided the means used are legal and ethical of course).

Punishing the salesman would send the message that billing isn’t the goal, but cold-calls are. Do you want a salesman who makes lots of cold-calls but can’t bill? Since sales staff are compensated via commission, punishing the salesman would introduce a division between his performance and his pay. In the best case, the salesman simply ignores the manager and continues to bill and get paid–benefiting the company in the process. In the worst case, the salesman leaves the company for greener pastures, depriving the company of it’s top biller.

The mistake here is that cold-calls are simply a form of measuring progress toward the goal–increased sales. Cold calls themselves are not the actual goal–they are a proxy for the goal. Further, they may not be the only possible proxy. Their value as a proxy is proportional to the relationship between cold-calls and increased sales. If a salesman is generating increased sales without cold-calls then there is either another possible metric or cold-calls are a poor metric.

It’s one thing to say that cold-calls are a proven way to generate increased sales. It’s quite another to ignore that there are other possible ways to do the same thing. It’s flat wrong to take the position that cold-calls are the only way to increase sales.

What is the appropriate response? Find out how the top biller is selling so well without cold-calls. Is the top biller doing something that no one else is doing? Is there something for the other sales staff to learn? Are there new, better metrics that can be introduced? Of course, it’s also possible that the top biller could bill even more if he did more cold-calls. Finding out will require collaboration between the salesman and the manager–but this is a process of active investigation instead of passive authoritarianism.

If the manager focuses on the metric instead of the goal, she is taking on the responsibility of having all the answers and dictating them to others. The proper approach is to adopt a learning stance toward the team’s work. If the team is doing well but the metrics aren’t being met, what can the manager learn from this? If the team meets the metrics, will they do better? If not–what good are they?

Choosing Metrics

When choosing metrics it’s important to consider that people will game the system. If you’re a software engineering manager and you make Lines of Code or Test Coverage the metric, people will write verbose code and create meaningless tests. A good metric will encourage people to game the system by focusing on the thing you want to achieve. I recently heard of an example in which the Product Owner threw out story sizing as part of their Scrum process. The only thing developers got credit for was the number of stories they completed. It didn’t matter how large or small–credit was only given when the story was completed an in production. The developers began gaming the system by reducing the story size to the smallest thing they could deliver.

Perfect.

NBuilder 5.0.0 Released: Now the .NET Standard 1.6 Support

NBuilder 5.0.0 is now available on NuGet.org.

Breaking Changes

We have dropped support for .NET 3.5. It is becoming cumbersome to support such an old framework in the build chain. We now support .NET 4.0 and above.

Exciting New Features

NBuilder is now available to .NET Core 1.1 applications via .NET Standard 1.6. This was an enormous amount of work made possible in a large part by the efforts of a contributor PureKrome. Thanks PureKrome!

NBuilder 4.0.0 Released

It’s been 5 years since a version of NBuilder was released. As happens to many of us, the original author got busy with life and was unable to spend the time brining it up to date. I volunteered to shepherd the project along, but I was also waylaid by life. However, I was recently able to spend some focused time fixing the final bugs and getting together an automated build with AppVeyor.

Given that it’s been 5 years, it’s impossible to know fully what has been changed. I did put together release notes for the things I know were changed.

My next task will be to port NBuilder to .NET Core

Release Notes for NBuilder 4.0.0

Breaking changes

1. Obsolete methods have been removed.

Any method previously marked with the Obsolete attribute has now been removed.

2. Silverlight No Longer Supported

As Silverlight is effective a dead technology, we have officially ended support for it. This will allow us to better focus on
a forthcoming release with .NET Core support.

New Features

1. Builder has a non-static implementation.

This will allow you to create customized BuilderSettings for different testing scenarios.

Old Code

<br />var results = Builder<MyObject>.CreateListOfSize(10).Build();

New Code

<br />var settings = new BuilderSettings()
var results = new Builder(settings).CreateListOfSize<MyObject>(10).Build();

2. With and Do action now supports a signature that receives an index.

Example

<br />var builderSettings = new BuilderSettings();
var list = new Builder(builderSettings)
        .CreateListOfSize<MyClassWithConstructor>(10)
        .All()
        .Do((row, index) => row.Int = index*2)
        .WithConstructor(() => new MyClassWithConstructor(1, 2f))
        .Build();

Bug Fixes

  • The decimal separator was wrong for some cultures.
  • Random number generation of decimals was sometimes incorrect.
  • Sequences were not created in the correct order.
  • Random strings were not always generated between the expected lengths.
10 Things I Wish I Had Known Before I Switched to DevOps

1. DevOps is hard

It might not seem like it, but DevOps is hard. A few years ago I thought to myself that it can’t be that difficult since installing an individual application isn’t that difficult. I was wrong in part because…

2. Security is hard

Production is scary. I’d rather not have access when possible. On the other hand the tools that we use will definitely need access to production since it’s kind of the reason they exist. This means we have to have very tight control over who has access to the credentials that the tools run under. We work to limit our own day-to-day accounts so that their access is limited as well.

As a developer I didn’t think much about Security. I pretty much just stuffed an AD Group in a config file somewhere when I was told to and I was done. As a DevOps engineer I have had (and will continue) to learn a lot more about security and its organization even though I don’t manage security for my organization. Security impacts deployments at every level so you will have to learn about security infrastructure in order to make safe and practical recommendations to your security administration group.

3. You are not Netflix (unless you are)

Our organization got excited about DevOps tools after seeing some compelling presentations by Netflix at QCon San Francisco. Netflix has the need for highly scaled web servers which fully embrace the “cattle vs. pets” philosophy because they have millions of concurrent users of a publicly facing service.

We are not Netflix. We have 50+ internal applications with usage rates measured in the 10’s. They’re important to us–they run our business–but our problems are not the same ones Netflix faces. The tools that Netflix uses are designed to solve problems Netflix has. That doesn’t necessarily make them a good fit for our needs. We lost a lot of time and effort trying to make Netflix solutions fit our problems.

4. Windows vs. Linux matters when choosing your tools.

There are basically 5 possibilities when it comes to your server topology:

  1. Windows Only
  2. Linux Only
  3. Windows Dominant
  4. Linux Dominant
  5. Hetergeneous

If you are managing a homogeneous ecosystem then it’s imperative that you use tools that natively support that system. Don’t try to use Linux tools to manage Windows and vice versa. If you do, you’re gonna have a bad time. If you are primarily deploying to Windows you should look at tools like Octopus Deploy or Build Master. If you’re managing a Linux ecosystem look into Chef, Puppet, or even Docker.

If you’re managing a mixed ecosystem where one OS was dominant, you should still use tools designed to support the dominant system. It may be worth the effort to see if your existing tools can also manage the subdominant system. In our case it’s not worth the effort so we have instead moved toward an “appliance” model for our Linux servers. What this means is instead of managing a bunch of code to deploy RabbitMQ to Linux, we’re instead creating VM Images for the Rabbit installation which we can hydrate at will. We have far fewer resources who know how to administer Linux so this model works better for us.

5. DevOps tools are in their infancy

DevOps tools are optimized for the problems their creators were facing. There are many more problems in the DevOps space than any of the dominant tools are capable of managing on their own.

For example, Chef wants to deploy a machine. It’s not primarily concerned about applications. The Chef model is to declare the state of the machine and then let Chef decide how to bring the machine to that state. This approach optimizes for horizontally scaling hundreds or thousands of identical nodes with very few commands. Awesome!

In our organization we see the world in terms of Applications–not machines. Our whole way of thinking about deployment is different than the way Chef looks at it. This isn’t a deficiency in Chef or in the way we look at the world, but when we started using Chef we weren’t aware of how fundamental that difference in perspective would actually be.

Because Chef looks at the world in terms of nodes, it has no built-in (or even recommended) solution for artifact and version management. We had to build that. We had to build solutions for managing cookbook versions, publishing artifact and cookbook versions into targeted environments, and forwarding changes to production to antecedent environments.

If you’re using Octopus (we’re migrating from Chef to Octopus) and looking at the world in terms of applications, you will have problems when you need to spin up new environments and whole machines with many applications pre-installed. Either way, you will have to build other tools to glue the off-the-shelf tools together.

(Aside: Though I am not personally a fan of Chef, I have heard of people using Chef to deploy their infrastructure and using Octopus to deploy applications.)

6. DevOps “best practices” are in their infancy

Chef likes to advertise “use Chef however you want! We’re flexible!” Great…. except Chef is complicated and I would like some guidance on how to use it! This isn’t so much a problem with Chef though–DevOps in general is a very young field so we don’t have the wealth of shared experience from which to draw generalized lessons. To the extent that there is guidance it’s basically cribbed from Software Engineering best practices and doesn’t always apply well.

Here are some of mine:

  1. Have a canary environment that rebuilds all machines and redeploys all software on a regularly scheduled basis. Use this environment to detect problems in your deployment tool chain early.
  2. Every developer should have an individual environment of their own to test deployments.
  3. Every team should have at least one environment for testing and/or UAT.
  4. Avoid “Standard Failures.” These are errors that occur often and either do not have a known solution or have a manual workaround. Identify the root cause of errors and address them. Incorporate manual workaround solutions into your automated solutions.
  5. Where possible, embed some sort of “health check” into your applications that you can invoke to have the application check it own configuration.
  6. Identify rollback strategies for your applications.

7. Developers will have to learn infrastructure

If you come from a development background you will have to learn about security, networking, hardware, virtual hardware, etc. This is the domain you are working in now. I’m still at the beginning of this process myself but I’m starting to see the size of how much I still have to learn. For example, if you’re deploying to the cloud you’ll have to learn the inner workings of your chosen cloud infrastructure.

8. Ops will have to learn development patterns and practices.

If you come from an Ops background you will have to learn Software Engineering patterns and practices. You are graduating from someone who writes the occasional script to someone who manages code. Writing some code that only has to be run once is easy. Writing code that has to work again and again and again as well as tolerate change is much, much harder. As the number of people, environments, and machines grow software engineering skills will become more and more important.

9. Don’t automate a bad process.

Consider this: Chef doesn’t provide a built-in way to define which artifacts should be deployed to which environments. To that end we built an “application versions” cookbook which contains a list of all applications, their version, and their artifact location. In order to start work a team must:

  1. Take a branch of the application versions cookbook.
  2. Edit the versions/artifact information.
  3. Upload the cookbook to Chef
  4. commit and push the changes back to github
  5. clone the chef-repo
  6. edit the affected environment to use the new version of the application versions cookbook.
  7. commit/push chef-repo
  8. upload the edited environment to Chef.

Does that sound like a good idea to you? It doesn’t to me–but it’s necessary if you’re going to use a Chef Cookbook as a source for environment application versions. Before you go and wrap some automation around this to make it “easier,” let’s challenge the basic assumption: should we maybe just store application versions by environment elsewhere? A json file on a network share would be easier than this.

When you automate a process (even to make it “easier”) you’re pouring a certain amount of lime over it. Be careful.

10. “Infrastructure as Code!” is not always a good idea.

Code != Artifacts != Configuration. The daily work of DevOps breaks down into basically three disciplines: Code, Configuration, and Artifact management. A change to one of these should not necessitate a change to the other. That means that Code, Configuration, and Artifacts should not live together in github.

Use a Package Manager for your artifacts. If you don’t know where to look check out Artifactory. It’s a versatile artifact repository that supports many different kinds of package managers. It’s API even understands version numbers and will let you identify and retrieve the “latest” version of your artifact. Let your CI server publish artifacts to your package manager and make it the canonical source for artifact retrieval.

Configuration should not be managed like code. Configuration data is any data required by applications to run. Examples are things like dns addresses, email addresses for notifications, database connection strings, api endpoints, etc.. Configuration data is just data about environments. Unlike code it does not need to be branched. It should be stored in some central repository and accessed directly by the deployment code.

The code that you use to execute your deployments is most emphatically and in every possible way code. This means it should be tested, stored in source control, subject to your company’s chosen branching strategies, built by a CI server, etc..

The “infrastructure as code” idea is a really great idea, but it applies only to the procedure of deploying hardware and software. It does not fit well with the metadata that describes which hardware and software should be deployed. Don’t use “infrastructure as code” as an execute to push square pegs into round holes.

Asynchrony in Powershell

As part of our Octopus Deploy migration effort we are writing a powershell module that we use to automatically bootstrap the Tentacle installation into Octopus. This involves maintaining metadata about machines and environments outside of Octopus. The reason we need this capability is to adhere to the “cattle vs. pets” approach to hardware. We want to be able to destroy and recreate our machines at will and have them show up again in Octopus ready to receive deployments.

Our initial implementation cycled through one machine at a time, installing Tentacle, registering it with Octopus (with the same security certificate so that Octopus recognizes it as the same machine), then moving on to the next machine. This is fine for small environments with few machines, but not awesome for larger environments with many machines. If it takes 2m to install Tentacle and I have 30 machines, I’m waiting an hour to be able to use the environment. With this problem in mind I decided to figure out how we could parallelize the boostrapping of machines in our Powershell module.

Start-Job

Start-Job is one of a family of Powershell functions created to support asynchrony. Other related functions are Get-Job, Wait-Job, Receive-Job, and Remove-Job. In it’s most basic form, Start-Job accepts a script block as a parameter and executes it on a background thread.

# executes "dir" on a background thread.
$job = Start-Job -ScriptBlock { dir } 

The job object returned by Start-Job gives you useful information such as the job id, name, and current state. You can run Get-Job to get a list of running jobs, Wait-Job to wait on one or more jobs to complete, Receive-Job to get the output of each job, and Remove-Job to delist jobs in the current Powershell session.

Complexity

If that’s all there was to it, I wouldn’t be writing this blog post. I’d just tweet the link to the Start-Jobs msdn page and call it done. My scenario is that I need to bootstrap machines using code defined in my Powershell module, but run those commands in a background process. I also need to collate and log the output of those processes as well as report on the succes/failure of each job.

When you call Start-Job in Powershell it creates a new session in which currently loaded modules are not automatically loaded. If you have your powershell module in the $PsModulePath you’re probably okay. However, there is a difference between the version of the module I’m currently working on and testing vs. the one I have on my machine for normal use.

Start-Job has an additional parameter for a script block used to initialize the new Powershell session prior to executing your background process. The difficulty is that while you can pass arguments to the background process script block, you cannot pass arguments to the initialization script. Here’s how you make it all work.

Setup Code

<br /># Store the working module path in an environment variable so that the new powershell session can locate the correct version of the module.
# The environment variable will not persist beyond the current powershell session so we don't have to worry about poluting our machine state.
$env:OctobootModulePath = (get-module Octoboot).Path
$init =  {
    # When initializing the new session, use the -Force parameter in case a different version of the module is already loaded by a profile.
    import-module $env:OctobootModulePath -Force
}

# create a parameterized script block
$scriptBlock = {
    Param(
          $computerName,
          $environment,
          $roles,
          $userName,
          $password,
          $apiKey,
    )

    Install-Tentacle -computer $computerName `
        -environment $environment `
        -roles $roles `
        -userName $userName -password $password `
        -apiKey $apiKey
}

# I like to use an -Async switch on the controlling function. Debugging issues is easier in a synchronous context than in an async context. Making the async functionality optional is a win.
if ($async) {
    $job = Start-Job `
        -ScriptBlock $scriptBlock `
        -InitializationScript $init `
        -Name "Install Tentacle on $($computerName)" `
        -ArgumentList @(
             $computerName, $environment, $roles
             $userName, $password,
             $apiKey) -Debug:$debug
} else {
    Install-Tentacle -computer $computerName `
          -environment $environment `
          -roles $roles `
          -userName $userName -password $password `
          -apiKey $apiKey      
}

The above code is in a loop in the controlling powershell function. After I’ve kicked off all of the jobs I’m going to execute, I just need to wait on them to finish and collect their results.

Finalization Code

<br />if ($async) {
    $jobs = get-job
    $jobs | Wait-Job | Receive-Job
    $jobs | foreach {
        $job = $_
        write-host "$($job.Id) - $($job.Name) - $($job.State)"
    }
    $jobs | remove-job
}

Since each individual job is now running in parallel, bootstrapping large environments doesn’t take much longer than bootstrapping smaller ones. The end result is that hour is now reduced to a few minutes.

Why We’re Migrating From Chef to Octopus

Why We’re Migrating From Chef to Octopus

A couple of months ago I started leading a team that was using Chef to manage our deployment infrastructure. We are now migrating from Chef to Octopus Deploy. I thought it might be interesting to explain why. What follows is my opinion and my best understanding of the facts. As always, do your own thinking. πŸ™‚

Chef

Chef takes the DevOps mantra “infrastructure as code” to heart. Everything created for consumption by Chef is code. This is a strength in many ways, but sometimes a weakness.

Architecture

Chef consists of a server which contains metadata about your infrastructure, “cookbooks” which contain instructions for configuring machines and applications, and a client application which does the work of truing up a machine to its representation in the Chef Server.

In Chef, the target of work is a node. When chef_client runs it’s goal is to make the node it’s running on look like Chef Server expects it to. It has little awareness of applications except insofar as you layer that into your cookbook design. chef_client pulls the data it needs from the chef server in order to do its work. This means that in order to update a machine you have to have access to that machine as well as rights to run chef_client. This also means that it takes a bit more effort to be able to redeploy a single application to the node in the event something goes wrong. If you are running single-purpose nodes that’s probably not an issue for you.

It bears mentioning that Chef was designed for Linux first. Windows is an afterthought and it shows. There was no built-in support for common Windows constructs such as IIS and Windows Services at the time of our implementation. (My understanding from a coworker is that Chef now has built-in support for windows services.)

Chef has an api, but it’s nearly impossible to access using standard tools due to a convoluted authentication model. Chef as a CLI called knife which wraps up most api behavior. It should be stated that you cannot initiate activities through the api–that is all done by running chef_client on the target machines. knife can query the api for the machines you want and initiate chef_client runs for you however.

Developer Experience

The developer experience with Chef is all about editing cookbooks. A “cookbook” is a collection of “recipes” that can be executed by chef_client on the target node. Cookbook development involves using a ruby dsl that transpiles to raw ruby on the target node during a process called “convergence.” You edit the cookbook, push the modified cookbook to the chef server, then run chef_client on the node. On the surface this seems simple. It’s not.

If you are staying “in the rails” (rimshots for the ruby joke πŸ™‚ ) with chef and using their built in functions (called “resources”) to do your work, you will mostly be okay. If you are interested in testing your work you will start to feel pain. Chef provides the ability to do “local” testing–by which they mean spinning up VM’s using Vagrant over VirtualBox. “Local” in this context doesn’t mean you get to see the transpiled ruby before it executes, or step into a debugger. It just means that the VM you’re testing on happens to be running on your developer machine.

There are tools or automated testing with Chef cookbooks. We haven’t really learned to use them very well so I can’t personally comment on their quality. My coworker had this to say:

There is a commonly used unit testing framework for Chef, called ChefSpec (which is built upon RSpec). I have issues with ChefSpec, so most of my Chef unit tests are done in pure RSpec. However, ChefSpec is not officially supported by Chef.

There is chef-zero, which allows you to spin up an in memory mock Chef Server, which will allow you to start/pause/stop a mock node convergence, and will allow you to examine attributes at different steps of resolution. Test-Kitchen is the integration testing framework, which spins up actual VMs and uses Vagrant as a driver (it may use Docker now? Not sure), then fully converges that VM as a node. So the Chef ecosystem has the ability to do lots of testing at various levels. I feel it is inaccurate to state that testing is poor in Chef. What is accurate to state is that Chef is Linux first, Windows afterthought. So all the testing goodness in the Chef ecosystem was either unavailable to us as a Windows shop, or it was extremely difficult to use. … we knew that lack of testing was a weakness in our Chef code. We really tried to make it happen, but the hurdles were insurmountable with the resources available.

You can customize chef behavior by writing “resources” and “providers.” This process is not straightforward or easy to understand and frankly I still find it confusing. When writing new code for chef you can use straight ruby, but consuming standard ruby gems is problematic because of the convergence process. It’s not clear when the ruby code will run. Sometimes it runs during convergence. Sometimes it runs at when a recipe is called. Keeping it all straight and making it work as expected requires some mental and technical gymnastics. We had numerous errors caused by ruby code not executing when we thought it would.

Guidance

There is very little guidance about the correct way to use Chef. In fact, Opscode seems to pride itself on this fact. They exclaim on twitter and at conferences “You can use Chef anyway you want!” Great–but how should I use Chef? This is not a trivial question as there are at least 3 kinds of cookbooks in Chef development. There are cookbooks that define data, define utilities or “resources”, and deploy applications. There are 6 levels of overrides for variables. Which data should you put in application cookbooks vs. environment cookbooks? When should you use overrides and how?

Community

Chef has elected to rely on the community to provide cookbooks for most things. Some vendors have created officially supported cookbooks for their products, but most have not. Who can blame them? There’s no clear winner in the DevOps space yet, so why spend the time and money to support a tool that may fizz out? This means that most community cookbooks are of dubious quality. They are generally designed to solve the particular problem the author was facing when they were trying to install something. They may or may not set things up in a recommended fashion. They make assumptions about the OS, version, security, and logging strategies that are by no means universal. This left us in the position of having to read and fully understand the cookbook before deciding whether or not to use it. More often, we just ended up writing our own customized version of the same cookbook.

Conclusion

It’s clear to us that Chef is not a good fit for our organization. We deploy 50+ applications to 40+ environments. Because chef provisions a node and not an application if there’s a problem in any one application the entire deployment is considered a failure. We were never able to get above 80% successful deployments across our environments. Many of the reasons for this failure were non-technical and have nothing to do with the tool, but the complexity of the tool did not help us. Finding expert-level help with Chef on Windows was equally problematic. Even at the conferences we attended, the Windows Chef implementors were like the un-cool kids sitting at a table by themselves in the corner not quite excited by the “ra ra ra” of the pep rally.

If I were going to use Chef I would use it in an ecosystem that had fewer applications and more servers. Let’s say I had a web service that I needed to horizontally scale across 1000 Linux nodes. Chef would be a great fit for this scenario.

My co-worker says:

Chef would be a good choice if
a. You use Linux
b. You have a simple deployment topology, which can be supported by Systems Engineers who can do a little scripting (i.e., use only the built in resources).
c. Or you have a large DevOps engineering team that has enough throughput to dissect the bowels of Chef.

Chef is designed for a setup where there are thousands of nodes, each node is responsible for doing 1 thing, and it is cheap to kill and provision nodes. Because we use Windows, we have hundreds of nodes [and] each node is responsible for many things. It is expensive to kill and provision nodes. Windows is moving towards the cattle not pets paradigm, but it’s not there yet. Chef is designed to work with setups that are already at the cattle stage.

Octopus

Octopus kind of ignores the “infrastructure as code” idea and tries to give you a button-click experience for deployments.

Architecture

Octopus Deploy is composed of a web server and a windows service. The web server stores metadata about your hardware and software ecosystem. The windows service (called Tentacle) is installed on each machine that Octopus will manage and performs the work locally. Octopus was developed api-first which means that all of the functionality you invoke through the Web UI is available through the api. The api is secured using standard models which means it is easy to consume using your favorite tools.

Instructions are packaged up by the server and pushed to the Tentacle agents where they are executed. As long as the Tentacle service account has the requisite permissions, anyone with permission to deploy through Octopus Server can deploy to the target machines even if they don’t otherwise have access to those machines.

Whereas Chef is node-centric, Octopus is application-centric. This is a strength in that it’s easy to deploy or redeploy a single application to a node that may have many applications on it. This is a weakness in that it’s harder to deploy all of the applications required for a particular machine (more button clicks). This weakness will be mitigated by the elastic environment features of the soon-to-be-released 3.4 version.

Octopus separates artifact management from release process management. You can modify your release process without ever editing your deployment artifacts, and vice versa. Deployment artifacts are a first class citizen in Octopus. You will have to layer in your own artifact management into Chef.

Developer Experience

Octopus is written in C# and runs on Windows. The advantage over Chef here is that we get native Active Directory integration for security purposes. In the Octopus world developers publish their applications as NuGet packages which are already understood by most .NET developers. The nuget package defintion lives with the application source code so no external mapping of deployment code to the application is required. Further, it’s obvious which bits go in the NuGet package and which bits will live on the Octopus Server.

Octopus uses one of several variable substitution techniques to manage the config files. Since Octopus was written with Windows deployments in mind, it contains built in processes for deploying IIS applications and Windows Services. If further customization is needed, developers can write custom scripts in Powershell as either a Step Template or a script included in the nuget package.

Unlike Chef, you can test the scripts you write locally with the caveat that you will have to provide any variables they require to execute. Since it’s just Powershell you can use any of the existing tools and debuggers written for Powershell. The lack of a transpilation step means that the code you wrote is the code that will run.

Guidance

Like Chef, Octopus has not “won” the DevOps space. My impression from the community is that the percentage of shops working in the DevOps space at all is fairly low so this isn’t a mark against either tool. However, low adoption means that there are fewer people sharing their lessons learned with the world. If you go looking for Octopus best practices, you won’t find much more than my blog series. Most “best practice” articles and videos about DevOps consist of little more than advice to automate all the things.

On the other hand, the way Octopus itself is structured steers you toward some best practices. Some of these ideas could be layered onto a Chef implementation just as easily, but Octopus is opinionated in such a way that it steers you in the right direction. It’s nice to have Octopus tell us “this is the blessed path to accomplish your goal.”

Octopus distinguishes variables from processes. A Release Process is a set of logic that results in the successful deployment of an application to a machine in an environment. It is informed by variables, which can be scoped to environments and roles (and channels as of version 3.3). Variables can be managed at a project level or in shared libraries called Variable Sets.

Custom logical steps can be defined in powershell, stored and versioned in Octopus, and used as part of the Release Process. These are called Step Templates. Step Templates are usually a lighter weight maintenance effort than a cookbook since they’re just powershell and often targeted at very small units of functionality that can be chained together without much overhead.

In Octopus, a Release is created which binds a deployment artifact to a set of variables and instructions. This release is then promoted through a series of environments defined in a lifecycle. There is no binding of artifacts to cookbooks in Chef unless you write that yourself.

Community

My impression is that the Octopus community is smaller than the Chef community. There is only one sanctioned mechanism for the community to share code–a github repo which Octopus itself reviews before accepting pull requests. The fact that Octopus reviews contributions gives me the impression that the quality of the community code is better than what we found in Chef. Having gone through the review process with them for a new step template we created we know that they are actually looking at the code. On the other hand, since Octopus targets the Windows ecosystem and has built-in support for most things Windows the need for community help isn’t as great.

Conclusion

When we were first learning Octopus, our first deploy from time to install was about 30 minutes. With Chef there were days of training and hours spent tweaking cookbooks until we got our first deployment. Octopus is simply easier for us to understand. We have a small team of 3 dedicated developers to manage 40+ environments that must be deployed with 50+ applications on demand. In addition this team manages Artifactory, TeamCity, and glue-code that ties these systems together. Our goal is to bring our successful deployment rate up to 95+% and not need babysit deployments to production.

For our needs Octopus is the clear winner. We are a .NET shop deploying C# applications to the Windows ecosystem. Octopus is a C# application that runs on Windows. Octopus’ understanding of our problem space is far superior to Chef. The tools they provide are high quality, easy to use, and work with very little configuration. There are gaps to fill, but the fact that Octopus is written API-first gives me confidence that we can easily fill them.

Previous Page · Next Page