Showing all posts by crmckenzie
Learning Rust: Look Ma, No Null!

Introduction

Rust is a systems programming language (think C-like) that makes it easier to perform memory-safe operations than languages like C or C++. It accomplishes this by making it harder to do memory-unsafe operations–and catching these sorts of issues at compile-time instead of runtime.

In order to accomplish this, Rust imposes some constraints on the engineer through the borrow checker and immutable-by-default types. I’m not going to write about those things here as they have been covered in depth by others.

My focus for this post (and other posts in this potential series) is on other language features and idioms that may be unfamiliar to managed-language developers.

No Null

Rust does not have the concept of null. Consider the following code, which creates an instance of a struct:

let employee = Employee {
    first_name: "Chris",
    last_name: "McKenzie",
};

In this code sample, an Employee struct is immediately created in memory. There is no way to create a variable of type Employee and use it while it is still unassigned. This might not seem terrible on its face, but what about data we don’t have? What if we add a birth date field to the struct but we don’t know what it is?

struct Employee {
    // string literals live for the whole program, hence the 'static lifetime
    first_name: &'static str,
    last_name: &'static str,
    birth_date: &'static str,
}

let employee = Employee {
    first_name: "Chris",
    last_name: "McKenzie",
    birth_date: // not sure what to do here?
}

Option

For this scenario, Rust has a built-in enum called Option<T>. An option has two variants: Some and None which are automatically imported for you. To represent the above code, you would write this:

#[derive(Debug)]
struct Employee {
    first_name: String,
    last_name: String,
    birth_date: Option<String>
}

pub fn main() {
    let employee = Employee {
        first_name: "Chris".to_string(),
        last_name: "McKenzie".to_string(),
        birth_date: Option::None
    };

    println!("{:?}", employee);
}

or

#[derive(Debug)]
struct Employee {
    first_name: String,
    last_name: String,
    birth_date: Option<String>
}

pub fn main() {
    let employee = Employee {
        first_name: "Chris".to_string(),
        last_name: "McKenzie".to_string(),
        birth_date: Option::Some(<value>)
    };

    println!("{:?}", employee);
}

Since Rust’s prelude brings Option and its variants into scope automatically, the above can be rewritten without the Option:: prefix:

birth_date: None
...
birth_date: Some(<value>)

Am I Cheating?

Wait! Isn’t “None” just null under the covers?

Well–no. To understand the difference, you first need to understand a few things about enums.

The members of an enum are called variants. Variants can carry data, and variants must be matched.

Let’s examine what this means in practice for a moment.

Building an Enum

Consider the requirement that we represent both salaried and non-salaried employee types. The annual salary must be stored for salaried employees. The hourly rate, overtime rate, and normal hours must be stored for the hourly employee.

In managed languages such as C# you might represent that as follows:

public enum EmployeeType {
    Salaried,
    Hourly,
}

public class Employee {
    public string FirstName {get; set;}
    public string LastName {get; set;}
    public DateTime? BirthDate {get; set;}
    public EmployeeType Type {get; set;}

    // should only be stored for Salaried employees
    public double AnnualSalary {get; set;} 

    // should only be stored for Hourly employees
    public double HourlyRate {get; set;} 
    // should only be stored for Hourly employees
    public double OvertimeRate {get; set;} 
    // should only be stored for Hourly employees
    public int NormalHours {get; set;} 
}

This structure is fine but has the drawback that it’s possible to populate the class with data that’s invalid–e.g., setting the HourlyRate for a Salaried employee is a violation of the business rules–but nothing in the structure of the code prevents it. It’s also possible to fail to provide any of the needed information.

Obviously, you can enforce these rules through additional code, but again–nothing in the structure of the code prevents it–or even communicates the requirements.

I could do exactly the same thing in Rust of course and it doesn’t look much different:

#[derive(Debug)]
pub enum EmployeeType {
    Salaried,
    Hourly,
}

#[derive(Debug)]
pub struct Employee {
    pub first_name: String,
    pub last_name: String,
    pub birth_date: Option<String>,
    pub employee_type: EmployeeType,

    // should only be stored for Salaried employees
    pub annual_salary: Option<f32>,

    // should only be stored for Hourly employees
    pub hourly_rate: Option<f32>, 
    // should only be stored for Hourly employees
    pub overtime_rate: Option<f32>, 
    // should only be stored for Hourly employees
    pub normal_hours: Option<u8>, 
}

pub fn main() {
    let employee = Employee {
        first_name: "Chris".to_string(),
        last_name: "McKenzie".to_string(),
        birth_date: Option::None,
        employee_type: EmployeeType::Salaried,
        annual_salary: None,
        hourly_rate: None,
        overtime_rate: None,
        normal_hours: None,
    };

    println!("{:?}", employee);
}

Ideally, we’d want to bind the annual salary information to the Salaried employee type, and the hourly information to the Hourly employee type. It turns out–we can!

#[derive(Debug)]
pub enum EmployeeType {
    Salaried { annual_salary: f32 },
    Hourly { rate: f32, overtime_rate: f32, normal_hours: u8 } ,
}

#[derive(Debug)]
pub struct Employee {
    pub first_name: String,
    pub last_name: String,
    pub birth_date: Option<String>,
    pub employee_type: EmployeeType,
}

pub fn main() {
    let salaried = Employee {
        first_name: "Chris".to_string(),
        last_name: "McKenzie".to_string(),
        birth_date: Option::None,
        employee_type: EmployeeType::Salaried {
            // one, MILLION DOLLARS! Mwa ha ha ha ha ha ha! (I wish!)
            annual_salary: 1_000_000.00
        },
    };

    println!("{:?}", salaried);

    let hourly = Employee {
        first_name: "Fred".to_string(),
        last_name: "Bob".to_string(),
        birth_date: Some("1/1/1970".to_string()),
        employee_type: EmployeeType::Hourly {
            rate: 20.00, 
            overtime_rate: 30.00, 
            normal_hours: 40
        }
    };

    println!("{:?}", hourly);
}

Suddenly things get very interesting because we can express through the structure of the language what fields are required for which enum variant. Since all fields of a struct must be filled out when it is created, the compiler is able to force us to provide the correct data for the correct variant.

Nice!

Reading our enum variant

The match statement in Rust roughly corresponds to switch or if-else in C# or Java. However, it becomes much more powerful in the context of enum variants that carry data.

#[derive(Debug)]
pub enum EmployeeType {
    Salaried { annual_salary: f32 },
    Hourly { rate: f32, overtime_rate: f32, normal_hours: u8 } ,
}

#[derive(Debug)]
pub struct Employee {
    pub first_name: String,
    pub last_name: String,
    pub birth_date: Option<String>,
    pub employee_type: EmployeeType,
}

pub fn main() {
    let salaried = Employee {
        first_name: "Chris".to_string(),
        last_name: "McKenzie".to_string(),
        birth_date: Option::None,
        employee_type: EmployeeType::Salaried {
            // one, MILLION DOLLARS! Mwa ha ha ha ha ha ha! (I wish!)
            annual_salary: 1_000_000.00
        },
    };


    let hourly = Employee {
        first_name: "Fred".to_string(),
        last_name: "Bob".to_string(),
        birth_date: Some("1/1/1970".to_string()),
        employee_type: EmployeeType::Hourly {
            rate: 20.00, 
            overtime_rate: 30.00, 
            normal_hours: 40
        }
    };

    let employees = vec![salaried, hourly];

    for item in employees {
        // match on employee type
        match item.employee_type {
            EmployeeType::Salaried {annual_salary} => {
                println!("Salaried! {} {}: {}", item.first_name, item.last_name, annual_salary);
            },
            EmployeeType::Hourly {rate, overtime_rate: _, normal_hours: _ } => {
                println!("Hourly! {} {}: {}", item.first_name, item.last_name, rate);
            }
        }

    }

}

I need to point out a couple of things about the match statement.

  1. Each arm’s pattern must account for the data in the variant. I can’t simply write EmployeeType::Salaried on its own; I either bind the fields I need or explicitly ignore the rest with “..”.
  2. match requires that every variant of the enum be handled. It is possible to use “_” as a default handling block, but ordinarily you would just handle all of the variants.
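
As a small illustration of both points, here is a sketch that reuses the EmployeeType enum from above. The describe and is_salaried helpers are hypothetical names introduced only for this example. It shows the “..” rest pattern, which acknowledges a variant’s fields without binding them, and a “_” arm acting as the default case.

```rust
#[derive(Debug)]
pub enum EmployeeType {
    Salaried { annual_salary: f32 },
    Hourly { rate: f32, overtime_rate: f32, normal_hours: u8 },
}

// Hypothetical helper: binds only the fields it needs and skips the rest with `..`.
fn describe(employee_type: &EmployeeType) -> String {
    match employee_type {
        EmployeeType::Salaried { annual_salary } => format!("salaried at {}", annual_salary),
        EmployeeType::Hourly { rate, .. } => format!("hourly at {}", rate),
    }
}

// Hypothetical helper: `_` is the default arm covering every variant not listed above it.
fn is_salaried(employee_type: &EmployeeType) -> bool {
    match employee_type {
        EmployeeType::Salaried { .. } => true,
        _ => false,
    }
}

fn main() {
    let hourly = EmployeeType::Hourly { rate: 20.0, overtime_rate: 30.0, normal_hours: 40 };
    println!("{}", describe(&hourly));
    println!("{}", is_salaried(&hourly));
}
```

The trade-off with “_” is that adding a new variant later will not produce a compile error in that match, which is one reason to prefer listing the variants explicitly.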

Back to Option

The standard library defines Option as follows:

pub enum Option<T> {
    /// No value
    None,
    /// Some value `T`
    Some(T),
}

None is a variant with no data. Some is a tuple variant carrying a single value of type T.

Anytime we match on an Option for processing, we are required to handle both cases. None is not the same as null. There will be no “Object reference not set to instance of object.” errors in Rust.
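
To make that concrete, here is a minimal sketch of reading an optional birth date. The birth_date_label function is a hypothetical helper written for this example; removing either arm of its match makes the program fail to compile.

```rust
// Hypothetical helper: the compiler rejects this match unless both variants are handled.
fn birth_date_label(birth_date: &Option<String>) -> String {
    match birth_date {
        Some(date) => format!("born {}", date),
        None => "birth date unknown".to_string(),
    }
}

fn main() {
    let known = Some("1/1/1970".to_string());
    let unknown: Option<String> = None;

    println!("{}", birth_date_label(&known));
    println!("{}", birth_date_label(&unknown));

    // `if let` handles just the Some case when None needs no action.
    if let Some(date) = &known {
        println!("length of date string: {}", date.len());
    }

    // `unwrap_or` supplies a fallback value instead of branching explicitly.
    let display = unknown.unwrap_or("n/a".to_string());
    println!("{}", display);
}
```

In each form the None case is either handled or given an explicit fallback, and the inner String is only reachable in a branch where it is known to exist.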

You can play with this code and run it in the playground.

Can This Idea Be Useful Elsewhere?

If you’ve not spent a decent amount of time tracking down null reference exceptions in managed languages, this may not be that interesting to you. On the other hand, the Optional value concept can help drive semantic clarity in your code even if you’re not working in Rust. I’m not arguing that you should stop using null in C# & Java. I am arguing that you should add the concept of Optional values to your toolbox.

The Optional library for C# provides an implementation of this type. It’s worth reviewing the README. Java introduced the Optional type in version 8.

NBuilder 6.0.1 Released

RELEASE NOTES 6.0.1

This is a bug fix release.

  • Bug: Guid.Empty was being incremented. The solution was to disable NBuilder’s property naming for static or read-only fields. Thanks to Dominic Hemken for the PR.
  • Bug: CreateListOfSize had undefined behavior when called from the static Builder and executed on multiple threads at the same time. While the future of NBuilder will be to remove the static builder, it’s a defect in the current implementation. Thanks to Ovidiu Rădoi for the PR.

Testable Component Design in Rust

I consider myself an advanced beginner in Rust. There is still much I’m wrapping my head around–and I still get caught off guard by the “move” and “mutability” rules Rust enforces. However, in keeping with my personal emphasis, I’ve devoted my efforts to learning how to create automated tests in Rust. The below guidelines are not exhaustive, but represent my learning so far. Feedback is welcome!

Engineering Values

  • Code should be clean.
  • Code should be covered by automated tests.
    • Tests should be relatively easy to write.
  • Dependencies should be configurable by the components that use them (see Dependency Inversion Principle and Ports & Adapters)

Achieving These Values in Rust Component Design

These are great engineering values, but how do we achieve them practically in Rust? Here are my thoughts so far.

Required for Unit Testing

  • The component should provide a stable contract composed of traits, structs, and enums.
  • Structs exposed in the contract layer should be easy to construct in a test.
  • All types exposed in the contract layer should derive Clone and Debug (#[derive(Clone, Debug)]) so that they can be easily mocked in tests.
    • This means that types like failure::Error should be converted to something that is cloneable.
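
One way to do that conversion is sketched below. It swaps failure::Error for std::io::Error (which also does not implement Clone) so the example needs no external crate, and StorageError is a hypothetical contract-level error type invented for this illustration.

```rust
use std::io;

// Hypothetical contract-level error: it holds only owned, cloneable data.
#[derive(Clone, Debug, PartialEq)]
pub enum StorageError {
    NotFound(String),
    Other(String),
}

// std::io::Error does not implement Clone, so keep its message rather than the error itself.
impl From<io::Error> for StorageError {
    fn from(err: io::Error) -> Self {
        match err.kind() {
            io::ErrorKind::NotFound => StorageError::NotFound(err.to_string()),
            _ => StorageError::Other(err.to_string()),
        }
    }
}

fn main() {
    let io_err = io::Error::new(io::ErrorKind::NotFound, "employee file missing");
    let err: StorageError = io_err.into();
    // A test double can now hand out as many copies of the error as it needs.
    let copy = err.clone();
    println!("{:?} {:?}", err, copy);
}
```

The cost is that the original error’s backtrace and source chain are flattened into a string, so this belongs at the boundary of the component rather than inside it.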

Required for Configurable Dependencies

  • The contract layer should not reference any technology or framework unless it is specifically an extension for that technology or framework.

Empathy

  • Every effort should be made to make the public api surface of your component as easy to use and understand as possible.
  • The contract layer should minimize the use of generics.
    • Obvious exceptions are Result<T> and Option<T>.
    • Concepts like PagedResult<T> that are ubiquitous can also be excepted.
    • Using type aliases to hide the generics does not qualify since the generic constraints still have to be understood and honored in a test.
    • In general this advice amounts to “generics are nice, but harder to understand than flat types. Use with care in public facing contracts.”
  • If a trait exposes a Future as a return result, it should offer a synchronous version of the same operation. This allows the client to opt-in to futures if they need them and ignore that complexity if they don’t.
    • I understand that the client can add the .wait() call to the end of a Future. My point is that an “opt-in” model is friendlier than an “opt-out” model.

Example Hypothetical Contract Surface

use chrono::{DateTime, Utc}; // assumption: birth dates use the chrono crate
use futures::Future; // futures 0.1, which provides Item/Error and wait()

#[derive(Clone, Debug)]
struct Employee {
    id: String,
    r#type: String, // `type` is a keyword, so it needs the raw identifier form
    status: String,
    first_name: String,
    last_name: String,
    address: String,
    city: String,
    birth_date: DateTime<Utc>,
    // snipped for brevity
}

#[derive(Clone, Debug)]
struct PagedResponse<T> { // exposes a generic, but the reason is warranted.
    page_number: i32,
    page_size: i32,
    items: Vec<T>,
}

#[derive(Debug, Clone)]
enum MyComponentError {
    Error1(String), // If the context parameter is another struct, it must also derive Clone & Debug
    Error2(i32),
}

#[derive(Clone, Debug)]
struct EmployeesQuery {
    r#type: Option<String>,
    name: String,
    types: Vec<String>, // matches any of the specified types
    cities: Vec<String>, // matches any of the specified cities
}

// Component level Result. Type aliasing expected here.
type Result<T> = std::result::Result<T, MyComponentError>;

type Employees = PagedResponse<Employee>;

trait EmployeeService {
    // sync version of async_get()
    fn get(&self, id: String) -> Result<Employee> {
        self.async_get(id).wait()
    }

    fn async_get(&self, id: String) -> Box<dyn Future<Item = Employee, Error = MyComponentError>>;

    // sync version of async_query()
    fn query(&self, query: Option<EmployeesQuery>) -> Result<Employees> {
        self.async_query(query).wait()
    }

    fn async_query(&self, query: Option<EmployeesQuery>) -> Box<dyn Future<Item = Employees, Error = MyComponentError>>;

    // etc...
}

Non-Technical Engineering Quality Indicators

A PM I work with asked me the following question:
“How can someone who is not close to the engineering read the tea-leaves about the engineering quality of a given project?”

I love this question because it shows the PM cares about engineering quality despite not having engineering expertise. I’m much more accustomed to having to argue that non-technical folks should care about engineering quality. Now I have someone who does care and wants help knowing what to look for. How do I help this person? I have some starting ideas. Some of my favorite engineering manager metrics are easily adaptable for non-technical PMs.

  1. Does the team rely on manual testing? In my opinion manual testing adds little value. It’s impossible to do a complete regression. There is no future benefit to the effort beyond the immediate release. It’s slow. If you want to remove waste from the delivery pipeline, invest in test automation.
  2. What is the defect/user story ratio in the backlog?
  3. How often are new defects discovered and added to the backlog?
  4. How does the team react to the idea of asking for a near-zero cycle time for defects? To achieve this the team would need:
    a. To have relatively few defects
    b. To receive new defects on an infrequent basis
    c. To have confidence that correcting any defect would take hours instead of days
    d. To have deep knowledge of a well-engineered system so that the exact nature of the problem can be identified quickly
    e. To have confidence that they can pass the system through their quality gates and get into production in less than an hour

The idea that you can have zero defects is sometimes shocking for both PMs and engineers to consider. It can be an uphill battle to convince them that this is achievable in reality. If your team can’t accept this as a reality, see if they can accept it as a goal and move in this direction.

  1. Is Lead Time increasing? That could indicate that the team cannot respond to work as quickly as it arrives. You will also need to know what Lead Time you want. If you want to plan your roadmap in three-month increments, your Lead Time should be between 45 and 180 days, depending on how often you update your roadmap.
  2. Is Cycle Time increasing? If so, it probably means the system is difficult to work in.
  3. Is Cycle Time close to Lead Time? If so, it could indicate a lack of planning, or an inability to work on the plan due to emergent issues.
  4. How long does it take to correct a defect in production? Ideally, this would be ~1hr from discovery.
  5. How much do we spend on support for this service through all channels? (E.g., service desk, engineering time, etc.)

This list is not exhaustive and none of these measurements would be conclusive on their own. As diagnostics however they could be quite useful to draw your attention to potential problems. Most would have to be tracked over a longer period of time to be meaningful.

If you’ve got ideas of your own, please leave them in the comments. This is an important piece of the communication between product management and engineering.

So I See You Want To CI/CD

Avoiding Common Pitfalls When Getting Started With DevOps

If you’re in the planning or early development stages of implementing CI/CD for the first time, this post might help you.

DevOps is all the rage. It’s the new fad in tech! Years ago we were saying we should rely less on manual testing and fold testing into our engineering process. Now we are saying we should rely less on manual deployments and fold deployments and operational support into our engineering process. This all sounds lovely to me!

Having been a part of this effort toward automating more and more of our engineering process for the bulk of my career, I’ve had the opportunity to see CI/CD initiatives go awry. Strangely, it’s not self-evident how to set up a CI/CD pipeline well. It’s almost as if translating theory into practice is where the work is.

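There are several inter-related subject-areas that need to be aligned to make a CI/CD pipeline successful. They are:

  • Source Control
  • Branching Strategy
  • Automated Build
  • Automated Testing
  • Build Artifacts
  • Deployment Automation
  • Environment Segregation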

Let’s talk about each of them in turn.

Source Control

Your code is in source control right? I hate to ask, but I’m surprised by how often I have encountered code that is not in source control. A common answer I get is “yes, except for these 25 scripts we use to perform this or that task.” That’s a “no.” All of your code needs to be in source control. If you’re not sure where to put those scripts, create a /scripts folder in your repo and put them there. Get them in there, track changes, and make sure everyone is using the same version.

It’s customary for the repo structure to look something like

/src
/build
/docs
/scripts
/tests
README.md
LICENSE.md

I also encourage you to consider adopting a 1 repo, 1 root project, 1 releasable component standard for separating your repositories. Releasable components should be independently releasable and have a separate lifecycle from other releasable components.

Branching Strategy

You should use a known, well-documented branching strategy. The goal of a branching strategy is to make sure everyone knows how code is supposed to flow through your source control system from initial development to the production release. There are three common choices:

Feature Branch
  • Feature branches are taken from master.
  • Features are developed and released separately. They are merged to master at release time.
  • Appropriate for smaller teams who work on one feature at a time.
  • Continuous Integration happens on the feature branch.

Git Flow
  • More sophisticated version of feature branching.
  • Allows multiple teams to work in the project simultaneously while maintaining control over what gets released.
  • Appropriate for larger teams or where simultaneous feature development is needed.
  • Continuous Integration happens on the develop branch.

Commit to Master
  • For mature DevOps teams.
  • Use feature flags to control what code is active in production.
  • Requires lifecycle management of feature flags.
  • Continuous Integration happens on master.

Some purists will argue that Continuous Integration isn’t happening unless you’re doing Commit to Master. I don’t agree with this. My take is that as long as the team is actively and often merging to the same branch, then the goal of Continuous Integration is being met.

Automated Build

Regardless of what programming language you are using, you need an automated build. When your build is automated, your build scripts become a living document that removes any doubt about what is required to build your software. You will need an automated build system such as Jenkins, Azure DevOps, or Octopus Deploy. You need a separate server that knows how to run your build scripts and produce a build artifact. It should also programmatically execute any quality gates you may have such as credentials scanning or automated testing. Ideally, any scripts required to build your application should be in your repo under the /build folder. Having your build scripts in source control has the additional advantage that you can use and test them locally.

Automated Testing

Once you can successfully build your software consistently on an external server (external from your development workstation), you should add some quality gates to your delivery pipeline. The first, easiest, and least-expensive quality gate should be unit tests. If you have not embraced Test Driven Development, do so. If your Continuous Integration server supports it, have it verify that your software builds and passes your automated tests at the pre-commit stage. This will prevent commits from making it into your repo if they don’t meet minimum standards. If your CI server does not support this feature, make sure repairing any failed builds or failed automated testing is understood to be the #1 priority of the team should they go red.

Build Artifacts

Once the software builds successfully and passes the initial quality gates, your build should produce an artifact. Examples of build artifacts include nuget packages, maven packages, zip files, rpm files, or any other standard, recognized package format.

Build artifacts should have the following characteristics:

  • Completeness. The build artifact should contain everything necessary to deploy the software. Even if only a single component changed, the artifact should be treated like it is being deployed to a fresh environment.
  • Environment Agnosticism. The build artifact should not contain any information specific to any environment in which it is to be deployed. This can include URL’s, connection strings, IP Addresses, environment names, or anything else that is only valid in a single environment. I’ll write more about this in Environment Segregation.
  • Versioned. The build artifact should carry its version number. Most standard package formats include the version in the package filename. Some carry it as metadata within the package. Follow whatever conventions are normally used for your package management solution. If it’s possible to stamp the files contained in the package with the version as well (e.g., .NET Assemblies), do so. If you’re using a zip file, include the version in the zip filename. If you are releasing a library, follow Semantic Versioning. If not, consider versioning your application using release date information (e.g., for a release started on August 15th, 2018 consider setting the version number to 2018.8.15 or 1808).
  • Singleton. Build artifacts should be built only once. This ensures that the artifact you deploy to production is exactly the artifact you tested in your test environment.
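
The date-based versioning suggestion can be sketched in a few lines. The function names and the payroll-api component name below are hypothetical, chosen only for this example; they are not part of any packaging tool.

```rust
// Hypothetical helper: full date-based version, e.g. 2018.8.15.
fn date_version(year: u32, month: u32, day: u32) -> String {
    format!("{}.{}.{}", year, month, day)
}

// Hypothetical helper: compact two-digit year plus two-digit month, e.g. 1808.
fn compact_version(year: u32, month: u32) -> String {
    format!("{:02}{:02}", year % 100, month)
}

// Hypothetical helper: puts the version in the artifact filename, as suggested for zip files.
fn artifact_name(component: &str, version: &str) -> String {
    format!("{}-{}.zip", component, version)
}

fn main() {
    let version = date_version(2018, 8, 15);
    println!("{}", version);
    println!("{}", compact_version(2018, 8));
    println!("{}", artifact_name("payroll-api", &version));
}
```

In practice the build server would supply the date, so that the version is stamped once when the artifact is built and never recomputed at deployment time.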

Deployment Automation

Your deployment process should be fully automated. Ideally, your deployment automation tools will simply execute scripts they find in your repo. This is ideal because it allows you to version and branch your deployment process along with your code. If you build your release scripts in your release automation tool, you will have integration errors when you need to modify your deployment automation for different branches independently.

The output of your build process is a build artifact. This build artifact is the input to your deployment automation along with configuration data appropriate to the environment you are deploying to.

Taking the time to script your deployment has the same benefits as scripting your build–it creates a living document detailing exactly how your software must be deployed. Your scripts should assume a clean machine with minimal dependencies pre-installed and should be re-runnable without error.

Take advantage of the fact that you are versioning your build artifact. If you are deploying a website to IIS, create a new physical directory matching the package and version name. After extracting the files to this new directory, repoint the virtual directory to the new location. This makes reverting to the previous version easy should it be necessary as all of the files for the previous version are still on the machine. The same trick can be accomplished on Unix-y systems using sym-links.
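
Here is a rough sketch of the sym-link variant for Unix-like systems. The activate_release function, the “current” link name, and the myapp directory names are all assumptions made for this example; a real deployment would normally do the same thing from its deployment scripts.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

// Hypothetical helper: repoints a `current` symlink at an already-extracted release.
// The link is created under a temporary name and renamed over the old one, so
// `current` always points at a complete release directory.
fn activate_release(app_root: &Path, version_dir: &str) -> io::Result<()> {
    let target = app_root.join(version_dir);
    if !target.is_dir() {
        return Err(io::Error::new(io::ErrorKind::NotFound, "release directory missing"));
    }
    let staging_link = app_root.join("current.tmp");
    let _ = fs::remove_file(&staging_link); // ignore the error if no stale link exists
    symlink(&target, &staging_link)?;
    fs::rename(&staging_link, app_root.join("current"))
}

fn main() -> io::Result<()> {
    let app_root = std::env::temp_dir().join(format!("symlink-demo-{}", std::process::id()));
    fs::create_dir_all(app_root.join("myapp-1.0.0"))?;
    fs::create_dir_all(app_root.join("myapp-1.1.0"))?;

    activate_release(&app_root, "myapp-1.0.0")?;
    activate_release(&app_root, "myapp-1.1.0")?; // upgrade
    activate_release(&app_root, "myapp-1.0.0")?; // roll back: the old files are still there

    println!("{}", fs::read_link(app_root.join("current"))?.display());
    fs::remove_dir_all(&app_root)
}
```

Because every version keeps its own directory, rolling back is the same operation as deploying, just pointed at an older directory.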

Lastly, your deployment automation scripts are code. Like any other code, they should be stored in source control and tested.

Environment Segregation

I’ve written that you should avoid including any environment-specific configuration in your build artifact (and by extension, in source control), and I’ve said that you should fully automate your deployment process. The configuration data for the target environment should be defined in your deployment automation tooling.

The goal here is to use the same deployment automation regardless of which environment you are deploying to. That means there should be no special steps for special environments.

Most deployment automation tools support some sort of variable substitution for config files. This allows you to keep the config files in source control with defined placeholders where the environment-specific configuration would be. At deployment time, the deployment automation tools will replace the tokens in the config files with values that are meaningful for that environment.
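
To show the idea, here is a minimal token-replacement sketch. The #{Name} placeholder syntax, the apply_environment function, and the sample values are assumptions made for this example; each deployment tool defines its own token format.

```rust
use std::collections::HashMap;

// Hypothetical helper: replaces each `#{Name}` placeholder with the value
// defined for the target environment. Unknown placeholders are left untouched.
fn apply_environment(template: &str, values: &HashMap<&str, &str>) -> String {
    let mut rendered = template.to_string();
    for (name, value) in values {
        rendered = rendered.replace(&format!("#{{{}}}", name), value);
    }
    rendered
}

fn main() {
    // The template is what lives in source control.
    let template = "ConnectionString=#{DbConnection}\nApiUrl=#{ApiUrl}";

    // The values are what the deployment tooling holds for one environment.
    let mut test_env = HashMap::new();
    test_env.insert("DbConnection", "Server=test-db;Database=hr");
    test_env.insert("ApiUrl", "https://test.example.com/api");

    println!("{}", apply_environment(template, &test_env));
}
```

The same template rendered with a different map yields the config for another environment, which is what keeps the build artifact itself environment-agnostic.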

If variable substitution is not an option, consider maintaining a parameter-driven build script that writes out all your config files from scratch. In this case your config files will not be in source control at all but your scripts will know how to generate them.

The end-result of all of this is that you should be able to select any version of your build, point it to the environment of your choice, click “deploy,” and have a working piece of software.

Epilogue

The above is not a complete picture of everything you need to consider when moving towards DevOps. I did not cover concepts such as post-deployment testing, logging & monitoring, security, password & certificate rotation, controlling access to production, or any number of other related topics. I did however cover things you should consider when getting started in CI/CD. I’ve seen many teams attempt to embrace DevOps and create toil for themselves because they didn’t understand the material I’ve covered here. Following this advice should save you the effort of making these mistakes and give you breathing room to make new ones :).

NBuilder 6.0.0 Released

Thank you to the contributors who submitted pull requests for the issues that were important to them. A summary of the changes for NBuilder 6 is as follows:

  • Breaking Change: WithConstructor
    • No longer takes an Expression<Func<T>>.
    • Takes a Func<T>.
    • Marked [Obsolete] in favor of WithFactory
    • This change was to address an issue in which the constructor expression was not being reevaluated for each item in a list.
  • Feature: @AdemCatamak Added support for IndexOf as part of the ListBuilder implementation.
var products = new Builder()
    .CreateListOfSize<Product>(10)
    .IndexOf(0, 2, 5)
    .With(x => x.Title = "A special title")
    .Build();
  • Feature: @PureKrome Added support for DateTimeKind to RandomGenerator
var result = randomGenerator.Next(DateTime.MinValue, DateTime.MaxValue, DateTimeKind.Utc);
  • Feature: Added DisablePropertyNamingFor(PropertyInfo) overload to BuilderSettings.
  • Feature: Added TheRest as an extension to the ListBuilder.
var results = new Builder()
        .CreateListOfSize<SimpleClass>(10)
        .TheFirst(2)
        .Do(row => row.String1 = "One")
        .TheRest()
        .Do(row => row.String1 = "Ten")
        .Build()
    ;
  • Bug: Last item in enum is never generated when generating property values randomly.
  • Bug: Lost strong name when porting to .NET Standard.
  • Bug: Non-deterministic behavior when calling TheLast multiple times for the same range.

Powershell: How to Write Pipable Functions

Piping is probably one of the most underutilized features of Powershell that I’ve seen in the wild. Supporting pipes in Powershell allows you to write code that is much more expressive than simple imperative programming. However, most Powershell documentation does not do a good job of demonstrating how to think about pipable functions. In this tutorial, we will start with functions written the “standard” way and convert them step-by-step to support pipes.

Here’s a simple rule of thumb: if you find yourself writing a foreach loop in Powershell with more than just a line or two in the body, you might be doing something wrong.

Consider the following output from a function called Get-Team:

Name    Value
----    -----
Chris   Manager
Phillip Service Engineer
Andy    Service Engineer
Neil    Service Engineer
Kevin   Service Engineer
Rick    Software Engineer
Mark    Software Engineer
Miguel  Software Engineer
Stewart Software Engineer
Ophelia Software Engineer

Let’s say I want to output the name and title. I might write the Powershell as follows:

$data = Get-Team
foreach($item in $data) {
    write-host "Name: $($item.Name); Title: $($item.Value)"
}

I could also use the Powershell ForEach-Object function to do this instead of the foreach block.

# % is a short-cut to ForEach-Object
Get-Team | %{
    write-host "Name: $($_.Name); Title: $($_.Value)"
}

This is pretty clean given that the foreach block is only one line. I’m going to ask you to use your imagination and pretend that our logic is more complex than that. In a situation like that I would prefer to write something that looks more like the following:

Get-Team | Format-TeamMember

But how do you write a function like Format-TeamMember that can participate in the Piping behavior of Powershell? There is documentation about this, but it is often far from the introductory documentation and thus I have rarely seen it used by engineers in their day-to-day scripting in the real world.

The Naive Solution

Let’s start with the naive solution and evolve the function toward something more elegant.

Function Format-TeamMember() {
    param([Parameter(Mandatory)] [array] $data)
    $data | %{
        write-host "Name: $($_.Name); Title: $($_.Value)"
    }
}

# Usage
$data = Get-Team
Format-TeamMember -Data $Data

At this point the function is just a wrapper around the foreach loop from above and thus adds very little value beyond isolating the foreach logic.

Let me draw your attention to the $data parameter. It’s defined as an array which is good since we’re going to pipe the array to a foreach block. The first step toward supporting pipes in Powershell functions is to convert list parameters into their singular form.

Convert to Singular

Function Format-TeamMember() {
    param([Parameter(Mandatory)] $item)
    write-host "Name: $($item.Name); Title: $($item.Value)"
}

# Usage
Get-Team | %{
    Format-TeamMember -Item $_
}

Now that we’ve converted Format-TeamMember to work with single elements, we are ready to add support for piping.

Begin, Process, End

The Powershell pipe functionality requires a little extra structure to support. A function can define up to three named blocks, and once you use any of them, all of your executable code must live inside one of those blocks.

  • Begin fires once, before the first element in the pipe is processed (when the pipe opens). Use this block to initialize the function with data that can be cached over the lifetime of the pipe.
  • Process fires once per element in the pipe.
  • End fires once, after the last element in the pipe is processed (when the pipe closes). Use this block to clean up after the pipe executes.
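To make the firing order concrete, here is a minimal sketch. Measure-PipelineBlock is a hypothetical function I made up for this illustration; all it does is count how often each block runs.

```powershell
# Hypothetical helper (not part of the article's module): counts block executions.
function Measure-PipelineBlock {
    param([Parameter(ValueFromPipeline)] $InputObject)

    begin {
        # Runs once, before any pipeline input arrives.
        $beginCount   = 1
        $processCount = 0
    }
    process {
        # Runs once per piped element; variables set in begin are still in scope.
        $processCount++
    }
    end {
        # Runs once, after the last element has been processed.
        "Begin: $beginCount; Process: $processCount; End: 1"
    }
}

$summary = 1, 2, 3 | Measure-PipelineBlock
$summary   # Begin: 1; Process: 3; End: 1
```

Three elements go in, so Process runs three times while Begin and End each run once.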

Let’s add these blocks to Format-TeamMember.

Function Format-TeamMember() {
    param([Parameter(Mandatory)] $item)

    Begin {
        write-host "Format-TeamMember: Begin" -ForegroundColor Green
    }
    Process {
        write-host "Name: $($item.Name); Title: $($item.Value)"
    }
    End {
        write-host "Format-TeamMember: End" -ForegroundColor Green
    }
}

# Usage
Get-Team | Format-TeamMember 

#Output
cmdlet Format-TeamMember at command pipeline position 2
Supply values for the following parameters:
item:

Oh noes! Now Powershell is asking for manual input! No worries–There’s one more thing we need to do to support pipes.

ValueFromPipeLine… ByPropertyName

If you want data to be piped from one function into the next, you have to tell the receiving function which parameters will be received from the pipeline. You do this by means of two attributes: ValueFromPipeline and ValueFromPipelineByPropertyName.

ValueFromPipeline

The ValueFromPipeline attribute tells the Powershell function that it will receive the whole value from the previous function in the pipe.

Function Format-TeamMember() {
    param([Parameter(Mandatory, ValueFromPipeline)] $item)

    Begin {
        write-host "Format-TeamMember: Begin" -ForegroundColor Green
    }
    Process {
        write-host "Name: $($item.Name); Title: $($item.Value)"
    }
    End {
        write-host "Format-TeamMember: End" -ForegroundColor Green
    }
}

# Usage
Get-Team | Format-TeamMember

#Output
Format-TeamMember: Begin
Name: Chris; Title: Manager
Name: Phillip; Title: Service Engineer
Name: Andy; Title: Service Engineer
Name: Neil; Title: Service Engineer
Name: Kevin; Title: Service Engineer
Name: Rick; Title: Software Engineer
Name: Mark; Title: Software Engineer
Name: Miguel; Title: Software Engineer
Name: Stewart; Title: Software Engineer
Name: Ophelia; Title: Software Engineer
Format-TeamMember: End

ValueFromPipelineByPropertyName

This is great! We’ve really moved things forward! But we can do better.

Our Format-TeamMember function now requires knowledge of the schema of the data from the calling function. The function is not self-contained in a way to make it maintainable or usable in other contexts. Instead of piping the whole object into the function, let’s pipe the discrete values the function depends on instead.

Function Format-TeamMember() {
    param(
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Name,
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Value
    )

    Begin {
        write-host "Format-TeamMember: Begin" -ForegroundColor Green
    }
    Process {
        write-host "Name: $Name; Title: $Value"
    }
    End {
        write-host "Format-TeamMember: End" -ForegroundColor Green
    }
}

# Usage
Get-Team | Format-TeamMember

# Output
Format-TeamMember: Begin
Name: Chris; Title: Manager
Name: Phillip; Title: Service Engineer
Name: Andy; Title: Service Engineer
Name: Neil; Title: Service Engineer
Name: Kevin; Title: Service Engineer
Name: Rick; Title: Software Engineer
Name: Mark; Title: Software Engineer
Name: Miguel; Title: Software Engineer
Name: Stewart; Title: Software Engineer
Name: Ophelia; Title: Software Engineer
Format-TeamMember: End

Alias

In our last refactoring, we set out to make Format-TeamMember self-contained. Our introduction of the Name and Value parameters decouples us from having to know the schema of the previous object in the pipeline–almost. We had to name our parameter Value, which is not really how Format-TeamMember thinks of that value. It thinks of it as the Title–but in the context of our contrived module, Value is sometimes used as another name for it. In Powershell, you can use the Alias attribute to support multiple names for the same parameter.

Function Format-TeamMember() {
    param(
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Name,
        [Alias("Value")]
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Title # Change the name to Title
    )

    Begin {
        write-host "Format-TeamMember: Begin" -ForegroundColor Green
    }
    Process {
        write-host "Name: $Name; Title: $Title" # Use the newly renamed parameter
    }
    End {
        write-host "Format-TeamMember: End" -ForegroundColor Green
    }
}

# Usage
Get-Team | Format-TeamMember

# Output
Format-TeamMember: Begin
Name: Chris; Title: Manager
Name: Phillip; Title: Service Engineer
Name: Andy; Title: Service Engineer
Name: Neil; Title: Service Engineer
Name: Kevin; Title: Service Engineer
Name: Rick; Title: Software Engineer
Name: Mark; Title: Software Engineer
Name: Miguel; Title: Software Engineer
Name: Stewart; Title: Software Engineer
Name: Ophelia; Title: Software Engineer
Format-TeamMember: End
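A side effect of the alias is that the same function now accepts objects shaped either way. Here is a small sketch of that, using a trimmed copy of Format-TeamMember (no Begin/End messages, and it emits the line instead of calling write-host). The two sample objects are made up for illustration.

```powershell
# Trimmed copy of Format-TeamMember for illustration only.
function Format-TeamMember {
    param(
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Name,
        [Alias("Value")]
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Title
    )
    process {
        "Name: $Name; Title: $Title"
    }
}

# Hypothetical inputs: one object exposes Value, the other exposes Title.
$byValue = [pscustomobject]@{ Name = "Chris"; Value = "Manager" }
$byTitle = [pscustomobject]@{ Name = "Rick";  Title = "Software Engineer" }

$lines = $byValue, $byTitle | Format-TeamMember
$lines
# Name: Chris; Title: Manager
# Name: Rick; Title: Software Engineer
```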

Pipe Forwarding

Our Format-TeamMember function now supports receiving data from the pipe, but it does not return any information that can be forwarded to the next function in the pipeline. We can change that by returning the formatted line instead of calling Write-Host.

Function Format-TeamMember() {
    param(
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Name,
        [Alias("Value")]
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Title # Change the name to Title
    )

    Begin {
        # Do one-time operations needed to support the pipe here
    }
    Process {
        return "Name: $Name; Title: $Title" # Use the newly renamed parameter
    }
    End {
        # Cleanup before the pipe closes here
    }
}

# Usage
[array] $output = Get-Team | Format-TeamMember
write-host "The output contains $($output.Length) items:"
$output | Out-Host

# Output
The output contains 10 items:
Name: Chris; Title: Manager
Name: Phillip; Title: Service Engineer
Name: Andy; Title: Service Engineer
Name: Neil; Title: Service Engineer
Name: Kevin; Title: Service Engineer
Name: Rick; Title: Software Engineer
Name: Mark; Title: Software Engineer
Name: Miguel; Title: Software Engineer
Name: Stewart; Title: Software Engineer
Name: Ophelia; Title: Software Engineer

Filtering

This is a lot of information. What if we wanted to filter the data so that we only see the people with the title “Service Engineer?” Let’s implement a function that filters data out of the pipe.

function Find-Role(){
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $item,
        [switch] $ServiceEngineer
    )

    Begin {
    }
    Process {
        if ($ServiceEngineer) {
            if ($item.Value -eq "Service Engineer") {
                return $item
            }
        }

        if (-not $ServiceEngineer) {
            # if no filter is requested then return everything.
            return $item
        }

        return; # not technically required but shows the exit when an item is filtered out.
    }
    End {
    }
}

This should be self-explanatory for the most part. Let me draw your attention, though, to the return; statement that isn’t technically required. A mistake I’ve seen made in this scenario is to return $null. If you return $null it adds $null to the pipeline as if it were a return value. If you want to exclude an item from being forwarded through the pipe you must not return anything. While the return; statement is not syntactically required by the language, I find it helpful to communicate my intention that I am deliberately not adding an element to the pipe.
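Here is a minimal sketch of the difference. Both functions are hypothetical and exist only to compare the two styles: they pass even numbers through, but one returns $null for the rejects and the other returns nothing.

```powershell
# Hypothetical filter that (incorrectly) returns $null for rejected items.
function Select-EvenWithNull {
    param([Parameter(ValueFromPipeline)] [int] $Number)
    process {
        if ($Number % 2 -eq 0) { return $Number }
        return $null   # $null is written to the pipeline as a value
    }
}

# Hypothetical filter that returns nothing for rejected items.
function Select-Even {
    param([Parameter(ValueFromPipeline)] [int] $Number)
    process {
        if ($Number % 2 -eq 0) { return $Number }
        return         # nothing is written to the pipeline
    }
}

$withNulls = @(1..4 | Select-EvenWithNull)
$clean     = @(1..4 | Select-Even)

$withNulls.Count   # 4 (two numbers plus two $null entries)
$clean.Count       # 2
```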

Now let’s look at usage:

Get-Team | Find-Role | Format-TeamMember # No Filter
Name: Chris; Title: Manager
Name: Phillip; Title: Service Engineer
Name: Andy; Title: Service Engineer
Name: Neil; Title: Service Engineer
Name: Kevin; Title: Service Engineer
Name: Rick; Title: Software Engineer
Name: Mark; Title: Software Engineer
Name: Miguel; Title: Software Engineer
Name: Stewart; Title: Software Engineer
Name: Ophelia; Title: Software Engineer

Get-Team | Find-Role -ServiceEngineer | Format-TeamMember # Filtered
Name: Phillip; Title: Service Engineer
Name: Andy; Title: Service Engineer
Name: Neil; Title: Service Engineer
Name: Kevin; Title: Service Engineer

Summary

Notice how clean the function composition is: Get-Team | Find-Role -ServiceEngineer | Format-TeamMember!

Pipable functions are a powerful language feature of Powershell <rimshots/>. Writing pipable functions allows you to compose logic in a way that is more expressive than simple imperative scripting. I hope this tutorial demonstrated to you how to modify existing Powershell functions to support pipes.

Powershell: How to Structure a Module

There doesn’t seem to be much guidance as to the internal structure of a Powershell module. There’s a lot of “you can do it this way or that way” guidance, but little “this has worked well for me and that hasn’t.” As a patterns and practices guy, I’m dissatisfied with this state of affairs. In this post I will describe the module structure I use and the reasons it works well for me.

I’ve captured the structure in a sample module for you to reference.

Powershell Module Structure

Posh.psd1

This is a Powershell module manifest. It contains the metadata about the Powershell module, including the name, version, unique id, dependencies, etc.

It’s very important that the Module id is unique as re-using a GUID from one module to another will potentially create conflicts on an end-user’s machine.

I don’t normally use a lot of options in the manifest, but having the manifest in place at the beginning makes it easier to expand as you need new options. Here is my default psd1 implementation:

@{

# Version number of this module.
ModuleVersion = '1.0'

# Supported PSEditions
# CompatiblePSEditions = @()

# ID used to uniquely identify this module
GUID = '2a97124e-d73e-49ad-acd7-1ea5b3dba0ba'

# Author of this module
Author = 'chmckenz'

# Company or vendor of this module
CompanyName = 'ISG Inc'

# Copyright statement for this module
Copyright = '(c) 2018 chmckenz. All rights reserved.'

# Script module file loaded by this manifest (newer manifests call this key RootModule)
ModuleToProcess = 'Posh.psm1'

}
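If you would rather not copy a manifest by hand (and risk copying its GUID along with it), the built-in New-ModuleManifest cmdlet can write the file for you. The sketch below is only an illustration: the PoshDemo names, the temp-folder location, and the author value are all placeholders I made up.

```powershell
# Write a throwaway manifest into the temp folder (hypothetical names throughout).
$manifestName = "PoshDemo_" + [guid]::NewGuid().ToString("N") + ".psd1"
$manifestPath = Join-Path ([System.IO.Path]::GetTempPath()) $manifestName

New-ModuleManifest -Path $manifestPath `
    -RootModule 'PoshDemo.psm1' `
    -ModuleVersion '1.0' `
    -Guid ([guid]::NewGuid()) `
    -Author 'your-name'

# Read the manifest back as a hashtable to inspect what was written.
$manifest = Import-PowerShellDataFile $manifestPath
$manifest.RootModule
$manifest.GUID
```

Passing a freshly generated GUID means no two manifests created this way share an id.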

Posh.psm1

This is the module file that contains or loads your functions. While it is possible to write all your module functions in one file, I prefer to separate each function into its own file.

My psm1 file is fairly simple.

gci *.ps1 -path export,private -Recurse | %{
    . $_.FullName
}

gci *.ps1 -path export -Recurse | %{
    Export-ModuleMember $_.BaseName
}

The first gci block loads all of the functions in the Export and Private directories. The -Recurse argument allows me to group functions into subdirectories as appropriate in larger modules.

The second gci block exports only the functions in the Export directory. Notice the use of the -Recurse argument again.

With this structure, my psd1 & psm1 files do not have to change as I add new functions.
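One caveat with relative paths like export and private: Get-ChildItem resolves them against the caller's current location, not the folder the module lives in. Here is a sketch of a variant that anchors the paths to $PSScriptRoot instead. To keep it runnable on its own, it builds a throwaway module in the temp folder first; the PoshDemo module and the Get-Greeting and Get-GreetingText functions are hypothetical names invented for the demo.

```powershell
# Build a throwaway module on disk (all names here are hypothetical).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("PoshDemo_" + [guid]::NewGuid().ToString("N"))
New-Item -ItemType Directory -Path (Join-Path $root 'export'), (Join-Path $root 'private') | Out-Null

# One exported function and one private helper, each in its own file.
Set-Content -Path (Join-Path $root 'export/Get-Greeting.ps1') -Value 'function Get-Greeting { Get-GreetingText }'
Set-Content -Path (Join-Path $root 'private/Get-GreetingText.ps1') -Value 'function Get-GreetingText { "hello" }'

# The psm1 resolves its folders from $PSScriptRoot, so the caller's location does not matter.
Set-Content -Path (Join-Path $root 'PoshDemo.psm1') -Value @'
Get-ChildItem -Path "$PSScriptRoot/export", "$PSScriptRoot/private" -Filter *.ps1 -Recurse | ForEach-Object {
    . $_.FullName
}
Get-ChildItem -Path "$PSScriptRoot/export" -Filter *.ps1 -Recurse | ForEach-Object {
    Export-ModuleMember -Function $_.BaseName
}
'@

Import-Module (Join-Path $root 'PoshDemo.psm1')

$exported = (Get-Command -Module PoshDemo).Name   # only the function from export
$greeting = Get-Greeting                          # the private helper still works inside the module
```

Only Get-Greeting is visible to the caller; Get-GreetingText stays internal to the module.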

Export Functions

I keep functions I want the module to export in this directory. This makes them easy to identify and to export from the .psm1 file.

It is important to distinguish functions you wish to expose to clients from private functions for the same reason you wouldn’t make every class & function public in a nuget package. A Module is a library of functionality. If you expose its internals then clients will become dependent on those internals making it more difficult to modify your implementation.

You should think of public functions like you would an API. Their shape should be treated as immutable as much as possible.

Private Functions

I keep helper functions I do not wish to expose to module clients here. This makes it easy to exclude them from the calls to Export-ModuleMember in the .psm1 file.

Tests

The Tests directory contains all of my Pester tests. Until a few years ago I didn’t know you could write tests for Powershell. I discovered Pester and assigned a couple of my interns to figure out how to use it. They did and they taught me. Now I can practice TDD with Powershell–and so can you.

Other potential folders

When publishing my modules via the Powershell Gallery or Chocolatey, I have found it necessary to add additional folders & scripts to support the packaging & deployment of the module. I will follow up with demos of how to do that in a later post.

Summary

I’ve put a lot of thought into how I structure my Powershell modules. These are my “best practices,” but in a world where Powershell best practices are rarely discussed your mileage may vary. Consider this post an attempt to start a conversation.

Powershell Gems: Array Comparisons

There is a shorthand syntax that can be applied to arrays to apply filtering. Consider the following syntactically correct Powershell:

1,2,3,4,5 | ?{ $_ -gt 2 } # => 3,4,5

You can write the same thing in a much simpler fashion as follows:

1,2,3,4,5 -gt 2 # => 3,4,5

In the second example, Powershell is applying the expression -gt 2 to the elements of the array and returning the matching items.
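The same filtering behavior applies to the other comparison operators when the left-hand side is an array. A quick sketch, using a few names from the sample team:

```powershell
$names = "Chris", "Phillip", "Andy", "Neil"

$withI    = $names -like "*i*"      # wildcard match: Chris, Phillip, Neil
$startsAN = $names -match "^[AN]"   # regex match:    Andy, Neil
$notAndy  = $names -ne "Andy"       # inequality:     Chris, Phillip, Neil
```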

Null Coalesce

Unfortunately, Powershell has historically lacked a true null coalesce operator (the ?? operator only arrived in Powershell 7). Fortunately, on earlier versions we can simulate that behavior using array comparisons.

($null, $null, 5,6, $null, 7).Length # => 6
($null, $null, 5,6, $null, 7 -ne $null).Length # => 3
($null, $null, 5,6, $null, 7 -ne $null)[0] # => 5
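If you use this trick often, it can be wrapped in a small helper. Get-FirstNonNull is a hypothetical name I made up for this sketch, not a built-in.

```powershell
# Hypothetical helper: returns the first candidate that is not $null.
function Get-FirstNonNull {
    param([object[]] $Candidates)

    # With an array on the left, -ne filters out the $null entries.
    # @() keeps the result indexable even when only one value survives.
    @($Candidates -ne $null)[0]
}

$configured = $null
$port = Get-FirstNonNull $configured, 8080
$port   # 8080
```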

Powershell Gems: Destructuring

Destructuring

What is destructuring?

Destructuring is a convenient way of extracting multiple values from data stored in (possibly nested) objects and Arrays. It can be used in locations that receive data (such as the left-hand side of an assignment).

source

Here is an example of destructuring in Powershell.

$first, $second, $therest = 1,2,3,4,5
$first
1
$second
2
$therest
3
4
5

As you can see, Powershell assigns the first and second values in the array to the variables $first and $second. The remaining items are then assigned to the last variable in the assignment list.
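A few more places where this comes in handy, as a quick sketch (the variable names are just for illustration):

```powershell
# Split a string and name the parts.
$firstName, $lastName = "Chris McKenzie" -split " "

# Swap two variables without a temporary.
$a, $b = 1, 2
$a, $b = $b, $a      # $a is now 2, $b is now 1

# Fewer values than variables: the extra variables are $null.
$x, $y, $z = 10, 20
```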

Gotchas

If we look at the following Powershell code nothing seems out of the ordinary.

$arr = @(1)
$arr.GetType().FullName
System.Object[]

However, look at this code sample:

# When Function Returns No Elements
Function Get-Array() { 
    return @() 
} 
$arr = Get-Array
$arr.GetType()
You cannot call a method on a null-valued expression.
At line:1 char:1
+ $arr.GetType()
+ ~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

$arr -eq $null
True

# When Function Returns One Element
Function Get-Array() { 
    return @(1)  
}
$arr = Get-Array
$arr.GetType().FullName
System.Int32

# When Function Returns Multiple Elements
Function Get-Array() { 
    return @(1,2)
} 
$arr = Get-Array
$arr.GetType().FullName
System.Object[]

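When returning arrays from functions, the default Powershell behavior is to destructure (unroll) the array into the pipeline. An empty array leaves the caller with $null, a single-element array leaves the caller with just that element, and only an array with two or more elements arrives as an array. This can sometimes lead to confusing results.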

You can override this behavior by prepending the resultant array with a ‘,’ which tells Powershell that the return type should not be destructured:

# When Function Returns No Elements
Function Get-Array() {
    return ,@() 
} 
$arr = Get-Array
$arr.GetType().FullName
System.Object[]

# When Function Returns One Element
Function Get-Array() {
    return ,@(1) 
} 
$arr = Get-Array
$arr.GetType().FullName
System.Object[]

# When Function Returns Multiple Elements
Function Get-Array() {
    return ,@(1,2)
}
$arr = Get-Array
$arr.GetType().FullName
System.Object[]
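When you don't control the function being called, another option is to wrap the call in @() at the call site, which always gives you an array back. Get-Numbers below is a hypothetical function written without the leading comma, just to show the effect.

```powershell
# Hypothetical function that returns 0..n elements without the ',' trick.
function Get-Numbers {
    param([int] $Count)
    if ($Count -le 0) { return @() }
    return 1..$Count
}

# @() collects whatever comes back into an array, even zero or one item.
$none = @(Get-Numbers 0)
$one  = @(Get-Numbers 1)
$many = @(Get-Numbers 3)

$none.Count                 # 0
$one.Count                  # 1
$one.GetType().FullName     # System.Object[]
$many.Count                 # 3
```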
