Powershell: How to Write Pipable Functions

Piping is probably one of the most underutilized features of Powershell that I’ve seen in the wild. Supporting pipes in Powershell allows you to write code that is much more expressive than simple imperative programming. However, most Powershell documentation does not do a good job of demonstrating how to think about pipable functions. In this tutorial, we will start with functions written the “standard” way and convert them step-by-step to support pipes.

Here’s a simple rule of thumb: if you find yourself writing a foreach loop in Powershell with more than just a line or two in the body, you might be doing something wrong.

Consider the following output from a function called Get-Team:

Name    Value
----    -----
Chris   Manager
Phillip Service Engineer
Andy    Service Engineer
Neil    Service Engineer
Kevin   Service Engineer
Rick    Software Engineer
Mark    Software Engineer
Miguel  Software Engineer
Stewart Software Engineer
Ophelia Software Engineer

Let’s say I want to output the name and title. I might write the Powershell as follows:

$data = Get-Team
foreach($item in $data) {
    write-host "Name: $($item.Name); Title: $($item.Value)"
}

I could also use the Powershell ForEach-Object function to do this instead of the foreach block.

# % is a short-cut to ForEach-Object
Get-Team | %{
    write-host "Name: $($_.Name); Title: $($_.Value)"
}

This is pretty clean given that the foreach block is only one line. I’m going to ask you to use your imagination and pretend that our logic is more complex than that. In a situation like that I would prefer to write something that looks more like the following:

Get-Team | Format-TeamMember

But how do you write a function like Format-TeamMember that can participate in the piping behavior of Powershell? There is documentation about this, but it is often far from the introductory documentation, and thus I have rarely seen it used by engineers in their day-to-day scripting in the real world.

The Naive Solution

Let’s start with the naive solution and evolve the function toward something more elegant.

Function Format-TeamMember() {
    param([Parameter(Mandatory)] [array] $data)
    $data | %{
        write-host "Name: $($_.Name); Title: $($_.Value)"
    }
}

# Usage
$data = Get-Team
Format-TeamMember -Data $Data

At this point the function is just a wrapper around the foreach loop from above and thus adds very little value beyond isolating the foreach logic.

Let me draw your attention to the $data parameter. It’s defined as an array which is good since we’re going to pipe the array to a foreach block. The first step toward supporting pipes in Powershell functions is to convert list parameters into their singular form.

Convert to Singular

Function Format-TeamMember() {
    param([Parameter(Mandatory)] $item)
    write-host "Name: $($item.Name); Title: $($item.Value)"
}

# Usage
Get-Team | %{
    Format-TeamMember -Item $_
}

Now that we’ve converted Format-TeamMember to work with single elements, we are ready to add support for piping.

Begin, Process, End

The Powershell pipe functionality requires a little extra overhead to support. Powershell provides three blocks for pipeline-aware functions, and all of your executable code should be defined in one of those blocks.

  • Begin fires once, before the first element in the pipe is processed (when the pipe opens). Use this block to initialize the function with data that can be cached over the lifetime of the pipe.
  • Process fires once per element in the pipe.
  • End fires once, after the last element in the pipe has been processed (when the pipe closes). Use this block to clean up after the pipe executes.

Let’s add these blocks to Format-TeamMember.

Function Format-TeamMember() {
    param([Parameter(Mandatory)] $item)

    Begin {
        write-host "Format-TeamMember: Begin" -ForegroundColor Green
    }
    Process {
        write-host "Name: $($item.Name); Title: $($item.Value)"
    }
    End {
        write-host "Format-TeamMember: End" -ForegroundColor Green
    }
}

# Usage
Get-Team | Format-TeamMember 

#Output
cmdlet Format-TeamMember at command pipeline position 2
Supply values for the following parameters:
item:

Oh noes! Now Powershell is asking for manual input! No worries–there’s one more thing we need to do to support pipes.

ValueFromPipeline… ByPropertyName

If you want data to be piped from one function into the next, you have to tell the receiving function which parameters will be received from the pipeline. You do this by means of two attributes: ValueFromPipeline and ValueFromPipelineByPropertyName.

ValueFromPipeline

The ValueFromPipeline attribute tells the Powershell function that it will receive the whole value from the previous function in the pipe.

Function Format-TeamMember() {
    param([Parameter(Mandatory, ValueFromPipeline)] $item)

    Begin {
        write-host "Format-TeamMember: Begin" -ForegroundColor Green
    }
    Process {
        write-host "Name: $($item.Name); Title: $($item.Value)"
    }
    End {
        write-host "Format-TeamMember: End" -ForegroundColor Green
    }
}

# Usage
Get-Team | Format-TeamMember

#Output
Format-TeamMember: Begin
Name: Chris; Title: Manager
Name: Phillip; Title: Service Engineer
Name: Andy; Title: Service Engineer
Name: Neil; Title: Service Engineer
Name: Kevin; Title: Service Engineer
Name: Rick; Title: Software Engineer
Name: Mark; Title: Software Engineer
Name: Miguel; Title: Software Engineer
Name: Stewart; Title: Software Engineer
Name: Ophelia; Title: Software Engineer
Format-TeamMember: End

ValueFromPipelineByPropertyName

This is great! We’ve really moved things forward! But we can do better.

Our Format-TeamMember function now requires knowledge of the schema of the data from the calling function. The function is not self-contained in a way that makes it maintainable or usable in other contexts. Instead of piping the whole object into the function, let’s pipe in only the discrete values the function depends on.

Function Format-TeamMember() {
    param(
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Name,
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Value
    )

    Begin {
        write-host "Format-TeamMember: Begin" -ForegroundColor Green
    }
    Process {
        write-host "Name: $Name; Title: $Value"
    }
    End {
        write-host "Format-TeamMember: End" -ForegroundColor Green
    }
}

# Usage
Get-Team | Format-TeamMember

# Output
Format-TeamMember: Begin
Name: Chris; Title: Manager
Name: Phillip; Title: Service Engineer
Name: Andy; Title: Service Engineer
Name: Neil; Title: Service Engineer
Name: Kevin; Title: Service Engineer
Name: Rick; Title: Software Engineer
Name: Mark; Title: Software Engineer
Name: Miguel; Title: Software Engineer
Name: Stewart; Title: Software Engineer
Name: Ophelia; Title: Software Engineer
Format-TeamMember: End

Alias

In our last refactoring, we set out to make Format-TeamMember self-contained. Our introduction of the Name and Value parameters decouples us from having to know the schema of the previous object in the pipeline–almost. We had to name our parameter Value, which is not really how Format-TeamMember thinks of that value. It thinks of it as the Title–but in the context of our contrived module, Value is the name that is sometimes used. In Powershell, you can use the Alias attribute to support multiple names for the same parameter.

Function Format-TeamMember() {
    param(
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Name,
        [Alias("Value")]
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Title # Change the name to Title
    )

    Begin {
        write-host "Format-TeamMember: Begin" -ForegroundColor Green
    }
    Process {
        write-host "Name: $Name; Title: $Title" # Use the newly renamed parameter
    }
    End {
        write-host "Format-TeamMember: End" -ForegroundColor Green
    }
}

# Usage
Get-Team | Format-TeamMember

# Output
Format-TeamMember: Begin
Name: Chris; Title: Manager
Name: Phillip; Title: Service Engineer
Name: Andy; Title: Service Engineer
Name: Neil; Title: Service Engineer
Name: Kevin; Title: Service Engineer
Name: Rick; Title: Software Engineer
Name: Mark; Title: Software Engineer
Name: Miguel; Title: Software Engineer
Name: Stewart; Title: Software Engineer
Name: Ophelia; Title: Software Engineer
Format-TeamMember: End

Pipe Forwarding

Our Format-TeamMember function now supports receiving data from the pipe, but it does not return any information that can be forwarded to the next function in the pipeline. We can change that by returning the formatted line instead of calling Write-Host.

Function Format-TeamMember() {
    param(
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Name,
        [Alias("Value")]
        [Parameter(Mandatory, ValueFromPipelineByPropertyName)] [string] $Title # Change the name to Title
    )

    Begin {
        # Do one-time operations needed to support the pipe here
    }
    Process {
        return "Name: $Name; Title: $Title" # Use the newly renamed parameter
    }
    End {
        # Cleanup before the pipe closes here
    }
}

# Usage
[array] $output = Get-Team | Format-TeamMember
write-host "The output contains $($output.Length) items:"
$output | Out-Host

# Output
The output contains 10 items:
Name: Chris; Title: Manager
Name: Phillip; Title: Service Engineer
Name: Andy; Title: Service Engineer
Name: Neil; Title: Service Engineer
Name: Kevin; Title: Service Engineer
Name: Rick; Title: Software Engineer
Name: Mark; Title: Software Engineer
Name: Miguel; Title: Software Engineer
Name: Stewart; Title: Software Engineer
Name: Ophelia; Title: Software Engineer

Filtering

This is a lot of information. What if we wanted to filter the data so that we only see the people with the title “Service Engineer?” Let’s implement a function that filters data out of the pipe.

function Find-Role(){
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $item,
        [switch] $ServiceEngineer
    )

    Begin {
    }
    Process {
        if ($ServiceEngineer) {
            if ($item.Value -eq "Service Engineer") {
                return $item
            }
        }

        if (-not $ServiceEngineer) {
            # if no filter is requested then return everything.
            return $item
        }

        return; # not technically required, but shows the exit when an item is filtered out.
    }
    End {
    }
}

This should be self-explanatory for the most part. Let me draw your attention, though, to the return; statement that isn’t technically required. A mistake I’ve seen made in this scenario is to return $null. If you return $null, it adds $null to the pipeline as if it were a return value. If you want to exclude an item from being forwarded through the pipe, you must not return anything. While the return; statement is not syntactically required by the language, I find it helpful to communicate my intention that I am deliberately not adding an element to the pipe.
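
To see the difference, here is a minimal sketch (the function names are hypothetical):

Function Emit-Null { Process { return $null } }
Function Emit-Nothing { Process { return } }

$withNull = @(1..3 | Emit-Null)
$withNothing = @(1..3 | Emit-Nothing)

$withNull.Count    # 3 -- one $null was added per input element
$withNothing.Count # 0 -- nothing was added to the pipe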

Now let’s look at usage:

Get-Team | Find-Role | Format-TeamMember # No Filter
Name: Chris; Title: Manager
Name: Phillip; Title: Service Engineer
Name: Andy; Title: Service Engineer
Name: Neil; Title: Service Engineer
Name: Kevin; Title: Service Engineer
Name: Rick; Title: Software Engineer
Name: Mark; Title: Software Engineer
Name: Miguel; Title: Software Engineer
Name: Stewart; Title: Software Engineer
Name: Ophelia; Title: Software Engineer

Get-Team | Find-Role -ServiceEngineer | Format-TeamMember # Filtered
Name: Phillip; Title: Service Engineer
Name: Andy; Title: Service Engineer
Name: Neil; Title: Service Engineer
Name: Kevin; Title: Service Engineer
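
As an aside, the built-in Where-Object cmdlet can express a one-off filter like this just as cleanly; a dedicated function like Find-Role earns its keep as the filtering logic grows:

# The same filtered result using the built-in Where-Object (?) alias
Get-Team | ?{ $_.Value -eq "Service Engineer" } | Format-TeamMember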

Summary

Notice how clean the function composition is: Get-Team | Find-Role -ServiceEngineer | Format-TeamMember!

Pipable functions are a powerful language feature of Powershell <rimshots/>. Writing pipable functions allows you to compose logic in a way that is more expressive than simple imperative scripting. I hope this tutorial demonstrated to you how to modify existing Powershell functions to support pipes.

Powershell: How to Structure a Module

There doesn’t seem to be much guidance as to the internal structure of a Powershell module. There’s a lot of “you can do it this way or that way” guidance, but little “this has worked well for me and that hasn’t.” As a patterns and practices guy, I’m dissatisfied with this state of affairs. In this post I will describe the module structure I use and the reasons it works well for me.

I’ve captured the structure in a sample module for you to reference.

Powershell Module Structure

Posh.psd1

This is a Powershell module manifest. It contains the metadata about the Powershell module, including the name, version, unique ID, dependencies, etc.

It’s very important that the Module id is unique as re-using a GUID from one module to another will potentially create conflicts on an end-user’s machine.

I don’t normally use a lot of options in the manifest, but having the manifest in place at the beginning makes it easier to expand as you need new options. Here is my default psd1 implementation:

@{

# Version number of this module.
ModuleVersion = '1.0'

# Supported PSEditions
# CompatiblePSEditions = @()

# ID used to uniquely identify this module
GUID = '2a97124e-d73e-49ad-acd7-1ea5b3dba0ba'

# Author of this module
Author = 'chmckenz'

# Company or vendor of this module
CompanyName = 'ISG Inc'

# Copyright statement for this module
Copyright = '(c) 2018 chmckenz. All rights reserved.'

# Script module to load (named RootModule in Powershell 3.0 and later)
ModuleToProcess = "Posh.psm1"

}
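
If you prefer not to write the manifest by hand, New-ModuleManifest will generate one with a fresh GUID. A sketch using the values from the sample above:

# Generates Posh.psd1 with a new unique GUID
New-ModuleManifest -Path .\Posh.psd1 `
    -RootModule 'Posh.psm1' `
    -ModuleVersion '1.0' `
    -Author 'chmckenz' `
    -CompanyName 'ISG Inc'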

Posh.psm1

This is the module file that contains or loads your functions. While it is possible to write all your module functions in one file, I prefer to separate each function into its own file.

My psm1 file is fairly simple.

# $PSScriptRoot makes these paths relative to the module folder
# rather than to the caller's current directory at import time.
gci *.ps1 -Path $PSScriptRoot\export, $PSScriptRoot\private -Recurse | %{
    . $_.FullName
}

gci *.ps1 -Path $PSScriptRoot\export -Recurse | %{
    Export-ModuleMember $_.BaseName
}

The first gci block loads all of the functions in the Export and Private directories. The -Recurse argument allows me to group functions into subdirectories as appropriate in larger modules.

The second gci block exports only the functions in the Export directory. Notice the use of the -Recurse argument again.

With this structure, my psd1 & psm1 files do not have to change as I add new functions.

Export Functions

I keep functions I want the module to export in this directory. This makes them easy to identify and to export from the .psm1 file.

It is important to distinguish functions you wish to expose to clients from private functions for the same reason you wouldn’t make every class & function public in a nuget package. A Module is a library of functionality. If you expose its internals then clients will become dependent on those internals making it more difficult to modify your implementation.

You should think of public functions like you would an API. Its shape should be treated as immutable as much as possible.

Private Functions

I keep helper functions I do not wish to expose to module clients here. This makes it easy to exclude them from the calls to Export-ModuleMember in the .psm1 file.
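
After importing the module, Get-Command confirms that only the functions in the Export directory are visible to clients:

Import-Module .\Posh.psd1 -Force
Get-Command -Module Posh # lists only the exported functions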

Tests

The Tests directory contains all of my Pester tests. Until a few years ago I didn’t know you could write tests for Powershell. I discovered Pester and assigned a couple of my interns to figure out how to use it. They did and they taught me. Now I can practice TDD with Powershell–and so can you.
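
If you haven’t seen Pester before, here is a minimal sketch of a test file in this directory (Pester 4 syntax; Get-Greeting stands in for one of your exported functions):

# Tests\Get-Greeting.Tests.ps1
# Load the module under test from the repository root.
Import-Module "$PSScriptRoot\..\Posh.psd1" -Force

Describe "Get-Greeting" {
    It "greets the caller by name" {
        Get-Greeting -Name "Chris" | Should -Be "Hello, Chris!"
    }
}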

Other potential folders

When publishing my modules via PowershellGallery or Chocolatey I have found it necessary to add additional folders & scripts to support the packaging & deployment of the module. I will follow up with demos of how to do that in a later post.

Summary

I’ve put a lot of thought into how I structure my Powershell modules. These are my “best practices,” but in a world where Powershell best practices are rarely discussed your mileage may vary. Consider this post an attempt to start a conversation.

Don’t Unit Test NHibernate: Use Generic Repository

I was reading this stack overflow question: How can I solve this: Nhibernate Querying in an n-tier architecture?

The author is trying to abstract away NHibernate and is being counseled rather heavily not to do so. In the comments there are a couple of blog entries by Ayende on this topic:

The false myth of encapsulating data access in the DAL

Architecting in the pit of doom the evils of the repository abstraction layer

Ayende is pretty down on abstracting away NHibernate. The answers on StackOverflow push the questioner toward just standing up an in-memory Sqlite instance and executing the tests against that.

The Sqlite solution is pretty painful with complex databases. It requires that you set up an enormous amount of data that isn’t really germane to your test in order to satisfy FK and other constraints. The ceremony of creating this extra data clutters the test and obscures the intent. To test a query for employees who are managers, I’d have to create Departments and Job Titles and Salary Types etc., etc., etc.. Dis-like.

What problem am I trying to solve?

In the .NET space developers tend to want to use LINQ to access, filter, and project data. NHibernate (partially) supports LINQ via an extension method off of ISession. Because ISession.Query<T> is an extension method, it is not stubbable with free mocking tools such as RhinoMocks, Moq, or my favorite: NSubstitute. This is why people push you to use the Sqlite solution—because the piece of the underlying interface that you want to use most of the time is not built for stubbing.

I think that a fundamental problem with NHibernate is that it is trying to serve 2 masters. On the one hand it wants to be a faithful port of Hibernate. On the other, it wants to be a good citizen for .NET. Since .NET has LINQ and Java doesn’t, the support for LINQ is shoddy and doesn’t really fit in well with the rest of the API design. LINQ support is an “add-on” to the Java API, not a first-class citizen. I think this is why it was implemented as an extension method instead of as part of the ISession interface.

I firmly disagree with Ayende on Generic Repository, though I do agree with some of the criticisms he offers against specific implementations. His arguments are a bit of a straw man, however. It is possible to do Generic Repository well.

I prefer to keep my IRepository interface simple:

    public interface IRepository : IDisposable
    {
        IQueryable<T> Find<T>() where T: class;

        T Get<T>(object key) where T : class;

        void Save<T>(T value) where T: class;

        void Delete<T>(T value) where T: class;

        ITransaction BeginTransaction();

        IDbConnection GetUnderlyingConnection();
    }

 

Here are some of my guidelines when using a Generic Repository abstraction:

  • My purpose in using Generic Repository is not to “hide” the ORM, but
    • to ease testability.
    • to provide a common interface for accessing multiple types of databases (e.g., I have implemented IRepository against relational and non-relational databases).
  • Most of my storage operations follow the Retrieve-Modify-Persist pattern, so Find<T>, Get<T>, and Save<T> support almost everything I need.
  • I don’t expose my data models outside of self-contained operations, so Attach/Detach are not useful to me.
  • If I need any of the other advanced ORM features, I’ll use the ORM directly and write an appropriate integration test for that functionality.
    • I don’t use Attach/Detach, bulk operations, Flush, Futures, or any other advanced features of the ORM in my IRepository interface. I prefer an interface that is clean, simple, and useful in 95% of my scenarios.
  • I implemented Find<T> as an IQueryable<T>. This makes it easy to use the Specification pattern to perform arbitrary queries. I wrote a specification package that targets LINQ for this purpose.
    • In production code it is usually easy enough to append where-clauses to the exposed IQueryable<T>
    • For dynamic user-driven queries I will write a class that will convert a flat request contract into the where-clause needed by the operation.
  • I expose the underlying connection so that if someone needs to execute a sproc or raw sql there is a convenient way of doing that.
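
To make the testability point concrete, here is a minimal sketch of a unit test written against this interface with NSubstitute and NUnit. Employee and EmployeeService are hypothetical types, not part of the interface above:

    using System.Collections.Generic;
    using System.Linq;
    using NSubstitute;
    using NUnit.Framework;

    public class Employee
    {
        public string Name { get; set; }
        public string Title { get; set; }
    }

    public class EmployeeService
    {
        private readonly IRepository _repository;

        public EmployeeService(IRepository repository) { _repository = repository; }

        public IList<Employee> GetManagers()
        {
            // Find<T> returns IQueryable<T>, so ordinary LINQ composes on top of it.
            return _repository.Find<Employee>()
                .Where(e => e.Title == "Manager")
                .ToList();
        }
    }

    [TestFixture]
    public class EmployeeServiceTests
    {
        [Test]
        public void GetManagers_ReturnsOnlyManagers()
        {
            // Because Find<T> is an ordinary interface method, it is trivially stubbable.
            var repository = Substitute.For<IRepository>();
            repository.Find<Employee>().Returns(new[]
            {
                new Employee { Name = "Chris", Title = "Manager" },
                new Employee { Name = "Rick", Title = "Software Engineer" },
            }.AsQueryable());

            var managers = new EmployeeService(repository).GetManagers();

            Assert.That(managers.Single().Name, Is.EqualTo("Chris"));
        }
    }
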
Testing Ninject Bindings

I recently had a subtle production bug introduced by creating more than one Ninject binding for a given interface to the same instance.

I wanted to be able to see what bindings existed for a given interface, but Ninject does not provide an easy way to do that.

This gist contains an extension method I wrote (with the help of a StackOverflow article) to acquire this information.

As this code relies on using reflection to get a private member variable, this code is brittle in the face of a change in the implementation of KernelBase. In the meantime, it works on my machine.

Onion Architecture: An Opinionated Approach, Part 1

The Purpose of Architecture

I believe that the value of explicitly identifying and conforming to an architectural pattern in your applications is two-fold.

  1. It defines the number of boundaries between layers and how they communicate.
  2. Future maintainers can leverage what they’ve learned about one vertical slice to understand the rest of the application.

In most cases I don’t think developers bother to identify their application architecture, and consequently their application doesn’t have one. Or rather, it has many. Each new operation is put together in a slightly different way from all the others. Nothing learned about one area of the system can be used to understand other areas of the system. Nothing in the existing implementations directs you on how to add new features or modify existing ones.

What a mess!

Onion Architecture

Full disclosure: I don’t know many architectural patterns. However, I’ve spent the last 5 years of my career groping toward an architecture that was hazy in my mind. The first explicit description of this pattern that I encountered was in a lecture by Uncle Bob at SCNA 2011. Later I found some other resources.

In The Onion Architecture, Jeffrey Palermo describes a method of structuring an application such that the work of the application is separated from the infrastructure, and loose coupling provides flexibility.

There are two important rules about this architecture. I personally think of them as the Dependency Direction Principle and the Dependency Depth Principle.

Palermo writes:

The fundamental rule is that all code can depend on layers more central, but code cannot depend on layers further out from the core.  In other words, all coupling is toward the center.

This principle is about the direction of dependencies. A given application may have a greater or fewer number of layers in its “onion,” but the outer layers should depend on the layers underneath, and the inner layers should have no dependencies on the outer layers.

In Clean Architecture, Uncle Bob calls this The Dependency Rule:

The overriding rule that makes this architecture work is The Dependency Rule. This rule says that source code dependencies can only point inwards. Nothing in an inner circle can know anything at all about something in an outer circle. In particular, the name of something declared in an outer circle must not be mentioned by the code in an inner circle. That includes functions, classes, variables, or any other named software entity.

By the same token, data formats used in an outer circle should not be used by an inner circle, especially if those formats are generated by a framework in an outer circle. We don’t want anything in an outer circle to impact the inner circles.

Neither of the above-linked posts calls out what I think is another important rule about dependencies. Here, I’m speaking for myself.

The Dependency Depth Principle: Application layers must depend only on the contracts exposed by the layer immediately beneath them.

By controlling dependency depth, the seams between your application layers become well-defined and easily testable. More importantly, changes to the mechanics or api of one layer do not necessitate changes to the surrounding layers.

An Example

Let’s say I have a domain service that performs searching for musical performances. This service accepts a request like this:

    public class QueryPerformanceRequest
    {
        public int[] PerformanceIds { get; set; }
        public string[] Artists { get; set; }
        public string[] Venues { get; set; }
        public bool? Upcoming { get; set; }

        public int? Page { get; set; }
        public int? PageSize { get; set; }
    }

 

and returns this:

    public class QueryResponse<T>
    {
        public T[] Results { get; set; }

        public int Page { get; set; }

        public int PageSize { get; set; }

        public int TotalRecords { get; set; }
    }

QueryPerformanceRequest is a parameter object with which the client can indicate which data is being searched for by filling out properties. Some underlying layer is responsible for executing the query against the data store.

My web application has this controller action:

   1:          public ActionResult Search(string term)
   2:          {
   3:              using (var command = _performanceCommands.Query())
   4:              {
   5:                  var request = _builder
   6:                      .Create<QueryPerformanceRequest>()
   7:                      .From(term)
   8:                      ;
   9:                  var results = command.Execute(request);
  10:   
  11:                  var model = _builder
  12:                      .Create<PerformanceSearchResultsViewModel>()
  13:                      .From(results);
  14:   
  15:                  return View(model);
  16:              }       
  17:          }

What’s happening?

Line 3: Constructs the domain service used to perform the search.

Line 5: I like to use a fluent api for object conversion. I’ll speak more about this later. In this case, we’re just converting the incoming string into the QueryPerformanceRequest object. The converter manages the seam between the Application Services layer (MVC Controller) and the Domain Services layer (the command).

Line 9: This is the line that performs the query.

Line 11: Now that we have results from the query, let’s get them ready for presentation by converting them into a ViewModel.

Line 15: Returns the page with the required data.

Things to notice:

The domain services layer provides a contract in the form of a command to execute the query, a parameter object to describe the query, and the query results. It’s probable that somewhere inside the command database connections are being created, domain models are being searched, and the results are being transformed back into the response. The controller layer does not directly depend on any of that logic.

Further, the controller has its own API, independent from that of the domain services layer. Pass me a string and I’ll hand you a ViewModel. None of the details of the domain services layer leak into the View.

Each layer depends only on the layers underneath, and only on the layer directly underneath. No abstraction from an underlying layer crosses more than one seam.

If I change my controller api, the query contracts do not have to change. If I add behavior to my query contracts, the controller gets that behavior without modifications to the existing code. If I change the implementation of the query command, the controller does not have to change. Each layer is protected from changes in the other layers by layer-specific contracts and explicit management of the seams between layers.

About this series

I’m going to be writing a series of posts in which I build a simple application from the ground up. I want to express the Onion Architecture as well as my favorite implementation patterns. I think it will be a good exercise for me to explicitly identify the reasons for many of the things I do in my daily work.

I will host the application on github so you’ll be able to review the full source code as I work.

In my next post in this series, I’ll explain the demo project and start describing the innermost layer of the onion: The domain model.

Adherence to Architecture vs. YAGNI

I had an interesting discussion with my co-workers this morning about architecture.

Some background

We’re putting together a sample web application to demo the default architecture we expect from applications. The idea is that applications should either be written to use the default architecture, or they should be migrating in that direction. As we have new ideas about how things should be done, we can fork the demo application and try our new idea out. Then we can brownbag our new work to the team and share ideas. If the team likes it, the pull request will be accepted into the demo project.

Onion Architecture

I was asked to start the demo application work. I am a proponent of the Onion Architecture which clearly discriminates responsibilities among layers and enforces the rule that dependencies only point inward, and only at the layer inside the current layer. In other words, dependency depth for any class should be 0 or 1.

The Problem

In Draw Abstractions From Concretes I wrote:

Waiting until you have an actual need for the abstraction proves that the additional complexity you’re adding is actually necessary and that the abstraction you’re creating is the correct one.

I’m basically just invoking YAGNI with a special focus on how you know “when you need it.” The problem is that in the demo application I’m creating contracts and services for each layer of the application even though the application itself doesn’t have enough functionality (yet) to demonstrate the need for those layers.

Our team is split.

One side believes that YAGNI precludes us from creating an Application Services layer when the Domain Services are sufficient to solve the immediate problem. The specific issue is that Application Services have their own contracts for inputs and outputs that look much like the contracts used by the Domain Services layer. Why not just expose the Domain Services directly? This just creates more complexity than is necessary to solve the problem and makes the code harder to understand.

The other side (me included) believes that choosing an architecture and sticking to it overrides YAGNI. It’s better to have a defined architecture applied consistently, even in cases where it’s not strictly necessary from a purely functional perspective. The purpose of having an Application Services layer apart from the Domain Services layer is that it allows them to change independently over time. It’s immaterial that in the beginning much of the behavior is basically pass-through. Anyone who is going to work on the code base will have to learn what Onion Architecture is (or whatever architectural pattern is chosen for the application), and then the additional complexity will make sense.

What are your thoughts?

Draw Abstractions from Concretes

It’s often tempting to draw a more abstract solution to the problem you’re currently facing. It seems wrong to handle only the local concrete case when you know you’re going to be doing something very similar soon. Why not build a more abstract solution now and reuse it later?

There’s a principle in software called YAGNI: “You Aren’t Going to Need It.” The idea behind YAGNI is that you shouldn’t introduce complexity into code until you’ve proven you need it. Waiting until you have an actual need for the abstraction proves that the additional complexity you’re adding is actually necessary and that the abstraction you’re creating is the correct one.

That last point bears repeating.

Wait until you have at least 2 or even 3 similar code paths before creating the abstract solution. Otherwise, you may mistake which code needs to be abstracted and end up with an abstraction that is ill-suited to the actual problems you’re trying to solve.

Clean Architecture vs. Startups

Disclaimer: I do not work for a startup. I have never worked for a startup. I am not interested in working for a startup.

Uncle Bob recently wrote an interesting post called “The Startup Trap” which prompted Greg Young at codebetter.com to respond with “Startups and TDD”.

The heart of their disagreement can be captured in two quick quotes:

As time passes your estimates will grow. You’ll find it harder and harder to add new features. You will find more and more bugs accumulating. You’ll start to parse the bugs into critical and acceptable (as if any bug is acceptable!) You’ll create modules that are so fragile you won’t trust yourself, or anyone else, to modify them; so you’ll work around them. You’ll build a festering pile of code that, with every passing week, requires more and more effort just to keep running. Forward progress will slow and falter. It may even reverse as each release becomes buggier and buggier, and less and less stable. Catastrophes will become more and more common as errors, that should never have happened, create corruptions and damage that take huge traunches of time to repair.

–Uncle Bob

What really mattered was that after our nine months of beautiful architecture and coding work we were making approximately 10k/month more than what our stupid production prototype made for all of its shortcomings.

We would have been better off making 30 new production prototypes of different strategies and “throwing shit at the wall” to see what worked than spending any time beyond a bit of stabilization of the first. How many new business opportunities would we have found?

— Greg Young (emphasis in original)

I disagree with the advice that Mr. Young seems to be giving. My initial comment on his post was:

I agree that you shouldn’t have spent a bunch of time building a new application alongside your prototype. You did the right thing in shoring it up and fixing the worst pain points. I personally do not believe in building a green-field app when you already have a working brown-field one.

I’m curious, is your prototype app still in use? Did it survive?

I can understand why Mr. Young’s attitude may be tempting for some developers to embrace, but how would we feel if we heard a comment like this from a used-car salesman? Would you want to do business with a salesman that would sell you a car that was held together with duct-tape and baling wire and then spend his time looking for other business opportunities while you’re stuck using his pile of shit?

Let me ask the question another way. Is working for a startup an excuse to churn out crap software and move on to the next big idea before the company that just paid you starts to notice that festering pile of rot you just created for them?

I’m not personally accusing Mr. Young of having this attitude, but it does seem to capture the attitude I’ve heard expressed by some developers in the startup world.

Update: Uncle Bob’s follow-up.

Refactoring: Replace DataModel with ViewModel

I have an application in which the UI is strongly bound to the data models. I tried to make some modifications to the NHibernate mappings to better model the database and enable better querying. Since the UI was directly bound to the data models, it had to follow navigation properties from one data model to another to get all the information it needed. Once the UI was no longer connected to the database, the app started generating data access exceptions all over the place. This is why binding data models to the UI is not a good idea, and why data models should not properly be considered to be “domain” objects.

Consider the following simple WindowsForms application:

    public partial class frmOwnerVisits : Form
    {
        public frmOwnerVisits()
        {
            InitializeComponent();

            this.Repository = new NHibernateRepository();
        }

        private NHibernateRepository Repository { get; set; }

        private void btnLoadOwners_Click(object sender, EventArgs e)
        {
            // data is bound directly to the UI
            var owners = this.Repository.All<Owner>();
            this.cmbOwners.DataSource = owners;
        }

        private void btnSavePets_Click(object sender, EventArgs e)
        {
            var owner = this.cmbOwners.SelectedItem as Owner;
            foreach (var pet in owner.Pets)
            {
                this.Repository.Save(pet);
            }
        }

        private void cmbOwners_SelectedIndexChanged(object sender, EventArgs e)
        {
            var owner = this.cmbOwners.SelectedItem as Owner;

            // Problem area: we're disconnected from the database at this point
            // and we may not have pulled back all the visits for each pet.
            var numberOfVisits = owner.Pets.SelectMany(p => p.Visits).Count();
            this.lblNumberOfVisits.Text = numberOfVisits.ToString("D");
        }
    }

The problem area for us at this point is in cmbOwners_SelectedIndexChanged. We want “Pet.Visits” on the data model because it enables easier querying, but having the properties there implies that they are navigable. We won’t know that we have a problem accessing the information until run-time, which results in exceptions that are harder to find and diagnose.

    private IEnumerable<Owner> GetOwnersWhoHaveNotBeenHereInAYear()
    {
        // Without fully expressed relationships, queries such as this would be much harder.
        var results = this.Repository.All<Owner>()
            .Where(row => row.Pets.Any(pet => pet.Visits
                 .All(visit => visit.Date < DateTime.Now.AddYears(-1))))
            .ToList()
            ;

        return results;
    }

This is a simplified example. An ideal solution would be to create a complete ViewModel that represents the entire form. Let’s pretend that the example is much more complex and involves multiple cooperating user controls and tons of nasty event-driven mess. Let’s further pretend that we’ve decided that replacing Owner with OwnerViewModel would be a manageable chunk of work. How would we be sure we got everything that the UI depends on?

Steps

  1. Extract all methods that retrieve or persist data into an external helper class.
  2. Write Tests against those methods if you haven’t already.
  3. Create Empty ViewModels for each input and output to your Data Operations
  4. Create and test converter classes to translate between DataModels and ViewModels
  5. Replace inputs and outputs of your external helper class with ViewModels.
  6. If you’re working in a statically typed language, use the compiler errors to help you identify which properties and relationships are actually being used by the View.
  7. Fill in the properties and relationships used by the View on your ViewModels.
  8. Regression test the UI to try to find side-effects not covered by existing tests.
  9. Review and Refactor

 

1. Extract methods that retrieve or persist data into an external helper class.

If possible, write integration tests against the UI before making any changes at all.

The data access operations associated with Owner are “Get Owners”, “Save Pets”, and “Determine the number of visits for the owner.”

I extracted those methods into an OwnerController class that encapsulates the data operations.

    public class OwnerController
    {
        public IEnumerable<Owner> GetOwners()
        {
            using (var repository = new NHibernateRepository())
            {
                var owners = repository.All<Owner>().ToList();
                return owners;
            }
        }

        public int GetNumberOfVisits(Owner owner)
        {
            using (var repository = new NHibernateRepository())
            {
                var numberOfVisits = repository
                    .All<Visit>()
                    .Count(visit => visit.Pet.Owner.OwnerId == owner.OwnerId)
                    ;

                return numberOfVisits;
            }
        }

        public void SavePets(IEnumerable<Pet> pets)
        {
            using (var repository = new NHibernateRepository())
            {
                foreach (var pet in pets)
                {
                    repository.SaveOrUpdate(pet);
                }
            }
        }
    }

Then I replaced the direct data access calls in my Form with calls into the OwnerController.

    public partial class frmOwnerVisits : Form
    {
        public frmOwnerVisits()
        {
            InitializeComponent();

            this.Controller = new OwnerController();
        }

        protected OwnerController Controller { get; set; }

        private void btnLoadOwners_Click(object sender, EventArgs e)
        {
            // data is bound directly to the UI
            var owners = this.Controller.GetOwners();
            this.cmbOwners.DataSource = owners;
        }

        private void btnSavePets_Click(object sender, EventArgs e)
        {
            var owner = this.cmbOwners.SelectedItem as Owner;
            this.Controller.SavePets(owner.Pets);
        }

        private void cmbOwners_SelectedIndexChanged(object sender, EventArgs e)
        {
            var owner = this.cmbOwners.SelectedItem as Owner;

            // Problem area: we're disconnected from the database at this point
            // and we may not have pulled back all the visits for each pet.
            var numberOfVisits = this.Controller.GetNumberOfVisits(owner);
            this.lblNumberOfVisits.Text = numberOfVisits.ToString("D");
        }
    }

 

2. Write tests against the OwnerController if you haven’t already.

I suggest you always write your tests before creating the new class. As written above, the OwnerController doesn’t really allow for unit testing since NHibernate requires a connection to a database. NHibernate supports Sqlite for in-memory database testing, and Entity Framework supports Sql Server CE. At some point I’d like to be able to stub an in-memory repository into the OwnerController so that I have no external dependencies for my tests, but that may be out of scope for the current operation.
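
As a sketch of where that could go (hypothetical, and deliberately out of scope for now), extracting an IRepository interface from NHibernateRepository and injecting a factory would let tests supply an in-memory fake:

    // Hypothetical: IRepository mirrors the All<T>/SaveOrUpdate surface
    // of NHibernateRepository used in this post.
    public class OwnerController
    {
        private readonly Func<IRepository> _repositoryFactory;

        // Production code passes () => new NHibernateRepository();
        // tests pass a factory that returns an in-memory fake.
        public OwnerController(Func<IRepository> repositoryFactory)
        {
            _repositoryFactory = repositoryFactory;
        }

        public IEnumerable<Owner> GetOwners()
        {
            using (var repository = _repositoryFactory())
            {
                return repository.All<Owner>().ToList();
            }
        }

        // GetNumberOfVisits and SavePets would change the same way.
    }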

3. Create Empty ViewModels for each input and output to your Data Operations

At this point, there are only 3 data models used by the OwnerController: Owner, Pet, and Visit. I have an idea about Visit so I’m not going to model it yet. I will create empty ViewModels for Owner and Pet though.

    public class OwnerViewModel
    {

    }

    public class PetViewModel
    {

    }

4. Create and test converter classes to translate between DataModels and ViewModels

The OwnerController is going to be altered to return ViewModels instead of DataModels. However, the details of converting DataModels to ViewModels and back should really be dealt with separately. Whether you use a tool like AutoMapper or some custom mapper or converter interface for your transforms, you’ll need tests. These tests will be anemic at first since we don’t have any properties on our ViewModels yet. We’ll fill them in more as we go.
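
As a sketch (hand-rolled here; AutoMapper would work just as well), a first converter and its anemic test might look like this. OwnerConverter is a hypothetical name, and NUnit is assumed for the test:

    // A hypothetical hand-rolled converter; AutoMapper could replace it later.
    public class OwnerConverter
    {
        public OwnerViewModel ToViewModel(Owner owner)
        {
            return new OwnerViewModel
            {
                // Property mappings will be filled in as the ViewModel grows.
            };
        }
    }

    [TestFixture]
    public class OwnerConverterTests
    {
        [Test]
        public void ToViewModel_ReturnsAViewModel()
        {
            var viewModel = new OwnerConverter().ToViewModel(new Owner());
            Assert.That(viewModel, Is.Not.Null);
        }
    }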

5. Replace inputs and outputs of your external helper class with ViewModels.

Depending on DataModels makes the UI brittle. This coupling needs to be completely broken. Nothing but parameter objects or view models should be exposed outside of your helper class. Data models should not leak into the UI for any reason.

In our case, the OwnerController will use the converters to create and expose ViewModels through its interface. The OwnerController now looks like this:

   1:      public class OwnerController
   2:      {
   3:          private readonly IBuilder _builder;
   4:  
   5:          public OwnerController(IBuilder builder)
   6:          {
   7:              _builder = builder;
   8:          }
   9:  
  10:          public IEnumerable<OwnerViewModel> GetOwners()
  11:          {
  12:              using (var repository = new NHibernateRepository())
  13:              {
  14:                  var owners = repository.All<Owner>().ToList();
  15:                  var viewModels = _builder
  16:                      .CreateEnumerable<OwnerViewModel>()
  17:                      .FromEnumerable(owners)
  18:                      ;
  19:                  return viewModels;
  20:              }
  21:          }
  22:  
  23:          public int GetNumberOfVisits(OwnerViewModel owner)
  24:          {
  25:              using (var repository = new NHibernateRepository())
  26:              {
  27:                  var numberOfVisits = repository
  28:                      .All<Visit>()
  29:                      .Count(visit => visit.Pet.Owner.OwnerId == owner.OwnerId)
  30:                      ;
  31:  
  32:                  return numberOfVisits;
  33:              }
  34:          }
  35:  
  36:          public void SavePets(IEnumerable<PetViewModel> petViewModels)
  37:          {
  38:              using (var repository = new NHibernateRepository())
  39:              {
  40:                  var petIds = petViewModels.Select(vm => vm.PetId).ToList();
  41:  
  42:                  var pets = repository.All<Pet>()
  43:                      .Where(pet => petIds.Contains(pet.PetId))
  44:                      .ToDictionary(pet => pet.PetId.Value)
  45:                      ;
  46:  
  47:                  var existingPets = petViewModels.Where(vm => vm.PetId.HasValue);
  48:                  var newPets = petViewModels.Where(vm => !vm.PetId.HasValue);
  49:  
  50:                  foreach(var petViewModel in newPets)
  51:                  {
  52:                      InsertPet(petViewModel, repository);
  53:                  }
  54:  
  55:                  foreach (var petViewModel in existingPets)
  56:                  {
  57:                      var pet = pets[petViewModel.PetId.Value];
  58:                      UpdatePet(pet, petViewModel, repository);
  59:                  }
  60:              }
  61:          }
  62:  
  63:          private static void UpdatePet(Pet pet, PetViewModel petViewModel, NHibernateRepository repository)
  64:          {
  65:              pet.Name = petViewModel.Name;
  66:              pet.Type = petViewModel.Type;
  67:              pet.Breed = petViewModel.Breed;
  68:              pet.BirthDate = petViewModel.BirthDate;
  69:  
  70:              repository.SaveOrUpdate(pet);
  71:          }
  72:  
  73:          private void InsertPet(PetViewModel petViewModel, NHibernateRepository repository)
  74:          {
  75:              var pet = _builder.Create<Pet>().From(petViewModel);
  76:              repository.SaveOrUpdate(pet);
  77:          }
  78:      }

Note that the implementation of line 23 drove us to add OwnerId to the OwnerViewModel.

The implementation of UpdatePet drove us to add Name, Type, Breed, and BirthDate to PetViewModel.

6. If you’re working in a statically typed language, use the compiler errors to help you identify which properties and relationships are actually being used by the View.

At this point, the OwnerController is starting to look okay. The frmOwnerVisits form doesn’t compile anymore, mainly because it’s still using Owner, Pets, and Visits to display and edit its data. If we modify the form so that it interacts with OwnerViewModel and PetViewModel instead of Owner and Pet, we’ll get even more compiler errors. OwnerViewModel needs a Pets collection of type PetViewModel.

The compiler may not find everything, so you’ll still have to do a bit of manual regression testing. When you do find problems, be sure to document them in unit tests. This is especially important in dynamic languages, where you don’t get compiler hints.

7. Fill in the properties and relationships used by the View on your ViewModels.

The OwnerViewModel and PetViewModel now look like this:

    public class OwnerViewModel
    {
        public int? OwnerId { get; set; }

        // snipped for brevity

        public IList<PetViewModel> Pets { get; set; }
    }

    public class PetViewModel
    {
        public int? PetId { get; set; }

        public string Name { get; set; }

        public string Type { get; set; }

        public string Breed { get; set; }

        public DateTime? BirthDate { get; set; }
    }

 

8. Regression test the UI to try to find side-effects not covered by existing tests.

In both web and desktop applications, data binding is often done via magic strings. This can result in run-time errors that are not visible at compile time and are hard to test in an automated fashion. In manually regressing the UI behavior you should be able to detect run-time binding errors and add any properties necessary to make the UI work correctly again.

9. Review and Refactor

One thing I noticed while refactoring this code is that numberOfVisits is being calculated against the database every time the selected owner is changed in the combo box. This is inefficient. I’d like to add NumberOfVisits to the PetViewModel and write a function on the OwnerViewModel to aggregate visits per pet for display purposes. This will allow us to calculate the value in-memory and remove a function from the OwnerController. The calculation to determine the number of visits can be done as part of the conversion from DataModel to ViewModel.

Now the ViewModels look like this:

    public class OwnerViewModel
    {
        public int? OwnerId { get; set; }

        // snipped for brevity

        public IList<PetViewModel> Pets { get; set; }

        public int GetNumberOfVisits()
        {
            if (Pets == null)
                return 0;

            var result = Pets.Sum(pet => pet.NumberOfVisits);
            return result;
        }
    }

    public class PetViewModel
    {
        public int? PetId { get; set; }

        public string Name { get; set; }

        public string Type { get; set; }

        public string Breed { get; set; }

        public DateTime? BirthDate { get; set; }

        public int NumberOfVisits { get; set; }
    }

The SelectedIndexChanged event is modified as follows:

    private void cmbOwners_SelectedIndexChanged(object sender, EventArgs e)
    {
        var owner = this.cmbOwners.SelectedItem as OwnerViewModel;
        var numberOfVisits = owner.GetNumberOfVisits();
        this.lblNumberOfVisits.Text = numberOfVisits.ToString("D");
    }

We now have a UI that is bound to ViewModels instead of DataModels. This decouples the UI from the database which prevents errors from occurring when attempting to access properties from related tables that haven’t been loaded. The UI can now change independently of the database. We can change the shape of the ViewModels without changing the shape of the underlying data, and vice versa. The steps to accomplish this decoupling were not particularly hard and have dramatically improved the reliability and flexibility of the code. We are one step closer to sitting our Form on top of a master ViewModel specifically designed for this UI.

Recursive Mocks with Rhino Mocks and NSubstitute

I just learned that you could do this:
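
A minimal sketch of the idea with NSubstitute (the interfaces are hypothetical): because NSubstitute automatically substitutes members that return interfaces, an entire chain of calls can be stubbed in a single statement.

    using NSubstitute;

    // Hypothetical interfaces to demonstrate chained (recursive) stubbing.
    public interface IContext { IRequest Request { get; } }
    public interface IRequest { IUrl Url { get; } }
    public interface IUrl { string Host { get; } }

    public class RecursiveMockDemo
    {
        public static void Demo()
        {
            var context = Substitute.For<IContext>();

            // One statement stubs the whole chain; the intermediate Request
            // and Url substitutes are created automatically.
            context.Request.Url.Host.Returns("example.com");

            System.Console.WriteLine(context.Request.Url.Host); // example.com
        }
    }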

Update: 2013-01-25
Note that successive chained mocking calls to RhinoMocks fail. I now have a reason to prefer NSubstitute beyond its beautifully simple API.
