Design Considerations of Public Functions

Writing a function is an act of design. You must choose the placement (both physical and logical), signature, and name of the function carefully. Most people vaguely understand that functions should have minimal scope, but I'd like to draw attention to a couple of design implications of public functions that I haven't heard discussed much. These features are "permission" and "precedent." Simply by existing, a function indicates that it is okay to call it, and it establishes a precedent for the choices that led you to create it with the characteristics you chose.

Permission

When would you not want to expose a function publicly? Put another way, why might you want to deny callers permission to use a function?

Here is a typical example: Suppose you have a WidgetListEditor class that manages a list of widgets bound to a list editing control on a screen. Suppose that part of the class's task is to sort the list of widgets prior to binding to the grid. The sorting of the list is ancillary to the primary task of managing the binding to the grid. Clients of the code should not instantiate an instance of WidgetListEditor to sort the contents of some arbitrary list of widgets elsewhere in the system. For this reason, the sorting methods of WidgetListEditor are internal concerns, and should not be exposed publicly. If the sorting methods are needed externally to WidgetListEditor, they should be extracted into first-class components of their own and provided to WidgetListEditor via some form of Dependency Injection.
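To see what that extraction might look like, here is a minimal sketch; IWidgetSorter, BindToGrid, and the Widget stub are hypothetical names of my own:

using System.Collections.Generic;

public class Widget { /* ... */ }

public interface IWidgetSorter
{
    IList<Widget> Sort(IList<Widget> widgets);
}

public class WidgetListEditor
{
    private readonly IWidgetSorter _sorter;

    // Constructor injection: the sorting strategy is now a first-class
    // component that other code can reuse without ever touching this class.
    public WidgetListEditor(IWidgetSorter sorter)
    {
        _sorter = sorter;
    }

    public void BindToGrid(IList<Widget> widgets)
    {
        var sorted = _sorter.Sort(widgets);
        // ...bind 'sorted' to the list editing control...
    }
}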

Why would it be wrong to simply allow the clients of the code to instantiate WidgetListEditor at will and use the sorting method? It exposes the rest of the code to changes in WidgetListEditor. WidgetListEditor could be modified such that it loads an expensive resource when constructed. The maintainers of WidgetListEditor have no reason to believe that adding the expensive resource should cause any problems because as far as they know, this class is only instantiated once on a couple of screens. Suddenly the system is brought to a crawl because it is in fact being used all over the place as a means of “reusing” the sorting code. This is a problem of responsibility.

Or:

WidgetListEditor needs to be changed to allow arbitrary sorting by the user. This requires that the sorting methods be changed to accept sorting criteria–but that change is made more difficult because the methods are being consumed application-wide instead of on the one or two screens the class was built for.

Or:

WidgetListEditor no longer needs to presort the contents of the list before binding, so the developer wishes to remove the sorting code–only s/he can’t because the methods are being consumed application-wide instead of on the one or two screens the class was built for.

There's a recurring theme here: What was the class built for? What is its exposure to change? A public method is a kind of contract with the rest of the code. Changes to public methods must be considered with greater care because of their potential effects on the rest of the code-base. When we make a method private, we reduce the potential for damage to the system if the method needs to change, or be removed altogether. This is the design advantage of encapsulation. For this reason, every function in the system (and every class in a library) should be given the most restrictive scope it is possible to give it.

There is another kind of permission problem related to public methods that is harder to pinpoint. Sometimes a method is created that performs a narrow-purpose task needed within the system, but that has the potential to corrupt the system. Posit a financial system in which every transaction is recorded as a series of debit-credit pairs. For the most part, the system does not allow direct editing of entries. Instead, it functions through a series of transaction templates. You submit a transaction, and the template tells the system which accounts to debit, which to credit, how much, etc. It is discovered that a rounding error creates a small reporting variance, and your manager assigns you the task of creating a manual correction screen.

Since this is a transactional system, the right way to solve this problem is to create an AdjustmentTransactionTemplate and enter it into the system, but that would take a week, and you only have a day. So, you take a short-cut. You create a new method in the data access layer that lets you simply update the entry directly. You reason that since this is the only code that calls this method, it’s safe because you will be sure to update both the debit and credit together, and with the same amount. You do this, turn in the work, and everybody is happy.

What's the problem here? There is now a method in the data access layer that allows you to manually update a specific entry without reference to a transaction template. Its existence implies that it's okay to use it. The methods that accept the transaction template can validate the template to verify that the transaction will balance on debits and credits. This manual entry-edit method can do no such validation. This means that it is now possible to get bad data into the system, which could have catastrophic consequences. At this point, it is a matter of "when," not "if," that will happen. If you or your manager considers "it's faster" to be a general justification for violating the design of a system, then your project will become less maintainable as time goes by. You may need to update your design to make it faster to develop against—but that's a different problem than the one I'm describing here.
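To make the danger concrete, here is a sketch of the two paths; TransactionTemplate, PostTransaction, and UpdateEntryAmount are hypothetical names, and the point is only which path can enforce the invariant:

using System;

public class TransactionTemplate
{
    public decimal TotalDebits { get; set; }
    public decimal TotalCredits { get; set; }
    // ...the debit/credit pairs the template generates...
}

// The template-based path knows every debit and credit it will produce,
// so it can verify the balancing invariant before anything is written.
public void PostTransaction(TransactionTemplate template)
{
    if (template.TotalDebits != template.TotalCredits)
        throw new InvalidOperationException("Transaction does not balance.");
    // ...apply the debits and credits...
}

// The shortcut: nothing here can verify that a matching credit exists,
// so the balancing invariant now depends on the discipline of every caller.
public void UpdateEntryAmount(int entryId, decimal newAmount)
{
    // ...write newAmount directly to the entry...
}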

Precedent

Consider either of the above scenarios. What danger does the existence of either of the public functions pose to the rest of the code base? Isn’t the danger isolated to the particular components involved? What danger does the precedent pose to a code base?

In the legal system, the decision of a higher court on a relevant point is binding as legal precedent for a lower court. Conversely, the decision of a lower court on some legal matter is used as guidance by a higher court faced with the same question. The reason for this is not that lawyers are crazy, but that the concrete circumstances possible in a legal case are too numerous for any one person or team to keep track of. When a new set of circumstances is considered by a court, it must make the best decision it can within the context of existing law and existing legal precedent. In most cases this keeps the range of legal issues organized in an intelligible manner. However, sometimes the courts make a decision that violates our sense of justice or propriety. In these cases, we can "refactor" by encouraging the legislature to explicitly change the law.

The key point is this: existing code is the best guideline for how to add new features to a code-base. You cannot expect future developers to look at your code-base and infer the principles you built the application on if you are violating those principles yourself. Further, if you are willing to violate the design principles in one case, you will likely find it much easier to violate them again and again. This means that even though you "think" your application was developed according to a particular design, "with a couple of exceptions," it will not be apparent to anyone else. Objectively, then, the code was not designed according to any principles at all, but according to your whims. To put the point another way, intent is communicated not by what's in our heads, but by what's in our code.

If you use WidgetListEditor for sorting some place else in your code, other developers will think that that’s what you intended. If you bypass the transaction template logic in the financial system, other developers will think it’s safe to do that as well. The question of whether code is designed well is more than just a question of whether it works.  It’s a question of whether the intent of the design is consistently expressed by the code that implements it.

Conventions vs. Design

Conventions often function as a substitute for a grasp of design principles. For example, I've seen teams get up in arms about using Hungarian Notation but then write meaningless function names like "DoIt()" or "Process()", or meaningless variable names like "intVar" or "intX". These same teams often have little or no understanding of design principles, patterns, or concepts. I have met so many developers who have never heard of YAGNI, LRM, SRP, LSP, or Dependency Injection. How many developers do you know who have heard of patterns such as Singleton, Adapter, Decorator, Factory, Monostate, Proxy, Composite, or Command?

Conventions are often invoked as a means of aiding code clarity. And they do help, but they are not the only or even the best means of attaining that goal. We need to learn to name classes, functions, and variables in meaningful ways; to write small functions that do one tiny little thing, and do it well; and to write small, single-purpose classes. All of the conventions in the world will not help you maintain a code-base if developers are free to write 3000-line methods with 27 layers of nested conditionals and loops (and yes, I have seen this). Read "Clean Code" by Uncle Bob.

Design patterns, techniques, and processes are much harder to learn and master than a concrete list of do’s and don’ts (which is what most conventions amount to). Good software design is like playing a musical instrument–it requires dedication, repetition, and practice to learn and master.

Conventions do have a place. However, conventions are usually suggested by the provider of the tools you are using. For example, Microsoft has a conventions document for the .NET Framework which is applicable to all .NET languages. (Thankfully, they eschew Hungarian Notation.) It is worth learning simply in order to be a good developer citizen. My recommendation is to simply use the conventions established by the tool provider, or the dominant conventions within the community around the tool. Don't waste company time and money adding your own tweaks and changes to the conventions document. Speaking personally, I would much rather maintain a code-base that is written cleanly and with good design but violates every convention you could name than one that follows every convention but was never designed at all.

Conventions are not a bad thing. The problem with conventions is that they are so often discussed in place of design principles. Conventions without design principles aid nothing. Without proper focus on good design, conventions can even hurt software quality because they give developers and managers the illusion of being thoughtful and disciplined about what they are doing.

Conventions can be useful in another way. There is a good bit of discussion going on right now about Convention over Configuration. CoC is a design short-cut: it consists of making decisions about how the components of a software system interact by default, and only adding explicit configuration when components deviate from the norm. CoC actually bolsters my point here, because it only tends to arise as a discussion point in systems that are already using good design patterns and practices.
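To illustrate, here is a hypothetical sketch; 'container' and its Register method stand in for whatever DI container is in use, and IAuditLog/RemoteAuditLog are invented for the example:

using System.Linq;
using System.Reflection;

// Convention: a class named FooService is registered against IFooService.
var assembly = Assembly.GetExecutingAssembly();
foreach (var type in assembly.GetTypes().Where(t => t.IsClass && !t.IsAbstract))
{
    var conventionalInterface = type.GetInterface("I" + type.Name);
    if (conventionalInterface != null)
        container.Register(conventionalInterface, type);
}

// Only the component that deviates from the norm needs explicit configuration:
container.Register(typeof(IAuditLog), typeof(RemoteAuditLog));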

The Presentation

So I spoke at GSP Developers tonight on TDD. It went well, though I was a bit nervous. I got through the example I chose with less code and fewer tests than I had with any previous attempt, though I still took about the same amount of time. I enjoyed the questions from the audience, and the event organizer made some great observations as well. I think I might enjoy doing something more long-form, such as a mini code-camp for teaching TDD.

Software Internships Won’t Help

Editor's Note: I have since changed my mind on this post, as I have developed and run a successful internship program for many years now. However, I will preserve this post as-is for posterity.


Read this post from Uncle Bob, and be sure to watch the video—that’s the scary part.

https://sites.google.com/site/unclebobconsultingllc/blogs-by-robert-martin/developer-certification-wtf

I take issue with Uncle Bob’s idea of internship. Having Senior Developers mentor interns can create a culture of inbred, non-innovative practices. The temptation is to think that you can teach smart development practices. All you can teach is a litany of concretes. The junior developer has to turn their brain on to grasp the principles involved. In short, I don’t think Uncle Bob’s solution is that new, or that it would solve any of the problems he describes.

I think the problem is deeper than just a lack of skill or training. What software development needs is a revolution in values. We as developers need to value high-quality, unit-tested, organized, clean code. We need to value the drive to excellence, not just the willingness but the desire to learn, and we need to make room for innovation. Software development managers need to refuse to tolerate anything less. We need to stop sacrificing design and testing on the altar of the arbitrary deadline. Fortunately, we don't need to start that revolution, as it is already under way—Uncle Bob's post being a recent shot fired in that war.

Update:

The URL of Uncle Bob's original post moved. I have updated the link.

A Good Project Manager

I was interested to read Roy Osherove's recent account of his worst team leaders. Jason Crawford writes about what he thinks makes good team managers. I don't think they are talking about the same role.

Roy is talking about a technical lead on a team of developers, and his basic complaint is the technical lead's perceived lack of technical ability, or lack of interest in it. His criticisms fall into basically two categories: training and judgment. He wants a technical lead who helps make him a better developer by judging his work. A lead that refuses to judge is no lead at all.

Jason Crawford draws portraits of three kinds of managers, but the best, he says, focus on communicating values. He is not suggesting that a good project manager should moralize to his employees, but rather that the PM should have a clear idea of the values of the company and ensure that the work his reports do conforms to those values. If we apply this to Roy's situation, the technical lead should value technical ability to the point that he would be willing to point out mistakes and help the developer become better at his job.

I recently had occasion to write a recommendation for a former Project Manager. With permission, I'll reprint the entire recommendation:

“Maggie Roberts was a great project manager. She was great at communicating with both technical and non-technical personnel. She knew enough about the technical work to describe the expected results as well as the goal she hoped to accomplish with the results, and then she got out of the way and let me deliver the results to her. When I had a better idea of how to get the desired results, she allowed me to pursue it.

My favorite thing about working with Maggie was her directness and the clarity of her expectations. She was never shy about indicating what was and was not good about the work I turned in. Her criticism was never cruel or directed toward me as a person, but targeted the work I turned in and its relationship to our client. She was not shy with her praise either. She had the same directness with pointing out great things I did as she did with errors. She always related both praise and criticism directly to how my performance affected the client. By making sure I had a clear understanding of our goals, and by being so clear about judging my work, she encouraged me to look for more creative ways of meeting our goals. She made me feel like both a technician and a partner in our quest to save our client money. Working with Maggie was a challenge because of the high standards she set, and a pleasure because the standards were clear, and she made sure I had the tools I needed to meet them. I can honestly say that I grew as a technician under her leadership.”

Maggie was not a technical lead, but a PM. In that role, she communicated values (save the client money, show each step of the work) very clearly, and she demanded quality. I had never worked with SQL Server before working with Maggie, but in six months I got two years of experience. When I created an automated Excel spreadsheet to retrieve data and perform the formatting we were doing by hand, she was very free with her praise.

What Happens to Software Under Arbitrary Deadlines?

There are four basic engineering aspects to developing a software system: Requirements, Test, Design, and Implementation. When a project is squeezed for time, it is the Design and Test aspects that get squeezed to make room for Implementation. Design activities are in effect frozen, which means that the design of the system will stay static even as new features are added. Testing simply stops, with all that that implies.

As implementation continues, new requirements are uncovered or areas of existing requirements are discovered to need modification or clarification. All of these factors imply necessary changes to the design of the system, but since design is frozen, the necessary changes don’t happen. Since testing isn’t happening, the full ramifications of this fact are not immediately felt.

The only remaining implementation choice is to squeeze square pegs of new features into the round holes of the existing design. This option requires more time to implement because time is spent creating overhead to deal with the dissonance between the design and the requirements. The resulting code is harder to understand because of the extra overhead. This option is more prone to error because more code exists than should be necessary to accomplish the task. Every additional line of code is a new opportunity for a bug.

All of these factors result in code that is harder to maintain over time. New features are difficult to implement because they will require the same kind of glue-code to marry a poor design to the new requirements. Further, each new feature deepens the dependency of the system on the poor design, making it harder to correct, and making it ever-easier to continue throwing bad code after bad. When considered in isolation, it will always seem cheaper to just add crap-code to get the new feature in rather than correct the design, but cumulatively, it costs much more. Eventually it will kill the project.

As bad as all this is, the problems don't stop there. As long as the ill-designed code continues to exist in the system, it serves to undermine all existing and future features in two ways: 1) it becomes a pain point around which ever more glue-code will have to be written as the interaction of the ill-designed feature with the rest of the system changes, and 2) it acts as a precedent in the code-base, demonstrating that low-quality code is acceptable so long as the developer can find a reason to rationalize it.

Software Design: Cognition and Design Principles

Software design is a subject fraught with disagreement among developers. The size and scope and importance of the subject demand serious attention from any developer considering himself more than just a hacker. What follows is my current understanding of software design as a subject.

Before we can properly deal with the subject of what good design is, we must first ask what the purpose of design is. Is it just to make the software work? Or is there something else? While the software must perform its basic function, I regard "working" as a second-order consideration. Functioning software is a necessary goal of design, but it is not sufficient to explain why we need design. Consider for a moment that many companies have working software that is poorly designed. Clearly, software does not necessarily have to be designed well, or even at all, in order for it to work—at least in the short term. The goal of design must be something else, something other than just the basic "does it work now?" question.

At this point one might be tempted to say that the purpose of design is “maintainability.” But what is meant by “maintainability?” What makes one software project maintainable, and another a disaster?

Consider that the computer does not care about the structure or organization of the code. The computer is not concerned with what language the software was written in, which patterns, techniques, processes, or tools were used in the construction of the software. To the computer, your program is just a sequential series of instructions: “Do x, Do y, If z Do a.” Why then should we concern ourselves with design principles, patterns, and practices? What are they for?

Software developers are often described as "writing code," but we don't normally think of ourselves as writers in the traditional sense. What if we did? When a writer sits down to write, the first question s/he must answer is "who is my audience?" In deciding on an audience for his work, a writer constrains what is or is not acceptable in terms of the structure and content of his written work. These constraints are not arbitrary or optional: they are logically necessary implications of the choice of audience. If software developers are writers, then their work must also be constrained by their target audience. But if the target audience cannot be the computer, who is it?

It’s people. We do not write code for the computer. We write code for other people—for our co-workers, for ourselves, for any others that may have an interest in what the software is supposed to do at any time over the entire lifespan of the project. The purpose of software design is to communicate what the project does, and how it does it. Any set of software design principles and methods must be targeted at communicating to people. It must be constrained by the nature of the cognitive activity of the human mind.

Human Cognition

The human mind is only capable of dealing with 4 or 5 concretes at one time, and yet we are confronted with thousands of concrete objects, ideas, and choices that we must deal with on a daily basis. We must aggregate concretes into abstractions so that we can then treat the abstraction as a concrete for further thinking. For example, you may not be able to tell how many pipes this is just by glancing at it:

||||||||||||||||||||||||||||||

But if I broke it up into groups:

|||||    |||||    |||||    |||||    |||||    |||||

You should be able to tell that there are 6 groups of 5, ergo 30 pipes. The act of viewing the pipes in groups enables thinking. This is analogous to what we do when we perform abstraction. Abstraction is a process of integration and differentiation. We integrate like members of a class of existents, and differentiate them from the other members of the wider class to which they belong. For example, we can observe that certain objects are differentiated from other “media” by the fact that they have pages covered in text. “Book” as an abstract integrates all these objects under a common abstraction while distinguishing them from other kinds of media.

Well-designed software must be intelligible and discoverable and have an appropriate level of abstraction.

Intelligibility

Intelligibility means that it is not enough for the software to do what it is supposed to do; it should also make sense to the average software craftsman. I do NOT mean that we should "dumb down" our code to the lowest common denominator. For any given field of endeavor there exists a general category of knowledge that can be expected of our target audience. Developers who do not bother to acquire that context have only themselves to blame for not being able to understand common software idioms. Suppose, for example, that your company has recently adopted an object-oriented language after years of writing procedural code, and the developers are not yet fully comfortable with concepts such as "classes" and "interfaces." That does not mean that you should not introduce them to concepts such as design patterns.

Just as we should avoid dumbing down our code, we should also avoid "stupid code tricks." A Stupid Code Trick is when a developer takes advantage of a subtle or little-known feature of a language to accomplish a goal that could be solved with a simpler code construct. C is famous for this kind of code—some wise-guy developer will take advantage of the difference between "++i" and "i++", but in the context of a larger expression consisting primarily of pointer symbols, member signifiers, and parentheses. Another Stupid Code Trick is to embed a long chain of function calls as arguments to a function. In that case, you end up with code that looks like this:

// Stupid Code Trick
ProcessMyData(GetDataToProcess(GetMyIdentity(My.Session.User)), GetMethodToProcess(1, 12, 15), GetOutputFormat(OutputFormat.Default));

This should be re-written as:

var identity = GetMyIdentity(My.Session.User);
var method = GetMethodToProcess(1, 12, 15);
var format = GetOutputFormat(OutputFormat.Default);
var dataToProcess = GetDataToProcess(identity);
ProcessMyData(dataToProcess, method, format);

The temptation to perform a Stupid Code Trick seems to be rooted in the desire to reduce the number of lines of code. We must resist this temptation by remembering that our purpose is to be intelligible, not to reduce the number of lines of code.

Intelligible software should not confront the developer with too many concretes at once without a unifying abstraction. Let’s count the number of concretes in the above code:

var identity = GetMyIdentity(My.Session.User); // 3
var method = GetMethodToProcess(1, 12, 15); // 5 
var format = GetOutputFormat(OutputFormat.Default); // 3 
var dataToProcess = GetDataToProcess(identity); // 3
ProcessMyData(dataToProcess, method, format); // 4

I count one abstraction for the return result of the method, one for the method itself, and one more for each argument to the method. In the original version of this function, 20 concretes were crammed onto one line of code. By breaking each function call out onto a separate line, we have a horizontal complexity of 3 to 5 on each line, and the entire algorithm is expressed in 5 lines. Both horizontally and vertically, the refactored code is within the ability of the mind to grasp at once.

This limit of 4 or 5 concretes means we should try to keep the number of arguments to functions as small as possible. The return result and the method itself give us a concrete complexity of 2, so anything over 3 arguments to a function strains our ability to hold everything in our mind at once. The limit also means that we should keep methods as short as possible. It's better to have many small and well-named functions than to have one massive function that does everything.
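When a signature does creep past that limit, one option is to group related arguments into a single parameter object so the call site presents fewer concretes. A sketch (ProcessingOptions and the revised single-argument signature are my own invention, offered as an alternative to the three-argument version used in this example):

public class ProcessingOptions
{
    public int NumberOfCopies { get; set; }
    public int LeftMargin { get; set; }
    public int TopMargin { get; set; }
}

// Before: GetMethodToProcess(1, 12, 15) puts five concretes on one line.
// After: the call presents three concretes, and the values are named
// at the point where they are constructed.
var options = new ProcessingOptions { NumberOfCopies = 1, LeftMargin = 12, TopMargin = 15 };
var method = GetMethodToProcess(options);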

Discoverability

Another feature of the refactored code sample is its discoverability. Discoverability refers to the extent to which the code expresses its intent. I can make the code less discoverable by renaming its methods and arguments as follows:

var i = GetI(My.Session.User); // 3
var m = GetM(1, 12, 15); // 5
var f = GetOF(OutputFormat.Default); // 3
var d = GetData(i); // 3
Process(d, m, f); // 4

I could make the code clearer still by assigning the three arguments to GetMethodToProcess() to named variables:

const int numberOfCopies = 1;
const int leftMargin = 12;
const int topMargin = 15;

var identity = GetMyIdentity(My.Session.User); // 3
var method = GetMethodToProcess(numberOfCopies, leftMargin, topMargin); // 5
var format = GetOutputFormat(OutputFormat.Default); // 3
var dataToProcess = GetDataToProcess(identity); // 3
ProcessMyData(dataToProcess, method, format); // 4

Discoverable functions describe what they do, accept only a few arguments, observe the Command-Query-Separation principle (CQS), and avoid temporal coupling.

// Temporal Coupling Example:
var myObject = this.GetBusinessObjectToBeValidated();
var validator = new Validator<MyBusinessObject>();
validator.Value = myObject;
var results = validator.GetValidationResults();

The 3rd line of the above example inhibits discoverability because nothing in the API suggests that you are required to set Value on the validator prior to calling GetValidationResults(). The above coupling could be resolved by either passing the object to be validated to the constructor, or (my preference) passing it as a parameter to the GetValidationResults() method.

// Temporal Coupling Resolved:
var myObject = this.GetBusinessObjectToBeValidated();
var validator = new Validator<MyBusinessObject>();
var results = validator.GetValidationResults(myObject);

It is now clear from the method signature of GetValidationResults that a business object is required in order to perform the operation.
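Command-Query Separation is worth a quick sketch of its own: a query answers a question without changing state, and a command changes state without returning a value. The WorkQueue type below is hypothetical, a minimal illustration rather than a prescribed API:

using System.Collections.Generic;

public class WorkQueue
{
    private readonly Queue<string> _items = new Queue<string>();

    // Command: changes state, returns nothing.
    public void Enqueue(string item)
    {
        _items.Enqueue(item);
    }

    // Query: answers "what is next?" with no side effects,
    // so it is always safe to call repeatedly.
    public string PeekNext()
    {
        return _items.Peek();
    }

    // Command: changes state, returns nothing.
    public void RemoveNext()
    {
        _items.Dequeue();
    }
}

A single method that dequeued an item and returned it would bundle a query and a command together, and a reader could no longer tell from the signature alone whether calling it twice is safe.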

Level of Abstraction

Good software design demands an appropriate level of abstraction. The cognitive principle is best expressed by Rand’s Razor: “[abstractions] are not to be multiplied beyond necessity—the corollary of which is: nor are they to be integrated in disregard of necessity.” Consider the following “Hello World” application:

var outputWriter = DiContainer.GetService<IOutputWriter>();
var resourceService = DiContainer.GetService<IResourceService>();
var format = resourceService.GetDefaultResourceFormat();
var arguments = resourceService.GetDefaultResourceArguments();
var message = string.Format(format, arguments);
outputWriter.Write(message);

This application uses a Dependency Injection container to get an outputWriter and a resourceService. It then formats a message for display and passes that message to the outputWriter. This application clearly requires a good bit of configuration in order to work properly. Gah! It violates Rand's Razor in that it introduces an unnecessary level of abstraction into the application. It would be better written as:

Console.WriteLine("Hello World!");

Of course, if the code above were not a Hello World application, the level of abstraction it expresses might be perfectly necessary in that context. Here is an example of the other side of the same problem. This interface integrates methods that are not cohesive:

public interface ISuperClass
{
    void Login(User user);
    IEnumerable<FileInfo> GetFiles();
    TcpClient CreateTcpClient(string hostName, int port);
}

These methods have nothing to do with one another. Integrating them in an interface serves no purpose. The concomitant design principle of Rand’s Razor is the Single Responsibility Principle.
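Applying SRP here would split the grab-bag into cohesive contracts. A sketch (the interface names are my own, and the User type is assumed from the original):

using System.Collections.Generic;
using System.IO;
using System.Net.Sockets;

public interface IAuthenticationService
{
    void Login(User user);
}

public interface IFileRepository
{
    IEnumerable<FileInfo> GetFiles();
}

public interface ITcpClientFactory
{
    TcpClient CreateTcpClient(string hostName, int port);
}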

Diff – Part 2

In Part 1 of this series, I created a basic class that would find differences between two versions of the same list. I’ve made some improvements to the original implementation that I’d like to share.

To begin with, I read this post on the difference between the Any() and Count() extension methods. While List<T> does implement ICollection, I wanted to change the structure of DiffResult<T> so that if I decide to change the datatype of Additions, Deletions, and Modifications, the IsEmpty property will remain efficient. To that end, I changed IsEmpty as follows:

public bool IsEmpty
{
    get { return !this.HasChanges;  }   
}

public bool HasChanges
{
    get { return this.Additions.Any()
                 || this.Deletions.Any()
                 || this.Modifications.Any(); }
}

The next thing I wanted to be able to do is automatically synchronize changes from the new list back to the old list. Here is the unit test:

[Test]
public void ResolveChangesBetweenTwoLists()
{
    // Arrange: Declare any variables or set up any conditions
    //          required by your test.
    var newVersion = new List<Entity>();
    var oldVersion = new List<Entity>();

    var identityComparer = new EntityIdentityComparer();
    var modificationComparer = new EntityEqualityComparer();

    var diff = new Diff<Entity>(identityComparer, modificationComparer);

    newVersion.Add(new Entity() { Id = 1, Name = "One" });  // addition
    newVersion.Add(new Entity() { Id = 2, Name = "2" });    // modification

    oldVersion.Add(new Entity() { Id = 2, Name = "Two" });   // modification
    oldVersion.Add(new Entity() { Id = 3, Name = "Three" }); // deletion

    // Act:     Perform the activity under test.
    diff.Resolve(newVersion, oldVersion);
    var result = diff.Execute(newVersion, oldVersion);

    // Assert:  Verify that the activity under test had the
    //          expected results
    Assert.IsTrue(result.IsEmpty); // all changes should have been resolved.
}

My first stab at the implementation is as follows:

public class Diff<T>
{
    public void Resolve(IList<T> newVersion, IList<T> oldVersion)
    {
        var results = this.Execute(newVersion, oldVersion);

        foreach (var element in results.Deletions)
            oldVersion.Remove(element);

        foreach (var element in results.Additions)
            oldVersion.Add(element);

        foreach (var element in results.Modifications)
        {
            var keyItem = element;
            var item = oldVersion.First(e => _identityComparer.Equals(e, keyItem));
            var index = oldVersion.IndexOf(item);
            oldVersion[index] = element;
        }
    }
…snipped
}

This implementation is fine as far as it goes, but there is a drawback that I don't like. If the user of this code wishes to determine whether there are changes prior to calling Resolve(), the diff must be executed a second time during the Resolve step. I like that placing Resolve() on the Diff class provides a single-step execution, so I'm going to move the real work to the DiffResult class but leave the Resolve method where it is. I changed the implementation of Diff.Resolve() to this:

public void Resolve(IList<T> newVersion, IList<T> oldVersion)
{
    var results = this.Execute(newVersion, oldVersion);
    results.Resolve(oldVersion, this._identityComparer);
}

I added DiffResult.Resolve() as follows:

public void Resolve(IList<T> oldVersion, IEqualityComparer<T> identityComparer)
{
    this.Deletions.ForEach(e => oldVersion.Remove(e));
    this.Additions.ForEach(oldVersion.Add);
    this.Modifications.ForEach(e =>
    {
        var item = oldVersion.First(element => identityComparer.Equals(element, e));
        var index = oldVersion.IndexOf(item);
        oldVersion[index] = e;
    });
}
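With the work moved to DiffResult, a caller who wants to inspect the changes first no longer pays for a second diff. A usage sketch, reusing the variables from the test above:

var result = diff.Execute(newVersion, oldVersion);

if (result.HasChanges)
{
    // No second Execute() required; the result resolves itself.
    result.Resolve(oldVersion, identityComparer);
}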

The updated source code for this solution can be found here.

Diff – Part 1

Have you ever needed to know if the contents of a list have changed from a given snapshot? I recently decided that I would write a quick-and-dirty Diff algorithm. I started with the following test:

[Test]
public void NullSet()
{
    // Arrange: Declare any variables or set up any conditions
    //          required by your test.
    var source = new List<Entity>();
    var target = new List<Entity>();

    // Act:     Perform the activity under test.
    var diff = new Diff<Entity>();
    
    // Assert:  Verify that the activity under test had the
    //          expected results
    var changeSet = diff.Execute(source, target);
    Assert.IsTrue(changeSet.IsEmpty);
}

I defined Entity as a class with an Id and a Name. This version of the diff algorithm relied on reference comparisons to find changes between the two lists. To extend its functionality a bit, I decided to create an EqualityComparer to replace the reference comparison.
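For reference, a minimal sketch of Entity as described (the property types are my inference from the tests below):

public class Entity
{
    public int Id { get; set; }
    public string Name { get; set; }
}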

Here is the EqualityComparer:

public class EntityIdentityComparer : IEqualityComparer<Entity>
{
    public bool Equals(Entity x, Entity y)
    {
        return x.Id == y.Id;
    }

    public int GetHashCode(Entity obj)
    {
        return obj.Id.GetHashCode();
    }
}

And here are the next tests:

[Test]
public void OneAddition()
{
    // Arrange: Declare any variables or set up any conditions
    //          required by your test.
    var source = new List<Entity>();
    var target = new List<Entity>();
    var identityComparer = new EntityIdentityComparer();

    var diff = new Diff<Entity>(identityComparer);

    source.Add(new Entity());

    // Act:     Perform the activity under test.
    var changeSet = diff.Execute(source, target);

    // Assert:  Verify that the activity under test had the
    //          expected results
    Assert.AreEqual(1, changeSet.Additions.Count);
    Assert.IsFalse(changeSet.IsEmpty);
}
[Test]
public void OneDeletion()
{
    // Arrange: Declare any variables or set up any conditions
    //          required by your test.
    var source = new List<Entity>();
    var target = new List<Entity>();
    var identityComparer = new EntityIdentityComparer();

    var diff = new Diff<Entity>(identityComparer);

    target.Add(new Entity());

    // Act:     Perform the activity under test.
    var changeSet = diff.Execute(source, target);

    // Assert:  Verify that the activity under test had the
    //          expected results
    Assert.AreEqual(1, changeSet.Deletions.Count);
    Assert.IsFalse(changeSet.IsEmpty);
}

[Test]
public void OneAdditionAndOneDeletion()
{
    // Arrange: Declare any variables or set up any conditions
    //          required by your test.
    var source = new List<Entity>();
    var target = new List<Entity>();
    var identityComparer = new EntityIdentityComparer();

    var diff = new Diff<Entity>(identityComparer);

    source.Add(new Entity() { Id = 1 });
    target.Add(new Entity() { Id = 2 });

    // Act:     Perform the activity under test.
    var changeSet = diff.Execute(source, target);

    // Assert:  Verify that the activity under test had the
    //          expected results
    Assert.AreEqual(1, changeSet.Additions.Count);
    Assert.AreEqual(1, changeSet.Deletions.Count);
    Assert.IsFalse(changeSet.IsEmpty);
}

Here is the test I wrote to detect modifications:

[Test]
public void OneModification()
{
    // Arrange: Declare any variables or set up any conditions
    //          required by your test.
    var source = new List<Entity> { new Entity() { Id = 1, Name = "Test" } };
    var target = new List<Entity> { new Entity() { Id = 1, Name = "Test 1" } };

    var identityComparer = new EntityIdentityComparer();

    var diff = new Diff<Entity>(identityComparer);

    // Act:     Perform the activity under test.
    var changeSet = diff.Execute(source, target);

    // Assert:  Verify that the activity under test had the
    //          expected results
    Assert.AreEqual(0, changeSet.Additions.Count);
    Assert.AreEqual(0, changeSet.Deletions.Count);
    Assert.AreEqual(1, changeSet.Modifications.Count);
    Assert.IsFalse(changeSet.IsEmpty);
}

Notice that my source and target lists each contain an entity that, according to the IdentityComparer, is the same entity. I need a new concept to be able to tell them apart. I need to know whether two entities with the same identity have the same values. In short, I need another EqualityComparer. This time, instead of testing for equality on the Id property, it should test for equality on all other properties. Here is the EqualityComparer:

public class EntityEqualityComparer : IEqualityComparer<Entity>
{
    public bool Equals(Entity x, Entity y)
    {
        return (x.Name == y.Name);
    }

    public int GetHashCode(Entity obj)
    {
        // Hash on Name so the hash code is consistent with Equals,
        // which compares Name.
        return obj.Name.GetHashCode();
    }
}

And here is the refactored modification test:

[Test]
public void OneModification()
{
    // Arrange: Declare any variables or set up any conditions
    //          required by your test.
    var source = new List<Entity> {new Entity() {Id = 1, Name = "Test"}};
    var target = new List<Entity> {new Entity() {Id = 1, Name = "Test 1"}};

    var identityComparer = new EntityIdentityComparer();
    var modificationComparer = new EntityEqualityComparer();

    var diff = new Diff<Entity>(identityComparer
        , modificationComparer);

    // Act:     Perform the activity under test.
    var changeSet = diff.Execute(source, target);

    // Assert:  Verify that the activity under test had the
    //          expected results
    Assert.AreEqual(0, changeSet.Additions.Count);
    Assert.AreEqual(0, changeSet.Deletions.Count);
    Assert.AreEqual(1, changeSet.Modifications.Count);
    Assert.IsFalse(changeSet.IsEmpty);
}

Here is the definition of DiffResult:

public class DiffResult<T>
{
    public bool IsEmpty
    {
        get { return this.Additions.Count == 0 
                     && this.Deletions.Count == 0
                     && this.Modifications.Count == 0; }
    }

    private readonly List<T> _additions = new List<T>();
    public List<T> Additions
    {
        get { return this._additions; }
    }

    private readonly List<T> _deletions = new List<T>();
    public List<T> Deletions
    {
        get { return this._deletions; }
    }

    private readonly List<T> _modifications = new List<T>();
    public List<T> Modifications
    {
        get { return this._modifications; }
    }
}

And the implementation of Diff:

 

/// <summary>
/// Used to find changes between two snapshots of a list.
/// </summary>
/// <typeparam name="T"></typeparam>
public class Diff<T>
{
    /// <summary>
    /// Determines if two objects are supposed to be the same object.
    /// </summary>
    private readonly IEqualityComparer<T> _identityComparer;

    /// <summary>
    /// Determines if two objects have equivalent values.
    /// </summary>
    private readonly IEqualityComparer<T> _modificationComparer;

    /// <summary>
    /// Constructor
    /// </summary>
    /// <param name="identityComparer"></param>
    /// <param name="modificationComparer"></param>
    public Diff(IEqualityComparer<T> identityComparer
        , IEqualityComparer<T> modificationComparer)
    {
        Require.That
            .IsNotNull(identityComparer, "Identity comparer is required.")
            .IsNotNull(modificationComparer, "Modification comparer is required.");

        this._identityComparer = identityComparer;
        this._modificationComparer = modificationComparer;
    }

    /// <summary>
    /// Returns a changeset of the differences between the new and original versions of the list.
    /// </summary>
    /// <param name="newList">The new version of the list.</param>
    /// <param name="originalList">The original version of the list.</param>
    /// <returns></returns>
    public DiffResult<T> Execute(IEnumerable<T> newList
        , IEnumerable<T> originalList) 
    {
        var result = new DiffResult<T>();

        var addedItems = newList.Except(originalList, this._identityComparer)
            .ToList();

        var deletedItems = originalList.Except(newList, this._identityComparer)
            .ToList();

        var modifiedItems = from newElement in newList
                            from originalElement in originalList
                            where this._identityComparer.Equals(newElement, originalElement) 
                            && !this._modificationComparer.Equals(newElement, originalElement)
                            select newElement;

        result.Modifications.AddRange(modifiedItems);
        result.Additions.AddRange(addedItems);
        result.Deletions.AddRange(deletedItems);

        return result;
    }
}

You can download the solution files here.

My First Speaking Engagement

I have accepted an invitation to speak at GSP Developers in March. My topic will be Test Driven Design. I will focus on how practicing TDD changes the overall design of your software.

I think my strategy should be:

Give a brief overview of TDD.

Demonstrate writing a simple function TDD style.

Discuss what is meant by “testable” code, and how testable code relates to the SOLID design principles.

Give a brief overview of the available TDD tools for .NET.

Any advice for a first-time speaker?
