Proposed Quality Metrics for Engineering Managers

As a manager, I’d like data-driven metrics to help me decide where to focus my questions and priorities for improving my engineering team. Not all of these metrics are measured on the same scale: some are raw work item counts, some are estimates in story points, and some are hours. Because these metrics are an attention guide for managers, it’s important that a manager choose the most important few to focus on at any one time. Trying to address every issue at once is doomed to failure.

The list is not exhaustive, and some traditional metrics, like code coverage, are deliberately excluded. These are simply the items that I, as an engineering manager, am currently interested in. Once I can measure them through our work tracking tool, I will choose 3-4 to focus on in the short to medium term.

The structure of each metric is:
Title: The name of the metric.
Description: A short statement of what the metric measures.
Question: The question the metric is trying to answer.
Remediation: One or more common paths to improving the metric. This list is not exhaustive; a metric that is out of balance may require new ideas about how to remediate it.
Note: Any special notes required to properly understand the metric.

Quality

Defect Rate

Description: Rate at which new defects are created.
Question: How often do our customers find defects with our products?
Remediation: Increase emphasis on automated tests and clean code.
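For illustration, here is a minimal sketch of how this might be computed from a work-tracker export. The item fields (“type”, “created”) are hypothetical; substitute whatever your tool actually exports.

    from datetime import date

    def defect_rate(items, start, end):
        # Defects created per week over [start, end).
        # "type" and "created" are assumed field names from a tracker export.
        defects = [i for i in items if i["type"] == "defect"
                   and start <= i["created"] < end]
        weeks = max((end - start).days / 7, 1)
        return len(defects) / weeks

    items = [
        {"type": "defect", "created": date(2023, 5, 2)},
        {"type": "story",  "created": date(2023, 5, 3)},
        {"type": "defect", "created": date(2023, 5, 9)},
    ]
    print(defect_rate(items, date(2023, 5, 1), date(2023, 5, 15)))  # 1.0 per week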

Regression Density

Description: % of created defects that are regressive.
Question: How often are we breaking things that used to work?
Remediation: Increase emphasis on automated tests and clean code.
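A sketch of the calculation, assuming defects carry a hypothetical “regression” tag when they break something that used to work:

    def regression_density(defects):
        # % of created defects tagged as regressions; "tags" is an assumed field.
        if not defects:
            return 0.0
        regressive = sum(1 for d in defects if "regression" in d.get("tags", []))
        return 100.0 * regressive / len(defects)

    print(regression_density([{"tags": ["regression"]}, {"tags": []}]))  # 50.0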

Defect Density

Description: % of work in the backlog categorized as defects.
Question: How much of our work is dedicated to bugs vs. features?
Remediation: Increase emphasis on automated tests and clean code.
Also, impose a bug cap, e.g., 5x the engineer count. If the team reaches this cap, they must stop any new work and instead address defects.
Note: This is normally defined as defects per line of code (LOC), but if LOC can’t be used as a measure of output, then defects per LOC can’t really serve as a quality measure either. I’m taking the liberty of redefining the term here.
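A sketch of both the backlog-based density and the bug cap check; the “type” field and the team size are hypothetical:

    def defect_density(backlog):
        # % of backlog items categorized as defects.
        if not backlog:
            return 0.0
        defects = sum(1 for item in backlog if item["type"] == "defect")
        return 100.0 * defects / len(backlog)

    def over_bug_cap(open_defects, engineers, multiplier=5):
        # True when open defects reach the cap (5x engineer count by default),
        # i.e., the team should stop new work and address defects.
        return open_defects >= multiplier * engineers

    backlog = [{"type": "defect"}, {"type": "story"},
               {"type": "story"}, {"type": "defect"}]
    print(defect_density(backlog))  # 50.0
    print(over_bug_cap(31, 6))      # True: 31 >= 5 * 6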

Support Effort

Description: % of total capacity dedicated to support.
Question: How much of our capacity is consumed by support efforts?
Remediation: Look for trends in support tickets and address them with engineering.
Note: It might be useful to subdivide this metric to track SOX, GDPR, and other support efforts separately.
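Since this metric is measured in hours, the sketch below sums logged time. The “category” and “hours” fields are assumptions, and the category could be subdivided (e.g., “support/sox”, “support/gdpr”) to track the separate efforts:

    def support_effort(entries):
        # % of total logged hours spent on support work.
        total = sum(e["hours"] for e in entries)
        if total == 0:
            return 0.0
        support = sum(e["hours"] for e in entries
                      if e["category"].startswith("support"))
        return 100.0 * support / total

    logged = [{"category": "support/gdpr", "hours": 10},
              {"category": "feature",      "hours": 30}]
    print(support_effort(logged))  # 25.0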

Predictability

Unplanned Work Density

Description: % of work that is unplanned; includes support tickets.
Question: How well do our plans match reality?
Remediation: Identify and address bottlenecks.
Note: Unplanned work is work that the team has to do unexpectedly. If the team takes in extra work because it found spare capacity, that is “bonus” work, not unplanned work.
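A sketch, assuming each completed item carries a hypothetical “origin” field that distinguishes planned, unplanned, and “bonus” work:

    def unplanned_density(items):
        # % of completed work that was unplanned.
        # "bonus" work pulled in from found capacity does not count as unplanned.
        if not items:
            return 0.0
        unplanned = sum(1 for i in items if i["origin"] == "unplanned")
        return 100.0 * unplanned / len(items)

    done = [{"origin": "planned"}, {"origin": "unplanned"},
            {"origin": "bonus"},   {"origin": "planned"}]
    print(unplanned_density(done))  # 25.0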

Estimation Delta

Description: The difference between the estimation and the actual time it took to complete work.
Question: How good are we at estimating?
Remediation: Estimation problems are usually the result of large batch sizes or low-quality code bases. Reduce batch size and implement quality-centric practices such as TDD. Sometimes estimation issues are caused by unclear requirements, though that lack of clarity should automatically result in a higher estimate.
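One way to aggregate this is a mean absolute delta, so over- and under-estimates don’t cancel each other out. The sketch below works whether the unit is story points or hours; the field names are assumptions:

    def estimation_delta(items):
        # Mean absolute difference between actual and estimated effort.
        deltas = [abs(i["actual"] - i["estimate"]) for i in items]
        return sum(deltas) / len(deltas) if deltas else 0.0

    closed = [{"estimate": 3, "actual": 5},   # under-estimated by 2
              {"estimate": 8, "actual": 6}]   # over-estimated by 2
    print(estimation_delta(closed))  # 2.0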

Value Density

Description: % of work that is planned user stories.
Question: What percentage of our effort is value-add versus overhead? (e.g., planned stories vs. bugs, support, and distracting work)
Remediation: Reduce manual overhead through automation.
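A sketch combining the “type” and “origin” fields assumed in the earlier examples:

    def value_density(items):
        # % of completed items that are planned user stories.
        if not items:
            return 0.0
        value = sum(1 for i in items
                    if i["type"] == "story" and i["origin"] == "planned")
        return 100.0 * value / len(items)

    done = [{"type": "story",  "origin": "planned"},
            {"type": "defect", "origin": "unplanned"}]
    print(value_density(done))  # 50.0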

Turnaround

Lead Time

Description: Time between when a work item is requested and when it is delivered.
Question: How long do customers have to wait for the features they ask for?
Remediation: Improve cycle time and predictability.

Cycle Time

Description: Time between when a work item is started and when it is delivered.
Question: How quickly can the team deliver from the time they start work?
Remediation: Clear bottlenecks in the testing and deployment pipeline. Aggressively target unplanned work.
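Lead time and cycle time can come from the same three timestamps; only the starting clock differs. A sketch covering both, with hypothetical “requested”, “started”, and “delivered” dates:

    from datetime import date

    def lead_time_days(item):
        # Request to delivery: the wait the customer experiences.
        return (item["delivered"] - item["requested"]).days

    def cycle_time_days(item):
        # Start of work to delivery: the part the team most directly controls.
        return (item["delivered"] - item["started"]).days

    item = {"requested": date(2023, 5, 1),
            "started":   date(2023, 5, 10),
            "delivered": date(2023, 5, 17)}
    print(lead_time_days(item), cycle_time_days(item))  # 16 7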

Throughput

Work Items Completed

Description: The number of stories and bugs closed.
Question: What is our throughput in absolute numbers?
Remediation: Identify and address bottlenecks.

Estimated Work Completed

Description: The point value of the closed work items.
Question: What is our throughput from an estimation perspective?
Remediation: Identify and address bottlenecks.
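Both throughput views fall out of the same list of closed items. A sketch, with “points” as an assumed field (unestimated items count as zero):

    def throughput(closed_items):
        # Absolute count of closed stories and bugs, plus their total points.
        count = len(closed_items)
        points = sum(i.get("points", 0) for i in closed_items)
        return count, points

    closed = [{"points": 3}, {"points": 5}, {}]
    print(throughput(closed))  # (3, 8)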

5 thoughts on “Proposed Quality Metrics for Engineering Managers”

  1. I think lead time needs some additional refinement. Customers can request an infinitely long backlog. Perhaps measure from the moment a piece of work is deemed a top priority to when it is delivered? Remediation would thus include minimizing WIP.

    1. I’ve been thinking about this too. In my own thinking, I’m introducing the concept of “committed,” as in the moment an engineering team says, “we’re definitely doing this work.” This is not the same as slotting the work into an iteration.

  2. I think the formula that you white-boarded for me would add a lot to this post. It was something like customer value delivered = (the team’s total work output) - (unplanned work + support)… but I think your version was a lot better. 🙂
    We could also say that customer value = existing customer experience - (outages + bad customer experiences… like bugs). (This formula would point out that if outages and bugs are bad enough, adding a new feature won’t solve the problem.)

    1. That formula is here. I haven’t built the reporting around it yet, so I haven’t published an official blog post on it.
