Tag Archives: Metrics
Proposed Quality Metrics for Engineering Managers

As a manager I’d like to have some data-driven metrics by which to decide where to focus my questions and priorities for making improvements on my engineering team.  Not all of these metrics will be measured using the same scale–some metrics are taken as pure work item counts, estimations in term of story points, and hours. As these metrics are an attention-guide for managers, it’s important that a manager is choosing the most important set to focus on at any one time. Trying to address every issue at once is doomed to failure.

The list is not exhaustive and some traditional metrics like code coverage are specifically excluded. These are simply the items that I as an engineering am currently interested in. Once I can measure them through our work tracking tool, I will choose 3-4 to focus on in the short- to medium-term.

The structure of each metric is:
Title: The name of the metric and a short description of what it means
Question: What is the question the metric is trying to answer?
Remediation: One or more common paths to improve the metric. It should be understood that this list is not exhaustive and that any given metric out of balance may require new ideas about how to remediate.
Note: Any special notes required to properly understand the metric.

Quality

Defect Rate

Description: Rate at which new defects are created.
Question: How often do our customers find defects with our products?
Remediation: Increase emphasis on automated tests and clean code.

Regression Density

Description: % of created defects that are regressive.
Question: How often are we breaking things that used to work?
Remediation: Increase emphasis on automated tests and clean code.

Defect Density

Description: % of work in the backlog categorized as defects.
Question: How much of our work is dedicated to bugs vs. features?
Remediation: Increase emphasis on automated tests and clean code.
Also, impose a bug cap – e.g., 5x engineer count. If the team reaches this cap they must stop any new work and instead address defects.
Note: This is normally thought of as defects per line of code (LOC), but if LOC can’t be used as a quality metric then defect density can’t really be measured using LOC either. I’m taking the liberty of redefining this term here.

Support Effort

Description: % of total capacity dedicated to support
Question: How much of our capacity is consumed by support efforts?
Remediation: Look for trends in support tickets and address them with engineering.
Note: It might be useful to subdivide this to track the SOX, GDPR, and other separate support efforts.

Predictability

Unplanned Work Density

Description: % of work that is unplanned; includes support tickets.
Question: How well do our plans match reality?
Remediation: identity and address bottlenecks
Note: Unplanned work is work that the team has to do unexpectedly. If the team takes in extra work due to found capacity, this is “bonus” work.

Estimation Delta

Description: The difference between the estimation and the actual time it took to complete work.
Question: How good are we at estimating?
Remediation: Estimation problems are usually the result of large batch sizes or low-quality code bases. Reduce batch size and implement quality-centric practices such as TDD. Sometimes estimation issues are caused by unclear requirements–though that lack of clarity should automatically result in a higher estimate.

Value Density

Description: % of work that is planned user stories.
Question: What percentage of our effort is value-add versus overhead? (e.g., planned stories vs. bugs, support, and distracting work)
Remediation: Reduce manual overhead through automation.

Turnaround

Lead Time

Description: Time between when a work is requested and when it is delivered.
Question: How long do customers have to wait for the features they ask for?
Remediation: improve cycle time and predictability

Cycle Time

Description: Time between when a work is started and when it is delivered.
Question: How quickly can the team deliver from the time they start work?
Remediation: Clear bottlenecks in the testing and deployment pipeline. Aggressively target unplanned work.

Throughput

Work Items Completed

Description: The number of stories and bugs closed.
Question: What is our throughput in absolute numbers?
Remediation: identity and address bottlenecks.

Estimated Work Completed

Description: The point value of the closed work items.
Question: What is our throughput from an estimation perspective?
Remediation: identity and address bottlenecks.

Stay Focused on the Goal, Not the Metrics

The goal is the thing you are trying to do.
The metric is how you are measuring your progress toward the thing you are trying to do. Metrics are only as good as their ability to measure progress toward the goal.

Don’t confuse them.

An Example

Imagine a sales team for an organization selling widgets has a goal to increase sales of a particular product line by 10%. The Sales Manager decides that the best way to achieve the goal is for the sales staff to make a certain number of cold-calls over the following months. After a few weeks, one of her sales staff is falling way behind in cold-calls. The wrinkle is that this salesman is the top biller in the department. Should the sales manager berate her top performer for not doing enough cold-calls?

Absolutely not. No manager should ever punish their reports for doing well (provided the means used are legal and ethical of course).

Punishing the salesman would send the message that billing isn’t the goal, but cold-calls are. Do you want a salesman who makes lots of cold-calls but can’t bill? Since sales staff are compensated via commission, punishing the salesman would introduce a division between his performance and his pay. In the best case, the salesman simply ignores the manager and continues to bill and get paid–benefiting the company in the process. In the worst case, the salesman leaves the company for greener pastures, depriving the company of it’s top biller.

The mistake here is that cold-calls are simply a form of measuring progress toward the goal–increased sales. Cold calls themselves are not the actual goal–they are a proxy for the goal. Further, they may not be the only possible proxy. Their value as a proxy is proportional to the relationship between cold-calls and increased sales. If a salesman is generating increased sales without cold-calls then there is either another possible metric or cold-calls are a poor metric.

It’s one thing to say that cold-calls are a proven way to generate increased sales. It’s quite another to ignore that there are other possible ways to do the same thing. It’s flat wrong to take the position that cold-calls are the only way to increase sales.

What is the appropriate response? Find out how the top biller is selling so well without cold-calls. Is the top biller doing something that no one else is doing? Is there something for the other sales staff to learn? Are there new, better metrics that can be introduced? Of course, it’s also possible that the top biller could bill even more if he did more cold-calls. Finding out will require collaboration between the salesman and the manager–but this is a process of active investigation instead of passive authoritarianism.

If the manager focuses on the metric instead of the goal, she is taking on the responsibility of having all the answers and dictating them to others. The proper approach is to adopt a learning stance toward the team’s work. If the team is doing well but the metrics aren’t being met, what can the manager learn from this? If the team meets the metrics, will they do better? If not–what good are they?

Choosing Metrics

When choosing metrics it’s important to consider that people will game the system. If you’re a software engineering manager and you make Lines of Code or Test Coverage the metric, people will write verbose code and create meaningless tests. A good metric will encourage people to game the system by focusing on the thing you want to achieve. I recently heard of an example in which the Product Owner threw out story sizing as part of their Scrum process. The only thing developers got credit for was the number of stories they completed. It didn’t matter how large or small–credit was only given when the story was completed an in production. The developers began gaming the system by reducing the story size to the smallest thing they could deliver.

Perfect.