Should you try to measure developer productivity?

What exactly do you measure when you have so many data points? CTO Stephan shares his views on filtering the signals from the noise

By Merlin Carter

Managers are perpetually confronted with the need to measure performance. CTOs and tech leads are no exceptions. Fortunately, most software development tasks are logged in one way or another. So you’d think that it would be easy to measure developer productivity. But in reality, it’s anything but simple — there are just too many data points. How do you know what’s really important? In this session, our CTO Stephan shares his views on how to filter the signals from the noise.

Should you try to measure developer productivity, and if so, what do you measure exactly?

Merlin: So, Stephan, today we have a controversial theme (which is my favorite type of theme).

I want to talk about whether you should try to measure developer productivity, and if so, what do you measure exactly?

There are a few different opinions out there: Some people want to keep a very close eye on things. And they’ll track metrics like the number of commits or lines of code. I’m not really sure how helpful that is, but I suppose you have to measure SOMETHING, right? At least if you want to compare how different teams are doing.

So what’s your general take on how to measure developer productivity?

“It’s about how much value your team is delivering against your goals.”

Stephan: Interesting question, and I agree that it’s somewhat controversial. But let’s just take a look at the options:

You could decide not to measure anything at all and just trust that your team is being productive. However, I think it’s hard to get around measuring velocity. Stakeholders will always want to know how fast you can get certain features done.

I think the important question is: how quickly can you create value for the business? And what does it mean to create value?

One example of value creation could be that you increase the conversion rate for a specific call to action by 10% or so. That’s something that you can express in financial terms.

Another option is to start measuring those low-level metrics that you mentioned:

  • How many lines of code have been added, edited, or deleted,
  • How much time it takes for tickets to move through different stages in the workflow,
  • How many commits or deployments are there in a certain time frame.

But to be honest, I wouldn’t bother tracking these metrics for developer productivity because they’re completely hackable.

You can easily cheat if you know how you’re being measured. For example, suppose that I tell you that I’m measuring the lines of code that you contribute. You could increase your line count by writing really bloated code, which doesn’t help anyone.

The same principle applies to deployments or commits. You could just increase this metric by adding lots of really small changes. Again, this just adds noise, which isn’t particularly helpful.

On the other hand, these metrics can be helpful in detecting problems.

For example, suppose that one of these metrics suddenly changes dramatically. Let’s say you measure it down to the level of the individual developer. Suddenly, for one developer, there’s a big dip in the number of commits and the amount of code being committed.

That could indicate that she’s being distracted in some way. Perhaps someone is constantly asking her for help, and she suddenly has less time.

There can be many reasons for a sudden change in velocity, but at least these metrics can give you some basic clues about potential obstacles.
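As a rough illustration of using a low-level metric as a warning signal rather than a productivity score, the sketch below flags a week in which commits fall well below the recent average. The function name, the four-week baseline, and the 50% threshold are all assumptions made for this example; they do not come from the interview.

```python
from statistics import mean


def sudden_dip(weekly_commits, baseline_weeks=4, drop_ratio=0.5):
    """Return True if the latest week is far below the recent baseline.

    weekly_commits: commit counts per week, oldest first (hypothetical data).
    baseline_weeks: how many preceding weeks form the baseline (assumed: 4).
    drop_ratio: the latest week counts as a dip if it is below
                drop_ratio * baseline average (assumed: 0.5).
    """
    if len(weekly_commits) < baseline_weeks + 1:
        return False  # not enough history to judge
    # Average of the weeks immediately before the latest one.
    baseline = mean(weekly_commits[-(baseline_weeks + 1):-1])
    latest = weekly_commits[-1]
    return baseline > 0 and latest < drop_ratio * baseline


# A dip is a prompt to ask what is getting in the way, not a verdict.
print(sudden_dip([20, 22, 19, 21, 6]))   # True
print(sudden_dip([20, 22, 19, 21, 18]))  # False
```

A flag like this only says that something changed. Finding out why (interruptions, support requests, a hard problem) still takes a conversation.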

At the end of the day, though, it’s the final outcome that really matters. It’s about how much value your team is delivering against your goals.

By the way, that’s one thing we haven’t really mentioned — it’s fine to measure things, but what do you compare them to?

You need to have some kind of goal or estimate that you’re trying to meet. That way, if you fall behind, you can ask yourself why. Was there really a slowdown in performance, or did you set your targets too high?

What counts as “providing value”?

Merlin: Yes, that’s interesting what you said about anomalies in the metrics and unusual patterns. I think you need to keep a close eye on the data to be able to spot those things. But let’s take a closer look at what counts as “providing value” for a second.

I think, depending on your role, you’re going to have a different perspective on what providing value actually means. For example, I’ve seen some product managers get frustrated at the amount of time spent on technical debt. For them, it’s like a black box — they have trouble fitting it into the list of priorities.

How do you communicate time spent on technical debt in terms of the value that this work provides?

“…you can see how valuable it can be to reduce technical debt…”

Stephan: First off, I would say that reducing tech debt is probably the best investment you can make, although you’ll never be able to get rid of it completely.

Technical debt is just like financial debt. The more debt you accumulate, the more interest you pay. If part of your codebase is messy and complex, that’s technical debt. If you keep building on top of that code instead of refactoring it, you make it even more complex, and you accumulate more debt. At some point, you have to implement a major feature that touches many parts of this messy code.

Then you realize how much interest you’ve been paying.

It takes you 5 times longer to implement the feature — 5 times longer than what it would have taken had you refactored the code when it was still at a manageable size.

So you can see how valuable it can be to reduce technical debt in the long run.

And let’s look at another example: I think all of us know some areas in our products’ codebase that nobody ever wants to touch, because nobody really understands why they were written this way. Code that is impossible to maintain is also technical debt.

And another thing is, if I was a product manager, I would be super interested in resolving stuff like this. I wouldn’t just be thinking of the next feature that needs to be released to customers.

Because it’s also part of the job to make sure that you produce a stable product that’s easy to maintain. And that requires you to invest the time to understand what goes on under the hood.

I mean, it’s a bit like owning a car. You can give it a great paint job, install fancy upholstery, and an amazing car stereo. But if you never take it to the workshop and maintain your engine, it’s going to cost you further down the line.

One day, your car will break down, and you’ll have a great-looking vehicle that doesn’t go anywhere. And the engine parts will be so worn down that it will be extremely expensive to replace them.

It’s very similar to software. You need to take the time to monitor and maintain it so that it’s stable enough for the long term.

What about TPUF (Time to Positive User Feedback) as a metric?

Merlin: Yes, I completely agree with you on that point, but I still wonder how to measure value in a way that’s understandable to, say, a CEO. This reminds me of a scenario that I read about on (which is a blog about tech leadership).

Essentially, an engineering manager was asked by her CEO to assess whether engineers are delivering features fast enough for the needs of the business. And those low-level metrics like deployment frequency were meaningless to him.

So, in the article, the author proposes a metric called “Time to Positive User Feedback”. Essentially, it’s the time it takes for an idea to get implemented and validated by the end user, that is, to get feedback that the idea was a good one, an improvement.

They point out that this metric makes adjacent teams work closer together because you need help from the product team to collect the feedback.

What do you think of using a metric like this to communicate value to the company leadership?

“What about refactorings, improving the system’s stability or performance? Will a user provide positive feedback about this? Probably not.”

Stephan: Well, here’s my perspective:

About the “Time to Positive User Feedback”: if this is a metric that helps your company to develop better, then go for it. But I would still be careful.

If you decide to introduce it, you should definitely define what positive user feedback means.

Is it 1 happy user? 10 users? 50% of users that tried the feature? 90% of the users that tried the feature?

Without a clear definition, you will paint yourself into a corner by not acting in a data-driven way. Instead, the metric will be driven by opinions.
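One way to make such a definition explicit is to write it down as a rule that can be checked against data. The sketch below is only an illustration: the `FeedbackRule` type, its field names, and the example thresholds are hypothetical, not something proposed in the article being discussed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackRule:
    """An explicit, agreed-upon definition of 'positive user feedback'."""
    min_users_tried: int       # smallest sample we are willing to judge
    min_positive_share: float  # e.g. 0.5 = "50% of users that tried the feature"


def is_positive_feedback(positive_users, users_tried, rule):
    """Return True only if the sample is large enough AND the share is met."""
    if users_tried < rule.min_users_tried:
        return False  # one happy user out of one is not evidence
    return positive_users / users_tried >= rule.min_positive_share


# Hypothetical rule: at least 10 users tried it, at least half were positive.
rule = FeedbackRule(min_users_tried=10, min_positive_share=0.5)
print(is_positive_feedback(1, 1, rule))    # False: sample too small
print(is_positive_feedback(6, 10, rule))   # True
print(is_positive_feedback(4, 10, rule))   # False
```

The exact numbers matter less than agreeing on them before the feature ships, so that the result is read from data rather than argued afterwards.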

And you should be careful not to talk only about features. What about refactorings, improving the system’s stability or performance?

Will a user provide positive feedback about this? Probably not (unless your product was previously really slow and buggy).

So you can end up in a situation where the quality of your product decreases because you’ve chosen the wrong incentives.

What I would recommend instead is to create a setup that lets you iterate quickly (and by the way, a high deployment frequency definitely helps here) and that lets you conduct a lot of experiments to prove or invalidate the hypotheses of your product teams.

In a nutshell: think about the consequences of introducing a metric that hasn’t been used before.

Can you predict a team’s future performance?

Merlin: Yeah, I think people have a lot of great ideas for metrics that sound really nice in writing. But when it comes to actually setting up a process to quantify them and to measure them accurately, you end up discovering that they’re really, really hard to measure.

But now, I just want to circle back to the car metaphor that you used just before. I actually want to expand on it.

So, nowadays, cars have a lot of diagnostic features and sensors. You can even use this data to do predictive analytics. A car manufacturer can say, okay, based on the telemetry data we’ve analyzed, we think your car is due for a checkup in the next month. The transmission performance is slowly degrading, or whatever. Do you think you could do the same for a team or a piece of software?

“…there is a key question that you always need to ask: What exactly do you want to know?”

Stephan: You can totally do the same thing with applications. Remember the podcast we did on observability? I think we talked about the situation where the request time for an API slows down by a few milliseconds each week. By the end of the year, it slowed down by 250 milliseconds.

This is probably still within an acceptable range, but it offers a clue that you might need to do some maintenance on your application. Some part of your application probably needs a tune-up; otherwise, it might suddenly break when you least expect it.
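A slow drift like this is easy to miss week to week but shows up clearly as a trend. The sketch below fits a straight line through weekly latency samples and projects the slowdown over a longer period. The function names and the sample data are invented for illustration, and a real observability setup would usually provide trend alerts of its own.

```python
def weekly_drift_ms(weekly_latency_ms):
    """Least-squares slope of latency against week index, in ms per week."""
    n = len(weekly_latency_ms)
    if n < 2:
        raise ValueError("need at least two weekly measurements")
    x_mean = (n - 1) / 2
    y_mean = sum(weekly_latency_ms) / n
    num = sum((x - x_mean) * (y - y_mean)
              for x, y in enumerate(weekly_latency_ms))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def projected_slowdown_ms(drift_per_week, weeks=52):
    """Total slowdown if the current drift continues unchanged."""
    return drift_per_week * weeks


# Hypothetical samples: request time grows by 5 ms every week.
samples = [120 + 5 * week for week in range(10)]
drift = weekly_drift_ms(samples)
print(round(drift, 2))                             # 5.0
print(round(projected_slowdown_ms(drift, 50), 1))  # 250.0
```

With a drift of about 5 ms per week, the slowdown reaches roughly 250 ms after 50 weeks, which is the order of magnitude mentioned above.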

But we’re still just talking about technical debt here. If we go back to the wider measuring of developer productivity, I think there is a key question that you always need to ask: What exactly do you want to know?

Often, people just want to understand more about what the engineering team is doing. After all, engineers are quite expensive from a cost perspective. So management wants to know if they’re doing meaningful things that increase value. I think answering that question is more important than measuring velocity or productivity on an individual level.

You need to be able to trust that your team will make good choices and get things done. Rather than measuring them on an individual level, it’s better to focus on providing them with the ideal conditions for being productive, and removing any roadblocks that stand in their way.

But also hold them fully accountable for the results.

Is it actually possible to say: ‘this refactoring will … save us €50,000 in the next quarter’?

Merlin: Sure, but saying you want to increase value also implies that you need to measure the value in some way. So, sure, things like conversions are easy to measure, but there are probably some “value-adding activities” that are a little more abstract and harder to quantify.

For example, we’ve talked a lot about technical debt, but we still haven’t actually explained how to quantify it in terms of value. Is it actually possible to say, “this refactoring will reduce our technical debt by 30% and save us €50,000 in the next quarter”?

“…you as a manager need to set goals that are measurable”

Stephan: If you say, “we’re going to reduce our technical debt by 30%”, doesn’t that imply that you know how much technical debt you have?

I think it’s hard to know whether you have a full understanding of your technical debt. Very often, you have a list of technical debt “items” that you’ve created during the development process, and you can maybe use [this list] as a foundation to get to this number.

I think it is easier to calculate the potential business value of reducing technical debt. You can see this when observing developers who are working on code that includes technical debt. They’re slower to complete features and other tasks.

And you can easily calculate the costs for one hour of a developer’s time. You take the annual salary and divide it by the number of working days, and then by the number of hours per day. So in the end, you get a number that tells you, okay, for each hour that the developer spent working on this specific area or feature, it costs me ‘X’ amount of money.
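The hourly-cost calculation can be written out in a few lines. The default of 220 working days and 8 hours per day, and the salary in the example, are placeholder assumptions; a real calculation would typically use the fully loaded cost of an employee rather than the bare salary.

```python
def cost_per_hour(annual_salary, working_days_per_year=220, hours_per_day=8):
    """Annual salary divided by working days, then by hours per day.

    The defaults (220 days, 8 hours) are illustrative assumptions only.
    """
    if working_days_per_year <= 0 or hours_per_day <= 0:
        raise ValueError("working days and hours per day must be positive")
    return annual_salary / (working_days_per_year * hours_per_day)


# Hypothetical figures: 88,000 per year, 220 working days, 8 hours a day.
print(cost_per_hour(88_000))  # 50.0
```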

Ideally, you can also estimate things that you could do but aren’t possible right now. Maybe you have an area that you would like to improve, an issue that prevents you from implementing a feature that is highly requested by customers.

In that specific case, you have opportunity costs, because the feature hopefully has a business value as well: you assume that it’s going to increase the conversion rate by ‘X’ percent, which can then be measured in monetary terms. Given that value, the opportunity costs can be related back to the additional time that is needed to implement the feature.

The last thing that you could measure is how technical debt causes quality problems. You have more bugs because things are more interconnected and more complex. People don’t understand them, and in the end, you deploy those bugs to production.

So you could also say, “look, I’m losing customers, or my conversion rate drops because I actually deployed a bug in production”. So by removing the root cause of the bug, which is the technical debt, you’re going to avoid this.

And so you also avoid potential losses. So these are some metrics that you could use to actually figure out, okay, how much money could I save.

By the way, I’m not really sure that I finished explaining why you should measure developer cost per hour.

So the idea here is to say, okay, we mentioned this 5x increase in development time.

Let’s say you assume that a specific part [of the code] slows your developers down by 50% or 20%. You can actually say, ‘okay, if I could just save half of that time going forward… I can just multiply that by my cost per hour, and then I have an immediate saving…. which I can then take as my value creation justification for refactoring technical debt’.
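To make that arithmetic concrete, here is a small sketch of the saving estimate. The function and parameter names are made up for this example, and every number in it is hypothetical; the slowdown share in particular is an assumption you would have to estimate yourself.

```python
def refactoring_saving(hours_in_area, slowdown_share, recovered_share,
                       hourly_cost):
    """Estimate the money saved by refactoring one area of the codebase.

    hours_in_area:   developer hours spent in that area over the period.
    slowdown_share:  assumed fraction of those hours lost to technical debt
                     (e.g. 0.5 or 0.2).
    recovered_share: fraction of the lost time the refactoring wins back
                     (0.5 = "save half of that time").
    hourly_cost:     cost of one developer hour.
    """
    wasted_hours = hours_in_area * slowdown_share
    return wasted_hours * recovered_share * hourly_cost


# Hypothetical: 400 hours in the area next quarter, 50% slowdown,
# half of it recovered, 50 per developer hour.
print(refactoring_saving(400, 0.5, 0.5, 50.0))  # 5000.0
```

The result is only as good as the slowdown estimate, but it gives a number that can be compared with the cost of the refactoring itself.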

And generally, that’s why you as a manager need to set goals that are measurable. For example, an easy way to add value would be to reduce your infrastructure costs by 20%.

You could set that as a goal. If you’re a larger company, you could then hire an engineer to focus on achieving that goal.

And if they’re successful, the investment in the extra headcount would quickly pay off. And after that, the new engineer can go on to create more value in another part of your application.

So it doesn’t help to set a low-level goal like “number of tickets processed” or time spent in the “In progress” column or whatever. That’s just micromanagement.

And if you’re really forced to micromanage your team, you might need to ask yourself if you’ve really hired the right people in the first place.

Merlin: Yes, great point. I feel like we could have a much longer discussion about this, but we should probably try to stick to our OWN goal, which is to keep our discussion under 20 minutes.

But anyway, if anyone out there has any other ideas on the subject of measuring team productivity, feel free to comment. We’d love to hear how other development teams are tackling this problem.

Thanks, Stephan, for your time, and I look forward to talking to you again soon.

Stephan: Same here — thanks, Merlin!