I recently spent about 9 months working on an Agile transformation initiative for a global financial company. One thing they asked for multiple times was metrics they could use to measure the productivity of their software teams. They wanted objective measures to prove their Agile teams were more productive than their "classic" teams. They also wanted to measure which teams were their "top performers" and which weren't.
I'm of the belief (counter to the belief of many technical managers) that the question is difficult to impossible to answer. Most "productivity" metrics...aren't. They're measuring something else.
This isn't an "Agile" vs. "Waterfall" metrics problem - it's a general software industry problem.
Before I go further with this, what's productivity, anyway? I'm sure there are half a dozen potential definitions we could choose. But for the purposes of this article, I'm defining "productivity" as "the speed at which a team delivers working, tested code that meets the customer's need over time." In other words, "how fast can we write high-quality code?" Note that I'm explicitly leaving measures of the BUSINESS VALUE of the code out of that definition - this is deliberate (I'll revisit it later).
So, that seems simple enough - why would that be "impossible" to measure? There are a number of reasons, but the problem basically boils down to the lack of a consistent and objective metric. What are the "units" of productivity? There are a number of candidates that are commonly used, and most of them are, in my opinion, wrong.
Most candidate "productivity" metrics fall into one of three traps. Some measure teams against their estimates, not against an "objective" measure. Some measure the wrong thing. Some conflate software productivity with other factors that are often outside the development team's control.
Measuring Teams Against Their Own Estimates
This is the most common "wrong" way to measure productivity - to mistake how a team does against its own estimates for how the team is doing against an "objective" scale.
There's a simple test for this issue for me. Imagine the team was whisked away and replaced by a new group of folks that was only half as productive in an "objective" sense. At the same time, all your estimates were doubled. Would your metrics notice this?
One metric that fails this test is everyone's favorite Agile metric - Velocity. Velocity measures how many story points/ideal hours/tasks/whatever a team completes per unit time. Sounds like a measure of productivity, right?
Velocity (while it can be horribly abused) is a useful metric that's "productivity like." It allows a team to predict how much of their outstanding work will be accomplished in a given time period. It allows them to set an expectation on when a given amount of work will be done.
But it fails as an "objective" measure of productivity. First, in most Agile environments, the team "owns" its scale. Story points especially don't translate well between teams. Ah, but what if we use something less nebulous and more tied to "real" time, like ideal hours? That doesn't solve the problem, because each team is estimating in hours FOR THAT TEAM (a skilled team might estimate a story at 20 hours that a less skilled team would estimate at 40 hours). OK, but what about more lean concepts like counting stories or tasks? Aren't those supposed to be objectively "right sized?" I'd contend they suffer from the same problem - it's just hidden a bit. Many lean teams I've seen want to break stuff down to the "half day - to - two day" level. But who's making the decision on whether an item meets that scale? Again, a task a "more skilled" team might feel is a single-day task, a "less skilled" team might consider a three-day item that needs to be split.
The REAL productivity difference between teams is masked by the teams' estimation process.
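That masking is easy to see with a toy calculation. Here's a minimal sketch (all numbers are hypothetical) of the thought experiment above: a substitute team that is objectively half as fast, but estimates every story at twice the points, reports exactly the same velocity.

```python
def velocity(stories_per_iteration, points_per_story):
    """Story points completed per iteration."""
    return stories_per_iteration * points_per_story

# Original team: finishes 10 stories per iteration, estimated at 3 points each.
original = velocity(stories_per_iteration=10, points_per_story=3)

# Substitute team: objectively half as productive (5 stories per iteration),
# but estimates each of those same stories at 6 points.
substitute = velocity(stories_per_iteration=5, points_per_story=6)

print(original, substitute)  # 30 30 - velocity can't tell the teams apart
```

Velocity still works fine for what it's designed for - forecasting a team's own future throughput - it just can't rank teams against each other.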
Wouldn't Velocity "work" if we had an objective, shared estimation scale that all teams understand and that all teams use? Then a 4-point story for Team A is "objectively the same" as a 4-point story for Team B.
In principle, yes. In practice, for any organization of reasonable size, you've replaced one Herculean task (measuring productivity objectively) with another (getting everyone to agree on how much work some set of "reference" items is). You've also created a highly gameable system - what's to prevent teams from "overestimating" their stories to make themselves seem more productive? You get what you measure. I've never seen an organization successfully "standardize" the definition of a point/ideal hour/task across a large number of teams.
What about Earned Value, the waterfall measure I've seen most often held up as an "objective" measure of productivity? Heck, it's even measured in dollars. How could you get more objective than that?
Earned Value actually suffers from the exact same problem. For those not familiar with Earned Value, it is defined (slightly simplified) as the "Budgeted Value of Work Completed." In other words, we plan out and budget costs for all the work in the project. We divide that work into "achievable" chunks with metrics for when we can call each complete. We then look at how many chunks are completed (by our definition), look at how much each was budgeted to cost, and add those up for how much value we've "earned". Earned Value is a useful financial metric - it's good for understanding whether we're getting the expected return on the money we're spending to run the project.
Did you spot the problem with using Earned Value as a productivity metric? It's that Earned Value is based on the BUDGETED cost of the work. In other words, it's based on the team's estimate of the cost, which for most items on a technology project maps pretty linearly to a TIME estimate (multiplied by how much that developer costs per hour).
Again, consider my thought experiment of substituting a less productive team - one that moves half as fast - for a more productive one. When that team budgets (assuming the cost per person is the same), they'll think every task takes twice as long, so costs twice as much. Over time, the "substitute" team will deliver half as many "chunks," but each chunk will be twice as "valuable" in terms of budgeted cost.
Now, granted, doubling the cost estimate of a project has other consequences. A project with a higher cost estimate is much more likely to get killed before it starts, because the ROI isn't there.
But purely as a measure of PRODUCTIVITY, Earned Value is problematic, because we're comparing a team's output to an estimated value of that output, NOT to some notional invariant quantity.
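The same substitution can be sketched numerically (the rate and hours below are hypothetical). Because the budget comes from the team's own time estimates, a team that is half as fast but budgets twice the hours per chunk "earns" identical value per month:

```python
HOURLY_RATE = 100  # assumed cost per developer-hour

def monthly_earned_value(chunks_completed, budgeted_hours_per_chunk):
    """EV = budgeted cost of the work actually completed."""
    return chunks_completed * budgeted_hours_per_chunk * HOURLY_RATE

# Original team: completes 4 chunks a month, each budgeted at 40 hours.
original_ev = monthly_earned_value(4, 40)

# Substitute team: half as fast (2 chunks a month), but budgets each
# chunk at 80 hours because everything takes them twice as long.
substitute_ev = monthly_earned_value(2, 80)

print(original_ev, substitute_ev)  # 16000 16000 - EV is identical
```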
Measuring The Wrong Thing
There have been a number of attempts at various times to measure team "output" on external scales that are certainly objective, but aren't well correlated to "productivity" by our definition.
One example (which fortunately few companies now use) is "lines of code." A developer who produces twice as many lines of code as another developer must be twice as productive, right?
The attraction of such metrics is that they're easy, that the correlation to "productivity" is intuitive, and that they're easy to understand.
However, such metrics are deeply flawed for multiple reasons.
First, they assume "more code is better." Especially in an object-oriented world, there are some problems that are literally solved best by REMOVING code rather than adding it. Penalizing a team for removing code is a bad idea. Similarly, because such metrics reward verbosity, you're incentivizing people to write three lines where one will do. This creates hard-to-read, hard-to-maintain code.
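As a hypothetical illustration of that verbosity incentive: the two functions below behave identically, but a lines-of-code metric scores the second one several times higher.

```python
def is_adult_concise(age):
    # One line of logic.
    return age >= 18

def is_adult_verbose(age):
    # Same behavior, four lines of logic - "more productive" by LOC.
    if age >= 18:
        return True
    else:
        return False
```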
Second, such metrics tend to ignore quality. If you're measuring (and presumably rewarding) teams based purely on output, you're giving them a green light to skip stepping back and considering quality. You're telling them to move on to the next thing, and any bugs are "QA's problem."
Finally, most metrics are problematic across technologies. The metrics I've seen have the average Java program being 5-10 times longer than a comparable Ruby program that performs similar tasks. That doesn't make Java programmers 5 times as productive.
Conflating Productivity With Factors The Team Can't Control
One idea I hear frequently for Agile metrics is to measure teams based on value delivered, not "productivity." In other words, measure teams based on the dollar value of their delivered code to the business.
From a philosophical perspective, I very much agree with this. One of the most powerful concepts in Agile is the ability to deliver the most valuable pieces first. If you're doing Continuous Delivery, you can in theory capture the value from software almost the moment it's built. Measuring value delivery is a powerful Agile concept.
However, there are a few problems with using "Value delivery over time" as a substitute metric for "Delivery of working, tested software over time."
First, Value is hard to estimate and measure. We can certainly ask our product owners to put a notional dollar value on every user story. Whether this is remotely accurate is another question - how much per year is having a "forgot password" link on your login page worth?
There's also an assumption of linearity - that each feature/story/work item delivers value independently. Realistically, features are often "more useful" in conjunction with other features. The value of a given piece of delivered functionality will increase over time as it gains more collaborators. This makes the math complex.
You could solve the "value is tough to estimate" problem by measuring value DELIVERED - how much more money are we making from the product now than we were before? But this only works if you're building something that can deliver continuously and measure well (like an e-commerce site), as opposed to something with "lumpy" delivery, or that's hard to measure (like control software for embedded brake systems in automobiles).
Even when it's measurable, "delivered value" is problematic, though. Is your e-commerce team really most productive in November and December, or is that the run-up to Christmas?
Even if we could solve the problems of "how to measure value delivery accurately," is it a useful measure for determining the question we started with, which is "is Team A more productive than Team B, in terms of their ability to deliver high-quality code over time?"
Still no, especially in large organizations. This is because a.) not all projects are created equal, and b.) teams in general do not choose their projects. Let's say Team A is building a new e-commerce portal that's expected to generate $10 million per year in sales. Team B is building a replacement for the internal time tracking system that's estimated to save the company $1 million per year in licensing fees for the current vendor solution. Both projects are expected to last 12 months.
Is Team A ten times as productive? Assuming both project teams finish on time, they've delivered ten times as much value. If our goal is to assess which team is more "productive," it's problematic to bake into that number that one team "lucks into" a highly profitable project, while the other team is "stuck" on a less profitable one.
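Put as arithmetic (using the figures from the example above), the "value delivered" ratio is entirely a property of project selection, not of either team:

```python
# Both projects last 12 months; both teams finish on time.
team_a_value = 10_000_000  # e-commerce portal: $10M/year in new sales
team_b_value = 1_000_000   # time-tracking replacement: $1M/year saved

ratio = team_a_value / team_b_value
print(ratio)  # 10.0 - nothing in this number reflects either team's skill
```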
Again, I see measuring value delivery as a highly useful tool. Organizations would do well to focus on looking at their software groups as "value delivery centers," and doing everything they can to maximize the speed at which they deliver value. But we need to recognize that such metrics are problematic as "job performance" metrics for teams.
So, what do we do about it?
That's an excellent question, and one I don't have an easy answer to.
If nothing else, we (as the software industry) need to start by realizing that this is a hard problem, and that the metrics we can gather are imperfect. So we need to be careful about how we use them. Rewarding or punishing teams based solely on factors they can't control, or on metrics they don't believe truly correlate to their job performance, is a great way to break trust with your teams.