In Defense of Estimating with Smaller Numbers

Tags: agile, scrum, estimating, fibonacci

Recently, Mike Cohn blogged about his disappointment with how some teams, upon receiving a new deck of planning poker cards, immediately toss out the high-value cards: 20, 40, 100.  For the uninitiated, Planning Poker is a tool for group estimation where Product Backlog Items (PBIs) are assessed a point value based on a modified Fibonacci sequence:

 

While Planning Poker was originally created by James Grenning, Mike is largely responsible for popularizing this approach to group estimation, and it features prominently in his book, Agile Estimating and Planning, and in his Scrum certification courses.

In brief, Planning Poker is played in rounds, with each team member simultaneously "playing" a card with the point value they think best represents their assessment of the size, complexity, uncertainty and risk of a particular user story or feature.  Higher numbers predictably reflect greater discomfort and less understanding.

In subsequent rounds, stories or features are assessed relative to one another to speed things up, i.e. if Story A was assessed at 8 points and Story B is slightly less complex or risky, the team may elect to assess it at 5 points. The reasoning for using points instead of hours is a well-explained topic which I won't cover here - suffice it to say that it helps teams begin to get a handle on what they can accomplish in an iteration over time.

Looking at the illustration above of Mike's deck, it certainly presents a bewildering array of choices for a team:  It's like an estimating Swiss Army Knife. In my experience, irrespective of deck size, teams tend to coalesce their estimates within the following scale:

 

As they become less certain and want to express this in the backlog, they move further up the scale:

 

 

21-point stories tend to be those that the team feels are not quite ready for prime-time or haven't matured enough to be reliably estimated.  They may be too vague, or the team may not possess enough technical or business domain knowledge to break them down.

But what about going beyond this? Into the realm of 40, 100 and 'infinity' cards?  What distinguishes between them?  Personally, I've found that teams tend to play these cards as a form of comic relief - the stories are just so complex, so improbable as to be beyond any sense of reliable conveyance.  Often I've been asked if I could include a "WTF?" card to play in these situations.

Basically, when the team plays these cards it's a cry for help:  They haven't the foggiest how to deal with the story in its present form, so it should either be decomposed or sidelined until they can become more familiar with what's required to deliver it.  However, this raises this question:  Why do we have these cards in the series?

These Amps Go To 11

In the famous heavy metal rock band mockumentary, This is Spinal Tap, there is a well-known scene where band guitarist Nigel Tufnel tries to impress an interviewer with how loud his guitar amplifiers are because they can be turned up one more "notch" than standard amplifiers which only go to "10".  When the interviewer asks him why not make 10 the loudest setting,  a bewildered Tufnel replies: "These go to 11."

 

While hilarious, Nigel's attachment to his amps illustrates the same cognitive bias that underpins the 40, 100 and infinity estimation buckets:  At a certain point both the "loudness" of the amps and the uncertainty and risk of an estimate aren't better represented by inconsequential placeholder values:  It's all relative.  And that's the point.

Ambiguous Precision

In his post, Mike argues that while not always used, these upper-scale estimates should be retained because to do otherwise "is like deciding to strike 'millions' and 'billions' from our vocabulary just because our bank balances are in the thousands".  This "ambiguous precision" is necessary to provide extremely coarse-grade estimates that may not be taken to the bank, but can help to inform whether a release will land in this quarter or one further down the road depending on the customer's tolerance for error.

Perhaps - but I'm inclined to disagree because once we get into the 'millions' and 'billions' in our estimates they begin to mean less and less to us and our customers because we are more concerned with the "thousands".  By retaining the additional "zeros" we're not adding any clarity, and indeed it becomes somewhat semantic when we start to try and delineate the difference between a 40, 100 and infinite point story - how many angels can dance on the head of a pin?

What we need to bear in mind is that these are estimates. As Barry Boehm and Steve McConnell pointed out decades ago with their famous Cone of Uncertainty graph, estimates are least reliable the further out from implementation they're devised:

Rationalizing that we need grossly ambiguous numbers to forecast distant future releases reads a little too much into why we estimate with a non-linear point scale in the first place.  It's not about any degree of precision - we just need to convey that the further in the future we try to look ahead, the more likely we are to be really, really wrong.

In his paper introducing Planning Poker, James Grenning notes:

As the estimates get longer, the precision goes down. There are cards for 1,2,3,5,7,10 days and infinity. This deck might help you keep your story size under 2 weeks. Its common experience that story estimates longer than 2 weeks often go over budget. If a story is longer than 2 weeks, play the infinity card and make the customer split the story.

Similarly, in his 2005 book, Mike suggests:

Because we are best within a single order of magnitude, we would like to have most of our estimates in such a range.  Two estimation scales I've had good success with are:

1, 2, 3, 5 and 8 [Fibonacci] 

1, 2, 4 and 8

In other words, a team's confidence in their estimates reflects an inverse relationship: as uncertainty and risk go up, reliability and precision go down.  This is the intended consequence of using a non-linear point scale:

 

However, there is a point of diminishing returns that occurs when we add more numbers to the scale - it becomes increasingly difficult and consequently pointless to assess confidence.  While we could argue that a 100-point story is an order of magnitude more complex and uncertain than a 40-point story, or two orders of magnitude more complex than a 21-point story, it doesn't convey what's more important:  Irrespective of the value, it's a signal of diminishing confidence from the team.

In my experience, the early warning signal is tripped when I see 21-point stories starting to appear in a Product Backlog that's been prioritized for an upcoming Sprint Planning session, indicating that as far as the team is concerned, the feature may not be in a reliable state for inclusion in the Sprint.  This doesn't mean it's an automatic out, but rather that it needs further discussion and a shared understanding before committing - sometimes the high estimate is tossed out just to allow the team to push back on being overloaded.  This is an indication from the team that they need assistance.  The story could remain on the backlog, or become a candidate for refining and decomposing.  In any event, it is a signal.

However, even with a "21" point story, teams usually have some idea of what to do.  What about features that are added to the backlog as rough ideas?

Introducing The 'X' Factor:

In mathematics, the first three to four letters of the alphabet are designated for known quantities while the last three are reserved for unknown quantities.  In keeping with this tradition, I'd like to suggest an alternative estimate signal for features and stories that teams cannot reliably estimate with reasonable precision: x.

 

By replacing the 40, 100 and infinity cards with an 'x' card, teams have options for how they want to flag stories that need attention:

  1. They can assign 'x' an agreed-upon value, e.g. 50 or 100, if it makes everyone feel better having this ambiguous certainty, or;
  2. They can leave 'x' as an explicit signal that a story isn't developed enough for inclusion in a Sprint and requires more work to either break down, refine acceptance criteria or even provide a definition of 'done', or;
  3. They can indicate stories for exclusion from the backlog.
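
The three options above can be sketched in a few lines of Python. This is only an illustration under assumptions: the Story type, the resolve_x function and the policy labels ("assign", "flag", "exclude") are hypothetical names invented here, not part of any planning poker tool.

```python
from dataclasses import dataclass
from typing import Union

X = "x"  # the proposed 'x' card: "cannot be reliably estimated yet"


@dataclass
class Story:
    title: str
    estimate: Union[int, str]  # a point value, or X


def resolve_x(stories, policy, agreed_value=None):
    """Apply one of the three options to every 'x'-estimated story.

    policy is a hypothetical label for the option the team chose:
      "assign"  - option 1: replace 'x' with an agreed-upon value
      "flag"    - option 2: keep 'x' as a signal that more work is needed
      "exclude" - option 3: drop the story from the backlog
    """
    result = []
    for story in stories:
        if story.estimate != X:
            result.append(story)  # ordinary estimates pass through untouched
        elif policy == "assign":
            if agreed_value is None:
                raise ValueError("option 1 needs an agreed-upon value")
            result.append(Story(story.title, agreed_value))
        elif policy == "flag":
            result.append(story)  # 'x' stays visible on the backlog
        elif policy == "exclude":
            continue  # the story leaves the backlog
        else:
            raise ValueError(f"unknown policy: {policy}")
    return result


# Made-up backlog for illustration only.
backlog = [
    Story("Login page", 5),
    Story("Reporting engine", X),
    Story("Password reset", 3),
]

for policy in ("assign", "flag", "exclude"):
    resolved = resolve_x(backlog, policy, agreed_value=50)
    print(policy, [(s.title, s.estimate) for s in resolved])
```

Whichever option a team picks, the 'x' story is handled explicitly rather than hidden behind a large number.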

In all cases, seeing an 'x'-estimated story suggests an immediate and natural conversation needs to occur about what to do next.  And this cuts to the core of why we use story cards and planning poker instead of byzantine use cases and Gantt charts:  They are cues for continual conversations about what needs to be built.

With respect to the forecasting quandary that Mike relates in his blog post, given that any estimate in the upper echelons is incredibly speculative to begin with, 'x' can be assigned a raw value and added accordingly.  However, my preference would be to discuss the situation with the customer:  If there is a preponderance of 'x'-factor stories on the backlog that the team and Product Owner aren't able to delineate, we can only reliably forecast against those which we can reliably estimate.

Anything else is pure conjecture - and needs to be explicitly stated.
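
As a rough sketch of that preference in Python: total only the stories that carry a real estimate and report the 'x' stories separately, so the conjecture is stated rather than buried in the sum. The backlog contents, the forecast function and the velocity figure below are all made-up assumptions for illustration.

```python
import math

X = "x"  # the proposed 'x' card


def forecast(backlog, velocity):
    """Forecast sprints for estimated stories only; list 'x' stories apart.

    backlog  - list of (title, estimate) pairs, estimate being an int or X
    velocity - points the team completes per sprint (an assumed figure)
    """
    known_points = sum(est for _, est in backlog if est != X)
    unknown = [title for title, est in backlog if est == X]
    sprints = math.ceil(known_points / velocity)
    return sprints, known_points, unknown


# Hypothetical backlog and velocity, for illustration only.
backlog = [
    ("Login page", 5),
    ("Search", 8),
    ("Reporting engine", X),
    ("Audit trail", 13),
    ("Mobile app", X),
]

sprints, points, unknown = forecast(backlog, velocity=10)
print(f"{points} points, roughly {sprints} sprints")
print(f"Not forecast ({len(unknown)} 'x' stories): {', '.join(unknown)}")
```

With these numbers the forecast covers 26 points in about 3 sprints, and the two 'x' stories are called out as unforecast instead of being padded into the total.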


What do you think?  Do you agree with using 'x' as a replacement for the 40, 100 and infinity buckets?  Is there a better sequence?
