Agile Estimation Theatre

We’ve all experienced what I like to call Agile Estimation Theatre. Imagine it’s the beginning of the sprint and your team is holding a sprint planning meeting. The project manager has put together a set of stories, a few of which have subtasks. You are about to spend the next two hours trying to estimate each of them.

The project manager starts by reading the first story. Someone asks whether ECS or EC2 is the better solution for the problem, which devolves into a 20-minute conversation about the merits of various AWS services. The project manager has to cut the discussion short to have any chance of estimating all the stories slated for the sprint. You open your planning poker tab and weigh your options: 1, 3, 5, 7, 13, 21, 34, 55. After a moment you settle on 13, since you are sure the task will take at least a day or two. One of your coworkers then asks whether the numbers are days or hours. That question leads to a 10-minute discussion of what exactly story points are and how they are definitely not a unit of time. The team’s votes range from 5 to 34. The project manager then overrides the score to 21, since that’s what the senior engineer picked.

The two hours drag on. By the blessed end of the meeting your team has ‘estimated’ 4 out of 20 single-sentence stories. Since the sprint starts tomorrow, the project manager holds a quick meeting with the senior engineer to estimate the rest of the stories before 5pm.

That is Agile Estimation Theatre. We have a sprint planning meeting where we’re supposed to involve the entire team in estimation, so we schedule a two-hour meeting and hope for the best. Sometimes it actually works, but often we end up with questionable estimates for single-sentence tasks.

The fundamental problem with Agile Estimation Theatre is that we don’t invest enough time. Good estimates require complete stories: subtasks broken down and specified more thoroughly than a single sentence can possibly manage. Then your team needs time to review the stories individually and come up with estimates. The trouble is that it doesn’t make sense to spend several days of a two-week sprint on estimation. Building out well-specified stories will take at least a few engineer days. Next, the entire team spends two hours each estimating the stories individually. Finally, you add the individual estimates together and hold the sprint planning meeting. Then you repeat the whole process every two weeks.

The core issue is that the expected value of the estimates is typically less than the cost of spending a few days producing good ones. The organization intuitively knows this, which is why we spend two hours in sprint planning instead of two days. But because we want to do Scrum or agile ‘properly’, we need estimates. The end result is that we denature the estimates to comply with the process.
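To put rough numbers on the trade-off, here is a back-of-the-envelope sketch. The team size, the three engineer days spent building out stories, the eight-hour day and the ten working days per sprint are all assumptions chosen for illustration. They are not figures from a real team.

```python
# Back-of-the-envelope cost of "doing estimation properly" each sprint.
# Every input below is an illustrative assumption, not a measurement.

HOURS_PER_DAY = 8                 # assumed working hours per engineer day
SPRINT_DAYS = 10                  # two-week sprint, assumed 10 working days
TEAM_SIZE = 6                     # assumed number of engineers

story_prep_days = 3               # "at least a few engineer days" to build out stories
individual_estimation_hours = 2   # each engineer estimates the stories alone
planning_meeting_hours = 2        # the whole team in the sprint planning meeting


def estimation_cost_hours() -> float:
    """Total engineer-hours spent on estimation in one sprint."""
    prep = story_prep_days * HOURS_PER_DAY
    individual = TEAM_SIZE * individual_estimation_hours
    meeting = TEAM_SIZE * planning_meeting_hours
    return prep + individual + meeting


def sprint_capacity_hours() -> float:
    """Total engineer-hours available in one sprint."""
    return TEAM_SIZE * SPRINT_DAYS * HOURS_PER_DAY


cost = estimation_cost_hours()
share = cost / sprint_capacity_hours()
print(f"{cost:.0f} engineer-hours per sprint ({share:.1%} of capacity)")
```

Under these assumptions the thorough process costs 48 engineer-hours per sprint, about 10% of capacity. The two-hour planning meeting on its own costs 12 engineer-hours (2.5%).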

Request for stack 2024

Every few years it’s worth asking, “What is the best stack for building new software?” Lately I’ve been thinking about the resurgence of vertical scaling over horizontal scaling. If your app is only going to serve millions of requests per day, there is no reason not to scale it vertically. The days of 2 GB RAM, 1 vCPU servers are long gone. Now you can get terabytes of RAM and thousands of virtual CPUs.
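A quick conversion shows why “millions of requests per day” is a modest load for one large server. The sketch below turns a daily volume into an average request rate. The peak-to-average factor of 10 is an assumption for illustration, since real traffic is rarely uniform across the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day: int) -> float:
    """Average requests per second for a given daily volume."""
    return requests_per_day / SECONDS_PER_DAY


def peak_rps(requests_per_day: int, peak_factor: float = 10.0) -> float:
    """Rough peak rate, assuming (hypothetically) peaks run at
    peak_factor times the daily average."""
    return average_rps(requests_per_day) * peak_factor


for daily in (1_000_000, 10_000_000, 100_000_000):
    print(f"{daily:>11,} req/day -> {average_rps(daily):7.1f} avg rps, "
          f"~{peak_rps(daily):8.1f} peak rps")
```

Even 10 million requests per day averages out to roughly 116 requests per second.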

https://medium.com/@fengruohang/postgres-is-eating-the-database-world-157c204dcfc4
https://motherduck.com/blog/big-data-is-dead/

My prediction is that, going forward, AI will consume 99% of compute spend, with conventional application code consuming only 1%. With those things in mind, let’s look at the stack.

Application Server 

  • Rails if it has a UI
  • Java if it does not

Cache 

  • Redis

Database 

  • Postgres

Analytics Database

  • Postgres

Each ‘cell’, to use Amazon’s terminology, consists of one application server, one cache and one database: just three servers. For reliability, set up another cell in a different availability zone. In total you have six servers.
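The cell layout can be sketched as a small data model. The `Cell` class, the role names and the availability zone labels (`us-east-1a`, `us-east-1b`) are illustrative assumptions. This only models the topology and does not provision anything.

```python
from dataclasses import dataclass, field

# Hypothetical model of the "cell" layout: one app server, one cache and
# one database per cell, with one cell per availability zone.

ROLES = ("app", "cache", "db")  # Rails/Java, Redis, Postgres


@dataclass
class Cell:
    availability_zone: str
    servers: list[str] = field(init=False)

    def __post_init__(self) -> None:
        # Each cell gets exactly one server per role.
        self.servers = [f"{role}-{self.availability_zone}" for role in ROLES]


def build_deployment(zones: list[str]) -> list[Cell]:
    """One independent cell per availability zone."""
    return [Cell(zone) for zone in zones]


cells = build_deployment(["us-east-1a", "us-east-1b"])
total = sum(len(cell.servers) for cell in cells)
print(f"{len(cells)} cells, {total} servers")
for cell in cells:
    print(cell.availability_zone, cell.servers)
```

Adding a third cell for another zone is one more entry in the zone list. Each cell stays self-contained.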

For someone who has run projects with hundreds of cloud ‘servers’, six seems rather few. But with today’s cloud we can spread 1,000 vCPUs across 100 ‘servers’ or across six. And the application built on only six will perform better, since far fewer of its calls have to cross the network.

Today’s default microservice architecture is useful for improving deployment cadence in large teams. But it no longer provides scaling advantages over old-school vertical strategies. If your project only includes ~40 servers with 2 vCPUs each, why not just use three servers? 80 vCPUs is 80 vCPUs, whether you draw 40 boxes or three boxes around them.
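The arithmetic behind “80 vCPUs is 80 vCPUs” is trivial, but writing it out makes the point explicit. The three-server split below (32 + 32 + 16 vCPUs) is a hypothetical sizing chosen to match the total. It is not a recommendation for specific instance types.

```python
# Hypothetical layouts: 40 small servers vs. 3 large ones.
many_small = [2] * 40        # 40 servers with 2 vCPUs each
few_large = [32, 32, 16]     # 3 servers sized to the same total (assumed split)


def total_vcpus(layout: list[int]) -> int:
    """Total compute, regardless of how many boxes it is split across."""
    return sum(layout)


def box_count(layout: list[int]) -> int:
    """Number of boxes drawn around the same compute."""
    return len(layout)


for name, layout in (("40 small", many_small), ("3 large", few_large)):
    print(f"{name}: {total_vcpus(layout)} vCPUs on {box_count(layout)} boxes")
```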

Unless you are using 1,000+ vCPUs, you might want to consider using just one box for your entire application.

A clear sign you are overdoing microservices

Fine-grained services

Microservices have been the thing for over 15 years. They are great in large companies with CI/CD environments. But as your situation drifts further from the ideal microservice use case, traps abound.

Building a new service for one endpoint 

If you find yourself in a conversation where you need a new endpoint somewhere, but adding it to any existing service would break that microservice’s concept and instantly turn it into a ball of mud with no clear purpose, you have fallen into this trap. A microservice does not mean a service with only one HTTP endpoint. That use case is better served by cloud functions such as AWS Lambda.

The problem here is that we have gone too far in splitting up the monolith. Splitting a monolith with 100 HTTP endpoints into a dozen or so services with eight endpoints each is great. Splitting a monolith with 100 endpoints into 100 services is counterproductive. Instead of having an actual purpose, the single-endpoint microservice simply becomes ‘the xyz endpoint microservice’.

Endpoints are things that microservices empower. An endpoint in and of itself should never justify creating a microservice.
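One way to spot this trap is to count endpoints per service and flag anything with a single endpoint. The sketch below assumes you already have a mapping from each service name to its HTTP routes. The service names and routes are hypothetical, and how you collect that mapping (from API specs, gateway config, or by hand) is left open.

```python
from statistics import mean

# Hypothetical inventory: service name -> HTTP endpoints it exposes.
services = {
    "orders": ["GET /orders", "POST /orders", "GET /orders/{id}",
               "PATCH /orders/{id}", "DELETE /orders/{id}"],
    "billing": ["GET /invoices", "POST /invoices", "POST /refunds"],
    "export-csv": ["GET /reports/export"],
    "resize-avatar": ["POST /avatars/resize"],
}


def single_endpoint_services(inventory: dict[str, list[str]]) -> list[str]:
    """Services whose whole purpose is one endpoint: candidates to fold into
    an existing service or to move to a cloud function."""
    return sorted(name for name, routes in inventory.items() if len(routes) == 1)


avg = mean(len(routes) for routes in services.values())
print(f"average endpoints per service: {avg:.1f}")
print("single-endpoint services:", single_endpoint_services(services))
```

A low average is not proof of a problem on its own. The single-endpoint list is the part worth discussing as a team.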

KPIs for Software Engineers

Key Performance Indicators are a common business practice. They are a quantifiable measure of performance for a specific objective. Occasionally, I am asked to create my own KPIs as an individual contributor. I find that a bit strange: I work for the company, so you’d think they would tell me what my KPIs are!

Ideally, we want to avoid metrics that team members can report arbitrarily. Hours worked is a great example, especially in remote work environments. In hourly workplaces, employees check in and check out to validate hours worked. I have never seen a software company do anything comparable. Typically, software company employees are asked to fill in timesheets from memory, with zero accountability. That does not make for a good KPI.

For our KPIs, we have the following criteria:

  • Countable
  • Verifiable by an external system
  • Valuable
  • Individually attributable

Countable 

KPIs need to be quantifiable. It should be easy to count how many of X someone completed.

Verifiable

There should be impartial systems tracking KPI completion. We want to use systems like PagerDuty, Jira, GitHub, etc. to source our KPI data.

Valuable

KPIs should relate to valuable activities for the company. We want our staff focused on things that are important. 

Individually attributable

KPIs should be based on individually attributable work. We want to avoid subjective judgements of how much of task X engineer 1 did vs engineer 2. We also do not want to encourage infighting over who gets credit for which tasks. 

Here are a few measurable things we could use for KPIs. 

  • Lines of code
  • Commits
  • Bugs fixed
  • Tests written
  • Pull requests opened (accepted / rejected)
  • Pull request comments
  • Pull requests reviewed (approved / rejected)
  • Story tickets completed
  • Stories estimated
  • Projects led
  • Projects delivered / epics delivered
  • Architecture documents published
  • Architecture documents reviewed / commented on / approved / rejected
  • Documentation pages published
  • On-call response time when paged
  • Pages received (daytime / off hours)
  • Prod deployments
  • Meetings attended
  • Pair programming sessions attended
  • Junior dev questions answered (answers documented in a wiki / private Stack Overflow)
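Because these metrics come from external systems, tallying them per engineer is mostly a counting exercise. The sketch below counts pull requests opened, accepted and rejected per author. It assumes the pull request records have already been exported into simple dictionaries with `author` and `state` fields. That record shape is hypothetical and is not the actual GitHub API response format. The mapping of `merged` to accepted and `closed` to rejected is also an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical export of pull request records (not the real GitHub schema).
pull_requests = [
    {"author": "alice", "state": "merged"},
    {"author": "alice", "state": "closed"},
    {"author": "alice", "state": "merged"},
    {"author": "bob", "state": "merged"},
    {"author": "bob", "state": "open"},
]


def pr_kpis(records: list[dict[str, str]]) -> dict[str, Counter]:
    """Count pull requests opened, accepted (merged) and rejected
    (closed without merging, by assumption) for each author."""
    kpis: dict[str, Counter] = defaultdict(Counter)
    for pr in records:
        counts = kpis[pr["author"]]
        counts["opened"] += 1
        if pr["state"] == "merged":
            counts["accepted"] += 1
        elif pr["state"] == "closed":
            counts["rejected"] += 1
    return dict(kpis)


for author, counts in sorted(pr_kpis(pull_requests).items()):
    print(author, dict(counts))
```

The same pattern (group by individual, count by event type) applies to most items on the list, such as pages received or stories estimated, once the data is exported from the relevant system.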

KPIs are a useful tool for businesses tracking their performance, but they are usually better suited to business groups than to individuals. Individual contributors can rarely claim responsibility for outcomes like increasing the subscription renewal rate from 1% to 10%.

While this list is not exhaustive, it should cover most of the trackable, numerical things software engineers do in their jobs. If you need to come up with KPIs for yourself, you can just pick from this list.

If you can think of some more good metrics for software engineers let us know!

Up in the air

We are in a phase where planning becomes quite difficult. ChatGPT has started a capitalistic AI war. Microsoft swept in to shepherd commercialization. Google is on the back foot for now. Amazon will launch something, I have no doubt. ChatGPT-style technology would make Alexa viable by solving the fractal conversation problem.

The players are moving, and immense amounts of capital have been unleashed. But for those of us on the outside it’s a difficult time. You can’t really plan for the future, because the technology is advancing rapidly and is already transforming jobs in various industries.

GPT-4 has been in the news, but Midjourney has quietly advanced to the point where it is transforming job tasks in the graphic design industry. This weekend I read a complaint by a graphic designer describing how his job has become more prompt engineering than graphic design. Instead of needing to draw things, he and his peers can now use AI image generation and then clean up the results in Photoshop.

Video created by demonflyingfox using Midjourney V4.

In 2022 I ordered physical prints of two AI-generated images that I thought were incredible examples of what AI could do. By 2023 these images already seem somewhat quaint. AI image generation can do so much more now.

We don’t really know where things are going. How exactly do you prepare when the potential paths are so divergent?

Some people claim AI will replace programmers. Others say we will always need people to dig deep into the technical details. Personally, I lean towards the second. If AI coding hasn’t peaked yet, we will likely see a 1000x increase in the amount of code being written. ChatGPT is quite good at explaining things, but will it be useful for explaining the interactions between multiple programs it has written? We can’t know at this point.

Image of a line going exponential. Credit to Luke Muehlhauser, who created and watermarked this image.

Right now we are on the nearly vertical stretch at the far right of that curve. We’ve discovered something about meaning in these large language models: a mapping between language and image, and mappings between language and language. It’s not AGI, but much like Deep Blue, it has obviously eclipsed human capabilities in some ways.

Neal Stephenson’s ‘The Diamond Age’ is a book I was intrigued by in my younger years. In it, a girl is given an AI-powered book that acts as her tutor from a very young age. Like that fictional book, ChatGPT will likely become every child’s tutor going forward. And as with the iPhone, you won’t be able to buy a better one. Children have already used ChatGPT to make homework and writing assignments obsolete. The education system likely will not survive this advancement.

The sum total of human knowledge has been put into this machine. Everyone who ever wrote anything is part of it. Buckle up. Don’t panic. Hold on. Let’s see what happens next.