Why AI Investment Makes Sense

With AI investment arguably exceeding $1 trillion over the last few years, many people are concerned about a bubble. Unlike the constant doomsayers claiming ‘AI has peaked’ a week after the latest improvement shipped, the bubble worriers could have a point. But frankly, I don’t think we are in much of a bubble.

Anthropic’s revenue is around $7 billion a year at this point. Google has been profitable for a long time. OpenAI has raised the most money relative to its revenue, but it has 700 million weekly active users. Once they roll out ads and monetization, the economics will change quite a lot.

Amazon famously went years without turning a profit. OpenAI has only been a for-profit corporation for three years. A ton of money is being spent, and it feels like a lot of other bubbles we’ve had in the last few decades. So let’s go through the logic of further investment.

Let’s say you are investing in racks of Nvidia chips. What determines your return on investment? Mainly utilization and the margin you charge cloud users. If there is a lot of demand for AI inference, your utilization will be high. And if your utilization is high, you can raise prices to increase your margin. It boils down to demand for inference and, to an extent, training.
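
As a back-of-envelope sketch of that relationship (every number below is a hypothetical placeholder, not a real market figure):

    # Back-of-envelope return on a rack of GPUs rented out for inference.
    capex = 3_000_000      # upfront cost of the rack, USD (assumed)
    hourly_rate = 250.0    # price charged to cloud users, USD/hour (assumed)
    hourly_cost = 100.0    # power, cooling, staffing, USD/hour (assumed)
    utilization = 0.60     # fraction of hours the rack is actually rented

    hours_per_year = 24 * 365
    revenue = utilization * hourly_rate * hours_per_year
    opex = hourly_cost * hours_per_year  # costs accrue even when the rack sits idle
    annual_profit = revenue - opex

    print(f"annual profit: ${annual_profit:,.0f}")               # $438,000
    print(f"simple payback: {capex / annual_profit:.1f} years")  # 6.8 years

Push utilization or your margin up and the payback period shrinks fast; let demand fall and the rack never pays for itself.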

So an investment in racks of Nvidia chips depends on demand for AI inference to be profitable. We have to ask: will AI inference demand go up or down? What could make demand go down? What would make it go up?

An example of something that makes inference demand go up is the invention of chain-of-thought ‘reasoning’ AI models. These models spend much more inference compute to produce higher quality output. If you were buying racks of Nvidia chips and you heard about chain of thought, you would try to double your order.

Something that might make inference demand go down is the original DeepSeek announcement. DeepSeek managed to train a competitive model using far fewer resources than anyone else, and we had a minor stock market crash in reaction.

Here is my basic argument for why investment in racks of chips is a good idea at this time.

The ‘smarter’ models get, the more demand for AI there will be.

The more ways we figure out how to compose LLMs, the more demand for AI there will be.

We’re currently in an immense competition between 5+ frontier AI labs to improve LLM-based AI to the absolute limit, and we are seeing improvements month over month. When AI was useless, prior to GPT-3, inference demand was very low. Today hundreds of millions of people use ChatGPT each week. The better models get, the more people want to use them.

Next, we are seeing more and more ways to compose models. Claude Code takes a single human command and splits it into a myriad of smaller inference tasks. One human prompt might result in a dozen AI-generated prompts: instead of one inference call, you are getting a dozen inference calls serving a single human request. This approach uses even more inference to serve demands that were previously out of reach for LLMs.
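
Here is a minimal sketch of that fan-out pattern. The call_llm function is a hypothetical stand-in for any chat-completion API, and the task decomposition is hard-coded for illustration:

    # Minimal sketch of agentic fan-out: one human request becomes many
    # model calls. call_llm is a placeholder for a real inference API.
    def call_llm(prompt: str) -> str:
        return f"<model output for: {prompt!r}>"

    def handle_request(user_request: str) -> str:
        # Call 1: ask the model to plan the work.
        plan = call_llm(f"Split this request into subtasks: {user_request}")
        # A real agent would parse `plan`; we hard-code three subtasks here.
        subtasks = [f"subtask {i}: {user_request}" for i in (1, 2, 3)]
        # Calls 2 through N: at least one inference call per subtask.
        results = [call_llm(f"Complete: {task}") for task in subtasks]
        # Final call: synthesize the pieces into one answer.
        return call_llm(f"Combine these results: {results}")

    print(handle_request("add a dark mode toggle to the settings page"))

One human request, five model calls, and real agents fan out much wider than this.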

Basically: the smarter the output of AI, the more value humans get out of it, and the more demand there will be for inference. Even efficiency improvements in inference should simply increase demand (the classic Jevons paradox). If you are making a profit per request, you will increase the number of requests you make as the price per request goes down.

What’s the other side of this? Well, if AI stops getting smarter, a lot of companies are going to make far lower returns on investment than they hoped. But I think that is a really stupid bet to make. You don’t bet against further improvements in AI when improvements are coming out month over month.

In conclusion, as long as LLM performance continues to improve, we aren’t in an AI bubble. Once gains start to slow, the bubble is over. My view is that if we get to a point where improvements come more slowly than once a year, we will have hit the plateau in LLM-based AI. But for now we are seeing month-over-month improvements in AI performance, and I don’t think we are in a bubble.

Vibecoding Demo

I released a short demo of AI-assisted vibe coding to the YouTube channel. Basic games like this are one of my favorite use cases for vibecoding: LLMs are pretty good at it and the payoff is quick. Back in my teens I tried to learn programming to make games. I’d have gotten a lot farther if we’d had AI then instead of trying to figure out C++ from a two-inch-thick book!

To learn more about AI assisted ‘vibecoding’ check out the book Code Without Learning to Code at https://codewithoutlearningtocode.com.

An Excellent AI Use Case: Generating PlantUML and Mermaid Schemas

If you’ve ever worked with PlantUML or Mermaid, you know it’s easy to forget the domain-specific language (DSL) used to build the diagrams. You sketch out your whiteboard diagram, discuss it with coworkers, and are ready to start on your architecture document, only to sit there stumped trying to remember how to convert your drawing into PlantUML.

Well, the good news is that Claude can do it for you. Just take a picture of your whiteboard and Claude can convert it into Mermaid or PlantUML. I tried both; Claude is better at Mermaid but can handle PlantUML.
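
For reference, here is the kind of Mermaid output you can expect. This is an illustrative fragment of a simple service diagram, not the exact diagram from my whiteboard:

    graph TD
        Client --> LoadBalancer
        LoadBalancer --> WebServerA
        LoadBalancer --> WebServerB
        WebServerA --> Database
        WebServerB --> Database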

Unfortunately, Claude’s first attempt had an error. I fiddled with it a bit in the editor, but Claude was able to fix it faster than I was, and I was then able to generate the following PlantUML visualization.

Here is the original image compared against the PlantUML Claude generated. The relationships are correctly mapped even if the layout is a bit different.

Lastly, I asked Claude to do the same thing with the Mermaid diagram visualizer. Mermaid does the same things as PlantUML; it’s just newer and easier to use.

To my surprise, Claude not only managed it in one attempt, but it turns out Mermaid rendering is built into the Anthropic UI.

If you’re ever stuck trying to remember the PlantUML DSL, just ask Claude to do it for you.

KPIs for Software Engineers

Key Performance Indicators (KPIs) are a common business practice: a quantifiable measure of performance for a specific objective. Occasionally, I am asked to create my own KPIs as an individual contributor. I find that a bit strange; after all, I work for the company, so you’d think they would tell me what the KPIs are!

Ideally we want to avoid metrics which are reported arbitrarily by team members. Hours worked is a great example, especially in remote work environments. In hourly workplaces, employees clock in and clock out to validate hours worked. I have never seen a software company do anything comparable. Typically, the software company employee is asked to fill in timesheets from personal memory with zero accountability. That does not make for a good KPI.

For our KPIs we have the following criteria.

Countable 

KPIs need to be quantifiable. It should be easy to count how many of X someone completed.

Verifiable

There should be impartial systems tracking KPI completion. We want to use systems like PagerDuty, JIRA, GitHub, etc., to source our KPI data.

Valuable

KPIs should relate to valuable activities for the company. We want our staff focused on things that are important. 

Individually attributable

KPIs should be based on individually attributable work. We want to avoid subjective judgements of how much of a task engineer 1 did versus engineer 2. We also do not want to encourage infighting over who gets credit for which tasks.

Here are a few measurable things we could use for KPIs. 

  • Lines of code
  • Commits
  • Bugs fixed
  • Tests written
  • Pull requests opened (accepted / rejected)
  • Pull request comments
  • Pull requests reviewed (approved / rejected)
  • Story tickets completed
  • Stories estimated
  • Projects led
  • Projects delivered / epics delivered
  • Architecture documents published
  • Architecture documents reviewed / commented on / approved / rejected
  • Documentation pages published
  • On-call response time when paged
  • Pages received (daytime / off-hours)
  • Prod deployments
  • Meetings attended
  • Pair programming sessions attended
  • Junior dev questions answered (answers documented in wiki / private Stack Overflow)

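Most of these can be pulled straight from the source systems mentioned above. As a minimal sketch of what ‘verifiable’ looks like in practice, here is how you might count an engineer’s merged pull requests with GitHub’s REST search API (the username, repo, and token are hypothetical placeholders):

    import requests

    # Count merged PRs authored by a user in a repo via GitHub's search API.
    # "octocat" and "acme/widgets" below are placeholder values.
    def merged_pr_count(user: str, repo: str, token: str) -> int:
        resp = requests.get(
            "https://api.github.com/search/issues",
            params={"q": f"type:pr is:merged author:{user} repo:{repo}"},
            headers={"Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()
        # The search endpoint returns a total_count alongside the matching items.
        return resp.json()["total_count"]

    print(merged_pr_count("octocat", "acme/widgets", "YOUR_TOKEN"))
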
KPIs are a useful concept for businesses to track their performance, but they are really best suited to business groups examining their own performance. Individual contributors can rarely claim responsibility for things like increasing the subscription renewal rate from 1% to 10%.

While this list is not exhaustive, it should include most of the trackable, numerical things that software engineers do in their job. If you need to come up with some KPIs for yourself, you can just pick from this list.

If you can think of some more good metrics for software engineers let us know!

Agile has gone from ‘people over process’ to ‘no process’

In practice, ‘agile’ means ‘we have no process in place and each team does whatever random thing the manager wants to try next’. Sometimes that is SAFe, sometimes it’s Scrum; usually it’s a combination of different things. This isn’t necessarily a bad thing, but there are trade-offs.

The first is standardization. If every team follows a different process, it’s difficult to understand what is going on at a management level. Which teams are productive? Which teams are in downward spirals? If you don’t have a standard to judge against, you can’t tell.

Secondly, away-team work is much harder. Working with a team that uses the same development process, pipeline setup, programming language, and frameworks is easy. Working in the code base of a team which uses a different language, framework, architecture, etc. is very difficult. Not supporting away-team work severely limits your ability to integrate internal software components.

Thirdly, estimates are not possible in this kind of environment. Since the process changes constantly, historical data becomes useless; in response, most companies don’t even keep historical data. The main use for estimates becomes ensuring that ‘burn down’ charts follow the 45-degree angle managers love.

https://www.estimating.dev/there-is-no-team-calibrating-teams-vs-individual-estimators/

Subjective expert predictions are a valid way of estimating software tasks, but if you don’t have historical data to calibrate against, estimates devolve into gaming the system.
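
Calibration doesn’t have to be fancy. Here is a minimal sketch of calibrating raw estimates against historical actuals (all the task data is made up):

    # Derive a calibration factor from past estimated-vs-actual durations.
    # The history below is invented purely for illustration.
    history = [
        {"estimated_days": 3, "actual_days": 5},
        {"estimated_days": 1, "actual_days": 2},
        {"estimated_days": 8, "actual_days": 10},
    ]

    factor = sum(t["actual_days"] for t in history) / sum(t["estimated_days"] for t in history)
    print(f"calibration factor: {factor:.2f}")  # 1.42: this team underestimates by ~42%

    raw_estimate = 4  # days, a new task's raw expert guess
    print(f"calibrated estimate: {raw_estimate * factor:.1f} days")  # 5.7 days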

When you change how estimates are made, when tickets are considered done, and the sprint cadence every 3-6 months, there is no way you can have cross-company data on productivity. The lack of process empowers management to obfuscate the productivity of their teams, and the endless pursuit of the best process gives technical organizations a great excuse for having no idea whether their processes have improved over the last two years.

In this type of environment, all judgements have to be made on subjective gut feelings and corporate politics. You don’t know which VP’s processes are better, because neither has any accountability. You don’t know which technical department is more efficient, because neither the estimates nor the logged hours can be trusted.

‘We’re being agile’ has become the excuse to follow whatever process you want. Instead of ‘people over process’ it has become ‘no process’.