Burnout or health problem?

I experienced a lot of burnout last year. Usually, I can just outlast burnout and it goes away when I start a new project. This time it lasted most of the year, and I didn’t really get back to normal until 2021. It lasted so long and was so hard to shake because it was actually a health problem. I got into this industry because I love coding, but I didn’t really feel that way last year and even considered leaving. I had a lot of trouble focusing with the single-minded obsession you need to beat down tricky bugs. At the time I thought I was burned out or depressed; it had to be something like that, right?

Well, in this case it wasn’t burnout or depression, but my diet. I’ve had issues with various foods making me sick in the past, but never really isolated the cause beyond ‘don’t buy bread with preservatives in it’. I eat out a lot, and although I’d thought about doing an exclusion diet several years ago, and have even built apps to help people isolate the food ingredients they have issues with, I’d never actually done one myself.

In the end, I started an exclusion diet in January after moving into my new apartment here in Phoenix. The effects in my case have been totally worth it, as I now know which foods I can and cannot eat if I want to think clearly. I can’t really say I understand the medical causes, but a number of foods give me a combination of brain fog, headaches and stomach discomfort. It isn’t life-threatening, beyond making me suicidal on occasion. But I really can’t afford to eat anything with sage in it without ruining an entire day.

So far I’ve had to exclude sage, preserved meats, and wheat or gluten products. I can eat most other foods without a problem.

Solutions Engineer, Cloud Architect: what does it all mean?

There are a ton of titles in the software industry. Two slightly misleading ones are Solutions Engineer and Cloud Architect. These sound like technical roles, but they are not on the software engineering career ladder. Solutions Engineer and Cloud Architect are pre-sales roles.

What is Pre-Sales? 

Pre-Sales is the part of a project that happens before the sale. If you are selling a software product or a digital transformation, you need someone technical to evaluate the state of the prospect’s software stack. The Solutions Engineer is the technical expert who joins the sales team and helps work out what needs to be included in the deal. That entails researching the prospect’s tech stack, interviewing their engineers, and giving technical demos to the potential client. Solutions Engineers sometimes need to travel to client offices as part of the job. You might do some coding to create demos and prototypes, but that is about it. Don’t expect your technical skills to grow while working as a Solutions Engineer. But you will get lots of opportunities to interact with clients, make connections, and learn what different companies are doing on the tech side.

Then what is a Cloud Architect?

A Cloud Architect is a Solutions Engineer who works with potential customers of public cloud services. They help clients figure out which services they need, demo those services, and develop plans to migrate the clients’ websites and workloads to the cloud.

Why become a Solutions Engineer?

Becoming a Solutions Engineer is a great way for software engineers with decent social skills to move into a more sales-focused role. You will have opportunities to present and to learn about customers that you probably would not get as an engineer on a product team.

Knowing more about the sales process can help you prepare to run your own agency or move up the corporate ladder on the business side. 

Software advances are slower than you expect.

Most people think of technological advances using the eureka metaphor, but software doesn’t work like that. Take a clear technological advance like the self-driving tractor. Self-driving tractors are on the market now, but there was no eureka moment and no sudden breakthrough. Self-driving technology just advanced to the baseline required for tractors on clearly mapped fields, and then a team of software engineers built a working system over several years.

There’s no invention or breakthrough moment. There is just a slow build-up in non-software capabilities, followed by an investment in building out software to leverage them.

There were no software engineering advances that enabled the self-driving tractor. Instead, machine vision improved until it was good enough to unblock the software solution.

What advances in software are in sight?

Reproducible builds are one of my favorite recent advances in software. Strictly speaking, reproducible builds have been available for quite some time, but that doesn’t mean they aren’t a real advance. Having a trustworthy build environment where you can compare inputs and outputs across different machines is a big benefit. Without reproducible builds, software engineers end up spending extra time fixing build failures and figuring out dependency resolution overrides.
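One concrete ingredient of a reproducible build is making the output depend only on the inputs, not on the machine or the clock. The minimal Python sketch below assumes a build artifact is simply a set of named files. It packs them into a tar archive while pinning ordering, timestamps, and ownership, so two machines packing the same files get the same SHA-256 digest. The function names are hypothetical. Real reproducible-build tooling covers far more ground, such as compiler flags, embedded paths, and the `SOURCE_DATE_EPOCH` convention.

```python
import hashlib
import io
import tarfile


def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Pack named files into an uncompressed tar whose bytes depend only on names and contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sort names so the archive does not depend on filesystem or dict ordering.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            # Pin metadata that would otherwise leak in from the build machine.
            info.mtime = 0
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def artifact_digest(files: dict[str, bytes]) -> str:
    """SHA-256 of the packed artifact; equal digests mean byte-identical outputs."""
    return hashlib.sha256(deterministic_tar(files)).hexdigest()


# Two hypothetical machines that produced the same files in a different order.
machine_a = {"bin/app": b"compiled-binary-bytes", "config.json": b'{"debug": false}'}
machine_b = {"config.json": b'{"debug": false}', "bin/app": b"compiled-binary-bytes"}
print(artifact_digest(machine_a) == artifact_digest(machine_b))
```

If the digests differ, you can diff the two archives' inputs to see which file or piece of metadata changed. That is the debugging workflow reproducible builds make possible.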

Golang is, in my opinion, an advance in software engineering. Instead of focusing on productivity for a single engineer or a small team, Golang is focused on maximizing productivity for large software projects with thousands of engineers. It does this by simplifying the language as much as possible to ease readability and eliminate magic. Everything in Golang is very procedural and easy to follow.

Easy-to-use gradual typing is a more recent advance in the Python and Ruby worlds. You can now build your MVP in a dynamic, fast-iterating language and then, after you reach product-market fit, add types to the codebase. Typing is no longer an either/or choice but a sliding scale, where you can choose the most opportune time to transition. Overall, I think gradual typing has huge advantages over the old standard of rewriting the codebase in Java.
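To make the sliding scale concrete, here is a small Python sketch, with hypothetical function and type names, showing the same logic before and after annotations are added. Python ignores annotations at runtime. A checker such as mypy, by default, skips the bodies of unannotated functions, so the untyped version and its callers keep working while typed code is introduced one function at a time.

```python
from typing import TypedDict


# Stage 1: the MVP, written with no annotations at all.
def order_total(order):
    return sum(item["price"] * item["qty"] for item in order["items"])


# Stage 2: after product-market fit, the same logic gains types.
# Only this function is typed; untyped callers elsewhere are unaffected.
class LineItem(TypedDict):
    price: int  # price in cents (an illustrative assumption)
    qty: int


class Order(TypedDict):
    items: list[LineItem]


def order_total_typed(order: Order) -> int:
    return sum(item["price"] * item["qty"] for item in order["items"])


example: Order = {"items": [{"price": 250, "qty": 2}, {"price": 100, "qty": 1}]}
print(order_total(example), order_total_typed(example))
```

The runtime behaviour is identical. The only difference is that a type checker can now catch, for example, a caller passing a list instead of an order.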

How to make on-call great on your team.

Being on-call for software can suck. It seems like everyone has a horror story of being woken up at 3am for an outage in their service and then having to work until daylight to get things running again. I have been on an on-call rotation for about two years now, and it has gone very well. We typically get paged during off hours about once per week, and resolving a page usually takes under three hours. Here are some thoughts on what makes an on-call experience great.

Page Frequency 

The biggest thing, in my opinion, is that the page frequency is on the lower end, at 1-2 off-hours pages per week. Off hours just means outside the typical business hours of 9-5. A wake-up is a page that arrives while you are asleep. A low rate of pages is important because it means the on-call engineer can get adequate rest during the week. In the worst case, where the on-call engineer gets woken up twice during a week-long shift, they will still be operating reasonably well by the end of the week. Page frequency is especially important if your on-call rotation is small.

Rotation size

On-call rotation size is important. You need enough people to spread the load around. I’ve worked in rotations ranging from 3 to 11 people, and 5-10 is the sweet spot. At that size, you get a month or more off between on-call shifts. Beyond 10 people, you start to get rusty, since people are only on-call about every two months. Having more people also makes it easier to cover vacations without anyone feeling like they didn’t get a break from being on-call. Smaller rotations are bad because the engineers on the team will not get enough time between shifts to complete project work. Feature development will stall, and your team basically becomes an ops team. Additionally, the bus factor is too low in a small rotation. If one person goes on vacation and another has a power outage, you might end up with no one to respond to a page.
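The arithmetic behind those numbers is simple enough to sketch. This assumes week-long shifts in a strict round-robin with no swaps or vacation cover, which is a simplification. The helper names are hypothetical.

```python
def weeks_between_shifts(rotation_size: int, shift_weeks: int = 1) -> int:
    """Weeks each engineer spends off-call between shifts in a strict round-robin (assumed)."""
    if rotation_size < 1:
        raise ValueError("a rotation needs at least one person")
    return (rotation_size - 1) * shift_weeks


def on_call_fraction(rotation_size: int) -> float:
    """Share of the year each engineer spends on-call."""
    return 1 / rotation_size


for size in (3, 5, 10, 11):
    print(size, weeks_between_shifts(size), f"{on_call_fraction(size):.0%}")
```

With five people you get four weeks off, or about a month. With ten you get nine weeks, or roughly two months. With three you are back on-call every third week, which leaves little room for project work.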

Clear duties

On my team we have a clear list of things we do in response to a page. Anything else will be left for business hours. 

The things we do are:

1. Scale the fleet up or down

2. Turn a feature toggle on or off

3. Roll back a deployment

4*. Roll forward a fix

*Note that rolling forward, or otherwise patching production, is the last resort. Making a code change is the slowest and highest-risk way to address an outage. Whenever possible, you want to make code changes during normal office hours.
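If you wanted to encode this list in on-call tooling, one option is a small ordered escalation. This is a hypothetical Python sketch, not how our team actually implements it. It assumes the numbered order doubles as a preference order, although the text above only states that explicitly for rolling forward. When nothing on the list applies, the issue waits for business hours.

```python
from enum import IntEnum
from typing import Optional


class Mitigation(IntEnum):
    # Hypothetical encoding of the duty list; a lower value means try it first.
    SCALE_FLEET = 1
    TOGGLE_FEATURE = 2
    ROLLBACK_DEPLOYMENT = 3
    ROLL_FORWARD_FIX = 4  # last resort: a code change made under pressure


def choose_mitigation(applicable: set[Mitigation]) -> Optional[Mitigation]:
    """Return the earliest-listed action that could resolve the page.

    None means nothing on the list applies, so the issue is left for business hours.
    """
    return min(applicable, default=None)


# A bad deploy could be fixed by rolling back or by shipping a patch;
# the rollback wins because it comes earlier in the list.
choice = choose_mitigation({Mitigation.ROLL_FORWARD_FIX, Mitigation.ROLLBACK_DEPLOYMENT})
print(choice.name if choice else "defer to business hours")
```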

Good Runbooks

Having good runbooks reduces the cognitive load when dealing with a service outage. They can also save significant time on common problems, simply because the steps that fixed the problem last time are already written down. In your regular on-call shift review meeting, it’s best to add new runbook entries covering that week’s pages.

How do you know your on-call team is in a good place?

  1. It is easy for people on the team to get someone to cover for them when they go on vacation.
  2. People on the team volunteer to be on-call for peak events.
  3. People don’t complain about being on-call during one-on-ones.

January Links

Scientists Say You Can Cancel the Noise but Keep Your Window Open

They will integrate these speakers into windows or walls and make them smaller.

Concept of a ‘feature store’ for typed ML model inputs (tensors, vectors, etc.)


VM performance tests, very good blog series.


Compressing for pub/sub results in great savings.
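As a rough illustration of why compression pays off for pub/sub, here is a broker-agnostic Python sketch. It assumes JSON messages and uses the standard library's zlib. The encode/decode helpers and the sample event are hypothetical, and actual savings depend on your payloads and on whether your system's cost or throughput scales with message size.

```python
import json
import zlib


def encode_message(payload: dict) -> bytes:
    """Serialize a message to compact JSON and compress it before publishing."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level=6)


def decode_message(body: bytes) -> dict:
    """Reverse encode_message on the subscriber side."""
    return json.loads(zlib.decompress(body).decode("utf-8"))


# Hypothetical batch of repetitive events, the kind of payload that compresses well.
event = {"events": [{"user_id": i, "action": "page_view", "path": "/home"} for i in range(200)]}
raw_size = len(json.dumps(event, separators=(",", ":")).encode("utf-8"))
compressed_size = len(encode_message(event))
print(raw_size, compressed_size)
```

Repetitive keys and values are where the savings come from. Small or already-compressed payloads may see little benefit, and every publisher and subscriber has to agree on the encoding.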