Everyone uses (failing) software all the time.

Because you use it all the time at least one piece of software is broken for you at all times.

I stopped using Facebook after my freshman year of college, but recently got pulled back in by a Facebook group. As a result I now have the pleasure of enjoying a 10+ second loading phase every time I open the homepage. 

Recently, I tried to buy a CODE mechanical keyboard on the wasdkeyboards.com website. But every time I submitted my order it failed. I tried different browsers. I had to look into the console to find out that a http request was failing to find a paypal advertising domain that my PiHole blocks on the network. To buy my keyboard I had to tether wifi from my smartphone. A non-technical user wouldn’t have been able to find out why the order failed because there was no error message. There was a spinning symbol that just disappeared after a while without a message to the user. 

Everyone uses software all the time now. We have smartphones, smart TVs, smart refrigerators and smart homes. If you use 100 programs a day, 99% uptime means one program is down for every person. If every application manages 99.9% uptime, one out of a hundred people is experiencing software brokenness everyday. 

Then realize that billions of people have smartphones now. 

99.99% * 1,000,000,000 = 100,000. 

If your software has a billion users and works 99.99% of the time, its down for 100,000 people all the time. 

Software updates are a liability for customers

Software Updates are a liability for customers. Imagine a business, say a restaurant, which uses some POS (point of sale) software to run its business. Every time the software is updated the business’s employees have to relearn some of the software. That costs money. They would rather buy software that never needs to be updated.

You might say “Well, what if there are security vulnerabilities”. Fixing those vulnerabilities still costs money. Customers will have to update their software which costs money even if there is no functional change. Its essentially saying, “Hey, Customers, we shipped you a defective product, please spend money to get the update which fixes this vulnerability”. 

Many companies have moved to SAAS (software as a service) which solves the problem of getting customers to install updates by forcing them to. Now, instead of customers constantly digging in their heels and refusing to update anything, we can ship them updates constantly. 

Customers hate software updates. They are slow, buttons move around, and people lose the fluid mastery that they painstakingly learned. Software should empower customers, every time you ship an update that breaks fluid mastery, customers are kicked off of Olympus. 

Links Post — Q4 2019

Here are some interesting links I’ve come across over the last few months. 

htop explained 


4 Types of Documentation 


Making a Language Server 


Don’t move to the Cloud to increase CPU utilization

I have worked with a bunch of companies that launched major initiatives to move their hardware onto the public cloud. None of those companies managed to get their CPU utilization over 10%. At my current job we run Java with 10-20% cpu utilization and 90% memory reserved for the JVM. The standard software development approach does not result in amazing hardware utilization rates. 

The consulting clients we worked with at my previous job expressed a lot of interest in increasing efficiency. We theorized complex tagging and did proof of concepts with Cloudability. But I do not recall actual savings coming out of it. Although there was a lot of complaining about the AWS bill being high. 

One Fortune 500 company I worked with had a dev environment with a huge number of hosts (1000+) that were basically never utilized at all. That company only used continuous delivery in development, not for production deployments. The other obvious issue at that company was their reluctance to rely on AWS Autoscaling groups to handle load spikes. They allocated for peak load despite running in the cloud. 

One concern that came up a couple times was that Autoscaling groups have a scaling response rate in the order of minutes. In the event of a traffic spike, it might be 5-10 minutes before extra hosts come online. 

If you are worried about unexpected instantaneous peaks write a fallback. Serve a landing page our of cache and sit tight. There is no magic solution to instantly increasing your traffic by 100x without scaling preemptively. 

The largest websites in the world scale ahead of time. We know when we will get lots of traffic historically. You know when your Super Bowl add is going live. Scale up a week before hand. Run load tests to make sure you can handle the traffic. 

Run your servers at 30-60% utilization. Build a fallback page for big instantaneous peaks. Most importantly know ahead of time what your traffic is going to look like so you can prepare. 

The top cause of outages is changing code.

The last week of my oncall shift has been pretty quite. The holiday season has elevated traffic around 50% higher than normal, but I haven’t really noticed. There haven’t been any service outages and it almost feels like I’m not really oncall. Why has it been so quite? We haven’t deployed any code for two weeks. Not deploying code means we aren’t deploying any bugs to production. 

After a deployment you will probably notice any defects over the next couple days. Once you have fixed those it is smooth sailing for that version. 

Continuous Deployment makes it easy to deploy bugs 10x a day. Agile gives you a justification to deploy 10x a day. Ask yourself, what are you deploying each day? A CSS change to a button? A new option in a drop-down. A re-write of the graphing functionality because no one can understand the current implementation? A new feature like Google Docs integration?

If you could only deploy 1 feature each week what would make the cut?

Sorry, we didn’t mean to break that for you! But we aren’t going to fix it.

Using Business to Consumer SAAS means getting your UX broken all the time.

Sorry, but we are moving the product in another direction. We have changed the interface to a whole new design. Yes, we don’t have feature parity with the old experience, but we will get there soon. Soon in this case means in six months, after we finish the international rollout we will start to fill in the missing features. 

Do you remember the drama around Windows 10’s new UX? It was and is obviously worse for power users, but Microsoft didn’t care. How about when Microsoft Office added the new Ribbon UI and no one could find anything anymore? B2C software doesn’t care about power users. If you buy one copy of Office and use it 100x as much as the average dude, a business analyst is thinking about how to get you to buy 100 copies. 

Then there was the years where new versions of Mac OS were so bad that they had to stop charging for OS updates because no one would buy them anymore. Customers don’t like UI updates in general. Every UI update means learning a new set of commands. People don’t want to take a tutorial the first time they open an app. Getting people to redo the tutorial every time you release a new update is basically impossible. 

A UX update on a product I worked on broke the application for a bunch of our power users. Developing the MVP version of the new UX took several teams months. We had been maintaining a blacklist of users while we worked on the new User Experience. Once the new experience was released and stable we decided that we were ready to launch the new experience to everyone and eliminate blacklists. This led to a flurry of customer service calls by customers who no longer could use the application for its basic purpose. 

Customers probably don’t want a UX break unless it’s at least a 10x improvement on whatever you had before. If find yourself releasing a new experience that is LESS functional than the current experience. STOP.

git format-patch and git am

# format-patch creates a patch file
# the -1 $COMMIT arguments take the last commit and put it into the file
git format-patch -1 14ab6d…..

#Applies the commits in the patch file onto the current branch
git am ../folder/2019-10-17/0001-SLEDGE-32

#This git log command pulls the history of a file into one patch
git log --pretty=email --patch-with-stat --reverse -- path/file_or_dir 
git am <  path/to/file_or_dir 

PlantUML a text based diagramming language

One of the senior engineers at my job is a big fan of PlantUML, so I recommended it to one of the junior guys who needed a diagramming tool. I’ve been taking a look myself since I have never had a goto diagramming tool. 

PlantUML is text based language. You can define structs and their relationships with other items. There are a lot of keywords, which can be a bit confusing, but it generates pretty good diagrams.

Here is the text for a system diagram and the image it generates below.

actor actor [
  a user
database postgres
queue celery 
stack redis

node django [
  Django webservice

node worker [
  Turtle Detector

boundary boundary [

cloud cloud [

actor --> cloud

cloud --> boundary

boundary --> django

django --> postgres
django --> celery

celery -> worker
worker -> redis
redis --> django

Here is the code for a smaller class relationship diagram.

class User {
  +customerId : String
  ~submissions : Submission[]

class Submission {
   -size : int[][] 
   #image : int[][]


User <|-- Submission

class TurtleModel {
   ~model : Pytorch.GAN

Turtle Generator Project Idea

New programmers sometimes ask me “what project should I work on next?”. This project is one I drafted up for myself because I wanted to build a more complex application with Django and pytorch.

The Turtle Generator Project is an attempt to create a website on which people can submit pictures of turtles and vote on whether machine generated turtles are “turtle” or “not turtle”. The user submissions and votes form a GAN or generative adversarial network, both classifying and producing pictures of turtles, although we may use user submissions as part of our dataset of turtles.

Pages / Components

Draw or Submit turtle component

Drawing component where a visitor can use their mouse or touchpad to to draw a turtle and submit it to the turtles dataset.

Vote component

A component consisting of a picture of a turtle generated by our backend algorithms and a button which says “turtle” or “not turtle”.


Django frontend + PostgresDB
serves web pages and handles user interaction
votes and turtle drawing submissions are submitted to the MachineLearner system via Celery
submits “turtle image” request to celery queue — gets turtle back

Possibly Reactjs or just django templates

Machine Learner
online machine learning system based on pytorch
takes celery tasks and either
generates a picture of a turtle
adds a vote submission to the training data ( classification )
adds a turtle picture submission to the training data

Celery + Redis
Message queue used to handle queuing training tasks

Software as a Service benefits both companies and consumers

I was surfing reddit this week and I saw a great comment from u/Swordbow. 

“Thus as the obligation stretches across time, the cash flow must stretch across time.” 

–Quote from u/Swordbow on reddit

Computer software isn’t like classical machinery in that bugs and vulnerabilities can be discovered after release that destroy the security of the application. A year after release you might have to perform an urgent patch to protect your customers from hacking. If your company goes bankrupt during that year, customers are in a very bad situation. Software needs to be maintained and unlike typical machinery it has to be maintained by the creators not the end users. 

Selling new releases of software is hard. Programs can be copied at zero marginal cost and don’t really wear out. Getting customers to buy a new version every year is a lot harder than getting customers to pay a monthly subscription. With Software as a Service instead of getting paid when you sell a new version of your software, you get paid as long as customers use your software. And you don’t have to worry about dealing with upgrades. 

Companies can reduce consumer surplus by switching to a SAAS model that requires customers to pay as long as they use the software. With SAAS all customers pay for every update.