Don’t move to the Cloud to increase CPU utilization

I have worked with a bunch of companies that launched major initiatives to move their hardware onto the public cloud. None of those companies managed to get their CPU utilization over 10%. At my current job we run Java with 10-20% CPU utilization and 90% of memory reserved for the JVM. The standard software development approach does not result in amazing hardware utilization rates.

The consulting clients we worked with at my previous job expressed a lot of interest in increasing efficiency. We theorized complex tagging schemes and did proofs of concept with Cloudability, but I do not recall actual savings coming out of it, although there was a lot of complaining about the AWS bill being high.

One Fortune 500 company I worked with had a dev environment with a huge number of hosts (1000+) that were basically never utilized at all. That company only used continuous delivery in development, not for production deployments. The other obvious issue at that company was their reluctance to rely on AWS Auto Scaling groups to handle load spikes. They allocated for peak load despite running in the cloud.

One concern that came up a couple of times was that Auto Scaling groups have a scaling response time on the order of minutes. In the event of a traffic spike, it might be 5-10 minutes before extra hosts come online.

If you are worried about unexpected instantaneous peaks, write a fallback. Serve a landing page out of cache and sit tight. There is no magic solution for instantly absorbing a 100x increase in traffic without scaling preemptively.
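Here is a minimal sketch of the fallback idea in Python. The names (handle_request, MAX_IN_FLIGHT, CACHED_LANDING_PAGE), the in-flight-request threshold, and the 503 status are all assumptions made up for illustration. In practice this kind of check probably lives in the load balancer or CDN rather than in application code.

```python
from typing import Callable, Tuple

# Hypothetical values: a pre-rendered page kept in memory and a per-host
# limit that would come from your own load tests.
CACHED_LANDING_PAGE = "<html><body>We are busy right now. Please sit tight.</body></html>"
MAX_IN_FLIGHT = 100


def handle_request(in_flight: int, render_page: Callable[[], str]) -> Tuple[int, str]:
    """Return (status, body), falling back to the cached page when overloaded."""
    if in_flight >= MAX_IN_FLIGHT:
        # Cheap path: no database calls and no template rendering.
        return 503, CACHED_LANDING_PAGE
    # Normal path: do the real (expensive) work.
    return 200, render_page()
```

The point of the sketch is that the fallback path does almost no work, so a host that is already saturated can keep answering while extra capacity comes online.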

The largest websites in the world scale ahead of time. We know from historical data when we will get lots of traffic. You know when your Super Bowl ad is going live. Scale up a week beforehand. Run load tests to make sure you can handle the traffic.

Run your servers at 30-60% utilization. Build a fallback page for big instantaneous peaks. Most importantly, know ahead of time what your traffic is going to look like so you can prepare.
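As a rough illustration of scaling ahead of time, this Python sketch turns an expected peak and a target utilization into a host count. The function name and the example numbers are made up. The per-host throughput is something you would measure in your own load tests.

```python
import math


def hosts_needed(peak_rps: float, rps_per_host: float, target_utilization: float) -> int:
    """Hosts required so each one sits at target_utilization during the peak."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Each host only contributes rps_per_host * target_utilization of
    # usable capacity, because the rest is headroom.
    return math.ceil(peak_rps / (rps_per_host * target_utilization))


# Made-up numbers: 12,000 requests/s expected at peak, each host
# saturates at 500 requests/s, and we want to run at 50% utilization.
print(hosts_needed(12_000, 500, 0.5))  # 48
```

Running at 50% instead of 100% doubles the host count, which is the price of having room for a spike while new hosts take minutes to come online.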

The top cause of outages is changing code.

The last week of my on-call shift has been pretty quiet. The holiday season has pushed traffic around 50% higher than normal, but I haven’t really noticed. There haven’t been any service outages and it almost feels like I’m not really on call. Why has it been so quiet? We haven’t deployed any code for two weeks. Not deploying code means we aren’t deploying any bugs to production.

After a deployment you will probably notice any defects over the next couple of days. Once you have fixed those, it is smooth sailing for that version.

Continuous Deployment makes it easy to deploy bugs 10x a day. Agile gives you a justification to deploy 10x a day. Ask yourself, what are you deploying each day? A CSS change to a button? A new option in a drop-down? A re-write of the graphing functionality because no one can understand the current implementation? A new feature like Google Docs integration?

If you could only deploy 1 feature each week what would make the cut?

Sorry, we didn’t mean to break that for you! But we aren’t going to fix it.

Using business-to-consumer SaaS means getting your UX broken all the time.

Sorry, but we are moving the product in another direction. We have changed the interface to a whole new design. Yes, we don’t have feature parity with the old experience, but we will get there soon. Soon in this case means six months from now: after we finish the international rollout, we will start to fill in the missing features.

Do you remember the drama around Windows 10’s new UX? It was and is obviously worse for power users, but Microsoft didn’t care. How about when Microsoft Office added the new Ribbon UI and no one could find anything anymore? B2C software doesn’t care about power users. If you buy one copy of Office and use it 100x as much as the average dude, a business analyst is thinking about how to get you to buy 100 copies. 

Then there were the years when new versions of Mac OS were so bad that they had to stop charging for OS updates because no one would buy them anymore. Customers don’t like UI updates in general. Every UI update means learning a new set of commands. People don’t want to take a tutorial the first time they open an app. Getting people to redo the tutorial every time you release a new update is basically impossible.

A UX update on a product I worked on broke the application for a bunch of our power users. Developing the MVP version of the new UX took several teams months. While we worked on the new user experience, we had been maintaining a blacklist of users who stayed on the old one. Once the new experience was released and stable, we decided that we were ready to launch it to everyone and eliminate the blacklists. This led to a flurry of customer service calls from customers who could no longer use the application for its basic purpose.
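To make the mechanism concrete, here is a hypothetical Python sketch of a blacklist gate like the one described above. The user IDs and the function name are invented, and this is not the code we actually ran.

```python
# Hypothetical rollout gate: users on the blacklist keep the old experience.
NEW_UX_BLACKLIST = {"power-user-1", "power-user-2"}  # made-up user IDs


def experience_for(user_id: str, blacklist=NEW_UX_BLACKLIST) -> str:
    """Return which UI a user should see during the rollout."""
    return "old" if user_id in blacklist else "new"


print(experience_for("power-user-1"))                   # old
print(experience_for("power-user-1", blacklist=set()))  # new
```

Eliminating the blacklist is the same as passing an empty set: every power user lands on the less functional experience at the same moment.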

Customers probably don’t want a UX break unless it’s at least a 10x improvement on whatever you had before. If you find yourself releasing a new experience that is LESS functional than the current experience, STOP.

git format-patch and git am

# format-patch creates a patch file
# -1 $COMMIT takes the single commit $COMMIT and writes it to the file
git format-patch -1 14ab6d…..

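# Applies the commits in the patch file onto the current branch
git am ../folder/2019-10-17/0001-SLEDGE-32

# This git log command pulls the history of a file into one patch.
# git log writes to stdout, so redirect it to a file (history.patch is an arbitrary name).
git log --pretty=email --patch-with-stat --reverse -- path/file_or_dir > history.patch
# Apply the saved patch file, not the original file or directory
git am < history.patch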

PlantUML, a text-based diagramming language

One of the senior engineers at my job is a big fan of PlantUML, so I recommended it to one of the junior guys who needed a diagramming tool. I’ve been taking a look myself since I have never had a go-to diagramming tool.

PlantUML is a text-based language. You define elements and their relationships with other elements. There are a lot of keywords, which can be a bit confusing, but it generates pretty good diagrams.

Here is the text for a system diagram and the image it generates below.

@startuml
actor actor [
  a user
]
database postgres
queue celery 
stack redis

node django [
  Django webservice
]

node worker [
  Turtle Detector
]

boundary boundary [
nginx
]

cloud cloud [
cloud
]

actor --> cloud

cloud --> boundary

boundary --> django

django --> postgres
django --> celery

celery -> worker
worker -> redis
redis --> django
@enduml

Here is the code for a smaller class relationship diagram.

@startuml
class User {
  +customerId : String
  ~submissions : Submission[]
  
}

class Submission {
   -size : int[][] 
   #image : int[][]

}

' A User holds a list of Submissions, so this is aggregation rather than inheritance
User o-- Submission


class TurtleModel {
   ~model : Pytorch.GAN
}
@enduml