The top cause of outages is changing code.

The last week of my oncall shift has been pretty quiet. Holiday traffic is running about 50% above normal, but I haven’t really noticed. There haven’t been any service outages, and it almost feels like I’m not oncall at all. Why has it been so quiet? We haven’t deployed any code for two weeks. Not deploying code means we aren’t deploying any bugs to production.

After a deployment you will probably notice any defects over the next couple of days. Once you have fixed those, it is smooth sailing for that version.

Continuous Deployment makes it easy to deploy bugs 10x a day. Agile gives you a justification to deploy 10x a day. Ask yourself, what are you deploying each day? A CSS change to a button? A new option in a drop-down? A rewrite of the graphing functionality because no one can understand the current implementation? A new feature like Google Docs integration?

If you could only deploy one feature each week, what would make the cut?

PlantUML, a text-based diagramming language

One of the senior engineers at my job is a big fan of PlantUML, so I recommended it to one of the junior guys who needed a diagramming tool. I’ve been taking a look myself, since I have never had a go-to diagramming tool.

PlantUML is a text-based language. You define structures and their relationships with other items in plain text. There are a lot of keywords, which can be a bit confusing, but it generates pretty good diagrams.

Here is the text for a system diagram and the image it generates below.

@startuml
actor actor [
  a user
]
database postgres
queue celery 
stack redis

node django [
  Django webservice
]

node worker [
  Turtle Detector
]

boundary boundary [
nginx
]

cloud cloud [
cloud
]

actor --> cloud

cloud --> boundary

boundary --> django

django --> postgres
django --> celery

celery -> worker
worker -> redis
redis --> django
@enduml

Here is the code for a smaller class relationship diagram.

@startuml
class User {
  +customerId : String
  ~submissions : Submission[]
  
}

class Submission {
   -size : int[][] 
   #image : int[][]

}

' a User holds many Submissions (aggregation, not inheritance)
User o-- Submission


class TurtleModel {
   ~model : Pytorch.GAN
}
@enduml
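The markers in front of each field are PlantUML's visibility markers: `+` is public, `-` is private, `#` is protected and `~` is package-private. As a rough sketch of how that text could be produced from code, the Python below emits class blocks of the same kind. The names `VISIBILITY`, `render_class` and `render_diagram` are made up for this example and are not part of PlantUML.

```python
# Visibility markers PlantUML puts in front of class members.
VISIBILITY = {
    "public": "+",
    "private": "-",
    "protected": "#",
    "package": "~",
}


def render_class(name, fields):
    """Return PlantUML text for one class.

    fields is a list of (visibility, field_name, type_name) tuples.
    """
    lines = [f"class {name} {{"]
    for visibility, field_name, type_name in fields:
        lines.append(f"  {VISIBILITY[visibility]}{field_name} : {type_name}")
    lines.append("}")
    return "\n".join(lines)


def render_diagram(classes):
    """Wrap one class block per entry in @startuml / @enduml."""
    body = [render_class(name, fields) for name, fields in classes.items()]
    return "\n".join(["@startuml", *body, "@enduml"])


if __name__ == "__main__":
    print(render_diagram({
        "User": [
            ("public", "customerId", "String"),
            ("package", "submissions", "Submission[]"),
        ],
        "Submission": [
            ("private", "size", "int[][]"),
            ("protected", "image", "int[][]"),
        ],
    }))
```

Running the script prints text you can paste into PlantUML. Relationship lines between classes are left out to keep the sketch short.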

Turtle Generator Project Idea

New programmers sometimes ask me, “What project should I work on next?” This project is one I drafted for myself because I wanted to build a more complex application with Django and PyTorch.

The Turtle Generator Project is an attempt to create a website where people can submit pictures of turtles and vote on whether machine-generated turtles are “turtle” or “not turtle”. The submissions and votes feed a GAN (generative adversarial network), which both classifies and produces pictures of turtles. We may also use the user submissions as part of our dataset of turtles.

Pages / Components

Draw or Submit turtle component

A drawing component where a visitor can use their mouse or touchpad to draw a turtle and submit it to the turtles dataset.

Vote component

A component consisting of a picture of a turtle generated by our backend algorithms and buttons for voting “turtle” or “not turtle”.

Architecture

Django frontend + PostgreSQL
    serves web pages and handles user interaction
    votes and turtle drawing submissions are submitted to the Machine Learner system via Celery
    submits a “turtle image” request to the Celery queue and gets a turtle back
    possibly React or just Django templates

Machine Learner
    online machine learning system based on PyTorch
    takes Celery tasks and does one of the following:
        generates a picture of a turtle
        adds a vote submission to the training data (classification)
        adds a turtle picture submission to the training data

Celery + Redis
    message queue used to handle queuing training tasks
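The three kinds of task the Machine Learner takes off the queue can be sketched in plain Python. This is only an illustration of the dispatch logic: a standard-library `queue.Queue` stands in for Celery and Redis, `generate_turtle` is a placeholder for the PyTorch generator, and the task dictionary shape (`kind`, `image_id`, `is_turtle`, `image`) is an assumption made up for this example.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class TrainingData:
    """In-memory stand-in for the turtle training data."""
    votes: list = field(default_factory=list)        # (image_id, is_turtle)
    submissions: list = field(default_factory=list)  # submitted pixel grids


def generate_turtle():
    """Placeholder for the PyTorch generator; returns a blank 2x2 image."""
    return [[0, 0], [0, 0]]


def handle_task(task, data):
    """Dispatch one queued task to the matching Machine Learner action."""
    kind = task["kind"]
    if kind == "generate":
        return generate_turtle()
    if kind == "vote":
        data.votes.append((task["image_id"], task["is_turtle"]))
        return None
    if kind == "submission":
        data.submissions.append(task["image"])
        return None
    raise ValueError(f"unknown task kind: {kind}")


def drain(tasks, data):
    """Process every queued task in order and collect the results."""
    results = []
    while not tasks.empty():
        results.append(handle_task(tasks.get(), data))
    return results


if __name__ == "__main__":
    tasks = queue.Queue()  # stand-in for the Celery queue backed by Redis
    tasks.put({"kind": "vote", "image_id": 7, "is_turtle": True})
    tasks.put({"kind": "submission", "image": [[1, 0], [0, 1]]})
    tasks.put({"kind": "generate"})
    data = TrainingData()
    print(drain(tasks, data))
    print(data)
```

In the real project each branch would probably become its own Celery task, with the Django views enqueuing them. Keeping the dispatch in one function here just makes the three cases easy to see side by side.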

Creating a good Project README

Project README files are typically an afterthought in the software development process. If a question comes up repeatedly, the answer gets added in an unstructured fashion. This is unfortunate, because the people who need READMEs the most are new engineers who are joining the team. They don’t know any of the team’s jargon. They probably don’t have a good understanding of what the project does. And they probably don’t understand the internal architecture of the project.

You want the first part of the README to be an introduction to your project. Answer the question “Why do we have this service?”. 

To help new engineers, use as little jargon as possible and define the terms you do use in the README.

Include a summary of the architecture of the project in the README. It should cover what abstractions you are using and why you picked the ones that you did. If you use any patterns that are not included in every project at your company, make sure to mention them in the README. The last thing you want is for the people who take over the project to be unable to figure out why you chose these abstractions and then remove them from the codebase.

Your README should also include the steps to get the project running. What permissions and credentials do new engineers need to run builds and integration tests? Who should they contact to get those permissions? Make sure to include the common failure cases that new engineers ask questions about.

Include a summary of the typical build process for the project. If you use make, write explanations for every make command you support and when they should be used. If you use a standard build tool like Maven, mention the extensions and plugins you use. For example: “We use the JaCoCo plugin to ensure 80% code coverage. If you add a Spring configuration class, you can add it to the ignored list for JaCoCo.”

If you have integration or end-to-end tests in a different package, reference it in your README. Include an example of typical usage of the external package and expect people to read the README for that package if they run into trouble. Make sure to include common failure cases in the test suite. If external dependencies commonly cause your integration tests to fail, call out how a new engineer can determine that is the case and what they should do in response.

Example Table of Contents for a README 

Introduction
    Why does this project exist? 
    Where can I find additional documentation?
    Where can I find our CI/CD infrastructure?

Architecture
    What is the basic architecture of the system? 
    MVC, SPA, messaging, RPC
    Do we have any managed thread pools?
    What are our asynchronous tasks?

    What patterns do we use in our codebase? 
    Explain any unusual patterns you use and why you need them.


How to get builds running
    What tools are needed to run builds?
    What build commands and flags should a new engineer be using?

How to get tests running
    What tools to use to run unit/functional/integration/end-to-end tests
    Are any external packages needed?
    How to retrieve the external packages
    Basic commands for any external packages

    How to know if the tests passed or failed
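If you want to keep a README from drifting away from an outline like this, a small check can flag missing sections. The sketch below is one way to do it. It assumes the four top-level titles from the example above appear on their own lines, with or without leading `#` markers, and the names `REQUIRED_SECTIONS` and `missing_sections` are made up for this example.

```python
import re

# Top-level section titles from the example table of contents.
REQUIRED_SECTIONS = [
    "Introduction",
    "Architecture",
    "How to get builds running",
    "How to get tests running",
]


def missing_sections(readme_text, required=REQUIRED_SECTIONS):
    """Return the required titles that never appear as a line of their own."""
    # Normalise each line: strip whitespace, drop leading '#' markers,
    # and lowercase so the comparison ignores capitalisation.
    titles = {
        re.sub(r"^#+\s*", "", line.strip()).lower()
        for line in readme_text.splitlines()
    }
    return [title for title in required if title.lower() not in titles]


if __name__ == "__main__":
    sample = "# Introduction\nWhy we exist.\n\n# Architecture\nMVC.\n"
    print(missing_sections(sample))
```

A check like this could run in CI next to the tests. It only checks that the headings exist, not that the sections answer the questions a new engineer will have.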

Rework book review

I read REWORK by Jason Fried and David Heinemeier Hansson, the founders of Basecamp. The book is a series of short 200-500 word ‘sections’ that each elaborate on a point. No wasted space or pages full of empty words where the point has already been made. As a result the book flows incredibly well. It is a quick and light read. The ideas in the book are commonsense lessons learned from running a successful small business. A lot of the ideas are shared with agile and the ‘lean startup’ schools of thought. But REWORK is a superior book to ‘The Lean Startup’. Comparing the two books, it’s clear Hansson and Fried understand the space better.

A few points from the book stuck with me so I will go over them. 

Don’t write it down

The top customer complaints will come up so often you will never be able to forget them. You shouldn’t need a long list of customer issues. If you are listening to your customers regularly, you won’t be able to ignore the top issues. If you get ten customer complaints each day and five of them are the same issue, you know what to work on.

The myth of the overnight sensation

“And on the rare occasion that instant success does come along, it usually doesn’t last —there’s no foundation there to support it.” — page 196.

I liked this phrasing of the overnight sensation. These days social media constantly spams us with success stories and lavish lifestyles we could be living. But if you are relying on luck to succeed it might not come a second time, and then you don’t have anything left. 

Don’t scar on the first cut

Policies are only meant for situations that come up over and over again. You create a policy to make a common problem easier to solve. Without a policy you have to rely on judgement and escalating up the chain of command. That is expensive, but having a policy takes all the flexibility out of the situation. Don’t create policies unless it’s obvious that the issue is common and thinking about it is wasting people’s limited time.

Four letter words

Don’t use the words “Easy”, “Fast”, etc. Things are rarely done fast or easily. If they could be, we would have done them already. Using those words implies things that we probably don’t know.

Inspiration is perishable

If you want to do something, you have got to do it now. You can’t do it later because you won’t be inspired to do it later.