What I learned from a year of DevOps

In 2017 I had the opportunity to spend a year working as a DevOps, or platform, engineer. I had mainly worked as a software engineer before, so moving into an automate-and-operate role was a bit of a leap. This was a fully remote engagement in which I was embedded with the client’s first platform team and helped bootstrap it.

The project was building out continuous integration and delivery for a client of ours that had no AWS experience. Before they brought us in, they ran all their systems in their own datacenters in a Windows and .NET environment. We came in to assist with the move to the cloud and to help transition the company from .NET to Java, JavaScript, and microservice development.

For the first few months we focused on building out the CI/CD with Jenkins pipelines and a great deal of AWS CLI scripting. Once we got the basics working, teams started to come out of microservices training and began developing against it. This was the start of operational support for us, and it set off a bit of a scramble while we tried to balance new features and the stability of the platform with hiring and onboarding.

We used Jenkins pipelines, Docker, and CloudFormation to provide our users with a solid, customizable pipeline solution. Using our default templates, development teams could easily bootstrap their pipeline with CI/CD from dev to canary deploys in production. If they needed more than a stateless microservice, they could provide CloudFormation templates in their GitHub repository, which were run with each deploy to ensure the AWS environment was bootstrapped for their needs.

We started out with the intention of using Jenkins pipelines with Ansible to automate things, but the client’s team was more experienced with CloudFormation, and as a result I ended up writing most of our initial CI/CD code in a combination of Groovy and AWS CLI calls. This proved unwieldy and eventually led us to use Groovy and CloudFormation for nearly everything. CloudFormation works, but it is locked into AWS and its programming model is somewhat awkward. CloudFormation’s saving grace is its first-class AWS integration and editor. Next time I would recommend starting with a commitment to Terraform or Ansible.

In the third quarter we started work on implementing canary deployments. Our solution ended up being a combination of a customized HTTP client with client-side load balancing and Jenkins pipelines. I started us off with a proof of concept that proved easier to write than we expected, which put us on good footing for the rest of the project. One of the client’s employees took advantage of the space we had to rewrite the shared Jenkins pipeline library in a more idiomatic style, which turned out to be a great improvement.
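To make the client-side half of that idea concrete, here is a minimal sketch in Go of how a client might split traffic between a stable and a canary deployment. It is not the code from the project: the type CanaryPicker, its fields, and the two URLs are all hypothetical, and a real client would also need health checks and a way to update the percentage at runtime.

```go
package main

import (
	"fmt"
	"math/rand"
)

// CanaryPicker chooses between a stable and a canary endpoint per request.
// Every name in this sketch is hypothetical, for illustration only.
type CanaryPicker struct {
	Stable        string // base URL of the current production version
	Canary        string // base URL of the new version under evaluation
	CanaryPercent int    // share of requests (0-100) sent to the canary
	intn          func(n int) int
}

// NewCanaryPicker clamps percent to the range 0-100.
func NewCanaryPicker(stable, canary string, percent int) *CanaryPicker {
	if percent < 0 {
		percent = 0
	}
	if percent > 100 {
		percent = 100
	}
	return &CanaryPicker{
		Stable:        stable,
		Canary:        canary,
		CanaryPercent: percent,
		intn:          rand.Intn,
	}
}

// Pick returns the base URL the next request should be sent to.
func (p *CanaryPicker) Pick() string {
	// intn(100) is uniform over 0..99, so the comparison is true for
	// roughly CanaryPercent out of every 100 calls.
	if p.intn(100) < p.CanaryPercent {
		return p.Canary
	}
	return p.Stable
}

func main() {
	// Hypothetical service URLs; send about 10% of traffic to the canary.
	p := NewCanaryPicker("http://orders-stable.internal", "http://orders-canary.internal", 10)
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[p.Pick()]++
	}
	fmt.Println(counts)
}
```

In a setup like this, a pipeline stage would raise the percentage step by step while watching error rates, and set it back to 0 to roll back.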

We went live in Q4 and I moved on to another project. I am moving back into application development: I ended up doing 100% automation scripting instead of the 50-50 split I was expecting, so it will be good to get back to writing applications.

Team Skill Shaping

When running a team you need to balance bus factor and performance.

On an average software team of perhaps 10 people, you will naturally have people develop expertise in a particular part of the codebase. One developer will be an expert in the frontend, another in the SQL queries, the next in the controllers, etc. What you want to avoid is a bus factor of 1. Some teams try to keep every developer knowledgeable in every area of the system; this is a waste. If you have to work on every part of the system, you will not be able to master any single part. To get a bus factor of n by rotating developers through different parts of the system, you give up the efficiency that comes from letting a developer master a subsystem. My solution is to aim for a bus factor of 2 for each major subsystem. Have people focus on the two subsystems that they are interested in or that are available, and leave things there. Just try to avoid having two developers working on the same two subsystems. You are unlikely to have two developers get hit by the ‘bus’ at the same time.

Aim for a bus factor of 2 while trying to avoid a lot of overlap on subsystems, then leave things there.
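The rule above is easy to check mechanically. Here is a small sketch in Go, assuming you keep a simple map from developer to the subsystems they know. The team, the subsystem names, and the function names busFactors and duplicateCoverage are all made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// busFactors counts how many developers know each subsystem.
// It assumes a developer does not list the same subsystem twice.
func busFactors(team map[string][]string) map[string]int {
	counts := map[string]int{}
	for _, subsystems := range team {
		for _, s := range subsystems {
			counts[s]++
		}
	}
	return counts
}

// duplicateCoverage returns groups of developers who cover exactly the
// same set of subsystems, which is the overlap the rule tries to avoid.
func duplicateCoverage(team map[string][]string) [][]string {
	byKey := map[string][]string{}
	for dev, subsystems := range team {
		// Sort a copy so that {"a", "b"} and {"b", "a"} produce the same key.
		sorted := append([]string(nil), subsystems...)
		sort.Strings(sorted)
		key := strings.Join(sorted, "\x00")
		byKey[key] = append(byKey[key], dev)
	}
	var groups [][]string
	for _, devs := range byKey {
		if len(devs) > 1 {
			sort.Strings(devs)
			groups = append(groups, devs)
		}
	}
	// Order the groups by first name so the result is deterministic.
	sort.Slice(groups, func(i, j int) bool { return groups[i][0] < groups[j][0] })
	return groups
}

func main() {
	// Hypothetical team: each developer focuses on two subsystems.
	team := map[string][]string{
		"ana":   {"frontend", "controllers"},
		"ben":   {"controllers", "sql"},
		"chloe": {"sql", "frontend"},
	}
	fmt.Println("bus factor per subsystem:", busFactors(team))
	fmt.Println("developers with identical coverage:", duplicateCoverage(team))
}
```

Any subsystem with a count below 2 needs a second person, and any group reported by duplicateCoverage is a pair whose expertise could be spread more widely.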

Changing KPIs — A tale of moving from individual contributor to team lead.

The biggest change after my move to team lead is that my KPIs (key performance indicators) have changed significantly. I still troubleshoot bugs, create architecture, and discuss and argue for architectures with teammates. I get to write some code here and there. But the work that I am evaluated on has changed significantly. Instead of being evaluated on my ability to get coding done, to resolve bugs and be a good team member, I am evaluated based on the team’s performance. Was I able to keep everyone on the team from being blocked this sprint? Was I able to keep people on the team coordinated such that they didn’t duplicate code or write incompatible interfaces? Do we have the architecture and stories hashed out far enough ahead to keep working towards the release?

It’s been kind of a shock to me, because I will be giving my update in standup, trying to remember what I did yesterday, and it’s something along the lines of “I sat in on a couple of meetings, reviewed PRs and helped classify several bugs.” I worked all day and am exhausted now, but I didn’t commit any code or make any progress on the story I assigned to myself. It feels like I’m not getting anything done. What happened? I used to be good at my job.

But despite feeling like nothing is getting done, I am still hitting my KPIs as a team lead. My bosses are happy, the team seems happy enough with my work, the scrum master has what he needs, etc. It’s not that I am not getting any work done; it’s that my ‘work’ has changed to something different. I am focused more on coordinating the team’s effort and planning what we need to do next, keeping abreast of the features coming down the roadmap, and keeping track of technical debt and the maintenance work we need to do.

Starting a new Go project is delightful.

I started a couple of projects recently: a mock crypto exchange and, my latest, a Unicode manipulation library. What struck me is that it’s really simple to get started. You need the Go toolchain installed and a GOPATH set up.

Then you specify the package and the main function, and that is a valid program.

package main

func main() {}

Above is my program so far. It is a short program with a bit of exploratory code that converts strings into their Unicode rune IDs. Back when I mainly used Java, I would have had to set up an IDE, integrate Maven, and think about package structure. In Go I just have a main.go file. If I need dependencies, I create a /vendor directory and use go get github.com/gin-gonic/gin to pull the dependency.
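For a sense of how little code that kind of exploration takes, here is a minimal sketch of converting a string into its rune IDs. It is not the actual program described above, and the function name runeIDs is invented for the example.

```go
package main

import "fmt"

// runeIDs returns the Unicode code point of every character in s, in order.
// Ranging over a string decodes its UTF-8 bytes, so a multi-byte character
// yields a single rune rather than several bytes.
func runeIDs(s string) []rune {
	ids := make([]rune, 0, len(s))
	for _, r := range s {
		ids = append(ids, r)
	}
	return ids
}

func main() {
	// "\u00e9" is a precomposed e with an acute accent: one rune, two bytes in UTF-8.
	for _, r := range runeIDs("h\u00e9llo") {
		fmt.Printf("%q = U+%04X (%d)\n", r, r, r)
	}
}
```

The loop is spelled out to show the decoding step; the built-in conversion []rune(s) gives the same result in one expression.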

Overall, the lightweight Go tooling makes it very easy to build small programs.