Multiple promotions for solving the same problem

One failure mode I’ve seen in software organizations is multiple promotions for building the same solution to a problem. Suppose your software organization has 10 teams working in a similar problem space, and in one year you see two promotions for building an asset management pipeline. When I saw the second promotion announcement, I thought to myself, “Wait, didn’t Tyler get promoted for building an asset management pipeline?”

Any Vice-President whose organization has this issue should be thinking hard about where it went wrong. That engineering culture is completely broken and can only be fixed by rooting out the leadership and then replacing senior engineers.

Why is this such a big deal? Because it’s a symptom of several serious problems in your organization.

No information sharing between teams

One team built a solution to this problem in Q1, and 6 months later another team built another solution to the same problem. Why couldn’t they have shared one asset management pipeline? If it’s a valuable thing to have, why did one team go 6 months without an asset management pipeline?

Promotions are being gamed

Two people being promoted for building similar solutions to the same problem is a sign that your promotion process is being gamed. Redundant projects being lauded as keystone accomplishments is ridiculous. The manager of the 2nd team should have at least caught that this project was already a keystone. 

Important shared infrastructure is being ignored

If two teams are building the same infrastructure to solve similar problems, it should be a shared service. Otherwise, you are paying to build the system twice and to support it twice. This is software; we should be able to shard the pipeline or make it multi-purpose.

In this particular organization, the root issue is that the asset management system is horrible. Because the quality of that system is bad, you have dozens of teams working around pain points with hacks. The solution is not to build asset management pipelines; the solution is to recognize how important this asset management system is and invest in it appropriately.

Every asset management pipeline this organization built was a waste of effort that should have been invested at a higher level. 

If you like my writing, please buy my book on Amazon.
The Sledgeworx Guide to Getting into Software

Peak to Trough

The importance of auto-scaling 

[Drawing: peak to trough traffic]

The cloud enables us to acquire hardware on demand for our services. I have never had to rack a server or worry about hardware failure. My entire software career has been in the cloud. As an industry, most of us don’t need to worry about forecasting hardware requirements months in advance. We just increase the number of virtual machines we need in the PaaS dashboard.

This week I was investigating some unusually large peaks in our daily traffic. While changing the bounds and timeline of the graph, I noticed that we had a 10x difference from peak to trough. Usage sits at 10x trough for about 2 hours each day, and at about 5x trough for another 6 hours. At night our traffic drops significantly because our users are sleeping.

My current team, like all teams I have worked with in my five-year career, does not use auto-scaling. We experimented with it last year but had issues with auto-scaling interfering with our deployments in unpredictable ways.

So we scale for our instantaneous peak, which is 10x our lowest traffic at around 2am. That means that during the quiet hours we run at least 5x as much hardware as necessary.

The drawing underestimates the impact of the instantaneous peaks, which essentially double the traffic to this service.

Auto-scaling would be a great fit for this service. Most cloud platforms have supported this use case for years, and adopting it would result in decent savings.
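
As a rough sketch of the savings at stake, the arithmetic can be worked through in Python. The 2-hour and 6-hour figures come from the traffic described above. Treating the remaining 16 hours as trough-level traffic and assuming capacity scales linearly with traffic are simplifying assumptions, and the function name is made up for illustration.

```python
# Back-of-the-envelope comparison of static vs. auto-scaled capacity.
# Load is expressed as a multiple of trough traffic.
# ASSUMPTION: hours not listed in the profile run at trough (1x).
# ASSUMPTION: required capacity scales linearly with traffic.

HOURS_PER_DAY = 24


def capacity_hours(profile, hours_per_day=HOURS_PER_DAY):
    """Return (static, autoscaled) capacity-hours for one day.

    profile is a list of (hours, load_multiple) pairs.
    """
    listed_hours = sum(hours for hours, _ in profile)
    trough_hours = hours_per_day - listed_hours
    peak = max(load for _, load in profile)

    # Static provisioning pays for peak capacity all day long.
    static = peak * hours_per_day
    # Ideal auto-scaling pays only for the load actually present.
    autoscaled = sum(hours * load for hours, load in profile) + trough_hours * 1.0
    return static, autoscaled


# 2 hours at 10x trough, 6 hours at 5x trough, the rest at trough.
profile = [(2, 10.0), (6, 5.0)]
static, autoscaled = capacity_hours(profile)
overprovisioning = static / autoscaled
savings = 1 - autoscaled / static
print(f"static: {static:.0f}, autoscaled: {autoscaled:.0f}")
print(f"overprovisioned by {overprovisioning:.1f}x, potential savings {savings:.0%}")
```

Under these assumptions, static provisioning averaged over the whole day uses roughly 3.6 times the capacity-hours that ideal auto-scaling would, and 10 times what the trough itself needs. Real auto-scaling needs headroom and reacts with some delay, so the actual savings would be smaller than this upper bound.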


Story development

It’s a common statement that once you are a senior engineer you don’t get to code anymore. It’s not that senior engineers are forbidden from coding; it’s still in the job description.

But senior engineers get pulled into so many tasks they rarely have time for coding. 

A senior engineer might get pulled into a critical outage, a roadmap meeting, defending architectural boundaries from other teams, assisting team members with their tasks, reviewing code, or coordinating large projects with other teams.

None of those tasks involve coding on the part of a senior engineer. And none of those tasks involve story development.

Story development is the process of taking feature requests and refining them into technical tasks.

Unless your team is stacked with experienced engineers or works in an area that requires little domain knowledge, story development will fall on the senior engineer.

Maintaining a ‘sprint ready’ backlog for a team of 10 engineers takes more than a one-hour meeting once a week.

My philosophy is that, as the senior person, I should prioritize the tasks that allow the other nine people on the team to work efficiently. If the backlog is full of two-sentence feature requests, the next sprint is going to be full of junior engineers figuring out the requirements.

Don’t ignore the backlog to fight fires. Figure out what it will take to empower the non-senior part of the team to fight the fires. Then you can focus on the higher-value tasks: building the roadmap, evolving the architecture, and developing stories.

People have given up on performance in favor of Scalability

Scalability has been all the rage since the cloud made horizontal scaling easy. No longer do we have to order parts, lease colo space, or rack servers. Instead, there is an infinite supply of virtual machines out there that we can rent at the press of a button. Because of this, there is a tendency to start development with an architecture that will scale well horizontally. My entire career has been during the post-AWS period. Premature optimization is the root of all evil, but make sure to create a stateless service so we can scale it up later when it’s slow.

Web Servers

It’s interesting to look at examples of projects that did not focus on scaling horizontally.

For example, we have Stack Exchange’s public numbers on their performance.

https://stackexchange.com/performance

They claim that they handle up to 450 requests/s on 9 servers. From the infographic it looks like these are 1U or 2U servers with 64GB of RAM, and although it’s unspecified, I’m guessing they have 12-24 physical cores per machine.

These machines have around 10 times as much RAM as the VMs my team runs in production and probably over 10x the CPU performance. They handle more traffic per server with lower CPU utilization. A rough estimate from these numbers is that the Stack Exchange .NET service is 2.5x to 10x as performant as my Java service. That could just be the cost of virtual machines versus bare metal, since our stack has significantly less CPU.

You might think that Stack Exchange is operating at an absurdly low CPU utilization at 5%, but across the 4 companies I have seen, nobody was operating cloud servers above 20% utilization.

Big Data

This study compared single-threaded performance on a modern CPU against distributed big data algorithms.

A single thread outperformed the distributed big data computations on many (most?) problems.

https://www.usenix.org/system/files/conference/hotos15/hotos15-paper-mcsherry.pdf

They found that optimized single-threaded code outperformed distributed code on the datasets they tested. Admittedly, not all datasets will fit on a single machine. But we have to remember that a single machine can now have over a TB of RAM and hundreds of TB of SSD. Single-threaded performance is clocking over 5 GHz now. A single server can handle all your big data needs until your dataset exceeds dozens of terabytes.

I’m working on learning awk to experiment in this area. It is a relatively simple domain-specific language for text processing and formatting.
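
For a sense of what this kind of single-machine text processing looks like, here is a minimal single-threaded sketch in Python rather than awk. It streams lines and sums a numeric column per key, roughly what the awk one-liner `{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }` does. The two-column, whitespace-separated input format and the function name are assumptions for illustration, not something taken from the paper.

```python
from collections import defaultdict


def sum_by_key(lines):
    """Sum the second column per value of the first column.

    Processes one line at a time, so memory grows with the number of
    distinct keys rather than with the size of the input.
    """
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            value = float(fields[1])
        except ValueError:
            continue  # skip lines whose second column is not numeric
        totals[fields[0]] += value
    return dict(totals)


# Hypothetical sample data: "<page> <hits>" per line.
sample = ["home 120", "search 30", "home 80", "", "search 12.5", "bad line"]
totals = sum_by_key(sample)
print(totals)
```

Because the function only iterates over its input, an open file object can be passed in place of the list, and the file is then read line by line instead of being loaded into memory.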



Pros and cons of working at a consulting company

It seems like we get lots of questions on the subreddit r/cscareerquestions about working at a consulting company. I worked at a small company that did a mix of staff augmentation, application development, devops, and agile process consulting. Here are some thoughts on the pros and cons.

Pros

  • Work on lots of projects and technologies 
  • Willingness to train you on the job
  • Teams rotate frequently – get to know lots of people 
  • Don’t have to worry about legacy code as much
  • No Oncall rotation 

Cons

  • Don’t run code in production 
  • Travel a lot
  • Don’t own your code or service 
  • Experience is shallow and short term

I worked at a software consulting company for about four years right after college. I had the choice between a consulting company that did work in the cloud and a company that made a medical software system, and I went for the cloud. The company had around 100 employees with some shifts up and down. As far as work goes, we did the whole gamut: staff augmentation, agile training, ‘digital transformation’, cloud migrations, building platforms as a service, and more.

During my consulting days, I worked on a ton of projects for different companies, wrote code in different languages, and traveled a bit. For the first two years I worked on a variety of Java projects for a cable company. We did around 90% of our business with that company at the time. Working with cable company employees is kind of a drag; they were focused more on ‘resting and vesting’ and frequently missed deadlines.

My next project was with a new client in Boston, and we started flying in. I was tapped for staff augmentation to help the client create a devops team. They ran their own datacenter and mainly had system administrators. I worked with this team on typical infrastructure automation as we set up a platform as a service for the company’s projects on AWS. I learned a lot at this client, managed to insult one of their directors and almost got fired, and found out that I didn’t particularly enjoy devops.

Next I worked on a few machine learning proof-of-concept projects for a client and became a certified Kubernetes administrator. My former employer invested heavily in Kubernetes, and it paid off with press and connections. I spent my last year as a team lead running development of an internal project that we hoped to resell to clients. That project is what finally motivated me to find a new job.

I worked around 40 hours a week, except when I was onsite with a client, when it was more like 60 hours. Getting onsite with clients is actually pretty exciting, and I didn’t have to do it too often. I traveled around once a month for one of the years at this job.

Pros

Work on lots of projects and technologies 

I worked at a consulting firm early in my career, and getting experience with lots of technologies and different company cultures really helped. I have a much better idea of the industry after working with 10 companies in 4 years than I would have had after working at one company for that time.

Willingness to train you on the job

My employer was extremely willing to train people on the job. Part of our business model was billing out cheap junior developers with just enough supervision to get things done. The company paid for me to get certifications in Cassandra and Kubernetes. 

Teams rotate frequently – get to know lots of people 

My longest project lasted 9 months. In consulting, once a project ends, the team disbands and you start on a new project with a new team. Over time you get to work with almost everyone in the company directly. You will also work with lots of different clients and need to build rapport and a good working relationship quickly.

Don’t have to worry about legacy code as much

I didn’t work with a lot of legacy code as a consultant. We were mainly brought in to help launch new projects. A six-month project really doesn’t generate that much legacy code. Then, once time’s up, you move on to another project.

No Oncall rotation 

As a consultant I never had an oncall rotation. We didn’t have any services of our own. 

Cons 

Don’t run code in production 

One of my greatest frustrations as a consultant was that I didn’t get any experience running code in production during that job. We wrote the code, then handed everything off to our clients’ operations teams. I think this made it harder for me to find a new job. Since I started running my own code in production, I have learned a ton.

Travel a lot

I didn’t have to travel that much as a consultant. In my busiest year I did around 10k miles. Of course, as you move up the ladder, you will end up traveling more.

Don’t own your code or service 

As a consultant you are working for another company. Their teams will own the code. Their Product department makes the calls. You will often just get the requirements and be shaking your head at how wrong they are. Decisions will be made by the client that you don’t like. But the client will get what they want 99% of the time. 

Experience is shallow and short term

The flip side of working on such short projects and so many different technologies is that you never get to master any of them. One project is in React, the next is in Angular, and the one after that is devops, with totally different tooling to learn for each of them. Consulting is not great for building deep knowledge as a software engineer.