Designing your development environment

What should your development environment look like? People talk about how hard it is to set up development environments and how this or that component is tricky.

But what components should you actually have? For a backend API server in a major language you will probably have an application that serves requests while writing to a database, logging and emitting metrics. 

If you follow best practices you will have unit tests, a deployment pipeline, and integration and end-to-end tests. All that stuff is great, but what about the development environment?

You should have a best-in-class IDE. Whether that is Visual Studio, IntelliJ or Emacs, make sure you have the core features: syntax highlighting, go-to-definition, safe renaming, and a debugger.

You want some way to run integration tests on a developer’s local machine. At my workplace we ssh into VMs on which we can run the entire development stack. If you can manage that without the VM, do it; it’s 10x better.

You want to be able to run a remote debugger against a fully running version of your application. Ideally, you should be able to test manually end-to-end against a version of your application running on your local machine. 

If you have only one service this is easy. If your microservice is one of a hundred, making that happen is trickier but worth it.


Code coverage should be verified during your build. 

Code style linters should automatically apply styling fixes. The build should never break over styling issues.

Builds should be FAST. Every minute of build time can safely be assumed to result in wasted developer time. The ideal build time is under 15 seconds, including unit tests. 

The longer your build time, the more distractions will edge into your workflow. If builds take a minute or two, devs will click over to Slack or their browser. If builds take over 5 minutes, devs will be talking to their coworkers and getting distracted. If builds take 15+ minutes, it’s bad: at most 4 code changes can be verified per hour.
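The build-time figures above come down to simple division. Here is a minimal sketch, assuming every change needs one full build to verify and ignoring the time spent actually writing the change, so the result is an upper bound. The function name is hypothetical.

```python
def max_verified_changes_per_hour(build_minutes: float) -> int:
    """Upper bound on how many changes one developer can verify in an hour.

    Assumes one full build per change and zero time spent editing code,
    so real numbers will be lower.
    """
    if build_minutes <= 0:
        raise ValueError("build_minutes must be positive")
    return int(60 // build_minutes)


# A 15-second build, a 5-minute build, and a 15-minute build.
print(max_verified_changes_per_hour(0.25))  # 240
print(max_verified_changes_per_hour(5))     # 12
print(max_verified_changes_per_hour(15))    # 4
```

Even as an optimistic bound, the gap between 240 and 4 verifiable changes per hour shows why build time deserves attention.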


Why do we have unit tests for things that can never happen?

So you finally have established good development practices for your team. Linters enforce code coverage of 90% and everyone is adding tests. Month over month test count increases and obviously code quality is increasing as well…

Until a partner contributing code to your project notices a test that validates something that can never happen. This test in particular validates the behavior of an endpoint when a particular input is null. The issue is that our platform ensures this input is never null. Any request to our service without that input results in a platform error and never reaches our API.

Our Slack chat room had a lengthy discussion on why we have that unit test, but the answer is code coverage. Someone added it because the coverage report declared that code path to be untested.

Can we delete this test? Probably not without writing new tests for other functionality or reducing our coverage requirement. 

One option here is to add a Spring MethodValidationInterceptor which validates ‘base’ inputs to our service and then write tests for that. This approach moves responsibility for ‘things that can never happen’ to a centralized class which can be tested. Then we know what is prevented from happening within the domain of our unit tests. 
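The idea of moving "can never happen" checks into one tested place can be sketched without Spring. The example below is a Python stand-in, not Spring's MethodValidationInterceptor API. Every name in it (`require_base_inputs`, `MissingBaseInputError`, `get_greeting`, `account_id`) is hypothetical, and it assumes requests arrive as plain dictionaries.

```python
from functools import wraps


class MissingBaseInputError(ValueError):
    """Raised when a request lacks an input the platform is expected to supply."""


def require_base_inputs(*required_keys):
    """Centralized check: reject requests whose base inputs are missing or None."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(request: dict):
            missing = [key for key in required_keys if request.get(key) is None]
            if missing:
                raise MissingBaseInputError(f"missing base inputs: {missing}")
            return handler(request)
        return wrapper
    return decorator


@require_base_inputs("account_id")
def get_greeting(request: dict) -> str:
    # No null check here: the decorator owns the "can never happen" case.
    return f"hello, {request['account_id']}"


print(get_greeting({"account_id": "a1"}))  # hello, a1
```

With this shape, the null-input test is written once against the validator, and individual handlers no longer need a branch (or a coverage-driven test) for that case.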

But at the platform level this really will never happen. Do we really need every service in our dependency tree to validate this particular case and throw an exception? 

Never get in the way of people with the momentum to get things done

Most of us have productivity swings. Some weeks you are energized and getting things done. Then the next project just seems to drag on and on. Slumps and burnout happen. But usually at least one person on the team still has some momentum going. 

Don’t slow them down. Do what you can to assist in their effort. Don’t think you have to operate as effectively as they are this week. It can be intimidating to see another person delivering more value than you. Don’t beat yourself up thinking “If only I were more motivated I could outperform Jack.”

Software engineering is not a coding competition. Success or failure is evaluated at the product level by our customers. Which member of your team delivered the most story points this sprint is completely irrelevant to customers. They don’t know and wouldn’t care even if they did.

Customers care about what your engineering organization does as a whole. Which features are shipped, how stable the service is, how good the user onboarding is: these are the things that matter to customers. You aren’t signing each line of code for customers to look at. They don’t have a favorite programmer on the product.

Your goal is to enhance your team’s efforts as much as possible. That is achieved by following the momentum and supporting it to get things done.

Everyone uses (failing) software all the time.

Because you use software all the time, at least one piece of it is broken for you at all times.

I stopped using Facebook after my freshman year of college, but recently got pulled back in by a Facebook group. As a result I now have the pleasure of enjoying a 10+ second loading phase every time I open the homepage. 

Recently, I tried to buy a CODE mechanical keyboard on the wasdkeyboards.com website. But every time I submitted my order it failed. I tried different browsers. I had to look into the console to find out that an HTTP request was failing because it could not reach a PayPal advertising domain that my Pi-hole blocks on the network. To buy my keyboard I had to tether to my smartphone. A non-technical user wouldn’t have been able to find out why the order failed because there was no error message. There was a spinner that just disappeared after a while without any message to the user.

Everyone uses software all the time now. We have smartphones, smart TVs, smart refrigerators and smart homes. If you use 100 programs a day, 99% uptime means that on average one program is down for every person. If every application manages 99.9% uptime, roughly one out of ten people is experiencing software brokenness every day.

Then realize that billions of people have smartphones now. 

(100% - 99.99%) * 1,000,000,000 = 100,000.

If your software has a billion users and works 99.99% of the time, it’s down for 100,000 people all the time.
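The uptime arithmetic is easy to get wrong by a factor of ten, so here is a minimal sketch for checking it. It assumes downtime is spread evenly and independently across users and programs, which real outages are not. The function names are hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def downtime_fraction(uptime_percent: str) -> Fraction:
    """Convert an uptime percentage such as "99.99" into the fraction of time down."""
    return 1 - Fraction(uptime_percent) / 100


def expected_broken_programs(programs_per_day: int, uptime_percent: str) -> Fraction:
    """Average number of programs that are down for one person."""
    return programs_per_day * downtime_fraction(uptime_percent)


def users_affected(total_users: int, uptime_percent: str) -> Fraction:
    """Average number of users for whom the software is down at any moment."""
    return total_users * downtime_fraction(uptime_percent)


print(expected_broken_programs(100, "99"))      # 1
print(users_affected(1_000_000_000, "99.99"))   # 100000
```

The key step is multiplying by the downtime (100% minus the uptime), not by the uptime itself.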

The top cause of outages is changing code.

The last week of my on-call shift has been pretty quiet. The holiday season has pushed traffic around 50% higher than normal, but I haven’t really noticed. There haven’t been any service outages and it almost feels like I’m not really on call. Why has it been so quiet? We haven’t deployed any code for two weeks. Not deploying code means we aren’t deploying any bugs to production.

After a deployment you will probably notice any defects over the next couple of days. Once you have fixed those, it is smooth sailing for that version.

Continuous Deployment makes it easy to deploy bugs 10x a day. Agile gives you a justification to deploy 10x a day. Ask yourself, what are you deploying each day? A CSS change to a button? A new option in a drop-down? A rewrite of the graphing functionality because no one can understand the current implementation? A new feature like Google Docs integration?

If you could only deploy 1 feature each week what would make the cut?