Business Intelligence: what is it?

Business Intelligence (BI) is the process of exposing the results of customer workflows in an easy-to-consume fashion for business analysts, accountants and executives.

When your product is software that your customers use, it can be hard to know how customers are using it in aggregate. For BI, you log whatever the customer did in a format that is easy to query via SQL. For a purchasing flow, you might log a record like this:

{
    "customerId": "ABC",
    "itemId": 1234,
    "time": "12:34 AM",
    "price": "$13.87",
    ……
    "sessionId": 987
}

Then analysts can identify usage patterns, discover and fix bad customer experiences, and generally understand what customers are doing.
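As a minimal sketch of what "easy to query via SQL" can look like, the snippet below loads a few purchase events into an in-memory SQLite table and runs one aggregate query. The table name, the column names, the integer-cents price column and the sample rows are all assumptions made for illustration; they are not taken from a real system.

```python
import sqlite3

# Hypothetical table mirroring the purchase record above; prices are stored
# as integer cents so they can be summed without parsing "$13.87".
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE purchase_events (
        customer_id TEXT NOT NULL,
        item_id     INTEGER NOT NULL,
        event_time  TEXT NOT NULL,   -- ISO 8601 timestamp
        price_cents INTEGER NOT NULL,
        session_id  INTEGER NOT NULL
    )
    """
)

# Made-up sample rows, for illustration only.
events = [
    ("ABC", 1234, "2024-01-05T00:34:00", 1387, 987),
    ("ABC", 5678, "2024-01-05T00:40:00", 500, 987),
    ("XYZ", 1234, "2024-01-06T09:15:00", 1387, 988),
]
conn.executemany("INSERT INTO purchase_events VALUES (?, ?, ?, ?, ?)", events)

# The kind of question an analyst might ask: purchases and revenue per customer.
spend_by_customer = conn.execute(
    """
    SELECT customer_id, COUNT(*) AS purchases, SUM(price_cents) AS total_cents
    FROM purchase_events
    GROUP BY customer_id
    ORDER BY total_cents DESC
    """
).fetchall()

print(spend_by_customer)  # [('ABC', 2, 1887), ('XYZ', 1, 1387)]
```

In practice the events would land in a data warehouse rather than SQLite, but the point is the same: once every event has the same columns, questions like this become a single query.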

How is this different from normal logging? 

Programmers care about exceptions, lines of code and values of variables. The business cares about where the customer was in the workflow, what the customer clicked on, and what happened after that. If there was a silent retry that the customer didn’t notice, the business analysts don’t care.

They care more about what customers are doing and how the software responds than about what the software is doing internally.

BI logging is typically preserved durably for months or longer and needs to adhere to a schema that makes aggregating the data easy later on. This can be achieved with normal structured logging or by adding a separate code path just for ‘business’ logs.
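One way to sketch the "separate code path" option is a dedicated logger plus a small typed event class that enforces the schema when the event is written. The logger name, the PurchaseEvent class, its field names and the log_purchase helper are all hypothetical; treat this as an outline of the idea rather than a recommended implementation.

```python
import json
import logging
from dataclasses import dataclass, asdict

# A dedicated logger keeps business events separate from debug/error logs,
# so they can be routed to durable storage on their own.
business_log = logging.getLogger("business_events")  # hypothetical logger name


@dataclass(frozen=True)
class PurchaseEvent:
    """Fixed schema: every purchase event carries exactly these fields."""
    customer_id: str
    item_id: int
    event_time: str   # ISO 8601 timestamp
    price_cents: int
    session_id: int

    def __post_init__(self):
        # Reject malformed events at write time rather than at query time.
        if not self.customer_id:
            raise ValueError("customer_id must be non-empty")
        if self.price_cents < 0:
            raise ValueError("price_cents must be non-negative")


def log_purchase(event: PurchaseEvent) -> str:
    """Serialize the event as one JSON line and send it to the business logger."""
    line = json.dumps(asdict(event), sort_keys=True)
    business_log.info(line)
    return line
```

Calling log_purchase(PurchaseEvent("ABC", 1234, "2024-01-05T00:34:00", 1387, 987)) emits one JSON line with a stable set of keys, which is what makes later aggregation straightforward.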

If you like my writing, please buy my book on Amazon.
The Sledgeworx Guide to Getting into Software


Multiple promotions for solving the same problem

One failure mode I’ve seen in software organizations is multiple promotions for building the same solution to a problem. Suppose your software organization has 10 teams working in a similar problem space. In one year you see two promotions for building an asset management pipeline. When I saw the second promotion announcement, I thought to myself, “Wait, didn’t Tyler get promoted for building an asset management pipeline?”

Any Vice-President whose organization has this issue should be thinking hard about where it went wrong. That engineering culture is completely broken and can only be fixed by rooting out the leadership and then replacing senior engineers.

Why is this such a big deal? Because it’s a symptom of several serious problems in your organization.

No information sharing between teams

One team built a solution to this problem in Q1; six months later another team built another solution to the same problem. Why couldn’t they have shared an asset management pipeline? If it’s a valuable thing to have, why did one team go six months without one?

Promotions are being gamed

Two people being promoted for building similar solutions to the same problem is a sign that your promotion process is being gamed. Redundant projects being lauded as keystone accomplishments is ridiculous. The manager of the second team should at least have caught that this project had already served as someone else’s keystone.

Important shared infrastructure is being ignored

If two teams are building the same infrastructure to solve similar problems, it should be a shared service. Otherwise, you are paying to build the system twice and to support it twice. This is software: we should be able to shard the pipeline or make it multi-purpose.

In this particular organization, the root issue is that the asset management system is horrible. Because the quality of that system is bad, you have dozens of teams working around pain points with hacks. The solution is not to build asset management pipelines; the solution is to recognize how important this asset management system is and invest in it appropriately.

Every asset management pipeline this organization built was a waste of effort that should have been invested at a higher level. 

Peak to Trough

The importance of auto-scaling 

peak to trough traffic

The cloud enables us to acquire hardware on demand for our services. I have never had to rack a server or worry about hardware failure. My entire software career has been in the cloud. As an industry, most of us don’t need to worry about forecasting hardware requirements months in advance. We just increase the number of virtual machines we need in the PaaS dashboard.

This week I was investigating some unusually large peaks in our daily traffic. While changing the bounds and timeline of the graph, I noticed that we had a 10x difference from peak to trough. Usage peaks at about 10x trough for roughly 2 hours each day, and stays at about 5x trough for another 6 hours. At night our traffic drops significantly because our users are sleeping.

My current team, like all teams I have worked with in my five-year career, does not use auto-scaling. We experimented with it last year but had issues with auto-scaling interfering with our deployments in unpredictable ways.

So we scale for our instantaneous peak, which is 10x our lowest traffic (around 2am). That means that outside the peak hours we run between 2x and 10x as much hardware as the traffic requires.
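A rough way to put a number on the over-provisioning is to compare the capacity-hours a peak-sized fleet provides with the capacity-hours the traffic needs over a day. The sketch below uses the 2-hour and 6-hour figures above and assumes the remaining 16 hours sit at trough, which is a simplification; the real traffic curve will give a different result.

```python
# Traffic profile in multiples of trough traffic: (hours per day, load).
# Assumption: the 16 hours not described above all sit at trough (1x).
profile = [
    (2, 10),   # about 2 hours at 10x trough
    (6, 5),    # about 6 hours at 5x trough
    (16, 1),   # assumed: everything else at trough
]

hours = sum(h for h, _ in profile)

# Capacity-hours a perfectly elastic fleet would consume.
needed = sum(h * load for h, load in profile)

# Capacity-hours consumed when the fleet is sized for the peak all day.
provisioned = hours * max(load for _, load in profile)

print(f"needed: {needed}, provisioned: {provisioned}")   # needed: 66, provisioned: 240
print(f"over-provisioning factor: {provisioned / needed:.2f}x")  # 3.64x
```

This is a day-long average, so it hides the hour-by-hour picture: during trough hours the fleet is 10x larger than the traffic requires.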

The drawing underestimates the impact of the instantaneous peaks which essentially double the traffic to this service. 

Auto-scaling would be a great fit for this service. Most cloud platforms have supported this use case for years, and adopting it would result in decent savings.
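For a sense of what an auto-scaler decides, here is a minimal sketch of a proportional scaling rule: resize the fleet so that the load per instance moves back toward a target. The rule documented for the Kubernetes Horizontal Pod Autoscaler has this same shape, but the function name, parameters and default bounds here are assumptions for illustration, and real auto-scalers add cooldowns and smoothing on top.

```python
import math


def desired_instances(current: int, observed_load: float, target_load: float,
                      min_instances: int = 1, max_instances: int = 100) -> int:
    """Proportional scaling rule: grow or shrink the fleet so that the
    per-instance load moves back toward target_load.

    observed_load and target_load are per-instance values of the same metric
    (for example requests per second per instance).
    """
    if current < 1 or target_load <= 0:
        raise ValueError("current must be >= 1 and target_load must be > 0")
    wanted = math.ceil(current * observed_load / target_load)
    # Clamp to the configured fleet bounds.
    return max(min_instances, min(max_instances, wanted))


# 4 instances each seeing 250 req/s against a 100 req/s target: scale out.
print(desired_instances(4, 250.0, 100.0))   # 10
# 10 instances each seeing 10 req/s at night: scale in to the minimum.
print(desired_instances(10, 10.0, 100.0))   # 1
```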

Designing your development environment

What should your development environment look like? People talk about how it’s hard to set up development environments, how this or that component is tricky, and so on.

But what components should you actually have? For a backend API server in a major language you will probably have an application that serves requests while writing to a database, logging and emitting metrics. 

If you follow best practices, you will have unit tests, a deployment pipeline, and integration and end-to-end tests. All that stuff is great, but what about the development environment?

You should have a best-in-class IDE. Whether that is Visual Studio, IntelliJ or Emacs, make sure you have the core features: syntax highlighting, go-to-definition, safe renaming, and a debugger.

You want some way to run integration tests on a developer’s local machine. In my workplace we ssh into VMs on which we can run the entire development stack. If you can manage that without the VM, do it; it’s 10x better.

You want to be able to run a remote debugger against a fully running version of your application. Ideally, you should be able to test manually end-to-end against a version of your application running on your local machine. 

If you have only one service, this is easy. If your microservice is one of 100, making that happen is trickier but worth it.


Code coverage should be verified during your build. 

Code style linters should automatically apply styling fixes. The build should never break over styling issues.

Builds should be FAST. Every minute of build time can safely be assumed to result in wasted developer time. The ideal build time is under 15 seconds, including unit tests. 

The longer your build time, the more distractions will edge into your workflow. If builds take a minute or two, devs will click over to Slack or their browser. If builds take over 5 minutes, devs will be talking to their coworkers and getting distracted. If builds take 15+ minutes, it’s bad: at most four code changes can be verified per hour.
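The arithmetic behind that limit is simple enough to sketch. The helper below (a hypothetical name) gives an upper bound on build-verified changes per hour for a given build time, ignoring the time spent writing the change and any context switching, so real throughput will be lower.

```python
def max_verified_changes_per_hour(build_minutes: float) -> int:
    """Upper bound on build-verified changes per hour, ignoring the time
    spent writing the change and any context switching."""
    if build_minutes <= 0:
        raise ValueError("build_minutes must be positive")
    return int(60 // build_minutes)


# 15-second, 2-minute, 5-minute and 15-minute builds.
for minutes in (0.25, 2, 5, 15):
    print(minutes, max_verified_changes_per_hour(minutes))
# 0.25 240
# 2 30
# 5 12
# 15 4
```

A 15-minute build caps out at 4 verified changes per hour, and anything slower falls below that.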

Why do we have unit tests for things that can never happen

So you have finally established good development practices for your team. The build enforces code coverage of 90% and everyone is adding tests. Month over month the test count increases, and obviously code quality is increasing as well…

Until a partner contributing code to your project notices a test that validates something that can never happen. This test in particular validates the behavior of an endpoint when a particular input is null. The issue is that our platform ensures this input is never null: any request without that input fails with a platform error and never reaches our API.

Our Slack channel had a lengthy discussion on why we have that unit test, and the answer is code coverage. Someone added it because code coverage declared that code path to be untested.

Can we delete this test? Probably not without writing new tests for other functionality or reducing our coverage requirement. 

One option here is to add a Spring MethodValidationInterceptor which validates ‘base’ inputs to our service and then write tests for that. This approach moves responsibility for ‘things that can never happen’ to a centralized class which can be tested. Then we know what is prevented from happening within the domain of our unit tests. 
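The idea of centralizing these checks can be sketched without Spring. The Python below is a language-neutral illustration, not Spring’s API: the decorator, the exception class, the handler and the "customer_id" input name are all hypothetical. One validator guards every handler, so the "can never happen" case is tested once, against the validator, instead of once per endpoint.

```python
from functools import wraps


class MissingBaseInputError(ValueError):
    """Raised when an input the platform is supposed to guarantee is absent."""


REQUIRED_BASE_INPUTS = ("customer_id",)  # hypothetical 'base' input name


def validate_base_inputs(handler):
    """Reject requests missing a base input before the handler runs."""
    @wraps(handler)
    def wrapper(request: dict):
        for name in REQUIRED_BASE_INPUTS:
            if request.get(name) is None:
                raise MissingBaseInputError(f"missing required input: {name}")
        return handler(request)
    return wrapper


@validate_base_inputs
def get_orders(request: dict) -> str:
    # The handler can assume customer_id is present, so it needs no null
    # branch of its own and no per-endpoint null test.
    return f"orders for {request['customer_id']}"


print(get_orders({"customer_id": "ABC"}))  # orders for ABC
```

With this shape, the handlers no longer contain a null branch for coverage to flag, and the single test for the missing-input case lives next to the validator.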

But at the platform level this really will never happen. Do we really need every service in our dependency tree to validate this particular case and throw an exception?