Software Leviathans and the weird dominance of good enough.

One day in Spring 1989, I was sitting out on the Lucid porch with some of the hackers, and someone asked me why I thought people believed C and Unix were better than Lisp. I jokingly answered, “because, well, worse is better.” We laughed over it for a while as I tried to make up an argument for why something clearly lousy could be good.

https://www.dreamsongs.com/WorseIsBetter.html

People have long wondered why Java took the crown as the ‘enterprise’ language. I can’t really argue that point, since I arrived on the scene long after Java was all there was. This article is about why software leviathans are written in Java more than in anything else.

You have a huge software project to build. What language do you build it in? The prototype was written in Ruby on Rails by one guy and an Adderall prescription. Now they want you to scale this thing to 1,000+ engineers over 5 years of development. You might think, “Aha, this is my chance, let’s save an order of magnitude in lines of code and use Lisp,” except this story happened in the past and they chose Java.

Why is it always Java? Sure, it’s reasonably fast, but Facebook made PHP work; can’t we at least use Haskell? With the benefit of hindsight, we know that most of the biggest software systems are built in Java. Google built so many leviathans in Java that they bankrolled a new language like Java but with fewer features. Amazon is based on Java. Netflix is Java again. Facebook made their own language, and Microsoft is old enough to have existed before Java but still made their own version of Java, C#.

The real question should be: what is Java’s secret?

Java requires a lot of boilerplate

Java just plays well with the major constraints of a software leviathan, and at leviathan scale that is all that matters.

This one is the corollary of “Java doesn’t support meta-programming.” Creating your own DSL is great; 1,000 engineers creating their own DSLs is 999 nightmares. Software Leviathans are too big for any one engineering team to understand, and any DSL you create makes your code unintelligible to the rest of the people working in hell with you. I can understand boilerplate written by a monkey, but a DSL written by another software engineer could take me days to understand. When your team gets poached to go work at a startup where the code base isn’t humongous, it’s a lot easier to bring in Java programmers to replace you lot than it would be to get Haskell engineers to figure out your undocumented dialect.

Google got to the point where they figured Java had too much meta-programming ability, so they created Go, which is basically Java without inheritance. That is what happens when you work on a leviathan project: you begin to resent your peers’ ability to do anything unusual, because you know it’s just going to be more work for you.

Adding onboarding time to understand 1) the functional language and 2) the DSL your team created might push our already long 6-month onboarding period closer to the 1-year mark. I wrote an article about onboarding time and functional languages aimed at startups, but honestly I don’t think the hiring market is the real reason Java dominates the top end. FAANG is already willing to train new grads to work on their giant software projects.

Honestly, it boils down to comprehension. Humans can only comprehend so many things, and at leviathan scale the maximum is a tiny fraction of the entire system.

In a software leviathan, your team constantly works with other teams’ systems. How does this API work? There isn’t any documentation, and one 30-minute office hours session isn’t going to explain that hairball. If you all use the same language, and that language is Java, there is a chance you can open up their code base and figure out what is going on. They probably didn’t do anything unexpected, like pre-allocating all of their memory and storing every object in a ring buffer. But if they did do something crazy, you can probably figure it out. Besides, Java doesn’t have anything like Scalaz, so you won’t be surprised by a functor where you weren’t expecting one.
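To make the “crazy” case concrete, here is a minimal, hypothetical sketch of what pre-allocating objects and recycling them through a ring buffer might look like in Java. The names (`PreallocatedRing`, `ObjectRing`, `Message`) are invented for illustration and are not taken from any real system. The surprise for a newcomer is that `claim()` hands back an old object rather than a fresh one, so stale state leaks unless every caller resets it; the upside is that it is still ordinary Java you can read.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch: every object is allocated once, up front, and recycled
 * through a fixed-size ring instead of being created with `new` on demand.
 */
public class PreallocatedRing {

    /** A fixed-capacity ring whose slots are filled once, at construction time. */
    static final class ObjectRing<T> {
        private final Object[] slots;
        private int next = 0;

        ObjectRing(int capacity, Supplier<T> factory) {
            if (capacity <= 0) {
                throw new IllegalArgumentException("capacity must be positive");
            }
            slots = new Object[capacity];
            for (int i = 0; i < capacity; i++) {
                slots[i] = factory.get(); // all allocation happens here, once
            }
        }

        /** Hands out the next slot, wrapping around; callers must reset its state. */
        @SuppressWarnings("unchecked")
        T claim() {
            T obj = (T) slots[next];
            next = (next + 1) % slots.length;
            return obj;
        }

        int capacity() {
            return slots.length;
        }
    }

    /** A mutable message meant to be overwritten in place, never reallocated. */
    static final class Message {
        long id;
        String body;
    }

    public static void main(String[] args) {
        ObjectRing<Message> ring = new ObjectRing<>(2, Message::new);
        Message a = ring.claim();
        Message b = ring.claim();
        Message c = ring.claim(); // wraps around: the same instance as a
        a.id = 1;
        System.out.println(a == c); // true: the "new" message is the old one
        System.out.println(a != b); // true
        System.out.println(c.id);   // 1: stale state leaks if nobody resets it
    }
}
```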

Let’s take the opposite side: away-team work. You have been given the glorious task of implementing a new feature, but it’s impossible to do it cleanly without an API change in another team’s system. That team fully supports the change and has contributed two paragraphs to your architecture document describing the change to make in their system. But the change isn’t on their team’s roadmap, so you are going to have to make it yourself.

Getting their service to run and pass integration tests in your virtual development machine takes a week. Now you need to navigate their system, where they have conveniently used dependency injection to ensure that you can’t know which of the five implementations of an interface is in play. Do you still wish the other team had used Clojure? You might never figure out the DSL.
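A minimal sketch of the situation, assuming a hypothetical `RateLimiter` interface with three implementations standing in for the five. Plain Java plays the role of the dependency injection framework here (no real framework API is used): the binding is chosen from a config key in one place, while the code you are actually reading only ever sees the interface.

```java
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: the call site depends only on an interface, and the
 * concrete implementation is picked from configuration somewhere else.
 */
public class InjectionMystery {

    interface RateLimiter {
        boolean allow(String customerId);
    }

    static final class AlwaysAllow implements RateLimiter {
        public boolean allow(String customerId) { return true; }
    }

    static final class NeverAllow implements RateLimiter {
        public boolean allow(String customerId) { return false; }
    }

    static final class VipOnly implements RateLimiter {
        public boolean allow(String customerId) { return customerId.startsWith("vip-"); }
    }

    /** The "module": maps a config key to an implementation, far from the call site. */
    static final Map<String, Supplier<RateLimiter>> BINDINGS = Map.of(
            "open", AlwaysAllow::new,
            "closed", NeverAllow::new,
            "vip", VipOnly::new);

    /** The code you actually read: nothing here says which RateLimiter runs. */
    static final class CheckoutHandler {
        private final RateLimiter limiter;

        CheckoutHandler(RateLimiter limiter) { // injected from outside
            this.limiter = limiter;
        }

        String handle(String customerId) {
            return limiter.allow(customerId) ? "accepted" : "throttled";
        }
    }

    /** Resolves the binding for a config key and builds the handler. */
    static CheckoutHandler wire(String configKey) {
        Supplier<RateLimiter> binding = BINDINGS.get(configKey);
        if (binding == null) {
            throw new IllegalArgumentException("no binding for " + configKey);
        }
        return new CheckoutHandler(binding.get());
    }

    public static void main(String[] args) {
        // Same handler code, different behavior, decided by a config value elsewhere.
        System.out.println(wire("vip").handle("bob"));     // throttled
        System.out.println(wire("open").handle("bob"));    // accepted
        System.out.println(wire("vip").handle("vip-ann")); // accepted
    }
}
```

In a real DI framework the equivalent of `BINDINGS` is typically spread across modules, annotations, or config files, which is why reading `CheckoutHandler` alone does not tell you what actually runs.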

Have you ever looked at somebody else’s Lisp code and wondered what was inside the variables? Now imagine this is your job: you will spend the next month making a 200-line change to a 100,000-line API service you didn’t know existed until this week. Except this will happen every quarter for the rest of your career.

People complain about how Java forces you to write out the types of things everywhere, but for software leviathans this is a benefit. I can see helpful type signatures everywhere, whether I’m reading your code in my IDE, an email, an excerpt in an arch doc, or a Slack message you sent me at 3 am.
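As a small, hypothetical illustration, the signature of `ordersByCustomer` below could be pasted into an arch doc or a Slack message and still tell the reader what goes in and what comes out without showing the body. `Order` and `ordersByCustomer` are invented names, and the `record` syntax assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical example of a signature that documents itself through its types. */
public class SelfDocumentingTypes {

    record Order(String customerId, long amountCents) {}

    /** Groups orders by customer ID; the declared types are the documentation. */
    static Map<String, List<Order>> ordersByCustomer(List<Order> orders) {
        Map<String, List<Order>> grouped = new TreeMap<>(); // sorted by customer ID
        for (Order order : orders) {
            grouped.computeIfAbsent(order.customerId(), id -> new ArrayList<>()).add(order);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("alice", 1200),
                new Order("bob", 500),
                new Order("alice", 300));
        Map<String, List<Order>> grouped = ordersByCustomer(orders);
        System.out.println(grouped.get("alice").size()); // 2
        System.out.println(grouped.keySet());            // [alice, bob]
    }
}
```

The contrast is with an untyped `orders_by_customer(orders)`, where a reader has to open the body, or ask you at 3 am, to learn what the result looks like.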

Java and Go are great in Software Leviathans. You don’t have to worry about stumbling upon a programming mystery created 10 years ago by a disgruntled new grad. You can expect the same syntax and the same language whichever microservice you are working on. The code has self-documenting types that are ‘easy’ to understand. Honestly, these are a lot of benefits, and they make a tough coding environment a little more manageable.

Software Leviathans

Diseconomies of scale, why FAANG pays high salaries, and the dominance of Java

The top end of software engineering jobs is dominated by what I’ve started thinking of as ‘Software Leviathans’: large software systems staffed by thousands of engineers. A few that come to mind are Amazon Alexa, Amazon.com, Google Search, Salesforce, and Facebook.com. These are not “monoliths” or large services that do everything. Instead, they are the result of combining hundreds of smaller ‘micro-services’ into one massive software product.

These leviathans do many, many things; few people on the planet can claim to know all of the features of facebook.com. It is quite possible that no single list exists that enumerates every feature in that product.

Similarly, development on these systems happens in parallel across many teams. It is essentially impossible for any one person to keep track of everything that is being added to the system.

Leviathans are too big for anyone to understand. It doesn’t matter what architecture or runtime choices are made. It could be one massive JVM, a million lambda functions, a hundred thousand Docker containers, or thousands of micro-services. Even if you work on the leviathan, you won’t have any real understanding of the total state of the system. Each engineer will be aware of, and communicate with, a tiny fraction of the total number of people working inside the leviathan.

Leviathans are heterogeneous systems. They do not do ‘one thing well’. Leviathans do everything you can think of. Google.com is a search engine, but it’s also a calculator, an advertising system, a web scraper, a hotel booking tool, a flight booking tool, and much more. Leviathans grow in parallel, across myriad tentacles of functionality. New features emerge all the time, usually to the surprise of other engineers on the project.

Leviathans are difficult to work in. Despite appearing from the outside to be a sea of constant change, any change made inside the leviathan is extremely expensive in engineering hours. There are thousands of potential interactions each engineering team has to consider when evaluating changes to their system. The architecture must be heavily constrained to support parallel development in an environment where coordination between different teams is impossible due to scale. Engineers working on a software leviathan spend a relatively small fraction of their time actually writing code compared to debugging issues, researching, coordinating changes, and documenting.

Leviathans are interesting because they are the ‘core’ services powering the digital world these days. Their scale is at the top of the chart in the software engineering world, and as a result they expose the limitations of software engineering.

Software diseconomies of scale are at their most evident in these software leviathans. They are massive projects with huge numbers of the best engineers working on them. But development is slow per engineer and code quality is not clearly superior to industry best practices. 

Why I stopped going on Twitter, using time tracking apps to monitor your time with Qbserve

I’ve been an avid Twitter user for years, but had to stop this winter. I have been listening to ‘Deep Work’ while driving cross-country and have done a lot of thinking about how to do better work. One of the things the book recommends is to quit social media, or at least exclude it from the part of your day when you work. I’ve typically just blocked Twitter on my network during the workday and then used it as much as I wanted afterwards.

Another thing I did in the pursuit of ‘deep work’ was to review my Qbserve stats for the last few months. My Twitter numbers were way higher than I expected. I have been spending thousands of dollars’ worth of time on Twitter, producing fun content that Twitter then monetizes. I could have gotten a part-time job or learned to paint.

Track your time. There are a bunch of apps that can do it. I use Qbserve because it stores data locally and felt like a less heavyweight solution. I have also used RescueTime, but found having to log in again whenever I needed to restart tracking to be a pain.

Once you have tracking going, it gives you a lot of insight into what you are doing on your computer. Some people might think, “Ah, if I’m on the computer I’m working, what else would I use it for?” But for millennials and digital natives who spend most of their lives on a computer, it can really help.

For example, I know how much clock time I spent this year reading Xianxia (translated Chinese pulp fiction) on wuxiaworld.co: four whole days. That is more than double the time I spent on news.ycombinator.com, which came in at 1 day and 13 hours. I also know how much time I spent writing, note taking, and journaling this year: around 30 hours so far. Admittedly, I haven’t run the app 24/7 and didn’t start until March, so I only have around 8 months’ worth of data.

I don’t think I would have realized how much time I was spending on Twitter without a time tracking app. It is a lot like television for normal people: it is just on all the time when you are home, and you don’t really think about its effects on your life. Most people underestimate how much time they spend watching television, but you don’t have to underestimate how much time you spend on YouTube; just get Qbserve and review the data occasionally.

In the week or so since I quit, I’ve already read a couple books and started writing on my blog again. 

Burnout

I’ve dealt with burnout many times in my 6 years as a software engineer. Usually it’s when I get bored of a project or there is a slow period where I don’t have a lot of work to do. Counter to what you would expect, having less work makes it harder to get that work done than when there is more to do. I think it is because when there is less of it, the work feels less important, and subconsciously it feels like I’m not really needed. It’s like being in a meeting with the whole team, trying to estimate the impact of an issue, when only three out of ten people can actually do anything before people start duplicating work. Then you end up waiting for other people to do basic things like read logs and tell you what they say. You could just read them yourself, but do we need three people reviewing the same logs right now?

I intentionally stayed on my current team for a relatively long period of time just to see what it was like. Earlier in my career I worked as a consultant where my longest time on one project was 9 months. Here at a product company we have been working on essentially the same problems for years. This is great in a way because I have been able to develop deep expertise in my systems and tooling, but the cost is of course burnout. 

The pandemic has made this year significantly worse by forcing remote work. I’ve lived in studio apartments since college and rely on having an office to provide a distinction between working and other activities. Efficiencies like eating at my desk, which normally make sense, serve to muddle work and play when everything happens in a 500 sq ft box.

Having everything muddled together makes it much harder to maintain flow, and the absence of flow makes everything more difficult, especially when your general happiness is influenced by your self-perceived productivity and usefulness, as mine is. A large reason for my career success so far is how I maintain focus in the office. I don’t let myself do certain activities in the office, like using Facebook, Reddit, Twitter, or almost anything non-work related. Figuring out how to extend those norms to a single-room lifestyle has been very difficult.

I haven’t been able to wait out burnout. In the past, a team or job change alleviated the problem. This year it just got worse and worse; by the end of my stay in Seattle, I didn’t want to fix it.

The good news is that I left Seattle. Living there never felt right to me, and the winters are horrible. In the short term I will be itinerant, but eventually I will acquire a new, larger permanent space. I’m hoping to move into a house or condo, but might end up in a one- or two-bedroom apartment with a dedicated office.

Fixing one part of my life that I knew I didn’t like has helped. It hasn’t fixed everything, but I’ve had a lot of time to think of ways to improve my working situation, which I think will pay off.

Links Post October

Here are some links I have seen lately that were pretty interesting.

In-window noise-canceling speakers


They will integrate these speakers into windows and walls and make them smaller, increasing the quality of life in dense cities. It is something that’s needed, since cities just keep getting larger.

The concept of a ‘feature store’ for typed ML model inputs (tensors, vectors, etc.)

https://www.logicalclocks.com/blog/feature-store-vs-data-warehouse

VM performance tests

Finally, we have a pair of great posts from tratt.net about VM warmup, with lots of data.

https://tratt.net/laurie/blog/entries/why_arent_more_users_more_happy_with_our_vms_part_1.html

https://tratt.net/laurie/blog/entries/why_arent_more_users_more_happy_with_our_vms_part_2.html